Generic Visual Categorizer
Automatically classify images based on visual content.
Technology Description
Generic visual categorization (GVC) assigns one or multiple labels to an image
based on its semantic content. "Generic" highlights the goal of classifying a
wide range of objects and scenes.
Xerox has developed a technique that is sufficiently generic to work with
several object types simultaneously and which can be readily extended to new
object types. It can also handle variations in view, imaging, lighting and
occlusion (partial visibility), typical of the real world, as well as the
intra-class variations typical of semantic classes of everyday objects (e.g.
size, shape, form, color).
Xerox’s breakthrough GVC technology is the result of combining Xerox
scientists' expertise in image processing, computer vision and machine learning.
The Xerox visual categorization method first extracts and describes patches
found in an image. It then maps the patch descriptors to "visual vocabularies",
which are sets of predetermined clusters of patches called "visual words".
Visual vocabularies are learned automatically from training sets and provide an
intermediate representation (hidden layer) bridging the semantic gap between
the low-level features extracted from an image and the high-level concepts to
be categorized.
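The mapping from patch descriptors to visual words can be illustrated with a minimal sketch. It assumes hard nearest-neighbour assignment and Euclidean distance, neither of which is stated in the description above; the function names, the 2-D descriptors and the three-word vocabulary are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def nearest_word(descriptor, vocabulary):
    """Index of the visual word (cluster centre) closest to one patch descriptor."""
    return min(range(len(vocabulary)),
               key=lambda k: math.dist(descriptor, vocabulary[k]))

def word_histogram(descriptors, vocabulary):
    """Normalized histogram of visual-word occurrences for one image."""
    counts = Counter(nearest_word(d, vocabulary) for d in descriptors)
    total = max(len(descriptors), 1)  # avoid dividing by zero for an empty image
    return [counts[k] / total for k in range(len(vocabulary))]

# Toy data (assumed): three visual words and four patch descriptors.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
patches = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.7, 5.2)]
histogram = word_histogram(patches, vocabulary)
print(histogram)
```

The histogram, rather than the raw patches, is the intermediate representation that a classifier would consume.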
Since one universal vocabulary made of the most frequent visual words across
all the considered classes is not sufficient, Xerox borrowed a technique from
speech recognition known as "vocabulary adaptation" to derive class-specific
vocabularies from the universal vocabulary.
For each class, an image is characterized by a histogram of visual word
occurrences, which indicates whether the image content is best modeled by the
universal vocabulary or by the corresponding class vocabulary.
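One way to picture such a per-class histogram is sketched below. This is a simplified illustration, not the Xerox implementation: it assumes the class vocabulary is already available, merges it with the universal vocabulary, and uses hard nearest-neighbour assignment. The names `bipartite_histogram` and `class_mass` and all numeric values are hypothetical.

```python
import math

def bipartite_histogram(descriptors, universal_vocab, class_vocab):
    """Histogram over the universal words followed by the class-specific words."""
    merged = list(universal_vocab) + list(class_vocab)
    counts = [0] * len(merged)
    for d in descriptors:
        k = min(range(len(merged)), key=lambda i: math.dist(d, merged[i]))
        counts[k] += 1
    total = max(len(descriptors), 1)  # guard against an empty image
    return [c / total for c in counts]

def class_mass(histogram, n_universal):
    """Share of patches that landed on class-specific words."""
    return sum(histogram[n_universal:])

# Toy data (assumed): two universal words and two class-adapted words.
universal_vocab = [(0.0, 0.0), (4.0, 4.0)]
class_vocab = [(1.0, 1.0), (4.0, 5.0)]
patches = [(0.9, 1.1), (1.2, 0.8), (0.1, 0.0), (4.0, 4.9)]

hist = bipartite_histogram(patches, universal_vocab, class_vocab)
print(hist, class_mass(hist, len(universal_vocab)))
```

A high share of mass on the class-specific half suggests that the class vocabulary models the image better than the universal one.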
Performance
Advantages of the Xerox method include simplicity, computational efficiency,
scalability, robustness to variations, and applicability to all types of
classes and objects. Presently, GVC has been trained for about 100 categories.
Rigorous tests involving more than 30 simultaneous categories have demonstrated
state-of-the-art categorization performance:
- Classification run-time of 0.2 to 0.5 sec per image (depending on processor
performance)
- ~0.1 msec computational increment per added class
- Equal error rate ranging from 2% to 10% (depending on class, independent of
number of classes)
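For readers unfamiliar with the equal error rate, it is the operating point at which the false positive rate equals the false negative rate. The sketch below approximates it by sweeping thresholds over the observed scores; the function name and the toy scores are assumptions for illustration and are not GVC results.

```python
def equal_error_rate(scores, labels):
    """Approximate EER for one class: find the threshold where the false
    negative rate and false positive rate are closest, and average them."""
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    best_gap, best_rate = None, None
    for t in sorted(set(scores)):
        # An image is predicted positive when its score is >= t.
        fn = sum(1 for s, y in zip(scores, labels) if y and s < t)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s >= t)
        fnr, fpr = fn / positives, fp / negatives
        gap = abs(fnr - fpr)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (fnr + fpr) / 2
    return best_rate

# Toy classifier scores and ground-truth labels (assumed).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
eer = equal_error_rate(scores, labels)
print(eer)
```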
Applications
- Automatic tagging of images (e.g., images in documents, photographic
archives, consumer photo albums, online shopping catalogues)
- Content-based image retrieval
Generic Visual Categorizer Training Tool
Train your own Visual Categorizer from a collection of tagged images
Presently, GVC has been trained for about 100 categories. To meet customers'
needs beyond the current coverage, Xerox has developed a beta version of the
GVC Training Tool. Its graphical user interface is simple enough to be used by
any holder of tagged images, yet sophisticated enough to give performance
feedback and support an iterative training process.
Tagging of the Training Material
The training material is a collection of tagged images. There is no need to
associate tags with corresponding regions in the image (no segmentation). The
collection should be representative of what the visual categorizer is expected
to recognize at run time. It should also be diverse. An image in the training
set can be labelled with multiple tags. Training iterations are likely to
reveal wrong and missing labels, giving the user a chance to correct them and
improve performance.
Training Iteration Settings
Starting from such a tagged collection, the user can define the following
settings:
- Subset: models can be trained for a subset of all categories
- Class aggregation: classes can be aggregated at training time. For example,
if the training dataset includes distinct labels for "cats" and "dogs", the
user may prefer to combine them as "pets"
- Class-to-class neutrality: when two categories are semantically close to
each other (e.g. Forest / Trees), the user can instruct the Training Tool to
neutralize some tags during training. By default, images tagged with anything
other than "Forest" are negative examples of the "Forest" class. It is wise to
treat "Trees" images as neutral examples with respect to the "Forest" category
Training Process
A progress bar helps visualize the process steps. Some intermediate results
are saved for later reuse. Training time depends on the computer hardware
(processor, memory, hard disk). Training 40 categories from 30,000 labelled
pictures is completed in a few hours on a Pentium® 4 or Athlon™ 64 processor
computer.
Interactive Performance Feedback
The Training Tool can measure the performance of the models just trained using
N-fold cross-validation, where N is set by the user. The training set is
randomly split into N subsets of equal size; each subset in turn is held out of
the training data and used as test data. The performance can be visualized in
several ways, including precision, recall, F1, accuracy and confusion matrix.
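As an illustration of the procedure, the sketch below builds N random folds and computes precision, recall and F1 for one category. It is a generic example of N-fold cross-validation, not the Training Tool's code; the function names and the fixed random seed are assumptions.

```python
import random

def n_fold_splits(n_items, n_folds, seed=0):
    """Yield (train_indices, test_indices) for each of the N folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # random assignment of items to folds
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, folds[k]

def precision_recall_f1(predicted, actual):
    """Precision, recall and F1 for one category from binary decisions."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

splits = list(n_fold_splits(10, 5))
metrics = precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0])
print(len(splits), metrics)
```

Each image is used as test data exactly once, so the measured performance covers the whole training collection.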
Performance Measure (precision & recall)
The user can easily correct gaps in the tagging. Color coding indicates how
each image was tagged, while the scores result from the models just trained.
Ranking images by score and inspecting the highest and lowest ones helps to
identify mismatches in the tagging, which can then be corrected.
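The ranking idea can be sketched as follows: untagged images with high scores are candidates for missing tags, and tagged images with low scores are candidates for wrong tags. The function name, the `top` parameter and the toy scores are assumptions made for illustration.

```python
def suspicious_tags(scores, tagged, top=3):
    """Return (possibly_missing, possibly_wrong) image indices for one category.

    possibly_missing: untagged images with the highest scores.
    possibly_wrong:   tagged images with the lowest scores.
    """
    untagged = [i for i, t in enumerate(tagged) if not t]
    with_tag = [i for i, t in enumerate(tagged) if t]
    possibly_missing = sorted(untagged, key=lambda i: -scores[i])[:top]
    possibly_wrong = sorted(with_tag, key=lambda i: scores[i])[:top]
    return possibly_missing, possibly_wrong

# Toy model scores and current tags for one category (assumed).
scores = [0.95, 0.10, 0.88, 0.05, 0.60]
tagged = [True, True, False, False, True]
missing, wrong = suspicious_tags(scores, tagged, top=1)
print(missing, wrong)
```

Here image 2 scores high without carrying the tag, and image 1 carries the tag despite a low score, so both are worth a manual check.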
Assistance in Tagging
Based on the performance measure the user can decide on appropriate next steps,
which may include:
- Remove, aggregate or redefine certain categories
- Add, remove or edit some tags or images
- Neutralize certain tags with respect to certain categories
- Launch a new training iteration
- Import the trained models into the GVC run-time
Intellectual Property Summary
Xerox Intellectual Property includes patents, patent applications, and know-how.
For Licensing Information
To learn more about licensing the Generic Visual Categorizer technology.