Generic Visual Categorizer
Automatically classify images based on visual content.
Technology Description
Generic visual categorization (GVC) assigns one or more labels to an image based on its semantic content. "Generic" highlights the goal of classifying a wide range of objects and scenes.
Xerox has developed a technique that is sufficiently generic to work with several object types simultaneously and which can be readily extended to new object types. It can also handle variations in view, imaging, lighting and occlusion (partial visibility), typical of the real world, as well as the intra-class variations typical of semantic classes of everyday objects (e.g. size, shape, form, color).
Xerox’s breakthrough GVC technology is the result of combining Xerox scientists' expertise in image processing, computer vision and machine learning.
The Xerox visual categorization method first extracts and describes patches found in an image. It then maps the patch descriptors to "visual vocabularies", which are sets of predetermined clusters of patches called "visual words".
Visual vocabularies are learned automatically from training sets and provide an intermediate representation (hidden layer) bridging the semantic gap between the low-level features extracted from an image and the high-level concepts to be categorized.
Since one universal vocabulary made of the most frequent visual words across all the considered classes is not sufficient, Xerox borrowed a technique from speech recognition known as "vocabulary adaptation" to derive class-specific vocabularies from the universal vocabulary.
For each class, an image is characterized by a histogram of visual word occurrences, which determines whether the image content is best modeled by the universal vocabulary or the corresponding class vocabulary.
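The histogram step described above can be illustrated with a minimal sketch. It assumes each patch descriptor is a short numeric tuple and that a descriptor is assigned to its single nearest visual word by Euclidean distance; both are simplifying assumptions made for illustration, not the Xerox implementation. The names `nearest_word` and `word_histogram` and the toy data are hypothetical, and the sketch covers only one vocabulary, not the adaptation to class-specific vocabularies.

```python
from collections import Counter
from math import dist


def nearest_word(descriptor, vocabulary):
    """Return the index of the visual word closest to the descriptor."""
    return min(range(len(vocabulary)), key=lambda i: dist(descriptor, vocabulary[i]))


def word_histogram(descriptors, vocabulary):
    """Normalized histogram of visual word occurrences for one image."""
    counts = Counter(nearest_word(d, vocabulary) for d in descriptors)
    total = len(descriptors)
    # An image with no patches gets an all-zero histogram.
    return [counts[i] / total if total else 0.0 for i in range(len(vocabulary))]


# Toy data (hypothetical): three visual words and four patch descriptors in 2-D.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.2), (0.9, 1.1), (1.2, 0.8), (4.8, 5.1)]

histogram = word_histogram(descriptors, vocabulary)
print(histogram)  # [0.25, 0.5, 0.25]
```

A histogram of this kind is a fixed-length vector regardless of how many patches the image contains, which is what lets a standard classifier consume it.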
Performance
Advantages of the Xerox method include simplicity, computational efficiency, scalability, robustness to variations, and applicability to all types of classes and objects. Presently, GVC has been trained for about 100 categories. Rigorous tests involving more than 30 simultaneous categories have demonstrated state-of-the-art categorization performance.
Generic Visual Categorizer Training Tool
Train your own Visual Categorizer from a collection of tagged images.
To meet customers' needs beyond the roughly 100 categories currently covered, Xerox has developed a beta version of the GVC Training Tool. Its graphical user interface is simple enough to be used by any holder of tagged images, yet sophisticated enough to give performance feedback and support an iterative training process.
Tagging of the Training Material
The training material is a collection of tagged images. There is no need to associate tags with corresponding regions in the image (no segmentation). The collection should be diverse and representative of what the visual categorizer is expected to recognize at run time. An image in the training set can be labelled with multiple tags. Training iterations are likely to reveal wrong and missing labels, giving the user a chance to correct them and improve performance.
Training Iteration Settings
Starting from such a tagged collection, the user can define the following settings:
Training Process
A progress bar helps visualize the process steps. Some intermediate results are saved for later reuse. Training time depends on the computer hardware (processor, memory, hard disk). Training 40 categories from 30,000 labelled pictures completes in a few hours on a Pentium® 4 or Athlon™ 64 processor computer.
Interactive Performance Feedback
The Training Tool can measure the performance of the newly trained models using N-fold cross-validation, where N is set by the user. The training set is randomly split into N subsets of equal size; each subset in turn is held out of the training data and used as test data. Performance can be visualized in several ways, including precision, recall, F1, accuracy, and a confusion matrix.
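As an illustration of the cross-validation procedure and of three of the measures listed above, here is a minimal sketch in plain Python. The function names `n_fold_splits` and `precision_recall_f1` are hypothetical and do not come from the Training Tool; the sketch treats a single category as a yes/no decision, and when the collection size is not divisible by N the fold sizes differ by at most one.

```python
import random


def n_fold_splits(items, n, seed=0):
    """Yield (train, test) pairs; every item lands in exactly one test fold."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # random split, reproducible via seed
    folds = [shuffled[i::n] for i in range(n)]
    for i, test in enumerate(folds):
        # Training data is everything except the held-out fold.
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


def precision_recall_f1(true_labels, predicted_labels):
    """Metrics for one category; both inputs are sequences of booleans."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


splits = list(n_fold_splits(range(10), n=5))
metrics = precision_recall_f1([True, True, False, False],
                              [True, False, True, False])
print(len(splits), metrics)  # 5 (0.5, 0.5, 0.5)
```

In a full run, a model would be trained on each `train` list and scored on the matching `test` list, with the per-fold metrics averaged afterwards.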
Performance Measure (precision & recall)
The user can easily correct gaps in the tagging. Color coding indicates how each image was tagged, while the scores come from the newly trained models. Ranking images by score and inspecting the highest and lowest ones helps identify mismatches in the tagging, which can then be corrected.
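The ranking idea can be sketched as follows for a single category. This is an illustrative assumption about how such a review list might be built, not a description of the Training Tool's internals: the function `suspect_tags`, the `(image_id, is_tagged, score)` record layout, and the file names are all hypothetical.

```python
def suspect_tags(scored_images, top_k=2):
    """Pick review candidates for one category.

    scored_images: list of (image_id, is_tagged, score) tuples, where
    is_tagged says whether the user tagged the image with the category
    and score is the model's confidence for that category.
    """
    ranked = sorted(scored_images, key=lambda r: r[2], reverse=True)
    # Untagged images with the highest scores: the tag may be missing.
    possibly_missing = [r[0] for r in ranked if not r[1]][:top_k]
    # Tagged images with the lowest scores: the tag may be wrong.
    possibly_wrong = [r[0] for r in reversed(ranked) if r[1]][:top_k]
    return possibly_missing, possibly_wrong


# Hypothetical scores for one category.
results = [
    ("a.jpg", True, 0.95),
    ("b.jpg", False, 0.90),
    ("c.jpg", True, 0.10),
    ("d.jpg", False, 0.05),
    ("e.jpg", True, 0.80),
]

missing, wrong = suspect_tags(results, top_k=1)
print(missing, wrong)  # ['b.jpg'] ['c.jpg']
```

The candidates are only suggestions for a human to review; a high score on an untagged image can just as well be a model error as a missing tag.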
Assistance in Tagging
Based on the performance measure, the user can decide on appropriate next steps, which may include:
Intellectual Property Summary
Xerox Intellectual Property includes patents, patent applications, and know-how.
For Licensing Information
To learn more about licensing the Generic Visual Categorizer technology.