May 6, 2009

Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary

"Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary," Duygulu et al., ECCV, 2002

To learn a lexicon for a fixed image vocabulary, the paper discusses three major object recognition questions:
1. What counts as an object? All words in the vocabulary.
2. Which objects are easy to recognize? The words that can be attached to image regions.
3. Which objects are indistinguishable using the features? The words whose posterior probabilities differ very little.

The authors build the recognition model with the EM algorithm. First, each image is segmented into regions. Second, k-means is used to vector-quantize the region representations; the label assigned to each region is called a “blob”. Finally, the EM algorithm is run to find the correspondence between blobs and words.
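The EM step can be illustrated with a minimal sketch. It assumes segmentation and k-means have already been done, so each image is just a list of blob IDs plus its annotation words, and it uses a simplified update in the style of IBM Model 1 rather than the exact formulation in the paper. The function name `learn_lexicon` and the toy data are hypothetical.

```python
from collections import defaultdict

def learn_lexicon(images, iterations=20):
    """Estimate p(word | blob) from (blobs, words) pairs with a simple EM loop.

    Simplified, IBM-Model-1-style sketch; not the paper's exact model.
    """
    vocab = sorted({w for _, words in images for w in words})
    blob_ids = sorted({b for blobs, _ in images for b in blobs})
    # Start from a uniform table p(word | blob).
    t = {b: {w: 1.0 / len(vocab) for w in vocab} for b in blob_ids}
    for _ in range(iterations):
        counts = {b: defaultdict(float) for b in blob_ids}
        for blobs, words in images:
            for w in words:
                # E-step: posterior that each blob in this image produced word w.
                total = sum(t[b][w] for b in blobs)
                if total == 0.0:
                    continue
                for b in blobs:
                    counts[b][w] += t[b][w] / total
        # M-step: renormalise the expected counts for each blob.
        for b in blob_ids:
            norm = sum(counts[b].values())
            if norm > 0.0:
                t[b] = {w: counts[b][w] / norm for w in vocab}
    return t

# Hypothetical toy data: blob 0 co-occurs with "sky" in every image.
toy_images = [
    ([0, 1], ["sky", "sun"]),
    ([0, 2], ["sky", "grass"]),
    ([0], ["sky"]),
]
lexicon = learn_lexicon(toy_images)
```

On this toy data, blob 0 always appears alongside "sky", so EM shifts the probability mass for "sun" and "grass" onto blobs 1 and 2 even though no image labels a region directly.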

Once the probability table has been obtained, each region can be annotated with the word that has the highest probability given its blob. A threshold on that probability filters out unreliable predictions. Performance can be improved further by clustering the indistinguishable words.
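The annotation rule can be sketched as follows. This is only an illustration: the function name `annotate`, the hand-written table, and the threshold value are assumptions, and returning `None` for a rejected prediction is one possible convention.

```python
def annotate(region_blobs, table, threshold=0.5):
    """Label each region with its most probable word, or None if too uncertain.

    `table[blob][word]` is assumed to hold p(word | blob).
    """
    labels = []
    for blob in region_blobs:
        # Pick the word with the highest probability for this blob.
        word, prob = max(table[blob].items(), key=lambda item: item[1])
        # Refuse to predict when the best word is not confident enough.
        labels.append(word if prob >= threshold else None)
    return labels

# Hypothetical probability table for two blobs.
example_table = {
    0: {"sky": 0.8, "sun": 0.1, "grass": 0.1},
    1: {"sky": 0.4, "sun": 0.35, "grass": 0.25},
}
example_labels = annotate([0, 1, 0], example_table, threshold=0.6)
```

Raising the threshold trades coverage for precision: fewer regions receive a word, but the words that are assigned are more likely to be correct.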
