March 3, 2009

Image Retrieval: Ideas, Influences, and Trends of the New Age

Title: Image Retrieval: Ideas, Influences, and Trends of the New Age
Authors: RITENDRA DATTA, DHIRAJ JOSHI, JIA LI, and JAMES Z. WANG
Year of Publication: 2008
Publisher: ACM

Content-based image retrieval (CBIR) is a technology that helps organize digital picture archives by their visual content. Two gaps, the sensory gap and the semantic gap, define and motivate most of the related problems. The sensory gap is the gap between a real-world object and the descriptive information recorded about it, and the semantic gap is the lack of coincidence between the information extracted from the visual data and the interpretation a user gives to that data. The goal of CBIR is therefore to bridge these gaps and satisfy users.

The first important task is to clarify user-system interaction. The user perspective covers what the user wants and what form the query takes, and the system perspective covers how the system interacts with the user. In short, a human-centered system is needed to serve different kinds of user intent through several query types, including keywords, free text, images, graphics, and composites of these. For example, a composite query method could let a system accept gestures and speech as queries, or help the user refine queries through hints. If a system can collect manual tags for pictures, it not only facilitates text-based querying but also builds reliable training datasets. Moreover, how to design a retrieval system for portable devices, which have many constraints such as limited display size and color depth, is one of the open issues in visualization.

The two core problems of CBIR are (1) how to define a mathematical description, or signature, of an image, and (2) how to measure the similarity between a pair of images.
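To make the two problems concrete, here is a minimal sketch that pairs one possible signature (a normalized joint RGB histogram) with one possible similarity measure (histogram intersection). The function names, the bin count, and the choice of histogram intersection are assumptions made for illustration; they are not taken from the surveyed paper.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Problem (1): a signature. Here, a normalized joint RGB histogram."""
    if not pixels:
        raise ValueError("image has no pixels")
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..bins_per_channel-1.
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    total = float(len(pixels))
    return [count / total for count in hist]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Problem (2): a similarity. 1.0 means identical histograms, 0.0 means disjoint."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


# Tiny hypothetical "images" given as flat pixel lists.
red_image = [(255, 0, 0)] * 4
blue_image = [(0, 0, 255)] * 4
mixed_image = [(255, 0, 0)] * 2 + [(0, 0, 255)] * 2

similarity_to_mixed = histogram_intersection(
    color_histogram(red_image), color_histogram(mixed_image)
)
```

A global histogram like this is cheap to compute but ignores where colors appear in the image, which is one reason region-based signatures are of interest.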

For region-based visual signatures, the first step is image segmentation. Using k-means clustering or the normalized cut criterion, segmentation aids image understanding and allows several types of features to be extracted. A feature captures a specific property of an image, either globally for the entire image, which is faster to compute, or locally for a small group of pixels, which identifies important visual characteristics more specifically. Color features are usually summarized in a histogram. Texture features capture granularity and repetitive patterns. Shape is a key attribute for describing regions. Spatial modeling and matching deal with local image entities. Interest points, which can cope with significant affine transformations and illumination changes, are based on local invariants. When constructing signatures from features, histograms are easy to build but tend to be sparse in a multidimensional space. A region-based signature lets the representative vectors adapt to each image, and a region of coherent color and texture is likely to correspond to an object in the image.
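As a rough illustration of the segmentation step, the sketch below runs plain k-means on pixel colors and returns one cluster label per pixel. It is a simplification under stated assumptions: only color is used (no texture or pixel position), the initial centers are simply the first k distinct colors, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two color vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_segment(pixels: Sequence[Color], k: int, iterations: int = 20) -> List[int]:
    """Cluster pixel colors with k-means and return one label per pixel."""
    if k < 1 or k > len(pixels):
        raise ValueError("k must be between 1 and the number of pixels")

    # Deterministic initialization (an assumption): the first k distinct colors.
    centers: List[Color] = []
    for p in pixels:
        if p not in centers:
            centers.append(tuple(float(c) for c in p))
        if len(centers) == k:
            break
    if len(centers) < k:
        raise ValueError("fewer than k distinct colors in the image")

    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: each pixel joins its nearest center.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in pixels
        ]
        # Update step: each center moves to the mean color of its members.
        new_centers: List[Color] = []
        for j in range(k):
            members = [p for p, label in zip(pixels, labels) if label == j]
            if members:
                new_centers.append(
                    tuple(sum(channel) / len(members) for channel in zip(*members))
                )
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels


# Hypothetical five-pixel image: three reddish pixels and two bluish ones.
sample_pixels = [(250, 10, 10), (245, 5, 0), (10, 10, 240), (0, 20, 250), (255, 0, 5)]
sample_labels = kmeans_segment(sample_pixels, k=2)
```

Each resulting cluster can then be summarized, for example by its mean color and its share of the pixels, to form one entry of a region-based signature.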

There are three types of signatures: a single feature vector, a summary of local feature vectors, and a region-based signature. Each has its own appropriate similarity measures. For a single vector, a geodesic distance may be the better choice. Summaries of local feature vectors include codebooks, which are generated by vector quantization, and probability density functions, which can be compared with the Kullback-Leibler (KL) distance. A region-based signature can be turned into a histogram, and the similarity can be calculated from the pairwise distances between individual vectors. Further matching methods refine this basic idea with respect to region weights, speed, or segmentation.
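Two of the measures mentioned above can be sketched in a few lines. The first function computes the KL divergence between two discrete distributions, for example two normalized histograms. The second combines pairwise distances between weighted region vectors into one image-to-image distance. The weighting scheme, the symmetric averaging, and all names are assumptions chosen for simplicity; the matching methods in the surveyed paper may differ.

```python
import math
from typing import List, Sequence, Tuple

Region = Tuple[float, Sequence[float]]  # (weight, feature vector); weights sum to 1


def kl_divergence(p: Sequence[float], q: Sequence[float], epsilon: float = 1e-12) -> float:
    """KL(p || q) for discrete distributions; q is floored at epsilon to avoid log(0)."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(pi * math.log(pi / max(qi, epsilon)) for pi, qi in zip(p, q) if pi > 0.0)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def region_set_distance(regions_a: List[Region], regions_b: List[Region]) -> float:
    """Weighted nearest-region distance, averaged over both directions."""

    def one_way(src: List[Region], dst: List[Region]) -> float:
        # Each source region contributes its distance to the closest target region.
        return sum(w * min(euclidean(v, u) for _, u in dst) for w, v in src)

    return 0.5 * (one_way(regions_a, regions_b) + one_way(regions_b, regions_a))


# Hypothetical signatures: image A has two equal regions, image B has one region.
image_a = [(0.5, (0.0, 0.0)), (0.5, (10.0, 0.0))]
image_b = [(1.0, (0.0, 0.0))]

kl_forward = kl_divergence([0.5, 0.5], [0.9, 0.1])
kl_backward = kl_divergence([0.9, 0.1], [0.5, 0.5])
distance_ab = region_set_distance(image_a, image_b)
```

Note that the KL divergence is not symmetric, so `kl_forward` and `kl_backward` differ; it is a divergence rather than a true distance metric.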

Because they speed up retrieval, clustering and classification are practical and useful. Classification is treated as a preprocessing step and improves accuracy, but it requires prior training data. Clustering helps visualization and retrieval efficiency, but the clusters may not be representative or accurate enough for visualization. In addition, to capture a user's precise needs, relevance feedback systems perform iterative feedback and refinement. That is, relevance feedback lets users give feedback after querying, and the system dynamically learns case-specific query semantics from that feedback.
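One simple way to picture a relevance feedback round is a Rocchio-style update, a classic technique from text retrieval that is swapped in here purely as an illustration; the summary above does not name a specific algorithm. The query vector moves toward the images the user marked as relevant and away from those marked as non-relevant. The function name and the weights `alpha`, `beta`, and `gamma` are assumed values.

```python
from typing import List, Sequence


def rocchio_update(
    query: Sequence[float],
    relevant: Sequence[Sequence[float]],
    non_relevant: Sequence[Sequence[float]],
    alpha: float = 1.0,
    beta: float = 0.75,
    gamma: float = 0.25,
) -> List[float]:
    """Return a refined query vector after one round of user feedback."""

    def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
        # With no feedback in a category, that category contributes nothing.
        if not vectors:
            return [0.0] * len(query)
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    positive = centroid(relevant)
    negative = centroid(non_relevant)
    return [
        alpha * q + beta * p - gamma * n
        for q, p, n in zip(query, positive, negative)
    ]


# Hypothetical round: two images marked relevant, one marked non-relevant.
refined_query = rocchio_update(
    query=[0.0, 0.0],
    relevant=[[1.0, 0.0], [3.0, 0.0]],
    non_relevant=[[0.0, 4.0]],
)
```

Repeating this update after every round of results gives the iterative refinement described above, with the refined vector used as the query for the next search.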

There are several offshoots of CBIR. First, automated annotation attempts automated concept discovery; a related problem is deciding on an appropriate set of pictures for a given story. Second, images are usually ranked or compared by size, color depth, or shape; aesthetics, which involves people's feelings and emotions, may serve as another, higher-level basis. Moreover, CBIR may be concerned with possible security attacks and with image copy protection.

Finally, an evaluation benchmark for CBIR must address some key points: coverage, unbiasedness, and user focus. Ideally, it should be subjective, context-specific, and community-based.
