May 27, 2009

Lazy Snapping

"Lazy Snapping," Li, et. al., ACM SIGGRAPH, 2004

The paper presents a coarse-to-fine UI design for image cutout, the technique of separating an object in an image from its background. There are two steps: (1) object marking and (2) boundary editing.
In the first step, the user specifies foreground and background regions respectively. The uncertain region is then labeled by minimizing a likelihood energy function. To obtain the likelihood energy, the colors in the seed sets F and B are first clustered by the K-means method, and the minimum distance from each node's color to the cluster centers is computed. A penalty term is added to account for the gradient effect. Finally, minimizing the energy gives the result. To improve efficiency, the watershed algorithm is applied so that the graph cut formulation deals with pre-segmented regions instead of individual pixels, which can give a reasonable result at significantly improved speed.
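The likelihood term of the object-marking step can be sketched as follows (a minimal sketch with made-up function names, using plain K-means and Euclidean color distance; the paper's full energy also includes the gradient penalty and is minimized by graph cut):

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    # Plain K-means: returns k cluster centers for the seed colors.
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(points[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = points[labels == j].mean(axis=0)
    return centers

def likelihood_energy(colors, fg_seeds, bg_seeds, k=2):
    # For each color, distance to the nearest F and B cluster center;
    # the normalized distances act as energies (low dF = likely foreground).
    cf = kmeans(fg_seeds, k)
    cb = kmeans(bg_seeds, k)
    dF = np.linalg.norm(colors[:, None] - cf[None], axis=2).min(axis=1)
    dB = np.linalg.norm(colors[:, None] - cb[None], axis=2).min(axis=1)
    e_fg = dF / (dF + dB + 1e-12)   # energy of labeling the node foreground
    e_bg = dB / (dF + dB + 1e-12)   # energy of labeling the node background
    return e_fg, e_bg
```

A reddish color then gets low foreground energy when the F seeds are red, and vice versa for the B seeds.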
The fine boundary can be adjusted in the second step, where the prior energy is modified by adding the polygon locations as soft constraints.
In conclusion, they succeed in developing an interactive system for cutting the foreground object out of the background of an image. However, thin and branching structures remain difficult and are left as future work.

May 20, 2009

Learning Low-Level Vision

"Learning Low-Level Vision," Freeman, IJCV, 2000.

The paper describes a learning-based method for low-level vision problems such as motion analysis, inferring shape and reflectance from a photograph, and extrapolating image detail; the goal is to estimate scenes from images. Given training sets, they succeed in enumerating a coarse sampling of all input patch values by preprocessing or by restricting to particular classes. Modeling the scene as a Markov network, the algorithm can find the optimal scene explanation for any given image data. This shows the benefit of applying machine learning methods to problems of visual interpretation.
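As a toy illustration of finding the optimal scene explanation on a Markov network, the sketch below runs Viterbi-style dynamic programming over a chain of scene-patch candidates; the paper uses belief propagation on a 2-D network, so this is only the chain special case, with hypothetical cost tables:

```python
import numpy as np

def map_scene(unary, pairwise):
    # unary[i, s]: cost of scene candidate s at chain node i, given the image.
    # pairwise[s, t]: cost of neighboring nodes taking candidates s and t.
    # Dynamic programming gives the exact minimum-cost labeling on a chain,
    # i.e. the MAP scene explanation for this simple topology.
    n, k = unary.shape
    cost = unary[0].astype(float).copy()
    back = np.zeros((n, k), dtype=int)
    for i in range(1, n):
        total = cost[:, None] + pairwise + unary[i][None, :]
        back[i] = total.argmin(axis=0)   # best predecessor for each candidate
        cost = total.min(axis=0)
    labels = [int(cost.argmin())]
    for i in range(n - 1, 0, -1):        # trace back the best path
        labels.append(int(back[i, labels[-1]]))
    return labels[::-1]
```

With a smoothness-style pairwise cost, the chosen candidates balance fitting each image patch against agreeing with their neighbors.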

An introduction to graphical models

"An introduction to graphical models," Kevin Murphy, 2001.

Graphical models combine probability theory and graph theory. They can be directed or undirected. Directed graphical models are known as Bayesian networks, where the ancestor/parent relationship is defined with respect to some fixed topological ordering of the nodes; undirected graphical models are known as Markov networks. A graph may contain hidden causes whose values must be estimated from observations; this is inference. Popular approximate inference methods include sampling (Monte Carlo) methods, variational methods, and loopy belief propagation. According to the structure and the observability, learning methods are simply classified into four categories:

structure \ observability    full            partial
known                        closed form     EM
unknown                      local search    structural EM

Finally, one computes the optimal actions to perform so as to maximize expected utility, making decisions under uncertainty. The decision algorithms are similar to the inference algorithms.

May 13, 2009

Rapid object detection using a boosted cascade of simple features

"Rapid object detection using a boosted cascade of simple features," Paul Viola and Michael Jones, CVPR, 2001

The paper proposes a rapid object detection method and constructs a frontal face detection system to demonstrate the idea. First, they introduce the "integral image" representation, which makes feature evaluation faster: a single pass computes, at each location, the sum of the pixels above and to the left of it, after which the sum of any rectangle in the image can be obtained from the values at its four corners. Second, the AdaBoost method is used to select important features as classifiers. Third, a cascade structure combining increasingly complex classifiers raises detection speed: simpler classifiers reject most negative instances, and more complex classifiers are then applied to reduce the false positive rate.
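The integral image trick can be sketched as follows (a minimal sketch assuming an integer-valued image; the padded first row and column avoid boundary checks):

```python
import numpy as np

def integral_image(img):
    # ii[y, x] = sum of img[0:y, 0:x]; one pass with cumulative sums.
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, y, x, h, w):
    # Sum of img[y:y+h, x:x+w] from the four corner values of ii,
    # in constant time regardless of the rectangle size.
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]
```

The Haar-like features are then differences of a few such rectangle sums, each evaluated with only a handful of array lookups.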

In conclusion, the paper presents a very useful integral image method that speeds up the subsequent feature computation, and the face detection system they construct became a basis for later face detection work, leading to more applications and improved techniques.

May 7, 2009

Names and Faces in the News

"Names and Faces in the News Abstract," Miller, CVPR, 2004

The paper presents a clustering method to deal with ambiguities in labeling and identify incorrectly labeled faces.

Before clustering, they perform kernel PCA to reduce the dimensionality and linear discriminant analysis to project the data into a discriminative space.

The clustering process goes as follows:
1. Randomly assign each image to one of its extracted names.
2. For each distinct name (cluster), calculate the mean of the image vectors assigned to that name.
3. Reassign each image to the closest mean among its extracted names.
4. Repeat steps 2-3 until convergence (i.e., no image changes names during an iteration).
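The four steps above can be sketched as a modified k-means in which each face may only be assigned to a name from its own caption (a minimal sketch with hypothetical function names, assuming the vectors are the projected face features):

```python
import numpy as np

def cluster_faces(vectors, candidate_names, iters=50, seed=0):
    # vectors[i]: feature vector of face i; candidate_names[i]: names
    # extracted from face i's caption.
    rng = np.random.default_rng(seed)
    assign = [str(rng.choice(names)) for names in candidate_names]   # step 1
    for _ in range(iters):
        means = {name: np.mean([v for v, a in zip(vectors, assign) if a == name],
                               axis=0)
                 for name in set(assign)}                            # step 2
        new = []
        for i, (vec, names) in enumerate(zip(vectors, candidate_names)):
            best = min(names,                                        # step 3
                       key=lambda n: np.linalg.norm(np.asarray(vec) - means[n])
                       if n in means else float("inf"))
            new.append(best if best in means else assign[i])
        if new == assign:                                            # step 4
            return assign
        assign = new
    return assign
```

Faces whose captions mention only one name stay pinned to it, and those pinned faces anchor the means that pull the ambiguous faces toward the right clusters.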

Next, meaningless clusters are pruned according to their likelihood. If some clusters have similar compositions, they are merged to obtain the final clusters. In the experiments, they succeed in cleaning up the noisy, unsupervised data, and they use entropy to evaluate the result.

May 6, 2009

Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary

"Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary," Duygulu, ECCV, 2002

In order to learn a lexicon for a fixed image vocabulary, three major object recognition problems should be discussed:
1. What counts as an object? All words.
2. Which objects are easy to recognize? The words attached to image regions.
3. Which objects are indistinguishable using the features? The words with little difference between their posterior probability.

They construct a recognition model using the EM method. First, segment images into regions. Second, use k-means to vector-quantize the region representations; the label associated with each region is called a "blob". Finally, run the EM algorithm to find the correspondences between blobs and words.

After obtaining the probability table, one can choose the word with the highest probability given each blob and annotate the corresponding region with that word. A threshold is established to discard low-confidence predictions. It is also effective to improve performance by clustering the indistinguishable words.
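The annotation rule can be sketched as follows (a minimal sketch; `blob_word_prob` stands in for the learned probability table, and the threshold value is illustrative):

```python
def annotate(blob_word_prob, blobs, threshold=0.3):
    # blob_word_prob[blob][word]: p(word | blob) from the learned lexicon.
    # Pick the most probable word per region; drop low-confidence predictions.
    labels = []
    for b in blobs:
        word, p = max(blob_word_prob[b].items(), key=lambda kv: kv[1])
        labels.append(word if p >= threshold else None)
    return labels
```

Regions whose best word falls below the threshold are simply left unannotated rather than labeled unreliably.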

On Spectral Clustering: Analysis and an algorithm

"On Spectral Clustering: Analysis and an algorithm", Andrew Y. Ng, Michael I. Jordan, Yair Weiss, NIPS 2001

Based on spectral methods, which use the top eigenvectors of a matrix derived from the similarities between features, the paper proposes a simple spectral clustering algorithm that uses the top k eigenvectors simultaneously.
There are several steps:
1. Construct an affinity matrix A defined by a Gaussian kernel.
2. Form the normalized matrix L = D^(-1/2) A D^(-1/2), where D is the diagonal matrix of the row sums of A.
3. Find the k largest eigenvectors of L, stack them as columns, and normalize each row to unit length.
4. Cluster the rows via the k-means method.
5. Assign each original point to the cluster of its corresponding row.
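The five steps can be sketched as follows (a minimal sketch following the Ng-Jordan-Weiss recipe; the deterministic farthest-point initialization for the k-means step and all names are illustrative choices of mine):

```python
import numpy as np

def spectral_clustering(X, k, sigma=1.0, iters=50):
    # Step 1: Gaussian-kernel affinity with zero diagonal.
    d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
    A = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # Step 2: L = D^{-1/2} A D^{-1/2}.
    dhalf = 1.0 / np.sqrt(A.sum(axis=1))
    L = dhalf[:, None] * A * dhalf[None, :]
    # Step 3: k largest eigenvectors as columns, rows normalized to unit length.
    w, v = np.linalg.eigh(L)          # eigenvalues in ascending order
    Y = v[:, -k:]
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    # Step 4: k-means on the rows (farthest-point init for determinism).
    centers = [Y[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(Y - c, axis=1) for c in centers], axis=0)
        centers.append(Y[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.linalg.norm(Y[:, None] - centers[None], axis=2).argmin(1)
        centers = np.array([Y[labels == j].mean(0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    # Step 5: point i inherits the cluster of row i.
    return labels
```

On well-separated data the rows of Y collapse to k tight groups on the unit sphere, so the k-means step is almost trivial.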

Moreover, under some assumptions and analysis, they show that the final clusters will be tight: the embedded points from the k well-separated "true" clusters lie close to k orthogonal points on the surface of the unit k-sphere.

Normalized Cuts and Image Segmentation

"Normalized Cuts and Image Segmentation", Jianbo Shi and Jitendra Malik, Trans. PAMI 2000

The paper proposes a general framework for image segmentation. Generally, the result of any partition method is affected by the coherence of brightness, color, texture, or motion, and a hierarchical partition should form a tree structure. The authors therefore develop a graph-theoretic formulation of grouping.

The grouping algorithm consists of several steps:
1. Given an image, set up a weighted graph, and set the weight on each edge according to the similarity between the nodes it connects.
2. Solve the generalized eigensystem (D - W)y = λDy for the eigenvectors with the smallest eigenvalues.
3. Use the eigenvector with the second smallest eigenvalue to bipartition the graph, which solves the normalized cut problem.
4. Recursively partition the segments if necessary.
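Steps 2-3 can be sketched as follows (a minimal sketch for a dense affinity matrix; the median threshold is only one common splitting choice, and the paper discusses searching for the splitting point with the best normalized cut value):

```python
import numpy as np

def ncut_bipartition(W):
    # W: symmetric affinity matrix. Solve (D - W) y = lambda * D y via the
    # symmetric form D^{-1/2} (D - W) D^{-1/2} z = lambda z, with y = D^{-1/2} z.
    d = W.sum(axis=1)
    Dhalf = np.diag(1.0 / np.sqrt(d))
    Lsym = Dhalf @ (np.diag(d) - W) @ Dhalf
    vals, vecs = np.linalg.eigh(Lsym)   # eigenvalues in ascending order
    y = Dhalf @ vecs[:, 1]              # second smallest eigenvalue's eigenvector
    return y > np.median(y)             # threshold into the two segments
```

On a graph made of two dense groups joined by a weak edge, the second eigenvector takes roughly constant values within each group, so thresholding it recovers the cut.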

If necessary, one can instead use all of the top eigenvectors to obtain a K-way partition with a modified algorithm.
The computational approach they developed uses matrix theory and linear algebra, building on concepts from spectral graph theory.