July 24, 2009

Hands-free vision-based interface for computer accessibility

"Hands-free vision-based interface forcomputer accessibility",
Javier Varona, Cristina Manresa-Yee, Francisco J. Perales, JNCA08'

To make new technology more accessible to disabled users, the paper presents a hands-free vision-based interface. First, facial features are tracked over a few frames to initialize the model. No special lighting or static background is required, but head rotations must be avoided during initialization. They select the nose and eye regions for tracking based on color distribution; the symmetry of the nose feature points may affect tracking precision. To locate the user's eyes, they focus on the eyes and eyebrows by color; similarly, wearing glasses may change the local lighting conditions and cause errors. A weighting function is applied according to the distance between each pixel and the eye center. Tracking is then performed with the mean-shift algorithm, and a linear regression step smooths the estimated positions. For facial gesture recognition, winks are detected: if the (vertical) iris contours are found in the image, the eye is considered open; otherwise, it is considered closed.
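
To illustrate the tracking step, here is a minimal mean-shift sketch in NumPy on a 2D weight map (for example, a color back-projection of the nose region); the window size, the synthetic weight map, and the convergence threshold are my own illustrative choices, not the paper's implementation.

    import numpy as np

    def mean_shift(weights, window, max_iter=50, eps=0.5):
        """Shift a rectangular window toward the centroid of `weights`.

        weights : 2D array, e.g. a color back-projection of the tracked region
        window  : (row, col, height, width) of the current search window
        """
        r, c, h, w = window
        for _ in range(max_iter):
            patch = weights[r:r + h, c:c + w]
            total = patch.sum()
            if total == 0:
                break
            ys, xs = np.mgrid[0:patch.shape[0], 0:patch.shape[1]]
            cy = (ys * patch).sum() / total          # centroid of the weights
            cx = (xs * patch).sum() / total
            dr, dc = cy - (h - 1) / 2.0, cx - (w - 1) / 2.0
            r = int(round(np.clip(r + dr, 0, weights.shape[0] - h)))
            c = int(round(np.clip(c + dc, 0, weights.shape[1] - w)))
            if abs(dr) < eps and abs(dc) < eps:      # window stopped moving
                break
        return r, c, h, w

    # Toy weight map: a Gaussian blob standing in for the nose back-projection.
    ys, xs = np.mgrid[0:120, 0:160]
    weights = np.exp(-((ys - 70) ** 2 + (xs - 110) ** 2) / (2 * 15.0 ** 2))
    print(mean_shift(weights, (10, 10, 30, 30)))     # converges near the blob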

There are two possible ways to replace the mouse for hands-free computer access. One directly maps the nose position onto the screen. The other uses relative head motion, which behaves more predictably and is less sensitive to tracking accuracy.

More head and facial gestures are planned for improving the system.

June 18, 2009

Support vector learning for ordinal regression

"Support vector learning for ordinal regression," R. Herbrich, ICANN, 1999

The paper presents a support vector method for ordinal regression. The ranking problem is reformulated as a binary classification problem on pairs of instances: given a pair, the classifier outputs their relative order. The idea of applying SVMs to ranking is really impressive; however, because learning is based on pairs of objects, it can be time-consuming.
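
A minimal sketch of this pairwise reformulation, assuming scikit-learn is available (the synthetic data and the use of LinearSVC as the binary classifier are my own illustration, not the paper's exact formulation):

    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)

    # Synthetic items whose ordinal rank grows with a hidden linear score.
    X = rng.normal(size=(200, 5))
    w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
    rank = np.digitize(X @ w_true, bins=[-2.0, 0.0, 2.0])   # ranks 0..3

    # Pairwise reformulation: for items with different ranks, classify the
    # sign of the rank difference from the feature difference.
    pairs, labels = [], []
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            if rank[i] != rank[j]:
                pairs.append(X[i] - X[j])
                labels.append(1 if rank[i] > rank[j] else -1)
    pairs, labels = np.asarray(pairs), np.asarray(labels)

    clf = LinearSVC(C=1.0, max_iter=5000).fit(pairs, labels)
    print("pairwise accuracy:", clf.score(pairs, labels))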

June 3, 2009

The structure and function of complex networks

"The structure and function of complex networks," Newman, 2003.

Generally, a graph is described with terms such as vertex, edge, directed/undirected, degree, component, geodesic path, and diameter. There are also several kinds of networks in the real world. For example, social networks are groups of people with some pattern of interactions between them; information networks include the network of citations between academic papers and the World Wide Web.
One property of networks is the small-world effect: most pairs of vertices in most networks seem to be connected by a short path through the network, so information can spread very quickly in a few steps. Another property is network resilience, i.e., how networks change when vertices are removed or added. If vertices are removed from the network, the typical path length between some pairs may increase, and communication between some pairs may become impossible.
A random graph consists of vertices connected by edges that are placed with some probability; how these probabilities are assigned is the major subject of many papers.
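
To make the small-world effect concrete, here is a small sketch using NetworkX (my own illustration; the graph size and edge probability are arbitrary) that builds an Erdős–Rényi random graph and measures the mean geodesic path length:

    import networkx as nx

    # Erdős–Rényi random graph: each of the n*(n-1)/2 possible edges is
    # included independently with probability p.
    n, p = 1000, 0.01
    G = nx.erdos_renyi_graph(n, p, seed=42)

    if nx.is_connected(G):
        # Small-world effect: the average geodesic path is short (~log n).
        print("mean shortest path:", nx.average_shortest_path_length(G))
        print("diameter:", nx.diameter(G))
    else:
        # Work on the largest connected component if the graph is split.
        giant = G.subgraph(max(nx.connected_components(G), key=len))
        print("mean shortest path (giant component):",
              nx.average_shortest_path_length(giant))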

May 27, 2009

Lazy Snapping

"Lazy Snapping," Li, et. al., ACM SIGGRAPH, 2004

The paper presents a coarse-to-fine UI design for image cutout, the technique of cutting an object in an image out of its background. There are two steps: (1) object marking and (2) boundary editing.
In the first step, the user specifies foreground and background regions with a few strokes. The uncertain region is then labeled by minimizing a likelihood energy function. To obtain the likelihood energy of each region, the colors of the foreground seeds F and background seeds B are first clustered with K-means, and the minimum distance to the cluster centers is computed for each node; a penalty (prior) term captures the gradient effect between neighboring nodes. Minimizing the total energy gives the result. To improve efficiency, a watershed pre-segmentation is applied before the graph-cut formulation so that the algorithm works on segmented regions instead of pixels, which gives reasonable results at a significantly improved speed.
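
A rough sketch of the likelihood-energy idea under my own simplifications (a fixed cluster count, plain Euclidean RGB distance, and scikit-learn's KMeans); this is not the paper's exact formulation:

    import numpy as np
    from sklearn.cluster import KMeans

    def likelihood_energy(colors, fg_seeds, bg_seeds, k=8):
        """Per-node foreground/background likelihood energies.

        colors   : (N, 3) RGB colors of the uncertain nodes
        fg_seeds : (Nf, 3) colors marked as foreground by the user
        bg_seeds : (Nb, 3) colors marked as background by the user
        """
        kf = KMeans(n_clusters=k, n_init=5, random_state=0).fit(fg_seeds)
        kb = KMeans(n_clusters=k, n_init=5, random_state=0).fit(bg_seeds)
        # Minimum distance from each node color to the seed cluster centers.
        d_f = np.linalg.norm(colors[:, None] - kf.cluster_centers_[None], axis=2).min(1)
        d_b = np.linalg.norm(colors[:, None] - kb.cluster_centers_[None], axis=2).min(1)
        # Normalized energies: a small e_fg means "likely foreground".
        e_fg = d_f / (d_f + d_b + 1e-12)
        e_bg = d_b / (d_f + d_b + 1e-12)
        return e_fg, e_bg

    rng = np.random.default_rng(0)
    fg = rng.normal(loc=[200, 80, 80], scale=10, size=(300, 3))   # reddish seeds
    bg = rng.normal(loc=[60, 60, 180], scale=10, size=(300, 3))   # bluish seeds
    nodes = rng.uniform(0, 255, size=(10, 3))
    print(likelihood_energy(nodes, fg, bg))
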
The fine boundary can be adjusted in the second step, where the polygon locations are added to the energy as soft constraints.
In conclusion, they succeed in developing an interactive system that cuts the foreground object out of the background of an image. However, thin and branching structures remain difficult and are left for improvement.

May 20, 2009

Learning Low-Level Vision

"Learning Low-Level Vision," Freeman, IJCV, 2000.

The paper describes a learning-based method for low-level vision problems such as motion analysis, inferring shape and reflectance from a photograph, or extrapolating image detail; that is, the goal is to estimate scenes from images. Given training sets, they manage to enumerate a coarse sampling of all input patch values by preprocessing or by restricting the problem to certain classes. By breaking the scenes into a Markov network, the algorithm can find the optimal scene explanation for any given image data. It shows that machine-learning methods benefit problems of visual interpretation.
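
The paper runs belief propagation on a two-dimensional Markov network of image and scene patches; purely for intuition, here is a toy max-product (Viterbi-style) sketch on a one-dimensional chain with made-up log-potentials, which is my own simplification rather than the paper's setup.

    import numpy as np

    rng = np.random.default_rng(0)
    T, S = 6, 4                          # 6 patches, 4 candidate scenes per patch
    unary = rng.normal(size=(T, S))      # log-compatibility of candidate with image patch
    pairwise = rng.normal(size=(S, S))   # log-compatibility of neighboring candidates

    # Forward pass: best total score of any assignment ending in state s at patch t.
    score = np.zeros((T, S))
    back = np.zeros((T, S), dtype=int)
    score[0] = unary[0]
    for t in range(1, T):
        cand = score[t - 1][:, None] + pairwise    # previous state x current state
        back[t] = np.argmax(cand, axis=0)
        score[t] = unary[t] + cand.max(axis=0)

    # Backtrack the MAP assignment of scene candidates.
    states = np.empty(T, dtype=int)
    states[-1] = int(np.argmax(score[-1]))
    for t in range(T - 1, 0, -1):
        states[t - 1] = back[t, states[t]]
    print("MAP scene candidate per patch:", states)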

An introduction to graphical models

"An introduction to graphical models," Kevin Murphy, 2001.

Graphical models combine probability theory and graph theory. They can be directed or undirected. Directed graphical models are known as Bayesian networks, where the ancestor/parent relationship is defined with respect to some fixed topological ordering of the nodes; undirected graphical models are known as Markov networks. Some nodes in the graph are hidden causes, and estimating their values from the observations is called inference. Popular approximate inference methods include sampling (Monte Carlo) methods, variational methods, and loopy belief propagation. Depending on the structure and the observability, learning methods fall into four categories:

structure \ observability    full             partial
known                        closed form      EM
unknown                      local search     structural EM

Finally, making decisions under uncertainty amounts to computing the optimal actions that maximize the expected utility; the decision algorithm is similar to the inference algorithm.

May 13, 2009

Rapid object detection using a boosted cascade of simple features

"Rapid object detection using a boosted cascade of simple features," Paul Viola and Michael Jones, CVPR, 2001

The paper proposes a rapid object detection method and constructs a frontal face detection system to prove the idea. First, they introduce the "integral image" representation, which makes feature evaluation faster: a single pass sums the pixels above and to the left of each location, after which the sum of any rectangle in the image can be obtained from the values at its four corners. Second, the AdaBoost method selects a small set of important features as classifiers. Third, a cascade structure combines increasingly complex classifiers to increase detection speed: simpler classifiers reject most negative instances, and more complex classifiers are then called to reduce the false positive rate.
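
A minimal NumPy sketch of the integral image and the four-corner rectangle sum (my own illustration of the idea, not the authors' code):

    import numpy as np

    def integral_image(img):
        """ii[y, x] = sum of img[:y, :x]; padded with a leading row/column of zeros."""
        ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
        ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
        return ii

    def rect_sum(ii, y, x, h, w):
        """Sum of img[y:y+h, x:x+w] using only four lookups in the integral image."""
        return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

    img = np.arange(36).reshape(6, 6)
    ii = integral_image(img)
    assert rect_sum(ii, 1, 2, 3, 2) == img[1:4, 2:4].sum()
    print(rect_sum(ii, 1, 2, 3, 2))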

In conclusion, the paper presents a really useful integral-image method that speeds up the subsequent feature computation, and the face detection system they construct became a basis for face detection and led to many applications and improved techniques.

May 7, 2009

Names and Faces in the News

"Names and Faces in the News Abstract," Miller, CVPR, 2004

The paper presents a clustering method to deal with ambiguities in labeling and identify incorrectly labeled faces.

Before clustering, they perform kernel PCA to reduce the dimensionality and linear discriminant analysis to project the data into a discriminative space.

The clustering process goes as follows (a minimal sketch appears after the list):
1. Randomly assign each image to one of its extracted names.
2. For each distinct name (cluster), calculate the mean of the image vectors assigned to that name.
3. Reassign each image to the closest mean among its extracted names.
4. Repeat steps 2-3 until convergence (i.e., no image changes names during an iteration).
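
A toy sketch of this name-constrained, k-means-like loop (the synthetic face vectors and candidate name sets are invented for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    names = ["Bush", "Blair", "Schroeder"]
    centers = {"Bush": np.array([0.0, 0.0]), "Blair": np.array([5.0, 5.0]),
               "Schroeder": np.array([0.0, 6.0])}

    # Each "face" vector has one true identity but carries a set of candidate
    # names extracted from its caption (possibly ambiguous).
    faces, candidates = [], []
    for _ in range(90):
        true = rng.choice(names)
        faces.append(rng.normal(centers[true], 0.8))
        candidates.append({true, rng.choice(names)})
    faces = np.asarray(faces)

    # 1. Randomly assign each image to one of its extracted names.
    assign = [rng.choice(sorted(c)) for c in candidates]

    for _ in range(20):
        # 2. Mean of the vectors currently assigned to each name.
        means = {n: faces[np.array([a == n for a in assign])].mean(axis=0) for n in names}
        # 3. Reassign each image to the closest mean among its own candidate names.
        new_assign = [min(c, key=lambda n: np.linalg.norm(f - means[n]))
                      for f, c in zip(faces, candidates)]
        if new_assign == assign:         # 4. Converged: no image changed names.
            break
        assign = new_assign

    print(assign[:10])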

Next, meaningless clusters are pruned according to their likelihood. Clusters with similar compositions are merged to obtain the final clusters. In the experiments, they succeed in cleaning up noisy unsupervised data and use entropy to evaluate the result.

May 6, 2009

Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary

"Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary," Duygulu, ECCV, 2002

In order to learn a lexicon for a fixed image vocabulary, three major object recognition questions are discussed:
1. What counts as an object? All words.
2. Which objects are easy to recognize? The words attached to image regions.
3. Which objects are indistinguishable using the features? The words with little difference between their posterior probabilities.

They construct a recognition model using the EM method. First, images are segmented into regions. Second, k-means is used to vector-quantize the region representations; the label associated with each region is called a "blob". Finally, the EM algorithm is run to find the correspondence between blobs and words.

After obtaining the probability table, one can choose the word with the highest probability given a blob and annotate the corresponding region with that word. A threshold is established to reject unreliable predictions. Performance can also be improved by clustering indistinguishable words.
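
A tiny sketch of the annotation step given a learned p(word | blob) table (the table values and the threshold are invented for illustration):

    import numpy as np

    words = ["sky", "water", "grass", "tiger"]
    # p(word | blob): rows = blob indices, columns = words (made-up numbers).
    p_word_given_blob = np.array([
        [0.70, 0.20, 0.05, 0.05],   # blob 0 looks like "sky"
        [0.30, 0.35, 0.30, 0.05],   # blob 1 is ambiguous
        [0.05, 0.05, 0.10, 0.80],   # blob 2 looks like "tiger"
    ])

    threshold = 0.5   # reject low-confidence predictions
    for b, probs in enumerate(p_word_given_blob):
        best = int(np.argmax(probs))
        label = words[best] if probs[best] >= threshold else "(null)"
        print(f"blob {b}: {label}")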

On Spectral Clustering: Analysis and an algorithm

"On Spectral Clustering: Analysis and an algorithm", Andrew Y. Ng, Michael I. Jordan, Yair Weiss, NIPS 2001

Building on spectral methods, which use the top eigenvectors of a matrix derived from the pairwise similarity between data points, the paper proposes a simple spectral clustering algorithm that uses the k top eigenvectors simultaneously.
There are several steps (a sketch follows the list):
1. Construct an affinity matrix A with a Gaussian kernel (zero on the diagonal).
2. Form the normalized matrix L = D^{-1/2} A D^{-1/2}, where D is the diagonal degree matrix of A.
3. Find the k largest eigenvectors of L, stack them as columns, and normalize each row to unit length.
4. Cluster the rows of the resulting matrix with the k-means method.
5. Assign each original point to the cluster of its corresponding row.
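
A compact NumPy/scikit-learn sketch of the algorithm as I understand it (the kernel width, the synthetic data, and the small numerical guards are my own choices):

    import numpy as np
    from sklearn.cluster import KMeans

    def spectral_clustering(X, k, sigma=1.0):
        """Ng-Jordan-Weiss style spectral clustering (minimal sketch)."""
        # 1. Affinity matrix with a Gaussian kernel, zero diagonal.
        d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
        A = np.exp(-d2 / (2 * sigma ** 2))
        np.fill_diagonal(A, 0.0)
        # 2. Normalize: L = D^{-1/2} A D^{-1/2}.
        d = A.sum(1)
        D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-12))
        L = D_inv_sqrt @ A @ D_inv_sqrt
        # 3. k largest eigenvectors of L, rows normalized to unit length.
        vals, vecs = np.linalg.eigh(L)          # ascending eigenvalues
        V = vecs[:, -k:]
        Y = V / (np.linalg.norm(V, axis=1, keepdims=True) + 1e-12)
        # 4-5. Cluster the rows and carry the labels back to the points.
        return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Y)

    # Two well-separated blobs.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
    labels = spectral_clustering(X, k=2)
    print(labels[:5], labels[-5:])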

Moreover, under some assumptions and analysis, they show that the final clusters will be tight and that the embedded points will form k well-separated groups on the surface of the k-sphere, corresponding to their "true" clusters.

Normalized Cuts and Image Segmentation

"Normalized Cuts and Image Segmentation", Jianbo Shi and Jitendra Malik, Trans. PAMI 2000

The paper proposes a general framework for image segmentation. In general, the result of a partitioning method is affected by the coherence of brightness, color, texture, or motion, and a hierarchical partition should form a tree structure. The authors therefore develop a graph-theoretic formulation of grouping.

The grouping algorithm consists of several steps:
1. Given an image, set up a weighted graph and set the weight on each edge according to the similarity between the nodes it connects.
2. Solve the (generalized) eigenvalue system for the eigenvectors with the smallest eigenvalues.
3. Use the eigenvector with the second smallest eigenvalue to bipartition the graph, which solves the normalized cut problem.
4. Recursively partition the parts if necessary.

If necessary, one can use all of the top eigenvectors to obtain a K-way partition with a modified algorithm.
The computational approach they developed uses matrix theory and linear algebra and is based on concepts from spectral graph theory.
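
A minimal sketch of the two-way normalized cut on a small affinity matrix, assuming SciPy is available (the toy data, kernel width, and the sign-based splitting of the eigenvector are my own choices):

    import numpy as np
    from scipy.linalg import eigh

    # Toy data: two clusters of 1-D "pixels" (e.g., brightness values).
    x = np.concatenate([np.random.default_rng(0).normal(0.2, 0.02, 20),
                        np.random.default_rng(1).normal(0.8, 0.02, 20)])

    # Weighted graph: similarity between every pair of nodes.
    W = np.exp(-((x[:, None] - x[None]) ** 2) / (2 * 0.1 ** 2))
    D = np.diag(W.sum(axis=1))

    # Generalized eigenproblem (D - W) y = lambda D y; eigh returns ascending
    # eigenvalues, so column 1 is the eigenvector with the second smallest one.
    vals, vecs = eigh(D - W, D)
    second = vecs[:, 1]

    # Bipartition by the sign of the second eigenvector (a simple splitting rule).
    labels = (second > 0).astype(int)
    print(labels)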

April 7, 2009

Algorithms for fast vector quantization

"Algorithms for fast vector quantization," Sunil. Arya and David. M. Mount, 1993

The paper presents three algorithms for the nearest-neighbor matching problem, which is important in many applications including vector quantization: standard k-d tree search, priority k-d tree search, and neighborhood graph search.

A standard k-d tree is a binary search tree in high dimensions. Each internal node corresponds to a hyperplane orthogonal to one of the coordinate axes that splits a rectangle into two parts, and the leaf nodes store the data points. The search works with incremental distance calculation: compute the squared distance at each leaf node and update the nearest neighbor while visiting the relevant nodes recursively. Priority k-d tree search adds a heuristic and reduces the cost by interrupting the search before it fully terminates. In the relative neighborhood graph (RNG), two points are adjacent if no third point is simultaneously closer to both of them than they are to each other. The experiments show that RNG*-search is the fastest of the three algorithms but requires about twice as much storage as the others, and its complexity is a little higher.
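
The paper's own priority-search and RNG* algorithms predate today's libraries; just to illustrate k-d tree nearest-neighbor queries, here is a small SciPy sketch (the data, query, and eps slack are arbitrary and mine):

    import numpy as np
    from scipy.spatial import cKDTree

    rng = np.random.default_rng(0)
    codebook = rng.random((1000, 16))     # e.g., vector-quantization codewords
    tree = cKDTree(codebook)              # builds the k-d tree

    query = rng.random(16)
    dist, idx = tree.query(query, k=1)    # exact nearest neighbor
    print("nearest codeword:", idx, "distance:", dist)

    # Approximate search in the spirit of interrupting early: allow eps slack.
    dist_a, idx_a = tree.query(query, k=1, eps=0.5)
    print("approximate:", idx_a, dist_a)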

March 31, 2009

Latent Dirichlet allocation

"Latent Dirichlet allocation," D. Blei, A. Ng, and M. Jordan. . Journal of Machine Learning Research, 3:993–1022, January 2003

Latent Dirichlet allocation (LDA) is a generative probabilistic model for collections of discrete data such as text corpora. The goal is to find short descriptions of the members of a collection that enable efficient processing of large collections while preserving the essential statistical relationships useful for basic tasks such as classification, novelty detection, summarization, and similarity and relevance judgments. The basic idea of LDA is that documents are represented as random mixtures over latent topics, where topics generate words by fixed conditional distributions and those topics are infinitely exchangeable within a document. Compared with other latent topic models, LDA overcomes the limiting assumption of the mixture of unigrams and the overfitting problem of pLSI by treating the topic mixture weights as a k-parameter hidden random variable, which yields a smooth distribution on the topic simplex. Besides, LDA finds the optimal variational parameters by minimizing a KL divergence and applies a variational EM algorithm to obtain approximate empirical Bayes estimates of alpha and beta. In conclusion, LDA is a simple model for dimensionality reduction with the modularity and extensibility needed for further applications.
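
A quick sketch of fitting an LDA topic model with scikit-learn, whose implementation uses a variational Bayes procedure rather than the paper's exact variational EM; the tiny corpus, topic count, and parameters are illustrative only:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    docs = [
        "the cat sat on the mat", "dogs and cats are pets",
        "stocks fell as markets closed", "investors traded shares and bonds",
    ]
    vec = CountVectorizer().fit(docs)
    counts = vec.transform(docs)                     # document-term count matrix

    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    doc_topics = lda.fit_transform(counts)           # per-document topic mixtures
    print(doc_topics.round(2))

    vocab = vec.get_feature_names_out()
    for k, topic in enumerate(lda.components_):      # per-topic word weights
        top = topic.argsort()[-4:][::-1]
        print(f"topic {k}:", [vocab[i] for i in top])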

Probabilistic Latent Semantic Indexing

“Probabilistic Latent Semantic Indexing”, Thomas Hofmann, SIGIR, 1999

Probabilistic Latent Semantic Analysis (PLSA) is a novel method for automated indexing based on the likelihood principle. PLSA defines a statistical latent class model, called the aspect model, fitted with a tempered EM algorithm. In contrast to LSA, which determines the optimal decomposition by the L2-norm, PLSA relies on the likelihood function of multinomial sampling and aims at an explicit maximization of the predictive power of the model, which has a clear probabilistic meaning in terms of mixture component distributions. PLSA succeeds in dealing with the potential imprecision of user queries by detecting synonyms and polysemous words.

March 27, 2009

Contour and Texture Analysis for Image Segmentation

Title: Contour and Texture Analysis for Image Segmentation
Author: Jitendra Malik, Serge Belongie, Thomas Leung, and Jianbo Shi
Publisher: IJCV
Date of Publication: Feb 23, 2001

Because texture provides a good local descriptor of image patches, this paper proposes a general algorithm for image segmentation using texture. They apply a bank of filters to the input image and obtain filter responses that are combined into response vectors; the prototype response vectors, obtained by K-means clustering, are called textons, and each pixel is then mapped to exactly one texton. A descriptor that also accounts for orientation and scale builds a windowed texton histogram over a local region. Finally, cues from both contour and texture differences are exploited to achieve image segmentation.
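
A rough texton sketch under my own simplifications (a tiny Gaussian-derivative filter bank instead of the paper's full filter set, and scikit-learn K-means):

    import numpy as np
    from scipy import ndimage
    from sklearn.cluster import KMeans

    def texton_map(img, n_textons=8):
        """Assign each pixel to a texton: a K-means prototype of filter responses."""
        sigmas = (1.0, 2.0, 4.0)
        responses = []
        for s in sigmas:
            responses.append(ndimage.gaussian_filter(img, s, order=(0, 1)))  # x-derivative
            responses.append(ndimage.gaussian_filter(img, s, order=(1, 0)))  # y-derivative
            responses.append(ndimage.gaussian_laplace(img, s))               # blob-like
        stack = np.stack(responses, axis=-1)                 # H x W x n_filters
        vectors = stack.reshape(-1, stack.shape[-1])
        km = KMeans(n_clusters=n_textons, n_init=5, random_state=0).fit(vectors)
        return km.labels_.reshape(img.shape)

    # Synthetic image: smooth ramp on the left, noisy texture on the right.
    rng = np.random.default_rng(0)
    img = np.tile(np.linspace(0, 1, 64), (64, 1))
    img[:, 32:] += rng.normal(0, 0.3, (64, 32))
    print(np.bincount(texton_map(img).ravel()))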

March 22, 2009

Shape Matching and Object Recognition Using Shape Contexts

Title: Shape Matching and Object Recognition Using Shape Contexts
Author: Serge Belongie, Jitendra Malik, Jan Puzicha
Publisher: IEEE
Month of Publication: April 2002

The paper proposes a stable and simple algorithm for finding correspondences between shapes. They introduce a shape descriptor, the shape context, and find correspondences via weighted bipartite graph matching. Demonstrations on 2D objects (e.g., handwritten digits, silhouettes, and trademarks) and 3D objects from the Columbia COIL dataset show improved performance.

First, a rich local descriptor, the shape context, is proposed to make matching easier. For a point on the shape, it considers the set of vectors from that point to all other sample points and computes a coarse histogram of their relative coordinates, using bins that are uniform in log-polar space so that nearby points receive more weight. The chi-square test statistic between two histograms serves as the matching cost C and measures similarity. Second, when minimizing the cost of the bipartite graph matching, scale invariance is achieved by normalizing all radial distances and rotation invariance by measuring angles relative to the tangent direction at each point; moreover, "dummy" nodes can be added for robust handling of outliers. Third, for the transformation model they use the thin plate spline (TPS) model, which includes the affine model as a special case, so transformations can be estimated in a few iterations. Finally, shape distance is estimated as a weighted sum of three terms: shape context distance, image appearance distance, and bending energy. A prototype-based approach is then applied: a variant of K-means, K-medoids, selects representative examples for each category, and a query shape is classified by the minimal cost to these prototypes.
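
A small sketch of computing shape context histograms in NumPy (the bin counts, the radial range, and the omission of the tangent-angle rotation normalization are my own simplifications):

    import numpy as np

    def shape_contexts(points, n_r=5, n_theta=12):
        """Log-polar histogram of relative point positions (one per point)."""
        points = np.asarray(points, dtype=float)
        diff = points[None, :, :] - points[:, None, :]        # vectors i -> j
        dist = np.linalg.norm(diff, axis=2)
        angle = np.arctan2(diff[..., 1], diff[..., 0]) % (2 * np.pi)

        mean_dist = dist[dist > 0].mean()                      # scale normalization
        r = dist / mean_dist
        # Log-spaced radial bin edges so nearer points get finer bins.
        r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_r + 1)
        t_edges = np.linspace(0, 2 * np.pi, n_theta + 1)

        descriptors = np.zeros((len(points), n_r, n_theta))
        for i in range(len(points)):
            mask = np.arange(len(points)) != i                 # skip the point itself
            h, _, _ = np.histogram2d(r[i, mask], angle[i, mask],
                                     bins=[r_edges, t_edges])
            descriptors[i] = h
        return descriptors.reshape(len(points), -1)

    # Sample points on a unit circle and inspect one descriptor.
    t = np.linspace(0, 2 * np.pi, 30, endpoint=False)
    pts = np.c_[np.cos(t), np.sin(t)]
    print(shape_contexts(pts)[0])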

In conclusion, the method is able to retrieve objects whose shapes are similar to the query, and the performance is much improved.

In my opinion, it is thoughtful that they consider several key points and construct the distance weighting function from three terms. However, more parameters may bias the query results; it would help to report individual statistics for each term to convince readers that the weighting function is trustworthy. Besides, I think the modified K-means (K-medoids) may be helpful in some cases. The outlier removal and warped transformation seem useful for removing noise, judging from Figure 4.

March 13, 2009

Nonlinear Dimensionality Reduction by Locally Linear Embedding

Title: Nonlinear Dimensionality Reduction by Locally Linear Embedding
Author: S. T. Roweis and L. K. Saul
Year of Publication: 2000

The paper introduces an unsupervised learning algorithm, locally linear embedding (LLE). The steps of locally linear embedding are as follows (a sketch appears after the list).

1. Find the neighbors of each point that lie close to a locally linear patch of the manifold (kNN in this paper).
2. Reconstruct each data point from its neighbors with constrained least-squares weights.
3. Compute the low-dimensional embedding vectors by minimizing the embedding cost function.
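
A quick sketch using scikit-learn's implementation of LLE (the Swiss-roll data and the parameter values are my own choices):

    from sklearn.datasets import make_swiss_roll
    from sklearn.manifold import LocallyLinearEmbedding

    # 3-D points lying on a rolled-up 2-D manifold.
    X, color = make_swiss_roll(n_samples=1000, random_state=0)

    # Steps 1-3 of LLE: kNN graph, reconstruction weights, embedding.
    lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, random_state=0)
    Y = lle.fit_transform(X)

    print(Y.shape)                       # (1000, 2)
    print("reconstruction error:", lle.reconstruction_error_)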

In summary, unlike PCA or MDS, LLE succeeds in mapping nearby high-dimensional data points to nearby points in the low-dimensional space. LLE avoids solving large dynamic programming problems, and its computations involve accumulating sparse matrices; consequently, LLE saves a lot of time and space.

In my opinion, LLE is easy to understand because of its three simple steps. However, the first time I read the paper, I had no idea what LLE was: the paper seems to assemble many research results while hiding the details, so it expects readers to have the relevant background knowledge or to survey the references while reading.

March 9, 2009

Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection

Title: Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection
Author: Peter N. Belhumeur, Joao P. Hespanha, and David J. Kriegman
Year of Publication: 1997
Publisher: IEEE

In this paper, the authors propose the Fisherface method for face recognition. Fisherfaces are insensitive to large variations in illumination and facial expression. Whereas Eigenfaces use PCA for dimensionality reduction and maximize the total scatter across all classes, Fisherfaces maximize the ratio of between-class scatter to within-class scatter. They compare four methods for recognition under variations in lighting and facial expression: correlation, a variant of the linear subspace method, the Eigenface method, and the Fisherface method.
(1) Correlation is the simplest method, but it needs training data under varying lighting and requires a large amount of computation and storage.
(2) Eigenfaces apply PCA and reduce the computation considerably. However, in maximizing the between-class scatter, PCA also maximizes the within-class scatter, which is unwanted information for face recognition. It has been suggested that discarding the three most significant principal components reduces the effect of illumination variation, but this may also have unwanted consequences.
(3) The linear subspace algorithm takes the surface normal and the albedo of the surface into account; it can therefore handle Lambertian surfaces and is insensitive to a wide range of lighting conditions. Nevertheless, it has to learn where the good regions for recognition are, and its computation and storage costs are higher than those of the Eigenface method.
(4) Fisherfaces use Fisher's Linear Discriminant, which is class specific. The approach maximizes the ratio of between-class scatter to within-class scatter, i.e., it increases between-class scatter while decreasing within-class scatter (see the sketch after this list).
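
A compact sketch of the Fisher criterion on synthetic low-dimensional data (the paper itself first applies PCA to face images so that the within-class scatter is nonsingular; the data here are invented):

    import numpy as np
    from scipy.linalg import eigh

    rng = np.random.default_rng(0)

    # Synthetic "face features": 3 classes in 5-D, 40 samples each.
    means = np.array([[0, 0, 0, 0, 0], [3, 1, 0, 0, 0], [0, 4, 1, 0, 0]], float)
    X = np.vstack([rng.normal(m, 1.0, size=(40, 5)) for m in means])
    y = np.repeat([0, 1, 2], 40)

    mu = X.mean(axis=0)
    Sw = np.zeros((5, 5))          # within-class scatter
    Sb = np.zeros((5, 5))          # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)

    # Fisher criterion: directions w maximizing (w' Sb w) / (w' Sw w),
    # i.e. the top generalized eigenvectors of Sb w = lambda Sw w.
    vals, vecs = eigh(Sb, Sw)              # ascending eigenvalues
    W = vecs[:, -2:]                       # at most (num_classes - 1) useful directions
    print("projected class means:\n",
          np.array([X[y == c].mean(0) @ W for c in range(3)]))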

In conclusion, the experiments (varying lighting, facial expression, and glasses recognition) show that Fisherfaces outperform the other three methods. The method is based on a more reasonable dimensionality reduction, requires less computation by modifying the original formulation with PCA, and does not need as much storage as the linear subspace method. What could be improved is how to handle extreme lighting conditions and, perhaps, profile (side-view) face recognition.

March 8, 2009

Eigenfaces for Recognition

Title: Eigenfaces for Recognition
Author: Matthew Turk and Alex Pentland
Year of Publication: 1991

Eigenfaces is an approach that decomposes face images into a small set of characteristic feature images; it is based on information theory. In other words, the approach extracts the information contained in a collection of face images and uses the variation among these images to encode and compare individual faces for recognition.

In mathematical terms, an image is treated as a vector in a very high dimensional space, and the eigenvectors are regarded as a set of features that characterize the variation between face images. Using principal component analysis (PCA), one can construct a lower-dimensional subspace of face images, called the "face space". Each individual face can be represented as a linear combination of these vectors, which are referred to as "eigenfaces" because of their face-like appearance. Briefly, the operations are as follows (a sketch appears after the list).

(1) Collect several face images of each person.
(2) Calculate the eigenfaces with the highest associated eigenvalues.
(3) For each known individual, project their face images onto the "face space". Choose a threshold that defines the maximum allowable distance from any face class.
(4) Calculate the pattern vector for each new face image and its distances to each known class.
(5) If the input image is near the face space, it is recognized as a face. If it is also near a known face class, it is identified and may be added to the original set of images of that person so that the eigenfaces can be recalculated; otherwise, it is unknown and may be used to start a new face class.
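
A minimal NumPy sketch of the PCA step on synthetic "images" (the vector dimension, number of components, and data are invented for illustration):

    import numpy as np

    rng = np.random.default_rng(0)

    # Pretend each "face" is a flattened 16x16 image; 3 people, 10 images each.
    people = [rng.normal(rng.uniform(0, 1, 256), 0.05, size=(10, 256)) for _ in range(3)]
    faces = np.vstack(people)

    mean_face = faces.mean(axis=0)
    A = faces - mean_face

    # Eigenfaces = top principal components of the centered training images.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    eigenfaces = Vt[:8]                               # keep 8 components

    # Project training faces and class means into the face space.
    weights = A @ eigenfaces.T
    class_means = np.array([weights[i * 10:(i + 1) * 10].mean(0) for i in range(3)])

    # Recognize a new image of person 1 by the nearest class mean in face space.
    new = rng.normal(people[1].mean(0), 0.05)
    w_new = (new - mean_face) @ eigenfaces.T
    dists = np.linalg.norm(class_means - w_new, axis=1)
    print("predicted person:", int(np.argmin(dists)), "distances:", dists.round(2))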

Moreover, it is possible to detect motion after filtering and rescaling the input image appropriately. Estimating the orientation of the head motion or using simple symmetry operators can help recognize rotated faces.

In my opinion, applying PCA to reduce the dimensionality is not very difficult, and it is a good way to construct the eigenfaces. That is, face recognition comes down to extracting features from face images and deciding efficiently whether an input is a known face, and this approach reaches that goal. However, varying backgrounds, the scale of the input images, and illumination still strongly affect the recognition result, and the training set has a significant effect on precision. How to choose the tradeoff between the number of retained dimensions and the precision rate is another open problem.

March 3, 2009

Scale & Affine Invariant Interest Point Detectors

Title: Scale & Affine Invariant Interest Point Detectors
Author: Krystian Mikolajczyk and Cordelia Schmid
Date of Publication: January 22, 2004

The paper describes two approaches to scale- and affine-invariant interest point detection. The scale-invariant detector, the Harris-Laplace detector, combines the Harris detector with automatic scale selection; the algorithm involves multi-scale point detection and an iterative selection of the scale and the location. The affine-invariant detector is initialized with the multi-scale Harris detector; integration and differentiation scales are computed to obtain a shape matrix for each interest point, and the iterative procedure converges to a local structure.
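
For intuition only, here is a basic multi-scale Harris response in NumPy/SciPy (a plain Harris detector evaluated at several scales, which is my own simplification; it omits the Harris-Laplace scale selection and the affine adaptation):

    import numpy as np
    from scipy import ndimage

    def harris_response(img, sigma_d=1.0, sigma_i=2.0, k=0.04):
        """Harris cornerness map: derivation scale sigma_d, integration scale sigma_i."""
        Ix = ndimage.gaussian_filter(img, sigma_d, order=(0, 1))
        Iy = ndimage.gaussian_filter(img, sigma_d, order=(1, 0))
        # Second-moment (structure) matrix, smoothed at the integration scale.
        Sxx = ndimage.gaussian_filter(Ix * Ix, sigma_i)
        Syy = ndimage.gaussian_filter(Iy * Iy, sigma_i)
        Sxy = ndimage.gaussian_filter(Ix * Iy, sigma_i)
        det = Sxx * Syy - Sxy ** 2
        trace = Sxx + Syy
        return det - k * trace ** 2

    # Synthetic image with a bright square: its corners should respond strongly.
    img = np.zeros((64, 64))
    img[20:44, 20:44] = 1.0
    for s in (1.0, 2.0, 4.0):                       # a crude multi-scale sweep
        R = harris_response(img, sigma_d=s, sigma_i=2 * s)
        y, x = np.unravel_index(np.argmax(R), R.shape)
        print(f"scale {s}: strongest response near ({y}, {x})")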

It is a pity that there is a trade-off between scale detection and affine detection, because the two kinds of detectors have different effects.

Distinctive Image Features from Scale-Invariant Keypoints

Title: Distinctive Image Features from Scale-Invariant Keypoints
Author: David G. Lowe
Date of Publication: January 5, 2004

This paper describes an approach, the Scale Invariant Feature Transform (SIFT), which transforms image data into scale-invariant coordinates relative to local features. There are several major stages of computation.

(1) The scale-space extrema detection stage searches over all scales and image locations, using a cascade filtering approach to identify potential interest points that are invariant to scale and orientation. That is, the scale space of an image, a function produced by convolving a variable-scale Gaussian with the input image, is computed across all possible scales; the difference-of-Gaussian (DoG) function is then computed, and its maxima and minima give the most stable image features (a sketch of this stage appears after the list).

(2) The keypoint localization stage fits a detailed model to determine the location and scale at each candidate location and selects keypoints based on measures of their stability.

(3) The orientation assignment stage assigns one or more orientations to each keypoint location based on local image gradient directions, thereby achieving invariance to image rotation.

(4) The keypoint descriptor stage measures the local image gradients at the selected scale in the region around each keypoint, allowing for significant local shape distortion and changes in illumination. In other words, a keypoint descriptor is created by computing the gradient magnitude and orientation at each image sample in a region; after weighting by a Gaussian window, the samples are accumulated into orientation histograms summarizing the contents of subregions. Trilinear interpolation, vector normalization, and thresholding are used to reduce the effects of shift and illumination change.
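
Purely as a sketch of stage (1), here is a tiny difference-of-Gaussian extrema detector; the number of scales, the threshold, and the lack of keypoint refinement are my own simplifications, not Lowe's implementation:

    import numpy as np
    from scipy import ndimage

    def dog_extrema(img, n_scales=5, sigma0=1.6, k=2 ** 0.5, thresh=0.02):
        """Find local extrema of the difference-of-Gaussian across space and scale."""
        sigmas = [sigma0 * k ** i for i in range(n_scales)]
        gauss = np.stack([ndimage.gaussian_filter(img, s) for s in sigmas])
        dog = gauss[1:] - gauss[:-1]                    # (n_scales-1, H, W)

        # A point is an extremum if it is the max (or min) of its 3x3x3 neighborhood.
        max_f = ndimage.maximum_filter(dog, size=3)
        min_f = ndimage.minimum_filter(dog, size=3)
        extrema = ((dog == max_f) | (dog == min_f)) & (np.abs(dog) > thresh)
        return np.argwhere(extrema)                     # rows of (scale, y, x)

    # A blurry blob should produce DoG extrema near its center.
    y, x = np.mgrid[0:64, 0:64]
    img = np.exp(-((y - 32) ** 2 + (x - 32) ** 2) / (2 * 4.0 ** 2))
    print(dog_extrema(img)[:5])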

In the object recognition experiments, an approximation algorithm, Best-Bin-First (BBF), provides efficient nearest-neighbor indexing to find minimum-Euclidean-distance matches. A Hough transform clusters consistent features, and after solving for the affine parameters by least squares, a model is accepted if its final probability is higher than a threshold.

Image Retrieval: Ideas, Influences, and Trends of the New Age

Title: Image Retrieval: Ideas, Influences, and Trends of the New Age
Authors: Ritendra Datta, Dhiraj Joshi, Jia Li, and James Z. Wang
Year of Publication: 2008
Publisher: ACM

Content-based image retrieval (CBIR) is a technology that helps organize digital picture archives by their visual content. Two gaps, the sensory gap and the semantic gap, define and motivate most of the related problems. The sensory gap is the gap between a real object and its descriptive information, and the semantic gap is the lack of coincidence between the information extracted from the visual data and the interpretation a user gives it. Bridging these gaps and satisfying users is the goal of CBIR.

The first important thing is to clarify user-system interaction. The user perspective concerns what the user wants and what query form is used, and the system perspective concerns how to interact with the user. In short, a human-centered system is required to handle different kinds of user intent with several query types, including keywords, free text, images, graphics, and composite queries. For example, a composite query method could provide a system involving gestures and speech for querying, or help users refine their queries with hints. If a system can collect manual tags for pictures, it not only facilitates text-based querying but also builds reliable training datasets. Moreover, designing a retrieval system for portable devices, which have many constraints such as limited display size and color depth, is one of the issues of visualization.

The two core problems of CBIR are (1) how to define a mathematical description, or signature, of an image, and (2) how to measure the similarity between a pair of images.

For region-based visual signatures, the first step is image segmentation. Using k-means clustering or the normalized-cut criterion, segmentation aids image understanding and allows several types of features to be extracted. A feature captures a specific property of an image, either globally over the entire image (faster to compute) or locally over a small group of pixels (more specific identification of important visual characteristics). Color features are usually summarized into histograms. Texture features capture granularity and repetitive patterns. Shape is a key to specifying regions, and spatial modeling and matching concern local image entities. Interest points based on local invariants can handle significant affine transformations and illumination changes. When constructing signatures from features, histograms are easy to build but tend to be sparse in a multidimensional space; a region-based signature allows the representative vectors to adapt to the image, and a region of coherent color and texture likely corresponds to an object in the image.

There are three types of signatures: a single feature vector, a summary of local feature vectors, and a region-based signature. Each has different appropriate similarity measures. For a single vector, a geodesic distance may work better. Summaries of local feature vectors, such as codebooks and probability density functions, are generated by vector quantization and compared with, for example, the KL distance. A region-based signature can form a histogram, with similarity calculated from the pairwise distances between individual vectors. Further matching methods improve on the basic idea in terms of region weights, speed, or segmentation.
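
A tiny sketch of the simplest signature/similarity pair mentioned above: a global color histogram compared with a chi-square distance (the binning and the synthetic images are my own choices):

    import numpy as np

    def color_signature(img, bins=8):
        """Global color histogram signature: a normalized 3-D RGB histogram."""
        hist, _ = np.histogramdd(img.reshape(-1, 3), bins=bins, range=[(0, 256)] * 3)
        return hist.ravel() / hist.sum()

    def chi2_distance(h1, h2, eps=1e-12):
        return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

    rng = np.random.default_rng(0)
    reddish_a = rng.integers(0, 256, (32, 32, 3)); reddish_a[..., 0] |= 0xC0
    reddish_b = rng.integers(0, 256, (32, 32, 3)); reddish_b[..., 0] |= 0xC0
    bluish    = rng.integers(0, 256, (32, 32, 3)); bluish[..., 2]    |= 0xC0

    sig = {k: color_signature(v) for k, v in
           {"reddish_a": reddish_a, "reddish_b": reddish_b, "bluish": bluish}.items()}
    print("red vs red :", chi2_distance(sig["reddish_a"], sig["reddish_b"]))
    print("red vs blue:", chi2_distance(sig["reddish_a"], sig["bluish"]))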

Because retrieval needs to be fast, clustering and classification are practical and useful. Classification, treated as a preprocessing step, improves accuracy but requires prior training data. Clustering helps visualization and retrieval efficiency, but the clusters may not be representative or accurate enough for visualization. Besides, to capture a user's precise needs, relevance feedback systems perform iterative feedback and refinement: users give feedback after querying, and the system dynamically learns case-specific query semantics from that feedback.

There are some offshoots of CBIR. First, automated annotation attempts automated concept discovery; a further step is choosing an appropriate picture set for a given story. Second, image ranking or similarity is usually based on size, color depth, or shape; however, aesthetics, which involves people's feelings and emotions, may be another, higher-level basis. Moreover, CBIR must also consider possible security attacks and image copy protection.

Finally, an evaluation benchmark for CBIR must satisfy some key requirements: coverage, unbiasedness, and user focus. Ideally, it should be subjective, context-specific, and community-based.

February 28, 2009

How to give a good research talk

Reading : "How to give a good research talk," Jones et. al.

For a person giving a 30-60 minute talk, there are several suggestions to follow.

First, prepare appropriate content according to the audience's background knowledge. Omit unnecessary content and keep only what convinces the listeners of the primary topic; precisely motivated examples make the talk more persuasive. Second, using an overhead projector is effective. Put what is about to be explained on the slides, which not only saves visual bandwidth but also gives the audience emphasis; preparing the slides just one day before the talk helps keep the material fresh in mind. Third, overcome nerves, show clear slides, and do not overrun. Deep breathing or exercise may help reduce nerves. Do not reveal a slide line by line or block people's view. Listeners can only absorb so much in a given period of time, so consider whether they have grasped the point before jumping to the next section. Moreover, it is quite helpful to reorient the audience with an overview slide for each part.

In my opinion, preparing slides by typesetting on a computer rather than writing by hand also has advantages for me: it saves time and looks nice if one is familiar with software such as PowerPoint. Besides, rehearsing several times before giving a talk has been really helpful in my experience.

How to Read a Paper

Reading : "How to Read a Paper," Keshav, ACM SIGCOMM Computer Communication Review 2007

The writer shares an efficient method for reading papers. Because researchers may spend many hours reading papers, and it is terrible to waste much of that effort, he introduces the three-pass approach.

First of all, skim the target paper quickly: read only the title, abstract, introduction, and conclusion to get the main idea and contribution. After that, it is enough to decide whether to read further. Second, get the key points from the figures, diagrams, and other illustrations; graphs usually show the results and make the thrust of the paper clearer. Having grasped more of the content, one can skip the paper if it is abstruse or not useful; otherwise, enter the third pass. In the third pass, read deeply to fully understand the paper, paying attention to details such as proofs, assumptions, and particular techniques. After this pass, one should be able to reconstruct the overall idea and give practical comments. Besides, the writer suggests doing a literature survey iteratively: after reading a paper, select related papers to study based on its citations and references.

The writer has followed this discipline for many years. He can adjust the depth of paper evaluation depending on his needs and how much time he has, and the three-pass approach really helps him read papers efficiently.