Youngwoon Lee's homepage

Feature Detection and Matching

For CS576 project #1, I implemented a Harris keypoint detector and a gradient-histogram descriptor on top of the provided skeleton code. Keypoints are detected with the Harris corner detector. Each keypoint is described by histograms of the gradients in its 16 by 16 pixel neighborhood, and the histograms are normalized by their major orientation.

Detection

To detect salient keypoints, I use the Harris corner detector. Before computing the Harris value, I reduce noise with a 3 by 3 mean filter. The matrix M is computed over a 5 by 5 window, weighted by a Gaussian window function with variance 3.0. The Harris value is det(M) - 0.04 * trace(M)^2, and pixels whose Harris value exceeds the threshold (0.0009 or 0.001) are kept as keypoint candidates. To keep only distinctive keypoints, I select the candidates that are local maxima within a 5 by 5 window.
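The detection steps can be sketched in NumPy roughly as follows. This is a minimal illustration, not the skeleton code used in the project: the function names (window_filter, gaussian_kernel, harris_keypoints) are hypothetical, the image is assumed to be grayscale with intensities in [0, 1], and central differences via np.gradient and edge padding are my own assumptions, since the text does not say which gradient operator or border handling was used.

```python
import numpy as np


def window_filter(img, kernel):
    """Correlate img with a small odd-sized kernel, using edge padding (assumed)."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out


def gaussian_kernel(size=5, variance=3.0):
    """Normalized 2-D Gaussian window of the given size and variance."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2.0 * variance))
    k = np.outer(g, g)
    return k / k.sum()


def harris_keypoints(gray, threshold=0.001, k=0.04):
    """Return (response map, list of (row, col) keypoints) for a 2-D image."""
    # 3 by 3 mean filter for noise reduction.
    smooth = window_filter(gray.astype(float), np.full((3, 3), 1.0 / 9.0))
    # Image gradients (central differences are an assumption).
    iy, ix = np.gradient(smooth)
    # Entries of M, accumulated over a 5 by 5 Gaussian window with variance 3.0.
    w = gaussian_kernel(5, 3.0)
    sxx = window_filter(ix * ix, w)
    syy = window_filter(iy * iy, w)
    sxy = window_filter(ix * iy, w)
    # Harris value: det(M) - k * trace(M)^2.
    response = sxx * syy - sxy ** 2 - k * (sxx + syy) ** 2
    # Keep pixels above the threshold that are local maxima in a 5 by 5 window.
    rows, cols = response.shape
    padded = np.pad(response, 2, mode="constant", constant_values=-np.inf)
    neigh_max = np.full(response.shape, -np.inf)
    for dy in range(5):
        for dx in range(5):
            neigh_max = np.maximum(neigh_max, padded[dy:dy + rows, dx:dx + cols])
    mask = (response > threshold) & (response >= neigh_max)
    return response, [(int(r), int(c)) for r, c in zip(*np.nonzero(mask))]
```

The absolute scale of the Harris value depends on the intensity range and on the gradient operator, so the threshold in this sketch would likely need retuning for a different implementation or image set.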

Descriptor

For a discriminative representation of a keypoint, I adopt the idea of the SIFT descriptor. I compute histograms of the gradients in the 16 by 16 pixel neighborhood of the keypoint. Using gradients rather than raw intensities helps make the descriptor robust to illumination changes. I build four histograms, one for each of the upper-left, upper-right, lower-left, and lower-right 8 by 8 sub-grids. In each histogram, the gradient orientation of every pixel is quantized into 8 orientation bins, and the histogram is then shifted circularly until the major orientation comes first. With this normalization, the descriptor becomes robust to small rotations (up to 60 degrees).
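A minimal NumPy sketch of such a descriptor is shown below. It is an illustration under several assumptions rather than the project's actual code: the function name patch_descriptor is hypothetical, votes are weighted by gradient magnitude (as in SIFT), each 8 by 8 histogram is shifted by its own dominant bin (one possible reading of the description), and the final unit-length normalization is my addition.

```python
import numpy as np


def patch_descriptor(gray, row, col, n_bins=8):
    """Return a 4 * n_bins vector for the 16 by 16 patch around (row, col)."""
    # Pad by 8 so the patch is defined for keypoints near the border (assumed).
    padded = np.pad(gray.astype(float), 8, mode="edge")
    # Rows row..row+15 of `padded` cover original rows row-8..row+7.
    patch = padded[row:row + 16, col:col + 16]

    # Gradient magnitude and orientation at every pixel of the patch.
    gy, gx = np.gradient(patch)
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx) % (2.0 * np.pi)
    # Quantize orientations into n_bins bins; clamp guards against rounding.
    bins = np.minimum((angle / (2.0 * np.pi) * n_bins).astype(int), n_bins - 1)

    parts = []
    for r0 in (0, 8):          # upper, lower
        for c0 in (0, 8):      # left, right
            b = bins[r0:r0 + 8, c0:c0 + 8].ravel()
            m = magnitude[r0:r0 + 8, c0:c0 + 8].ravel()
            # Magnitude-weighted orientation histogram (weighting is assumed).
            hist = np.bincount(b, weights=m, minlength=n_bins)
            # Shift circularly so the dominant orientation comes first.
            parts.append(np.roll(hist, -int(np.argmax(hist))))

    desc = np.concatenate(parts)
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc
```

Note that this shift only reorders the bins inside each histogram; the four sub-grids stay in place, which is consistent with the descriptor tolerating small rotations but not large ones.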

Results

Harris values

[Figures: Harris values for the Yosemite image and the Graf image]

ROC curves

Distance	Descriptor	AUC (Yosemite image)	AUC (Graf image)
SSD	My descriptor	0.925773	0.683874
SSD	Simple window	0.624433	0.486007
Ratio test	My descriptor	0.871048	0.629409
Ratio test	Simple window	0.666154	0.614402

Test on benchmarks

Harris corner detector + my descriptor

Benchmark	Average AUC (SSD / Ratio Test)
Bikes	0.389670 / 0.473977
Graf	0.581375 / 0.550690
Leuven	0.341940 / 0.497064
Wall	0.476932 / 0.515706

Discussion

My detector works well on ordinary scenes, but it is sensitive to resolution and illumination. It is hard to choose a threshold for the Harris corner detector: we want a small number of meaningful keypoints, but we also need enough keypoints to find correspondences between two images. With my threshold, only 5 keypoints are detected on the "Bikes" benchmark, while more than 3000 are detected on the "Wall" benchmark.

Also, my descriptor works better on the "Graf" and "Wall" benchmarks, which contain rotations and viewpoint changes. On the other hand, it works poorly on "Bikes", whose images vary in resolution. In conclusion, it is not robust to illumination and scale changes.
