CS576 Computer Vision: Project #1

Feature Detection and Matching

Youngwoon Lee

For CS576 project #1, I implemented a Harris keypoint detector and a gradient-histogram descriptor on top of the provided skeleton code. Keypoints are detected with the Harris corner detector. Each keypoint is described by histograms of the gradients in its 16 by 16 pixel neighborhood, and the histograms are normalized by shifting them to the major orientation.


To detect salient keypoints, I use the Harris corner detector. Before computing the Harris value, I reduce noise with a 3 by 3 mean filter. The second-moment matrix M is computed over a 5 by 5 grid, weighted by a Gaussian window function with variance 3.0. The Harris value is computed as det(M) - 0.04 * trace(M)^2, and I select as keypoints the pixels whose Harris value is higher than a threshold of 0.0009 or 0.001. To keep only distinctive keypoints, I retain only pixels that are local maxima within a 5 by 5 grid.
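The detection steps described above can be sketched in NumPy as follows. This is a minimal illustration, not the project's actual code: the function names, the central-difference gradient, the edge-replicated borders, the normalization of the Gaussian window, and the assumption that intensities lie in [0, 1] are all mine.

```python
import numpy as np

def correlate(img, kernel):
    """Correlate img with a small square kernel, replicating border pixels."""
    size = kernel.shape[0]
    pad = size // 2
    rows, cols = img.shape
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros((rows, cols), dtype=np.float64)
    for dy in range(size):
        for dx in range(size):
            out += kernel[dy, dx] * padded[dy:dy + rows, dx:dx + cols]
    return out

def gaussian_window(size=5, variance=3.0):
    """Gaussian weights on a size-by-size grid (normalized: an assumption)."""
    offsets = np.arange(size) - size // 2
    xx, yy = np.meshgrid(offsets, offsets)
    window = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * variance))
    return window / window.sum()

def harris_keypoints(img, k=0.04, threshold=0.001, nms_size=5):
    """Return ([(row, col), ...], response map) for a grayscale image."""
    img = np.asarray(img, dtype=np.float64)
    # 3 by 3 mean filter to reduce noise.
    smoothed = correlate(img, np.full((3, 3), 1.0 / 9.0))
    # Central-difference gradients (assumed; the report does not name the operator).
    grad_y, grad_x = np.gradient(smoothed)
    # Entries of M, accumulated over a Gaussian-weighted 5 by 5 grid.
    window = gaussian_window(5, 3.0)
    sxx = correlate(grad_x * grad_x, window)
    syy = correlate(grad_y * grad_y, window)
    sxy = correlate(grad_x * grad_y, window)
    # Harris value: det(M) - k * trace(M)^2.
    response = (sxx * syy - sxy ** 2) - k * (sxx + syy) ** 2
    # Keep pixels above the threshold that are local maxima in a 5 by 5 grid.
    rows, cols = response.shape
    pad = nms_size // 2
    padded = np.pad(response, pad, mode="constant", constant_values=-np.inf)
    local_max = np.full((rows, cols), -np.inf)
    for dy in range(nms_size):
        for dx in range(nms_size):
            local_max = np.maximum(local_max, padded[dy:dy + rows, dx:dx + cols])
    mask = (response > threshold) & (response >= local_max)
    ys, xs = np.nonzero(mask)
    return list(zip(ys.tolist(), xs.tolist())), response
```

Because the response scales with the fourth power of the image intensity range, a threshold such as 0.001 is only meaningful for a fixed intensity scaling; the same value applied to 0..255 images would select far more pixels.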


For a discriminative representation of a keypoint, I adopt the idea of the SIFT descriptor. I compute histograms of the gradients of the 16 by 16 pixels surrounding the keypoint. Using gradients helps make the descriptor robust to illumination changes. I build 4 histograms, one each for the upper-left, upper-right, lower-left, and lower-right 8 by 8 sub-grids. For each histogram, I quantize the gradient orientation of each pixel into 8 orientation bins and shift the histogram circularly until the major orientation comes first. Through this normalization, my descriptor becomes robust to small rotations (up to 60 degrees).
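A minimal sketch of such a descriptor is shown below. The description above leaves several details open, so the following are assumptions of this sketch rather than the project's actual choices: votes are weighted by gradient magnitude, a single major orientation is taken from the sum of the 4 histograms and applied to all of them, and the final vector is L2-normalized. The function name `describe_patch` is hypothetical.

```python
import numpy as np

def describe_patch(patch, n_bins=8):
    """Return a 4 * n_bins descriptor for a 16 by 16 grayscale patch."""
    patch = np.asarray(patch, dtype=np.float64)
    if patch.shape != (16, 16):
        raise ValueError("expected a 16 by 16 patch")
    # Central-difference gradients, then magnitude and orientation per pixel.
    grad_y, grad_x = np.gradient(patch)
    magnitude = np.hypot(grad_x, grad_y)
    angle = np.arctan2(grad_y, grad_x) % (2.0 * np.pi)
    # Quantize orientations into n_bins bins; the final modulo guards the 2*pi edge.
    bins = np.floor(angle / (2.0 * np.pi) * n_bins).astype(int) % n_bins
    # One histogram per 8 by 8 sub-grid: upper-left, upper-right, lower-left, lower-right.
    hists = []
    for r0 in (0, 8):
        for c0 in (0, 8):
            b = bins[r0:r0 + 8, c0:c0 + 8].ravel()
            m = magnitude[r0:r0 + 8, c0:c0 + 8].ravel()
            hists.append(np.bincount(b, weights=m, minlength=n_bins))
    hists = np.array(hists)
    # Shift every histogram circularly so the major orientation lands in bin 0
    # (one shared major orientation for the whole patch: an assumption).
    major = int(np.argmax(hists.sum(axis=0)))
    hists = np.roll(hists, -major, axis=1)
    # L2 normalization (an assumption; not stated in the report).
    desc = hists.ravel()
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc
```

Note that the circular shift only re-indexes the orientation bins. The 4 sub-grids are not rotated with the patch, so the layout of the histograms still changes under large rotations, which is in line with the descriptor tolerating only small rotations.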


Harris values

[Figure: Harris values for the Yosemite image and the Graf image]

ROC curves

AUC                                    Yosemite image   Graf image
SSD Distance         My descriptor     0.925773         0.683874
                     Simple Window     0.624433         0.486007
Ratio Test Distance  My descriptor     0.871048         0.629409
                     Simple Window     0.666154         0.614402
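
For reference, the two distance measures named above can be sketched as follows, using their standard definitions: SSD is the sum of squared differences between two descriptors, and the ratio test divides the best SSD by the second-best SSD. The function name `match_scores` is hypothetical, and the skeleton code used in the project may differ in details.

```python
import numpy as np

def match_scores(desc_a, desc_b):
    """For each descriptor in desc_a, find its best match in desc_b.

    Returns (best_index, best_ssd, ratio), each of length len(desc_a).
    desc_b must contain at least 2 descriptors for the ratio to be defined.
    """
    a = np.asarray(desc_a, dtype=np.float64)
    b = np.asarray(desc_b, dtype=np.float64)
    # Pairwise sum of squared differences, shape (len(a), len(b)).
    diff = a[:, None, :] - b[None, :, :]
    ssd = (diff ** 2).sum(axis=2)
    # Sort candidates per query so columns 0 and 1 are best and second best.
    order = np.argsort(ssd, axis=1)
    rows = np.arange(len(a))
    best_index = order[:, 0]
    best_ssd = ssd[rows, best_index]
    second_ssd = ssd[rows, order[:, 1]]
    # Ratio test distance: small when the best match is much better than the runner-up.
    ratio = best_ssd / np.maximum(second_ssd, 1e-12)
    return best_index, best_ssd, ratio
```

In both cases a smaller value indicates a more confident match; the ratio test additionally penalizes keypoints whose best and second-best candidates are almost equally close.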

Test on benchmarks

Harris corner detector + My descriptor

Benchmark Average AUC (SSD / Ratio Test)
Bikes 0.389670 / 0.473977
Graf 0.581375 / 0.550690
Leuven 0.341940 / 0.497064
Wall 0.476932 / 0.515706


My detector implementation works well on ordinary scenes, but it is sensitive to resolution and illumination. Choosing a threshold for the Harris corner detector is hard: we want a small number of meaningful keypoints, but at the same time we need enough keypoints to find correspondences between two images. With my threshold, the detector finds only 5 keypoints on the "Bikes" benchmark but more than 3000 keypoints on the "Wall" benchmark.
Also, my descriptor works better on the "Graf" and "Wall" benchmarks, which involve rotations and viewpoint changes. On the other hand, it works poorly on "Bikes", which has varying resolution. In conclusion, it is not robust to illumination and scale changes.


