Department of Electrical & Computer Engineering
Object Recognition from Local Scale-Invariant Features
David G. Lowe
Presented by: Yan Fang
What are Local Features?
• An image pattern that differs from its immediate neighborhood
• Associated with a change in an image property
• Used to build image descriptors
Why Local Features?
• Specific semantic interpretation in the limited context of a certain application
– Edge -> road; Blob -> impurities
• A limited set of well-localized and individually identifiable anchor points
– Motion tracking, 3D reconstruction, image alignment or mosaicing
• A robust image representation, with no need for segmentation
– Scene recognition, object recognition; the features need not carry any meaning themselves
Some Terms
• Detector – a tool that extracts features from images
• Descriptor – an instance of a feature representation
• Invariant – a function is invariant under a family of transformations if its value does not change when any transformation from that family is applied to its argument
• Local feature – ideally a location in space with no spatial extent; in practice:
– interest points, regions, edge segments
Ideal Local Features
• Repeatability: found in images of the same object or scene taken under different conditions
– Invariance
– Robustness
• Distinctiveness/informativeness: the intensity patterns underlying the detected features should show a lot of variation
• Locality: reduces the probability of occlusion and allows simple model approximations of the geometric and photometric deformations
• Quantity: a sufficiently large number of features, even on small objects
• Accuracy: the detected features should be accurately localized in image location, as well as with respect to scale and possibly shape
• Efficiency: fast and easy to compute
Discussion on Local Features
• Repeatability: depends on invariance, robustness, and quantity
• Distinctiveness vs. Locality
– More local means less information, so matching is harder
– In some cases (e.g. mosaicing), locality can be sacrificed
• Distinctiveness vs. Invariance
– Depends on the degrees of freedom of the transformation
• Distinctiveness vs. Robustness
– Information is lost in exchange for robustness
– Denoising vs. detail
Compare with Other Features
• Global Features
– Describe content with, e.g., a color histogram
– Usage: segmentation, object recognition
– Fail to distinguish foreground from background
– Image clutter and occlusion are problems
• Image Segments
– Segmentation is difficult by itself and requires much information from the image
– Search for blobs based on texture/color
• Sampled Features
– Exhaustively sample subparts with a sliding window
– Solves the background problem, but not partial occlusion
– Fixed-grid sampling makes invariance difficult
– Random sampling: better localization but poor repeatability; not used alone
– Sampling from edges works well for wiry objects
Corner Detector – Harris Detector
• Distinguishes "flat", "edge", and "corner" regions
• Auto-correlation matrix describes the gradient distribution of the local neighborhood
– Smoothed with a Gaussian kernel
– The two eigenvalues indicate image signal change in two directions
– Large eigenvalues in both directions indicate a potential corner
• Measure the cornerness
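The steps above can be sketched in NumPy. This is a minimal illustration, not the exact Harris implementation: the cornerness measure det(M) − k·trace(M)² with the conventional k = 0.04 is used so the eigenvalues never need to be computed explicitly, and a sampled 1D Gaussian stands in for the smoothing kernel.

```python
import numpy as np

def gaussian_blur(img, sigma):
    # Separable convolution with a sampled 1D Gaussian along each axis
    r = int(3 * sigma)
    t = np.arange(-r, r + 1)
    g = np.exp(-t**2 / (2 * sigma**2))
    g /= g.sum()
    img = np.apply_along_axis(lambda v: np.convolve(v, g, mode="same"), 0, img)
    return np.apply_along_axis(lambda v: np.convolve(v, g, mode="same"), 1, img)

def harris_response(image, sigma=1.0, k=0.04):
    Iy, Ix = np.gradient(image.astype(float))
    # Entries of the auto-correlation (second-moment) matrix, Gaussian-smoothed
    Sxx = gaussian_blur(Ix * Ix, sigma)
    Syy = gaussian_blur(Iy * Iy, sigma)
    Sxy = gaussian_blur(Ix * Iy, sigma)
    # Cornerness: large positive at corners, negative on edges, ~0 in flat regions
    det = Sxx * Syy - Sxy**2
    trace = Sxx + Syy
    return det - k * trace**2

# A white square on black: the response should peak near the square's corners
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0
R = harris_response(img)
y, x = np.unravel_index(np.argmax(R), R.shape)
```

The square's corners sit near (8, 8), (8, 23), (23, 8), and (23, 23), so the argmax of `R` should land within a few pixels of one of them, while the flat interior scores essentially zero.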
Corner Detector – Harris Detector
For interest point detection, extract local maxima of the cornerness function with non-maximum suppression
Example of Harris Detector
Results on rotated image examples. Notice that T-junctions are also found, in addition to true corners.
Select a Feature Detector
• Select feature detectors based on image content and category
• Do not use more invariance than needed; note the tradeoff between invariance and discriminative power
• Consider other properties depending on the application scenario
– Localization accuracy for camera calibration or 3D modeling
– Efficiency for large datasets
Introduction to SIFT
• Problem: object recognition in cluttered real-world scenes
• Challenge: finding image features that are robust to object variation
• Proposed method: Scale Invariant Feature Transform (SIFT)
Invariance
• Illumination
• Scale
• Rotation
• Affine
Previous Work
• Candidate feature types
– line segments
– groupings of edges
– regions
• Zhang et al.
– Harris corner detection
– Detect peaks in local image variation
• Schmid and Mohr
– Harris corner detection for interest points
– Orientation-invariant vector of derivative-of-Gaussian image measurements
Motivation & Improvement
Limitations of related work:
• Examine the image at only a single scale
• Difficult to extend to other circumstances
• Focus on feature detection while overlooking the descriptor
This work:
• Identifies key locations in scale space
• Selects feature vectors invariant to scaling, stretching, rotation, and other variations
• Improves the feature descriptor
• Efficient: less than 2 seconds even with clutter and occlusion
Stages of SIFT Object Recognition
• Feature Detection
• Local Image Description
• Indexing and Matching
• Model Verification
Scale Space
The proper scaling of objects in a new image is unknown. Exploring features at different scales helps recognize objects at different sizes.
Difference of Gaussian (DoG)
• A = convolve the image with vertical and horizontal 1D Gaussians, sigma = sqrt(2)
• B = convolve A with vertical and horizontal 1D Gaussians, sigma = sqrt(2)
• DoG (Difference of Gaussian) = A − B
• Downsample B with bilinear interpolation at a pixel spacing of 1.5 (a linear combination of 4 adjacent pixels)

D(x, y, sigma) = (G(x, y, k*sigma) − G(x, y, sigma)) * I(x, y),  k = sqrt(2)
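One DoG level can be sketched as below. This is a simplified illustration, assuming a sampled 1D separable Gaussian; the paper resamples B bilinearly at 1.5x pixel spacing for the next octave, while plain 2x decimation is shown here for brevity.

```python
import numpy as np

def gaussian_blur(img, sigma):
    # Separable convolution with a sampled 1D Gaussian along each axis
    r = int(3 * sigma)
    t = np.arange(-r, r + 1)
    g = np.exp(-t**2 / (2 * sigma**2))
    g /= g.sum()
    img = np.apply_along_axis(lambda v: np.convolve(v, g, mode="same"), 0, img)
    return np.apply_along_axis(lambda v: np.convolve(v, g, mode="same"), 1, img)

rng = np.random.default_rng(0)
img = rng.random((64, 64))

sigma = np.sqrt(2)                 # base smoothing from the paper
A = gaussian_blur(img, sigma)      # first smoothing
B = gaussian_blur(A, sigma)        # incremental smoothing by another sqrt(2)
dog = A - B                        # one DoG level: D = (G(sigma) - G(k*sigma)) * I
# Next octave: decimate the more-blurred image (bilinear 1.5x resampling in the paper)
next_octave = B[::2, ::2]
```

Two successive blurs of sigma = sqrt(2) compose to an effective sigma of 2, which is why each octave doubles the scale.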
Image Pyramid of DoG
[Figure: each octave applies Gaussian smoothing G to produce A_i and B_i; DoG_i = A_i − B_i; B_i is downsampled to begin the next octave of the DoG pyramid.]
Pyramid of DoG (Octave)
[Figure: Gaussian pyramid with per-octave scales sigma, k*sigma, 2*sigma, 2k*sigma, 2k^2*sigma; from David G. Lowe, IJCV 2004]
DoG Example
[Figure: smoothed images A1–A3 and B1–B3 with their differences DoG1–DoG3; image credit Ashley L. Kapron]
Feature Detection
• Find the maxima and minima of the DoG scale space
• For each point on a DoG level:
– Compare to its 26 neighbors: 8 in the same level and 9 in each adjacent level
• Repeat for each DoG level
• The surviving extrema are the candidate key points
David G. Lowe, IJCV 2004
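The 26-neighbor comparison can be sketched as follows; this is a toy illustration of the extremum test on a small stack of DoG levels, not a full detector.

```python
import numpy as np

def is_extremum(dog, level, y, x):
    """True if dog[level][y, x] is strictly greater or strictly smaller than all
    26 neighbors: 8 in its own 3x3 window plus 9 in each adjacent DoG level."""
    v = dog[level][y, x]
    cube = np.stack([d[y-1:y+2, x-1:x+2] for d in dog[level-1:level+2]]).ravel()
    others = np.delete(cube, 13)   # index 13 of the 3x3x3 cube is the point itself
    return bool(v > others.max() or v < others.min())

# Toy 3-level DoG stack with a single clear peak in the middle level
lo, mid, hi = np.zeros((5, 5)), np.zeros((5, 5)), np.zeros((5, 5))
mid[2, 2] = 1.0
dog = [lo, mid, hi]
```

Here `is_extremum(dog, 1, 2, 2)` holds because the center value dominates all 26 neighbors, while a flat point next to it fails the test.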
SIFT key stability - Illumination
• For all levels, compute the gradient magnitude:
– M_ij = sqrt((A_ij − A_{i+1,j})^2 + (A_ij − A_{i,j+1})^2)
• Threshold gradient magnitudes:
– Remove all key points with M_ij less than 0.1 times the maximum gradient value
• Motivation: low-contrast points are generally less reliable feature points than high-contrast ones
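The pixel-difference magnitude and the 10% contrast threshold can be sketched as:

```python
import numpy as np

def gradient_magnitude(A):
    # M_ij = sqrt((A_ij - A_{i+1,j})^2 + (A_ij - A_{i,j+1})^2), via pixel differences
    dy = A[:-1, :-1] - A[1:, :-1]
    dx = A[:-1, :-1] - A[:-1, 1:]
    return np.sqrt(dx**2 + dy**2)

# A horizontal ramp has unit gradient magnitude everywhere
A = np.tile(np.arange(5.0), (5, 1))
M = gradient_magnitude(A)
# Contrast threshold: keep only points with M >= 0.1 * max gradient value
keep = M >= 0.1 * M.max()
```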
SIFT key stability - Orientation
• For all levels, compute the gradient orientation:
– R_ij = atan2(A_ij − A_{i+1,j}, A_{i,j+1} − A_ij)
[Figure: the Gaussian-smoothed image yields gradient orientation and gradient magnitude maps; image credit Ashley L. Kapron]
SIFT key stability - Orientation
• Gradient magnitude weighted by a 2D Gaussian centered on the key point
[Figure: Weighted Magnitude = Gradient Magnitude * 2D Gaussian; image credit Ashley L. Kapron]
SIFT key stability - Orientation
• Sum the weighted magnitudes into an orientation histogram and identify its peak
• Assign the peak orientation, with its summed magnitude, to the key point
[Figure: weighted magnitude and gradient orientation maps combined into a histogram of weighted magnitudes with a marked peak; image credit Ashley L. Kapron]
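The histogram-peak step can be sketched as below. This is a simplified illustration: in SIFT the magnitudes are first weighted by the 2D Gaussian from the previous slide, and 36 bins of 10 degrees are a common choice (assumed here, not stated on the slide).

```python
import numpy as np

def dominant_orientation(mag, ori, n_bins=36):
    """Peak of the orientation histogram, with gradient magnitudes as weights."""
    # Map orientations in [-pi, pi) to histogram bin indices
    bins = ((ori + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    peak = int(np.argmax(hist))
    theta = (peak + 0.5) / n_bins * 2 * np.pi - np.pi   # center of the peak bin
    return theta, hist[peak]

# All gradients point at 0.5 rad, so the peak must land in that bin
theta, weight = dominant_orientation(np.ones((8, 8)), np.full((8, 8), 0.5))
```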
Example of Key Points
[Figure: key points at three stages — maxima/minima from the DoG pyramid, after filtering for illumination (contrast), and after filtering for edge orientation; image credit Ashley L. Kapron]
Stability Test
78% of the keys survive rotation, scaling, stretching, changes of brightness and contrast, and the addition of pixel noise.
Stages of SIFT Object Recognition
• Feature Detection
• Local Image Description
• Indexing and Matching
• Model Verification
Local Image Description
• Each SIFT key is assigned:
– Location
– Scale (corresponding to the level where it was detected)
– Orientation (from the canonical orientation step above)
• Now: describe the local image region in a way that is invariant to these transformations
SIFT Key Example
Local Image Description
For each key point:
• Identify its 8x8 neighborhood (from the DoG level where it was detected)
• Align the patch orientation to the x-axis (subtract the key point's canonical orientation)
Local Image Description
• Calculate the gradient magnitude and orientation maps, weighted by a Gaussian
• Sum the weighted gradient magnitudes over nearby directions: compute a histogram for each 4x4 region, with 8 bins for gradient orientation
• This histogram array is the image descriptor
Ashley L. Kapron
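The descriptor construction can be sketched as below, following this slide's layout: an 8x8 patch split into 4x4 subregions with 8 orientation bins each, giving a 32-dimensional vector. (The later IJCV 2004 version uses a 16x16 patch and sixteen subregions, yielding the well-known 128 dimensions.) The Gaussian weighting of `mag` is assumed to have been applied already.

```python
import numpy as np

def sift_descriptor(mag, ori, n_bins=8):
    """Descriptor from an 8x8 gradient patch: one 8-bin orientation histogram
    per 4x4 subregion, concatenated and normalized."""
    assert mag.shape == (8, 8) and ori.shape == (8, 8)
    parts = []
    for by in range(0, 8, 4):
        for bx in range(0, 8, 4):
            m = mag[by:by+4, bx:bx+4].ravel()
            o = ori[by:by+4, bx:bx+4].ravel()
            bins = ((o + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
            parts.append(np.bincount(bins, weights=m, minlength=n_bins))
    d = np.concatenate(parts)                     # 2 * 2 * 8 = 32 dimensions
    return d / (np.linalg.norm(d) + 1e-12)        # normalize for illumination invariance

rng = np.random.default_rng(1)
d = sift_descriptor(rng.random((8, 8)), rng.uniform(-np.pi, np.pi, (8, 8)))
```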
Number of Orientations
David G. Lowe, IJCV 2004
Stages of SIFT Object Recognition
• Feature Detection
• Local Image Description
• Indexing and Matching
• Model Verification
Image Matching
[Figure: key points from the input image matched against the reference database]
Image Matching
• Find all key points in the target image
– Each key point has a 2D location, scale, and orientation, as well as an invariant descriptor vector
• For each key point, search for similar descriptor vectors in the reference image database
– A descriptor vector may match entries from more than one reference pose
– The key point "votes" for the matching pose(s)
• Use the best-bin-first algorithm
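Descriptor matching can be sketched as below. Note this shows exact brute-force nearest-neighbor search for clarity; the paper's best-bin-first algorithm is an approximate k-d tree search that scales to large databases.

```python
import numpy as np

def match_keypoints(desc_img, desc_db):
    """For each image descriptor (row), return the index of the nearest
    database descriptor by Euclidean distance."""
    # Pairwise squared distances between all image/database descriptor pairs
    d2 = ((desc_img[:, None, :] - desc_db[None, :, :]) ** 2).sum(-1)
    return np.argmin(d2, axis=1)

# Toy database of 4 orthogonal descriptors; the image presents them permuted
db = np.eye(4)
img = db[[2, 0, 3, 1]]
matches = match_keypoints(img, db)
```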
Hough Transform Clustering
• Create a 4D Hough Transform (HT) space for each reference pose:
1. Orientation bin = 30°
2. Scale bin = a factor of 2
3. X location bin = 0.25 * reference image width
4. Y location bin = 0.25 * reference image height
• When a key point "votes" for a reference pose, the vote gives an estimate of the object's location and pose
• Keep a list of which key points voted for each bin
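The 4D binning and voting can be sketched as follows; the pose tuples are hypothetical example matches, not data from the paper.

```python
import math
from collections import defaultdict

def hough_bin(orientation_deg, scale, x, y, ref_w, ref_h):
    """Quantize a predicted pose into the slide's 4D Hough bins:
    30-degree orientation bins, factor-of-2 scale bins,
    and location bins of 0.25 * reference image size."""
    return (int(orientation_deg // 30) % 12,
            int(math.floor(math.log2(scale))),
            int(x // (0.25 * ref_w)),
            int(y // (0.25 * ref_h)))

votes = defaultdict(list)
# Hypothetical matches: (orientation_deg, scale, x, y) pose predictions
poses = [(40, 1.1, 10, 10), (42, 1.3, 12, 11), (170, 0.4, 90, 90)]
for i, pose in enumerate(poses):
    votes[hough_bin(*pose, ref_w=100, ref_h=100)].append(i)

# The bin with the most votes is the pose hypothesis passed on to verification
best_bin = max(votes, key=lambda b: len(votes[b]))
```

The first two matches agree on a pose and fall into the same bin, so that bin wins with two votes; the outlier lands alone elsewhere.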
Stages of SIFT Object Recognition
• Feature Detection
• Local Image Description
• Indexing and Matching
• Model Verification
Verification
• Identify the bins with the most votes (each must have at least 3)
• Using the list of key points that voted for a bin, compute the affine transformation parameters (M, T)
• Use corresponding coordinates in the reference model (x, y) and the target image (u, v): [u, v]^T = M [x, y]^T + T
• With more than three points, solve in the least-squares sense
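The least-squares fit can be sketched as below: each correspondence contributes two linear equations in the six unknowns (the four entries of M and the two of T), which are stacked and solved with `np.linalg.lstsq`.

```python
import numpy as np

def fit_affine(model_pts, image_pts):
    """Solve [u, v]^T = M [x, y]^T + T in the least-squares sense
    from >= 3 correspondences between model (x, y) and image (u, v) points."""
    A = np.zeros((2 * len(model_pts), 6))
    b = np.asarray(image_pts, float).ravel()      # u0, v0, u1, v1, ...
    for i, (x, y) in enumerate(model_pts):
        A[2 * i]     = [x, y, 0, 0, 1, 0]          # u = m1*x + m2*y + tx
        A[2 * i + 1] = [0, 0, x, y, 0, 1]          # v = m3*x + m4*y + ty
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p[:4].reshape(2, 2), p[4:]              # M, T

# Synthetic check: recover a known rotation-ish M and translation T
pts = [(0, 0), (1, 0), (0, 1), (1, 1)]
M_true = np.array([[0.9, -0.2], [0.2, 0.9]])
T_true = np.array([5.0, -3.0])
img_pts = [tuple(M_true @ np.array(p) + T_true) for p in pts]
M, T = fit_affine(pts, img_pts)
```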
Remove Outliers
• After applying the affine transformation to the key points, measure the difference between the predicted location and the actual target image location
• Each candidate must agree within:
– 15° in orientation
– a factor of √2 in scale
– 0.2 * model size in x, y location
• Repeat the least-squares solution until no more points are removed
• If fewer than 3 points remain, the match is rejected
Object Recognition Example
Object Recognition Example
Pros & Cons
Pros:
• Numerous keys can be generated from scale space, even for small objects
• Partial occlusion and image clutter can be handled
• Object models can undergo limited affine projection
• Individual features can be matched against a large database of objects
• Robust recognition can be performed quickly
Cons:
• Fully affine transformations require additional steps
• The method was not evaluated on a large dataset with varied cases
Future Work
• Deeper exploration of scale space with octaves of incremental Gaussian filtering
• Sub-pixel localization via 3D curve fitting
• Filtering of edge responses and low-contrast points
• More?
Questions?