
Object Recognition from Local Scale-Invariant Features
David G. Lowe
Presented by: Yan Fang

Department of Electrical & Computer Engineering

What are Local Features?
• An image pattern that differs from its immediate neighborhood
• Associated with a change in an image property
• Used to build image descriptors


Why Local Features?
• Specific semantic interpretation in the limited context of a certain application
  - Edge -> road
  - Blob -> impurities
• A limited set of well-localized, individually identifiable anchor points
  - Motion tracking
  - 3D reconstruction
  - Image alignment or mosaicing
• A robust image representation that needs no segmentation
  - Scene recognition
  - Object recognition
  - Features need no semantic meaning


Some Terms
• Detector – a tool that extracts features from an image
• Descriptor – an instance of a feature representation
• Invariant – a function is invariant under a family of transformations if its value does not change when any transformation from the family is applied to its argument
• Local feature – (ideally) a location in space with no spatial extent
  - Interest points
  - Regions
  - Edge segments


Ideal Local Features
• Repeatability: found in images of the same object or scene taken under different conditions
  - Invariance
  - Robustness
• Distinctiveness/informativeness: the intensity patterns underlying the detected features should show a lot of variation
• Locality: reduces the probability of occlusion and allows simple model approximations of geometric and photometric deformations
• Quantity: a sufficiently large number of features, even on small objects
• Accuracy: accurately localized, both in image location and with respect to scale and possibly shape
• Efficiency: fast and easy to compute


Discussion on Local Features
• Repeatability: depends on invariance, robustness, and quantity
• Distinctiveness vs. locality
  - More local means less information and harder matching
  - In some cases (e.g., mosaicing), locality can be sacrificed
• Distinctiveness vs. invariance
  - Depends on the degrees of freedom of the transformation
• Distinctiveness vs. robustness
  - Information is lost for the sake of robustness
  - Denoising vs. detail


Comparison with Other Features
• Global features
  - Describe content with a color histogram
  - Usage: segmentation, object recognition
  - Fail to distinguish foreground from background
  - Image clutter and occlusion are problems
• Image segments
  - Segmentation is difficult by itself and requires much information from the image
  - Search for blobs based on texture/color
• Sampled features
  - Exhaustively sample subparts with a sliding window
  - Solve the background problem, but not partial occlusion
  - Fixed-grid sampling makes invariance difficult
  - Random sampling gives better localization but poor repeatability; not used alone
  - Sampling along edges works well for wiry objects


Corner Detector – Harris Detector
• Distinguishes "flat", "edge", and "corner" regions
• The auto-correlation matrix describes the gradient distribution of the local neighborhood
  - Smoothed with a Gaussian kernel
  - The two eigenvalues indicate image signal change in two directions
  - Large eigenvalues in both directions indicate a potential corner
• Measures "cornerness" (see the sketch below)
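The slide does not spell out the cornerness measure itself; a minimal sketch, assuming the standard Harris response R = det(M) − k·trace(M)² with k ≈ 0.04 and SciPy's Sobel and Gaussian filters:

```python
from scipy.ndimage import gaussian_filter, sobel

def harris_cornerness(image, sigma=1.0, k=0.04):
    """Cornerness R = det(M) - k * trace(M)^2 from the Gaussian-smoothed
    auto-correlation (second-moment) matrix M of the image gradients."""
    Ix = sobel(image, axis=1)              # horizontal gradient
    Iy = sobel(image, axis=0)              # vertical gradient
    Ixx = gaussian_filter(Ix * Ix, sigma)  # smooth the matrix entries
    Iyy = gaussian_filter(Iy * Iy, sigma)
    Ixy = gaussian_filter(Ix * Iy, sigma)
    det_M = Ixx * Iyy - Ixy ** 2           # product of the two eigenvalues
    trace_M = Ixx + Iyy                    # sum of the two eigenvalues
    return det_M - k * trace_M ** 2        # large where both eigenvalues are large
```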


Corner Detector – Harris Detector

For interest point detection, extract local maxima of the cornerness function with non-maximum suppression, as sketched below.
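A matching sketch of the extraction step; the relative threshold and window size are illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import maximum_filter

def harris_keypoints(R, rel_thresh=0.01, window=3):
    """Interest points: local maxima of the cornerness map R (non-maximum
    suppression over a window x window neighborhood), kept only where the
    response exceeds an assumed relative threshold."""
    is_local_max = (R == maximum_filter(R, size=window))
    is_strong = R > rel_thresh * R.max()
    rows, cols = np.nonzero(is_local_max & is_strong)
    return list(zip(rows, cols))
```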


Example of Harris Detector

Results on rotated image examples. Note that T-junctions are also found, in addition to true corners.


Selecting a Feature Detector
• Select feature detectors based on image content and category
• Do not use more invariance than needed; note the tradeoff between invariance and discriminative power
• Consider other properties depending on the application scenario
  - Localization accuracy for camera calibration or 3D modeling
  - Efficiency for large datasets


Introduction to SIFT
• Problem: object recognition in cluttered real-world scenes
• Challenge: finding image features that are robust to object variation
• Proposed method: the Scale-Invariant Feature Transform (SIFT)


Invariance
• Illumination
• Scale
• Rotation
• Affine


Previous Work
• Candidate feature types
  - Line segments
  - Groupings of edges
  - Regions
• Zhang et al.
  - Harris corner detection
  - Detect peaks in local image variation
• Schmid and Mohr
  - Harris corner detection for interest points
  - Orientation-invariant vector of derivative-of-Gaussian image measurements


Motivation & Improvement
Limitations of related work:
• Examine the image at only a single scale
• Difficult to extend to other circumstances
• Focus on feature detection and overlook the descriptor
This work:
• Identifies key locations in scale space
• Selects feature vectors invariant to scaling, stretching, rotation, and other variations
• Improves the feature descriptor
• Efficient: less than 2 seconds, even with clutter and occlusion


Stages of SIFT Object Recognition
• Feature Detection
• Local Image Description
• Indexing and Matching
• Model Verification


Scale Space
The proper scale of objects in a new image is unknown. Exploring features at different scales helps recognize objects at different sizes.


Difference of Gaussian (DoG)
• A = convolve the image with vertical and horizontal 1D Gaussians, $\sigma = \sqrt{2}$
• B = convolve A with vertical and horizontal 1D Gaussians, $\sigma = \sqrt{2}$ (effective smoothing $\sigma = 2$)
• DoG (Difference of Gaussian) = A − B
• Downsample B with bilinear interpolation at a pixel spacing of 1.5 (a linear combination of 4 adjacent pixels), then repeat (a sketch of one pyramid level follows)

$D(x, y, \sigma) = \big(G(x, y, k\sigma) - G(x, y, \sigma)\big) * I(x, y), \quad k = \sqrt{2}$
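A minimal sketch of this construction using SciPy; the function names and the level count are assumptions, not Lowe's code:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def dog_level(image, sigma=np.sqrt(2)):
    """One DoG level as described above: smooth twice with sigma = sqrt(2),
    take the difference, then downsample with pixel spacing 1.5."""
    A = gaussian_filter(image, sigma)        # first smoothing
    B = gaussian_filter(A, sigma)            # second smoothing (effective sigma = 2)
    dog = A - B                              # difference-of-Gaussian image
    next_input = zoom(B, 1 / 1.5, order=1)   # bilinear resampling for the next level
    return dog, next_input

def dog_pyramid(image, levels=3):
    """Stack of DoG images at successively coarser resolutions."""
    pyramid = []
    for _ in range(levels):
        dog, image = dog_level(image)
        pyramid.append(dog)
    return pyramid
```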


Image Pyramid of DoG

[Figure: DoG pyramid construction. At each level, image A is Gaussian-smoothed (G) to give B; DoG = A − B; B is then downsampled to become the next level's input. Shown for three levels, A1/B1/DoG1 through A3/B3/DoG3.]


Pyramid of DoG (Octave)

[Figure: Gaussian scale-space octaves with scales σ, kσ, 2σ, 2kσ, 2k²σ and the DoG images formed between adjacent scales. From David G. Lowe, IJCV 2004.]


DoG Example

[Figure: smoothed images A1–A3 and B1–B3 and their differences DoG1–DoG3. Ashley L. Kapron]


Feature Detection
• Find the maxima and minima of the DoG scale space
• For each point on a DoG level:
  - Compare to its 26 neighbors: 8 at the same level and 9 at each adjacent level (see the sketch below)
• Repeat for each DoG level
• The surviving extrema are the key points

David G. Lowe, IJCV 2004
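A sketch of the 26-neighbor comparison, assuming the three DoG levels have been resampled to a common shape (in the actual pyramid, adjacent levels differ in resolution):

```python
import numpy as np

def is_extremum(dog_prev, dog_cur, dog_next, i, j):
    """True if dog_cur[i, j] is a maximum or minimum among its 26 neighbors:
    8 at the same level plus 9 at each of the two adjacent levels."""
    v = dog_cur[i, j]
    patch = np.stack([lvl[i - 1:i + 2, j - 1:j + 2]
                      for lvl in (dog_prev, dog_cur, dog_next)])
    return v >= patch.max() or v <= patch.min()
```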


SIFT key stability - Illumination
• For all levels, compute the gradient magnitude:
  $M_{ij} = \sqrt{(A_{ij} - A_{i+1,j})^2 + (A_{ij} - A_{i,j+1})^2}$
• Threshold the gradient magnitudes:
  - Remove all key points with $M_{ij}$ less than 0.1 times the maximum gradient value (a sketch follows)
• Motivation: low-contrast feature points are generally less reliable than high-contrast ones
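A sketch of the magnitude computation and the 0.1 threshold; the helper names and argument layout are hypothetical:

```python
import numpy as np

def gradient_magnitude(A):
    """M_ij = sqrt((A_ij - A_{i+1,j})^2 + (A_ij - A_{i,j+1})^2),
    pixel differences on the Gaussian-smoothed image A."""
    di = A[:-1, :-1] - A[1:, :-1]    # A_ij - A_{i+1,j}
    dj = A[:-1, :-1] - A[:-1, 1:]    # A_ij - A_{i,j+1}
    return np.sqrt(di ** 2 + dj ** 2)

def filter_low_contrast(keypoints, M, frac=0.1):
    """Keep key points whose magnitude is at least frac * max(M)."""
    cutoff = frac * M.max()
    return [(i, j) for (i, j) in keypoints if M[i, j] >= cutoff]
```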


SIFT key stability - Orientation
• For all levels, compute the gradient orientation (see the sketch below):
  $R_{ij} = \operatorname{atan2}(A_{ij} - A_{i+1,j},\; A_{i,j+1} - A_{ij})$

[Figure: Gaussian-smoothed image with its gradient orientation and gradient magnitude maps. Ashley L. Kapron]
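The matching orientation computation, as a sketch consistent with the magnitude formula above:

```python
import numpy as np

def gradient_orientation(A):
    """R_ij = atan2(A_ij - A_{i+1,j}, A_{i,j+1} - A_ij) on the
    Gaussian-smoothed image A."""
    di = A[:-1, :-1] - A[1:, :-1]    # A_ij - A_{i+1,j}
    dj = A[:-1, 1:] - A[:-1, :-1]    # A_{i,j+1} - A_ij
    return np.arctan2(di, dj)
```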


SIFT key stability - Orientation
• The gradient magnitude is weighted by a 2D Gaussian centered on the key point (sketch below)

[Figure: gradient magnitude × 2D Gaussian = weighted magnitude. Ashley L. Kapron]
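A sketch of the weighting window; the patch size and σ are illustrative assumptions, since the slide does not give them:

```python
import numpy as np

def gaussian_window(size, sigma):
    """size x size 2D Gaussian used to weight a magnitude patch."""
    ax = np.arange(size) - (size - 1) / 2
    g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    return np.outer(g, g)

# Hypothetical usage on a magnitude patch around a key point:
# weighted = magnitude_patch * gaussian_window(magnitude_patch.shape[0], sigma=4.0)
```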


SIFT key stability - Orientation
• Build a histogram of gradient orientations, accumulating the Gaussian-weighted magnitudes into each orientation bin
• Identify the peak of the histogram (a sketch follows)
• Assign the peak orientation and the sum of magnitudes to the key point

[Figure: weighted magnitude and gradient orientation maps accumulated into an orientation histogram; the peak gives the canonical orientation. Ashley L. Kapron]
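A sketch of the peak identification; the 36-bin resolution follows Lowe's later IJCV 2004 formulation and is an assumption here:

```python
import numpy as np

def dominant_orientation(weighted_mag, orientation, nbins=36):
    """Orientation histogram with magnitude-weighted votes; the peak bin
    gives the key point's canonical orientation."""
    hist, edges = np.histogram(orientation, bins=nbins,
                               range=(-np.pi, np.pi),
                               weights=weighted_mag)
    peak = int(hist.argmax())
    return 0.5 * (edges[peak] + edges[peak + 1]), hist[peak]
```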


Example of Key Points

[Figure: key points at three stages: maxima/minima from the DoG pyramid; after filtering for illumination; after filtering for edge orientation. Ashley L. Kapron]


Stability Test
78% of the keys survive rotation, scaling, stretching, changes of brightness and contrast, and the addition of pixel noise.


Stages of SIFT Object Recognition
• Feature Detection
• Local Image Description
• Indexing and Matching
• Model Verification


Local Image Description
• Each SIFT key is assigned:
  - A location
  - A scale (analogous to the level at which it was detected)
  - An orientation (assigned in the previous canonical-orientation steps)
• Now: describe the local image region in a way that is invariant to these transformations


SIFT Key Example


Local Image Description
For each key point:
• Identify the 8x8 neighborhood (from the DoG level at which it was detected)
• Align the orientation to the x-axis (subtract the key point's orientation)


Local Image Description
• Calculate the gradient magnitude and orientation maps and weight them by a Gaussian


Local Image Description
• Calculate the gradient magnitude and orientation maps and weight them by a Gaussian
• Sum the weighted gradient magnitudes into nearby orientation bins: compute a histogram for each 4x4 subregion, with 8 bins for gradient orientation


Local Image Description
• Calculate the gradient magnitude and orientation maps and weight them by a Gaussian
• Sum the weighted gradient magnitudes into nearby orientation bins: compute a histogram for each 4x4 subregion, with 8 bins for gradient orientation
• This histogram array is the image descriptor (a sketch follows)

Ashley L. Kapron
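A sketch of the descriptor assembly; the final normalization is an assumption borrowed from Lowe's later formulation:

```python
import numpy as np

def sift_descriptor(weighted_mag, orientation, cell=4, nbins=8):
    """Concatenate one nbins-bin orientation histogram per cell x cell
    subregion of the orientation-aligned neighborhood. For the slide's
    8x8 neighborhood this gives 2 * 2 * 8 = 32 values."""
    n = weighted_mag.shape[0]
    hists = []
    for i in range(0, n, cell):
        for j in range(0, n, cell):
            h, _ = np.histogram(orientation[i:i + cell, j:j + cell],
                                bins=nbins, range=(-np.pi, np.pi),
                                weights=weighted_mag[i:i + cell, j:j + cell])
            hists.append(h)
    d = np.concatenate(hists)
    return d / (np.linalg.norm(d) + 1e-12)  # normalize (assumed step)
```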


[Figure: descriptor layout showing the orientations and numbers of samples per histogram. David G. Lowe, IJCV 2004]


Stages of SIFT Object Recognition
• Feature Detection
• Local Image Description
• Indexing and Matching
• Model Verification


Image Matching

[Figure: matching key points between a reference image database and an input image.]


Image Matching
• Find all key points in the target image
  - Each key point has a 2D location, scale, and orientation, as well as an invariant descriptor vector
• For each key point, search for similar descriptor vectors in the reference image database
  - A descriptor vector may match more than one reference pose in the database
  - The key point "votes" for the matching pose(s)
• Use the best-bin-first algorithm (a nearest-neighbor sketch follows)
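A sketch of the matching step; an exact k-d tree query stands in for best-bin-first here, and the distance threshold is an illustrative assumption:

```python
from scipy.spatial import cKDTree

def match_descriptors(db_descriptors, target_descriptors, max_dist=0.4):
    """Nearest-neighbor descriptor matching. Lowe uses best-bin-first for
    speed; an exact k-d tree query is a simple stand-in."""
    tree = cKDTree(db_descriptors)
    dists, idxs = tree.query(target_descriptors, k=1)
    return [(t, int(i)) for t, (d, i) in enumerate(zip(dists, idxs))
            if d < max_dist]
```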


Hough Transform Clustering
• Create a 4D Hough transform (HT) space for each reference pose
  1. Orientation bin = 30°
  2. Scale bin = a factor of 2
  3. X location bin = 0.25 × reference image width
  4. Y location bin = 0.25 × reference image height
• When a key point "votes" for a reference pose, count the vote, which gives an estimate of location and pose
• Keep a list of which key points vote for each bin (see the sketch below)
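A sketch of the voting step using the bin sizes above; the per-match pose parameterization is a simplifying assumption (Lowe also votes into neighboring bins to soften boundary effects):

```python
import math
from collections import defaultdict

def hough_vote(matches):
    """Bin each match's pose hypothesis into the coarse 4D space.
    Each match is (key_id, orientation_deg, scale, x, y, ref_w, ref_h)."""
    bins = defaultdict(list)
    for key_id, ori, scale, x, y, ref_w, ref_h in matches:
        b = (int(ori // 30) % 12,            # 30-degree orientation bins
             int(round(math.log2(scale))),   # factor-of-2 scale bins
             int(x // (0.25 * ref_w)),       # quarter-width location bins
             int(y // (0.25 * ref_h)))       # quarter-height location bins
        bins[b].append(key_id)               # remember who voted where
    return bins
```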


Stages of SIFT Object Recognition
• Feature Detection
• Local Image Description
• Indexing and Matching
• Model Verification


Verification
• Identify the bins with the most votes (each must have at least 3)
• Using the list of key points that voted for a bin, compute the affine transformation parameters (M, T) relating the reference model coordinates (x, y) to the target image coordinates (u, v)
• With more than three points, solve in the least-squares sense (a sketch follows)
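A sketch of the least-squares solve for (M, T):

```python
import numpy as np

def fit_affine(model_pts, image_pts):
    """Least-squares affine fit [u, v] = M [x, y] + T from n >= 3
    correspondences given as (n, 2) arrays."""
    model_pts = np.asarray(model_pts, dtype=float)
    image_pts = np.asarray(image_pts, dtype=float)
    X = np.hstack([model_pts, np.ones((len(model_pts), 1))])  # rows [x, y, 1]
    params, *_ = np.linalg.lstsq(X, image_pts, rcond=None)    # (3, 2) solution
    M, T = params[:2].T, params[2]                            # 2x2 matrix, 2-vector
    return M, T
```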


Remove Outliers
• After applying the affine transformation to the key points, determine the difference between each calculated location and its actual target image location
• Each candidate must agree within:
  - Orientation within 15°
  - Scale change within a factor of 2
  - X, Y location within 0.2 × model size
• Repeat the least-squares solution until no more points are removed (see the sketch below)
• If fewer than 3 points remain, the match is rejected
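A sketch of the re-fit loop, reusing fit_affine from the previous sketch; tol corresponds to the 0.2 × model size bound, and the orientation and scale agreement checks are omitted for brevity:

```python
import numpy as np

def refine_pose(model_pts, image_pts, tol):
    """Iteratively re-fit the affine solution, dropping points whose
    predicted location differs from the observed one by more than tol.
    Returns None when fewer than 3 points remain (match rejected)."""
    m = np.asarray(model_pts, dtype=float)
    im = np.asarray(image_pts, dtype=float)
    while len(m) >= 3:
        M, T = fit_affine(m, im)
        err = np.linalg.norm(m @ M.T + T - im, axis=1)  # reprojection error
        keep = err <= tol
        if keep.all():
            return M, T
        m, im = m[keep], im[keep]
    return None
```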


Object Recognition Example


Object Recognition Example


Pros & Cons
Pros:
• Numerous keys can be generated from scale space, even for small objects
• Partial occlusion and image clutter can be handled
• Individual features can be matched to a large database of objects
• Robust recognition can be performed quickly
Cons:
• Object models can undergo only limited affine projection; fully affine transformations require additional steps
• The method was not evaluated on a large dataset with varied cases


Future Work
• Deeper exploration of scale space with octaves of incremental Gaussian filtering
• Sub-pixel localization with 3D curve fitting
• Filtering of edge and low-contrast points
• More?


Questions?