Multiple-Instance Learning With Generalized Support Vector Machines
Stuart Andrews
Department of Computer Science, Brown University
www.cs.brown.edu/~stu
Joint work with:
Ioannis Tsochantaridis & Thomas Hofmann
Brown University 05/12/2005
Multiple Instance Learning (MIL)
• Informal definition: "Classification problem where labels are not directly associated with patterns, but with sets of patterns."
  § Incomplete association (ambiguity) between patterns and labels
  § Less information than in supervised classification, more than in unsupervised learning: semi-supervised learning
• Training sample: {([bag], -1), ([bag], -1), ([bag], +1), ([bag], +1)}
AAAI Conference, July 28th-August 1st, 2002
Multiple Instance Learning (MIL)
• Semantics: a set, or bag, is a member of a concept if it contains at least one pattern that is a member
• Asymmetry: a single positive pattern makes a bag positive; negative bags contain only negative patterns
[Figure: true pattern-level discriminant separating positive (+) from negative (-) patterns]
• Training sample: {([bag], -1), ([bag], -1), ([bag], +1), ([bag], +1)}
Applications of MIL
• Drug design:
  § Predict efficiency of a drug. Problem: different conformations of the same molecule.
  § Bag = molecule, pattern = conformation
• Content-based image retrieval:
  § Image annotations reference objects in the image, but are typically not associated with particular parts of the image.
  § Bag = image, pattern = region or blob
• Text categorization:
  § Documents may be categorized/filtered based on a relevant passage.
  § Bag = document, pattern = passage or paragraph
Applications of MIL: Drug Design
§ At any time, a molecule may take one of many different low-energy conformations, or shapes
§ Shape plays an important role in chemical reactions; however, such events are difficult to observe or measure
§ Classification by shape can help predict chemical reactions
Applications of MIL: Drug Design
• Approach: estimate molecule conformations and learn from experimentally tested molecules
Training data: {([conformations of the i-th molecule], ±1)}_i
Applications of MIL: Content-Based Image Retrieval
§ From text to images
§ Support of a closed vocabulary (at least several thousand keywords)
§ Ultimately: richer types of text queries (“a tiger next to a tree”, “a tiger looking to the right”, “a lazy tiger”, …)
[Figure: image labeled "tiger"]
Applications of MIL: Content-Based Image Retrieval
• Approach: automatic image indexing/classification starting from a seed set of annotated images
[Figure: hand-labeled training set of “Tigers”, followed by automatically annotated images …]
• Ultimately: using only annotations from text “surrounding” an image, for example on the WWW
Applications of MIL: Text Categorization
Multiple Instance Learning
Kernel Methods
Support Vector Machines
MIL – Previous Work
• Dietterich, Lathrop & Lozano-Perez 1997
  § Concepts modeled by axis-parallel rectangles (APR)
  § Explicit feature selection
  § Good accuracy on drug prediction, but "custom-built" solution
• Maron & Lozano-Perez 1999
  § Diverse density (DD): circular region in pattern space close to examples from many positive bags and far from most negative bags
• Zhang & Goldman 2002
  § EM-DD: efficient extension of DD that searches for a circular region after nominating examples to represent each bag
• Gärtner, Flach, Kowalczyk & Smola 2002
  § MI-Kernels: kernels defined at the bag level, used to separate bags
• Our contributions:
  § Generalize Support Vector Machine learning to MIL
MIL Definitions
Input patterns $x_1, \ldots, x_n \in \mathbb{R}^d$ are grouped into bags $B_1, \ldots, B_m$, with $B_I = \{x_i : i \in I\}$ for given index sets $I \subseteq \{1, \ldots, n\}$.
Labels $Y_I \in \{-1, +1\}$ are associated with bags: $(B_I, Y_I)$.
True pattern-level labels $y_i$ are only indirectly accessible through the bag labels and the constraint
$$Y_I = \max_{i \in I} y_i$$
The goal is to induce a classifier $f : X \to Y$.
$f : X \to \mathbb{R}$ is called MI-separating w.r.t. a multiple-instance data set if
$$\operatorname{sgn} \max_{i \in I} f(x_i) = Y_I \quad \text{for all bags } B_I$$
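The two definitions can be checked mechanically. The following is a minimal sketch, assuming bags are stored as lists of feature tuples and that sgn maps 0 to -1 (the slides do not fix that convention); the 1-D data and the scoring function `f` are made up for illustration.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def bag_label(instance_labels: Sequence[int]) -> int:
    """Bag label Y_I = max over i in I of y_i, for instance labels in {-1, +1}."""
    return max(instance_labels)


def sign(value: float) -> int:
    """sgn, with the assumed convention that 0 is mapped to -1."""
    return 1 if value > 0 else -1


def is_mi_separating(
    f: Callable[[Vector], float],
    bags: List[List[Vector]],
    bag_labels: List[int],
) -> bool:
    """True if sgn(max_{i in I} f(x_i)) equals Y_I for every bag."""
    return all(
        sign(max(f(x) for x in bag)) == label
        for bag, label in zip(bags, bag_labels)
    )


# Hypothetical 1-D example: one negative bag and one positive bag.
bags = [[(-2.0,), (-1.0,)], [(-1.5,), (3.0,)]]
bag_labels = [-1, 1]


def f(x: Vector) -> float:
    """Illustrative scoring function, positive for x[0] > 1."""
    return x[0] - 1.0


separating = is_mi_separating(f, bags, bag_labels)
```

A single positive instance is enough to make a bag positive, which is why only the maximum of `f` over the bag enters the check.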
Max. Pattern Margin Formulation
• Standard SVM formulation with additional constraints on the unknown labels $y_i$ (mixed integer problem)
  § Given an assignment to the unknown variables, the margin of every instance has influence on the solution
Max. Pattern Margin Formulation
• Primal form: joint optimization over labels and hyperplane
mi-SVM:
$$\min_{\{y_i\}} \min_{w, b} \; \frac{1}{2} \|w\|^2 + C \sum_i \xi_i \quad \text{s.t.}$$
Slack variable constraints:
$$y_i (\langle w, x_i \rangle + b) \ge 1 - \xi_i, \quad \xi_i \ge 0 \quad \forall i$$
Bag constraints:
$$Y_I = \max_{i \in I} y_i$$
Unknown integer variables:
$$y_i \in \{-1, +1\}$$
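For a fixed label assignment and hyperplane, the mi-SVM objective can be evaluated directly, because the smallest feasible slack for each instance is its hinge loss, $\xi_i = \max(0, 1 - y_i(\langle w, x_i \rangle + b))$. The sketch below assumes a linear discriminant and bags given as lists of instance indices; all names and numbers are illustrative rather than taken from the talk.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def mi_svm_objective(
    w: Vector, b: float, C: float, X: List[Vector], y: List[int]
) -> float:
    """0.5 * ||w||^2 + C * sum_i xi_i, with each slack set to its hinge loss."""
    slacks = [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]
    return 0.5 * dot(w, w) + C * sum(slacks)


def labels_respect_bags(
    y: List[int], bags: List[List[int]], bag_labels: List[int]
) -> bool:
    """Check the bag constraint Y_I = max_{i in I} y_i for every index set I."""
    return all(max(y[i] for i in I) == Y for I, Y in zip(bags, bag_labels))


# Hypothetical 1-D instances, a candidate labeling, and a candidate hyperplane.
X = [(2.0,), (-2.0,), (0.5,)]
y = [1, -1, 1]
bags = [[0, 2], [1]]          # index sets I
bag_labels = [1, -1]          # Y_I

objective = mi_svm_objective(w=(1.0,), b=0.0, C=1.0, X=X, y=y)
feasible = labels_respect_bags(y, bags, bag_labels)
```

Only the third instance falls inside the margin here, so it is the only one that contributes slack.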
Max. Bag Margin Formulation
• Alternative bag-centered formulation
• Define the functional margin of a bag as:
$$\gamma_I = Y_I \max_{i \in I} (\langle w, x_i \rangle + b)$$
• Only the most positive instance, or witness $x_{s(I)}$, matters in determining the bag margin
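As a small illustration of the bag margin, the sketch below computes the witness index $s(I)$ and $\gamma_I$ for a linear discriminant. The weight vector, offset and bag are hypothetical values chosen so the arithmetic is easy to follow.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def decision_value(w: Vector, b: float, x: Vector) -> float:
    """Linear discriminant <w, x> + b."""
    return dot(w, x) + b


def witness_index(w: Vector, b: float, bag: List[Vector]) -> int:
    """Index s(I) of the most positive instance in the bag."""
    return max(range(len(bag)), key=lambda i: decision_value(w, b, bag[i]))


def bag_margin(w: Vector, b: float, bag: List[Vector], bag_label: int) -> float:
    """Functional margin gamma_I = Y_I * max_{i in I} (<w, x_i> + b)."""
    return bag_label * max(decision_value(w, b, x) for x in bag)


# Hypothetical hyperplane and bag; decision values are -1, 2 and 1.
w, b = (1.0, 0.0), -1.0
bag = [(0.0, 5.0), (3.0, 1.0), (2.0, -4.0)]

s = witness_index(w, b, bag)
gamma_if_positive = bag_margin(w, b, bag, +1)
gamma_if_negative = bag_margin(w, b, bag, -1)
```

The same instance determines the margin whichever label the bag carries: for a negative bag, the most positive instance is the one closest to violating the constraint.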
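Max. Bag Margin Formulation
• Primal form: joint optimization over bag witnesses and hyperplane
MI-SVM:
$$\min_{w, b} \; \frac{1}{2} \|w\|^2 + C \sum_I \xi_I \quad \text{s.t.}$$
Bag and slack variable constraints:
$$Y_I \max_{i \in I} (\langle w, x_i \rangle + b) \ge 1 - \xi_I, \quad \xi_I \ge 0 \quad \forall I$$
Equivalently, with a witness $s(I) \in I$ selected for each bag:
$$\min_{\{s(I)\}} \min_{w, b} \; \frac{1}{2} \|w\|^2 + C \sum_I \xi_I$$
Re-written for negative bags ($Y_I = -1$):
$$-\langle w, x_i \rangle - b \ge 1 - \xi_I \quad \forall i \in I, \; \forall I$$
And positive bags ($Y_I = +1$):
$$\langle w, x_{s(I)} \rangle + b \ge 1 - \xi_I \quad \forall I$$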
MIL-SVM Optimization
• Solve approximately using the following general alternating optimization scheme:
• Loop until the integer variables have converged:
  § For given integer variables (labels or bag witnesses), solve the SVM-QP to find the optimal discriminant
  § For given discriminant, update the integer variables in a way that (locally) minimizes the objective
• Problems may be relaxed and an EM-like update applied
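The alternating scheme can be sketched for the bag-margin (MI-SVM) variant as follows. This is an illustration under several assumptions, not the authors' implementation: a plain batch subgradient solver stands in for the SVM-QP, the discriminant is linear, every witness is initialized to the first instance of its bag, negative bags share the per-instance hinge loss instead of a single bag slack, and the 1-D data and hyperparameters are made up.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Bag = List[Vector]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(
    X: List[Vector], y: List[int], C: float = 1.0,
    epochs: int = 200, lr: float = 0.01,
) -> Tuple[List[float], float]:
    """Stand-in for the SVM-QP: batch subgradient descent on
    0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * (<w, x_i> + b))."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the regularizer 0.5 * ||w||^2
        grad_b = 0.0
        for x, yi in zip(X, y):
            if yi * (dot(w, x) + b) < 1.0:  # hinge term is active
                grad_w = [g - C * yi * xk for g, xk in zip(grad_w, x)]
                grad_b -= C * yi
        w = [wk - lr * g for wk, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def mi_svm(
    bags: List[Bag], bag_labels: List[int], C: float = 1.0, max_iter: int = 20
) -> Tuple[List[float], float, List[int]]:
    """Alternate between fitting the discriminant and re-selecting witnesses."""
    witness = [0] * len(bags)  # assumed initialization: first instance of each bag
    for _ in range(max_iter):
        X: List[Vector] = []
        y: List[int] = []
        for bag, Y, s in zip(bags, bag_labels, witness):
            if Y > 0:
                X.append(bag[s])      # positive bag: only its witness is used
                y.append(1)
            else:
                X.extend(bag)         # negative bag: every instance is negative
                y.extend([-1] * len(bag))
        w, b = train_linear_svm(X, y, C)
        # For the fixed discriminant, the most positive instance minimizes the slack.
        updated = [
            max(range(len(bag)), key=lambda i: dot(w, bag[i]) + b) if Y > 0 else 0
            for bag, Y in zip(bags, bag_labels)
        ]
        if updated == witness:        # integer variables have converged
            break
        witness = updated
    return w, b, witness


def predict_bag(w: Vector, b: float, bag: Bag) -> int:
    """Bag prediction sgn(max_i (<w, x_i> + b)), with 0 mapped to -1."""
    return 1 if max(dot(w, x) + b for x in bag) > 0 else -1


# Hypothetical 1-D data: two positive bags followed by two negative bags.
bags = [[(-1.0,), (3.0,)], [(2.5,), (-1.5,)], [(-1.0,), (-2.0,)], [(-1.5,), (-0.5,)]]
bag_labels = [1, 1, -1, -1]
w, b, witness = mi_svm(bags, bag_labels)
```

On this toy data the first pass starts from a poor witness for the first positive bag and the second pass corrects it. As with any alternating heuristic, the result depends on the initialization and is only a local solution.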
Synthetic Data Set in 2D (One)
Synthetic Data Set in 2D (Two)
Results on Synthetic Data Sets
• Accuracy on bag (training) and pattern level (testing):
[Bar chart: accuracy (axis 0-120) of SVM, SVM-MIL and SVM*, shown as Bag Acc and Pat Acc for Example One and Example Two]
MUSK Molecule Representation
• Molecule
  § A molecular sample may be thought of as a bag containing multiple conformations of the molecule
• Representation
  § Individual conformations are described by surface shape descriptors
• Datasets
  § MUSK1: 92 bags, 476 instances, 166 features
  § MUSK2: 102 bags, 6600 instances, 166 features
  § Molecules labeled as having odor, or not
  § By Dietterich, Lathrop and Lozano-Perez
Results on MUSK Data Set
≅ 100 bags, ≅ 476/6600 instances, ≅ 166 features
[Bar chart: results (vertical axis 70-95) for EM-DD, DD, MI-NN, IAPR, mi-SVM and MI-SVM on MUSK 1 and MUSK 2]
Blobworld Image Representation
• Bags
  § Each image is treated as a bag of segments
  § Segments obtained by clustering color and texture attributes using a Gaussian mixture model
  § Used Blobworld code from group at UC Berkeley: [Carson, Belongie, Greenspan, Malik ‘99]
• Sub-instance representation
  § Segments represented by texture, color and shape features
• Dataset
  § Corel Photo 1M database CD 4 (1000 photos)
  § 200 bags, 1300 instances, 230 features
  § Images annotated by categories (“elephant”, “fox”, “tiger”)
Examples: Blobworld Representation
[Figure: image labeled TIGER, segmented into regions that are mapped to feature vectors $x_1$, $x_2$, …]
• Color histogram
• Texture features
• Region shape descriptors
More examples …
[Figure: example images labeled SNOW, WOLF and ELEPHANT]
Automatic Image Annotation
≅ 200 bags, ≅ 1300 instances, ≅ 230 features
[Bar chart: results (vertical axis 0-100) for EM-DD, mi-SVM (linear, poly, rbf) and MI-SVM (linear, poly, rbf) on the Elephant, Fox and Tiger categories]
MEDLINE Document Representation
• Bags
  § Each document is treated as a bag of consecutive, overlapping passages of 50 words (windows)
• Sub-instances
  § Term frequency vectors encode each window
• Dataset
  § TREC9/OHSUMED data set
  § 400 bags, 3300 instances, 6800 features
  § Documents are annotated by medical subject heading (MeSH) categories (we tested the first 7 pre-test categories that each contained 100 positive bags)
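A document can be turned into a bag along these lines. The sketch below is illustrative only: the slides give the window length (50 words) but not the amount of overlap, so the stride of 25 words is an assumption, as are the whitespace tokenization, the lowercasing and the tiny vocabulary.

```python
from collections import Counter
from typing import List


def passages(words: List[str], size: int = 50, stride: int = 25) -> List[List[str]]:
    """Split a token list into consecutive, overlapping windows of `size` words.

    The stride (and hence the overlap) is an assumed value.
    """
    if len(words) <= size:
        return [words]
    starts = list(range(0, len(words) - size + 1, stride))
    if starts[-1] != len(words) - size:
        starts.append(len(words) - size)  # extra window so the tail is covered
    return [words[s:s + size] for s in starts]


def term_frequencies(window: List[str], vocabulary: List[str]) -> List[int]:
    """Term frequency vector of one window over a fixed vocabulary."""
    counts = Counter(window)
    return [counts[term] for term in vocabulary]


def document_to_bag(
    text: str, vocabulary: List[str], size: int = 50, stride: int = 25
) -> List[List[int]]:
    """Bag = list of term frequency vectors, one per window of the document."""
    words = text.lower().split()  # assumed tokenization: lowercase, whitespace
    return [term_frequencies(win, vocabulary) for win in passages(words, size, stride)]


# Hypothetical 120-token document and a short example text.
tokens = [f"w{i}" for i in range(120)]
windows = passages(tokens)
small_bag = document_to_bag("Heart heart valve", ["heart", "valve"])
```

With 120 tokens this yields windows starting at positions 0, 25, 50 and 70, so every word falls into at least one passage.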
Text Categorization
≅ 400 bags, ≅ 3300 instances, ≅ 6800 features
[Bar chart: results (vertical axis 0-100) for EM-DD, mi-SVM (linear, poly) and MI-SVM (linear, poly, rbf) on categories TST1, 2, 3, 4, 7, 9 and 10]
Contributions
• Presented novel maximum margin formulations of MIL (pattern & bag)
• Generalized SVMs, thereby making kernel methods available for MIL
• Outlined EM-like optimization heuristics
• Created new and varied MIL datasets
Future Work
• Simulated / deterministic annealing
• Larger problems
• Rigorous testing and evaluation
• CBIR application
Conclusions
• Competitive results across a wide range of problems show the promise of maximum margin formulations
• Modified optimization techniques should improve classification accuracy
• Learning is still possible in domains where labeled data is difficult or impossible to obtain