Model Combination for Event Extraction in BioNLP 2011

Sebastian Riedel,(a) David McClosky,(b) Mihai Surdeanu,(b) Andrew McCallum,(a) and Christopher D. Manning(b)
(a) University of Massachusetts at Amherst and (b) Stanford University

BioNLP 2011, June 24th, 2011

Previous work / Motivation

- BioNLP 2009: model combination led to a 4% F1 improvement over the best individual system (Kim et al., 2009)
- Netflix challenge: the winning entry relies on model combination (Bennett et al., 2007)
- CoNLL 2007: the winning entry relies on model combination (Hall et al., 2007)
- CoNLL 2003: the winning entry relies on model combination (Florian et al., 2003)
- etc.
- Most of these use stacking, and so do we
- Stacking: the stacked model's output is used as features in the stacking model

Stacking Model

Maximize, under global constraints,

    s(e, a, b) = Σ_i s_i(e_i) + Σ_{i,j} s_{i,j}(a_{i,j}) + Σ_{p,q} s_{p,q}(b_{p,q})

where s_i(e_i) scores the event-trigger label of token i (e.g., s(Binding) = -0.1, s(Regulation) = 3.2, s(Phosphor.) = 0.5), s_{i,j}(a_{i,j}) scores the argument edge from trigger i to token j (e.g., s(None) = -2.2, s(Theme) = 0.2, s(Cause) = 1.3), and s_{p,q}(b_{p,q}) scores whether proteins p and q are arguments of the same Binding event.

[Figure: worked example over the phrase "... phosphorylation of TRAF2 inhibits binding to the CD40 domain", with trigger scores on tokens and argument scores on edges.]
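The factored score above can be sketched directly in code. This is a minimal illustration of the decomposition, not the authors' implementation; the score tables and label names are toy values taken from the example on the slide.

```python
# Sketch of the factored scoring function s(e, a, b): the total score
# decomposes into per-token trigger scores, per-edge argument scores,
# and per-protein-pair binding scores.

def total_score(trigger_scores, arg_scores, binding_scores, e, a, b):
    """trigger_scores[i][label], arg_scores[(i, j)][label], and
    binding_scores[(p, q)][flag] are the component score tables;
    e, a, b assign a label to each token, edge, and protein pair."""
    s = sum(trigger_scores[i][e_i] for i, e_i in e.items())
    s += sum(arg_scores[(i, j)][a_ij] for (i, j), a_ij in a.items())
    s += sum(binding_scores[(p, q)][b_pq] for (p, q), b_pq in b.items())
    return s

# Toy example with the trigger/edge scores from the slide:
triggers = {1: {"None": 0.0, "Phosphorylation": 0.5},
            4: {"None": 0.0, "Binding": -0.1}}
args = {(1, 2): {"None": -2.2, "Theme": 0.2}}

e = {1: "Phosphorylation", 4: "Binding"}
a = {(1, 2): "Theme"}
score = total_score(triggers, args, {}, e, a, {})
print(round(score, 6))  # 0.5 + (-0.1) + 0.2 = 0.6
```

An exact inference procedure would then search over assignments (e, a, b) to maximize this sum subject to the global constraints.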

Scores

Each component score is a weighted sum of indicator features. For the trigger score s_i(e_i), for example:

    weight -2.1 for feature [e = Reg]
    weight  1.3 for feature [e = Reg ∧ w = "inhibit"]

Stacked Features

Stacking adds features that test the stacked model's prediction y for the same decision:

    weight -2.1 for feature [e = Reg]
    weight  1.2 for feature [e = Reg ∧ y = Reg]
    weight  1.3 for feature [e = Reg ∧ w = "inhibit"]

Stacked model

- Stanford Event Parsing system
- Recall: four different decoders: (1st-, 2nd-order features) × (projective, non-projective)
- Only used the parser for stacking (1-best outputs)
- Different segmentation/tokenization
- Different trigger detection

Performance of individual components

System          F1     with reranker
UMass           54.8   —
Stanford (1N)   49.9   50.2
Stanford (1P)   49.0   49.4
Stanford (2N)   46.5   47.9
Stanford (2P)   49.5   50.5

(Genia development section, Task 1)

Model combination strategies

System                     F1
UMass                      54.8
Stanford (2P, reranked)    50.5
Stanford (all, reranked)   50.7
UMass←2N                   54.9
UMass←1N                   55.6
UMass←1P                   55.7
UMass←2P                   55.7
UMass←all (FAUST)          55.9

(Genia development section, Task 1)

Ablation analysis for stacking

System                     F1
UMass                      54.8
Stanford (2P, reranked)    50.5
UMass←all                  55.9
UMass←all (triggers)       54.9
UMass←all (arguments)      55.1

(Genia development section, Task 1)

Conclusions

- Stacking: an easy, effective method of model combination
  - ...even if the base models differ significantly in performance
- Variability among the models is critical for success
- Tree structure is best provided by the projective decoder
  - Incorporated into the UMass model via 2P stacking
- Future work: incorporate the projectivity constraint directly

Questions?

Backup slides



Conjoined Features

Conjoined features combine the word and the stacked model's prediction in a single indicator:

    weight -2.1 for feature [e = Reg]
    weight  1.2 for feature [e = Reg ∧ y = Reg]
    weight  1.3 for feature [e = Reg ∧ w = "inhibit"]
    plus a conjoined feature [e = Reg ∧ w = "inhibit" ∧ y = Reg]
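The conjunction can be sketched as one extra indicator on top of the plain stacked features. Everything here is illustrative, including the conjoined feature's weight, which is not recoverable from the slide.

```python
# Sketch: a conjoined feature fires only when the candidate label, the
# word, AND the stacked model's prediction all match at once, letting
# the stacking model trust the base model more for specific words.
# Feature names and weights are illustrative, not from the paper.

def features(word, label, stacked_pred):
    return [
        ("label", label),                                   # [e = Reg]
        ("label+stacked", label, stacked_pred),             # [e = Reg ∧ y = Reg]
        ("label+word", label, word),                        # [e = Reg ∧ w = "inhibit"]
        ("label+word+stacked", label, word, stacked_pred),  # conjoined
    ]

weights = {("label", "Reg"): -2.1,
           ("label+stacked", "Reg", "Reg"): 1.2,
           ("label+word", "Reg", "inhibit"): 1.3,
           ("label+word+stacked", "Reg", "inhibit", "Reg"): 0.2}  # assumed weight

total = sum(weights.get(f, 0.0) for f in features("inhibit", "Reg", "Reg"))
print(round(total, 6))  # -2.1 + 1.2 + 1.3 + 0.2 = 0.6
```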

Results on Genia

System                  Simple  Binding  Regulation  Total
UMass                   74.7    47.7     42.8        54.8
Stanford 1N             71.4    38.6     32.8        47.8
Stanford 1P             70.8    35.9     31.1        46.5
Stanford 2N             69.1    35.0     27.8        44.3
Stanford 2P             72.0    36.2     32.2        47.4
UMass←All               76.9    43.5     44.0        55.9
UMass←1N                76.4    45.1     43.8        55.6
UMass←1P                75.8    43.1     44.6        55.7
UMass←2N                74.9    42.8     43.8        54.9
UMass←2P                75.7    46.0     44.1        55.7
UMass←All (triggers)    76.4    41.2     43.1        54.9
UMass←All (arguments)   76.1    41.7     43.6        55.1

Results on Infectious Diseases

System                  Rec    Prec   F1
UMass                   46.2   51.1   48.5
Stanford 1N             43.1   49.1   45.9
Stanford 1P             40.8   46.7   43.5
Stanford 2N             41.6   53.9   46.9
Stanford 2P             42.8   48.1   45.3
UMass←All               47.6   54.3   50.7
UMass←1N                45.8   51.6   48.5
UMass←1P                47.6   52.8   50.0
UMass←2N                45.4   52.4   48.6
UMass←2P                49.1   52.6   50.7
UMass←2P (conjoined)    48.0   53.2   50.4

Results on test

                   UMass                 UMass←All
                   Rec    Prec   F1      Rec    Prec   F1
GE (Task 1)        48.5   64.1   55.2    49.4   64.8   56.0
GE (Task 2)        43.9   60.9   51.0    46.7   63.8   53.9
EPI (Full task)    28.1   41.6   33.5    28.9   44.5   35.0
EPI (Core task)    57.0   73.3   64.2    59.9   80.3   68.6
ID (Full task)     46.9   62.0   53.4    48.0   66.0   55.6
ID (Core task)     49.5   62.1   55.1    50.6   66.1   57.3