Targeted Retrieval of Gene Expression Measurements Using Regulatory Models

Elisabeth Georgii, Jarkko Salojärvi, Mikael Brosché, Jaakko Kangasjärvi, Samuel Kaski

Funding: Tekes, Academy of Finland, PASCAL2

MLSB 2012

July 22, 2012


Motivation

- Large repositories of measurement data ⟹ use them!
- Goal: automated search for relevant experiments
- Considered task: given a gene expression profile, find "similar" profiles from a database

[Figure: a query profile is matched against a profile database.]

MLSB 2012 09.09.2012 2/14

What is a suitable similarity measure?

- Shared keywords in the annotation (= knowledge-driven)
  (+) reliable, state of the art; (-) excludes new findings
  (Zhu et al., Bioinformatics, 2008)
- Correlation of profiles (= data-driven)
  (+) easy to compute; (-) ignores gene dependencies
  (Engreitz et al., BMC Bioinformatics, 2010)
- Model-based similarity measure (= data-driven)
  (+) learns from database; (-) computationally expensive
  (Caldas et al., Bioinformatics, 2009, 2012)
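The correlation-based option in the list above can be sketched in a few lines. This is only an illustration of ranking by Pearson correlation, not the implementation used in the cited work; the function name `rank_by_correlation` and the toy data are assumptions made for the example.

```python
import numpy as np

def rank_by_correlation(query, database):
    """Rank database profiles (rows) by decreasing Pearson correlation with the query.

    Assumes no profile is constant across genes (otherwise the norm is zero).
    """
    q = query - query.mean()
    d = database - database.mean(axis=1, keepdims=True)
    # Pearson correlation equals the cosine similarity of mean-centered profiles.
    corr = d @ q / (np.linalg.norm(d, axis=1) * np.linalg.norm(q))
    order = np.argsort(-corr)
    return order, corr

# Hypothetical toy data: 3 database profiles measured over 4 genes.
database = np.array([
    [1.0, 2.0, 3.0, 4.0],    # same trend as the query
    [4.0, 3.0, 2.0, 1.0],    # opposite trend
    [1.0, -1.0, 1.0, -1.0],  # unrelated pattern
])
query = np.array([2.0, 4.0, 6.0, 8.0])
order, corr = rank_by_correlation(query, database)
print(order, corr.round(3))
```

Each gene contributes independently to this score, which is the "ignores gene dependencies" limitation noted in the list.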

This approach: Model-based targeted retrieval

- Two main aspects
  - Targeted focus: guide the model by genes of interest, e.g. genes known to be related to a certain disease
    → adapt to users' needs, reduce computational effort
  - Similarity based on gene regulatory network models: potential similarity of conditions at detailed biological level
    → improved interpretability by network activation patterns

System for targeted retrieval

[Diagram with three columns]
- User interaction: genes of interest; query (measurement of interest); ranked list of compendium measurements
- Methods: targeted regulatory model; ranking by model-based similarity (Fisher kernel)
- Database: gene expression measurement compendium

- First step: learn regulatory model for user-provided genes
- Second step: retrieve measurements related to a query

Targeted gene expression model

- Conditional model: expression of target genes, given expression of other genes
  $P(X_T \mid X_{-T})$
- Pseudo-likelihood approach:
  $\tilde{P}(X_T \mid X_{-T}) = \prod_{j \in T} P(X_j \mid X_{-\{j\}}; \theta_j)$
  i.e., independent model for each target gene
- Gene-specific model: Gaussian linear regression model
  $X_j = X_{-\{j\}} \beta + \epsilon, \quad \epsilon \sim N(0, \sigma^2)$
  sparse $\beta$ estimate by $L_1$-norm regularization → target gene neighbors
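The gene-specific models above can be sketched as one L1-regularized regression per target gene. The code below is a minimal illustration that uses plain cyclic coordinate descent. The solver, the regularization weight `lam`, the absence of an intercept (profiles are assumed to be centered), and the dictionary layout of the fitted model are all assumptions of this sketch, not details taken from the slides.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()  # equals y - X @ beta because beta starts at zero
    for _ in range(n_iter):
        for k in range(p):
            if col_sq[k] == 0.0:
                continue
            resid += X[:, k] * beta[k]  # remove predictor k's current contribution
            rho = X[:, k] @ resid / n
            beta[k] = soft_threshold(rho, lam) / col_sq[k]
            resid -= X[:, k] * beta[k]  # add the updated contribution back
    return beta

def fit_targeted_model(X, targets, lam):
    """Fit one sparse regression per target gene, using all other genes as predictors."""
    model = {}
    for j in targets:
        others = [g for g in range(X.shape[1]) if g != j]
        beta = lasso_cd(X[:, others], X[:, j], lam)
        resid = X[:, j] - X[:, others] @ beta
        # Genes with non-zero coefficients are the "neighbors" of target j.
        neighbors = [others[k] for k in np.flatnonzero(beta)]
        model[j] = {"predictors": others, "beta": beta,
                    "sigma2": resid.var(), "neighbors": neighbors}
    return model

# Hypothetical toy data: gene 0 depends on genes 1 and 2 only.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
X[:, 0] = 2.0 * X[:, 1] - 1.0 * X[:, 2] + 0.1 * rng.standard_normal(200)
model = fit_targeted_model(X, targets=[0], lam=0.1)
print(model[0]["neighbors"], model[0]["beta"].round(2))
```

Because each target gene has its own independent regression, the fits can run separately, and only the requested target genes need to be modeled.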

Model-based similarity measure

- Fisher score representation of data point $x^{(i)}$:
  $s_{\hat{\theta}}(x^{(i)})$: gradient of its log-likelihood at learned model parameters
  → direction in which to update the parameters after adding $x^{(i)}$ to the dataset
  (→ summary of dataset $D + x^{(i)}$)
- Simple Fisher kernel (Jaakkola and Haussler, NIPS 1998: using HMMs in classifiers):
  $K_{\hat{\theta}}(x^{(i_1)}, x^{(i_2)}) = s_{\hat{\theta}}(x^{(i_1)})^{T} s_{\hat{\theta}}(x^{(i_2)})$
  → similarity of datasets $D + x^{(i_1)}$ and $D + x^{(i_2)}$ regarding model-based summary statistics
- Parameters of biological interest in our model: coefficients of target gene neighbors
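For a Gaussian linear regression, the gradient of the log-likelihood with respect to the coefficients is (residual × predictor values) / σ². The sketch below computes this Fisher score for the neighbor coefficients of each target gene and takes inner products as the kernel. The model dictionary layout and the toy numbers are hypothetical, and any further normalization the authors may apply is not shown.

```python
import numpy as np

def fisher_score(x, model):
    """Gradient of the log-likelihood of profile x w.r.t. the neighbor coefficients.

    model maps a target gene index j to a dict with keys
    "neighbors" (gene indices), "beta" (their coefficients) and "sigma2".
    """
    parts = []
    for j, m in model.items():
        nb = m["neighbors"]
        resid = x[j] - x[nb] @ m["beta"]
        # d/d(beta) log N(x_j | x_nb . beta, sigma2) = resid * x_nb / sigma2
        parts.append(resid * x[nb] / m["sigma2"])
    return np.concatenate(parts)

def fisher_kernel(x1, x2, model):
    """Simple Fisher kernel: inner product of the two Fisher scores."""
    return float(fisher_score(x1, model) @ fisher_score(x2, model))

# Hypothetical learned model: target gene 0 with neighbors 1 and 2.
model = {0: {"neighbors": [1, 2], "beta": np.array([2.0, -1.0]), "sigma2": 0.5}}

query = np.array([3.0, 1.0, 1.0])   # target higher than the model predicts
same = np.array([3.0, 1.0, 1.0])    # deviates in the same direction
fits = np.array([1.0, 1.0, 1.0])    # exactly as predicted, zero score
opposite = np.array([0.0, 1.0, 1.0])  # deviates in the opposite direction

database = [same, fits, opposite]
similarities = [fisher_kernel(query, d, model) for d in database]
ranking = np.argsort(-np.array(similarities))
print(similarities, ranking)
```

Profiles that deviate from the learned regulatory relationships in the same direction as the query get the highest similarity, which is what makes the ranking interpretable in terms of network edges.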

Case study on plant osmotic stress

- Osmotic stress: dehydration of plant
- Causes: drought, salt, or cold conditions
- Relevance: abiotic stress important for crop productivity
- Cellular response:

[Figure: Regulatory network of stress responses to drought, salt, and cold: specificity and cross-talk. Boudsocq M, Laurière C. Plant Physiol. 2005;138:1185-1194. ©2005 by American Society of Plant Biologists]

Case study on plant stress

- Data: 141 differential expression profiles from 38 A. thaliana stress datasets, 6658 diff. expr. genes
- Task: retrieval of osmotic stress experiments (31 profiles from 5 datasets, ≥ 6 profiles per dataset)
- Target gene lists from two sources:
  - 10 water-stress related genes (TF DREB2A + targets) (Sakuma et al., PNAS, 2006)
  - 41 genes annotated as 'drought-salt-cold' (STIFDB, Shameer et al., Int J Plant Genomics, 2009)
  - overlap: 4 genes
- Experimental setup:
  - One left-out dataset as queries (cross-validation)
  - Unsupervised model training with all other profiles (including osmotic and non-osmotic)
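The leave-one-dataset-out setup can be sketched as a split generator: all profiles of one dataset serve as queries, and every remaining profile is available for model training. The function name and the toy dataset labels below are hypothetical.

```python
import numpy as np

def leave_one_dataset_out(dataset_ids, query_datasets):
    """Yield (held_out, train_idx, query_idx), holding out one whole dataset at a time."""
    dataset_ids = np.asarray(dataset_ids)
    for held_out in query_datasets:
        query_idx = np.flatnonzero(dataset_ids == held_out)
        train_idx = np.flatnonzero(dataset_ids != held_out)
        yield held_out, train_idx, query_idx

# Hypothetical toy compendium: one dataset label per profile.
dataset_ids = ["salt", "salt", "cold", "cold", "cold", "uvb"]
# Only the datasets relevant to the retrieval task are used as query sets.
folds = list(leave_one_dataset_out(dataset_ids, ["salt", "cold"]))
for held_out, train_idx, query_idx in folds:
    print(held_out, train_idx.tolist(), query_idx.tolist())
```

Splitting by dataset rather than by profile keeps profiles from the same experiment out of the training set of their own queries.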

Precision-recall analysis

[Figure: two precision-recall plots (x-axis: recall, y-axis: precision), one for target list STIFDB and one for target list Sakuma-water. Compared methods: REx method, Corr. (all genes), Corr. (predictors), Eucl. (all genes), Eucl. (predictors).]

Fig. 1. Osmotic stress retrieval performance of several methods for three different gene lists of interest (see text for details). For the meaning of method abbreviations, see Table 1.

- Modeling targeted gene relationships helps
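Precision-recall curves of this kind are computed from a ranked retrieval list. The sketch below assumes a binary relevance label per retrieved profile (1 for an osmotic stress profile, 0 otherwise). How the slides average curves over queries is not stated, so only a single ranked list is handled.

```python
import numpy as np

def precision_recall(ranked_relevance):
    """Precision and recall after each position of a ranked list of 0/1 relevance labels."""
    rel = np.asarray(ranked_relevance, dtype=float)
    hits = np.cumsum(rel)                     # relevant items retrieved so far
    precision = hits / np.arange(1, len(rel) + 1)
    recall = hits / rel.sum()                 # assumes at least one relevant item
    return precision, recall

# Hypothetical ranked list: relevant, irrelevant, relevant, relevant, irrelevant.
precision, recall = precision_recall([1, 0, 1, 1, 0])
print(precision.round(3), recall.round(3))
```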

Osmotic stress network analysis

[Figure: learned networks in three panels: RD29A-centered network, DREB2A-centered network, and AT2G46140-centered network.]

Fig. 2. Osmotic stress network learned around Sakuma-water targets (box-shaped). Arrows point from predictors to targets. The dashed edge indicates a negative relationship. Black edges are increased in weight for a majority of osmotic stress samples, compared to the background model. See text for details.

- Top edges in bootstrapping

Table 3: Top-ranked predictor-target pairs in a bootstrapping experiment (AGI codes of gene symbols: COR15A: AT2G42540, COR15B: AT2G42530, DREB2A: AT5G05410, ERD14: AT1G76180, LEA7: AT1G52690, LSR3: AT1G01470, LTI45: AT1G20450, RD17: AT1G20440, RD29A: AT5G52310, RD29B: AT5G52300, XERO2: AT3G50970, ZAT12: AT5G59820).

Target       Predictor
RD17         LTI45
LSR3         XERO2
LSR3         ERD14
AT3G17520    RD29B
RD17         ERD14
DREB2A       AT3G62260
DREB2A       ZAT12
COR15A       COR15B
XERO2        LSR3
RD29A        LTI45
AT3G02480    LEA7
AT1G52690    AT3G02480

Third column, "Stress-related annotation of predictor?": the entries are "yes (also included in STIFDB)", "yes (also included in Sakuma-water)", "yes (response to water deprivation)", "– (protein phosphatase 2C)", and "yes (involved in cold acclimation)".

We checked the significance of the inferred relationships in a bootstrapping experiment (see Suppl. Material). The most prevalent relationships were well supported by functional annotations. In addition, concordant expression between the transcription factor DREB2A and the predictors AT3G62260 and ZAT12 was validated in two independent datasets, giving rise to interesting biological hypotheses (see Suppl. Material).

DREB2A as regulator in osmotic stress and heat shock response is well established (Yoshida et al., 2008). The smallest network, centered around AT2G46140, is a novel finding suggesting that DREB2A and some of its targets also have a role in responses to pathogen infection and pathogen elicitors.
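A bootstrapping check of edge stability can be sketched as follows: resample the profiles with replacement, re-select predictors on each resample, and report how often each predictor is chosen. The details of the authors' experiment are in their supplementary material and are not reproduced here. In particular, `select_by_correlation` is a simple stand-in selector used to keep the example short, not the L1-regularized model from the slides.

```python
import numpy as np

def select_by_correlation(X, target, threshold=0.8):
    """Stand-in selector (an assumption of this sketch): predictors with |r| > threshold."""
    r = np.corrcoef(X, rowvar=False)[target]
    return [g for g in range(X.shape[1]) if g != target and abs(r[g]) > threshold]

def bootstrap_edge_frequency(X, target, select, n_boot=100, seed=0):
    """Fraction of bootstrap resamples in which each gene is selected as predictor of target."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        rows = rng.integers(0, n, size=n)  # sample profiles with replacement
        for g in select(X[rows], target):
            counts[g] += 1
    return counts / n_boot

# Hypothetical toy data: gene 0 follows gene 1; gene 2 is unrelated.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
X[:, 0] = X[:, 1] + 0.1 * rng.standard_normal(100)
freq = bootstrap_edge_frequency(X, target=0, select=select_by_correlation)
print(freq)
```

Any selector with the same call signature, such as a sparse regression fit, could be passed as `select` instead.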

Model-based comparison of measurements

[Figure: heatmap of column Z-scores (color scale −5 to 5). Columns are compendium samples, split into osmotic stress samples and other samples. Rows are predictor−target edges, with row groups labeled DREB2A network, RD29A network, and AT2G46140 network.]

Row labels (predictor−target): HSP101−DREB2A, AT2G20560−DREB2A, AT2G41820−DREB2A, AT3G62260−DREB2A, ZAT12−DREB2A, NAC102−DREB2A, TSPO−LEA7, AT3G02480−LEA7, LEA4−5−LEA7, AT2G21820−AT3G17520, AT2G37870−AT3G17520, GASA3−AT3G17520, RD29B−AT3G17520, LSR3−XERO2, AT1G16850−XERO2, DGK1−XERO2, ERD14−LSR3, AT2G23120−LSR3, XERO2−LSR3, AT2G12400−COR15A, COR15B−COR15A, XERO2−COR15A, RD29A−COR15A, LTI45−RD29A, GolS2−RD29A, STO1−RD29A, AFP3−RD29A, HVA22D−RD29A, LEA4−5−RD29A, SPSA2−RD29A, LEA7−AT3G02480, AT2G37870−AT3G02480, LEA4−5−AT3G02480, LTI45−RD17, ERD14−RD17, WAKL7−AT2G46140, AT1G22410−AT2G46140, NEK5−AT2G46140, SGP2−AT2G46140, AT4G28550−AT2G46140

Discriminative target genes

[Figure: precision-recall plots (x-axis: recall, y-axis: precision) in three panels. a) Subset selection of size k: test performance of optimal subsets (k = 10, 3, 1). b) Si… c) Addition of nuisance genes (+0%, +10%, +20%, +50%, +80%, +100%).]

- Best subset of size 1: RD29A (responsive to dehydration)
- Best subset of size 3: RD29A, LEA7, COR15A
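The nuisance-gene experiment in panel c) can be sketched as padding the target list with randomly chosen other genes, in proportion to the list size. The sampling pool, the rounding rule, and the gene names `G1` to `G6` below are assumptions of this example, and the authors' exact protocol is not shown on the slide.

```python
import numpy as np

def add_nuisance_genes(targets, all_genes, fraction, rng):
    """Return the target list extended by round(fraction * len(targets)) random other genes."""
    candidates = sorted(set(all_genes) - set(targets))
    n_extra = int(round(fraction * len(targets)))
    extra = rng.choice(candidates, size=n_extra, replace=False)
    return list(targets) + [str(g) for g in extra]

# Target names taken from the slides; G1..G6 are hypothetical background genes.
targets = ["RD29A", "LEA7", "COR15A", "DREB2A"]
all_genes = targets + ["G1", "G2", "G3", "G4", "G5", "G6"]
rng = np.random.default_rng(0)
noisy_targets = add_nuisance_genes(targets, all_genes, fraction=0.5, rng=rng)
print(noisy_targets)
```

Repeating this with different random draws and refitting the model each time gives an average precision-recall curve per noise level.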

Discussion

- Summary: targeted retrieval using regulatory model
- Purpose: investigating specific commonalities between biological conditions based on (putative) gene relationships
- Efficiency: gene-specific models can be pre-computed
- Open questions:
  - Given promising performance with simple model, what is the most suitable model for retrieval? (also supervised options, prior knowledge, ...)
  - Is the conceptual idea feasible for applications with heterogeneous data? (different platforms, species, measurement types, ...)
