Online Supplementary Material
External factors accelerate expression divergence between duplicate genes Misook Ha1, Wen-Hsiung Li2 and Z. Jeffrey Chen1 1
Section of Molecular Cell and Developmental Biology and Institute for Cellular and Molecular Biology, One University Station, A-4800, University of Texas, Austin, TX 78712-0159, USA 2 Department of Ecology and Evolution, University of Chicago, 1101 East 57th Street, Chicago, IL 60637, USA Corresponding author: Chen, Z.J. (
[email protected]). We examined the evolution of expression of duplicate genes in Arabidopsis thaliana, by analyzing 512 data sets of gene expression microarrays and 2022 recent duplicate gene pairs. Expression divergence between gene duplicates is significantly greater in response to environmental stress than to developmental processes. A slow rate of expression divergence during development might offer dosage-dependent selective advantage, whereas rapid expression divergence in response to external changes might accelerate adaptation. Methods Recent and old duplicate genes in A. thaliana
The sequence data and annotation were obtained from the TIGR database (ftp://ftp.tigr.org/pub/data/a_thaliana/ath1/) and NCBI (ftp://ftp.ncbi.nih.gov/genomes/Arabidopsis_thaliana/). Except noted otherwise, we used well-characterized gene duplicates that arose from the most recent WGD event ~20-40 million years ago 1-3. The “recent” and “old” gene duplicates that were used for gene expression analyses are given in online Supplementary Material Table S5 and Table S6, respectively. We excluded the genes that were annotated as pseudogenes or had no detectable expression levels. In addition, 577 genes were excluded because more than one gene was assigned to a single array element that may cause a potential of cross-hybridization 4. We detected expression of 2,022 recent gene duplicates 6 in a total of 21,298 annotated genes. We also analyzed the expression data of 2,573 recent gene duplicates inferred by Bowers et al. (2003) 7 (online Supplementary Material Figure S3 and Table S6), which included 1,798 gene duplicates matching both copies and 277 pairs matching one copy inferred by Blanc et al. (2003) 6 and additional 498 duplicates 5 (online Supplementary Material Table S5 and Table S6). The two sets of duplicate genes were qualitatively similar 5, and the results obtained using the duplicate gene dataset of Blanc et al. (2003) 6 (Figure 1A) were consistent with those for the dataset of Bowers et al. (2003) 7 (online Supplementary Material Figure S3). Analysis of microarray data and detection of up-regulated genes
We obtained the Affymetrix ATH1 data from the AtGenExpress expression atlas at TAIR (http://www.arabidopsis.org/info/expression/ATGenExpress.jsp). We compiled 512 microarray datasets 6 for various conditions, including 79 different developmental stages and 250 datasets for 37 abiotic and biotic treatments. We classified the datasets into three groups: environmental (abiotic and biotic) factors, developmental changes (excluding data obtained in the mutants), and others (online Supplementary Material Table S1). The data in the “others” group were not used in the analysis because it may include both environmental and developmental factors. We tested gene expression divergence affected by developmental and environmental factors separately, using expression data obtained from 63 different developmental stages and 63 sets of treatment and time-course combinations under abiotic or biotic stress (online Supplementary Material Table S1). We obtained expression estimates using GC-RMA method 7. To minimize cross-hybridization 4, we used the individual values obtained from each probe set with a perfect match for t-test 8. Affymetrix detection algorithms in the MAS5 library implemented in R 9 were also used to normalize the data and to estimate expression values, and the background levels and PM/MM ratios were corrected according to the Affymetrix Statistical Algorithms 10. There was no significant difference in the overall results obtained using the two data normalization methods. In each test, the expression data consist of one control (no treatment or a specific tissue) and a series of expression data after the treatments. We used the t-test to determine if the expression of a gene after the treatment (Ga) is greater than that before the treatment (Gb). The null hypothesis was H0 = Ga – Gb ≤0. A gene is considered to be up-regulated if H0 is rejected (P ≤ 0.01) in at least one of the several treatments. Except noted otherwise, the up-regulated genes were used in statistical tests because up-regulation of genes in defensive mechanisms is a general response to various stress responses 11-13 . References 1 Blanc, G., et al. (2003) A recent polyploidy superimposed on older large-scale duplications in the Arabidopsis genome. Genome Res 13, 137–144 2 Bowers, J.E., et al. (2003) Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature 422, 433–438 3 Vision, T.J., et al. (2000) The origins of genomic duplications in Arabidopsis. Science 290, 2114–2117 4 Millenaar, F.F., et al. (2006) How to decide? Different methods of calculating gene expression from short oligonucleotide array data will give different results. BMC Bioinformatics 7, 137
1
5 6 7 8 9 10 11 12 13 14 15
Blanc, G., and Wolfe, K.H. (2004) Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution. Plant Cell 16, 1679–1691 Schmid, M., et al. (2005) A gene expression map of Arabidopsis thaliana development. Nat Genet 37, 501–506 Wu, Z., et al. (2004) Johns Hopkins University Department of Biostatistics Draghici, S., et al. (2006) Reliability and reproducibility issues in DNA microarray measurements. Trends Genet 22, 101–109 Ihaka, R., and Gentleman, R. (1996) A language for data analysis and graphics. J Comput Graphical Statist 5, 299–314 Affymetrix (2002) Statistical Algorithms Description Document. Affymetrix, Inc. Xiong, L., et al. (2002) Cell signaling during cold, drought, and salt stress. Plant Cell 14 Suppl, S165–183 Seki, M., et al. (2001) Monitoring the Expression Pattern of 1300 Arabidopsis Genes under Drought and Cold Stresses by Using a Full-Length cDNA Microarray. Plant Cell 13, 61–72. Chisholm, S.T., et al. (2006) Host-microbe interactions: shaping the evolution of the plant immune response. Cell 124, 803–814 Massey, F.J.J. (1951) The Kolmogorov-Smirnov test of goodness of fit. Journal of the American Statistical Association 46, 68–78 Wilcoxon, F. (1945) Individual comparisons by ranking methods. Biometrics Bulletin 1, 80–83
2
Table S1. List of ATH1 microarray data setsa Treatments
Expression measurements
TAIR accession number
Environmental factors (abiotic stress, 2 replicates) Cold stress Genotixic stress Osmotic stress Salt stress UV-B Wound Droought stress Heat Oxidative stress
Root, Shoot, 0, 0.5h, 1h, 3h, 6h, 12h, 24h after treatment Root, Shoot, 0, 0.5h, 1h, 3h, 6h, 12h, 24h after treatment Root, Shoot, 0, 0.5h, 1h, 3h, 6h, 12h, 24h after treatment Root, Shoot, 0, 0.5h, 1h, 3h, 6h, 12h, 24h after treatment Root, Shoot, 0, 0.25h, 0.5h, 1h, 3h, 6h, 12h, 24h after treatment Root, Shoot, 0, 0.25h, 0.5h, 1h, 3h, 6h, 12h, 24h after treatment Root, Shoot, 0, 0.5h, 1h, 3h, 6h, 12h, 24h after treatment Root, Shoot, 0, 0.25h, 0.5h, 1h, 3h, 6h, 12h, 24h after treatment Root, Shoot, 0, 0.5h, 1h, 3h, 6h, 12h, 24h after treatment
ME00325 ME00326 ME00327 ME00328 ME00329 ME00330 ME00338 ME00339 ME00340
Environmental factors (biotic stress, 2 replicates) Botrytis cinerea infection Pseudomonas syringae ES4326 infection Pseudomonas syringae pv. Tomato DC3000 HrpZ GST-NPP1 Flg22 LPS Phytophthora infestans infection Erysiphe orontii infection
Leaf Leaf Leaf Leaf Leaf Leaf Leaf Leaf Leaf
ME00341 ME00353 ME00331 ME00332 ME00332 ME00332 ME00332 ME00342 ME00354
7 days 7 days 7 days 7 days 7 days 7 days 7 days 14 days 17 days 10 days 17 days 17 days 17 days 17 days 17 days 17 days 17 days 17 days 17 days
ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319
21 days 22 days 23 days 35 days 21+ days 21+ days 21+ days 21 days 21+ days 21+ days 21+ days 21+ days 21+ days 21+ days 21+ days 21+ days 21+ days 21+ days 21+ days 21+ days 21+ days
ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319
8 wk 8 wk 8 wk 8 wk
ME00319 ME00319 ME00319 ME00319
8 wk
ME00319
Internal factors (developmental stages, 3 replicates) Cotyledons Hypocotyl Roots Shoot apex, vegetative + young leaves Leaves 1 + 2 Shoot apex, vegetative Seedling, green parts Shoot apex, transition (before bolting) Roots Rosette leaf #4, 1 cm long Rosette leaf # 2 Rosette leaf # 4 Rosette leaf # 6 Rosette leaf # 8 Rosette leaf # 10 Rosette leaf # 12 Leaf 7, petiole Leaf 7, proximal half Leaf 7, distal half Developmental drift, entire rosette after transition to flowering, but before bolting The same as the above The same as the above Senescing leaves Cauline leaves Stem, 2nd internode 1st node Shoot apex, inflorescence (after bolting) Flowers stage 9 Flowers stage 10/11 Flower stage 12 Flowers stage 12, sepals Flowers stage 12, petals Flowers stage 12, stamens Flowers stage 12, carpels Flowers stage 15 Stage 15, pedicels Flowers stage 15, sepals Flowers stage 15, petals Flowers stage 15, stamen Stage 15, carpels Siliques, w/ seeds stage 3; mid globular to early heart embryos Siliques, w/ seeds stage 4; early to late heart embryos Siliques, w/ seeds stage 5 Seeds, stage 6, w/o siliques; mid to late torpedo embryos Seeds, stage 7, w/o siliques; late torpedo to early walkingstick embryos
3
Table S1. List of ATH1 microarray data setsa Treatments Seeds, stage 8, w/o siliques; walking-stick to early curled cotyledons embryos Seeds, stage 9, w/o siliques; curled cotyledons to early green cotyledons embryos Seeds, stage 10, w/o siliques; green cotyledons embryos Vegetative rosette Vegetative rosette Vegetative rosette Leaf Flower Root Root, 1x MS agar Root, 1x MS agar, 1% sucrose Seedling, green parts, 1x MS agar Seedling, green parts, 1x MS agar, 1% sucrose Root, 1x MS agar Root, 1x MS agar, 1% sucrose Seedling, green parts, 1x MS agar, 1% sucrose Seedling, green parts, 1x MS agar Pollen
Expression measurements
TAIR accession number
8 wk
ME00319
8 wk 8 wk 7 days 14 days 21 days 15 days 28 days 15 days 8 days 8 days 8 days 8 days 21 days 21 days 21 days 21 days Mature pollen
ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319 ME00319
Table S2a. Number of duplicate genes that were up-regulated in response to abiotic and biotic stresses Stimulus Abiotic stress Cold, Root Cold, Shoot Genotoxic, Root Genotoxic, Shoot Osmotic stress, Root Osmotic stress, Shoot Salt, Root Salt, Shoot UV, Root UV, Shoot Wound, Root Wound, Shoot Drought, Root Drought, Shoot Heat, Root Heat, Shoot Oxidative stress, Root Oxidative stress, Shoot Sulfate deficient, Root Biotic stress b Pseudomonas syringae c NPP1 d HRP Z e Flagellin f LPS g Botrytis cinerea h Phytophthora infestans
Observed number (percentage)
Expected number a (percentage)
P-value
343 (8.48%) 397 (9.82%) 231 (5.71%) 330 (8.2%) 405 %10.0%) 433 (10.7%) 467 (11.5%) 361 (8.9%) 341 (8.4%) 493 (12.2%) 272 (6.7%) 335 (8.3%) 304 (7.5%) 302 (7.5%) 603 (14.9%) 505 (12.5%) 244 (6.0%) 283 (7.0%) 94 (2.3%)
223 (5.4%) 247 (6%) 177 (4.3%) 266 (6.4%) 447 (10.9%) 342 (8.3%) 334 (8.1%) 278 (6.7%) 253 (6.1%) 335 (8.1%) 182 (4.4%) 245 (5.9%) 205 (4.9%) 243 (5.9%) 525 (12.7%) 493 (12.0%) 169 (4.1%) 232 (5.6%) 74 (1.8%)
1.13E-17 1.60E-24 1.31E-05 1.33E-05 0.071 5.48E-08 2.84E-15 4.68E-08 2.78E-09 6.96E-21 1.33E-12 5.84E-10 1.74E-13 2.98E-05 5.56E-05 0.346283116 8.92E-10 0.00020 0.013
723 (17.9%) 626 (15.5%) 593 (14.7%) 518 (12.8%) 378 (9.3%) 2126 (52.6%) 737 (18.2%)
539 (13.1%) 365 (8.9%) 362 (8.8%) 311 (7.5%) 223(5.4%) 1789 (43.6%) 551 (13.4%)
3.50E-19 6.98E-49 4.80E-39 3.52E-36 4.37E-28 7.00E-31 2.72E-19
a
Expected number was calculated using the proportion of up-regulated genes from all annotated genes excluding gene duplicates using the t-test (P ≤ 0.01). χ2-test (d.f. =1) was used to test the difference between observed and expected numbers of gene duplicates.
b
Pseudomonas syringae pv. tomato DC3000, virulent pathogen infection.
c
NPP1, treatment of GST-Necrosis-inducing Phytophthora protein 1, a pathogen derived elicitor.
d
HRP Z, treatment of Hairpin Z, a proteinaceous elicitor of plant hypersensitive responses.
e
FLAGELLIN, treatment of FLAGELLIN, Flg22, P. syringae-derived peptide elicitor of plant defense response.
f
LPS, treatment of LPS, a pathogen derived elicitor constitutively present in the pathogen cell wall capable of inducing a plant host defense response.
g
Treatment of pathogenic fungus, Botrytis cinerea.
h
Treatment of fungus-like pathogen, Phytophthora infestans.
4
Table S2b. Number of duplicate genes that were down-regulated in response to abiotic and biotic stresses Stimulus
Observed number (percentage)
Expected number a (percentage)
P-value
Abiotic stress Cold, Root Cold, Shoot Genotoxic, Root Genotoxic, Shoot Osmotic stress, Root Osmotic stress, Shoot Salt, Root Salt, Shoot UV, Root UV, Shoot Wound, Root Wound, Shoot Drought, Root Drought, Shoot Heat, Root Heat, Shoot Oxidative stress, Root Oxidative stress, Shoot Sulfate deficient, Root
354 (8.8%) 380 (9.4%) 276 (6.8%) 284 (7.0%) 502 (12.4%) 476 (11.8%) 649 (16.1%) 347 (8.6%) 299 (7.4%) 381 (9.4%) 241 (6.0%) 280 (7.0%) 286 (7.1%) 313 (7.7%) 565 (14.0%) 416 (10.3%) 207 (5.1%) 261 (6.5%) 111 (2.7%)
334 (8.1%) 277 (6.7%) 197 (4.8%) 181 (4.4%) 371 (9.0%) 350 (8.5%) 494 (12.0%) 223 (5.4%) 205 (5.0%) 284 (6.9%) 178 (4.3%) 212 (5.2%) 219 (5.3%) 207 (5.0%) 366 (8.9%) 302 (7.4%) 137 (3.3%) 170 (4.1%) 71 (1.7%)
0.18 1.02E-10 7.55E-10 1.90E-15 5.64E-14 8.76E-14 1.00E-14 4.18E-19 2.55E-12 6.17E-10 4.82E-07 2.44E-07 1.50E-06 1.70E-15 1.97E-28 4.08E-12 5.40E-10 1.86E-13 1.88E-06
Biotic stress b Pseudomonas syringae c NPP1 d HRP Z e Flagellin f LPS g Botrytis cinerea h Phytophthora infestans
566 (14.0%) 638 (15.8%) 679 (16.8%) 678 (16.8%) 326 (8.1%) 9 (0.2%) 684 (16.9%)
571 (13.9%) 608 (14.7%) 625 (15.2%) 666 (16.2%) 351(8.6%) 7 (0.2%) 618 (15.0%)
0.83 0.077 0.0053 0.33 0.26 0.47 0.00075
a
Expected number was calculated using the proportion of down-regulated genes from all annotated genes excluding gene duplicates using the t-test (P ≤ 0.01). χ2-test (d.f. =1) was used to test the difference between observed and expected numbers of gene duplicates. b Pseudomonas syringae pv. tomato DC3000, virulent pathogen infection. c NPP1, treatment of GST-Necrosis-inducing Phytophthora protein 1, a pathogen derived elicitor. d HRP Z, treatment of Hairpin Z, a proteinaceous elicitor of plant hypersensitive responses. e FLAGELLIN, treatment of FLAGELLIN, Flg22, P. syringae-derived peptide elicitor of plant defense response. f LPS, treatment of LPS, a pathogen derived elicitor constitutively present in the pathogen cell wall capable of inducing a plant host defense response; g treatment of pathogenic fungus, Botrytis cinerea. h Treatment of fungus-like pathogen, Phytophthora infestans.
Table S3. Numbers of gene duplicates and other annotated genes showing differential expression profiles in 79 combinations of developmental stages Expression patterns
Total transcriptome a
Recent duplicate genes
Differential expression 14,534 (90.8 %)
3,920 (96.9 %)
Constant expression
124 (3.1 %)
Sum
1466 (9.2%) 16,000 (100%)
b
4,044 (100%)
a
A gene was considered to be differentially expressed in 79 developmental stages if the p-value of the ANOVA using 79 triplet expression values, was ≤0.0001. After excluding 5,298 genes in ancient and recent duplicate genes from 21,298 array elements with detectable expression, 16,000 genes were used in the analysis. The proportion of duplicate genes with differential expression was significantly higher than that of entire transcriptome (χ2 = 179.10, d.f. = 1, P ≤ 2.2x10-16)
Table S4. Numbers of gene duplicates and other annotated genes showing differential expression profiles in 5 tissues Expression patterns
Total transcriptome
Differential expression a 12,878 (80.5 %) Constant expression 3122 (19.5 %) Sum 16,000 (100%)
Recent duplicate genes 3600 (89.0 %) 444 (11.0%) 4,044 (100%)
a
A gene was considered to be differentially expressed if the expression levels among five tissues (leaf, flower, root, seed, pollen) were significantly different using ANOVA and P ≤ 0.001. The proportion of duplicate genes with differential expression was significantly higher than that of entire transcriptome (χ2 = 1147.481, d.f. = 1, P ≤ 2.2x10-16).
5
Table S5. List of recent duplicate genes inferred in Ref. [1] and used for gene expression analyses in this study (see Excel file). Table S6. List of recent duplicate genes inferred in Ref. [1] and used for gene expression analyses in this study (see Excel file)
Figure S1. Expression correlation between duplicate genes derived from old and recent polyploidization events predicted by Blanc et al. (2003). Pearson‘s correlation coefficient was used to analyze similarities between expression profiles of gene duplicates in 512 microarray experiments. Red, green, and black lines indicate the distribution of expression correlation coefficient between the duplicate genes formed by recent polyplidization event, by ancient polyploidization events, and by randomly pairing, respectively. Dotted line shows 95th percentile (r = 0.74) in the distribution of randomly paired genes.
Figure S2. a. Empirical cumulative distributions of differences in expression correlation between duplicate genes and randomly paired genes. The Kolmogorov- Smirnov test indicates that the two distributions are significantly different (P ² 2.2 x 10-16). b. Probability density plot of difference (D) between the duplicate genes and the randomly
6
paired genes. Based on the Wilcoxon rank sum test, the distribution for the duplicate genes is shifted to the right of that for the randomly paired genes. The average values of D are 0.1460 [Rdev - Renv(shoots)] and 0.1282 [Rdev - Renv(roots)], respectively, indicating that Rdev is significantly larger than Renv(shoots) and Renv(roots) (P ² 2.2 x 10-16).
Figure S3. Distributions of expression correlation between duplicate genes in internal and external processes using the recent gene duplicates inferred in Bowers et al (2003). The probability density of expression correlation was plotted against correlation coefficients, and the statistic analysis was performed as shown in Fig. 1. The same conclusion was obtained using the gene duplicates inferred in Blanc et al. (2003) (Fig. 1) and those inferred in Bowers et al. (2003) (this analysis).
7