Control analysis of DNA microarray expression data

R. Keira Curtis and Martin D. Brand, MRC Dunn Human Nutrition Unit, Hills Road, Cambridge, CB2 2XY, U.K.
Phone: (+44) 1223 252806; Fax: (+44) 1223 252805
Address correspondence to R.K.C. ([email protected])

Keywords: microarray, transcriptome, gene hunting, metabolic control analysis, regulation analysis

Summary

DNA microarrays produce large amounts of data. Complex changes in gene expression are revealed; sometimes thousands of mRNAs change between experiments. Here we apply modular regulation analysis to microarray data to reveal and quantify the mRNA changes that are important for cellular responses. The mRNAs are sorted into clusters. How strongly a perturbation alters each cluster is multiplied by how strongly each cluster affects an output, to obtain coefficients that describe how much of the change in the output is transmitted through each mRNA cluster. An example published dataset is analysed to reveal that the response (‘relative fitness’) of yeast to 2-deoxy-D-glucose is not transmitted by a single mRNA cluster, but instead many clusters contribute to the overall response. The method is applicable to microarray, transcriptome, proteome and metabolome data.

Introduction

Microarrays are increasingly used to profile gene expression. They have been applied to a range of biological problems (Spellman et al., 1998; Golub et al., 1999; Gasch et al., 2000; Alexandre et al., 2001; Le Naour et al., 2001) and often complex changes in expression are observed, with many mRNAs changing between experimental states. Interpreting the large volume of data produced is recognised as a problem (Hegde et al., 2000; Hess et al., 2001) as current methods have no way of distinguishing mRNAs that are critical in any particular response from those that change but are not important. Generally, mRNAs that change greatly are described as being important, yet this is not necessarily the case.
Modular regulation analysis, a subset of control analysis (Fell, 1997), has previously been applied to metabolism and cell signalling (Kesseler and Brand, 1994; Brand, 1997; Ainscow and Brand, 1999b; Krauss et al., 1999; Krauss and Brand, 2000; Krauss et al., 2001; Brand and Curtis, 2002). The modular approach to metabolic control analysis (Kacser and Burns, 1973; Westerhoff et al., 1984; Bohnensack, 1985; Fell and Sauro, 1985; Westerhoff et al., 1987; Westerhoff and van Dam, 1987; Kacser et al., 1995; Brand, 1996; Brand, 1997; Brand and Curtis, 2002) involves grouping enzymes into reaction blocks, and coefficients apply to each block as if it were a single enzyme. Modular regulation analysis involves multiplying how much the blocks change in response to an external effector (integrated response coefficients) by how much a change in each block affects the output (elasticity coefficients) to give a set of partial response coefficients that quantify how much of a particular response is transmitted through each of these reaction blocks. Here we apply modular regulation analysis to a published microarray dataset to quantify the response transmitted through different clusters of mRNAs.
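The multiplication described above can be illustrated with a minimal Python sketch. The function name `partial_responses` and the three-cluster numbers are hypothetical and chosen only for illustration; they are not taken from the dataset or from the authors' program. The sketch also assumes that the net response is the sum of the partial responses through all clusters.

```python
import numpy as np

def partial_responses(integrated_response, elasticity):
    """Multiply, cluster by cluster, how much the input changes the cluster
    (integrated response coefficient) by how much the cluster affects the
    output (elasticity coefficient)."""
    r = np.asarray(integrated_response, dtype=float)
    e = np.asarray(elasticity, dtype=float)
    if r.shape != e.shape:
        raise ValueError("need one elasticity per integrated response")
    return r * e

# Made-up coefficients for three hypothetical clusters (illustration only).
integrated = [2.0, 0.1, -0.5]    # cluster 0 changes the most
elasticities = [0.01, 1.5, 0.4]  # but has little effect on the output

partial = partial_responses(integrated, elasticities)
overall = partial.sum()  # assumed: net response = sum of partial responses

for i, p in enumerate(partial):
    print(f"cluster {i}: partial response {p:+.3f}")
print(f"net response {overall:+.3f}")
```

In this toy example the cluster with the largest integrated response transmits the least of the response, because its elasticity is small.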

Theory

The system is constructed in such a way that an input, for example addition of a drug or a change in growth conditions, acts to change the level of the mRNA clusters (integrated response coefficients). These clusters then act to affect the output of the system (elasticity coefficients). The output could be an enzyme rate, the concentration of a metabolite or an mRNA, or any other quantifiable response. The product of the integrated response and elasticity coefficients for each cluster reveals how much of the change in output is transmitted by that cluster. The method is described in detail in the appendix.

Results

We applied the analysis described to the dataset published by Hughes et al. (2000). Full-genome microarray data are given for a series of 300 experiments using yeast grown in identical conditions; 276 of these were deletion mutants. The ‘relative fitness’ of each strain was used as the system output. This was determined by Hughes et al. (2000), using a quantitative parallel growth assay to give a relative growth rate. The effect on transcription of 13 different compounds was also profiled, and one that had expression data relative to the control strain (2-deoxy-D-glucose) was chosen (arbitrarily) to be the system input. 2-deoxy-D-glucose is a non-metabolisable analogue of glucose that inhibits yeast growth. Only 120 of the deletion mutants could be used as modulations, i.e. had repeated microarray hybridisations, a measured output (relative fitness), and had expression measured relative to the same control strain. This allowed a system with 120 mRNA clusters. We used a program, written in the interpreted language Python (http://www.python.org/), to perform the calculations.
This program sorts the mRNAs into clusters, using output from the European Bioinformatics Institute’s online Expression Profiler clustering program (http://ep.ebi.ac.uk/), performs all the coefficient calculations, and calculates which experiments should be omitted in the testing process. Calculation and testing were cycled until a stable result (78 clusters) was reached. Here, the elasticity coefficients were similar, whichever experiment was omitted: 72 of the 78 clusters (92%) had a standard deviation 50% or less of the mean, and 54/78 (69%) had a standard deviation 10% or less, although omission of experiment yea4 made a slight difference to the elasticities. This solution was confirmed by merging the two most similar clusters to give a 77 cluster system; again the standard deviations of the elasticities were consistent: 76/77 (99%) within 50% of the mean, and 62/77 (80%) within 10% of the mean. The 78 satisfactory experiments, i.e. not yea4, were then applied to the 78 cluster system to give a final solution.

The analysis revealed that there was no single mRNA cluster responsible for transmission of most of the response. Instead the control was distributed between the clusters. Results for the clusters with the largest partial response coefficients are shown in Figure 1 and in the Appendix (Table 1, Figure 2), and show that the clusters that changed the most were not necessarily the most important. The 6 clusters with the most positive partial responses account for about 73% of the total positive partial response; the 6 most negative for about 69% of the total negative partial response, demonstrating that the remaining 66 clusters are less important. Overall, the analysis predicts the net response of the cells to be a decrease in relative fitness, i.e. impaired growth. This is in agreement with Hughes et al. (2000).
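The stability test described above, in which each experiment is omitted in turn and the resulting elasticities are compared, can be sketched as follows. This is an illustration under stated assumptions, not the authors' program: it assumes that the fractional change in output in each modulation is a linear combination of the fractional changes in the clusters, weighted by the elasticities, so that n experiments determine n elasticities through a square linear system. The function names, the use of the sample standard deviation, and the toy data are all hypothetical.

```python
import numpy as np

def leave_one_out_elasticities(cluster_changes, output_changes):
    """Solve for the elasticities once per omitted experiment.

    cluster_changes: (n + 1, n) array, one row per modulation experiment,
                     one column per cluster (assumed fractional changes).
    output_changes:  (n + 1,) array of fractional changes in the output.
    Returns an (n + 1, n) array; row k holds the elasticities obtained
    when experiment k is left out.
    """
    X = np.asarray(cluster_changes, dtype=float)
    y = np.asarray(output_changes, dtype=float)
    m, n = X.shape
    if m != n + 1 or y.shape != (m,):
        raise ValueError("expected n + 1 experiments for n clusters")
    solutions = np.empty((m, n))
    for k in range(m):
        keep = np.arange(m) != k  # drop experiment k
        solutions[k] = np.linalg.solve(X[keep], y[keep])
    return solutions

def relative_spread(solutions):
    """Standard deviation of each elasticity as a fraction of its mean.
    The sample standard deviation (ddof=1) is an assumption."""
    return solutions.std(axis=0, ddof=1) / np.abs(solutions.mean(axis=0))

# Toy, noise-free data: 4 hypothetical experiments, 3 hypothetical clusters.
true_elasticities = np.array([0.5, -1.0, 2.0])
X = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])
y = X @ true_elasticities

loo = leave_one_out_elasticities(X, y)
spread = relative_spread(loo)
fraction_within_50 = float(np.mean(spread <= 0.5))
print(f"clusters with SD within 50% of the mean: {fraction_within_50:.0%}")
```

With consistent data every omission gives the same elasticities, so the spread is essentially zero. An experiment that shifts the solution when omitted, as reported for yea4, would show up as a larger spread.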

Figure 1: Partial response coefficients for the 78 cluster system. The relative size of the coefficient is indicated by the weight of the arrow. Coefficients for the six blocks with the most positive and the six most negative partial responses are shown. The sum of the partial responses through the remaining clusters with positive and negative coefficients is also illustrated. Black: positive; red: negative. The number of genes in each cluster is indicated in brackets.

Figure 2: 78 cluster system integrated response (blue), elasticity (yellow) and partial response (red) coefficients, for the 6 clusters with the most positive (left) or most negative (right) partial responses. “Other positive”: the sum of partial responses for all other clusters with a positive partial response. “Other negative”: the sum for all other clusters with a negative partial response. See Table 1 in appendix for the coefficient values and gene product functions. Cluster 24 is an example of a cluster with a high integrated response and a low partial response coefficient.

Discussion

The authors of the analysed dataset profiled the yeast strains in order to perform extensive correlation-based clustering of the mRNAs (Hughes et al., 2000). They then used the clustering patterns to predict the function of uncharacterised open reading frames, based on their clustering with genes of known function. Our analysis extracted useful information from the data, without requiring prior knowledge of the function of any of the gene products, not only finding important clusters of mRNAs, but also quantifying their importance in transmitting a response. Cluster 39 contained only one gene and had the second highest partial response coefficient. This gene (YLR302C) has no known function. This demonstrates that modular regulation analysis can be used to pinpoint novel mRNAs (or clusters of mRNAs) that may be amenable to manipulation, in order to affect the response.

Hughes et al. (2000) suggest that the addition of 2-deoxy-D-glucose has an effect on the cell wall and that the previously uncharacterised YER083C is required for normal cell wall function. Our analysis does reveal that 2 blocks implicated in cell wall function (clusters 12 and 27) are important for the response to 2-deoxy-D-glucose, and YER083C itself is found in cluster 36, which is found to be important for the response. Hughes et al. (2000) also say that their correlation-based clustering resulted in ‘several large classes’, while our Euclidean clustering produced one very large and many very small clusters, allowing the partial responses of many individual mRNAs to be calculated. The one very large cluster contained those mRNAs that did not change very much in most of the modulations. If a threshold of a minimum change in expression were used, many of the mRNAs in this block would be removed from the analysis.

Metabolic control analysis generally reveals that control is distributed, rather than localised. Our analysis agrees with this observation.
We also showed that the mRNAs that changed the most were not necessarily the most important for transmission of the response. For example, cluster 24 had the highest integrated response to 2-deoxy-D-glucose, yet due to its small elasticity, it had a very small partial response coefficient. Conversely, cluster 52 had a small integrated response coefficient, but multiplication of this by its large elasticity resulted in one of the highest partial response coefficients. Modular regulation analysis is highly applicable and transferable to proteome and metabolome data as it becomes available.

Acknowledgements: We wish to thank Andrew Raine for technical assistance.

References

Ainscow, E. K. and Brand, M. D. Eur. J. Biochem. 231 (1995), 579
Ainscow, E. K. and Brand, M. D. J. Theor. Biol. 194 (1998), 223
Ainscow, E. K. and Brand, M. D. BioSystems 49 (1999a), 151
Ainscow, E. K. and Brand, M. D. Eur. J. Biochem. 265 (1999b), 1043
Alexandre, H., Ansanay-Galeote, V., Dequin, S. and Blondin, B. FEBS Lett. 498 (2001), 98
Bohnensack, R. Biomed. Biochim. Acta. 44 (1985), 1567
Brand, M. D. J. Theor. Biol. 182 (1996), 351
Brand, M. D. J. Exp. Biol. 200 (1997), 193
Brand, M. D. and Curtis, R. K. Biochem. Soc. Trans. 30 (2002), 25
Fell, D. (1997). Understanding the Control of Metabolism: Portland Press, London.
Fell, D. and Sauro, H. Eur. J. Biochem. 148 (1985), 555
Gasch, A. P., Spellman, P. T., Kao, C. M., Carmel-Harel, O., Eisen, M. B., Storz, G., Botstein, D. and Brown, P. O. Mol. Biol. Cell 11 (2000), 4241
Giersch, C. Eur. J. Biochem. 227 (1995), 194
Giersch, C. and Cornish-Bowden, A. J. Theor. Biol. 182 (1996), 361
Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A. et al. Science 286 (1999), 531
Hegde, P., Qi, R., Abernathy, K., Gay, C., Dharap, S., Gaspard, R., Hughes, J. E., Snesrud, E., Lee, N. and Quackenbush, J. Biotechniques 29 (2000), 548
Hess, K. R., Zhang, W., Baggerly, K. A., Stivers, D. N. and Coombes, K. R. Trends. Biotechnol. 19 (2001), 463
Hughes, T. R., Marton, M. J., Jones, A. R., Roberts, C. J., Stoughton, R., Armour, C. D., Bennett, H. A., Coffey, E., Dai, H., He, Y. D. et al. Cell 102 (2000), 109
Kacser, H. and Burns, J. A. Symp. Soc. Exp. Biol. 27 (1973), 65
Kacser, H., Burns, J. A. and Fell, D. Biochem. Soc. Trans. 23 (1995), 341
Kesseler, A. and Brand, M. D. Eur. J. Biochem. 225 (1994), 923
Krauss, S. and Brand, M. D. FASEB J. 14 (2000), 2581
Krauss, S., Brand, M. D. and Buttgereit, F. Immunity 15 (2001), 497
Krauss, S., Buttgereit, F. and Brand, M. D. Biochim. Biophys. Acta 1412 (1999), 129
Le Naour, F., Hohenkirk, L., Grolleau, A., Misek, D. E., Lescure, P., Geiger, J. D., Hanash, S. and Beretta, L. J. Biol. Chem. 276 (2001), 17920
Spellman, P. T., Sherlock, G., Zhang, M. Q., Vishwanath, R. I., Anders, K., Eisen, M. B., Brown, P. O., Botstein, D. and Futcher, B. Mol. Biol. Cell 9 (1998), 3273
Westerhoff, H. V., Groen, A. K. and Wanders, R. J. Bioscience Rep. 4 (1984), 1
Westerhoff, H. V., Plomp, P. J., Groen, A. K., Wanders, R. J., Bode, J. A. and van Dam, K. Arch. Biochem. Biophys. 257 (1987), 154
Westerhoff, H. V. and van Dam, K. (1987). Thermodynamics and Control of Biological Free Energy Transduction. Amsterdam: Elsevier.