Bioinformatics Advance Access published April 4, 2006
© The Author (2006). Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]

ProMAT: Protein Microarray Analysis Tool
Amanda M. White1,*, Don S. Daly1, Susan M. Varnum2, Kevin K. Anderson3, Nikki Bollinger2 and Richard C. Zangar2
1Statistical Sciences, 2Cell Biology & Biochemistry, 3Decision & Sensor Analytics, Pacific Northwest National Laboratory, Richland, WA 99354, USA
* Corresponding author: [email protected]
Associate Editor: Martin Bishop
ABSTRACT
Summary: ProMAT is a software tool for statistically analyzing data from ELISA microarray experiments. The software estimates standard curves, sample protein concentrations and their uncertainties for multiple assays. ProMAT generates a set of comprehensive figures for assessing results and diagnosing process quality. The tool is available for Windows or Mac, and is distributed as open-source Java and R code.
Availability: ProMAT is available at http://www.pnl.gov/statistics/ProMAT. ProMAT requires Java version 1.5.0 and R version 1.9.1 (or more recent versions). ProMAT requires either Windows XP or Mac OS 10.4 or newer versions.
Contact:
[email protected]

INTRODUCTION
Enzyme-linked immunosorbent assays (ELISA) are widely used to estimate the concentration of a specific protein. Proteomic technologies have recently increased the rate at which protein biomarker candidates are discovered, so that traditional ELISA approaches (e.g., 96-well plates) have become an inefficient means of validating biomarkers. ELISA in a microarray format permits simultaneous estimation of the concentrations of numerous proteins in a small sample, and can therefore increase the rate of biomarker validation. High-throughput analysis of ELISA microarrays requires statistical software specifically designed for estimating standard curves, predicting protein concentrations and estimating their uncertainties [1]. Currently, there are no freeware tools suitable for this use. Typically, researchers are forced to perform these calculations in spreadsheet programs, which are inefficient and not designed for this purpose. Therefore, we have developed a statistically based bioinformatics tool, ProMAT, which significantly decreases ELISA microarray data analysis time while increasing the information content of the results.
SOFTWARE CAPABILITIES
ELISA microarray experiments consist of two component analyses that are undertaken in parallel. One component is the analysis of a serial dilution of protein standards. This analysis is used to estimate the standard curves, which relate spot fluorescence intensity to protein concentration. The second component is the analysis of the biological samples. The protein concentrations in the biological samples are estimated from their corresponding standard curves. These concentration estimates, however, are inherently uncertain because of the uncertainty in both the estimated standard curve and the measured fluorescence intensity of the biological samples. Evaluating this uncertainty is critical for interpreting the data and optimizing experimental procedures. Therefore, estimation of uncertainty has been automated in ProMAT.

The statistical methods used to fit standard curves and estimate uncertainties are discussed in depth by Daly and coworkers [2]. For each antigen, multiple statistical models are fit to the standard data, and the PRESS statistic [3] is used to select the best-fitting model. Currently, the four-parameter logistic model

\[ \mathrm{spotIntensity} = A + \frac{B - A}{1 + \exp\left((C - \mathrm{concentration})/D\right)} \]

and the power curve model

\[ \mathrm{spotIntensity} = m \cdot \mathrm{concentration}^{n} + b \]

are used, although additional models may be incorporated in the future. The models are fit using ‘nlm’, a nonlinear optimization procedure in R, which also provides uncertainty estimates for the curve parameters (e.g., A, B, m, n). Replicate spots on the biological sample arrays are used to estimate the uncertainty of the spot intensities. Then, upper and lower bounds on the sample concentration predictions are estimated by incorporating the uncertainties in both the model parameter estimates and the biological spot intensity estimates through propagation of error [4].

ProMAT combines these elements into a simple but comprehensive diagnostic figure (Figure 1) that allows the user to quickly assess assay quality. This figure shows statistical summaries of both the standard and sample data for a single antigen. The lower right panel shows the standard data for this antigen (black points) along with the estimated standard curve (black line). The upper and lower blue lines in this panel are the approximate 95% confidence bounds on the concentration prediction. The red dotted lines delineate a prediction acceptance region; sample data points that fall within this region are considered to be high-quality data. The lower left panel shows a histogram of sample spot intensities on a vertical axis aligned with that of the standard curve.
The top panel shows the percent coefficient of variation (%CV = 100*(prediction error)/(predicted concentration)) for each point along the standard curve. This diagnostic figure allows users to quickly identify problems such as sample signal intensities that fall outside the optimal region of the curve, high variability in the standard data, or a standard curve with a limited useful range (i.e., the red lines are close together).

ProMAT takes as input the tables of spot intensity values output by most microarray image analysis tools, along with information about the slide layout (e.g., the antigen corresponding to each spot) and the standards dilution series. ProMAT writes its output as comma-delimited ASCII tables that give the concentration prediction and approximate 95% bounds for each sample-antigen-array combination. Diagnostic figures of the type shown in Figure 1 are produced for each antigen, and an HTML page is created so that the user can easily browse the complete set of figures.
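The prediction step described in this section (inverting a fitted standard curve, propagating error, and computing %CV) can be illustrated with a minimal sketch. It is written in Python for brevity and is not ProMAT's R code. The function names, the parameter values, the finite-difference gradient and the assumption that all uncertainties are independent are illustrative choices only. The bounds use a normal approximation (estimate plus or minus 1.96 standard errors), which may differ from the bounds ProMAT reports.

```python
import math


def four_pl(conc, A, B, C, D):
    """Four-parameter logistic: intensity = A + (B - A) / (1 + exp((C - conc) / D))."""
    return A + (B - A) / (1.0 + math.exp((C - conc) / D))


def inverse_four_pl(intensity, A, B, C, D):
    """Solve the logistic curve for concentration.

    Only intensities strictly between the two asymptotes A and B can be inverted.
    """
    low, high = min(A, B), max(A, B)
    if not low < intensity < high:
        raise ValueError("intensity lies outside the range of the standard curve")
    ratio = (B - A) / (intensity - A) - 1.0
    return C - D * math.log(ratio)


def predict_with_error(intensity, intensity_se, params, param_se, rel_step=1e-6):
    """Return (concentration, standard error) by first-order propagation of error.

    Assumption (hypothetical, for illustration): the four curve parameters and
    the spot intensity are mutually independent, so the variance is the sum of
    (partial derivative * standard error)^2 over the five inputs.
    """
    inputs = list(params) + [intensity]
    errors = list(param_se) + [intensity_se]

    def g(v):
        return inverse_four_pl(v[4], v[0], v[1], v[2], v[3])

    estimate = g(inputs)
    variance = 0.0
    for i, (value, se) in enumerate(zip(inputs, errors)):
        h = rel_step * max(abs(value), 1.0)
        up = inputs[:]
        down = inputs[:]
        up[i] = value + h
        down[i] = value - h
        slope = (g(up) - g(down)) / (2.0 * h)  # central-difference derivative
        variance += (slope * se) ** 2
    return estimate, math.sqrt(variance)


if __name__ == "__main__":
    # Made-up curve parameters (A, B, C, D) and standard errors, not real assay data.
    params = (100.0, 60000.0, 5.0, 1.5)
    param_se = (20.0, 1500.0, 0.1, 0.05)
    conc, se = predict_with_error(20000.0, 800.0, params, param_se)
    lower, upper = conc - 1.96 * se, conc + 1.96 * se
    percent_cv = 100.0 * se / conc
    print(f"concentration={conc:.3f} bounds=({lower:.3f}, {upper:.3f}) %CV={percent_cv:.1f}")
```

In this sketch the standard error grows as the intensity approaches either asymptote, because the inverse curve becomes steep there. That behaviour is what the %CV panel of the diagnostic figure is meant to reveal.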
SOFTWARE IMPLEMENTATION
ProMAT will run under Windows XP or Mac OS 10.4 or newer versions. ProMAT’s user interface is written in Java; the underlying statistical algorithms are written in R (http://www.r-project.org), an open-source statistical programming language. ProMAT is distributed with its source code so that it may be adapted or customized if desired.
CONCLUSIONS
ProMAT is a tool for the statistical analysis of ELISA microarray experiment datasets. ProMAT estimates standard curves, sample protein concentrations and their uncertainties, and provides insightful diagnostic figures. This is the first open-source tool specifically designed for protein microarrays. ProMAT runs on Windows or Mac operating systems.
REFERENCES
1. Zangar RC, Daly DS, White AM. "Advancing ELISA Microarray Technology into a High-Throughput System for Cancer Biomarker Validation." Expert Rev. Proteomics 2006, in press.
2. Daly DS, White AM, Varnum SM, Anderson KK, Zangar RC. "Evaluating Prediction Errors in ELISA Microarray Experiments." BMC Bioinformatics 2005, 6:17.
3. Myers RH. Classical and Modern Regression with Applications. PWS Pub. Co., 1990.
4. Mood AM, Graybill FA, Boes DC. Introduction to the Theory of Statistics. McGraw-Hill Book Company, 1974.
ACKNOWLEDGEMENTS
Funding for this work was provided by the U.S. Department of Energy through the Laboratory Directed Research and Development program, and by the National Cancer Institute and the Early Detection Research Network (CA117378). Pacific Northwest National Laboratory is operated by Battelle for the U.S. Department of Energy under Contract DE-AC06-76RL01830.
Figure 1: A representative ProMAT diagnostic figure from the ELISA microarray standards and sample data for a single assay. ProMAT generates a similar figure for each assay in an ELISA microarray experiment.