MCP Papers in Press. Published on July 24, 2007 as Manuscript M600470-MCP200
Assessing bias in experiment design for large-scale mass spectrometry-based quantitative proteomics
Amol Prakash1,2,4*, Brian Piening2,8, Jeff Whiteaker2, Heidi Zhang2, Scott A. Shaffer7, Daniel Martin3,2, Laura Hohmann3, Kelly Cooke3, James M. Olson2, Stacey Hansen2, Mark R. Flory6, Hookeun Lee5, Julian Watts3, David R. Goodlett7, Ruedi Aebersold5,3, Amanda Paulovich2, Benno Schwikowski4,1
* To whom correspondence should be addressed.
Abstract
Mass spectrometry-based proteomics holds great promise as a discovery tool for biomarker candidates in the early detection of diseases. Recently, much emphasis has been placed on producing highly reliable data for quantitative profiling, for which highly reproducible methodologies are indispensable. The main problems affecting experimental reproducibility stem from variations introduced by sample collection, preparation and storage protocols, and by liquid chromatography-mass spectrometry (LC-MS) settings and conditions. On the basis of a formally precise and quantitative definition of similarity between LC-MS experiments, we have developed Chaorder, a fully automatic software tool that can assess the experimental reproducibility of sets of large-scale LC-MS experiments. By visualizing the similarity relationships within a set of experiments, this tool can form the basis of quality control and thus help assess the comparability of mass spectrometry data over time, across different laboratories, and between instruments. Applying Chaorder to data from multiple laboratories and a range of instruments, experimental protocols, and sample complexities revealed biases introduced by sample processing steps, experimental protocols, and instrument choices. Moreover, we show that reducing bias by correcting for just a few steps, for example randomizing the run order, does not provide much gain in statistical power for biomarker discovery.

Copyright 2007 by The American Society for Biochemistry and Molecular Biology, Inc.
1. Department of Computer Science and Engineering, University of Washington, Seattle, WA-98195.
2. Fred Hutchinson Cancer Research Center, Seattle, WA-98109.
3. Institute for Systems Biology, Seattle, WA-98103.
4. Systems Biology Group, Institut Pasteur, 75015 Paris CEDEX 15, France.
5. Institute of Molecular Systems Biology, Zurich, Switzerland.
6. Department of Molecular Biology and Biochemistry, Wesleyan University, Middletown, CT-06459.
7. Department of Medicinal Chemistry, University of Washington, Seattle, WA-98195.
8. Molecular and Cellular Biology Program, University of Washington, Seattle, WA-98195.
Introduction
Mass spectrometry has shown tremendous promise in allowing researchers to probe complex biological samples globally at the protein level [1,2,3,4]. This capability is of key importance for the identification of diagnostic biomarkers for developing early detection methodologies for many human diseases, including cancers, and for unbiased, global measurements of cellular processes that are a key component of systems biology approaches [5]. Although mass spectrometers are capable of both selective and sensitive measurements, mass analyzers are limited in their dynamic range. The consequence is a limited capability to detect very low-abundance analytes in biological samples possessing a high dynamic range. Concomitantly, mass spectrometer duty cycles limit the number of collision-induced dissociation (CID) events per unit time, and often lead to significant under-sampling of more complex proteomes [6]. Furthermore, the subset of peptides being sampled for CID can vary from one experiment to the next, hindering both interpretation and confidence in quantification.
Many approaches therefore go beyond the straightforward use of CID for large-scale protein identification. These range from the AMT approach [7], clustering [8], complete workflow solutions for LC-MS data sets [9-12], alignment algorithms for LC-MS [13,14], and feature detection approaches for SELDI platforms [15-17]. These approaches rely heavily on high-quality LC-MS profiles for peak alignment, peptide identification and quantitation, and thus require a high degree of reproducibility in the sample collection, processing, and analytical run conditions [18]. The need for reproducibility in LC-MS experiments has become increasingly essential with the popularization of large-scale quantitative proteomics [19]. Sources of variation that greatly affect this reproducibility can vary depending on the experimental platform, and in particular the choice of instrument. Critical sources of variation common to all LC-MS experiments include variation in signal intensity, mass accuracy [20-24], and elution profile; the latter can be further described by variations in peptide elution time, elution order, and peak width [25]. Sample collection (e.g., harvesting conditions), preparation protocols (e.g., freeze/thaw cycles), experimental design (e.g., run order), platform stability (e.g., column, spray) and sample stability can all affect results, thereby leading to biomarkers that have no biological basis or to biomarker discovery with high false positive rates. Further experimental variation is often observed when comparing results across laboratories [26,27] and instruments. Previous work highlighting these problems [19-24] has suggested that careful experimental design can minimize variation. Recently, focused studies have assessed the reproducibility of MS/MS acquisition [28] and ICAT labeling [29]. Thus, to improve the reproducibility of various protocol steps, biologists and chemists are exploring various techniques and ideas, for example, different numbers of washes, different columns, different sample processing strategies, randomizing run order, etc. However, the lack of proper measures of experimental reproducibility prevents investigators from gaining a complete understanding of the impact of their protocol modifications, thus impeding progress. A common strategy to study reproducibility in sample and platform quality control is to spike a set of standard peptides/proteins into a sample, and then describe
variance in the measurement through coefficient-of-variance (CV) scores [30,31]. This strategy is problematic in two respects. First, using 5-10 peptides to represent the complexity of the entire sample is a classic case of statistical under-representation, especially for complex samples like human serum, which may contain a few hundred thousand peptides [32]. Second, each coefficient-of-variance score captures only one particular reproducibility factor, e.g., mass accuracy or intensity, ignoring all other factors. In addition, there are no well-characterized CV scores for factors such as signal-to-noise ratio, elution profile, elution time, etc. Comparative measures for experimental data have proven to be key enablers of progress in other scientific domains. Examples include the BLAST [33] E-value score for the degree of homology between proteins, and the PHRED [34] score, a key ingredient in enabling quality-assessed, large-scale, automated DNA sequencing. An ideal measure of reproducibility in LC-MS-based proteomics would build on a qualitative and quantitative measurement of every peptide or protein that is present in the sample, and compare experiments based on this knowledge. However, our inability to attain identifications for the majority of peaks in an experiment makes a corresponding definition of reproducibility impractical.
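For concreteness, the conventional spiked-standard approach criticized above amounts to something like the following minimal Python sketch (illustrative only; the peptide names and intensity values are hypothetical). It reduces each standard to a single intensity CV and says nothing about mass accuracy, elution profile, or the thousands of unmonitored peptides:

```python
import numpy as np

# Hypothetical intensities of a few spiked standard peptides across 5 repeat LC-MS runs.
# Rows: peptides; columns: runs. Real studies typically monitor only 5-10 such standards.
intensities = {
    "ANGIOTENSIN_II": [1.02e6, 0.98e6, 1.05e6, 0.91e6, 1.00e6],
    "BRADYKININ":     [4.8e5,  5.1e5,  4.6e5,  5.3e5,  4.9e5],
    "NEUROTENSIN":    [2.1e5,  1.8e5,  2.4e5,  1.9e5,  2.2e5],
}

for peptide, values in intensities.items():
    values = np.asarray(values, dtype=float)
    cv = values.std(ddof=1) / values.mean()  # a CV of the intensity dimension only
    print(f"{peptide}: CV = {cv:.1%}")
```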
Results
An approach to measure similarity in LC-MS experiments
Here, we present a practical, quantitative measure of experimental reproducibility, and, more generally, of similarity in LC-MS-based proteomics. Our measure quantifies how similar all raw MS1 peaks from two LC-MS experiments are relative to each other after proper alignment in the retention time dimension. As we illustrate below, this definition captures changes in mass resolution, signal intensity, signal elution profile, and noise levels. Building upon an existing alignment algorithm to compute this measure of similarity between two LC-MS experiments [9], we present a tool, Chaorder*, that produces a visual representation of the similarity relationships within a set of experiments. Chaorder is highly efficient, scalable, and parallelizable, and thus can handle the gigabytes of data that current state-of-the-art instruments generate. It can handle data generated by a variety of instruments and for samples of varying complexity (shown in Results). It measures all features (mass accuracy, elution profile, intensity, elution times, signal-to-noise, etc.) of all signals present in the data. Building on this tool, we propose a methodology for quality control of LC-MS data that is flexible enough to be applied to any LC-MS proteomics platform. The application of our methodology to data from different laboratories reveals significant and consistent biases in large-scale LC-MS experiments, despite careful experimental design. The biases detected in this study are caused by experimental protocols, such as HPLC washing, sample freeze/thaw cycles, run order, and run date. The fact that these are standard elements of any sample protocol suggests that these biases may be occurring in many proteomics experiments today. The systematic exploration of these biases may thus be an important first step on the way to the design of unbiased LC-MS proteomic experiments.

* The name Chaorder is taken from the book ‘Birth of the Chaordic Age’ by Dee W. Hock, founder of Visa International.
Chaorder takes as input a list of experiments and computes the bounded alignment score between each pair of experiments [9]. It then represents each experiment as a point in two dimensions, such that the Euclidean distance between a pair of points approximates the inverse of the alignment score between the corresponding two experiments (details are described in Methods). A set of experiments that are expected to be similar (e.g., almost perfect technical repeats) corresponds to high pairwise alignment scores, and the corresponding points tend to appear in close proximity in the two-dimensional image. Conversely, distant points correspond to dissimilar experiments. In our experience, approximate ranges of pairwise alignment scores (Euclidean distances) are empirically correlated with different qualitative classes of similarity. A distance close to 0 represents ideal levels of similarity; a distance between 0 and 0.2 represents achievable (and, depending on the experimental setup, tolerable) levels of reproducibility, usually seen in repeat experiments with the exact same experimental setup. A distance between 0.2 and 0.5 represents biases in the experimental setup if the experiments were expected to be similar (e.g., technical replicates); the experimental setup could then be studied further for more in-depth analysis. Distances greater than 0.5 usually arise when the same system is analyzed under very different perturbations, on different platforms, etc. Distances greater than 0.7-0.8 represent unrelated experiments, e.g., a comparison of yeast cell lysate with human serum. Beyond interpretations of individual pairwise distances, the two-dimensional image can be used to understand systematic effects that may occur over time due to possible sample degradation or changing experimental conditions (presented in detail below). We applied Chaorder to a variety of data sets, ranging from simple quality control studies through simulated biomarker experiments to real biomarker studies. Most of these experiments were LC-MS runs with no MS/MS acquisition. Detailed experimental protocols for these data sets are provided in the Appendix.
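For illustration, the qualitative bands described above can be expressed as a small Python helper (ours, not part of Chaorder; the cutoff separating "ideal" from "achievable" similarity is a placeholder for "close to 0"):

```python
def similarity_class(distance: float) -> str:
    """Map a pairwise Chaorder distance to the qualitative bands described in the text.
    Boundaries are empirical guidelines quoted above, not hard thresholds."""
    if distance < 0.0 or distance > 1.0:
        raise ValueError("distance is expected to lie in [0, 1]")
    if distance <= 0.05:   # placeholder for "close to 0"
        return "ideal similarity (near-identical runs)"
    if distance <= 0.2:
        return "achievable reproducibility (typical technical repeats)"
    if distance <= 0.5:
        return "bias suspected if runs were expected to be similar"
    if distance <= 0.7:
        return "same system under very different perturbations or platforms"
    return "essentially unrelated experiments"
```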
Instrument Comparison for Reproducibility: Instrument variability and run order effects
Ten repeat LC-MS experiments of human serum were performed on two instruments, a quadrupole time-of-flight (QTOF) instrument and a QSTAR, each with an electrospray source. Both instruments are based on the time-of-flight principle, but come from different manufacturers. Figure 1 shows the Chaorder image of this data set. Each point represents a complete LC-MS run of the unfractionated human serum. The colors correspond to the different instruments (QSTAR=blue, QTOF=red) and the data labels represent the order in which the experiments were run. A number of observations can be made. First, the data from the two instruments form two distinct clusters. Second, the QSTAR instrument shows much more variation than the QTOF. Third, run order effects are revealed: successive experiments tend to move in one direction. Sample carryover is one possible explanation for this effect. These data also illustrate the problems the community faces given the different instrument choices available. There is little hope that results generated on a QSTAR can be reproduced on a QTOF, and comparing data between the two platforms would present a significant challenge.
Human Serum repeats: Sample Preparation effects
Periodic experiments were performed using human serum on a time-of-flight (LCT Premier) instrument. Between experiments, the sample went through freeze/thaw cycles, and immediately before each LC-MS experiment, an aliquot was digested using trypsin. Figure 2 shows the Chaorder image for this data set. Again, the image reveals strong run order effects, with the run order inferred without any prior information. Our own manual follow-up analysis revealed that, globally, the total ion intensity was reduced over this series of experiments. Many peptides maintain their intensity levels, but the intensities of many others are significantly reduced. This suggests that the sample is degrading over time. Another possibility is a systematic change in digestion efficiency. Overall, Figure 2 represents an example where sample preparation steps appear to introduce biases that are not well characterized.

Repeated injections of Angiotensin II: Column variability and run order effects
Experiments with Angiotensin II injections were performed periodically to study the integrity, reproducibility, and sensitivity of the LC column used for a yeast genetics study. These injections were performed between the yeast experiments on an FT-ICR instrument. As the LC column degrades over time, the column was changed for a new one after 212 injections. For each experiment, the raw LC-MS data produced were recorded. Figure 3 shows the Chaorder image for this data set. Each data point represents one Angiotensin II LC-MS experiment. Squares and triangles correspond to different LC columns, color represents the different dates on which experiments were run, and the data label corresponds to the run order. Without any prior knowledge, Figure 3 shows a clear split between the two columns. Surprisingly, the variation between colors (dates) appears to be larger. Beyond this, the image reveals a strong clustering by date and,
within each date, strong run-order effects are revealed. Upon further examination, we found that even though the sample contained only Angiotensin II, there was significant carryover, and many features that resemble peptide peaks were observed in the resulting output. Feature identification using msInspect [35] resulted in more than 100 detected features. As these experiments were performed between the yeast experiments, the carryover is expected, but the clustering of these experiments by run order and date reveals potential problems with the variability of the analytical platform.
Simulated biomarker hand mix data: day-to-day variations
In experiments designed to emulate a test case for biomarker discovery, two protein samples were prepared: one "control" sample consisting of a four-protein digest, and a second "disease" sample in which, in addition to the four digested proteins, β-lactoglobulin was spiked in as a simulated "biomarker". Multiple LC-MS experiments of the two samples were performed on a TOF instrument. Despite the low number of proteins, the resulting LC-MS maps tend to be very complex. Possible reasons are that the peptides exhibit multiple charge and isotopic states, that the proteins are not absolutely pure, that tryptic digestion is imperfect, leading to missed and mis-cleaved peptides, and that the sample was run over a short 30-minute gradient. All of these issues arise in any proteomic setup, increasing the complexity of the output many-fold. This observation was made for this mixture as well, with thousands of peptide-like features being observed in the experiments (without de-convolution), whereas tryptic digestion theoretically yields only on the order of a hundred peptides. Figure 4 shows the Chaorder image for this data set. The 4-protein experiments are shown in blue, the 5-protein experiments in red, and the data label corresponds to the date on which each experiment was run. A first observation is that, even after randomizing the run order, the experiments cluster by the dates on which they were run, especially on the 19th and 21st. Second, the 4-protein and 5-protein data seem to be distinguishable, although the somewhat weak clustering of the 4-protein and 5-protein data sets could make it hard to differentially identify the spiked-in fifth protein. This test experiment shows the problem that can arise with data of low reproducibility: any actual biomarkers may remain hidden behind less interesting experimental variation.
Time course study for yeast cell cycle: Freeze/thaw cycle effects
Our next analysis concerns two LC/LC-MS/MS measurements of tryptic digests of a whole-cell lysate of the yeast Saccharomyces cerevisiae (from [36]). The two samples were collected from cells synchronized in the G1 phase of the cell cycle and at 30 minutes following release. Both protein samples were digested into peptides using trypsin and then separated by Strong Cation Exchange (SCX) chromatography into a number of fractions. For each time point, each fraction was split into two (or three) parts: one part was analyzed immediately, and the others were analyzed after multiple freeze/thaw cycles, using RPLC-ESI-MS (ThermoFinnigan LCQ Deca XP). Figure 5 shows the Chaorder plot for time points 0 (blue) and 30 (red) of this data set. The data labels correspond to the SCX fraction, and multiple data points with the same label are the repeat experiments of that particular SCX fraction after additional freeze/thaw cycles. The first observation is that the horizontal axis of the image
approximately reflects the different time points, and the vertical axis reflects the SCX chromatography. This is remarkable, as Chaorder generated the plot without any additional prior knowledge. As one would expect, successive SCX fractions differ significantly, but also share a set of common proteins, which then occur in related RPLC fractions. Both the split into SCX fractions and the relatedness of certain RPLC fractions are revealed by Figure 5. Another interesting point is that the repeat experiments are not as similar to each other as one would expect (they do not cluster tightly). Our detailed analysis revealed many unexpected artifacts presumably generated by the freeze/thaw cycles, e.g., changes in noise levels, changes in the intensity levels of many peptides, MS/MS undersampling, etc.
Mouse models for Huntington's disease: run order effects
Serum from mice with Huntington's disease (homozygous and heterozygous) and from normal mice was analyzed on a QTOF mass spectrometer in LC-MS mode. Experiments were performed for mice at the ages of 3, 6, 9 and 12 months. In the mouse model used, Huntington's disease is known to result in symptoms in 12-month-old mice. The aim of this study was the identification of Huntington's disease biomarkers in younger mice. Many of the mice were littermates. Each experiment was performed in triplicate, i.e., the serum collected from a mouse was divided into three aliquots, and each was digested and analyzed separately. Figure 6 shows the Chaorder plot for the 3-month-old mice data set. This data set comprised 3 wild-type mice (squares) and 5 mice in the homozygous/heterozygous class (triangles). Triplicate experiments for a single mouse are shown in a single color. The data label represents the run order. The first observation to be made from the plot is that the triplicate experiments are not as tightly clustered as one could have expected. As there are multiple sources of variability (mouse-to-mouse variation, homozygous/heterozygous/wild-type status, replicates, etc.), one does not expect to see a very simple cluster structure, but one might hope to at least see that mice in one disease state cluster together. Looking at Figure 6, one can see that this is not the case. Instead, an added effect of run order makes the clustering more complex. For example, experiments that are close in run order (0-1-2-3, 5-6-7, 14-15, etc.) lie close together. Note that Chaorder identified this clustering without any added prior knowledge. The clustering suggests that run order is creating artifacts in the data that reduce our statistical power for (1) clustering replicates and (2) distinguishing wild type from disease type. In particular, the 3-month-old mice data have to be treated with caution. Similar effects are suggested in the 6- and 9-month-old data, though the effects of run order are much weaker.
Methods
The peptides in an LC-MS experiment can vary in their signal intensity, elution profile, elution time, mass resolution, etc. Beyond this, the experiments themselves can vary in signal-to-noise levels. In measuring similarity, one would like to take all of these variations into account. One would also want to penalize only local variations, and not global variations that can be accounted for. For example, if the amount of sample loaded differed between two experiments, and all observed signal intensities were scaled according to the loading amount, one would still like to call the experiments similar. Applying different gradients leads to varying retention times [37]; different mass
analyzers have different mass resolutions. Thus, there are global variations that are to be expected depending on the experimental setup, and we do not want to penalize for these variations. Beyond these, each experimental setup also leads to other variations, e.g., a difference in the amount of sample loaded that does not change all peptide levels proportionately. These are the variations that we aim to capture in the similarity measure developed here. Prakash et al. [9] presented the ChAMS method to align LC-MS experiments based on raw MS1 signals, following the principle described above. The alignment algorithm is based on a score that measures the similarity between pairs of mass spectra. Using this, ChAMS produces a mapping between related spectra, i.e., spectra that contain peaks generated from the same peptides. The alignment score is the average spectrum similarity score over all spectrum pairs in the alignment map, and the alignment is based entirely on information from the MS1 level. The algorithm is capable of handling data from different mass analyzers (e.g., FT, TOF, LTQ, etc.) by tuning the mass resolution parameter (called ε in [9]). More details of this algorithm are given in the Appendix. Specifically, given a list of N LC-MS experiments (possibly from different instruments), Chaorder computes the alignment score between every pair of experiments. If A and B are two LC-MS experiments with pairwise alignment score sc(A,B), their distance is defined as [9]

d(A,B) = 1 - \frac{sc(A,B)}{\sqrt{sc(A,A)\, sc(B,B)}}
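A minimal Python sketch of this definition (our own illustration, not the Chaorder implementation; alignment_score is a placeholder for the ChAMS score sc, assumed to be symmetric and positive for self-alignments):

```python
import math
from typing import Callable

def chaorder_distance(a, b, alignment_score: Callable) -> float:
    """Distance d(A,B) = 1 - sc(A,B) / sqrt(sc(A,A) * sc(B,B)).

    `a` and `b` are LC-MS experiments in whatever representation the score
    function expects; `alignment_score` is assumed to return a non-negative
    similarity that is maximal when an experiment is aligned with itself.
    """
    self_a = alignment_score(a, a)
    self_b = alignment_score(b, b)
    if self_a <= 0 or self_b <= 0:
        raise ValueError("self-alignment scores must be positive")
    return 1.0 - alignment_score(a, b) / math.sqrt(self_a * self_b)
```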
Computing this distance for every pair of experiments results in an N×N matrix of distances. We apply multidimensional scaling [38], which identifies each experiment with a point in two dimensions, such that the Euclidean distance between the points for any two experiments A and B approximately represents the pairwise distance d(A,B). Such an embedding in two dimensions necessarily distorts some of the distances d(A,B), but our study suggests that the global embedding in two dimensions still reveals major global effects. Furthermore, as the embedding can be rotated without changing any of the embedded distances, the axes have no particular significance; the embedding simply illustrates the relative distance relations between all experiments. The Appendix describes multidimensional scaling in more detail. Chaorder provides other views of the data as well, e.g., the analysis of a pair of experiments for their differences and similarities using ChAMS [9]. Most of the analysis in the Results section could also have been obtained by manual analysis of the data; in fact, many of our conclusions about run order effects, etc., were validated by manual analysis. However, just as with the manual analysis of tandem mass spectra, manual analysis alone is not feasible at the throughput levels of even current larger-scale experimentation. Chaorder performed all of the above analyses in a matter of minutes or hours on a single Linux desktop computer, depending on the size of the data. We are not aware of any other method that can perform the above similarity analysis with such high efficiency and quality. The software Chaorder is available on request; please email [email protected].
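As an illustration of the embedding step, the following Python sketch implements classical (Torgerson) metric multidimensional scaling on a precomputed distance matrix. This is a generic textbook procedure in the spirit of [38], not the Chaorder source code, and the example matrix is hypothetical:

```python
import numpy as np

def classical_mds(dist: np.ndarray, dims: int = 2) -> np.ndarray:
    """Embed an N x N symmetric distance matrix into `dims` dimensions
    using classical (Torgerson) metric multidimensional scaling."""
    n = dist.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    b = -0.5 * j @ (dist ** 2) @ j               # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(b)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:dims]     # keep the largest components
    pos = np.clip(eigvals[order], 0.0, None)     # guard against tiny negative values
    return eigvecs[:, order] * np.sqrt(pos)      # N x dims coordinates

# Hypothetical 4-experiment distance matrix (e.g., from chaorder_distance above).
d = np.array([[0.00, 0.10, 0.45, 0.80],
              [0.10, 0.00, 0.40, 0.75],
              [0.45, 0.40, 0.00, 0.70],
              [0.80, 0.75, 0.70, 0.00]])
coords = classical_mds(d)
print(coords)  # 2-D coordinates whose Euclidean distances approximate d
```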
Discussion
The assessment of reproducibility and similarity between LC-MS experiments is the first step towards bias-free and statistically powerful experimental setups that can help identify biomarkers. High reproducibility is an essential prerequisite for any analytical approach that aims to address this problem. Careful designs, such as a randomized run order, may help minimize artifacts, but do not eliminate them. This is evident in the simulated biomarker experiment, where the relatively strong variation introduced by the emulated "biomarker" was offset by other types of variation. The fact that such strong effects complicate the data analysis of a controlled and relatively simple analytical challenge suggests that variation may present even bigger problems in many applications. The other case studies illustrate the need to systematically address the biases of each step of an experimental setup, for example, the freeze/thaw cycles, number of washes, chemicals used, trypsin digestion efficiency, instrument choices, column choices, the day the experiments were run, the scientist performing each step, etc. Judicious decisions are required about each of these to achieve the highest levels of reproducibility at each step, thus yielding an experiment design that allows statistically valid conclusions about the underlying biological phenomena. To this end, we have presented the software tool Chaorder, which can assess global LC-MS experimental reproducibility and similarity and can be used as a robust and fast method for quality control of LC-MS data. To the best of our knowledge, Chaorder is the only software capable of doing this in an efficient manner. Using Chaorder, we presented results from various studies, completed in a number of different laboratories, that show experimental reproducibility being significantly affected by sample processing steps, experimental protocols, and instrument choices. The low degree of reproducibility indicated by Chaorder in all case studies suggests a widespread need for quality-control experiments, and for the use of quality assessment tools before any kind of comparative data analysis (MS or MS2, e.g., Sequest [39]). Chaorder can also be used to identify outlier experiments, which can either be analyzed manually or be removed from downstream analysis, if feasible. When deciding between different types of columns, Chaorder allows an assessment of their reproducibility. By measuring the experimental quality of wash runs, Chaorder can suggest how many wash runs are needed to obtain a clean column. Chaorder can help tune the experimental setup, e.g., indicate whether a longer column is required for higher reproducibility, or whether the column has degraded beyond tolerable limits. It can also help tune the sample processing steps, e.g., the choice of parameters for freeze/thaw cycles, the choice of chemicals, etc. The problems discussed here may be tolerable for studies in which qualitative aspects matter most, but addressing them appears critical if mass spectrometry is to become useful as a quantitative survey and discovery tool. As we used data from multiple mass spectrometry laboratories, these issues do not appear to be unique to any one of them; instead, they are general challenges facing the mass spectrometry community. We find that one of the strongest parameters affecting experimental reproducibility is the order in which experiments are run. There are multiple possible reasons, e.g., changes in instrument calibration over time, sample degradation, LC column degradation, etc. As results from different laboratories show similar problems, it is most likely a combination of all of these causes that is biasing the results. Significant efforts need to be focused in this direction to understand and eliminate these causes, so as to strengthen the downstream analysis. Beyond assessing reproducibility across different instruments and sample processing protocols within a single laboratory, Chaorder is also a first step towards the comparison and standardization of experimental platforms and conditions across laboratories, another important step toward making mass spectrometry more reliable, trustworthy, and relevant for biomedical research.
Acknowledgements
We would like to thank Leo Bonilla, Jimmy Eng, Jennifer Sutton, and Sébastien Li-Thiao-Té for helpful discussions. We thank the following NIH institutions for partial support of this research: the UW NIEHS-sponsored Center for Ecogenetics and Environmental Health (NIEHS P30ES07033) and the NW RCE for Biodefense and Emerging Infectious Diseases (1U54 AI57141-01).
Bibliography
1. Aebersold, R. and Goodlett, D.R. (2001) Mass spectrometry in proteomics. Chem. Rev. 101(2), 269-295.
2. Aebersold, R. and Mann, M. (2003) Mass spectrometry-based proteomics. Nature 422(6928), 198-207.
3. Peng, J., et al. (2003) Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LC-MS/MS) for large-scale protein analysis: the yeast proteome. J. Proteome Res. 2(1), 43-50.
4. Tyers, M. and Mann, M. (2003) From genomics to proteomics. Nature 422(6928), 193-197.
5. Ideker, T., et al. (2001) A new approach to decoding life: systems biology. Annu. Rev. Genomics Hum. Genet. 2, 343-372.
6. Desiere, F., et al. (2005) Integration with the human genome of peptide sequences obtained by high-throughput mass spectrometry. Genome Biol. 6, R9.
7. Smith, R.D., et al. (2002) An accurate mass tag strategy for quantitative and high-throughput proteome measurements. Proteomics 2(5), 513-523.
8. Beer, I., et al. (2004) Improving large-scale proteomics by clustering of mass spectrometry data. Proteomics 4(4), 950-960.
9. Prakash, A., et al. (2006) Signal maps for mass spectrometry-based comparative proteomics. Mol. Cell. Proteomics 5, 423-432.
10. Listgarten, J. and Emili, A. (2005) Statistical and computational methods for comparative proteome profiling using liquid chromatography-tandem mass spectrometry. Mol. Cell. Proteomics 4, 419-434.
11. Radulovic, D., et al. (2004) Informatics platform for global proteomic profiling and biomarker discovery using liquid chromatography-tandem mass spectrometry. Mol. Cell. Proteomics 3(10), 984-997.
12. Wang, W., et al. (2003) Quantification of proteins and metabolites by mass spectrometry without isotopic labeling or spiked standards. Anal. Chem. 75, 4818-4826.
13. Bylund, D., et al. (2002) Chromatographic alignment by warping and dynamic programming as a pre-processing tool for PARAFAC modelling of liquid chromatography mass spectrometry data. J. Chromatogr. A 961, 237-244.
14. Listgarten, J., et al. (2005) Multiple alignment of continuous time series. In Advances in Neural Information Processing Systems. MIT Press.
15. Yasui, Y., et al. (2003) An automated peak identification/calibration procedure for high-dimensional protein measures from mass spectrometers. J. Biomed. Biotechnol., 242-248.
16. Coombes, K., et al. (2003) Quality control and peak finding for proteomics data collected from nipple aspirate fluid by surface-enhanced laser desorption and ionization. Clin. Chem. 49, 1615-1623.
17. Qu, Y., et al. (2003) Data reduction using a discrete wavelet transform in discriminant analysis of very high dimensionality data. Biometrics 59, 143-151.
18. Wang, W., et al. (2003) Quantification of proteins and metabolites by mass spectrometry without isotopic labeling or spiked standards. Anal. Chem. 75(18), 4818-4826.
19. Ransohoff, D. (2005) Bias as a threat to the validity of cancer molecular-marker research. Nat. Rev. Cancer 5, 142-149.
20. Hu, J., et al. (2005) The importance of experimental design in proteomic mass spectrometry experiments: some cautionary tales. Brief. Funct. Genomics Proteomics 3(4), 322-331.
21. Sorace, J.M. and Zhan, M. (2003) A data review and re-assessment of ovarian cancer serum proteomic profiling. BMC Bioinformatics 4, 24.
22. Baggerly, K.A., et al. (2004) Reproducibility of SELDI-TOF protein patterns in serum: comparing datasets from different experiments. Bioinformatics 20, 777-785.
23. Ransohoff, D. (2004) Rules of evidence for cancer molecular-marker discovery and validation. Nat. Rev. Cancer 4, 309-314.
24. Baggerly, K.A., et al. (2004) Signal in noise: can experimental bias explain some results of serum proteomics tests for ovarian cancer? M.D. Anderson Biostatistics Technical Report UTMDABTR-008-04.
25. Jaffe, J.D., et al. (2006) PEPPeR, a platform for experimental proteomic pattern recognition. Mol. Cell. Proteomics 5, 1927-1941.
26. ABRF 2006 in Long Beach, California (2005). Journal of Biomolecular Techniques 16(2), 178.
27. Rai, A.J., Gelfand, C.A., Haywood, B.C., Warunek, D.J., Yi, J., Schuchard, M.D., Mehigh, R.J., Cockrill, S.L., et al. (2005) HUPO Plasma Proteome Project specimen collection and handling: towards the standardization of parameters for plasma proteome samples. Proteomics 5(13), 3262-3277.
28. Elias, J.E., Haas, W., Faherty, B.K., and Gygi, S.P. (2005) Comparative evaluation of mass spectrometry platforms used in large-scale proteomics investigations. Nat. Methods 2(9), 667-675.
29. Molloy, M.P., Donohoe, S., Brzezinski, E.E., Kilby, G.W., Stevenson, T.I., Baker, J.D., Goodlett, D.R., and Gage, D.A. (2005) Large-scale evaluation of quantitative
reproducibility and proteome coverage using acid-cleavable isotope-coded affinity tag mass spectrometry for proteomic profiling. Proteomics 5(5), 1204-1208.
30. Silva, J., et al. (2006) Absolute quantification of proteins by LCMS: a virtue of parallel MS acquisition. Mol. Cell. Proteomics 5, 144-156.
31. Ishihama, Y., et al. (2005) Quantitative mouse brain proteomics using culture-derived isotope tags as internal standards. Nat. Biotechnol. 23, 617-621.
32. Anderson, N.L. and Anderson, N.G. (2002) The human plasma proteome: history, character, and diagnostic prospects. Mol. Cell. Proteomics 1, 845-867.
33. Karlin, S. and Altschul, S. (1993) Applications and statistics for multiple high-scoring segments in molecular sequences. Proc. Natl. Acad. Sci. USA 90, 5873-5877.
34. Ewing, B. and Green, P. (1998) Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186-194.
35. Bellew, M., et al. (2006) Submitted.
36. Flory, M., et al. (2005) Quantitative proteomic analysis of the budding yeast cell cycle using acid-cleavable isotope-coded affinity tag reagents. Submitted.
37. Snyder, L.R., et al. (1997) Practical HPLC Method Development, 2nd edition. Wiley-Interscience.
38. Abdi, H. (2007) Metric multidimensional scaling. In N.J. Salkind (ed.), Encyclopedia of Measurement and Statistics. Sage, Thousand Oaks, CA.
39. Eng, J., et al. (1994) An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 5, 976-989.
Figure 1: Chaorder plot for instrument reproducibility analysis. Each square represents an LC-MS experiment generated from the same sample. Blue squares represent experiments run on the QSTAR and red squares represent experiments run on the QTOF. Data labels represent the run order.
Figure 2: Chaorder plot for repeats of human serum on a time-of-flight instrument. Each square represents an LC-MS experiment. The key for each experiment is written in the square. Data labels represent the run order.
Figure 3: Chaorder plot for Angiotensin II analyzed by LC-MS. Each data point represents an LC-MS experiment. The two geometric shapes, squares and triangles, represent the two different C18 HPLC columns used. Color represents differences in data acquisition date: light blue: 04/22/05; dark blue: 04/25/05; pink: 05/06/05; red: 05/09/05; yellow: 06/13/05; green: 06/23/05. Each data point is numbered in sequence from first (#0) to last (#44) date of acquisition.
Figure 4: Chaorder plot for 4/5-protein hand mix data on TOF. Each square represents an LC-MS experiment. Blue squares represent 4-protein samples, and red squares represent 5-protein samples. Data labels represent the dates on which experiments were run.
Figure 5: Chaorder plot for the LC/LC-MS yeast cell lysate cell cycle time series study on the LCQ Deca XP. Each square represents an LC-MS experiment on an SCX fraction. Blue squares represent time point 0 and red squares represent time point 30. Data labels represent the number of the SCX fraction. Multiple squares with the same label represent repeat experiments.
Figure 6: Chaorder plot for the Huntington's disease study in mouse, performed on a QTOF, for 3-month-old mice. Each data point represents an LC-MS experiment. Different colors represent different mice. Shape represents the biological condition: squares for wild type and triangles for heterozygous/homozygous. Data labels represent the run order. Multiple data points with the same color and shape represent repeat experiments.