Computers in Biology and Medicine 63 (2015) 36–41
Application of dual tree complex wavelet transform in tandem mass spectrometry
Selvaraaju Murugesan a, David B.H. Tay a,*, Ira Cooke b, Pierre Faou b
a Department of Electronic Engineering, La Trobe University, Bundoora, Victoria 3086, Australia
b Department of Biochemistry, La Trobe University, Bundoora, Victoria 3086, Australia
Article info
Article history: Received 8 February 2015; Accepted 2 May 2015
Keywords: Tandem mass spectrometry; Proteomic data processing; Dual tree complex wavelet transform; Signal denoising; Peptides detection

Abstract
Mass Spectrometry (MS) is a widely used technique in molecular biology for high throughput identification and sequencing of peptides (and proteins). Tandem mass spectrometry (MS/MS) is a specialised mass spectrometry technique whereby the sequence of peptides can be determined. Preprocessing of the MS/MS data is indispensable before performing any statistical analysis on the data. In this work, preprocessing of MS/MS data is proposed based on the Dual Tree Complex Wavelet Transform (DTCWT) using almost symmetric Hilbert pair of wavelets. After the preprocessing step, the identification of peptides is done using the database search approach. The performance of the proposed preprocessing technique is evaluated by comparing its performance against Discrete Wavelet Transform (DWT) and Stationary Wavelet Transform (SWT). The preprocessing performed using DTCWT identified more peptides compared to DWT and SWT.
© 2015 Elsevier Ltd. All rights reserved.
* Corresponding author. E-mail addresses: [email protected] (S. Murugesan), [email protected] (D.B.H. Tay), [email protected] (I. Cooke), [email protected] (P. Faou).
http://dx.doi.org/10.1016/j.compbiomed.2015.05.002
0010-4825/© 2015 Elsevier Ltd. All rights reserved.

1. Introduction
Many techniques in analytical chemistry involve matching observed signals against an expected theoretical pattern. Examples include identification of analytes based on chromatographic signatures, analysis of 2D gel electrophoretic images and identification of peptides by tandem mass spectrometry. Selection of appropriate signal processing algorithms can significantly affect the sensitivity and reproducibility of end results when using such techniques [1,2]. This is particularly important in the field of mass spectrometry based proteomics, where recent advancements have led to a rapid expansion of the volume of data that is available for processing, and where novel approaches such as SWATH [3] place greater demands on the data.

Mass Spectrometry (MS) based proteomic analysis is a widely used technique to study interrelations between protein expressions and also to study relationships between proteins themselves [4]. The basic principle behind MS in proteomics is to fragment complex protein molecules via soft-ionisation techniques into smaller molecules such as peptides or amino acids so they are more readily analyzed [6]. The fragmented ions are separated according to their mass to charge ratio (m/z), which is measured in Daltons. A typical MS scan of dataset 3666 is shown in Fig. 1. The peptide mass fingerprint is determined by extracting the set of measured peptide masses. Many algorithms have been developed [7,8] that match the experimental data against the theoretical masses obtained from the in silico digestion, at the same enzyme cleavage sites, of all protein amino acid sequences in the database. The proteins in the database are then ranked according to the number of peptide masses matching their sequence within a given mass error tolerance.

Tandem mass spectrometry, also called the MS/MS technique, is a specialised mass spectrometry technique whereby the sequence of peptides can be determined. The precise identification of peptides and proteins is indispensable in developing new drugs for the treatment of human diseases such as cancer, diabetes and asthma. In the MS/MS technique, peptide ions of interest are first selected in a precursor ion scan; the selection of these ions is based on relative abundance. The peptide sequence can be derived from the MS/MS spectrum by matching to theoretical spectra from peptides in a database (sequence database search) [9], matching to curated reference spectra (spectral database search) [10], or de novo sequencing [11]. The method in most widespread use is sequence database search because existing spectral databases are not comprehensive enough, and because de novo sequencing is often impractical due to the existence of missing peaks in most MS/MS spectra. In the sequence database search method the m/z of the precursor ion is used to select a set of candidate peptides with matching masses
from a database. Theoretical MS/MS spectra are then generated for these candidate peptides using the rules of peptide fragmentation and compared with the experimental MS/MS spectrum; the best match is determined using a predefined scoring system [6].

The MS/MS spectrum is usually corrupted by electrical noise, chemical noise and machine artifacts. Performing statistical and computational analysis on noisy MS/MS data is an extremely challenging task. Thus preprocessing of MS/MS data plays a vital role in the identification of peptides. If the preprocessing of MS/MS data produces false peaks, it will reduce the number of correctly identified peptides in the sample.

The Discrete Wavelet Transform (DWT) [12,13] is a versatile tool that has been used in a plethora of applications [14–17]. Wavelets are also widely used in the preprocessing stages of proteomics data as they facilitate multi-resolution analysis [18,19]. The isotope wavelet transform proposed in [20] is an efficient framework for detecting isotope patterns in MS data sets; it shows the versatility of the wavelet transform in detecting regions of interest in MS scans with a low number of false positives. Coombes et al. [21] used the undecimated DWT to denoise MS spectra. The rationale behind using the undecimated DWT is that it is shift invariant; however, it requires more computation time than the critically decimated DWT. Kwon et al. [22] used the undecimated DWT with hard thresholding to remove chemical and instrument noise from MS spectra, and showed that the noise in MS experiments is heterogeneous, i.e. not uniform within each MS scan. In their work, the data is segmented based on the variance change and a threshold is computed for each segment. The noise variance of each segment is estimated using the Median Absolute Deviation criterion [23]. To detect the variance change in the MS spectra, Kwon et al. [22] used an iterated cumulative sums of squares algorithm proposed by Gabbanini et al. [24]. Li et al. [25] showed that wavelet based denoising improved the performance of machine learning methods. Morris et al. [26] applied the wavelet transform for feature extraction and quantification of MS data.

The Dual Tree Complex Wavelet Transform (DTCWT) introduced by Kingsbury has emerged as one of the most popular redundant transforms in a wide variety of applications [27–29]. The DTCWT has near shift invariance, provides directional selectivity in multiple dimensions and has lower redundancy than the undecimated DWT [28]. In this work we present new techniques to preprocess MS/MS data using the Dual Tree Complex Wavelet Transform with the newly designed almost symmetric Hilbert pair of wavelets [30]. Previous works focus on MS preprocessing and this is the first work to concentrate on MS/MS preprocessing. The other novelty of the paper is the application of the DTCWT in preprocessing tandem mass spectrometry data.

The overview of the paper is as follows. In Sections 2 and 3 we describe the data collection process and give a brief overview of the proteomics preprocessing stages. The review of the almost symmetric Hilbert pair of wavelets is presented in Section 4. In Section 5 we preprocess the MS/MS data using the Dual Tree Complex Wavelet Transform and discuss the results. Section 6 concludes the paper.

Fig. 1. Plot of an MS scan of the dataset 3666 (intensity versus m/z).

2. Data collection
All the biological experiments for this paper were conducted at La Trobe Institute of Molecular Science (LIMS), La Trobe University, Australia. We obtained six datasets that correspond to three biologically independent samples with two technical replicates per sample. All three biological samples correspond to extracts from human cancer cells after enriching for membrane glycoproteins and were subject to identical cell culture and sample preparation procedures. Importantly, all samples are relatively complex, containing many different peptides with different abundances. Prior to mass spectrometry all samples were reduced, alkylated and digested with trypsin. This process results in the production of a large number of distinct peptides of an appropriate size and charge for tandem mass spectrometry analysis. Peptide samples were then loaded onto a trap column (C18 PepMap, 100 μm I.D. × 2 cm trapping column, Dionex) at 5 μL/min for 6 min and washed for 6 min before switching the precolumn in line with the analytical column (Vydac MS C18, 3 μm, 300 Å, 75 μm I.D. × 25 cm, Grace Pty. Ltd). The separation of peptides was performed at 300 nL/min using a linear acetonitrile (ACN) gradient of buffer A and buffer B (0.1% formic acid, 80% ACN), starting from 5% buffer B and rising to 60% over 90 min. Data were collected on a hybrid quadrupole/time-of-flight MS (MicroTOF-Q, Bruker, Germany) with a nano-electrospray ion source using Data Dependent Acquisition mode and m/z 150–2500 as the MS scan range. Nitrogen was used as the collision gas. The ionisation tip voltage and interface temperature were set at 4200 V and 205 °C, respectively. CID MS/MS spectra were collected for the 4 most intense ions. Dynamic exclusion parameters were set as follows: repeat count 2, duration 60 s. The data were collected and analysed using Data Analysis Software (Bruker Daltonics, Bremen, Germany).
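The data dependent acquisition settings above (4 most intense ions, repeat count 2, exclusion duration 60 s) can be illustrated with a small sketch. This is not the instrument's acquisition logic; it assumes one common reading of dynamic exclusion, namely that an m/z value selected `repeat_count` times is skipped until `exclusion_s` seconds have passed since its last selection. The function name `select_precursors` and the matching tolerance `mz_tol` are hypothetical.

```python
def select_precursors(scans, top_n=4, repeat_count=2, exclusion_s=60.0, mz_tol=0.01):
    """Pick up to top_n precursor m/z values per MS scan with dynamic exclusion.

    scans: list of (retention_time_s, [(mz, intensity), ...]) in time order.
    Assumption: an m/z picked repeat_count times is skipped until exclusion_s
    seconds have elapsed since it was last picked.
    """
    history = []   # entries: [mz, times_selected, time_of_last_selection]
    selected = []
    for rt, peaks in scans:
        picked = []
        # Visit peaks from most to least intense.
        for mz, _intensity in sorted(peaks, key=lambda p: p[1], reverse=True):
            if len(picked) == top_n:
                break
            entry = next((h for h in history if abs(h[0] - mz) <= mz_tol), None)
            if entry is not None and entry[1] >= repeat_count:
                if rt - entry[2] < exclusion_s:
                    continue      # still inside the exclusion window
                entry[1] = 0      # exclusion expired, start counting again
            if entry is None:
                entry = [mz, 0, rt]
                history.append(entry)
            entry[1] += 1
            entry[2] = rt
            picked.append(mz)
        selected.append((rt, picked))
    return selected


# Example: the same two ions seen in four consecutive survey scans.
peaks = [(500.0, 100.0), (600.0, 50.0)]
demo = select_precursors([(0.0, peaks), (1.0, peaks), (2.0, peaks), (70.0, peaks)], top_n=1)
```

With `top_n=1`, the most intense ion is fragmented twice, then the next most intense ion is chosen while the first is excluded, which is how less abundant peptides get sampled in complex mixtures.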
3. Preprocessing of MS data
In order to identify and quantify the proteins in the sample, regions of interest corresponding to peptides (features) must be extracted from the raw MS spectra. These features consist of sets of closely spaced peaks whose arrangement can be used to deduce the peptide charge and monoisotopic mass. This information is combined with the MS/MS scans to determine the amino acid sequence of peptides. In addition, a quantitative measure for each feature is usually obtained by taking the area under the curve (AUC) across all the peaks associated with a feature, and this in turn can be used to infer relative protein concentration. Since the raw MS and MS/MS spectra are usually corrupted with noise and baseline artifacts, preprocessing plays a vital role in both quantitation and identification aspects of the experiment. In this paper we are mostly concerned with the application of the DTCWT algorithm to MS/MS spectra and its effects on the peptide identification process. In order to measure the effects of this algorithm on a single aspect of the system (MS/MS preprocessing) in isolation we used the msconvert tool [5] to apply standard algorithms supplied by the instrument manufacturer (Bruker Daltonics) for MS preprocessing. An overview of MS preprocessing is as follows:
1. Noise filtering
2. Baseline correction
3. Peak and feature detection
4. Normalisation
The first preprocessing step is noise filtering. The data from the MS experiment are corrupted by chemical and instrument noise. Since the volume of MS data is large, noise filtering is essential to clean the data. Wavelet based noise filtering is widely applied to denoise MS data [20–22]. The MS data are decomposed using the DWT, the wavelet coefficients are hard thresholded, and the signal is then reconstructed by taking the inverse DWT.

The baseline artifact is due to chemical noise and detector overload in the MS instrument. The baseline is estimated by taking the local minimum within a fixed width window. Once the baseline is estimated, it is subtracted from the spectrum to give the baseline corrected spectrum. Baseline correction should not remove peak information from the spectrum, and it must be performed only after the noise filtering step.

Peak detection is the process of distinguishing interesting peaks from noise. In a proteomics experiment, the most scientifically interesting peaks are those that form features corresponding to peptides in the MS data. Recent algorithms for peak detection are based on wavelets [20,32,33] and are able to detect both individual peaks and entire peptide features.

The purpose of normalisation is to identify and remove sources of systematic variation between spectra due to varying amounts of protein, degradation of the sample over time, or variation in the instrument detector sensitivity. In an MS spectrum, each peptide concentration is measured by the area under the curve (AUC) of its corresponding feature, and the natural choice of scaling is the average AUC. The average AUC can be obtained from the Total Ion Chromatogram (TIC). The choice of normalisation step alone can greatly affect the post processing stage [34]. The TIC of the dataset 3666 is shown in Fig. 2.
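The baseline correction and normalisation steps described above can be sketched in a few lines. This is a minimal illustration and not the vendor algorithm applied through msconvert: it assumes the baseline is the minimum over a window of `window` samples centred on each point, and that spectra sharing one m/z axis are rescaled so that each has the average AUC of the set. All function names are hypothetical.

```python
def estimate_baseline(intensity, window):
    """Local minimum of a fixed-width window centred on each sample."""
    half = window // 2
    n = len(intensity)
    return [min(intensity[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def subtract_baseline(intensity, window):
    """Baseline corrected spectrum: observed intensity minus estimated baseline."""
    baseline = estimate_baseline(intensity, window)
    return [y - b for y, b in zip(intensity, baseline)]


def auc(mz, intensity):
    """Area under the curve by the trapezoidal rule."""
    return sum(0.5 * (intensity[i] + intensity[i + 1]) * (mz[i + 1] - mz[i])
               for i in range(len(mz) - 1))


def normalise_to_mean_auc(mz, spectra):
    """Rescale each spectrum so that its AUC equals the average AUC of the set.

    Assumes every spectrum has a non-zero AUC.
    """
    areas = [auc(mz, s) for s in spectra]
    target = sum(areas) / len(areas)
    return [[y * target / a for y in s] for s, a in zip(spectra, areas)]


# Example: a single peak sitting on a constant offset of 5 units.
corrected = subtract_baseline([5.0, 5.0, 9.0, 5.0, 5.0], window=3)
```

The window must be wider than the widest peak; otherwise the local minimum rides up into the peak and the subtraction removes peak information, which is exactly what the text warns against.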
Fig. 2. Total ion chromatogram (relative intensity versus retention time).

4. Review of almost symmetric Hilbert pair of wavelets
Let $H_0(z) = \sum_{n=-(L-1)}^{L} h_0(n) z^{-n}$ be the (even) length $L_f = 2L$ analysis low pass filter of a two channel orthogonal filter bank (FB), also known as a CQF (conjugate quadrature filter). Note that an almost-centered-at-the-origin (ACO) version of the CQF with support $n \in [-(L-1), L]$ is considered for convenience. The synthesis low pass filter is given by $F_0(z) = z^{-1} H_0(z^{-1})$. The analysis and synthesis high pass filters $H_1$ and $F_1$ respectively are obtained as follows: $H_1(z) = z^{-1} H_0(-z^{-1})$, $F_1(z) = z H_0(-z)$. The scaling and wavelet functions, $\phi(t)$ and $\psi(t)$ respectively, are obtained from the filter coefficients via the two-scale equations $\phi(t) = \sqrt{2} \sum_n h_0(n) \phi(2t - n)$ and $\psi(t) = \sqrt{2} \sum_n h_1(n) \phi(2t - n)$. To ensure smooth scaling and wavelet functions, zeros at $z = -1$ are imposed on $H_0(z)$; this is known as the (wavelet) vanishing moment (VM) condition.

The DTCWT is implemented using a pair of two-channel perfect reconstruction (PR) multirate filter banks. The upper and lower tree filters are denoted by superscripts h and g respectively. The equivalent wavelet functions of the two filter banks, $\psi^h(t)$ and $\psi^g(t)$ (with Fourier transforms $\Psi^h(\omega)$ and $\Psi^g(\omega)$ respectively), should ideally satisfy the following Hilbert transform relationship:

$$\Psi^g(\omega) = \begin{cases} -j\,\Psi^h(\omega), & \omega > 0 \\ \;\;\, j\,\Psi^h(\omega), & \omega < 0 \end{cases} \tag{1}$$

The wavelets $(\psi^h(t), \psi^g(t))$ form a Hilbert pair (and the same can be said of the corresponding pair of filter banks) and can be either biorthogonal or orthogonal. It is proven in [37] that Eq. (1) will hold if the low-pass filters of the filter banks, $H_0^h(z)$ and $H_0^g(z)$, satisfy

$$H_0^g(e^{j\omega}) = e^{-j\omega/2}\, H_0^h(e^{j\omega}) \tag{2}$$

Thus the two filters $H_0^h(z)$ and $H_0^g(z)$ should satisfy the half sample delay condition (2). By defining $\Psi^C(\omega)$ as the complex wavelet spectrum given by $\Psi^C(\omega) = \Psi^h(\omega) + j\Psi^g(\omega)$, quality measures based on the level of analyticity can be defined [38] as

$$E_1 \equiv \frac{\max_{\omega < 0} |\Psi^C(\omega)|}{\max_{\omega > 0} |\Psi^C(\omega)|}, \qquad E_2 \equiv \frac{\int_{-\infty}^{0} |\Psi^C(\omega)|^2 \, d\omega}{\int_{0}^{\infty} |\Psi^C(\omega)|^2 \, d\omega}$$

Ideally, if (1) is exact, then for $\omega < 0$, $\Psi^C(\omega) = \Psi^h(\omega) + j(j\Psi^h(\omega)) = \Psi^h(\omega) - \Psi^h(\omega) = 0$, i.e. $\Psi^C(\omega)$ is complex analytic. This would mean $E_1 = E_2 = 0$ if (1) is exact. In practice (1) can only be approximated and therefore $E_1$ and $E_2$ are non-zero. The smaller the $E_1$ and $E_2$ values, the better the analytic quality. The complex valued condition (2) can be separated into magnitude and phase parts as follows:

$$|H_0^h(\omega)| = |H_0^g(\omega)| \tag{3}$$

$$\angle H_0^h(\omega) = (\omega/2) + \angle H_0^g(\omega), \qquad -\pi \le \omega \le \pi \tag{4}$$

The filters in the h and the g tree are designed in the lattice framework by minimizing the measures $E_{sym}$ (for $H_0^h(z)$) and $E_{odd}$ (for $H_0^g(z)$):

$$E_{sym} \equiv \sum_{n=1}^{L} \left( h_0^h(n) - h_0^h(1-n) \right)^2$$

$$E_{odd} \equiv \sum_{n=1}^{L-1} \left( h_0^g(n) - h_0^g(-n) \right)^2 + \left( h_0^g(L) \right)^2$$

The above conditions on the filters will satisfy the phase condition (4) exactly while the magnitude condition (3) is satisfied approximately. In [30], we have successively re-optimised the filters in the g tree by relaxing the phase condition (4) such that a better magnitude approximation can be achieved, yielding Hilbert pair filters of better analytic quality. The designed filters are an almost symmetric orthogonal Hilbert pair of filters and they have approximately linear phase characteristics. We have designed a length 12 Hilbert pair of filters having 2 vanishing moments [30]. Fig. 3 (top) shows the plot of $\psi^h(t)$, $\psi^g(t)$ and the corresponding complex envelope, which are virtually symmetric. The complex wavelet spectrum shown in Fig. 3
(bottom) is approximately analytic with quality measures $E_1 = 0.5219\%$ and $E_2 = 0.006\%$. In this work we use the designed length 12 almost symmetric Hilbert pair of filters having 2 vanishing moments in the DTCWT.

Fig. 3. Top: wavelet functions $\psi^h(t)$, $\psi^g(t)$ and $|\psi^C(t)|$. Bottom: spectrum of the complex wavelet (plotted against $\omega$).

5. MS/MS denoising
In this section, we present techniques to preprocess and postprocess the MS/MS data. Most of the literature concentrates on processing the MS data and this is the first work to focus on the MS/MS data. Preprocessing of MS and MS/MS data is vital for efficient post-processing tasks such as peak detection and peak quantification. The preprocessing steps begin with denoising followed by baseline correction and normalisation [39]. The post-processing begins with peak detection and quantification [21,26] to produce a reduced dataset that can then be used by peptide identification and/or sequencing software. The MS/MS signal is modelled as follows:

$$y(t) = s(t) + B(t) + \epsilon(t)$$

where $y(t)$ is the observed signal, $s(t)$ is the original signal, $B(t)$ is the baseline drift and $\epsilon(t)$ is the noise, which is modelled as zero mean white Gaussian. The main aim of the preprocessing stage is to recover $s(t)$ from $y(t)$. The procedure to preprocess the MS/MS scans using the DTCWT is described below:

1. The first step is noise filtering. We observed from all our experiments that the noise in the MS/MS spectra does not show any variance change. We apply the DTCWT to decompose the MS/MS spectra. The hard thresholding technique is deployed to reduce the noise. The threshold ($T_p$) is set to three times the estimated noise level $\hat{\sigma}$. The hard thresholding of the wavelet coefficients is defined as

$$\delta_T(d_{jk}) = \begin{cases} d_{jk} & \text{if } |d_{jk}| > T_p \\ 0 & \text{if } |d_{jk}| \le T_p \end{cases}$$

where $d_{jk}$ is the wavelet coefficient at level $j$ and index $k$. The noise level is estimated using the Median Absolute Deviation [23] and it is given by

$$\hat{\sigma} = \frac{\mathrm{median}(|d_{1k}|)}{0.6745}$$

where $d_{1k}$ are the detail coefficients at the first level.
2. After hard thresholding, we apply the inverse DTCWT to reconstruct the MS/MS signal. The next step is baseline correction. The baseline component is removed by computing a monotone local minimum curve on the denoised signal. We have used the Matlab function msbackadj to correct the baseline, with the window width for estimating the local minima set to 200 separation units.
3. The signal is normalised by standardizing the area under the curve (AUC). The normalisation can be performed using the Matlab command msnorm.
4. Any local maximum after denoising, baseline correction and normalisation is assumed to be a peak. First, all the local maxima and the associated peak endpoints are computed. Then the signal to noise ratio at each local maximum is calculated. All the local maxima whose signal to noise ratio is greater than a threshold are considered as peaks. The threshold is set at 10% of the maximum signal to noise ratio of that particular scan.
5. After peak picking, the peak lists are written in Mascot Generic Format (MGF) and sent to the Mascot search engine [35]. The schema of the MGF format is available from Matrix Science, Ltd., London, United Kingdom. All Mascot searches used the following parameters: precursor mass tolerance 80 ppm, fragment ion mass tolerance 0.65 Da, deamidated N as a variable modification, carbamidomethyl C as a fixed modification, up to one missed enzyme cleavage. The search database used included all Mammalian entries from RefSeq [36].
6. Raw Mascot search scores are then statistically analysed using Peptide Prophet [40] to produce a confidence level for each.
7. We then count the number of peptides obtained with a confidence level above 95%.
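Steps 1 and 4 of the procedure above reduce to a few short operations once the wavelet coefficients are available. The sketch below assumes the coefficients have already been produced by some DTCWT implementation (the transform itself is not shown), that thresholding is applied to the magnitude of each (possibly complex) coefficient, and that a per-sample signal to noise trace is given. The function names `mad_sigma`, `hard_threshold` and `pick_peaks` are hypothetical.

```python
from statistics import median


def mad_sigma(level1_detail):
    """Noise estimate: median absolute first-level detail coefficient / 0.6745."""
    return median(abs(d) for d in level1_detail) / 0.6745


def hard_threshold(coeffs, t_p):
    """Keep coefficients whose magnitude exceeds t_p and set the rest to zero."""
    return [d if abs(d) > t_p else 0.0 for d in coeffs]


def pick_peaks(snr, fraction=0.10):
    """Indices of local maxima whose value exceeds fraction * max(snr)."""
    cutoff = fraction * max(snr)
    return [i for i in range(1, len(snr) - 1)
            if snr[i - 1] < snr[i] >= snr[i + 1] and snr[i] > cutoff]


# Example: one large coefficient among small noise-like ones.
detail = [0.1, -0.2, 0.05, 3.0, -0.15]
t_p = 3.0 * mad_sigma(detail)            # threshold = three times the noise estimate
denoised = hard_threshold(detail, t_p)   # only the 3.0 coefficient survives
peaks = pick_peaks([0.0, 1.0, 0.0, 10.0, 0.0, 0.5, 0.0])
```

Because `abs` returns the modulus of a complex number, the same functions work unchanged on complex DTCWT coefficients; whether the original implementation thresholds the complex magnitude or each tree separately is not stated in the text.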
5.1. Discussion
The first scan with a peptide match from dataset 3666 is plotted in Fig. 4. It shows that most of the peaks are in the lower m/z range and that this is also where most of the noise is located. Efficient preprocessing of the datasets improves the quality of the MS/MS spectra and leads to a greater number of identified peptides. After pre- and post-processing (see Section 5), MS/MS spectra were submitted for database search using the Mascot search engine. This process attempts to match observed spectra with theoretical spectra from a large database of possible peptide sequences. We used the number of high scoring peptide sequence matches (PSMs) as a measure of the performance of the preprocessing algorithm. The number of identified peptides is shown in Table 1. We compare the results of the DTCWT with those of the DWT and the Stationary Wavelet Transform (SWT) on the basis of the number of identified peptides. The length 12 Hilbert pair of filters having 2 VMs is used in the DTCWT. The Daubechies length 12 filter having 6 VMs is used in the DWT and SWT. The DTCWT identifies more peptides than the DWT in four of the six datasets (the counts are equal for dataset 3677, and the DWT finds one more for dataset 3681) and has a performance comparable to the SWT. However, the computational load of the SWT is higher than that of the DTCWT.
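The comparison can be checked directly from the counts in Table 1. The short script below transcribes the table, assuming the three columns of counts appear in the order DWT, DTCWT, SWT as in the table header, and computes the total per transform together with a per-dataset win/tie/loss tally. The helper `wins_ties_losses` is hypothetical.

```python
# Counts transcribed from Table 1, in dataset order 3666, 3681, 3670, 3685, 3677, 3690.
# Assumption: column order is DWT, DTCWT, SWT, as in the table header.
psm = {
    "DWT":   [59, 62, 59, 47, 63, 65],
    "DTCWT": [64, 61, 67, 48, 63, 68],
    "SWT":   [61, 64, 67, 45, 62, 65],
}

# Total number of peptide sequence matches per transform.
totals = {name: sum(counts) for name, counts in psm.items()}


def wins_ties_losses(a, b):
    """Number of datasets where method a finds more, equal, or fewer PSMs than b."""
    wins = sum(x > y for x, y in zip(a, b))
    ties = sum(x == y for x, y in zip(a, b))
    return wins, ties, len(a) - wins - ties


dtcwt_vs_dwt = wins_ties_losses(psm["DTCWT"], psm["DWT"])
dtcwt_vs_swt = wins_ties_losses(psm["DTCWT"], psm["SWT"])
print(totals, dtcwt_vs_dwt, dtcwt_vs_swt)
```

Under this reading of the table the totals are 355 (DWT), 371 (DTCWT) and 364 (SWT), and the DTCWT has four wins, one tie and one loss against each of the other two transforms.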
Fig. 4. Single scan from dataset 3666. Vertical grey bars show peaks picked after DTCWT has been applied and coloured annotations show theoretical peaks corresponding to b, y and neutral loss ions for the peptide SQCSVGSVTPTSSLWR.

Table 1. Count of peptide sequence matches.
Sample  Dataset  DWT  DTCWT  SWT
1       3666     59   64     61
1       3681     62   61     64
2       3670     59   67     67
2       3685     47   48     45
3       3677     63   63     62
3       3690     65   68     65

6. Conclusion
In this paper, we have proposed a new preprocessing technique for MS/MS data. We have evaluated the performance of three transforms, namely the DWT, SWT and DTCWT. We have shown that the DTCWT outperforms the DWT and SWT by detecting a greater number of peptides in total across the six datasets. The shift invariance of the DTCWT, along with the approximately linear phase characteristics of the Hilbert pair of wavelets, has proved to be very efficient compared to the DWT and SWT. Although our main focus has been on MS/MS data preprocessing, it is also possible that the new preprocessing technique proposed here (DTCWT) will provide improvements to MS data processing. This would presumably lead to improvements in feature detection, charge assignment and estimates of peptide abundance. We believe that this would be a useful avenue for future work, but note that devising benchmarks is more challenging since key information derived from the MS signal, such as feature intensity, is very difficult to verify independently.
Conflict of interest
None declared.

References
[1] D. Michał, B. Walczak, Use and abuse of chemometrics in chromatography, Trends Anal. Chem. 25 (11) (2006) 1081–1096.
[2] Å.M. Wheelock, A.R. Buckpitt, Software-induced variance in two-dimensional gel electrophoresis image analysis, Electrophoresis 26 (2005) 4508–4520.
[3] B.C. Collins, L.C. Gillet, G. Rosenberger, H.L. Röst, A. Vichalkovski, M. Gstaiger, R. Aebersold, Quantifying protein interaction dynamics by SWATH mass spectrometry: application to the 14-3-3 system, Nat. Methods 10 (2013) 1246–1253.
[4] J.S. Pawel Ciborowski, Proteomic Profiling and Analytical Chemistry: The Crossroads, Elsevier, Netherlands, 2013.
[5] D. Kessner, M. Chambers, R. Burke, D. Agus, P. Mallick, ProteoWizard: open source software for rapid proteomics tools development, Bioinformatics 24 (21) (2008) 2534–2536.
[6] H. Steen, M. Mann, The ABC's (and XYZ's) of peptide sequencing, Nat. Rev. Mol. Cell Biol. 5 (September) (2004) 699–711.
[7] M. Mann, P. Højrup, P. Roepstorff, Use of mass spectrometric molecular weight information to identify proteins in sequence databases, Biol. Mass Spectrom. 22 (6) (1993) 338–345.
[8] W.J. Henzel, T.M. Billeci, J.T. Stults, S.C. Wong, C. Grimley, C. Watanabe, Identifying proteins from two-dimensional gels by molecular mass searching of peptide fragments in protein sequence databases, Proc. Natl. Acad. Sci. USA 90 (11) (1993) 5011–5015.
[9] D.N. Perkins, D.J. Pappin, D.M. Creasy, J.S. Cottrell, Probability-based protein identification by searching sequence databases using mass spectrometry data, Electrophoresis 20 (18) (1999) 3551–3567.
[10] J.K. Eng, A.L. McCormack, J.R. Yates, An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database, J. Am. Soc. Mass Spectrom. 5 (11) (1994) 976–989.
[11] A.M. Frank, M.M. Savitski, M.L. Nielsen, R.A. Zubarev, P.A. Pevzner, De novo peptide sequencing and identification with precision mass spectrometry, J. Proteome Res. (2007) 114–123.
[12] I. Daubechies, Ten Lectures on Wavelets, CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 61, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1992.
[13] P.P. Vaidyanathan, Multirate Systems and Filter Banks, Prentice-Hall, Englewood Cliffs, NJ, USA, 1992.
[14] S. Farokhi, S.M. Shamsuddin, U.U. Sheikh, J. Flusser, M. Khansari, K. Jafari-Khouzani, Near infrared face recognition by combining Zernike moments and undecimated discrete wavelet transform, Digit. Signal Process. 31 (August) (2014) 180–190.
[15] D. Tay, S. Murugesan, Energy optimized orthonormal wavelet filter bank with prescribed sharpness, Digit. Signal Process. 31 (August) (2014) 136–144.
[16] N.M. Makbol, B.E. Khoo, A new robust and secure digital image watermarking scheme based on the integer wavelet transform and singular value decomposition, Digit. Signal Process. 33 (October) (2014) 134–147.
[17] S.M. Govindan, P. Duraisamy, X. Yuan, Adaptive wavelet shrinkage for noise robust speaker recognition, Digit. Signal Process. 33 (October) (2014) 180–190.
[18] A. Ross McDonald, Paul Skpp, Julia Bennell, Chris Potts, Lyn Thomas, C. David O'Connor, Mining whole-sample mass spectrometry proteomics data for biomarkers - an overview, Exp. Syst. Appl. 36 (3) (2009) 5333–5340.
[19] Hyun-Woo Cho, Myong K. Jeong, Enhanced prediction of misalignment conditions from spectral data using feature selection and filtering, Exp. Syst. Appl. 35 (1) (2008) 451–458.
[20] R. Hussong, A. Hildebr, The isotope wavelet: a signal theoretic framework for analyzing mass spectrometry data, in: Eleventh Annual International Conference on Research in Computational Molecular Biology, 2007.
[21] K.R. Coombes, S. Tsavachidis, J.S. Morris, K.A. Baggerly, M.-C. Hung, H.M. Kuerer, Improved peak detection and quantification of mass spectrometry data acquired from surface-enhanced laser desorption and ionization by denoising spectra with the undecimated discrete wavelet transform, Proteomics 5 (16) (2005) 4107–4117.
[22] D. Kwon, M. Vannucci, J.J. Song, J. Jeong, R.M. Pfeiffer, A novel wavelet-based thresholding method for the pre-processing of mass spectrometry data that accounts for heterogeneous noise, Proteomics 8 (15) (2008) 3019–3029.
[23] D. Donoho, De-noising by soft-thresholding, IEEE Trans. Inf. Theory 41 (3) (1995) 613–627.
[24] F. Gabbanini, M. Vannucci, G. Bartoli, A. Moro, Wavelet packet methods for the analysis of variance of time series with application to crack widths on the Brunelleschi dome, J. Comput. Graph. Stat. 13 (3) (2004) 639–658.
[25] X. Li, J. Li, X. Yao, A wavelet-based data pre-processing analysis approach in mass spectrometry, Comput. Biol. Med. 37 (4) (2007) 509–516.
[26] J.S. Morris, K.R. Coombes, J. Koomen, K.A. Baggerly, R. Kobayashi, Feature extraction and quantification for mass spectrometry in biomedical applications using the mean spectrum, Bioinformatics 21 (9) (2005) 1764–1775.
[27] N. Kingsbury, Image processing with complex wavelets, Phil. Trans. R. Soc. Lond. A 357 (1997) 2543–2560.
[28] I.W. Selesnick, R.G. Baraniuk, N.C. Kingsbury, The dual-tree complex wavelet transform, IEEE Signal Process. Mag. 22 (6) (2005) 123–151.
[29] N. Kingsbury, Complex wavelets for shift invariant analysis and filtering of signals, Appl. Comput. Harmon. Anal. 10 (3) (2001) 234–253.
[30] S. Murugesan, D. Tay, A new class of almost symmetric orthogonal Hilbert pair of wavelets, Signal Process. 95 (2014) 76–87.
[32] P. Du, W.A. Kibbe, S.M. Lin, Improved peak detection in mass spectrum by incorporating continuous wavelet transform-based pattern matching, Bioinformatics 22 (17) (2006) 2059–2065.
[33] A. Antoniadis, J. Bigot, S. Lambert-Lacroix, Peaks detection and alignment for mass spectrometry data, J. Soc. Franc. Stat. 151 (1) (2010) 17–37.
[34] W. Meuleman, J.Y.M.N. Engwegen, M.-C.W. Gast, J.H. Beijnen, M.J.T. Reinders, L.F.A. Wessels, Comparison of normalisation methods for surface-enhanced laser desorption and ionisation (SELDI) time-of-flight (TOF) mass spectrometry data, BMC Bioinformatics 9 (2008).
[35] D.N. Perkins, D.J. Pappin, D.M. Creasy, J.S. Cottrell, Probability-based protein identification by searching sequence databases using mass spectrometry data, Electrophoresis 20 (18) (1999) 3551–3567.
[36] K.D. Pruitt, G.R. Brown, S.M. Hiatt, F. Thibaud-Nissen, A. Astashyn, O. Ermolaeva, C.M. Farrell, J. Hart, M.J. Landrum, K.M. McGarvey, M.R. Murphy, N.A. O'Leary, S. Pujar, B. Rajput, S.H. Rangwala, L.D. Riddick, A. Shkeda, S.H. Sun, P. Tamez, R.E. Tully, C. Wallin, D. Webb, J. Weber, W. Wu, M. Dicuccio, P. Kitts, D.R. Maglott, T.D. Murphy, J.M. Ostell, RefSeq: an update on mammalian reference sequences, Nucleic Acids Res. 42 (Database issue) (2013) D756–D763.
[37] I.W. Selesnick, Hilbert transform pairs of wavelet bases, IEEE Signal Process. Lett. 8 (6) (2001) 170–173.
[38] D.B.H. Tay, N.G. Kingsbury, M. Palaniswami, Orthonormal Hilbert-pair of wavelets with (almost) maximum vanishing moments, IEEE Signal Process. Lett. 13 (9) (2006) 533–536.
[39] V.A. Emanuele, B.M. Gurbaxani, Benchmarking currently available SELDI-TOF MS preprocessing techniques, Proteomics 9 (7) (2009) 1754–1762.
[40] A. Keller, A.I. Nesvizhskii, E. Kolker, R. Aebersold, Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search, Anal. Chem. 74 (20) (2002) 5383–5392.