Utilizing data-driven modeling and proteomic approaches for predicting mucosal immunity of the female genital tract
K. Benedict1, K. Mogk2,3, L. McKinnon4, R. Novak5, T. Ball2,3, G. Westmacott3, D. Lauffenburger1, A. Burgener2,3,4
1Department of Biological Engineering, Massachusetts Institute of Technology, USA. 2Department of Medical Microbiology, University of Manitoba, Canada. 3Public Health Agency of Canada, Canada. 4Karolinska Institutet, Sweden. 5University of Chicago, USA.
Background:
• Problem: A major impediment to designing HIV vaccines/microbicides that elicit protective mucosal immunity is a lack of understanding of which immunological patterns predict HIV susceptibility.
• This is likely a function of the general complexity of the mucosal immune system, which contains thousands of factors, many involved in host defense against HIV as well as in general immune functioning.
• Factors that modulate the risk of HIV infection are therefore complex and likely represent an interplay of multiple components; inhibition or enhancement of HIV mucosal infection in vivo likely depends on the summation of all these biological factors.
• Reductionist approaches, which study predefined immune factors in isolation, are unlikely to yield associations with, or predictions of, disease outcome. New approaches are therefore critically needed.
• Multivariate computational modeling, utilizing comprehensive high-throughput datasets, is fast emerging as an effective method to unravel complex immunological systems. Partial least squares discriminant analysis (PLSDA) is one such technique; it determines patterns of features that best distinguish groups.
• Here we explore the utility of this approach to examine bacterial vaginosis as a surrogate mucosal condition.
[Figure: sample-processing workflow. Labels recovered: CVL fluid; Controls (n=33); BV+ (n=10); Protein Processing & Digestion; Lys; Reduction; Alkylation; Peptides; Fractionation; LC-MS/MS; Mass spectrometry analysis; BV]
Decision Tree Analysis determined the hierarchy of importance of biomarkers in discriminating BV and control samples.
Proteomic dataset
• 700 unique proteins identified/quantified
• Human (host) proteins (~600) selected for analysis
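[Figure: decision tree. Node labels recovered: metabolic protein C ≥ 8,975; structural protein E < 73,537; structural protein E ≥ 73,537. Leaf labels: BV, control. Unparsed leaf annotation as extracted: "control 0 BV 34 C".]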
This decision tree performed with 96% accuracy on calibration.
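A minimal sketch of how a two-level decision tree of this kind can be fitted and scored on calibration data. It is not the tree-building procedure used for the poster, which is not described here: the Gini splitting criterion, the depth limit, the function names, and the toy abundance values are all assumptions made for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Try every feature and midpoint threshold; return (feature, threshold) or None."""
    best, best_score = None, None
    for j in range(len(rows[0])):
        values = sorted(set(row[j] for row in rows))
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2.0
            below = [y for row, y in zip(rows, labels) if row[j] < threshold]
            above = [y for row, y in zip(rows, labels) if row[j] >= threshold]
            score = (len(below) * gini(below) + len(above) * gini(above)) / len(rows)
            if best_score is None or score < best_score:
                best, best_score = (j, threshold), score
    return best

def build_tree(rows, labels, depth=0, max_depth=2):
    """Recursively split until nodes are pure or max_depth is reached."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or gini(labels) == 0.0:
        return majority
    split = best_split(rows, labels)
    if split is None:  # all rows identical, nothing to split on
        return majority
    j, threshold = split
    below = [(row, y) for row, y in zip(rows, labels) if row[j] < threshold]
    above = [(row, y) for row, y in zip(rows, labels) if row[j] >= threshold]
    return {
        "feature": j,
        "threshold": threshold,
        "below": build_tree([r for r, _ in below], [y for _, y in below], depth + 1, max_depth),
        "at_or_above": build_tree([r for r, _ in above], [y for _, y in above], depth + 1, max_depth),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's class label."""
    while isinstance(tree, dict):
        branch = "below" if row[tree["feature"]] < tree["threshold"] else "at_or_above"
        tree = tree[branch]
    return tree

def accuracy(tree, rows, labels):
    return sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(rows)

# Hypothetical toy data: two protein abundances per sample (not poster data).
rows = [[1.0, 5.0], [2.0, 6.0], [1.5, 1.0], [2.5, 2.0],
        [6.0, 1.5], [7.0, 5.5], [8.0, 3.0], [6.5, 6.5]]
labels = ["BV", "BV", "control", "control", "control", "control", "control", "control"]

tree = build_tree(rows, labels)
calibration_accuracy = accuracy(tree, rows, labels)
print(tree)
print(calibration_accuracy)
```

The order in which features appear in the tree, root first, is what gives the hierarchy of importance. Accuracy computed on the same samples used for fitting is a calibration figure and is usually optimistic.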
Results:
Conclusions:
LASSO feature selection suggested 14 features (out of 410) that best distinguished between BV+ and control groups.
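A minimal sketch of how LASSO performs feature selection: the L1 penalty shrinks coefficients and sets the uninformative ones exactly to zero, so the features with nonzero coefficients form the selected set. The poster does not state which LASSO formulation or software was used; the least-squares objective on ±1 class codes, the coordinate-descent solver, the penalty value, and the toy design below are assumptions made for illustration.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; return exactly 0.0 inside [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X b while b is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant-zero column carries no information
            # Covariance of feature j with the residual, with j's own contribution added back.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta

# Hypothetical toy design: 8 samples, 4 standardized features (columns are mutually orthogonal).
X = [
    [1, 1, 1, 1],
    [1, 1, -1, 1],
    [1, -1, 1, -1],
    [1, -1, -1, -1],
    [-1, 1, 1, -1],
    [-1, 1, -1, -1],
    [-1, -1, 1, 1],
    [-1, -1, -1, 1],
]
# Assumed class coding: +1 = BV+, -1 = control.
y = [1, 1, 1, -1, -1, -1, -1, 1]

beta = lasso_coordinate_descent(X, y, lam=0.2)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(beta)
print(selected)
```

In this toy design only features 0 and 3 covary with the class code, so they are the only ones that survive the penalty. A larger penalty selects fewer features; in practice the penalty is usually tuned by cross-validation.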
1. Mucosal proteomic datasets, parsed with multivariate analysis, can distinguish a mucosal clinical condition with high accuracy.
2. This analysis reveals novel host metabolic enzymes likely involved in the pathogenesis of bacterial vaginosis, as well as potential new diagnostic biomarkers.
3. Our hypothesis is that this approach could identify specific new protein expression profiles that predict increased risk of HIV acquisition.
4. This approach could be applied to vaccine/microbicide endpoints for efficacy and for potential insight into mechanisms of protection.
Functional annotation of proteins identified by LASSO
• Here we show how this approach can yield novel insight into the immunology of this disease as well as distinguish mucosal disease groups.
Partial Least Squares Discriminant Analysis (PLSDA) differentiated between BV and control samples using 14 proteins identified by LASSO
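A minimal sketch of the idea behind PLSDA, restricted to a single latent component: the classes are coded numerically, the feature weights are chosen to maximize covariance with that code, and each sample is classified from its score along the resulting direction. This is not the model fitted for the poster. The ±1 class coding, the one-component restriction, the function names, and the toy values are assumptions made for illustration.

```python
import math

def fit_plsda_one_component(X, y):
    """Fit a one-component PLS model of the class code y (+1/-1) on the features X."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [value - y_mean for value in y]
    # Weight vector: the unit direction in feature space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0.0:
        raise ValueError("features have no covariance with the class labels")
    w = [v / norm for v in w]
    # Scores: projection of each centered sample onto the weight vector.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    # Regress the centered class code on the scores.
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return {"x_mean": x_mean, "y_mean": y_mean, "weights": w, "q": q}

def score(model, row):
    """Latent-variable score of one sample."""
    return sum((v - m) * w for v, m, w in zip(row, model["x_mean"], model["weights"]))

def predict(model, row):
    """Return +1 (BV+) or -1 (control) from the predicted class code."""
    y_hat = model["y_mean"] + model["q"] * score(model, row)
    return 1 if y_hat > 0 else -1

# Hypothetical toy data: 3 protein abundances per sample (not poster data).
X = [[5.0, 1.0, 2.0], [6.0, 2.0, 1.0], [5.5, 1.5, 3.0],
     [1.0, 5.0, 2.0], [2.0, 6.0, 3.0], [1.5, 5.5, 1.0]]
y = [1, 1, 1, -1, -1, -1]  # assumed coding: +1 = BV+, -1 = control

model = fit_plsda_one_component(X, y)
calibration_accuracy = sum(predict(model, r) == c for r, c in zip(X, y)) / len(X)
print(model["weights"])
print(calibration_accuracy)
```

The weight vector plays the role that a loadings plot serves visually: features with large positive or negative weights drive the separation, and features with weights near zero (the third feature here) contribute little. The per-sample scores are what a scores plot displays.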
Objectives:
1. Characterize the complete proteome of cervicovaginal fluid of bacterial vaginosis (BV) positive individuals and negative controls by mass spectrometry.
2. Employ various data-driven modeling techniques (LASSO, PLSDA, and decision trees) to distinguish BV+ samples from BV- samples based on multivariate biomarker profiles.
3. Validate this model on an independent population of BV+ and BV- samples.
[Figure labels recovered: decision tree node "metabolic protein C < 8,975"; workflow step "Protein quantification/identification"; plot legend "BV", "control"]
Acknowledgements:
We would like to express our gratitude to all study subjects for their participation, and many thanks to Fiona Lynch and Ellen Verlen for sample collection. We would also like to thank Max Abou and Stuart McCorrister for technical support.
Supported by: Canadian Institutes of Health Research (AB), Public Health Agency of Canada, Ragon Institute Postdoctoral Fellowship (K.F.B.)
PLSDA loadings plot (right panel) indicates protein patterns that best discriminate between BV+ and control samples, as shown in the scores plot (left panel). The model performed with 95% accuracy on calibration and 94% accuracy on cross-validation.
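A minimal sketch of the difference between calibration accuracy and cross-validation accuracy. The poster does not state which cross-validation scheme was used, so leave-one-out is an assumption here, and a nearest-centroid classifier stands in for PLSDA purely to keep the example short. The function names and toy values are hypothetical.

```python
def centroid(rows):
    """Feature-wise mean of a list of samples."""
    return [sum(column) / len(rows) for column in zip(*rows)]

def fit(rows, labels):
    """Nearest-centroid model: one mean vector per class."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return {label: centroid(members) for label, members in groups.items()}

def predict(model, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(model, key=lambda label: sq_dist(model[label]))

def calibration_accuracy(rows, labels):
    """Fit on all samples and score on the same samples."""
    model = fit(rows, labels)
    return sum(predict(model, r) == y for r, y in zip(rows, labels)) / len(rows)

def leave_one_out_accuracy(rows, labels):
    """Refit without each sample in turn and score only the held-out sample."""
    hits = 0
    for i in range(len(rows)):
        model = fit(rows[:i] + rows[i + 1:], labels[:i] + labels[i + 1:])
        hits += predict(model, rows[i]) == labels[i]
    return hits / len(rows)

# Hypothetical toy data: one feature per sample, with one borderline BV sample (2.9).
rows = [[4.0], [5.0], [6.0], [2.9], [0.0], [1.0], [2.0]]
labels = ["BV", "BV", "BV", "BV", "control", "control", "control"]

calibration = calibration_accuracy(rows, labels)
cross_validated = leave_one_out_accuracy(rows, labels)
print(calibration)
print(cross_validated)
```

In this toy set the borderline sample is classified correctly only when it takes part in fitting, so the cross-validated figure falls below the calibration figure. A small gap between the two, as reported above, suggests the model is not simply memorizing the calibration samples.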