
Machine learning analysis substantiates importance of inter-individual genetic variability in PPAR signalling for brain connectivity in preterm infants

Michelle Krishnan, Paul Aljabar, Zi Wang, Gareth Ball, Serena Counsell, Giovanni Montana, and David Edwards, King’s College London, London
Ghazala Mirza, University College London, London
Alka Saxena, Guy’s and St Thomas’ NHS Foundation Trust, London

Introduction
The incidence of preterm birth is increasing, with a high proportion of survivors experiencing motor, cognitive and psychiatric sequelae. Prematurity places newborn infants in an adverse environment that accentuates their individual ability to cope with systemic challenges, and calls for precision in healthcare interventions. Machine learning strategies are used here to investigate the neurobiological consequences of prematurity. Given the established large genetic contribution to quantitative neuroimaging features that are informative of downstream function, and the assumption that a subset of genetic markers will be statistically associated with a subset of image features, computational models must be able to select those informative variables. Multivariate sparse regression models such as the sparse Reduced Rank Regression method (sRRR) obviate the need for multiple-testing correction and significance thresholds, because they fit a single predictive model using all SNPs and rank the SNPs by their association with the image features (Vounou et al., 2010; Vounou et al., 2012).

Method
A total of 272 infants (mean gestational age (GA) 29+4 weeks) underwent magnetic resonance (MR) imaging at term-equivalent age (mean post-menstrual age (PMA) 42+4 weeks). 3-Tesla MR images were used for probabilistic tractography (Robinson et al., 2008), using a 90-node anatomical neonatal atlas (Shi et al., 2011) and a custom neonatal registration pipeline (Ball et al., 2010).
A weighted adjacency matrix of brain regions for each infant was converted into a single vector of edge weights based on fractional anisotropy (FA), resulting in one matrix of n individuals by q edges, where n = 272 and q = 4005 (the number of unique pairs among 90 nodes), adjusted for major covariates (PMA at scan and GA at birth) and ancestry. Saliva samples were collected using Oragene DNA OG-250 kits and genotyped on the Illumina HumanOmniExpress-24 v1.1 chip. The genotype matrix was converted into minor allele counts, including only SNPs with MAF ≥ 5% and a 100% genotyping rate (556 227 SNPs). The sRRR model parameters were: number of SNPs at each iteration = 500, stability selection with 1000 subsamples each comprising 2/3 of the subjects, and convergence criterion = 1 × 10⁻⁶. This resulted in a ranking of all genome-wide SNPs based on their importance in the model. A null distribution was computed by running sRRR in the same way with 20 000 subsamples, additionally permuting the order of subjects within the phenotype matrix between each subsample during stability selection.

Results
sRRR detected a stable association between SNPs in the PPARγ gene and the imaging phenotype fully adjusted for GA, PMA and ancestry. SNPs in PPARγ were significantly over-represented among the variables with the uniformly highest ranking in the model, contributing to a broader significant enrichment of lipid-related genes among the top 100 ranked SNPs.

Discussion
In concordance with findings from two previous independent studies of a comparable cohort (Krishnan et al., 2016; Boardman et al., 2014), these results suggest a consistent association between inter-individual genetic variation in PPAR signalling and diffusion properties of the white matter in preterm infants.

Conclusion
This provides specific insight into how nutrition might be tailored with precision according to each infant’s genetic profile to optimize brain development.
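
Illustrative sketch of the data preparation
The two data-preparation steps described in the Method (flattening each infant's adjacency matrix into a vector of edge weights, and keeping only SNPs with MAF ≥ 5% and complete genotyping) can be sketched as below. This is a minimal illustration in Python with NumPy, not the authors' pipeline: the function names edge_vector and filter_snps, the use of NaN to mark a missing genotype call, and the toy data are all assumptions made for the example. It also does not cover the covariate adjustment or the sRRR fit itself.

```python
import numpy as np

N_NODES = 90  # atlas size reported in the Method


def edge_vector(adjacency: np.ndarray) -> np.ndarray:
    """Flatten a symmetric node-by-node matrix into its unique edge weights.

    Hypothetical helper: takes the strict upper triangle, so a 90-node
    matrix yields 90 * 89 / 2 = 4005 values.
    """
    if adjacency.ndim != 2 or adjacency.shape[0] != adjacency.shape[1]:
        raise ValueError("adjacency matrix must be square")
    rows, cols = np.triu_indices(adjacency.shape[0], k=1)
    return adjacency[rows, cols]


def filter_snps(counts: np.ndarray, min_maf: float = 0.05) -> np.ndarray:
    """Return a boolean mask over SNP columns to keep.

    counts: subjects x SNPs array of allele counts (0, 1 or 2), with NaN
    marking a missing call (an assumption of this sketch). A SNP is kept
    only if every subject has a call and its minor allele frequency is at
    least min_maf.
    """
    called = ~np.isnan(counts)
    n_called = called.sum(axis=0)
    complete = n_called == counts.shape[0]
    # Allele frequency = total allele count / (2 alleles per called subject).
    freq = np.nansum(counts, axis=0) / (2.0 * np.maximum(n_called, 1))
    maf = np.minimum(freq, 1.0 - freq)
    return complete & (maf >= min_maf)


# Toy data standing in for one infant's FA-weighted connectivity matrix.
rng = np.random.default_rng(0)
upper = np.triu(rng.random((N_NODES, N_NODES)), k=1)
fa_matrix = upper + upper.T  # symmetric, zero diagonal
edges = edge_vector(fa_matrix)
print(edges.shape)  # (4005,)

# Toy genotypes: 4 subjects x 4 SNPs.
genotypes = np.array(
    [
        [0, 1, 0, 2],
        [1, 1, 0, np.nan],  # missing call in the last SNP
        [2, 0, 0, 1],
        [1, 2, 0, 0],
    ],
    dtype=float,
)
keep = filter_snps(genotypes)
print(keep.tolist())  # [True, True, False, False]
```

Stacking the edge vectors of all infants row by row would give the n by q phenotype matrix described in the Method, and applying the mask to the genotype columns would give the SNP matrix passed to the regression.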