Wednesday, October 9, 2019 - 4:00pm to 4:30pm
Event Calendar Category
LIDS & Stats Tea
Building and Room Number
The exponential growth in the size of human genomic studies, which now include tens of thousands of observations, opens up the intriguing possibility of investigating the role of rare genetic variants in human biological evolution. A better understanding of rare genetic variants is crucial for the study of rare genetic diseases, as well as for personalized medicine. A key challenge when working with rare variants is to develop a statistical framework to assess whether the observed sample is truly representative of the underlying population. In particular, it is important to understand (i) what fraction of the relevant variation present in the human genome is not yet captured by available datasets and (ii) how to design future experiments in order to maximize the number of hitherto unseen genomic variants. We propose a novel, rigorous methodology to address both problems using a nonparametric Bayesian framework. Our contribution is twofold. First, we provide an estimator for the number of hitherto unseen variants that will be observed when additional samples from the same population are collected, and we study its theoretical and empirical properties. Second, we show how this approach can be used in the context of the optimal design of genomic studies. For this problem, under a fixed budget, one is interested in maximizing the number of genomic discoveries by optimally enlarging a dataset, trading off the number of additional individuals to be sequenced against the quality of the individual samples.
Joint work with Stefano Favaro, Federico Camerlenghi and Tamara Broderick
Lorenzo Masoero is a fourth-year PhD student in EECS at MIT, working with Tamara Broderick in the Machine Learning group. Prior to joining MIT, he obtained a Master's degree in Statistics and Applied Mathematics from Collegio Carlo Alberto, and an undergraduate degree in Economics from the University of Torino.