October 4, 2018
The National Science Foundation (NSF) has awarded LIDS faculty member and Professor of Mathematics Philippe Rigollet and his co-principal investigator Justin Solomon a Critical Techniques, Technologies and Methodologies for Advancing Foundations and Applications of Big Data Sciences and Engineering (BIGDATA) grant for their research project, "Statistical and Computational Optimal Transport for Geometric Data Analysis." Solomon is the X-Consortium Career Development Assistant Professor of Electrical Engineering & Computer Science and a researcher with the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). The goal of the project is to develop a "geometric data analysis" toolbox based on optimal transport to tackle large datasets, and to create a common language for cross-disciplinary collaborations involving optimal transport and geometric data analysis.
“The theory of optimal transport has proven valuable to address data that is not a collection of individual points, but rather whole geometric objects,” says Rigollet. “Yet, understanding optimal transport as a statistical tool is still in its infancy.”
The nascent theory of computational optimal transport is still largely dissociated from statistics, and many methods do not properly account for sampling and measurement noise. To avoid the pitfalls of overfitting, Rigollet and Solomon propose to take a systematic statistical approach to geometric data analysis. Building on an understanding of the theoretical advantages and drawbacks of optimal transport for statistical modeling, the project will lead to scalable optimal transport algorithms with strong statistical guarantees.
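To make the point about sampling noise concrete, here is a minimal illustrative sketch. It is not taken from the project and is not the researchers' method. It uses the one-dimensional case, where the optimal transport (Wasserstein-1) distance between two equal-size samples can be computed by matching sorted points in order. The function names `wasserstein1_1d` and `sampled_distance`, the Gaussian test distribution, and the seed are all assumptions chosen for illustration.

```python
import random


def wasserstein1_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1D empirical samples.

    In one dimension the optimal plan pairs the i-th smallest point of one
    sample with the i-th smallest point of the other.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


def sampled_distance(n, seed=0):
    """Distance between two independent samples of size n drawn from the
    SAME standard normal distribution (hypothetical test setup).

    The true distance is zero, so any nonzero value is pure sampling noise.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return wasserstein1_1d(xs, ys)
```

Calling `sampled_distance(n)` for increasing `n` would typically show the estimated distance shrinking toward zero without ever reaching it. That residual gap comes from sampling alone, and it is the kind of effect a method with statistical guarantees has to account for.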
Applications of this research include medical imaging, LiDAR for self-driving cars, single-cell RNA sequencing, and other diverse, large-scale sources of data.
For more information about the project, please see: https://www.nsf.gov/awardsearch/showAward?AWD_ID=1838071&HistoricalAwards=false
About the award: The NSF BIGDATA program is a major initiative that aims to advance the core scientific and technological means of managing, analyzing, visualizing, and extracting useful information from large and complex data sets. These capabilities are needed to: accelerate the progress of scientific discovery and innovation; lead to new fields of inquiry that would not otherwise be possible; encourage the development of new data analytic tools and algorithms; facilitate scalable, accessible, and sustainable data infrastructure; increase understanding of human and social processes and interactions; and promote economic growth and improved health and quality of life.