Profs. Philippe Rigollet and Devavrat Shah help lead new NSF data science institute

October 6, 2020

Congratulations to LIDS members Prof. Philippe Rigollet (Math) and Prof. Devavrat Shah (EECS), who are co-PIs on the team leading the new NSF Foundations of Data Science Institute (FODSI).

Adam Conner-Simons | CSAIL (original article)

The National Science Foundation has awarded $12.5 million to establish a multidisciplinary institute—a collaboration between UC Berkeley and MIT—to improve our understanding of critical issues in data science, including modeling, statistical inference, computational efficiency, and societal impacts.

The director of the new institute, called the Foundations of Data Science Institute (FODSI), will be UC Berkeley Prof. Peter Bartlett, who has appointments in the university’s Departments of Statistics and of Electrical Engineering and Computer Sciences (EECS). The co-director will be Prof. Piotr Indyk, a principal investigator (PI) at MIT CSAIL. MIT’s co-PIs include CSAIL’s Jonathan Kelner and Ronitt Rubinfeld, as well as Philippe Rigollet and Devavrat Shah of the MIT Institute for Data, Systems and Society.

“Data science has emerged as the central science for the 21st century, a widespread approach to science and technology that permits empirical investigations at unprecedented scale and scope,” the team wrote in their winning proposal. “The explosion in the availability of data and growing awareness of the central role it can play in diverse domains from science, commerce and industry have added considerable urgency.”

The basis of data science started to form in the first half of the 20th century, combining the deductive and inductive traditions of mathematics to devise rigorous approaches to thinking about data and its role in scientific inquiry. But in the second half of the last century, new specialized disciplines such as computer science, mathematical statistics, control theory, information theory, and signal processing arose and were pursued individually, rather than as part of a larger whole. 

Now, data science is weaving these threads back together: it requires the expertise of mathematicians, statisticians, and theoretical computer scientists, among others, to make the most effective use of massive datasets that increasingly affect how industry, academia, and government operate. The institute’s research themes include the complex interactions among decision makers, the data they use, and competing actors; methods for making use of vast amounts of data; and the economic, social, and ethical implications of automated data analysis and decision-making.

“Under the banner of data science, those disciplines are now coming back together and we need to look at the theoretical foundations of all of them, across the breadth of issues raised by data science problems,” said Bartlett, who is also Associate Director of the Simons Institute for the Theory of Computing. “We're starting to see a confluence of efforts in pursuing a better understanding of how to solve scientific and societal problems by leveraging all of these disciplines. It’s important to consider possible solutions from many different perspectives.”

FODSI aims to meet this challenge, bringing together experts from many cognate academic disciplines to lay the theoretical foundations for the field of data science. It also aims to educate and mentor future leaders in data science at all levels, K-12 through postdoc, and to broaden participation and increase diversity in the data science workforce.

The new institute will convene public events, such as summer schools, research workshops and other collaborative research opportunities, that will serve the broader research community. Many of these events will be hosted by the Simons Institute for the Theory of Computing, UC Berkeley's global center for collaboration in theoretical computer science.

“It was important to bring together participants from multiple universities to create this institution,” said Bartlett, who is also in Berkeley’s Division of Computing, Data Science and Society (CDSS). “Berkeley has a strong technical affinity with the work being done at our partner institutions. As well as the vigorous flow of ideas, these collaborations will give the institute a national reach.”

The institute will draw on strong connections with researchers at its industrial partners (Amazon, Google, LinkedIn, Microsoft, and Verizon Media) to ensure engagement with the broad range of application domains that these partners represent.

With its collaborative structure and partnerships, the project is a good fit for the CDSS portfolio. CDSS fosters cross-campus partnerships that bring together researchers in areas ranging from economics, social welfare, climate studies, and public policy to computer science, electrical engineering, statistics, and biomedicine, applying tools such as deep learning to societal problems. The new project builds on work by the EECS Department, which is jointly housed in CDSS and Berkeley’s College of Engineering, and the Statistics Department, which is in both CDSS and the Division of Mathematical and Physical Sciences.

The award was one of two made under the second phase of NSF’s Transdisciplinary Research in Principles of Data Science (TRIPODS) program. The two new projects build on 12 earlier Phase I projects and are closely tied to NSF’s Harnessing the Data Revolution (HDR) Big Idea.