Wednesday, March 4, 2020 - 4:00pm to 4:30pm
Event Calendar Category
LIDS & Stats Tea
Speaker Name
Anish Agarwal
Affiliation
LIDS
Building and Room Number
LIDS Lounge
A major bottleneck in the current machine learning workflow is the time-consuming and error-prone engineering required to get data out of a DB/warehouse so that an appropriately chosen prediction method can be applied to it. To address this challenge, this work explores the feasibility of integrating predictive functionality directly into a time series DB, i.e., a proof-of-concept predictive time series DB. Ideally, a predictive DB should not only have statistical accuracy competitive with state-of-the-art prediction algorithms (e.g., neural network-based approaches), it should also: (i) provide an intuitive user (SQL) interface for making a “predictive query”; and (ii) ensure that this additional predictive functionality does not add significant overhead, i.e., has computational performance comparable to the alternative of layering a predictive model on top of a DB. As our main contribution, we explicitly instantiate such a proof of concept, tspDB (available at tspdb.mit.edu), which extends PostgreSQL by adding a “prediction index”. We show that tspDB both achieves state-of-the-art statistical performance (compared to modern neural network-based approaches) and is orders of magnitude more computationally efficient (in terms of time to build a prediction and time to answer a prediction query).
Anish is a fourth-year Ph.D. student in LIDS, advised by Prof. Munther Dahleh and Prof. Devavrat Shah. His interests are in high-dimensional statistics, systems for machine learning, and the design of data marketplaces.