tspDB: Time Series Predict Database

Wednesday, March 4, 2020 - 4:00pm to 4:30pm

Event Calendar Category

LIDS & Stats Tea

Speaker Name

Anish Agarwal

Affiliation

LIDS

Building and Room Number

LIDS Lounge

A major bottleneck of the current machine learning workflow is the time-consuming and error-prone engineering required to get data out of a database (DB) or data warehouse so that an appropriately chosen prediction method can be applied to it. To address this challenge, this work explores the feasibility of integrating predictive functionality directly into a time series DB, i.e., a proof-of-concept predictive time series DB. Ideally, a predictive DB should not only achieve statistical accuracy competitive with state-of-the-art prediction algorithms (e.g., neural network-based approaches); it should also (i) provide an intuitive user (SQL) interface for making a “predictive query”, and (ii) ensure that the added predictive functionality does not incur significant overhead, i.e., that its computational performance is comparable to the alternative of layering a predictive model on top of a DB. As our main contribution, we explicitly instantiate such a proof of concept, tspDB (available at tspdb.mit.edu), which extends PostgreSQL by adding a “prediction index”. We show that tspDB achieves state-of-the-art statistical performance (compared to modern neural network-based approaches) and is orders of magnitude more computationally efficient (in terms of time to build a prediction and time to answer a prediction query).
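
To make the "layered" alternative mentioned in the abstract concrete, here is a minimal Python sketch of that workflow: extract the series from a database, then apply a prediction method outside it. It does not show tspDB or its interface. SQLite stands in for the DB, a moving average stands in for the prediction method, and the names `forecast_next` and `readings` are hypothetical, chosen only for illustration.

```python
import sqlite3


def forecast_next(conn, table, window=3):
    """Layered workflow: pull rows out of the DB, then predict outside it."""
    # Step 1: extract the whole series into application memory.
    # `table` is interpolated directly, so it must be a trusted identifier.
    rows = conn.execute(
        f"SELECT ts, value FROM {table} ORDER BY ts"
    ).fetchall()
    values = [value for _, value in rows]
    if len(values) < window:
        raise ValueError("not enough observations for the chosen window")
    # Step 2: apply a stand-in model (mean of the last `window` points).
    # This is an assumption for illustration, not the method used by tspDB.
    return sum(values[-window:]) / window


# Toy data: an in-memory table holding the series 1.0, 2.0, ..., 10.0.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(t, float(t)) for t in range(1, 11)],
)
prediction = forecast_next(conn, "readings")
```

Even in this toy form, the data leaves the database before any prediction happens, and the extraction and modeling code must be written and maintained separately. That is the engineering step a predictive DB aims to replace with a single query.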

Anish is a fourth-year Ph.D. student in LIDS, advised by Prof. Munther Dahleh and Prof. Devavrat Shah. His interests are in high-dimensional statistics, systems for machine learning, and the design of data marketplaces.