Thesis Defense: Addressing Missing Data and Scalable Optimization for Data-driven Decision Making

Thursday, May 6, 2021 - 1:00pm to 2:00pm

Event Calendar Category

LIDS Thesis Defense

Speaker Name

Dogyoon Song



Join Zoom meeting


Prof. Devavrat Shah (Advisor)
Prof. Pablo A. Parrilo (Advisor)
Prof. Gregory W. Wornell

Data-driven decision-making is now a necessary commodity in virtually every domain of human endeavor. In principle, if the collected data contain sufficient information, it is possible to build a useful model. Nevertheless, there are a few challenges: (1) the gathered data might have some values missing; and (2) building a model from data usually involves solving an optimization problem, which may require prohibitively large computational resources. My PhD thesis explores two research directions motivated by these challenges.

In the first part of this talk, I will consider the errors-in-variables regression problem and discuss the utility of data imputation in predictive modeling. To this end, I will revisit low-rank matrix completion and establish a novel analysis on the estimation error beyond the traditional mean squared error (Frobenius norm), focusing on the singular value thresholding algorithm. Thereafter, I will argue that the predictions based on the imputed covariate data are typically nearly as accurate as those made with covariates without corruption.

In the second part of the talk, I will discuss the tradeoff between the scalability and the quality of optimal solutions in the context of approximate semidefinite programming. Specifically, I ask the question: ``how closely can we approximate the set of unit-trace $n \times n$ positive semidefinite (PSD) matrices, denoted by $D$, using at most $N$ number of $k \times k$ PSD constraints?'' I will argue that any set $S$ that approximates $D$ within a constant approximation ratio must have superpolynomial semidefinite extension complexity. This result implies the impossibility of global, non-adaptive approximation of the cone of $n \times n$ PSD matrices by a polynomial number of $k \times k$ PSD constraints for any $k = o(n / \log n)$.