Thursday, September 14, 2017 - 4:00pm to 5:00pm
Event Calendar Category
Other LIDS Events
Sharif University of Technology
Building and Room Number
We consider the following class of fundamental statistical problems. We are given a number of independent identically distributed samples that are drawn from a certain distribution, and we are interested in either testing for the identity of the distribution (is it close or equal to a given nominal distribution?) or learning it within some accuracy. In both cases, we employ the total variation as the metric of interest on the space of distributions. We focus on the corresponding sample complexity, which is the number of samples required. It is well known that the sample complexity increases with the size of the domain (sample space) of the distribution, which is problematic when that domain is large.
In this talk we show that the sample-complexity is dramatically improved when the class of distributions of interest has a low-dimensional structure, that is, a parametric description with a relatively small number of parameters. Some special cases include mixtures of a few high-dimensional distributions, the Ising model, and exponential families.
In particular, we first develop general approaches for deriving upper and lower sample complexity bounds, applicable to the above described setting. We then provide improved upper bounds for the above mentioned special cases. For example, we show that the sample complexity of learning an Ising model on a complete graph with m edges within a total variation distance of $\epsilon$ is $O(m^1.5 / \epsilon^2)$, which improves upon the $O(m^2 \beta^2 / \epsilon^3)$ currently available bound on the sample complexity for the (easier) identity testing problem.
Arsalan Sharifnassab received a B.Sc. degree in Mathematics and also in Electrical Engineering, and an M.Sc. degree in Electrical Engineering, all from the Sharif University of Technology, Iran, where he is currently a PhD student, again in EE.
Over the last twelve months, Arsalan has been a visiting student in LIDS, MIT, working with John Tsitsiklis. Some of his research during this past year has been in the areas of high-dimensional statistics, network scheduling and routing, partially observed dynamic programming (POMDPs), and distributed algorithms.