Principal Differences Analysis: Interpretable Characterization of Differences between Distributions

Wednesday, March 22, 2017 - 4:30pm

Speaker Name

Jonas Mueller

Affiliation

CSAIL

Building and Room Number

LIDS Lounge

Abstract

I will introduce principal differences analysis (PDA), a method for analyzing differences between high-dimensional distributions that finds the projection maximizing the statistical divergence between the resulting univariate populations. Unlike standard two-sample tests, PDA not only returns a p-value but also quantifies how much each variable contributes to the overall difference between the populations. Furthermore, this approach retains high statistical power, even at small sample sizes, when the underlying differences are confined to a sparse subset of the features. Relying on the Cramér-Wold device, PDA requires no assumptions about the form of the underlying distributions or the nature of their inter-class differences. While our broader framework can use any choice of metric between distributions, we provide algorithms for PDA using the Wasserstein distance. Finally, I will highlight some existing theory and open questions relating the geometry of nonparametric distributions to that of their projections in the context of finite-sample estimation.
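
As a rough illustration of the projection idea only, and not the algorithm presented in the talk, the sketch below scores unit directions by the empirical one-dimensional Wasserstein distance between the projected samples and keeps the best one. The helper names (`wasserstein_1d`, `max_projection_search`), the random search over directions, the equal sample sizes, and the synthetic data are all assumptions made for this example; the sparsity-inducing estimation and the p-value mentioned above are omitted.

```python
import numpy as np

def wasserstein_1d(a, b):
    """Empirical 1-D Wasserstein-1 distance for two equal-size samples:
    the mean absolute difference between their sorted values."""
    a, b = np.sort(a), np.sort(b)
    return float(np.mean(np.abs(a - b)))

def max_projection_search(X, Y, n_directions=2000, seed=0):
    """Crude search for the unit direction that maximizes the 1-D
    Wasserstein distance between projected samples.
    Hypothetical helper for illustration; assumes len(X) == len(Y)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Candidate directions: the coordinate axes plus random unit vectors.
    B = np.vstack([np.eye(d), rng.normal(size=(n_directions, d))])
    B /= np.linalg.norm(B, axis=1, keepdims=True)
    best_w, best_beta = -1.0, None
    for beta in B:
        w = wasserstein_1d(X @ beta, Y @ beta)
        if w > best_w:
            best_w, best_beta = w, beta
    return best_beta, best_w

# Synthetic data (an assumption): the two populations differ only in feature 2.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
Y = rng.normal(size=(500, 5))
Y[:, 2] += 2.0

beta, w = max_projection_search(X, Y)
# Large |beta[j]| suggests feature j contributes to the difference.
print(np.round(np.abs(beta), 2), round(w, 2))
```

In this toy setting the magnitude of each entry of `beta` plays the role of a per-variable contribution: the entry for the shifted feature should dominate. A random search scales poorly with dimension, so it stands in here only for whatever optimization the actual method uses.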

Biography