Practical Distance-Preserving Compression with Provable Guarantees

Thursday, November 9, 2017 - 4:30pm

Event Calendar Category

LIDS & Stats Tea

Speaker Name

Tal Wagner

Affiliation

CSAIL

Building and Room Number

LIDS Lounge

Distance-preserving compression (or quantization) is a widely used tool for speeding up machine learning and data mining tasks such as near-neighbor searches. The goal is to compress a large high-dimensional dataset so as to approximately preserve the distances between the elements. Current algorithms are heuristic in nature and are unsuitable for high-precision regimes. I will present a new algorithm that achieves both state-of-the-art performance on standard benchmark datasets, and provably near-optimal theoretical compression bounds.

Based on joint works with Piotr Indyk and Ilya Razenshteyn.

http://www.mit.edu/~talw/