Memory-Efficient Adaptive Optimization for Humungous-Scale Learning

Tuesday, April 23, 2019 - 4:00pm to Wednesday, April 24, 2019 - 4:55pm

Event Calendar Category

LIDS Seminar Series

Speaker Name

Yoram Singer

Affiliation

Princeton University & Google

Building and Room number

32-G449 (Kiva)

Adaptive gradient-based optimizers such as AdaGrad and Adam are among the methods of choice in modern machine learning. These methods maintain second-order statistics of each model parameter, thus doubling the memory footprint of the optimizer. In behemoth-size applications, this memory overhead restricts the size of the model being used as well as the number of examples in a mini-batch. We describe a novel, simple, and flexible adaptive optimization method with sublinear memory cost that retains the benefits of per-parameter adaptivity while allowing for larger models and mini-batches. We give convergence guarantees for our method and demonstrate its effectiveness in training some of the largest deep models used at Google.
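The abstract does not spell out the method itself. As an illustration only, the following minimal pure-Python sketch contrasts two schemes for an m x n weight matrix. Diagonal AdaGrad keeps one squared-gradient accumulator per parameter. A cover-set scheme in the spirit of the published SM3 algorithm keeps one accumulator per row and one per column, so it needs m + n floats instead of m·n. The function names `adagrad_step` and `sm3_step`, the choice of rows and columns as cover sets, and the convention of skipping the update when the denominator is zero are illustrative assumptions. This is not presented as the exact method or implementation discussed in the talk.

```python
import math

def adagrad_step(w, acc, g, lr):
    """Diagonal AdaGrad on an m x n matrix: one accumulator per parameter (m*n floats)."""
    for i, row in enumerate(w):
        for j in range(len(row)):
            acc[i][j] += g[i][j] ** 2
            if acc[i][j] > 0:  # assumption: treat 0/0 as a zero update
                w[i][j] -= lr * g[i][j] / math.sqrt(acc[i][j])

def sm3_step(w, row_acc, col_acc, g, lr):
    """SM3-style step with row/column cover sets: m + n accumulators instead of m*n."""
    m, n = len(w), len(w[0])
    new_row = [0.0] * m
    new_col = [0.0] * n
    for i in range(m):
        for j in range(n):
            # Rebuild a per-entry estimate from the two previous cover-set accumulators.
            nu = min(row_acc[i], col_acc[j]) + g[i][j] ** 2
            if nu > 0:  # nu == 0 only when g[i][j] == 0, so skipping is a zero update
                w[i][j] -= lr * g[i][j] / math.sqrt(nu)
            # Each cover-set accumulator stores the max estimate over its entries.
            new_row[i] = max(new_row[i], nu)
            new_col[j] = max(new_col[j], nu)
    row_acc[:] = new_row
    col_acc[:] = new_col

if __name__ == "__main__":
    m, n = 3, 4
    w_ada = [[1.0] * n for _ in range(m)]
    w_sm3 = [[1.0] * n for _ in range(m)]
    acc = [[0.0] * n for _ in range(m)]
    row_acc, col_acc = [0.0] * m, [0.0] * n
    grads = [[0.1 * (i + 1) * (j + 1) for j in range(n)] for i in range(m)]
    for _ in range(5):
        adagrad_step(w_ada, acc, grads, lr=0.1)
        sm3_step(w_sm3, row_acc, col_acc, grads, lr=0.1)
    # Number of stored accumulators: AdaGrad vs. the row/column scheme.
    print(sum(len(r) for r in acc), len(row_acc) + len(col_acc))  # prints "12 7"
```

Taking the minimum over an entry's row and column accumulators yields an estimate that is never smaller than AdaGrad's running sum of squared gradients for that entry. As a result, each step in this sketch is at most as large as the corresponding AdaGrad step, while the optimizer state shrinks from m·n to m + n numbers.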

Yoram Singer is the head of the Principles of Effective Machine Learning (POEM) research group at Google Brain and a professor of Computer Science at Princeton. He was a member of the technical staff at AT&T Research from 1995 through 1999 and an associate professor at the Hebrew University from 1999 through 2007. He is a Fellow of the AAAI. His research on machine learning algorithms has received several awards.