Leveraging Language and Video Demonstrations for Robot Learning

Monday, September 27, 2021 - 11:30am to 12:30pm

Speaker: Jeannette Bohg


Humans have gradually developed language, mastered complex motor skills, and created and utilized sophisticated tools. The act of conceptualization is fundamental to these abilities because it allows humans to mentally represent, summarize, and abstract diverse knowledge and skills. By means of abstraction, concepts that we learn from a limited number of examples can be extended to a potentially infinite set of new and unanticipated situations. Abstract concepts can also be more easily taught to others by demonstration.

I will present work that gives robots the ability to acquire a variety of manipulation concepts that act as mental representations of verbs in natural-language instructions. We propose to learn from human demonstrations of manipulation actions, as recorded in large-scale video datasets annotated with natural-language instructions. In extensive simulation experiments, we show that a policy learned in this way can perform a large percentage of the 78 different manipulation tasks on which it was trained. We show that this multi-task policy generalizes over variations of the environment, and we present examples of successful generalization to novel but similar instructions.

I will also present work that enables a robot to sequence these newly acquired manipulation skills for long-horizon task planning. I will especially focus on work that grounds symbolic states in visual data to enable closed-loop task planning.


Jeannette Bohg is an Assistant Professor of Computer Science at Stanford University. She was a group leader at the Autonomous Motion Department (AMD) of the MPI for Intelligent Systems until September 2017. Before joining AMD in January 2012, she was a Ph.D. student at the Division of Robotics, Perception, and Learning (RPL) at KTH in Stockholm, where her thesis proposed novel methods for multi-modal scene understanding for robotic grasping. She also studied at Chalmers in Gothenburg and at the Technical University of Dresden, where she received her Master's in Art and Technology and her Diploma in Computer Science, respectively. Her research focuses on perception and learning for autonomous robotic manipulation and grasping. She is specifically interested in developing methods that are goal-directed, real-time, and multi-modal, such that they can provide meaningful feedback for execution and learning. Jeannette Bohg has received several Early Career and Best Paper awards, most notably the 2019 IEEE Robotics and Automation Society Early Career Award and the 2020 Robotics: Science and Systems Early Career Award.