Language Grounding to Vision, Robotics and Situated Communication


Motivations and Objectives

Language is learned through sensorimotor and sociolinguistic experience: perceiving and acting in the physical world and interacting with other people. The ability to connect language units to their referents (e.g., physical entities, robotic actions) is referred to as grounding, and it plays an important role in multimodal language processing. For example, concrete action verbs often denote a change of state that results from an action: “slice a pizza” implies that the pizza will change from one piece into several smaller pieces. Such a change of state can be perceived in the physical world through different sensors. Given a human utterance, if a robot can anticipate the change of state signaled by the verb, it can actively sense the environment and better connect the language to what it perceives, such as who performs the action and which objects and locations are involved. This connection will benefit many applications that rely on human-robot communication.

Our thoughts and positions:

  • Yonatan Bisk, Ari Holtzman, Jesse Thomason, Jacob Andreas, Yoshua Bengio, Joyce Chai, Mirella Lapata, Angeliki Lazaridou, Jonathan May, Aleksandr Nisnevich, Nicolas Pinto, Joseph Turian. Experience Grounds Language. EMNLP, 2020.

Selected Recent Papers

Grounded Language Acquisition

Referential Grounding in Situated Communication