Multimodal Learning for Situated Language Understanding

Last updated on Jan 29, 2024


Motivations and Objectives

Using situated dialogue in a virtual world and conversational interfaces as our setting, we have investigated the use of non-verbal modalities (e.g., eye gaze and deictic gestures) in language processing and conversation grounding. The virtual world setting has important applications in education, training, and entertainment, and it also provides a simplified simulation environment for studying situated language processing as a step toward physical-world interaction.

Selected Recent Papers

Language & 3D Vision

Jianing Yang, Xuweiyi Chen, Shengyi Qian, Nikhil Madaan, Madhavan Iyengar, David Fouhey, Joyce Chai. LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent. ICRA, 2024.
Yichi Zhang, Jianing Yang, Jiayi Pan, Shane Storks, Nikhil Devraj, Ziqiao Ma, Keunwoo Peter Yu, Yuwei Bao, Joyce Chai. DANLI: Deliberative Agent for Following Natural Language Instructions. EMNLP, 2022.

Language & 2D Vision

Yichi Zhang, Jiayi Pan, Yuchen Zhou, Rui Pan, Joyce Chai. Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans? EMNLP, 2023.
Sihan Xu, Ziqiao Ma, Yidong Huang, Honglak Lee, Joyce Chai. CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation. NeurIPS, 2023.
Ziqiao Ma, Jiayi Pan, Joyce Chai. World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language Models. ACL, 2023. (Outstanding Paper Award)

Language & Eye Gaze

Zahar Prasov and Joyce Chai. Fusing Eye Gaze with Speech Recognition Hypotheses to Resolve Exophoric References in Situated Dialogue. EMNLP, 2010.
Shaolin Qu and Joyce Chai. Incorporating Temporal and Semantic Information with Eye Gaze for Automatic Word Acquisition in Multimodal Conversational Systems. EMNLP, 2008.
Shaolin Qu and Joyce Chai. An Exploration of Eye Gaze in Spoken Language Processing for Multimodal Conversational Interfaces. NAACL, 2007.