Multimodal Learning for Situated Language Understanding
Motivations and Objectives
Using situated dialogue (in the virtual world) and conversational interfaces as our setting, we have investigated the use of non-verbal modalities (e.g., eye gaze and deictic gestures) in language processing and conversation grounding. The virtual world setting not only has important applications in education, training, and entertainment, but also provides a simplified simulation environment that supports studies of situated language processing toward physical-world interaction.
Selected Recent Papers
Language & 3D Vision
- Jianing Yang, Xuweiyi Chen, Shengyi Qian, Nikhil Madaan, Madhavan Iyengar, David Fouhey, Joyce Chai. LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent. ICRA, 2024.
- Yichi Zhang, Jianing Yang, Jiayi Pan, Shane Storks, Nikhil Devraj, Ziqiao Ma, Keunwoo Peter Yu, Yuwei Bao, Joyce Chai. DANLI: Deliberative Agent for Following Natural Language Instructions. EMNLP, 2022.
Language & 2D Vision
- Yichi Zhang, Jiayi Pan, Yuchen Zhou, Rui Pan, Joyce Chai. Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans? EMNLP, 2023.
- Sihan Xu, Ziqiao Ma, Yidong Huang, Honglak Lee, Joyce Chai. CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation. NeurIPS, 2023.
- Ziqiao Ma, Jiayi Pan, Joyce Chai. World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language Models. ACL, 2023. (Outstanding Paper Award)
Language & Eye Gaze
- Zahar Prasov and Joyce Chai. Fusing Eye Gaze with Speech Recognition Hypotheses to Resolve Exophoric References in Situated Dialogue. EMNLP, 2010.
- Shaolin Qu and Joyce Chai. Incorporating Temporal and Semantic Information with Eye Gaze for Automatic Word Acquisition in Multimodal Conversational Systems. EMNLP, 2008.
- Shaolin Qu and Joyce Chai. An Exploration of Eye Gaze in Spoken Language Processing for Multimodal Conversational Interfaces. NAACL, 2007.