Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference under Ambiguities

Spatial expressions in situated communication can be ambiguous, as their meanings vary depending on the frames of reference (FoR) adopted by speakers and listeners. While spatial language understanding and reasoning by vision-language models (VLMs) …

Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models

Vision-and-Language Navigation (VLN) has gained increasing attention over recent years, and many approaches have emerged to advance its development. The remarkable achievements of foundation models have shaped the challenges and proposed methods for …

Natural Language Instructions for Intuitive Human Interaction with Robotic Assistants in Field Construction Work

The introduction of robots is widely considered to have significant potential for alleviating the issues of worker shortage and stagnant productivity that afflict the construction industry. However, it is challenging to use fully automated robots in …

Partition-Based Active Learning for Graph Neural Networks

We study the problem of semi-supervised learning with Graph Neural Networks (GNNs) in an active learning setup. We propose GraphPart, a novel partition-based active learning approach for GNNs. GraphPart first splits the graph into disjoint partitions …

Commonsense Reasoning for Natural Language Understanding: A Survey of Benchmarks, Resources, and Approaches

Report of 2017 NSF Workshop on Multimedia Challenges, Opportunities and Research Roadmaps

X-ToM: Explaining with Theory-of-Mind for Gaining Justified Human Trust

Detecting Clinically Related Content in Online Patient Posts

Collaborative Language Grounding Toward Situated Human-Robot Dialogue

Context-based Word Acquisition for Situated Dialogue in a Virtual World