With the emergence of a new generation of cognitive robots, the capability to communicate with these robots using natural language has become increasingly important. Verbal communication often involves the use of verbs, for example, to ask a robot to perform a task or to monitor a physical activity. Concrete action verbs often denote a change of state as a result of an action; for example, “slice a pizza” implies that the state of the object pizza will change from one piece to several smaller pieces. This change of state can be perceived from the physical world through different sensors. Given a human utterance, if the robot can anticipate the potential change of state signaled by the verbs, it can then actively sense the environment and better connect language with the perceived physical world, for example by identifying who performs the action and what objects and locations are involved. This improved connection will benefit many applications that rely on human-robot communication.
A new generation of interactive robots has emerged in recent years to provide service, care, and companionship for humans. To support natural interaction between humans and these robots, technology that enables situated dialogue is becoming increasingly important. Situated dialogue is drastically different from traditional spoken dialogue systems, multimodal conversational interfaces, and tele-operated human-robot interaction. In situated human-robot dialogue, human partners and robots are co-present in a shared physical environment. The shared surroundings significantly affect how they interact with each other and how the robot interprets human language and performs tasks. In the last couple of years, we have started a couple of projects on situated human-robot dialogue, focusing in particular on how situatedness affects language-based human-robot interaction, and thus on techniques for situated language processing and conversation grounding.
Given the large amount of textual data available online (e.g., news articles, Wikipedia, and weblogs), it has become increasingly important to develop techniques that can automatically process this data, for example, to extract event information, to answer user questions, and to make inferences. Along these lines, we are particularly interested in the role of commonsense, discourse, and pragmatics in natural language processing and its applications.
Using situated dialogue (in a virtual world) and conversational interfaces as our setting, we have investigated the use of non-verbal modalities (e.g., eye gaze and gestures) in language processing and in conversation grounding. The virtual world setting not only has important applications in education, training, and entertainment, but also provides a simplified simulation environment to support studies on situated language processing towards physical world interaction.
Specific projects include: