Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans?

Yichi Zhang, Jiayi Pan, Yuchen Zhou, Rui Pan, Joyce Y. Chai

December 2023

Abstract

In this paper, we ask: do Vision-Language Models (VLMs), an emergent human-computer interface, perceive visual illusions like humans, or do they faithfully represent reality? We built VL-Illusion, a new dataset that systematically evaluates this question. Among other findings, we found that although the models' humanlike rate under illusion is low, larger models are more susceptible to visual illusions and closer to human perception.

Type

Conference paper

Publication

EMNLP

Yichi Zhang

Ph.D. Candidate

Jiayi Pan

Joyce Y. Chai

Professor