Procedural mistake detection (PMD) is the challenging problem of classifying whether a human user (observed through egocentric video) has successfully executed a task (specified by a procedural text). Despite significant recent efforts, machine performance in the wild remains nonviable, and the reasoning processes underlying this performance are opaque. As such, we extend PMD to require generating visual self-dialog rationales to inform decisions. Given the impressive, mature image understanding capabilities of recent vision-and-language models (VLMs), we curate a suitable benchmark dataset for PMD based on individual frames. As our reformulation enables unprecedented transparency, we leverage a natural language inference (NLI) model to formulate two automated metrics for the coherence of generated rationales. We establish baselines for this reframed task, showing that while VLMs struggle off-the-shelf, their accuracy, coherence, and efficiency can be improved by incorporating these metrics into common inference and fine-tuning methods, though not without tradeoffs. Lastly, our multi-faceted metrics visualize common outcomes, highlighting areas for further improvement.
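As a rough illustration only (not the paper's exact formulation), an NLI-based coherence metric could be instantiated as pairwise entailment scoring between consecutive sentences of a generated rationale, using an off-the-shelf NLI model such as roberta-large-mnli from the Hugging Face hub. The function name, aggregation choice, and example rationale below are hypothetical, a minimal sketch of the general idea.

```python
from transformers import pipeline

# Off-the-shelf NLI model; roberta-large-mnli predicts
# CONTRADICTION / NEUTRAL / ENTAILMENT for a premise-hypothesis pair.
nli = pipeline("text-classification", model="roberta-large-mnli")

def rationale_coherence(sentences):
    """Mean probability that each rationale sentence entails the next.

    A hypothetical coherence proxy: low scores flag rationales whose
    consecutive steps do not follow from one another.
    """
    scores = []
    for premise, hypothesis in zip(sentences, sentences[1:]):
        preds = nli({"text": premise, "text_pair": hypothesis}, top_k=None)
        entailment = next(p["score"] for p in preds if p["label"] == "ENTAILMENT")
        scores.append(entailment)
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical self-dialog rationale for a single cooking step.
rationale = [
    "The user is holding a whisk over the bowl.",
    "The whisk is being used to beat the eggs in the bowl.",
    "The eggs appear fully beaten, so this step is complete.",
]
print(f"coherence = {rationale_coherence(rationale):.3f}")
```

In practice, such a score could be thresholded or averaged per rationale to serve as an automatic filter or reward signal during inference and fine-tuning, in the spirit of the metrics described above.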