Visual assistants that can guide humans through complex tasks in physical environments have significant potential, yet their development is hindered by the high cost of human-in-the-loop data collection. We present BASIS (Bootstrapping Assistant modeling with Situated Interaction Simulation), a novel framework that fundamentally rethinks how visual assistants are developed and evaluated. Rather than relying on expensive human data collection, BASIS leverages simulation to bootstrap capable assistants through three interconnected stages: (1) Situated Interaction Simulation generates high-quality synthetic data through interactions between oracle assistants and simulated users; (2) Autonomous Model Development trains and continuously evaluates assistant models using this synthetic data; and (3) Real-User Validation verifies effectiveness with human users. We implement BASIS in Alexa Arena and demonstrate that our best model, despite being fine-tuned solely on synthetic data and operating under realistic perception conditions, enables real human users to achieve a 72.9% success rate, approaching the 88.6% success rate of an oracle assistant with privileged access to perfect perception. Through detailed error analysis, we identify object identification as the primary bottleneck for current visual assistants. Our approach bridges the gap between interaction simulation and real human-AI collaboration, establishing a scalable pipeline for developing assistants that can effectively guide users through complex tasks. Project website: https://colm2025-329.github.io/