Sonic Stage
Auto-Generating Interactive Spatial Soundscapes from Dialogue Videos for Blind Viewers
Abstract
We present Sonic Stage, a system that transforms dialogue videos into interactive spatial soundscapes, enabling blind and low-vision (BLV) audiences to intuitively understand characters' actions and movements through immersive auditory cues. Sonic Stage conveys essential visual information during dialogue through three auditory techniques: (1) spatialized dialogue to represent spatial layout, (2) diegetic sound to convey character actions, and (3) interactive descriptions to provide context-specific visual details.
The Accessibility Challenges in Dialogue Videos
Dialogue-heavy scenes leave few gaps in which to insert audio descriptions (AD). Consequently, blind and low-vision (BLV) viewers often miss crucial visual information, such as characters' actions, movements, and facial expressions.
Our Solution: Sonic Stage
Sonic Stage conveys essential visual information during dialogue using three auditory techniques: spatialized dialogue, diegetic sound, and interactive descriptions. These techniques enable BLV viewers to perceive on-screen actions within an immersive auditory experience.
Sonic Stage's Audio Spatialization Pipeline
To create a coherent auditory experience across camera changes, Sonic Stage reconstructs a 3D scene representation from dialogue videos and renders all auditory cues within a shared spatial soundscape. Its pipeline consists of three stages: (A) frame sampling, (B) scene reconstruction, and (C) soundscape optimization.
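The idea of a shared spatial soundscape can be illustrated with a minimal sketch. It is not Sonic Stage's implementation: the `CameraPose` type, the 2D ground-plane simplification, the fixed listener, and the constant-power stereo panning are all assumptions made for illustration. The sketch maps a character's position from each shot's camera coordinates into one world frame, so the same character is heard from the same direction even when the camera cuts.

```python
import math
from dataclasses import dataclass


@dataclass
class CameraPose:
    """Hypothetical per-shot camera pose on the ground plane (illustrative only)."""
    x: float    # camera position, world x
    z: float    # camera position, world z
    yaw: float  # rotation about the vertical axis in radians; 0 = looking along +z


def camera_to_world(pose: CameraPose, cx: float, cz: float) -> tuple[float, float]:
    """Map a point from camera coordinates (cx = right, cz = forward) to world coordinates."""
    wx = pose.x + cx * math.cos(pose.yaw) + cz * math.sin(pose.yaw)
    wz = pose.z - cx * math.sin(pose.yaw) + cz * math.cos(pose.yaw)
    return wx, wz


def azimuth_from_listener(wx: float, wz: float, lx: float = 0.0, lz: float = 0.0) -> float:
    """Angle of the source relative to a listener facing +z; positive means to the right."""
    return math.atan2(wx - lx, wz - lz)


def constant_power_pan(azimuth: float) -> tuple[float, float]:
    """Return (left, right) gains whose squares sum to 1; azimuth is clamped to [-pi/2, pi/2]."""
    clamped = max(-math.pi / 2, min(math.pi / 2, azimuth))
    theta = (clamped + math.pi / 2) / 2  # 0 = hard left, pi/2 = hard right
    return math.cos(theta), math.sin(theta)


# The same character observed in two shots with different (assumed) camera poses.
shot_a = CameraPose(x=0.0, z=0.0, yaw=0.0)
shot_b = CameraPose(x=4.0, z=2.0, yaw=-math.pi / 2)

world_a = camera_to_world(shot_a, cx=2.0, cz=2.0)
world_b = camera_to_world(shot_b, cx=0.0, cz=2.0)

gains_a = constant_power_pan(azimuth_from_listener(*world_a))
gains_b = constant_power_pan(azimuth_from_listener(*world_b))
```

Because both observations resolve to the same world position, `gains_a` and `gains_b` agree, and the character's voice stays on the listener's right across the cut. A full system would replace the stereo pan with a proper 3D audio renderer and estimate the camera poses from the video itself.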
Technical Evaluation with Diverse Video Types
Sonic Stage's pipeline achieved 91.9% overall accuracy in character trajectory reconstruction across a diverse video set. It performed well in scenes with distinct backgrounds, even under fast motion and sparse views. The remaining errors mainly arise from two issues: (1) too few full or medium shots for robust spatial reconstruction, and (2) insufficient feature points for multi-view alignment.
User Evaluation with BLV Viewers
In a user study with 12 BLV viewers, Sonic Stage significantly improved video comprehension, spatial presence, and narrative engagement compared to a baseline modeled after SPICA, the state-of-the-art method for accessible video exploration.
Opportunities Across Diverse Video Genres
Sonic Stage's techniques could be extended to diverse video genres, including sketch comedy, opera, dance, and documentary. We hope this work inspires future research on immersive, interactive audio representations that improve video accessibility for blind audiences.
BibTeX
@inproceedings{SonicStage2026,
title={Sonic Stage: Automatically Generating an Interactive Spatial Soundscape to Facilitate Dialogue Video Comprehension for Blind and Low Vision Viewers},
author={Xu, Shuchang and Jin, Xiaofu and Jain, Gaurav and Zhang, Wenshuo and Qu, Huamin and Smith, Brian A. and Yan, Yukang},
booktitle={Extended Abstracts of the 2026 CHI Conference on Human Factors in Computing Systems},
year={2026},
url={https://doi.org/10.1145/3772363.3798425}
}