Multimodal Supervisory Graphs for Persistent World Modeling in Generative AI

Marcus Elvain; Howard Pellorin

doi:10.20944/preprints202512.2815.v1

Submitted: 31 December 2025

Posted: 31 December 2025

Abstract

Generative models have achieved remarkable success in producing realistic images and short video clips, but existing approaches struggle to maintain persistent world coherence over long durations and across multiple modalities. We propose Multimodal Supervisory Graphs (MSG), a novel framework for world modeling that unifies geometry (3D structure), identity (consistent entities), physics (dynamic behavior), and interaction (user/agent inputs) in a single abstract representation. MSG represents the environment as a dynamic latent graph, factorized by these four aspects and trained with cross-modal supervision from visual (RGB-D), pose, and audio streams. This unified world abstraction enables generative AI systems to maintain consistent scene layouts, preserve object identities over time, obey physical laws, and incorporate interactive user prompts, all within one model. In our experiments, MSG demonstrates superior long-term coherence and cross-modal consistency compared to state-of-the-art generative video baselines, effectively bridging the gap between powerful short-term video generation and persistent, interactive world modeling. Our framework outperforms prior methods on metrics of identity consistency, physical plausibility, and multi-view geometry alignment, enabling new applications in extended reality and autonomous agent simulation.
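As a rough illustration of what a "dynamic latent graph, factorized by these four aspects" could look like as a data structure, the sketch below stores one latent vector per factor on each entity node. It is not the authors' implementation: the abstract does not specify node layout, latent dimensions, or update rules, so every name here (`FACTORS`, `EntityNode`, `WorldGraph`, `add_entity`, `relate`, `update_factor`) and the zero-initialized 4-dimensional latents are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# The four aspects named in the abstract; the tuple itself is a hypothetical convention.
FACTORS = ("geometry", "identity", "physics", "interaction")


@dataclass
class EntityNode:
    """One scene entity holding a separate latent vector per factor (assumed layout)."""
    entity_id: int
    latents: Dict[str, List[float]]


@dataclass
class WorldGraph:
    """Toy factorized graph: nodes are entities, edges are undirected relations."""
    nodes: Dict[int, EntityNode] = field(default_factory=dict)
    edges: Set[Tuple[int, int]] = field(default_factory=set)

    def add_entity(self, entity_id: int, dim: int = 4) -> EntityNode:
        # Zero-initialized latents are a placeholder, not a claim about MSG.
        node = EntityNode(entity_id, {f: [0.0] * dim for f in FACTORS})
        self.nodes[entity_id] = node
        return node

    def relate(self, a: int, b: int) -> None:
        if a not in self.nodes or b not in self.nodes:
            raise KeyError("both entities must exist before relating them")
        # Store each undirected edge once, with the smaller id first.
        self.edges.add((min(a, b), max(a, b)))

    def update_factor(self, entity_id: int, factor: str, values: List[float]) -> None:
        # Only the named factor changes; the other three latents are untouched.
        if factor not in FACTORS:
            raise ValueError(f"unknown factor: {factor}")
        self.nodes[entity_id].latents[factor] = list(values)


if __name__ == "__main__":
    graph = WorldGraph()
    graph.add_entity(1)
    graph.add_entity(2)
    graph.relate(2, 1)
    graph.update_factor(1, "physics", [1.0, 0.0, 0.0, 0.0])
    print(sorted(graph.edges), graph.nodes[1].latents["physics"])
```

The point of keeping the factors in separate slots is that a change in one aspect (for example, a physics update as an object moves) leaves the identity latent untouched, which is one plausible reading of how a factorized representation could help preserve object identity over time.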

Keywords: multimodal; generative AI

Subject: Computer Science and Mathematics - Computer Vision and Graphics

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
