Preprint
Short Note

This version is not peer-reviewed.

Mono-Splat: Real-Time Photorealistic Human Avatar Reconstruction from Monocular Webcam Video via Deformable 3D Gaussian Splatting

Submitted: 30 December 2025

Posted: 31 December 2025


Abstract
High-fidelity telepresence requires real-time reconstruction of photorealistic 3D avatars to enable immersive interaction. Current solutions face a dichotomy: they are either computationally expensive multi-view systems (e.g., Codec Avatars) or lightweight mesh-based approximations that suffer from the "uncanny valley" effect due to a lack of high-frequency detail. In this paper, we propose Mono-Splat, a novel framework for reconstructing high-fidelity, animatable human avatars from a single monocular webcam video stream. Our method leverages 3D Gaussian Splatting (3DGS) combined with a lightweight deformation field driven by standard 2D facial landmarks. Unlike Neural Radiance Fields (NeRFs), which typically suffer from slow inference due to volumetric ray-marching, our explicit Gaussian representation enables rendering at >45 FPS on consumer hardware. We further introduce a landmark-guided initialization strategy to mitigate the depth ambiguity inherent in monocular footage. Extensive experiments demonstrate that our approach outperforms existing NeRF-based and mesh-based methods in both rendering quality (PSNR/SSIM) and inference speed, presenting a viable, accessible pathway for next-generation VR telepresence.
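To illustrate the core idea of a landmark-driven deformation field, the sketch below shows one minimal way 2D facial-landmark offsets could displace canonical 3D Gaussian centers via per-Gaussian blend weights. This is not the authors' implementation; the function name, the linear blending scheme, and the fixed-depth assumption are all illustrative choices made here for clarity.

```python
import numpy as np

def deform_gaussians(means, landmarks, neutral_landmarks, weights):
    """Displace 3D Gaussian centers from 2D landmark motion (toy sketch).

    means:             (N, 3) canonical 3D Gaussian centers
    landmarks:         (K, 2) current 2D facial landmarks
    neutral_landmarks: (K, 2) landmarks in the neutral/canonical frame
    weights:           (N, K) per-Gaussian landmark influence (rows sum to 1)

    Returns (N, 3) deformed centers. Depth (z) is left unchanged here,
    standing in for whatever depth prior resolves monocular ambiguity.
    """
    offsets_2d = landmarks - neutral_landmarks            # (K, 2) landmark motion
    disp_xy = weights @ offsets_2d                        # (N, 2) blended x/y shift
    disp = np.concatenate([disp_xy, np.zeros((len(means), 1))], axis=1)
    return means + disp

# Toy usage: 4 Gaussians influenced equally by 2 landmarks.
means = np.zeros((4, 3))
neutral = np.array([[0.0, 0.0], [1.0, 1.0]])
current = neutral + np.array([[0.1, 0.0], [0.0, -0.1]])
w = np.full((4, 2), 0.5)
deformed = deform_gaussians(means, current, neutral, w)
print(deformed.shape)  # (4, 3)
```

In a real system the linear blend would be replaced by a learned deformation network, and the weights would be optimized jointly with the Gaussian parameters rather than fixed.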
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.
