Volumetric Video vs Gaussian Splatting

Comparison

Volumetric Video and Gaussian Splatting both promise to bring the real world into 3D — but they approach the problem from fundamentally different directions. Volumetric video is a capture-and-playback paradigm: multi-camera arrays record performances as sequences of 3D data, producing content that viewers can observe from any angle. Gaussian splatting is a reconstruction-and-rendering paradigm: a set of photographs is optimized into a cloud of 3D Gaussian primitives that can be rendered in real time from novel viewpoints.

In 2025–2026 the boundary between these two technologies has begun to blur. 4DViews, a leading volumetric capture studio, announced native Gaussian splatting output at SIGGRAPH 2025, and Superman (2025) became the first major motion picture to ship dynamic Gaussian splatting sequences. Meanwhile, the Khronos Group published a release-candidate glTF extension for Gaussian splats (KHR_gaussian_splatting), signaling that the format is entering the same kind of standardization pipeline that mesh-based volumetric video has occupied for years. Understanding where each technology excels, and where they converge, is essential for anyone building spatial computing experiences, immersive media pipelines, or real-time 3D applications.

Feature Comparison

| Dimension | Volumetric Video | Gaussian Splatting |
| --- | --- | --- |
| Primary purpose | Capture and replay real-world performances as 3D video sequences | Reconstruct static or dynamic scenes from photographs for real-time rendering |
| Capture requirements | Dedicated multi-camera stages (50–100+ synchronized cameras) or emerging sparse-camera rigs (4–8 cameras) | As few as 20–50 casual photographs; increasingly from phone video or drone footage |
| Output representation | Sequences of textured 3D meshes (one per frame), or newer neural/splat-based formats | Cloud of millions of 3D Gaussian ellipsoids with color, opacity, and view-dependent appearance |
| Rendering speed | Mesh playback at 30–60 FPS in engines; streaming adds latency | Real-time 30–200+ FPS on desktop GPUs; 116 FPS on mobile devices with Mobile-GS (2025) |
| Visual quality | High fidelity for human subjects; quality depends on camera count and reconstruction pipeline | Photorealistic novel views with sharp detail on hair, foliage, reflections, and translucency |
| Dynamic content support | Native: designed for temporal sequences of moving subjects | Emerging via 4D Gaussian splatting; first major use in film production in 2025 |
| File size and streaming | Gigabytes per minute of high-quality content; neural codecs improving compression | Compact (4.8 MB mobile-optimized scenes); glTF standardization enabling web streaming |
| Relighting and editing | Mesh-based output integrates with standard lighting and compositing tools | Historically limited; disentangled Gaussian splatting (SIGGRAPH Asia 2025) enables relighting |
| Production maturity | Established studios (Microsoft, Metastage, Dimension Studio) with commercial pipelines | Rapid adoption: Zillow, Apartments.com, DJI, Esri, and The Foundry Nuke 17.0 all added support in 2025 |
| Standardization | MPEG V3C and MIV standards for compression and streaming | Khronos glTF KHR_gaussian_splatting extension (release candidate 2025) |
| Accessibility | Requires dedicated capture infrastructure and specialized playback | Capturable with a phone; viewable in browsers via WebGPU |
| AI/generative integration | Neural volumetric representations reducing camera requirements; generative 3D video from text emerging | Generative models (e.g., World Labs' Marble) produce complete splat scenes from images or text |

Detailed Analysis

Capture Workflow and Accessibility

The single largest practical difference between these technologies is the barrier to entry. Volumetric video has historically required purpose-built capture stages costing millions of dollars, staffed by specialized technicians. Companies like Metastage and Dimension Studio operate these facilities as a service, but access remains limited to well-funded productions. AI-driven sparse-camera approaches are lowering the bar — Nokia's standards-based real-time volumetric communication system and 8i's mobile capture solutions both demonstrate that volumetric capture is moving beyond the studio — but the infrastructure remains substantially heavier than Gaussian splatting's requirements.

Gaussian splatting, by contrast, can reconstruct a scene from a set of photographs taken with a consumer camera or smartphone. DJI's integration of splatting into its Terra drone mapping software and Zillow's deployment in SkyTours real estate listings both illustrate how the technology meets users where they already are. For spatial computing developers who need to digitize real environments, splatting has become the pragmatic default.

Rendering Performance and Platform Reach

Gaussian splatting's main architectural advantage is rendering speed. Because splats are rasterized via GPU-friendly projection and alpha compositing, rather than requiring per-ray neural network evaluation as in NeRF, frame rates of 100+ FPS are routine on desktop hardware. The Mobile-GS system demonstrated 116 FPS on mobile devices in 2025, with scene files compressed to just 4.8 MB. Combined with WebGPU browser rendering, this makes Gaussian splat scenes nearly as accessible as photographs on the open web.
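The alpha compositing step can be sketched for a single pixel. This is a minimal illustration, not the renderer of any particular system: it assumes the splats covering the pixel have already been projected to screen space and sorted nearest-first, and the function name `composite_pixel` and its input format are hypothetical.

```python
from typing import Iterable, Tuple

Color = Tuple[float, float, float]


def composite_pixel(splats: Iterable[Tuple[Color, float]],
                    min_transmittance: float = 1e-4) -> Color:
    """Front-to-back alpha compositing for one pixel.

    `splats` yields (rgb, alpha) pairs sorted nearest-first. Here alpha is
    assumed to be the splat's opacity multiplied by its 2D Gaussian falloff
    at this pixel, so it lies in [0, 1].
    """
    r = g = b = 0.0
    transmittance = 1.0  # fraction of light not yet absorbed by nearer splats
    for (cr, cg, cb), alpha in splats:
        weight = transmittance * alpha
        r += weight * cr
        g += weight * cg
        b += weight * cb
        transmittance *= 1.0 - alpha
        if transmittance < min_transmittance:
            break  # pixel is effectively opaque; remaining splats are hidden
    return (r, g, b)


# A half-transparent red splat in front of an opaque blue one.
print(composite_pixel([((1.0, 0.0, 0.0), 0.5), ((0.0, 0.0, 1.0), 1.0)]))
```

The loop is only multiply-adds plus an early exit, with no neural network evaluation per sample, which is the property the paragraph above attributes to splat rasterization.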

Volumetric video playback, particularly for mesh-based sequences, demands more from the rendering pipeline. Each frame is a distinct 3D mesh that must be decoded, uploaded to the GPU, and rendered — a workflow that strains both bandwidth and compute. Neural compression codecs are improving streaming efficiency, but volumetric video remains a heavier lift for consumer devices, especially in mixed reality headsets where thermal and power budgets are tight.
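A rough size estimate shows why per-frame meshes strain bandwidth. The sketch below is an illustration under stated assumptions, not a measurement of any real pipeline: the function name, the vertex layout (32-bit positions and texture coordinates, 32-bit indices, one 8-bit RGB texture per frame), and the example mesh size are all hypothetical, and the result is the raw size before any codec is applied.

```python
def mesh_sequence_megabytes_per_minute(vertices: int, triangles: int,
                                       texture_px: int, fps: int = 30) -> float:
    """Uncompressed size of one minute of per-frame textured meshes, in MB (10**6 bytes)."""
    position_bytes = vertices * 3 * 4            # xyz as 32-bit floats
    uv_bytes = vertices * 2 * 4                  # texture coordinates as 32-bit floats
    index_bytes = triangles * 3 * 4              # three 32-bit indices per triangle
    texture_bytes = texture_px * texture_px * 3  # square 8-bit RGB texture
    frame_bytes = position_bytes + uv_bytes + index_bytes + texture_bytes
    return frame_bytes * fps * 60 / 1e6


# Hypothetical capture: 20,000 vertices, 40,000 triangles, 2048x2048 texture, 30 FPS.
raw_mb = mesh_sequence_megabytes_per_minute(20_000, 40_000, 2048)
print(f"{raw_mb:,.0f} MB of raw data per minute")
```

With these placeholder numbers the raw stream comes to roughly 24 GB per minute, dominated by the per-frame texture, which is why compression is central to any mesh-based delivery path.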

Dynamic Content and Temporal Performance

This is where volumetric video retains its clearest advantage. It was designed from the ground up to capture motion — human performances, athletic events, live action. The output is inherently temporal: a sequence of frames representing continuous movement. For applications like immersive sports broadcasting, telepresence with life-sized 3D avatars, and concert replays, volumetric video provides a mature, proven pipeline.

Dynamic Gaussian splatting (4D splatting) is closing the gap rapidly. The use of dynamic splats in Superman (2025) marked a watershed moment for the technique in visual effects production. Research from SIGGRAPH Asia 2025 on disentangled Gaussian splatting demonstrated high-fidelity relightable volumetric video through geometry-appearance decoupling. However, 4D splatting workflows remain less mature than traditional volumetric pipelines for long-duration, multi-actor performances.

Visual Fidelity and Scene Complexity

Gaussian splatting excels at reproducing fine geometric detail and view-dependent appearance effects. Hair, foliage, fences, glass, and reflective surfaces — elements that frequently defeat mesh-based reconstruction — render convincingly as splats. The continuous, soft nature of Gaussian primitives naturally handles semi-transparent and fuzzy boundaries that would require extremely dense meshes to approximate.

Volumetric video's visual quality for human subjects is high when captured in professional studios with dense camera arrays, but the mesh reconstruction step can introduce artifacts on fine details. The advantage of mesh-based output, however, is compatibility with established compositing and lighting pipelines — artists can relight, reshade, and integrate mesh-based volumetric captures into scenes using familiar tools. The disentangled splatting research is bringing similar capabilities to Gaussian representations, but tooling maturity still favors meshes for post-production work.

Standardization and Ecosystem Maturity

Both technologies now rest on formal standardization work. Volumetric video benefits from MPEG's V3C (Visual Volumetric Video-based Coding) and MIV (MPEG Immersive Video) standards, which provide a framework for compression, streaming, and interoperability. These are backed by major players like Nokia and Microsoft.

Gaussian splatting gained its own standardization anchor when the Khronos Group released a candidate glTF extension (KHR_gaussian_splatting) for storing splats in glTF 2.0 files. With glTF already the de facto interchange format for real-time 3D, this positions splats for broad tool and engine adoption. Esri's ArcGIS Pro, The Foundry's Nuke 17.0, and 3DVista's virtual tour platform all shipped native splat support in 2025, signaling that the ecosystem is maturing at an extraordinary pace.
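Because glTF 2.0 files declare the extensions they rely on in a top-level `extensionsUsed` array, a tool can detect splat content before trying to load it. The following is a minimal sketch that covers only the JSON (.gltf) form, not binary .glb containers; the function name is hypothetical, and it does not parse the extension's own payload, whose schema is still at the candidate stage.

```python
import json

SPLAT_EXTENSION = "KHR_gaussian_splatting"  # extension name as given in the text


def declares_splat_extension(gltf_json_text: str) -> bool:
    """Return True if a glTF 2.0 JSON document lists the splat extension."""
    document = json.loads(gltf_json_text)
    return SPLAT_EXTENSION in document.get("extensionsUsed", [])


# Hypothetical minimal documents, for illustration only.
with_splats = '{"asset": {"version": "2.0"}, "extensionsUsed": ["KHR_gaussian_splatting"]}'
mesh_only = '{"asset": {"version": "2.0"}}'
print(declares_splat_extension(with_splats), declares_splat_extension(mesh_only))
```

A viewer without splat support could use a check like this to fall back gracefully instead of failing mid-load.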

The Convergence Path

The most significant trend in 2025–2026 is convergence. 4DViews — one of the longest-running volumetric capture studios — announced Gaussian splatting output at SIGGRAPH 2025, effectively acknowledging that splats are becoming the preferred rendering representation even for studio-captured volumetric content. Generative AI is accelerating this convergence: models can now synthesize Gaussian splat scenes from text or single images, while neural volumetric video systems can reconstruct 3D performances from monocular input.

The future likely involves volumetric capture workflows (multi-camera rigs, depth sensors, LiDAR) feeding into Gaussian splatting representations for delivery and rendering. The capture infrastructure of volumetric video and the rendering efficiency of Gaussian splatting are complementary, not competing — and the industry is building pipelines that combine both.

Best For

Immersive Sports Broadcasting

Volumetric Video

Live sports require capturing continuous, multi-person action across a full playing field. Volumetric video's purpose-built capture infrastructure and temporal pipeline handle this natively, though Gaussian splatting may handle replay rendering in future workflows.

Real Estate and Property Tours

Gaussian Splatting

Zillow and Apartments.com have already deployed splatting in production. A phone or drone captures enough data, file sizes are small enough for web delivery, and the photorealistic quality sells properties. No studio required.

Film and VFX Production

Depends on Shot

Dynamic human performances still favor volumetric capture stages for reliability, but static environments and set extensions increasingly use Gaussian splatting. The Superman production used both. Expect hybrid pipelines.

Telepresence and Remote Collaboration

Volumetric Video

Real-time communication with life-sized 3D avatars of actual people requires the temporal capture pipeline that volumetric video provides. Nokia's standards-based system demonstrates this is moving toward consumer readiness.

E-Commerce Product Visualization

Gaussian Splatting

Capture a product with a phone, generate a photorealistic 3D view in minutes, and embed it on the web via WebGPU. Splatting's low capture cost and browser-native rendering make it the clear choice for product pages.

Cultural Heritage and Site Preservation

Gaussian Splatting

Esri's ArcGIS integration and drone-based capture workflows make splatting ideal for digitizing historical sites, monuments, and landscapes with photorealistic fidelity and compact storage.

Live Concert and Event Replay

Volumetric Video

Capturing a dynamic, multi-performer event for later 6DoF replay requires the synchronized multi-camera infrastructure of volumetric video. 4D splatting is approaching viability here but is not yet production-ready for long-duration events.

Game Environment Creation

Gaussian Splatting

Photorealistic environments captured from real-world locations and rendered at 100+ FPS fit comfortably within game frame-rate budgets. glTF standardization gives splats a path into engine asset pipelines, and Nuke support covers related compositing and cinematic work.

The Bottom Line

Gaussian splatting is the more transformative technology in 2025–2026. Its combination of casual capture (phone or drone), real-time rendering (100+ FPS on desktop, 116 FPS on mobile), compact file sizes, and rapid ecosystem adoption — from Khronos glTF standardization to production deployments at Zillow, DJI, and major VFX houses — makes it the default choice for most 3D capture and rendering scenarios. If you are building any application that needs to bring real-world environments into interactive 3D, start with Gaussian splatting.

Volumetric video retains an essential role for dynamic human performance capture. Live sports, telepresence, and cinematic performance capture still demand the synchronized multi-camera infrastructure and temporal pipelines that volumetric video provides. The market reflects this: the volumetric video market is projected to grow from $5.29 billion in 2026 to $35 billion by 2034, driven largely by sports, entertainment, and enterprise communication. But even here, the delivery format is shifting: studios like 4DViews now output Gaussian splats, using volumetric capture for acquisition and splatting for rendering.
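The growth rate implied by that projection can be worked out with the standard compound annual growth rate formula. This is a quick sketch assuming the two figures quoted above mark the start and end of an eight-year span; the function name is hypothetical.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# $5.29 billion in 2026 growing to $35 billion by 2034.
rate = implied_cagr(5.29, 35.0, 2034 - 2026)
print(f"{rate:.1%}")  # about 26.6% per year
```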

The practical recommendation: use Gaussian splatting as your default 3D representation for static scenes, environments, products, and short dynamic sequences. Use volumetric video capture infrastructure when you need extended temporal performances of real people. And architect your pipeline to expect convergence: volumetric capture feeding into Gaussian splatting representations for delivery is emerging as a common workflow.