Gaussian Splatting with Physical Dynamics Awareness – Use in Virtual Reality

Imagine picking up a virtual teddy bear, squeezing it, and watching it realistically deform in your hands while standing in your living room in virtual reality. This isn't science fiction – it's the new VR-GS system.

We're seeing an explosion in demand for high-quality 3D content across industries. Hollywood films, architectural visualization, video games, virtual training simulations: they all have an appetite for realistic, interactive 3D experiences. But here's the catch: creating this kind of content has been a time-consuming and expertise-heavy process. VR-GS is a new system that tackles this problem by combining advanced computer graphics techniques with physics simulation. The question it answers: how do we make virtual objects not just look real, but feel real when we interact with them?

Using a technique called Gaussian Splatting, VR-GS reconstructs high-fidelity 3D scenes from ordinary photographs. Combined with a physics engine, those reconstructed objects behave realistically in real time and respond immediately when you touch them. How does all this work? I will explain what Gaussian Splatting is, how the new system works, and why it's a game-changer for interactive VR.

Gaussian Splatting and 3D Reconstruction

At the heart of this system is a technique called Gaussian Splatting (GS). GS is a highly efficient way to represent 3D scenes as a cloud of tiny 3D Gaussian blobs. Traditional 3D models are built from polygons; a scene made of Gaussians, by contrast, can be rendered incredibly quickly.

GS is remarkably good at capturing the nuances of real-world objects and can reproduce surfaces with stunning fidelity. This sets it apart from earlier techniques like Neural Radiance Fields (NeRF), which still struggle with real-time performance. VR-GS takes this Gaussian Splatting foundation and builds on it. It automatically segments the different objects in a scene, which is what makes individual items interactive. It also employs smart inpainting techniques to fill in areas that were obscured in the original photos, so when you move a virtual object you don't see a "hole" where it used to be.

Perhaps most importantly, VR-GS bridges the gap between this visual representation and physical simulation. The system generates a simplified "cage" around each object, on which realistic physics is computed in real time. This means that when you poke or toss a virtual object, it responds the way it would in the real world.

The Core Components of VR-GS

A. Segmented GS generation — The first step in creating a VR-GS scene is to break it down into manipulable objects, using AI-driven segmentation techniques. VR-GS automatically identifies and separates the different objects in a scene, so when you reach out to grab that virtual coffee mug, the system knows exactly which Gaussians belong to the mug and which belong to the table it's sitting on.

B. Inpainting for scene completion — One of the main challenges in reconstructing 3D scenes from photos is dealing with occlusions – areas hidden behind other objects. VR-GS adapts 2D inpainting techniques to 3D space so the system can intelligently fill in these gaps. When you move a virtual object, you'll see a realistic representation of the surface beneath it, even if it was never visible in the original photos.

C. Mesh generation for physics — Gaussian splatting is great for visual representation, but it's not well suited to physics calculations. That's where VR-GS's mesh generation comes in. The system creates a simplified tetrahedral mesh for each object, invisible to the user but essential for simulating realistic physical behavior. The mesh acts as a kind of skeleton that allows the object to deform and interact in physically realistic ways.

D. Two-level embedding — This bridges the gap between the high-detail Gaussian representation and the simpler physics mesh. By mapping Gaussians to local tetrahedra, whose vertices are in turn mapped to the global simulation mesh, VR-GS keeps deformations smooth and natural, without the "spiky" artifacts that simpler embedding schemes can produce.

The VR-GS Pipeline

This is how VR-GS transforms 3D scenes into interactive VR experiences, step by step:

Scene capture and reconstruction

The scene capture and reconstruction phase in VR-GS is the first step that transforms real-world environments into high-fidelity 3D representations suitable for VR interaction. The process begins with multi-view image capture, where a series of photographs or video frames are taken from various angles around the target scene. These images serve as the raw data for 3D reconstruction.

Next, camera calibration is performed using COLMAP, an open-source Structure-from-Motion (SfM) and Multi-View Stereo (MVS) pipeline. COLMAP analyzes the multi-view images to determine the intrinsic and extrinsic camera parameters for each view, establishes spatial relationships between the images, and produces a sparse point cloud of the scene.

The core of the reconstruction process is 3D Gaussian Splatting, an advanced technique that represents the scene as a set of 3D Gaussian kernels, each defined by learnable parameters:

1. Mean (μ): The 3D position of the kernel's center.

2. Opacity (σ): Controlling the transparency of the kernel.

3. Covariance (Σ): Determining the shape and orientation of the 3D Gaussian.

4. Spherical harmonic coefficients (C): Encoding view-dependent color information.

The parameters are optimized through a differentiable rendering process. The 3D Gaussians are projected onto 2D image planes and blended using α-compositing, the rendered images are compared to the input photographs, and the parameters are iteratively adjusted to minimize the difference between the rendered and captured images.
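To make the idea concrete, here is a minimal Python sketch of the forward part of that process: a handful of depth-sorted Gaussians, already projected to 2D, are blended into a single pixel color with front-to-back α-compositing. The pre-projected means and covariances and the single-pixel loop are simplifications for illustration, not the actual VR-GS rasterizer.

```python
# Sketch: how a pixel color is formed from depth-sorted Gaussians via
# alpha-compositing. Simplified for illustration (Gaussians are assumed to be
# already projected to 2D; only one pixel is shaded).
import numpy as np

def gaussian_weight_2d(pixel, mean2d, cov2d):
    """Evaluate an (unnormalized) 2D Gaussian at a pixel location."""
    d = pixel - mean2d
    return float(np.exp(-0.5 * d @ np.linalg.inv(cov2d) @ d))

def composite_pixel(pixel, splats):
    """Front-to-back compositing: C = sum_i c_i * a_i * prod_{j<i}(1 - a_j)."""
    color = np.zeros(3)
    transmittance = 1.0
    # splats must be sorted by depth, nearest first
    for mean2d, cov2d, opacity, rgb in splats:
        alpha = opacity * gaussian_weight_2d(pixel, mean2d, cov2d)
        color += transmittance * alpha * rgb
        transmittance *= (1.0 - alpha)
        if transmittance < 1e-4:   # early termination, as in 3DGS rasterizers
            break
    return color

# two toy splats covering the same pixel
splats = [
    (np.array([5.0, 5.0]), np.eye(2) * 2.0, 0.8, np.array([1.0, 0.2, 0.2])),
    (np.array([5.5, 5.0]), np.eye(2) * 3.0, 0.6, np.array([0.2, 0.2, 1.0])),
]
print(composite_pixel(np.array([5.0, 5.0]), splats))
```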

1. Object-level processing

In the Object-level processing phase of the VR-GS pipeline, there are two main techniques to enhance the realism and completeness of 3D content:

Segmentation

Segmentation assigns unique labels to individual objects in the 3D scene so they can be interacted with and edited separately. Additional RGB attributes are incorporated into the Gaussian kernels that define the objects; these attributes take part in rendering and make it easier to distinguish between different objects. The segmentation is optimized with a segmentation loss (L_seg) that aligns the Gaussian-based scene reconstruction with the actual object boundaries observed in the captured multi-view images. This lets the Gaussian kernels form accurate representations of the individual objects in the scene and, ultimately, detailed and editable 3D models.
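As a rough illustration, the sketch below shows one plausible form of such a loss in PyTorch: per-object label channels rendered from the Gaussians are compared against 2D object masks with a cross-entropy term. The tensor names and the exact loss form are assumptions made for this example, not the paper's exact formulation.

```python
# Simplified stand-in for a segmentation loss L_seg: label channels splatted
# from the Gaussians are compared against 2D object masks from the input views.
import torch
import torch.nn.functional as F

def segmentation_loss(rendered_logits, gt_mask):
    """
    rendered_logits: (K, H, W) per-object label channels rendered from the Gaussians
    gt_mask:         (H, W) integer object IDs from a 2D segmentation model
    """
    return F.cross_entropy(rendered_logits.unsqueeze(0), gt_mask.unsqueeze(0))

# toy example: 3 objects on a 4x4 image
logits = torch.randn(3, 4, 4, requires_grad=True)
mask = torch.randint(0, 3, (4, 4))
loss = segmentation_loss(logits, mask)
loss.backward()  # in the full system, gradients would flow back through the
                 # differentiable rasterizer to the per-Gaussian label attributes
```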

Inpainting

Inpainting addresses occluded areas of the scene – parts of objects that were hidden from view during image capture. These areas would otherwise appear incomplete or missing in the reconstructed 3D model. The solution is 2D-guided 3D inpainting: a 2D image inpainting model supplies the missing textures and features, and the 3D Gaussian kernels in the occluded regions are optimized to match them. Afterwards, virtual objects appear visually consistent and complete, even if parts of them were hidden during capture.

2. Physics-ready mesh generation

In this stage, the focus shifts towards preparing 3D content for real-time physical simulations. It involves two processes:

VDB representation of Gaussian centers

The 3D scene is initially represented by a collection of Gaussian splats, each describing a localized region of space with attributes such as opacity, color, and covariance. These are converted into a structured form using a VDB representation, a hierarchical sparse volumetric data structure that efficiently encodes the Gaussian kernels and captures both the surface and the internal geometry of the scene. The result is a detailed yet compact depiction of each object that preserves its essential physical attributes. The VDB framework is particularly well suited to large-scale scenes because it handles the empty space between objects efficiently, which makes the system more scalable for real-time simulation.
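A toy Python sketch of the underlying idea: Gaussian centers are binned into a sparse set of occupied voxels, so only the space that actually contains splats is stored. A real implementation would use an OpenVDB/NanoVDB grid; the set of voxel coordinates below is just a stand-in.

```python
# Rough sketch of turning Gaussian centers into a sparse voxel occupancy, in the
# spirit of a VDB-style structure (only occupied space is stored).
import numpy as np

def voxelize_centers(centers, voxel_size=0.05):
    """Map each Gaussian center to a sparse set of occupied voxel coordinates."""
    occupied = set()
    for p in centers:
        ijk = tuple(np.floor(p / voxel_size).astype(int))
        occupied.add(ijk)
    return occupied

centers = np.random.rand(10_000, 3)          # stand-in for the Gaussian means
grid = voxelize_centers(centers, voxel_size=0.05)
print(f"{len(grid)} occupied voxels out of {round(1 / 0.05) ** 3} possible")
```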

Tetrahedral mesh generation using TetGen

Once the VDB representation is established, the next step is to generate a tetrahedral mesh for each object or group of objects in the scene. Tetrahedral meshes are needed for real-time physical simulation because they are the basis for calculating deformations, collisions, and other dynamics. Using the TetGen algorithm, a finite element mesh is constructed by filling each object's internal structure with interconnected tetrahedral elements, transferring the volumetric data from the VDB into a physics-ready mesh.
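For a sense of what this step looks like in practice, here is a small sketch using the Python bindings of TetGen (the `tetgen` package built on PyVista), assuming its standard API; a PyVista sphere stands in for the surface extracted from the VDB representation.

```python
# Sketch: tetrahedralize a watertight surface mesh with the TetGen Python bindings.
# The sphere is a stand-in for the surface obtained from the VDB step.
import pyvista as pv
import tetgen

surface = pv.Sphere(radius=0.5)                          # stand-in surface mesh
tet = tetgen.TetGen(surface)
tet.tetrahedralize(order=1, mindihedral=20, minratio=1.5)

grid = tet.grid                                          # tetrahedral simulation mesh
print(grid.n_points, "simulation vertices,", grid.n_cells, "tetrahedra")
```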

The tetrahedral meshes also serve as the backbone for real-time deformation and collision detection in interactive VR. They enable objects to move, deform, and interact with physical accuracy in response to user input and environmental forces.

3. Two-level embedding

The two-level embedding approach ensures that the 3D Gaussians follow the object's overall deformation while still behaving realistically during interactions.

Local embedding

The first level of embedding envelops each individual Gaussian kernel within a small tetrahedron. The tetrahedron acts as a localized structure that captures the immediate geometry around the Gaussian. Its advantage is that it provides a simple, rigid frame that carries the Gaussian kernel as it moves and deforms, preserving the spatial relationships with neighboring points in the mesh. Because each Gaussian kernel has its own dedicated tetrahedron, its motion and deformation can be controlled with a high degree of precision, which helps mitigate the spiking and stretching artifacts that can occur during object manipulation.

Global embedding

The second level takes the vertices of the local tetrahedra and embeds them into a larger simulation mesh that spans the entire object or scene. Its role is to enable broad-scale, physics-driven deformations such as elasticity or collision response. The global mesh captures the overall behavior of the object as it interacts with external forces and with other objects in the environment.

Together, the local and global embeddings form a layered deformation approach: the local embedding takes care of how each individual Gaussian behaves, and the global embedding keeps those local deformations consistent with the broader physical behavior of the object.
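A minimal NumPy sketch of the core mechanism, with illustrative variable names: a Gaussian center is expressed once in barycentric coordinates of its enclosing rest-pose tetrahedron, and after the simulation moves the tetrahedron's vertices the same coordinates reproduce the deformed center.

```python
# Sketch of tetrahedral embedding: express a point in barycentric coordinates
# at rest, then let it follow the tetrahedron as the vertices deform.
import numpy as np

def barycentric_coords(p, verts):
    """Barycentric coordinates of point p w.r.t. a tetrahedron (4x3 vertices)."""
    T = np.column_stack([verts[1] - verts[0], verts[2] - verts[0], verts[3] - verts[0]])
    b123 = np.linalg.solve(T, p - verts[0])
    return np.concatenate([[1.0 - b123.sum()], b123])

rest_tet = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
gaussian_center = np.array([0.25, 0.25, 0.25])
bary = barycentric_coords(gaussian_center, rest_tet)     # computed once, at rest

# after the physics step the tetrahedron's vertices have moved/stretched:
deformed_tet = rest_tet * np.array([1.5, 1.0, 1.0])      # stretched along x
deformed_center = bary @ deformed_tet                    # the center follows the tet
print(deformed_center)                                   # [0.375, 0.25, 0.25]
```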

4. Real-time physics simulation

In the real-time physics simulation phase, the system has to ensure that objects respond to physical forces in a way that is both realistic and computationally efficient. This is done with eXtended Position-Based Dynamics (XPBD), supported by strain energy constraints for elasticity and a velocity-based damping model to keep object movement and interactions stable.

eXtended Position-based Dynamics (XPBD)

XPBD is a powerful and flexible framework for simulating complex physical interactions in real time. It extends traditional position-based dynamics (PBD) with compliance parameters, which make constraint stiffness independent of the time step and iteration count and allow more sophisticated physics-based behaviors. In XPBD, object positions are adjusted directly in each simulation step to satisfy constraints such as collisions, volume preservation, and elasticity, without requiring explicit force or acceleration calculations. This makes XPBD well suited to real-time VR applications, where performance and responsiveness are critical, and lets VR-GS simulate object deformations and interactions with minimal computational overhead.
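As a concrete illustration, here is a bare-bones XPBD projection of a single distance constraint between two particles, following the standard XPBD update rule; VR-GS uses richer strain-based constraints, so this is only a sketch of the mechanism.

```python
# Sketch of one XPBD constraint projection (a distance constraint): the
# compliance alpha makes stiffness independent of iteration count and time step.
import numpy as np

def xpbd_distance_constraint(x0, x1, w0, w1, rest_len, lam, alpha, dt):
    """Project one distance constraint; returns corrected positions and multiplier."""
    d = x1 - x0
    dist = np.linalg.norm(d)
    n = d / dist
    C = dist - rest_len                       # constraint violation
    alpha_tilde = alpha / (dt * dt)
    dlam = (-C - alpha_tilde * lam) / (w0 + w1 + alpha_tilde)
    x0 = x0 - w0 * dlam * n                   # w0, w1 are inverse masses
    x1 = x1 + w1 * dlam * n
    return x0, x1, lam + dlam

# two particles stretched beyond their rest length of 1.0
x0, x1 = np.array([0.0, 0.0, 0.0]), np.array([1.5, 0.0, 0.0])
x0, x1, lam = xpbd_distance_constraint(x0, x1, 1.0, 1.0, 1.0, 0.0, alpha=1e-6, dt=1 / 90)
print(np.linalg.norm(x1 - x0))                # pulled back toward the rest length
```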

Strain energy constraints for elasticity

To model the elastic behavior of objects – how they stretch and deform under applied forces – VR-GS incorporates strain energy constraints. These measure how much an object's shape has changed from its rest configuration and apply corrections that restore it toward its natural state, much as real-world elastic materials do. The strain energy function governs how the object deforms under pressure or collisions (stretching, compression, twisting, and so on).
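To make this more tangible, the sketch below computes a deformation gradient for one tetrahedron and evaluates a simple St. Venant–Kirchhoff energy from the Green strain. The specific energy model is chosen for illustration; it is not necessarily the exact formulation used in VR-GS.

```python
# Sketch of a strain energy: deformation gradient F of a tetrahedron, Green
# strain E, and a St. Venant-Kirchhoff energy density (zero when F is a rotation).
import numpy as np

def deformation_gradient(rest, deformed):
    """F maps rest-space edge vectors of a tet (4x3 vertices) to deformed ones."""
    Dm = np.column_stack([rest[i] - rest[0] for i in (1, 2, 3)])
    Ds = np.column_stack([deformed[i] - deformed[0] for i in (1, 2, 3)])
    return Ds @ np.linalg.inv(Dm)

def stvk_energy_density(F, mu=1.0, lam=1.0):
    E = 0.5 * (F.T @ F - np.eye(3))           # Green strain
    return mu * np.sum(E * E) + 0.5 * lam * np.trace(E) ** 2

rest = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
deformed = rest * np.array([1.2, 1.0, 1.0])   # 20% stretch along x
F = deformation_gradient(rest, deformed)
print(stvk_energy_density(F))                 # grows as the tet departs from rest shape
```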

Velocity-based damping model

The objective of this model is to keep objects from behaving in an overly bouncy or jittery way. It controls the rate at which an object's velocity decreases over time, especially after collisions or when external forces such as gravity act on it. The damping model mimics the effects of friction and air resistance by slowing objects down and preventing them from moving indefinitely, so the system avoids erratic or unrealistic motion.
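A tiny sketch of what such damping can look like in code: the velocity is scaled down exponentially each step, with an illustrative coefficient rather than the one used in VR-GS.

```python
# Sketch of velocity-based damping: exponential decay applied after each step
# removes jitter and lets motion settle, loosely mimicking friction and drag.
import numpy as np

def damp_velocities(velocities, damping=2.0, dt=1 / 90):
    return velocities * np.exp(-damping * dt)

v = np.array([[0.0, 3.0, 0.0]])
for _ in range(90):               # one simulated second at 90 Hz
    v = damp_velocities(v)
print(v)                          # roughly e^-2 of the initial speed
```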

5. Rendering and interaction

In this stage the system focuses on visual realism and responsiveness during user interactions with 3D objects. Two key components of this phase ensure that the virtual scene looks realistic and reacts smoothly to user input and physical simulations.

Custom rendering pipeline with dynamic shadow mapping

The custom rendering pipeline is designed specifically for the combination of Gaussian splatting and mesh-based simulation, and is optimized for rendering 3D objects represented by Gaussian kernels so that complex scenes can be visualized in real time.

Dynamic shadow mapping adds visual realism to the scene. Traditional Gaussian splatting methods typically rely on static shadows baked into the reconstructed appearance, which do not adapt when objects move or deform. In VR-GS, the system generates dynamic shadows in real time that update with the positions and movement of light sources and objects in the scene. These shadows improve depth perception and the sense of spatial relationships between objects, and they move and deform naturally as objects change shape or position.
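The sketch below shows the classic depth-comparison test at the heart of shadow mapping, which is re-evaluated every frame as lights and objects move; projection details and the Gaussian-specific parts of the VR-GS renderer are omitted.

```python
# Sketch of the shadow-map test: a point is in shadow if something closer to the
# light was recorded in the shadow map at its projected location.
import numpy as np

def in_shadow(depth_from_light, shadow_map, uv, bias=1e-3):
    """Compare a point's depth (seen from the light) against the shadow map."""
    h, w = shadow_map.shape
    x = min(int(uv[0] * w), w - 1)
    y = min(int(uv[1] * h), h - 1)
    return depth_from_light > shadow_map[y, x] + bias

shadow_map = np.full((256, 256), 10.0)                 # nothing closer than 10 units...
shadow_map[100:150, 100:150] = 3.0                     # ...except one occluder
print(in_shadow(5.0, shadow_map, uv=(0.45, 0.45)))     # True: point is behind the occluder
print(in_shadow(5.0, shadow_map, uv=(0.9, 0.9)))       # False: light reaches the point
```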

Deformation of Gaussians based on simulated mesh dynamics

The deformation of the Gaussians is directly tied to the underlying simulated mesh dynamics. Each Gaussian kernel, which represents a small region of the 3D scene, is connected to the deformable mesh through the two-level embedding described above. As the mesh deforms under physical forces (collisions, stretching, compression, and so on), the Gaussians deform in real time to reflect those changes, keeping the visual representation of the object consistent with its physical simulation. For example, if an object is squeezed or bent, the Gaussian kernels stretch or compress accordingly, so the object looks as if it is physically reacting to the force.
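As a simplified illustration of this coupling, the sketch below applies a local affine deformation (a deformation gradient taken from the enclosing tetrahedron) to a Gaussian's mean and covariance, so the splat stretches and rotates with the mesh. The helper name and values are illustrative.

```python
# Sketch: deform a 3D Gaussian with a local affine map x -> F x + t, where F is
# the deformation gradient of its enclosing tetrahedron.
import numpy as np

def deform_gaussian(mean, cov, F, translation):
    """Apply an affine deformation to a 3D Gaussian (mean and covariance)."""
    new_mean = F @ mean + translation
    new_cov = F @ cov @ F.T                   # covariance transforms as F Sigma F^T
    return new_mean, new_cov

mean = np.zeros(3)
cov = np.eye(3) * 0.01                        # a small isotropic splat
F = np.diag([2.0, 1.0, 1.0])                  # enclosing tet stretched 2x along x
new_mean, new_cov = deform_gaussian(mean, cov, F, translation=np.zeros(3))
print(np.diag(new_cov))                       # [0.04, 0.01, 0.01]: splat elongated in x
```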

Conclusion

VR-GS is a breakthrough for VR and interactive 3D content. Its core innovation is representing complex 3D objects with Gaussian kernels, which makes rendering fast and efficient, while simultaneously embedding those objects in a real-time physics simulation framework. This dual focus on visual and physical accuracy means that virtual objects not only look real but also behave as they would in the real world when manipulated. The system's custom rendering pipeline and dynamic shadow mapping further enhance immersion, making the experience feel authentic.

By making the creation of realistic, interactive 3D content more accessible and efficient, VR-GS is poised to revolutionize industries such as entertainment, architecture, and virtual training. As demand for high-quality 3D experiences continues to grow, systems like VR-GS will be at the forefront of delivering next-generation virtual reality interactions.