Stereoscopy and Depth Perception in XR — Utilizing Natural Binocular Disparity

The ability to perceive depth and three-dimensionality has been a core component of the human visual system for millennia. So it is no wonder that as devices for presenting virtual experiences have evolved, stereoscopy — the practice of presenting separate left-eye and right-eye images to create a perception of depth — has remained a fundamental technique. From early 3D movies to today's advanced extended reality technologies, stereoscopy continues to enhance realism, presence and immersion for users.

In today's XR development, stereoscopy plays an essential role in bringing extended reality (virtual, augmented and mixed reality) to the mass market. Whether through VR headsets that present stereoscopic images, AR glasses that overlay 3D holograms onto the real world, or mixed reality experiences that blend physical and digital objects, stereoscopy is what allows these technologies to go beyond a "flat screen in a headset" and feel genuinely three-dimensional.

The sense of realism, immersion and "being there" created by stereoscopic displays has been shown to dramatically improve user experience, engagement and performance across applications as diverse as spatial training, productivity tools, gaming and simulation. This has made stereoscopy a driving focus for developers and a crucial differentiator for the XR systems that employ it.

As extended reality moves closer to fulfilling its promise of augmenting human capability, stereoscopy remains essential to generating the depth cues, perception of scale and sense of presence needed for XR content to feel truly realistic and useful.

The Definition

Stereoscopy is a key technique enabling the sense of three-dimensional depth perception in XR. It works by presenting two separate images, one for the left eye and one for the right, mimicking the binocular vision of the human visual system. When our two eyes view an object or scene from slightly different angles, the brain integrates these differences and reconstructs a three-dimensional representation. This phenomenon, known as stereopsis, allows us to perceive depth and map our spatial environment.

In XR systems, stereoscopy is achieved by displaying two offset images, corresponding to the left and right eye views, on the respective lenses of the headset. The small angular disparity between these two images is sufficient to supply the depth cues the brain needs to perceive virtual objects as having volume, relief and three-dimensional pop-out from the screen.

Stereoscopy is, in effect, a critical visual hack in XR. It presents two-dimensional imagery on flat displays yet tricks the brain into perceiving a fully immersive, volumetric virtual world. The depth cues enabled by appropriate stereoscopic rendering are essential for providing high degrees of realism, presence and immersion in virtual environments, because they leverage the brain's natural binocular vision system to produce a convincing sense of three-dimensional space and volume.

Rendering the Scene

Stereoscopy is instrumental in transporting the user into immersive synthetic worlds, and there are many techniques for achieving it in VR. The most common approach is to render the scene twice, once from the perspective of each eye, and then direct the corresponding images to each eye using special lenses or displays.
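As a rough illustration of the two-pass approach, the minimal sketch below builds one view matrix per eye by offsetting a single head pose by half the interpupillary distance (IPD) in each direction. The function names, the 63 mm default IPD and the look-at construction are illustrative assumptions, not the API of any particular engine.

```python
import numpy as np

def look_at(eye, target, up):
    """Build a right-handed view matrix looking from `eye` toward `target`."""
    f = target - eye
    f = f / np.linalg.norm(f)          # forward
    r = np.cross(f, up)
    r = r / np.linalg.norm(r)          # right
    u = np.cross(r, f)                 # true up
    m = np.eye(4)
    m[0, :3], m[1, :3], m[2, :3] = r, u, -f
    m[:3, 3] = -m[:3, :3] @ eye        # translate world into eye space
    return m

def stereo_view_matrices(head_pos, forward, up, ipd=0.063):
    """Return (left, right) view matrices, each eye shifted by half the IPD."""
    forward = forward / np.linalg.norm(forward)
    right = np.cross(forward, up)
    right = right / np.linalg.norm(right)
    left_eye = head_pos - right * ipd / 2
    right_eye = head_pos + right * ipd / 2
    return (look_at(left_eye, left_eye + forward, up),
            look_at(right_eye, right_eye + forward, up))

# Example: a head at standing eye height, looking down -Z.
left_v, right_v = stereo_view_matrices(
    np.array([0.0, 1.6, 0.0]),
    np.array([0.0, 0.0, -1.0]),
    np.array([0.0, 1.0, 0.0]))
```

The scene would then be rendered once with each matrix, and the resulting pair of images routed to the matching eyes, which is exactly the separation job the display techniques below perform.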

Anaglyph systems use colored filters to separate the two images. This is a rather inexpensive technique, but the color filtering it relies on can cause eye strain. Polarized and shutter techniques instead use polarized lenses or synchronized shutter glasses to present the left and right views, producing a higher-quality, effectively flicker-free 3D effect, but requiring more expensive polarized or shutter-based components. Headset-style lens-based methods, which place a dedicated display and lens in front of each eye, provide the widest field of view and the most convincing depth perception, but also demand the most complex and expensive hardware.
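To make the cheapest of these concrete, the snippet below composites a classic red-cyan anaglyph: the left view supplies the red channel, the right view supplies green and blue, and colored glasses route one image to each eye. The tiny synthetic frames stand in for real left/right renders.

```python
import numpy as np

def red_cyan_anaglyph(left_rgb, right_rgb):
    """Merge two HxWx3 uint8 RGB frames into a red-cyan anaglyph."""
    out = np.empty_like(left_rgb)
    out[..., 0] = left_rgb[..., 0]    # red   <- left eye view
    out[..., 1] = right_rgb[..., 1]   # green <- right eye view
    out[..., 2] = right_rgb[..., 2]   # blue  <- right eye view
    return out

# Placeholder 2x2 frames instead of rendered eye views:
left = np.zeros((2, 2, 3), dtype=np.uint8); left[..., 0] = 200
right = np.zeros((2, 2, 3), dtype=np.uint8); right[..., 2] = 200
print(red_cyan_anaglyph(left, right)[0, 0])   # -> [200   0 200]
```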

Autostereoscopic displays do not require any eyewear: they direct a different perspective to each eye using parallax barriers, lenticular lenses or holographic optics. These approaches are useful for mobile devices but typically cannot achieve as wide a field of view or as strong a depth effect as lens-based techniques. Some high-end VR systems use head-tracking to follow the movements of the user's head and continuously update the stereoscopic rendering to match their perspective in real time. As the user pivots their head, the images seamlessly shift to match their view of the virtual space, as the short simulation below illustrates.
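This toy loop stands in for that tracking behavior: a simulated yaw angle replaces real tracker data, and both eye positions (and hence both rendered views) are recomputed every frame. The per-frame yaw step and the 63 mm IPD are assumptions for illustration only.

```python
import numpy as np

def eye_positions(head_pos, yaw_rad, ipd=0.063):
    """Eye centers for a head at `head_pos`, yawed by `yaw_rad` about +Y."""
    right = np.array([np.cos(yaw_rad), 0.0, -np.sin(yaw_rad)])
    return head_pos - right * ipd / 2, head_pos + right * ipd / 2

# Simulated tracking loop: the yaw would normally come from the headset's
# tracker; here it simply advances 10 degrees per frame.
head = np.array([0.0, 1.6, 0.0])
for frame in range(5):
    yaw = np.radians(10 * frame)
    left, right = eye_positions(head, yaw)
    print(f"frame {frame}: left={left.round(3)} right={right.round(3)}")
```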

Head-tracking enhances the sense of presence within an experience, but it also increases cost, complexity, latency and power requirements. For most VR applications, head-tracking provides diminishing returns beyond a certain threshold of accuracy and responsiveness. Each stereoscopic approach therefore carries benefits and trade-offs, from cost and convenience to depth effect and field of view.

Continued improvements in hardware, optics and 3D rendering techniques are expected to expand the reach and applications of stereoscopic VR. Introducing variable focus and resolution as a function of convergence could optimize quality and performance based on the depth and importance of different elements in a scene. Advances in light field, multi-view and super multi-view stereoscopy could likewise enable ultra-wide field-of-view 3D without sacrificing depth effect or frame rate.

Binocular Disparity and Depth Perception

Mimicking the natural binocular disparity between our eyes triggers the brain's ability to extract depth from stereopsis. That is what creates our perception of depth and dimension, allows us to judge the distances of objects and builds an enhanced sense of immersion within the scene. Even with relatively little binocular disparity, our visual system can generate a compelling perception of 3D depth and volume. Stereoscopy capitalizes on this ability by providing just enough difference between the left and right eye views to make shapes pop out and spaces feel expansive without causing visual strain.

The key is to calibrate the binocular disparity so that it feels natural and intuitive. Too much disparity can make it difficult to fuse the left and right eye images into a single 3D percept, while too little will not generate any sense of depth. By carefully controlling and balancing the disparity throughout a stereoscopic scene, immersive depth cues can then be selectively applied where needed to make virtual objects feel close up or far away, as the sketch below suggests.
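A back-of-the-envelope model of that calibration, assuming a viewer fixating a screen plane at distance d: by similar triangles, a point at depth z lands on the screen with a horizontal disparity of ipd * (z - d) / z, positive (uncrossed) behind the plane and negative (crossed) in front of it. The roughly one-degree comfort budget used for clamping below is a common rule of thumb, not a standard; real systems tune it per device, content and user.

```python
import math

def screen_disparity(depth_m, screen_dist_m, ipd_m=0.063):
    """Horizontal screen-plane disparity (meters) for a point at depth_m.

    disparity = ipd * (depth - screen_dist) / depth
    Positive = uncrossed (behind the screen plane), negative = crossed.
    """
    return ipd_m * (depth_m - screen_dist_m) / depth_m

def comfortable_disparity(depth_m, screen_dist_m, ipd_m=0.063, max_deg=1.0):
    """Clamp disparity to roughly +/- max_deg of angular disparity."""
    limit = screen_dist_m * math.tan(math.radians(max_deg))
    raw = screen_disparity(depth_m, screen_dist_m, ipd_m)
    return max(-limit, min(limit, raw))

# A point 10 m away, seen through a screen plane 2 m away:
print(screen_disparity(10.0, 2.0))        # ~0.0504 m, uncrossed
print(comfortable_disparity(10.0, 2.0))   # clamped to ~0.0349 m
```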

How Depth Perception Creates Immersive Virtual Experiences

Immersive reality aims to translate the experience of presence into synthetic environments. It is one thing to effectively convey a sense of spatial depth and dimension within virtual worlds; it is another to make those spaces feel openly navigable, visually compelling and even physically believable. Without strong 3D depth cues, the illusion of immersion falls apart, revealing virtual scenes as flat 2D projections that lack volume and scale.

Depth perception develops from multiple depth cues working together, including binocular disparity, motion parallax, occlusion, lighting and shading, focus and texture gradients. By analyzing these cues, our visual system generates an impression of three-dimensional form and of the spatial relationships between objects. XR systems must provide analogous depth cues, calibrated to the virtual environment, to convince our minds and bodies of full immersion within the simulated space.

Binocular disparity makes objects pop out, depths feel expansive and spaces seem genuinely voluminous. Even without surrounding peripheral vision, compelling binocular depth cues make a virtual world feel visually rich and spatially vast. Without them, flat 2D graphics cannot convey the sensation of inhabiting an immersive synthetic environment.

The techniques for delivering depth perception continue to improve, and stronger 3D cues will be needed to achieve true immersion rather than a superficial illusion of interactivity within virtual spaces. What is preferable is natural, intuitive depth that makes spaces feel physically explorable, a key to high-fidelity virtual experiences.

Further down the line, and perhaps in the near future, a seamless integration of multiple depth cues tailored to each experience will be crucial as broader adoption and more impactful applications follow. Depth perception is what brings the illusion of presence to life, so high-quality 3D must remain central to the nascent medium's promise and progress. An uncanny ability to perceive dimension shapes how XR shape-shifts human consciousness.

Road Ahead

As XR technologies mature and aim to move from niche to mainstream, the ability to convincingly represent depth, scale and position through stereopsis will remain essential. Looking ahead, the evolution of stereoscopic displays, from autostereoscopic screens to varifocal and holographic displays, holds the potential to make 3D content seamless and ubiquitous.

We can expect advances in foveated rendering and variable refresh rate technologies that will greatly improve comfort and realism. And as more devices provide accurate 6DoF (six degrees of freedom) head tracking and eye tracking, the stereoscopic effect will adapt dynamically to provide each user with their own optimal 3D view.
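As one hypothetical illustration of where foveation and eye tracking meet, the function below maps gaze eccentricity to a coarse shading rate: full resolution near the tracked gaze point, progressively coarser shading in the periphery. The thresholds are assumptions for the sketch, not values from any shipping headset.

```python
def shading_rate(eccentricity_deg):
    """Pick a shading rate from angular distance to the tracked gaze point.

    Returns the side length of the pixel block covered by one shading
    sample, so 1 means native resolution and 4 means one sample per
    4x4 block. The 5/15 degree thresholds are illustrative assumptions.
    """
    if eccentricity_deg < 5.0:
        return 1   # foveal region: full resolution
    elif eccentricity_deg < 15.0:
        return 2   # near periphery: one sample per 2x2 block
    else:
        return 4   # far periphery: one sample per 4x4 block

for ecc in (0, 10, 30):
    rate = shading_rate(ecc)
    print(f"{ecc} deg from gaze -> {rate}x{rate} shading blocks")
```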

In the long run, if extended reality is to truly augment human capability by extending our senses and spatial cognition, stereoscopic perception will need to become an invisible, ambient part of how we experience and interact with virtual content. The goal of making stereoscopy indistinguishable from natural vision may then come closer to reality, and that would be the point at which XR lives up to its name.

 
Petra Palusova