Capturing Depth — Key Technologies for Stereoscopic Immersion

From early techniques using red and green filters to today's glasses-free screens — technologists have devised ingenious ways to achieve stereoscopic vision. The goal is to mimic the disparity between the two eyes' lines of sight (described in Stereoscopy and Depth Perception in XR — Utilizing Natural Binocular Disparity) that our brains evolved to interpret as depth.

In this article I will present a detailed overview of technologies that aim at one thing: tricking our eyes and brain into perceiving a 3D scene just as vividly as the real world around us. Engineers have dreamed of such illusory 3D for nearly two centuries, and now, thanks to creative technologies, they're making it a reality. Without further ado, since there’s a lot to cover, let’s start with the first technique.

Lens-Based Techniques

Lens-based stereoscopy presents slightly different views of a 3D scene to the left and right eyes using special lenses or filters. The aim is to achieve a natural and convincing depth effect by mimicking the binocular disparity of human stereopsis. There are three main lens-based approaches:

  1. Anaglyph — Anaglyph stereoscopy uses colored filters, such as red and cyan, to separate the left and right eye images. The left image is tinted red and the right image cyan, so that each eye perceives only its designated image. When viewed through the matching colored lenses, the brain fuses the two images into a single 3D percept.

    Pros of anaglyph include its inexpensive and simple nature, but it also suffers from limited color fidelity due to filtering and typically provides a narrower depth effect and field of view than other lens techniques. It can also cause eyestrain and headaches due to the necessary color separation and fusion. When done poorly, anaglyph easily reveals its underlying 2D imagery, breaking the illusion of depth.

  2. Polarized — Polarized stereoscopy encodes the left and right eye images with different polarizations, such as horizontal and vertical (or opposite circular polarizations, which tolerate head tilt better). Matching polarized lenses ensure each eye receives only its designated image, eliminating the need for color filters and resulting in a high-quality, flicker-free 3D effect with full color fidelity.

    The polarized lenses themselves are inexpensive, but polarization-preserving displays and projection screens add cost. In return, this approach offers a wide field of view and very convincing depth perception when implemented properly. Some viewers still report discomfort, though this stems largely from the vergence-accommodation conflict common to all stereoscopic displays rather than from the polarization itself.

  3. Shuttered — Shuttered stereoscopy synchronously alternates the left and right images using liquid crystal shutters. Like polarized stereoscopy, it provides a flicker-free, high-fidelity 3D experience with a wide field of view and strong depth effect. However, requiring precisely timed, synchronized image alternation introduces additional hardware complexity and potential for latency or "shutter flicker." Expensive shutter-based glasses and displays are also necessary, similar in cost to polarized components.
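The anaglyph separation described in the first item above can be sketched in a few lines of code: the red channel is taken from the left-eye view and the green and blue channels from the right-eye view, so red-cyan glasses route each view to its designated eye. The image representation here (nested lists of RGB tuples) is a deliberate simplification for illustration; real pipelines operate on image arrays.

```python
def make_anaglyph(left, right):
    """Combine left/right eye views into a red-cyan anaglyph.

    Each image is a 2D grid (list of rows) of (r, g, b) tuples.
    The red channel comes from the left view; the green and blue
    channels come from the right view, so red-cyan glasses deliver
    each view to the correct eye.
    """
    return [
        [(l[0], r[1], r[2]) for l, r in zip(lrow, rrow)]
        for lrow, rrow in zip(left, right)
    ]

# Tiny 1x2 example: a pure-red left view and a pure-cyan right view.
left = [[(255, 0, 0), (200, 0, 0)]]
right = [[(0, 255, 255), (0, 200, 200)]]
print(make_anaglyph(left, right))  # [[(255, 255, 255), (200, 200, 200)]]
```

Because each eye's image is carried in disjoint color channels, color fidelity is inherently limited — exactly the trade-off noted above.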

There are clear trade-offs to each lens-based stereoscopic technique in XR. Anaglyph is inexpensive but narrow in scope. Polarized and shutter stereoscopy offer a high-quality, convincing depth effect with a wide field of view but require more expensive, complex hardware. For many applications, these techniques may be overkill, while for others a compelling depth perception is essential.

The goal is to maximize benefits while minimizing costs, widening fields of view, and reducing the potential for eye strain or discomfort. Emerging approaches could allow polarized and shutter stereoscopy to approach the depth effect and image fidelity of high-end systems at lower price points. Even so, for XR to reach mainstream audiences, simpler and more affordable stereoscopic techniques will likely still play an important role.

Autostereoscopy

Autostereoscopy refers to stereoscopic 3D display techniques that do not require any special glasses. Instead, they provide binocular depth cues using the display alone. There are three main autostereoscopic approaches: parallax barriers, lenticular lenses, and holograms.

Parallax barriers and lenticular lenses work by redirecting light from different perspectives to the left and right eyes. They use layers of barriers, lenses, or both lenses and barriers that control the path of light to determine which information each eye perceives. The barriers and lenses are aligned so that each eye only sees the image from its designated viewpoint, creating binocular disparity and the perception of depth.

Pros of these techniques include convenience (no 3D glasses required) and suitability for mobile and immersive VR/AR. On the other hand, they also typically provide a more limited field of view (FOV) and shallower depth effect than other stereoscopic methods due to constraints on light redirection. The depth effect may look unnatural or "pop out" too much at close viewing distances. They also require precisely engineered barriers and lenses, adding to cost and complexity.
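The light-redirection idea behind a parallax barrier can be approximated by interleaving pixel columns of the two views on the panel: the barrier's slits are aligned so one eye sees only the even columns and the other only the odd columns. A minimal sketch, with pixel values as plain labels and the optical geometry left out:

```python
def interleave_views(left, right):
    """Build the panel image shown behind a parallax barrier.

    Even pixel columns carry the left-eye view and odd columns the
    right-eye view; the barrier's slits are positioned so that each
    eye sees only its own set of columns. Images are lists of rows.
    """
    panel = []
    for lrow, rrow in zip(left, right):
        row = [l if x % 2 == 0 else r
               for x, (l, r) in enumerate(zip(lrow, rrow))]
        panel.append(row)
    return panel

left = [["L0", "L1", "L2", "L3"]]
right = [["R0", "R1", "R2", "R3"]]
print(interleave_views(left, right))  # [['L0', 'R1', 'L2', 'R3']]
```

The sketch also makes the main cost visible: each eye receives only half the panel's horizontal resolution, one reason the depth effect and FOV are more limited than in lens-based systems.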

Holographic stereoscopy encodes multiple views into a single light field, allowing perception of depth at variable focal planes. It uses fewer optical elements, expanding field of view and depth effect compared to other autostereoscopic approaches. On the other hand, it requires expensive holographic materials and components to achieve, which limits its mainstream applicability. Some depth blurring or "sweet spot" effect is also typically perceived, where the 3D effect varies noticeably with minor head movements.

Overall, autostereoscopy is convenient, mobile-friendly and avoids the need for special glasses, but its depth effect is often more limited and shallow compared to lens-based stereoscopic techniques. For some applications like mobile VR, its benefits may outweigh its deficiencies, but in other cases, if a fully convincing sense of depth and immersion is necessary, it will require alternative approaches.

Going forward, improved barrier/lens designs, new optical materials, and multi-view and super multi-view techniques could enable wider adoption across XR. Even so, for mainstream audiences, it may remain most suitable and compelling for certain mobile use cases or as part of a multi-technique solution.

Head-tracking

The last technique is also the most important one, and it will get the most space in this article. Have you ever wondered how some video games know exactly where you're looking? Or how your smartphone can track your head movements to adjust the display? It's all thanks to head-tracking.

Head-tracking refers to the ability of XR systems to detect and respond to users' head movements in real-time. It tracks the orientation and position of the head using sensors, and continuously updates the stereoscopic rendering to match the user's perspective. This creates an immersive, responsive visual experience even as the user navigates naturally within the virtual space.
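The core step of that update loop can be sketched as converting a tracked head orientation into the view vector the renderer uses to place the virtual camera. The angle convention below (yaw about the vertical axis, pitch about the side axis, with yaw = pitch = 0 looking down -z, as in OpenGL-style coordinates) is an assumption for illustration; production systems track full 6-DoF poses with quaternions.

```python
import math

def view_direction(yaw_deg, pitch_deg):
    """Convert a tracked head orientation into a unit view vector.

    Yaw rotates about the vertical (y) axis and pitch about the side
    (x) axis; yaw=0, pitch=0 looks down the -z axis. Real systems use
    quaternions instead of Euler angles to avoid gimbal lock.
    """
    yaw = math.radians(yaw_deg)
    pitch = math.radians(pitch_deg)
    x = -math.sin(yaw) * math.cos(pitch)
    y = math.sin(pitch)
    z = -math.cos(yaw) * math.cos(pitch)
    return (x, y, z)

# Looking straight ahead, then turning the head 90 degrees.
print(view_direction(0, 0))   # (-0.0, 0.0, -1.0)
print(view_direction(90, 0))  # roughly (-1.0, 0.0, 0.0)
```

Each frame, the renderer re-reads the sensed pose, recomputes this direction, and redraws both eye views from the new perspective, which is what makes the scene feel anchored to the world rather than to the screen.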

Head-tracking provides several key benefits for XR:

  1. It enables a wider field of view without breaking the illusion of immersion. Instead of limiting perspective to certain orientations or ranges of motion, head-tracking allows the user to look in any direction, expanding their view of the world.

  2. It enhances presence through responsive interactivity. Objects respond instantly as the user looks around, updating in real-time with their perspective and movements. This fosters the feeling of physically inhabiting and exploring the space.

  3. It reduces discomfort from perceptual mismatches. Without head-tracking, the visual and vestibular systems can provide contradictory information about orientation, causing dizziness or nausea. By updating rendering to match the user's head position, this is avoided, enabling more comfortable immersive experiences.

Head-tracking also allows for real-world obstacles and constraints to be detected, providing safety mechanisms for XR interfaces. The system is able to detect if the user is looking in a direction that would collide with real-world objects, automatically adjusting the virtual world rendering to avoid collision. This helps prevent the user from bumping into objects or tripping in the real world due to focused attention on the virtual.
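This safety mechanism can be sketched as a simple boundary check: given a rectangular play area, warn when the tracked head position comes within a margin of the edge, much like the "guardian" boundary systems in consumer headsets. The rectangle-and-margin model is a simplification; real systems support arbitrary user-drawn boundaries.

```python
def near_boundary(pos, area, margin=0.5):
    """Return True if a tracked position is within `margin` metres
    of the edge of a rectangular play area.

    pos  -- (x, z) head position on the floor plane, in metres
    area -- (min_x, min_z, max_x, max_z) play-area rectangle
    """
    x, z = pos
    min_x, min_z, max_x, max_z = area
    # Distance to the nearest wall of the rectangle.
    dist_to_edge = min(x - min_x, max_x - x, z - min_z, max_z - z)
    return dist_to_edge < margin

area = (0.0, 0.0, 3.0, 3.0)             # a 3 m x 3 m room
print(near_boundary((1.5, 1.5), area))  # False: centre of the room
print(near_boundary((2.8, 1.5), area))  # True: 0.2 m from a wall
```

When the check fires, the system can fade in a boundary grid or pass-through camera view, adjusting the rendering before the user reaches a physical obstacle.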

Head-tracking also brings several significant costs and complexities. It requires additional sensors (such as motion-tracking cameras) and computing power to detect head motion, update perspective, and render the scene accordingly in real-time. This adds to system size, weight, cost, power consumption, and latency. It also introduces complexity that can negatively impact reliability and responsiveness.

For many XR use cases, these downsides outweigh the benefits of head-tracking. A narrower but still adequately immersive experience may be possible and preferable without the added constraints of real-time perspective adjustment. As such, head-tracking is often viewed as a premium feature, improving experiences for certain high-end consumer or professional use cases where its impact on presence and comfort justify its costs.

Minimizing hardware requirements, reducing latency, and improving the responsiveness and robustness of head-tracking are active subjects of research. New sensing techniques, more efficient rendering optimizations, and deep learning algorithms could bring its capabilities to more mainstream XR at lower costs and complexities. Even so, for broad adoption, alternatives like limited range-of-motion interfaces, redirected walking, and hybrid head/eye-tracking solutions may play an important role by achieving immersion and interactivity benefits with fewer constraints.

To summarize, pros of head-tracking:

• High immersion: The immersive experience created by continually updating perspective to match the user's head movements is unparalleled. It allows users to look freely in any direction without breaking the illusion of presence in the virtual space. Responsive interactivity and the matching of visual and vestibular cues enhance the feeling of physically inhabiting and exploring the world, translating natural motion into presence, empathy and exploration within immersive spaces.

• Reduced discomfort: Without head-tracking, contradictory information from vision and balance can cause symptoms like dizziness, nausea and eye strain. By updating rendering to match the user's head position, these issues are avoided. This technique aligns virtual and vestibular sensory cues, supporting wellbeing and acceptability.

• Safety mechanisms: It can detect if the user is looking in a direction that could collide with real-world objects. It automatically adjusts the virtual world rendering to avoid potential collisions, preventing the user from bumping into objects or tripping due to focused attention on the virtual space. It basically acts as a safety guard that maintains the boundary between immersive illusion and physical reality.

Cons of head-tracking:

• High cost: It requires additional expensive sensors (motion tracking cameras) and more powerful processors to detect motion, update perspective, and render the scene accordingly in real-time. This adds significantly to system size, weight, cost, and energy usage. The added hardware and computing requirements of head-tracking increase expense, bulk, power draw and fragility.

• Increased complexity: There’s additional complexity that can negatively impact reliability, responsiveness, and ease of use. Precisely synchronized motion tracking and rendering is required, and complex algorithms are needed to detect motion, update perspective, and avoid collisions — all with minimal latency. Bugs or delays can reduce immersion. The complex, precision-dependent nature of head-tracking amplifies opportunity for issues that undermine presence, comfort and safety.

• High latency and power requirements: The constant sensing, processing and rendering necessary for head-tracking requires low latency, fast response times, and considerable computing power — especially for complex, dynamic virtual environments. This results in higher energy usage, heat generation, and cost. Lower-end systems may struggle to achieve the performance needed. The demanding, constantly-active requirements of head-tracking necessitate more expensive, less efficient hardware to support presence and safety.

• Diminishing returns: While head-tracking enhances immersion up to a point, the benefits level off beyond a certain threshold of accuracy, responsiveness, and range of motion. Very high precision or unlimited motion tracking provides diminished returns for most use cases, increasing costs more than value. There is a practical limit to the immersion gain and interactivity benefit. The desirability and feasibility of increasingly advanced head-tracking capability decreases rapidly relative to additional cost and complexity.
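To make the latency point above concrete: a common rule of thumb is that motion-to-photon latency (head movement to updated pixels) should stay under roughly 20 ms for comfort. The stage timings in this toy budget are illustrative assumptions, not measurements of any particular headset:

```python
# Illustrative motion-to-photon latency budget (milliseconds).
budget = {
    "sensor sampling":   1.0,
    "pose fusion":       1.0,
    "render (GPU)":     11.0,   # roughly one frame at 90 Hz
    "display scan-out":  5.0,
}
total = sum(budget.values())
print(f"total: {total} ms")            # total: 18.0 ms
print("within budget:", total <= 20)   # within budget: True
```

With so little headroom per frame, every stage has to be fast and tightly synchronized, which is why head-tracking drives up both hardware cost and power draw.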

As mentioned, head-tracking provides diminishing returns beyond a certain threshold of accuracy and responsiveness for most XR systems. These are the limitations this poses to XR systems:

Practical limits to immersion gain — While head-tracking enhances immersion and presence up to a point by more closely matching perspective and motion, the benefits level off beyond a certain accuracy and responsiveness threshold for most use cases. There is a limit to how much more immersed a user can feel with incrementally improved tracking beyond this threshold.

Point of diminishing marginal utility — Additional investment in sensor precision, processing power, and response time provides less and less value as performance improves beyond the level needed to avoid perceptible issues. The marginal utility of extra accuracy or decreased latency continues to decrease rapidly relative to additional cost and complexity.

Factors beyond tracking — Other design elements like content, interfaces, social presence, and applicability to real-world tasks have a greater impact on immersion and experience quality beyond a certain performance threshold. At its limit, the most advanced tracking system cannot compensate for weaknesses in these other factors.

Technical and practical constraints — There are inherent limits to the precision, accuracy and responsiveness that is feasible, affordable and sustainable across systems and use cases. Extreme head-tracking performance may be possible but impractical for broad adoption. There comes a point where constraints outweigh potential benefits.

Use case variability — Different applications and user groups have different needs in terms of the indispensability and value of ultra-high precision head-tracking. What enhances presence for one goal or audience may be unnecessary for another. One-size-fits-all solutions often provide diminishing returns.

Head-tracking is instrumental in advancing XR by translating movement into deeper levels of presence, responsiveness, and realism within synthesized worlds. Even with its limitations, its ability to detect and respond to natural motion in real-time has shaped how we perceive, navigate, learn, work, communicate and play within immersive spaces. If it can deliver that presence without prohibitive constraints, head-tracking may come to engage our senses as naturally as the real world does.

Conclusion

As stereoscopy technologies continue to advance, becoming smaller, higher resolution and less obtrusive, we may soon reach a point where the illusions they create are effectively indistinguishable from unmediated visual perception. But even now, the revolutionary effect of artificially inducing true 3D vision through stereoscopy remains an impressive triumph of technological ingenuity. It demonstrates what we can achieve when we think creatively to reproduce and even augment how our amazing human senses naturally operate.

Title image credit: Carolina Costa