"The addition of this new and exciting science to the surgical armamentarium is an important step and is a virtual certainty"

The quote above is not in response to the recent ground-breaking live VR surgery performed by Shafi Ahmed. In fact, I wrote these lines in the British Journal of Surgery 22 years ago in an article for my mentor, the pioneering surgeon Ara Darzi. It was 1994 and I was in my last year of medical school. Perhaps I should have been focussing on my impending final exams, but the potential for virtual reality in education, training and teleconferencing was intoxicating. Re-reading the article, I cringe at my gushing enthusiasm - "it may be five or even 10 years before computers are capable of producing convincing images...". I glossed over the fact that the Silicon Graphics computers required to run the system cost $60,000 or more and that the upper end of performance was 15 frames per second.

To many, the current excitement about VR may seem equally overblown, but as I've previously grazed my knees falling off the Hype Cycle, I know that this time it's different...

A recent Forbes article by Tom Goodwin argues that VR needs to be classified in order to understand its content and experiences. Tom's "6 dimensions of virtual reality" are categorised as:

  1. 360 photos
  2. 360 video emulation (watching a 2D screen in a 3D virtual environment)
  3. 360 video
  4. Directional movement
  5. Interactivity (input devices)
  6. Haptic feedback

Whilst I think that this framework is useful, it does have major issues. In particular, it mixes VR outputs (screen performance) with VR inputs (head tracking, limb tracking). It also omits one of the most important features for immersion and presence, namely 3D audio.

I propose a classification of input devices that views VR platforms from the perspective of their level of interactivity and the order in which they were introduced. Viewed in this way, the missing link that stops VR from becoming truly immersive becomes clear.

1st Generation VR Input

First generation VR input

The first generation input device is the motion sensor in the headset. Linked to the screen and display driver, it translates the position of the head into the view of the scene. This enables basic interaction by creating a pointer (reticle) on the screen.
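As a rough illustration of how a head-driven reticle can work, the sketch below turns a head orientation into a pointing direction and checks whether that direction lands on a target. The function names, the yaw/pitch convention and the spherical target are assumptions made for this example, not a description of any particular headset's software.

```python
import math

def gaze_direction(yaw_deg, pitch_deg):
    """Unit vector the head is facing (assumed convention: x right, y up, -z forward)."""
    yaw = math.radians(yaw_deg)
    pitch = math.radians(pitch_deg)
    return (
        math.cos(pitch) * math.sin(yaw),
        math.sin(pitch),
        -math.cos(pitch) * math.cos(yaw),
    )

def reticle_hits(direction, centre, radius):
    """True if a ray from the viewer along `direction` passes through a sphere.

    The viewer is assumed to sit at the origin; `centre` and `radius`
    describe a hypothetical spherical target in the scene.
    """
    # Distance along the ray to the point closest to the sphere centre.
    along = sum(d * c for d, c in zip(direction, centre))
    if along <= 0:
        return False  # target is behind the viewer
    # Squared distance from the sphere centre to that closest point.
    dist_sq = sum(c * c for c in centre) - along * along
    return dist_sq <= radius * radius

# Looking straight ahead at a target two metres in front of the viewer.
print(reticle_hits(gaze_direction(0, 0), (0.0, 0.0, -2.0), 0.5))
```

Turning the head by 90 degrees moves the reticle off the same target, so the second check would return False.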

Second generation VR input

VR systems literally and metaphorically took a step forward when the software was able to interact with the user’s limbs. A variety of input devices were introduced in the 1990s, notably gloves, which place a representation of the wearer's hands in the virtual scene, and treadmills, which enable the wearer to move around. Even without tactile feedback, the introduction of wireless and camera-based limb tracking considerably improved interactivity.

2nd Generation VR Input

3rd Generation VR Input

Third generation VR input

Until recently, wearable eye-tracking has been a niche and comparatively expensive technology, mostly confined to academic and market research. However, the potential for foveated rendering has increased interest, with the promise of a marked reduction in the computational demands of high resolution, low latency image display. The other benefit of adding eye-tracking to VR is that it enables more realistic interactions between the user and virtual characters. Speech recognition, a technology that has also benefited from the smartphone revolution, can add to eye-tracking by enabling categorical commands such as looking at a door and saying ‘open’. Integrated eye-tracking HMDs from Fove and others will be available in late 2016.
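The "look at a door and say 'open'" idea can be sketched as a simple pairing of two inputs: the object the eye tracker reports and the word the speech recogniser reports. Everything named here (`SCENE_ACTIONS`, `resolve_command`, the objects and verbs) is hypothetical and stands in for whatever a real eye tracker and speech engine would provide.

```python
# Hypothetical scene description: each object lists the spoken verbs it accepts.
SCENE_ACTIONS = {
    "door": {"open", "close"},
    "lamp": {"on", "off"},
}

def resolve_command(gazed_object, spoken_word, scene_actions=SCENE_ACTIONS):
    """Combine gaze and speech into an (object, verb) command.

    `gazed_object` is the name of the object under the user's gaze, or None
    if they are not looking at anything interactive. Returns None when the
    spoken verb does not apply to that object.
    """
    if gazed_object is None:
        return None
    verb = spoken_word.strip().lower()
    if verb in scene_actions.get(gazed_object, set()):
        return (gazed_object, verb)
    return None

print(resolve_command("door", "Open"))
```

The point of the design is that gaze supplies the target and speech supplies the action, so a one-word vocabulary per object is enough and the same word can mean different things depending on where the user is looking.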

Fourth generation VR input

Facial expressions are the important missing element from VR interactions. HMDs with depth cameras attached have been used to visualise the lower face, but whether this method proves popular in future remains to be seen. Three potential reasons why this approach may be problematic relate to i) the way humans interact, ii) ergonomic concerns and iii) computational and battery life considerations. One of the things that eye-tracking research has told us is that during face-to-face interactions we infer a lot of information from the eye region. Surprise, anger, disgust and a genuine (Duchenne) smile all require visibility of the brow area and the skin typically covered by the HMD. Hao Li, for Oculus Research, has incorporated stretch sensors in the foam interface of the HMD to derive information from behind the headset, and it will be interesting to see how this performs when the final version is released.

4th Generation VR Input

FaceTeq is a platform technology that uses novel sensor modalities to detect the minute electrical changes that occur when facial muscles contract. With each facial expression, a characteristic wave of electrical activity washes over the skin, and this can be detected non-invasively and without the need for cameras. Our lightweight, low-cost open platform will herald the 4th generation of VR. Researchers, developers and market researchers will undoubtedly be the initial adopters. However, the real advance will be the ability to enable face-to-face social experiences.
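To give a feel for how electrical activity from facial muscles might be turned into an input signal, here is a minimal sketch of a generic approach: rectify each sensor channel, smooth it with a moving average, and flag channels whose activity exceeds a threshold. This is an illustrative assumption only and not a description of FaceTeq's actual processing; the channel names, window size and threshold are all made up for the example.

```python
def envelope(samples, window):
    """Moving average of the rectified signal, a simple estimate of muscle activity."""
    rectified = [abs(s) for s in samples]
    smoothed = []
    running = 0.0
    for i, value in enumerate(rectified):
        running += value
        if i >= window:
            running -= rectified[i - window]  # drop the sample leaving the window
        smoothed.append(running / min(i + 1, window))
    return smoothed

def active_channels(channels, window, threshold):
    """Sorted names of channels whose smoothed activity peaks above the threshold."""
    return sorted(
        name
        for name, samples in channels.items()
        if max(envelope(samples, window), default=0.0) > threshold
    )

# Hypothetical readings from two sensor sites (arbitrary units).
readings = {
    "brow": [0.0, 0.9, -1.1, 1.0],
    "cheek": [0.1, -0.1, 0.1, 0.0],
}
print(active_channels(readings, window=2, threshold=0.5))
```

Which sites are active at the same time is what would distinguish one expression from another; a real system would need calibration per user and a far more careful classifier than a fixed threshold.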

There are so many areas where facial expressions in VR could improve communication and interactivity. We will be opening our platform and are excited to see what ideas developers come up with. At Emteq, we're passionate about fostering the 4th generation of VR and are looking forward to partnering with headset manufacturers and content creators. Follow us to learn more about the possibilities and to stay up to date with developments.