The last month was a busy one for VR. In terms of content, VR on the Lot showcased some of the exciting work we can expect in the coming months. From a hardware perspective, the announcements around Google Daydream and at Oculus Connect 3 have crystallised the importance of an untethered experience for VR. From Google, the launch of Daydream was pretty much as expected, and well received. The launch of their own highly specified Pixel phone, which enables vertical integration of their ecosystem, is an important step towards improving mobile VR experiences.
Samsung has prioritised voice input and an AI engine to stay competitive with Apple and Google's assistants. From a VR perspective, their purchase of Viv (the AI assistant platform from the makers of Siri) demonstrates a recognition that for mobile VR to work well, voice interaction will be pivotal.
At Oculus Connect 3, Facebook announced an untethered VR headset, which does not compete directly with their Samsung partners but may be a way to plug the gap in their potential user base:
- high end = CV1,
- middle end = untethered Oculus,
- low end = Samsung Gear.
Mark Zuckerberg said he anticipates that, like many communication platforms, VR will eventually be about 40% social, with the rest of the market being concerned with games, enterprise and content consumption. But something is missing to make this a truly sociable experience. Currently, the main players in social VR are Altspace, High Fidelity, and vTime. All of these platforms suffer from the same issue of blank-faced, unemotional avatars. High Fidelity has demonstrated the capability to enable facial expressions once the hardware is available.
Facebook's demo of their virtual avatars was an interesting approach to the problem of emotional interaction. Michael Booth, their head of social VR, said that they have developed a language “...that triggers your avatar to make certain emotions...we can’t just be a blank presence.” These so-called “VR emoji” require users to make hand gestures in order to create facial expressions:
- Shake your fist = “angry.”
- Put your hands on your face = “surprise.”
- Thrust your hands in the air = “joy.”
This is a neat solution whilst the camera-based facial tracking is still being developed, although the computational demands of having an eye tracker and lower facial tracker mean that this will not be coming to mobile VR any time soon. Using hands to indicate emotional interaction has its issues, which we'll outline below.
1. Multi-tasking is bad for social communication
Apart from the obvious difficulties of expressing emotions whilst performing a task with your hands, VR emojis will inevitably lead to miscommunication. Look at the images below: shaking the fists may indicate positive or negative emotions, but it is the face that provides the salient information.
Also, consider this image: the context is what's important, but facial expressions will always provide more information than hand gestures.
2. A camera-based solution for facial expression analysis has implications both in terms of computational cost and form factor. A couple of startups are using a depth camera to track the lower face. As mentioned above, the lower face gives very limited information about the emotion of a wearer. Movements of the lips without analysis of the eye region can lead to totally wrong conclusions about underlying emotions. Consider the images below of emotions research pioneer Dr Paul Ekman. Which of the two smiles is genuine, A or B? Without being able to measure the subtle contractions of the muscles that surround the eye, one might assume that both smiles are the same. Importantly, whereas in older people the skin around the eyes readily crinkles (so-called "crow's feet"), in younger individuals with tight, elastic skin, subtle muscle activations may not be visible, which is why electromyography is often used by researchers investigating subtle, unconscious or rapid changes.
Our solution is based on a low-cost multisensor interface and artificial intelligence, which makes it swappable with any HMD. It avoids a camera-based solution that cantilevers from the HMD, along with the added computational cost.
The image below from Eyetribe shows what an eye tracking study teaches us about face-to-face interaction. We spend the majority of time looking at the eye region, as this provides the most important information about a person's emotions. We look at the mouth area predominantly to help with speech comprehension (see the McGurk effect).
To examine the potential for social interaction in VR, it is useful to consider previous research on the taxonomy of non-verbal communication, which includes:
- Body proximity
- Eye contact
- Facial expressions
Changes in body posture, such as turning towards a person in conversation or leaning in, are used to indicate interest. The study of how people use and perceive the space that surrounds them is called proxemics. It is well recognised that the space between the sender and receiver of a message influences how the message is perceived. Different cultures maintain different standards of personal space, which are affected by gender, social situation and individual preferences. Problems reported with early social VR in Kent Bye's Road to VR podcast suggest that this is an issue that will need to be addressed as adoption increases.
Gestures come in different forms, such as emblems that represent accepted meanings, for example the outstretched palm ("talk to the hand") or the "OK" sign made with index finger and thumb.
Gestures can function cross-culturally to enable communication across language barriers. Using limb gestures to create facial expressions is clearly a stopgap until technologies such as Emteq's become integrated into HMDs. Anyone who has spent time using the Leap Motion or Myo Armband will know how tiring it can get trying to maintain constant hand movements. There is a more fundamental problem: hands are designed to "do" whilst faces are designed to "tell". If your hands are telling, who's doing the doing?
Posture is an important social signal, and to 'lean in' has a more literal meaning than the one communicated by Sheryl Sandberg. Numerous studies have shown that a person's head moves toward a point of interest (e.g. whilst concentrating). Again, context is key, because this finding does not hold true if hand movements are not taken into account. Sometimes the head will lean forward when the individual is bored and resting their head on their hand.
Eye contact is vital to create an emotional connection. However, tracking the eyes alone gives limited information:
- Eye tracking -> What they are looking at
- Emotion tracking -> How they are responding -> Why they are looking at it
If at lunch a large cockroach walked across your plate, you would definitely look at it whether you're an insect lover or not. But it's your facial expression that would instantly communicate how you actually felt.
Touch (or haptic feedback) is currently a technology area being investigated in a number of R&D departments and startups. One of the key limitations is force feedback, which is difficult to achieve without bulky hardware.
Face the future...
We've built a technology that measures facial muscle activity through the skin and enables more immersive interactions. You can talk to the hand, but let's face it...there is a better way.
For more information visit Emteq.net