Exploring natural and multimodal applications

In the second module of the MIT CSAIL HCI degree, we’re discussing natural and multimodal interactions and their applications.

The discussion element for this module asked us to:

Think about the natural and nontraditional interfaces that you encountered in the enrichment activity in this unit. What potential opportunities do they represent? What part of multimodal or multi-sensory interfaces would you be most excited to be able to use?
Bethany LaPenta, MEng at Massachusetts Institute of Technology

Today, a great deal of work is happening in conversational user interfaces (UI), whether that is chatbots based on typed interaction or voice. The seemingly rapid advancements in them stem from work in artificial intelligence (AI), natural language processing (NLP) and big data.

Companies including Amazon, Apple and Google have released early-adoption devices in the space, which has meant that training algorithms have had vast datasets to learn from – as have the data scientists and analysts sat behind them. As a result, learning about user speech patterns and intent is being rapidly fine-tuned and repeated – as long as you speak English, that is.

Already, devices like Amazon Echo are finding meaningful use cases, creating accessible interaction methods for people who have otherwise been unable to interact with machines. A common issue with the current state of technology is the reliance on glass and touch interactions, which are more nuanced than most realise and entirely inaccessible for some people, those with arthritis to name just one example.

Mic Check, One, Two

I think the reason for this influx of voice UI devices and their widespread adoption is that they are quickly surpassing the novelty factor and becoming genuinely useful, but there is something else at play that I believe is enabling their development: they are contextually responsive.

The feedback method, or how I get a response from a command, is matched to the input method. If I ask Alexa to do something, it is predominantly going to have some form of audio output, whether that is Alexa acknowledging what I have requested, or because the task is to return some kind of audio content. It feels right because you have asked for something to be done and the response is contextual – I exchanged audio for audio.

This is where I find the current work in virtual reality (VR), augmented reality (AR) and mixed reality (MR) to be significantly flawed, with no resolution on the horizon. What is missing is physical, or simulated haptic, feedback.

Touch has a memory

The best we’ve got in terms of creating physical feedback for interactions has been the invention of the rumble pack in games consoles. But the first rumble pack appeared on the Nintendo 64 (N64) in 1997. That’s 20 years with very little progression beyond being able to program more nuanced responses that add emphasis to the on-screen activity – complementing a visual/audio output caused by a physical input.

N64 Rumble Pak circa 1997


You have been shot, your car is off the road, danger is ahead, something big is approaching in an FMV sequence: all of these things will trigger a rumble pack within a controller, but they’re not simulating the actual experience, and the response is localised to your hands.

There have been few experiments in the 20 years since to commercially expand on physical haptics, although with the current immersive gaming world beginning to prosper, there are now a number of manufacturers of rumble vests, such as those produced by KOR-FX. But even these seemingly military-grade rigs fall short: our skin is a giant sensor, whereas the vest has localised hit points – and nobody is writing software that has that level of accuracy to the hit boxes anyway!

But don’t be misled into thinking this is a new concept: there have been developments in haptic suits, gloves, vests and just about every other garment since the early 90s.

Are we designing for ghosts?

The greatest difficulty with the new technologies is that at some point you are attempting to interact with a simulation of a physical item, but you cannot grasp it or feel its texture, weight, density and all those other touch-sensory qualities we rely on. Is it hot or cold, dry or wet?

It makes me wonder whether we are trying to simulate fantasy worlds or remote locations so that humans can better interact with or experience them, or whether we are preparing ourselves for an afterlife where we exist disembodied and void of touch-sensory capabilities, destined to forever float through our surroundings and never collide with another atom again.

Patrick Swayze & Demi Moore, Ghost

This is where I would like to see advancements in interaction design.

We’ve already got off to a rocky start. Leap, Microsoft and Oculus have each created their own set of broken gesture-based interactions for the environments they’re projecting, and none of them are natural. They continue the broken, unnatural interaction design that has been with us since the dawn of computers.

The interaction patterns created by Microsoft through the HoloLens project, for example, are already a mess of new gestures for basic commands. The bloom for opening menus, the point, the pinch and even an attempt at multimodal combinations of gaze and voice commands still make for an awkward, stunted experience.


Virtual needs Physical components

In 2015 I was blown away by the augmented reality sandbox created at UCLA. It demonstrated where I felt these new technologies were heading – the ability to use physical manipulations to simulate virtual experiences.

I wonder whether any of us are looking in the right direction when we are exploring these virtual realities. Should we be thinking more about complementing the physical with meta-data?

There are some insightful concepts emerging, notably off the back of a simple post by Luke Wroblewski of Google that has gained traction, called “What Would Augmented Reality Look Like?” Whether the shift towards peppering the environment with meta-data is the right approach or not, I’m uncertain, but it feels like a good direction to go in.

This post is part of a discussion module on the MIT HCI Degree.