The potential for speech recognition to augment and enhance mobile banking has been expressed several times in this group over the past few months. It makes sense. After all, m-banking apps have the potential to offer a vast array of options to search and
navigate, all of which can make for a poor, time-consuming user experience, compounded by the fact that we all have fat fingers when it comes to mobile screens and keypads. For actions like checking an account balance, displaying recent transactions or the latest
statement, or initiating a payment, voice input is quick and easy. Contextualised speech recognition gives us a simple, fast and convenient way of interacting with our mobile apps, and what could be more intuitive than speaking to them? It might not
always be appropriate for every circumstance, but speech should be available as an additional modality for when the user wants a fast and easy way to search, navigate and initiate actions in a single step (utterance).
So what is the state-of-the-art in contextualised speech recognition, and can it be made to work reliably given a wide range of accents and the multitude of ways a user might request an action? The short answer is ‘yes’! The “smarts” in this technology
are twofold. The first is recognising the spoken words themselves, including domain jargon, across a variety of accents and regional dialects; this can be done very reliably provided the appropriate language modelling is in place. More complex is the
understanding of a complete spoken utterance so it can be mapped to a specific and relevant action on the device. This is where natural language processing comes in – and even here, significant advances have been made over the past few years, aided by
greater computing power and speed. For a particular application domain such as m-banking, language models and interpretation components need to be developed and combined with sophisticated machine learning techniques, so that a domain-specific natural language
understanding system can learn quickly and refine itself continually as it is used, exploiting the app's context ever more reliably.
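To make the utterance-to-action mapping concrete, here is a minimal, hypothetical sketch in Python: a rule-based interpreter that maps a transcribed banking utterance to an app action. The intent names and patterns are invented for illustration only; a production domain-specific NLU system would replace the hand-written patterns with trained statistical models that refine themselves from real usage.

```python
import re

# Hypothetical intent grammar for an m-banking assistant.
# Each intent maps to a few regex patterns covering common phrasings;
# a real system would learn these mappings rather than hand-code them.
INTENT_PATTERNS = {
    "check_balance": [
        r"\bbalance\b",
        r"\bhow much (money )?(do i have|is in)\b",
    ],
    "recent_transactions": [
        r"\b(recent|last|latest) transactions?\b",
        r"\bwhat did i spend\b",
    ],
    "show_statement": [
        r"\b(latest|last) statement\b",
    ],
    "make_payment": [
        r"\b(pay|send|transfer)\b.*\bto\b",
    ],
}

def interpret(utterance: str) -> str:
    """Map a transcribed utterance to an app action, or 'unknown'."""
    text = utterance.lower()
    for intent, patterns in INTENT_PATTERNS.items():
        if any(re.search(p, text) for p in patterns):
            return intent
    return "unknown"

# Example: interpret("What's my balance?") returns "check_balance"
```

The point of the sketch is the single-step mapping: one utterance resolves directly to one action, with no menu navigation in between. Everything the hand-written patterns cannot cover (paraphrases, accents mis-transcribed upstream, novel phrasings) is exactly what the machine-learned, self-refining component described above is for.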
So speech recognition and natural language understanding have come a long way, and Apple’s Siri has shown how they can be applied in a general way and integrated with several applications on a device. For an application like m-banking, what’s needed is a
domain-specific speech recognition and understanding capability, and the technology and expertise to develop and deploy one already exist. The banks just need to embrace it and trial it, because it will greatly enhance the user experience
on mobile devices.