
Contextualised Speech Recognition ready for prime time

The potential for speech recognition to augment and enhance mobile banking has been raised several times in this group over the past few months. It makes sense. After all, m-banking apps can offer a vast array of options to search and navigate, all of which can result in a poor, time-consuming user experience, compounded by the fact that we all have fat fingers when it comes to mobile screens and keypads. For actions like checking an account balance, displaying recent transactions or the latest statement, or initiating payments, voice input is quick and easy. Contextualised speech recognition gives us a simple, fast and convenient method of interacting with our mobile apps, and what could be more intuitive than speaking to them? It might not be appropriate for every circumstance, but speech should be available as an additional modality for when the user wants a fast and easy way to search, navigate and initiate actions in a single step (a single utterance).

So what is the state of the art in contextualised speech recognition, and can it be made to work reliably given a wide range of accents and the multitude of ways a user might request an action? The short answer is ‘yes’! The “smarts” in this technology are twofold. First, recognising spoken words, including domain jargon, across a variety of accents and regional dialects; this can be done very reliably provided appropriate language modelling is in place. More complex is understanding a complete spoken utterance so it can be mapped to a specific and relevant action on the device. This is where natural language processing comes in – and here too, significant advances have been made over the past few years, aided by greater computing power and speed. For a particular application domain such as m-banking, language models and interpretation components need to be developed and combined with sophisticated machine learning techniques, so that a domain-specific natural language understanding system can learn quickly and refine itself constantly as it is used, exploiting the app’s context ever more reliably.
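To make the second stage concrete – mapping an utterance to an action – here is a deliberately minimal sketch. It is purely illustrative, not the system described above: real domain-specific NLU uses statistical language models and trained classifiers, and the keyword rules and action names below are invented for the example.

```python
# Toy "understanding" layer: map an already-transcribed utterance to a
# hypothetical m-banking action by keyword matching. Illustrative only.
from typing import Optional

# Hypothetical intents and trigger phrases for the m-banking domain.
INTENT_KEYWORDS = {
    "check_balance": ["balance", "how much"],
    "recent_transactions": ["recent transactions", "last transactions"],
    "show_statement": ["statement"],
    "initiate_payment": ["pay", "transfer", "send"],
}

def map_utterance_to_action(utterance: str) -> Optional[str]:
    """Return the first intent whose keywords appear in the utterance."""
    text = utterance.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return intent
    return None  # unrecognised: fall back to manual navigation

print(map_utterance_to_action("What's my balance?"))             # check_balance
print(map_utterance_to_action("Transfer fifty pounds to John"))  # initiate_payment
```

A trained system would replace the keyword table with a learned model and refine it from usage data, but the input/output contract – utterance in, domain action out – is the same.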

Speech recognition and natural language understanding have come a long way, and Apple’s Siri has shown how they can be applied in a general way and integrated with several applications on a device. For an application like m-banking, what’s needed is a domain-specific speech recognition and understanding capability, and the technology and expertise to develop and deploy it are already out there. It just needs the banks to embrace and trial it, because it will greatly enhance the user experience on mobile devices.


Comments: (3)

A Finextra member 12 June, 2012, 12:39

Nice thought. Would like to differ a bit though.

Speech recognition technology (at various maturity levels) has been around for a long time now. However, I have not heard of it as a mainstream success in any retail business. The key challenges that hinder its widespread adoption are consistency and security.

From an m-banking perspective, it is very risky to leverage speech recognition for key functionalities like login, payment initiation, payment authorisation and changes to personal details. Further, I doubt very much that banking regulations will accommodate this. For non-transactional functionalities in m-banking, speech recognition can mitigate the risk of 'fat finger' typing errors. However, the cost of adopting speech recognition only for this small set of m-banking features (which do not generate direct revenue) may not make a valid business case for banks.

A Finextra member 12 June, 2012, 13:38

Seeva, thanks for your comment.

I should clarify – I wasn’t implying that speech recognition directly impacts the bottom line – it clearly does not, but I would argue that indirectly, it does. There are plenty of industries where speech recognition has been successfully deployed, primarily as a “value-add” to improve usability and convenience, particularly so on mobile devices. Healthcare, in-vehicle navigation, government, education... the list goes on. The size and success of Nuance is testimony to the success of speech recognition, and why it’s a key part of Apple’s and Google’s strategies.

As for security, there are two points. First, in any well-designed speech app that involves a financial transaction, a modification of personal data or a payment initiation, the final execution step has to be a touch or click, so the user can review and verify what’s on the screen before confirming. So I don’t think it need impact “banking regulations” at all. Second, with regard to login or other types of authorisation or authentication, voice biometrics has now advanced to the point that it is actually more secure than a password or PIN.

So my argument is that speech recognition is an additional modality that represents a significant value-add in the form of simplicity, speed and convenience. No other modality can reduce a lengthy multi-level navigation or search into a single step. As I’m in the business of speech recognition, I know customers value this enough to want to pay for it. Here’s a practical example: think of in-play sports betting. Mobile sports betting apps are complex, offering many hundreds of markets and, within each market, sometimes hundreds of possible options. Navigating to a particular market to view odds or place a bet can be a lengthy process, and usability is one of the more common gripes among users of betting apps. Voice input offers a tremendous improvement in convenience and speed, and that, in turn, can mean increased customer loyalty. A speech-enabled sports betting app prototype can be seen in action in a short YouTube video, and it’s easy to see how the voice feature offers a significant user benefit.
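To make the single-utterance navigation idea concrete, here is a toy sketch of slot extraction from a transcribed betting request. It is purely illustrative: the phrase patterns and field names are invented for the example, and a production system would use a trained NLU model rather than regular expressions.

```python
# Illustrative only: extract "selection" and "outcome" slots from one
# transcribed utterance, collapsing multi-level menu navigation into a
# single step. Patterns and slot names are hypothetical.
import re
from typing import Optional

PATTERN = re.compile(
    r"(?:odds (?:on|for)|bet on)\s+"
    r"(?P<selection>[\w\s]+?)\s+to\s+(?P<outcome>win|draw|score)",
    re.IGNORECASE,
)

def parse_bet_request(utterance: str) -> Optional[dict]:
    """Return extracted slots, or None if the utterance doesn't match."""
    m = PATTERN.search(utterance)
    if not m:
        return None
    return {
        "selection": m.group("selection").strip(),
        "outcome": m.group("outcome").lower(),
    }

print(parse_bet_request("Show me the odds on Manchester United to win"))
```

One spoken sentence yields the same structured request that would otherwise take several screens of tapping to assemble.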

Ketharaman Swaminathan - GTM360 Marketing Solutions - Pune 14 June, 2012, 10:54

You're probably right about the relevance of voice recognition in healthcare, government, education, sports gambling and other apps. But the functionality set currently offered in many mobile banking apps - a/c balance, last 5 transactions, mini statement, fund transfer, etc. - is so sparse that the icons sit comfortably apart on the screen, even on relatively small (2-3") screens, so fat fingers are not a big problem. Of course, all that could change in the next generation of mobile banking apps, and voice recognition could then become important for them.


This post is from a series of posts in the group:

Innovation in Financial Services

A discussion of trends in innovation management within financial institutions, and the key processes, technology and cultural shifts driving innovation.
