Imagine being able to talk to your bank and conduct transactions without needing a phone or computer. That's the promise of voice-based banking, a fast-growing industry expected to reach $3.7 billion by 2031, according to Allied Market Research. Recently, however, a big BUT has emerged that threatens the whole industry.
Back in 2020, I was a big proponent of voice banking myself and firmly believed it would be the next big thing. I was also seriously considering integrating voice payment technology into SDK.finance's digital banking solution.
Let me share a quote from one of my 2020 emails: "At SDK.finance, we believe that a voice-based payment experience is the future of FinTech. Therefore, we are working on 'Voice Payment Technology'."
We even built a demo and started using Voiceflow to design a voice-based experience, but development was eventually paused, much to my regret.
Today, after hearing about Microsoft's recent unveiling of VALL-E, I am deeply concerned about the future of voice-based interfaces. Microsoft claims the model can clone a voice from just a 3-second recording. It sounds unbelievable, but if it's true, the world of voice-based remote communications will never be the same.
Because voice cloning can produce fake audio clips or voice commands that sound exactly like a person's real voice, it raises many questions about possible misuse. I won't even go into cases where you receive a call from an unknown number and hear someone who sounds like your father, daughter or wife. Let's just focus on FinTech-related threats.
The security of voice-enabled remote banking is becoming a pressing issue, both for financial institutions that support this feature and for customers whose financial data could be compromised. Imagine anyone being able to clone your voice and talk to your bank on your behalf, for example to check your balance or transfer your money to an external account. That is very bad news for voice-based interfaces around the world.
In light of this, our decision not to move forward with voice-based interfaces, which at the time felt like a serious shortcoming, now looks like the right strategy. "Doing nothing" has won out over a significant investment in development.
So, in this article, I would like to take a closer look at voice cloning and the dangers it poses to the voice banking industry.
How big is the voice banking market?
The voice banking market is a relatively new and fast-growing industry, driven by advances in artificial intelligence and increasing demand for voice-based services and technologies. Voice banking is a subset of the larger voice technology market, which, according to a Markets and Markets report, is expected to be worth more than $127 billion by 2024.
Voice banking market by technology
Source: Allied Market Research
According to Allied Market Research, the global voice banking market was valued at $984.6 million in 2021 and is projected to reach a staggering $3.7 billion by 2031, growing at a CAGR of 14.5% from 2022 to 2031.
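As a quick sanity check on these figures, here is a minimal sketch of the standard CAGR formula, CAGR = (end / start)^(1 / years) - 1, applied to the 2021 and 2031 values quoted above. The helper names are illustrative, and the report's own base year and rounding are not known, so the numbers should be read as approximate.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Value after compounding start_value at a fixed annual rate."""
    return start_value * (1 + rate) ** years


# Figures from the Allied Market Research forecast, in millions of US dollars
value_2021 = 984.6
value_2031 = 3_700.0

# Rate implied by going from $984.6M (2021) to $3.7B (2031)
implied = cagr(value_2021, value_2031, years=10)
print(f"Implied CAGR 2021-2031: {implied:.1%}")

# Forward projection using the quoted 14.5% rate
projected = project(value_2021, 0.145, years=10)
print(f"$984.6M compounded at 14.5% for 10 years: ${projected / 1000:.2f}B")
```

Compounding $984.6 million at 14.5% for ten years gives roughly $3.8 billion, while the rate implied by reaching $3.7 billion is a little over 14%, so the headline figures agree only up to rounding and the choice of base year.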
This significant growth can be attributed to several factors, such as the growing popularity of voice-based services for banking and financial transactions, increasing demand for customized voice experiences, and widespread adoption of voice-activated virtual
assistants and chatbots.
Key players in the voice banking arena
According to Fiserv, two-thirds of consumers either don't believe voice banking is possible or are unaware of it. Nevertheless, quite a few financial institutions have picked up on this trend.
ING was among the first banks to introduce voice control in its app, launching the assistant Inge back in 2014 using Nuance's voice-recognition software. Some banks introduced their own digital assistants and chatbots, while others opted to integrate their services with popular voice assistants such as Siri, Google Assistant and Alexa.
Capital One, for example, allows consumers to use Alexa for hands-free payments, balance inquiries and expense tracking.
Barclays has integrated with Siri to enable quick mobile payments to contacts without opening or logging into the banking app.
Santander in the UK has updated its technology to allow customers to use voice commands to make transactions, transfer money and report lost or stolen cards.
Visa, in partnership with Abu Dhabi Islamic Bank, is introducing biometric voice authentication for e-commerce. The system uses biometric sensors built into a standard smartphone.
In addition, some banks have launched their own digital assistants, such as Bank of America's Erica, a financial management assistant that can help consumers monitor their loans.
However, the latest developments in voice cloning could pose serious obstacles to voice banking and force the companies that offer it to look for ways to prevent fraud.
Voice cloning technology: what is it?
The news that "Microsoft is working on an AI called VALL-E that can clone your voice from a 3-second audio clip" made waves in the tech community. And that's not all: VALL-E can reportedly not only clone a person's voice from a 3-second clip but also use it to synthesize words the speaker never said, process text in 23 languages, and capture the context and meaning of a sentence rather than just translating it word for word. Impressive, isn't it?
Voice cloning is based on deep-learning models trained on large datasets of recorded speech. These models learn speech patterns, intonation and other vocal characteristics, and then use a sample of the target speaker's voice to generate synthetic speech that sounds like the original.
Voice cloning can be used in a variety of ways, such as creating digital voice assistants and chatbots, adding voice-overs to videos and other multimedia content, and even creating synthetic voices for people with speech disabilities.
The risks of voice cloning technology to the banking and payments industry
Despite the many legitimate uses of this technology, there are serious concerns about its potential misuse, such as creating synthetic voices of public figures or private individuals without their consent for malicious purposes, or committing payment fraud through voice banking.
For example, cybercriminals used AI voice cloning to impersonate the CEO of a German company. They called the CEO of its UK-based subsidiary and persuaded him to transfer $243,000 to their account. Cases like this mean payment companies must consider that voice-based interfaces, such as voice assistants, may be vulnerable to voice cloning attacks. If an attacker can clone a person's voice from a short audio clip, they could impersonate that person and potentially gain access to their sensitive data.
With a cloned voice, an attacker could bypass bank voice authentication systems, voice-activated locks in homes or offices, or other systems that rely on voice recognition. Attackers could also use a cloned voice to make fraudulent phone calls or send voice messages that appear to come from someone the victim trusts.
In another case, AI voice cloning was used in a $35 million heist investigated by authorities in Dubai, who have warned that cybercriminals are adopting this new technology. The criminals used voice cloning to pose as a company executive who needed the money transferred for an acquisition. UAE investigators, who took up the case because it affected local businesses, believe it was a sophisticated scheme involving at least 17 people who moved the stolen money to accounts around the world.
Many banks in the United States and Europe use voice authentication to let customers access their accounts by phone. Some promote voice identification as being as secure as fingerprint identification, offering a safe and user-friendly way to interact with their services. However, one experiment shows that voice-based biometric security is not reliable, especially in a world where synthetic voices can be generated in a few minutes.
In that experiment, voice cloning technology from ElevenLabs, an AI voice firm, allowed the user to trick Lloyds Bank's security system and log into the account.
In principle, this makes it possible to spoof anyone's voice and steal information from their bank account, and many banks offer similar voice verification services. Even the Consumer Financial Protection Bureau, which regulates the financial industry, has expressed concern about data security: “We expect that any firm follows the law, regardless of technology used.”
The cases above highlight the need for financial institutions to implement robust security measures and additional authentication methods to prevent voice cloning attacks that manipulate biometric authentication systems to gain access to sensitive financial data.
Ways to prevent voice banking fraud
To minimize voice hacking risks, it's important to combine voice-based interfaces with multi-factor authentication and other security measures.
Multi-factor authentication
Fintech companies should implement multi-factor authentication that combines voice biometrics with passwords and other forms of identification. Because a cloned voice defeats only one factor, requiring a second, independent factor makes voice-based transactions much harder to hijack and helps prevent fraud.
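To make the "voice is never enough on its own" idea concrete, here is a minimal sketch of an approval policy. The factor names, the 0.85 voice threshold and the 1,000 high-value limit are all hypothetical values chosen for illustration; a real system would take these from its own risk model.

```python
from dataclasses import dataclass

# Hypothetical values chosen for illustration only
VOICE_MATCH_THRESHOLD = 0.85
HIGH_VALUE_LIMIT = 1_000.00


@dataclass
class AuthAttempt:
    voice_score: float     # similarity score from a voice biometric engine, 0..1
    otp_valid: bool        # one-time passcode confirmed via a separate channel
    device_trusted: bool   # request comes from a previously registered device
    amount: float          # transaction amount in account currency


def count_factors(attempt: AuthAttempt) -> int:
    """Count how many independent factors passed."""
    factors = 0
    if attempt.voice_score >= VOICE_MATCH_THRESHOLD:
        factors += 1
    if attempt.otp_valid:
        factors += 1
    if attempt.device_trusted:
        factors += 1
    return factors


def approve(attempt: AuthAttempt) -> bool:
    """Require two factors, and always require the OTP for high-value transfers."""
    if count_factors(attempt) < 2:
        return False
    if attempt.amount >= HIGH_VALUE_LIMIT and not attempt.otp_valid:
        return False
    return True


# A perfect voice clone from an unknown device without an OTP is rejected
clone_attempt = AuthAttempt(voice_score=0.99, otp_valid=False, device_trusted=False, amount=50.0)
print(approve(clone_attempt))
```

The design choice is that the voice score can add confidence but can never approve a transaction by itself, so a flawless clone still needs a second factor the attacker does not control.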
Robust voice biometric systems
Organizations should deploy robust voice biometric systems that use advanced algorithms to detect attempts at voice cloning. Such systems can also detect other forms of voice manipulation, such as replay attacks or impersonation.
Although voice biometrics and voice cloning are both based on a person's voice, they are two different technologies that serve different purposes. Voice biometrics is a technology that uses a person's unique voice characteristics to verify their identity.
It analyzes various voice characteristics such as pitch, tone, and rhythm to create a unique voiceprint for each person.
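A voiceprint is commonly represented as a numeric embedding vector, and verification compares a new sample's embedding to the enrolled one. The sketch below assumes the embeddings come from some speaker-embedding model (not shown) and uses made-up 4-dimensional vectors and a hypothetical 0.8 threshold; cosine similarity is one common comparison, not necessarily what any particular vendor uses.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def mean_vector(vectors: Sequence[Vector]) -> list[float]:
    """Average several enrollment embeddings into a single voiceprint."""
    n = len(vectors)
    return [sum(values) / n for values in zip(*vectors)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two embeddings (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(voiceprint: Vector, probe: Vector, threshold: float = 0.8) -> bool:
    """Accept the probe if it is close enough to the enrolled voiceprint."""
    return cosine_similarity(voiceprint, probe) >= threshold


# Toy embeddings standing in for real model output (hypothetical)
enrollment = [[0.9, 0.1, 0.3, 0.2], [0.8, 0.2, 0.35, 0.25], [0.85, 0.15, 0.3, 0.2]]
voiceprint = mean_vector(enrollment)

same_speaker = [0.88, 0.12, 0.32, 0.22]
other_speaker = [0.1, 0.9, 0.2, 0.6]
print(verify(voiceprint, same_speaker), verify(voiceprint, other_speaker))
```

Note the limitation this exposes: a good clone is designed to produce an embedding close to the real speaker's, so a similarity check alone cannot tell them apart. That is why anti-spoofing checks and additional factors are needed.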
Biometrics are already used for verification in many fintech apps (e.g., Apple's Touch ID fingerprint or Face ID facial recognition, which often serve as login credentials). Voice biometrics could, in principle, be even more secure, as the human voice contains more than 100 unique identifiers.
Continuous authentication means monitoring user behavior throughout a session, not just at login, to detect suspicious activity. This can include tracking the user's voice patterns and behavior during a call or transaction to spot anomalies.
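One simple way to picture continuous authentication is a rolling average over per-segment voice scores during a call. In this sketch the window size, the 0.75 threshold and the score sequence are hypothetical, and the per-segment scores are assumed to come from a voice biometric engine such as the comparison described above.

```python
from collections import deque


class ContinuousVoiceMonitor:
    """Flags a call when the rolling average of per-segment voice scores drops too low."""

    def __init__(self, window: int = 5, threshold: float = 0.75):
        self.scores = deque(maxlen=window)  # keeps only the most recent segments
        self.threshold = threshold

    def update(self, segment_score: float) -> bool:
        """Add the latest segment's score; return True if the call looks suspicious."""
        self.scores.append(segment_score)
        average = sum(self.scores) / len(self.scores)
        return average < self.threshold


monitor = ContinuousVoiceMonitor(window=3, threshold=0.75)

# Hypothetical scores: the caller starts as the enrolled speaker, then the voice drifts
session_scores = [0.92, 0.90, 0.88, 0.60, 0.55, 0.50]
flags = [monitor.update(score) for score in session_scores]
print(flags)
```

The rolling window smooths out a single noisy segment while still reacting within a few segments when the voice changes partway through a call, for example if a fraudster takes over after the genuine customer has passed the initial check.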
Conversational AI for transaction verification
Conversational AI enables automated systems to respond to human speech in real time, creating engaging and realistic conversations. This technology can serve two crucial purposes in fraud detection: first, it can increase consumers' confidence in voicebots, and second, it enables voicebots to gather the data needed to authenticate legitimate transactions and to identify and reject fraudulent ones.
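One way a voicebot can gather verification data during the conversation is a challenge-response step: it asks the caller to repeat a freshly generated phrase that includes the transaction details. This is a minimal sketch under several assumptions: the word list, the 10-second limit and all function names are hypothetical, and the transcript is assumed to come from a speech-to-text step (not shown) that renders amounts as digits. It raises the bar against pre-recorded clips and replay attacks, but it would not stop an attacker using real-time voice conversion.

```python
import secrets
import time
from dataclasses import dataclass

# Hypothetical word list for building unpredictable phrases
WORDS = ["amber", "river", "falcon", "orbit", "maple", "copper", "violet", "harbor"]


@dataclass
class Challenge:
    phrase: str
    issued_at: float


def issue_challenge(amount: float, payee: str, n_words: int = 3) -> Challenge:
    """Build a one-time phrase mixing random words with the transaction details."""
    words = " ".join(secrets.choice(WORDS) for _ in range(n_words))
    phrase = f"{words} send {amount:.2f} to {payee}"
    return Challenge(phrase=phrase.lower(), issued_at=time.monotonic())


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences don't matter."""
    return " ".join(text.lower().split())


def check_response(challenge: Challenge, transcript: str, max_delay_s: float = 10.0) -> bool:
    """Accept only an exact spoken match that arrives within the time limit."""
    in_time = time.monotonic() - challenge.issued_at <= max_delay_s
    return in_time and normalize(transcript) == normalize(challenge.phrase)


challenge = issue_challenge(250.0, "Jane Doe")
print(challenge.phrase)
print(check_response(challenge, challenge.phrase.upper()))
print(check_response(challenge, "send 250.00 to jane doe"))
```

Because the random words change on every call, a clip recorded or synthesized in advance is very unlikely to match, and the time limit narrows the window for generating a fake response on the fly.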
Education and awareness
If your customers' voices are available online, whether on social media, YouTube or their employer's website, someone may be quietly harvesting those recordings to clone them without anyone knowing. Financial institutions can educate their customers and employees about the risks of voice cloning and how to protect themselves from such attacks. This can include guidance on how to verify the identity of a caller or the legitimacy of a transaction before revealing sensitive information.
Regular testing and evaluation
Financial institutions should regularly test and evaluate their voice recognition systems to ensure they are robust enough to detect and prevent voice cloning attacks. This may include testing the system with a variety of cloned voice samples and continuously
updating the system with the latest security patches and updates.
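Testing with cloned voice samples usually comes down to two numbers at a given acceptance threshold: how often genuine customers are wrongly rejected, and how often cloned voices are wrongly accepted. The sketch below computes both; the score lists are hypothetical results from a test campaign, not real measurements.

```python
from typing import Sequence


def acceptance_rate(scores: Sequence[float], threshold: float) -> float:
    """Fraction of attempts whose score meets the acceptance threshold."""
    return sum(score >= threshold for score in scores) / len(scores)


def evaluate(genuine: Sequence[float], cloned: Sequence[float], threshold: float) -> dict:
    return {
        "threshold": threshold,
        # Genuine customers wrongly turned away
        "false_rejection_rate": 1 - acceptance_rate(genuine, threshold),
        # Cloned voices wrongly let in (spoof acceptance)
        "spoof_acceptance_rate": acceptance_rate(cloned, threshold),
    }


# Hypothetical scores from a test campaign
genuine_scores = [0.91, 0.88, 0.95, 0.79, 0.86, 0.93]
cloned_scores = [0.84, 0.62, 0.90, 0.71, 0.55, 0.87]

for t in (0.80, 0.85, 0.90):
    print(evaluate(genuine_scores, cloned_scores, t))
```

Raising the threshold lowers spoof acceptance but turns away more genuine customers. In this toy data, blocking every clone would mean rejecting half of the genuine attempts, which is another argument for not relying on voice as the only factor. Re-running the same evaluation after each system update shows whether a new cloning tool has shifted the balance.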
In short, preventing voice cloning fraud in fintech requires a multi-layered approach that combines advanced technology, user training, and regular testing and evaluation. It's also important for companies and individuals to keep up with the latest developments in voice cloning technology and the security threats they create, and to take appropriate measures to protect against them.
References
Allied Market Research. (2022). Voice Banking Market by Technology: Global Opportunity Analysis and Industry Forecast, 2021–2031.
Fiserv. (2021). Voice Banking: Understanding the Benefits and Limitations of This Emerging Technology.
Markets and Markets. (2019). Voice Technology Market by Component (Hardware and Software), Application (Assistance and Access, and Authentication and Verification), Deployment Mode, Organization Size, Vertical, and Region - Global Forecast to 2024.
Merchant Savvy. (2022). Voice-Activated Payments and Banking: 5 Benefits and 5 Risks.
Sandhya, N. (2022, January 31). Voice Cloning: A New Frontier in Cybersecurity Threats. The Economic Times.
Visa. (2021). Abu Dhabi Islamic Bank, Visa and Globee® Launch World's First Biometric Voice and Voice-Based Authentication for E-commerce.