Russia's Tinkoff Bank is to sell the proprietary speech recognition software behind its chatbot Oleg to corporate customers.
Tinkoff VoiceKit features deep neural network models for speech recognition and synthesis, developed by Tinkoff over recent years as part of its AI First strategy.
Trained on terabytes of data and many thousands of hours of human speech, it can correctly recognise up to 95% of the words in a spoken phrase regardless of audio quality, says the bank, filtering out noise in a phone conversation as well as handling crystal-clear speech.
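The bank does not say how the 95% figure is measured. A common convention in speech recognition is word accuracy, calculated as one minus the word error rate: the number of word substitutions, insertions and deletions needed to turn the recognised text into the reference transcript, divided by the number of reference words. The sketch below assumes that convention; the function names and the sample sentences are illustrative and are not taken from Tinkoff VoiceKit.

```python
def word_errors(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1], len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy as 1 - WER, floored at zero (assumed convention)."""
    errors, n_words = word_errors(reference, hypothesis)
    if n_words == 0:
        raise ValueError("reference transcript is empty")
    return max(0.0, 1.0 - errors / n_words)


# Illustrative 20-word reference with one misrecognised word.
reference = ("please transfer five hundred roubles from my current account "
             "to my savings account and send me a confirmation by text")
hypothesis = reference.replace("five", "nine")
accuracy = word_accuracy(reference, hypothesis)
```

On this measure, one wrong word in a 20-word phrase corresponds to 95% accuracy. Whether the bank's figure is calculated in the same way is not stated.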
In 2018, Tinkoff drew on neural network models such as WaveNet, Tacotron 2 and Deep Voice to roll out a proprietary speech synthesis technology, creating voices that are almost indistinguishable from genuine human speech.
The bank says the kit can be used for a variety of purposes, from creating voice assistants and robots that man the phones at call centres to producing audiobooks and voice-overs in video editing. A production system for individual users is also in the works.
Vyacheslav Tsyganov, VP and CIO at Tinkoff, comments: "We had a strong team of developers, 80 video cards, more than 15,000 hours of audio from public sources, many thousands of hours of phone conversations coming through our call centre, the Kolmogorov supercomputer and a voice-over actor ready to spend five months recording the speech synthesis material. Over three years, we have timestamped more than 4,500 hours of speech and trained deep neural network models to create what is now available as Tinkoff VoiceKit."
The technology will be made available as an API, both for live recognition and for batch offline processing.
Says Tsyganov: "If the customer needs system reconfiguration or an on-site solution, we will seek to engage major integrators to help out. Mobile SDKs for iOS and Android are in the pipeline too."