Artificial intelligence, and machine learning in particular, is at its best when it can work with large data sets, such as text. The problem is that much of the world's data is not available as text. Instead, it exists as spoken words, audio recordings, or live events. That makes voice transcription an important goal for AI in general. If you want to learn more about the emerging AI technology being used in voice transcription, read on.
The Current Market for AI
Right now, the market for AI transcription is split between incumbents and start-ups, and the two camps are approaching it very differently. The bigger players offer speech-to-text packaged as an API, or as part of a much larger product. Start-ups, by contrast, are exploring business models that let them sell transcription software directly to their customers.
Major tech companies such as Google, Microsoft, and Baidu are all involved in the space, with work that ranges from research projects to commercial products. Many of the commercial projects happening right now focus on dictation, or on transcribing a single voice, so that the system can be trained on that speaker ahead of time. Verbit is a leader in transcription and is currently well ahead in the tech world; some might even compare it to the big names.
The Big Names
Some of the big names include Google, Amazon, Nuance, Cisco, and Apple. They have been looking into voice recognition since the early 1990s, and the research has been pushed forward by the emergence of virtual assistants such as Alexa, Google Assistant, and Cortana. Microsoft's AI research has made headlines more than once, notably when it published a paper showing that its system had achieved parity with human transcribers, a milestone in the broader push to increase accuracy.
When you look outside the US, the Chinese tech giant Baidu has been a major leader in AI. It built deep neural networks to create its Deep Speech project. In this day and age, world-class speech recognition efforts depend on data from third-party providers, or on recruiting graduates from the speech and language technology programs running today.
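To make the deep-learning approach a little more concrete: Deep Speech-style systems have a network emit, for each short audio frame, a probability distribution over characters plus a special "blank" symbol, and the transcript is recovered by collapsing repeats and dropping blanks (greedy CTC decoding). Below is a minimal sketch of just that decoding step; the alphabet, the blank index, and the toy input are assumptions for illustration, not details from Baidu's actual system.

```python
import numpy as np

# Greedy CTC decoding: take the most likely symbol per audio frame,
# collapse consecutive repeats, then remove the blank symbol.
BLANK = 0  # index of the CTC blank symbol (assumed for this sketch)
ALPHABET = "_abcdefghijklmnopqrstuvwxyz "  # index 0 ("_") is the blank

def ctc_greedy_decode(frame_probs: np.ndarray) -> str:
    """frame_probs: (time, num_symbols) array of per-frame probabilities."""
    best = frame_probs.argmax(axis=1)  # most likely symbol at each frame
    # Collapse runs of the same symbol into one occurrence.
    collapsed = [best[0]] + [s for prev, s in zip(best, best[1:]) if s != prev]
    # Drop blanks and map the remaining indices back to characters.
    return "".join(ALPHABET[s] for s in collapsed if s != BLANK)
```

For example, the frame sequence c-c-blank-a-a-blank-blank-t-t-t decodes to "cat": the repeated frames merge, and the blanks mark boundaries so genuine double letters can still be represented.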
Baidu believes in a highly simplified pipeline, and it also believes that speech recognition technology should be democratized, much as has happened with neural networks and recognition software. On the commercial side, Baidu has SwiftScribe, which is in beta. It can transcribe from a recording, but it is still at an early stage: it cannot transcribe in real time, which puts it behind its competitors. Of course, there are many more advancements to come, but right now things look to be moving in the right direction.