NVIDIA GTC 2015: Day 3 – Baidu on Deep Speech and Baidu Eye

By John Law - 20 Mar 2015

Andrew Ng, Chief Scientist of Baidu, Chairman and Co-Founder of Coursera.

It’s our third and final day here at GTC 2015. As you already know (and as we’ve mentioned in our last two articles), Deep Learning is a major focus for NVIDIA, and it’s the primary theme throughout this year’s conference.

Image recognition, speech recognition, and behavioral recognition: these are the basic elements upon which Deep Learning is built.

Taking the stage today was Andrew Ng, Chief Scientist of Baidu and Chairman and Co-Founder of Coursera. In all honesty, we find it both a little ironic and interesting that Google’s direct competitor in the PRC has been one of the adopters of Deep Learning.

Andrew says that Deep Learning is like building a rocket, with large neural networks acting as the engine, and data as the fuel.

Andrew likened Deep Learning at Baidu to building a rocket: the large neural networks act as the engine that drives the process, and the data that is collected, sifted, and compiled acts as the fuel. To explain this in layman’s terms, he showed how Baidu’s own Deep Learning system was taught to identify a coffee mug within a picture. From there, the engine would begin to scour the Internet for images of different mugs via its neural network.
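To make the engine-and-fuel analogy concrete, here is a minimal sketch of our own (not Baidu’s code) of how such a system might be trained in a framework like PyTorch. The tiny network, the image sizes, and the random stand-in data are all assumptions for illustration only.

```python
# A minimal "engine and fuel" sketch: a small convolutional network (the engine)
# trained on labeled images (the fuel) to decide whether a picture contains a
# coffee mug. Everything here is illustrative, not Baidu's system.
import torch
import torch.nn as nn

# The "engine": a tiny convolutional neural network for binary classification.
engine = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 2),  # two classes: "mug" vs. "not a mug"
)

# The "fuel": a batch of labeled 64x64 RGB images (random stand-ins here).
images = torch.randn(8, 3, 64, 64)
labels = torch.randint(0, 2, (8,))

optimizer = torch.optim.SGD(engine.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# One training step: more data and more steps make the engine better.
logits = engine(images)
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()
print(f"training loss: {loss.item():.3f}")
```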

Once again, image recognition is seen here as part of Deep Learning.

But to reach this speed of searching, Andrew credited NVIDIA’s CUDA and GPU technology as the backbone of Baidu’s Deep Learning functions and capabilities.

During his keynote, Andrew also explained his take on Deep Learning. Rather than tackling the concept across a broad spectrum, he said Baidu is focused on one specific area: speech recognition, and more specifically, speech-to-text input and translation.

A diagram of how Baidu's Deep Speech program operates.

Speech-to-text input on Baidu's own Deep Speech program allows the engine to capture each word accurately, even with heavy background noise.

It’s true that the more popular search engine, Google, already has its own speech-to-text input and translation firmly embedded in its ‘OK, Google’ function, but what we saw Baidu’s own Deep Speech AI do was genuinely noteworthy. Andrew’s demonstration of Deep Speech showed how the company’s engine was able to pick up the speech from a clip, word for word, with near-perfect consistency. Even with the ambient background noise raised to a considerable level, the Deep Speech program was still able to capture the words spoken within the clip.
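Baidu’s published Deep Speech work describes an end-to-end recurrent network trained with CTC loss on spectrogram features, with noise mixed into the training audio so the model learns to cope with it. The snippet below is a rough, hypothetical sketch of that noise-augmentation-plus-CTC idea in PyTorch, not Baidu’s actual code; the shapes, alphabet size, and data are made up for illustration.

```python
# Hypothetical sketch of Deep Speech-style training: mix noise into the
# training audio for robustness, then train a recurrent acoustic model
# end-to-end with CTC loss. Illustrative only.
import torch
import torch.nn as nn

# Fake "spectrogram" features: (time_steps, batch, feature_dim).
clean = torch.randn(100, 4, 80)
noise = 0.3 * torch.randn_like(clean)   # background noise mixed in for robustness
noisy = clean + noise

# The acoustic model: a recurrent layer plus a per-frame character classifier.
num_chars = 29  # e.g. 26 letters + space + apostrophe + CTC "blank"
rnn = nn.GRU(input_size=80, hidden_size=128)
classifier = nn.Linear(128, num_chars)

hidden, _ = rnn(noisy)
log_probs = classifier(hidden).log_softmax(dim=-1)   # (time, batch, chars)

# CTC loss aligns the frame-level outputs with the target transcripts.
targets = torch.randint(1, num_chars, (4, 20))       # dummy character indices
input_lengths = torch.full((4,), 100, dtype=torch.long)
target_lengths = torch.full((4,), 20, dtype=torch.long)
loss = nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
loss.backward()
print(f"CTC loss on noisy input: {loss.item():.3f}")
```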

Meet Li Chongyang. Li is blind, but thanks to Baidu's Deep Speech initiative, he is able to go about his daily life like any other person.

The Deep Speech program isn’t just a concept Baidu is working on; it is already being used in the PRC. Li Chongyang, a 24-year-old man in China, depends primarily on Baidu’s Deep Speech technology to help him get through his day-to-day activities. The reason? Li is blind. Thanks to Deep Speech, however, Andrew says that Li’s spirits remain high and that he doesn’t feel handicapped at all.

Andrew announced the Baidu Eye, a wearable headset that is built around the concept of Deep Learning.

In relation to Deep Speech, Andrew also announced that Baidu is already developing its own wearable headset. It’s called the Baidu Eye, and at a glance, it’s easy to mistake the device for a Google Glass headset. The difference with the Baidu Eye, however, is that there is no display attached to the headset. Instead, it has a camera on the right frame, which Andrew says will scan images, objects, and environments in real time, and then convey the relevant information about the subject to the wearer via an earpiece that fits inside the left ear.
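Baidu didn’t go into implementation details, but the behaviour described suggests a simple capture-recognize-speak loop. The sketch below is purely our own illustration of that kind of pipeline; every function in it is a hypothetical stand-in for the real camera, recognition model, and text-to-speech components, none of which Baidu detailed.

```python
# Purely hypothetical sketch of the loop a camera-plus-earpiece wearable
# like the Baidu Eye might run: grab a frame, recognize what is in it,
# and speak the result to the wearer. All functions are stand-ins.
import time

def capture_frame():
    """Stand-in for reading an image from the headset's camera."""
    return "raw image bytes"

def recognize(frame):
    """Stand-in for a Deep Learning model that labels the object in view."""
    return "coffee mug"

def speak(text):
    """Stand-in for text-to-speech played through the left-ear earpiece."""
    print(f"[earpiece] {text}")

# Main loop: scan the environment and describe it to the wearer in real time.
for _ in range(3):
    frame = capture_frame()
    label = recognize(frame)
    speak(f"I can see a {label} in front of you.")
    time.sleep(1)  # a real device would be paced by the camera's frame rate
```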

In closing, Andrew mentioned that at the rate Deep Learning is progressing, and with the Baidu Eye soon to be released to consumers, the concepts of wearables and the Internet of Things (IoT) are set to be dramatically redefined, thanks in no small part to the speed with which GPU technology has allowed search engines to answer people’s queries more quickly and accurately.

That’s all from us here at GTC 2015. Hopefully, we’ll be back next year to give you all the updates you need from NVIDIA’s annual conference.