re:Invent 2017: AWS announces real-time video intelligence, speech and language tools

By Michael Low - on 30 Nov 2017, 8:00am

At last year's re:Invent conference, Amazon Web Services (AWS) announced three Amazon AI services focusing on voice, text, and image recognition, respectively. This year, developers are introduced to four more: Amazon Transcribe, Amazon Translate, Amazon Comprehend, and Amazon Rekognition Video. All four machine learning (ML) services are highly accurate, scalable, and cost-effective, allowing developers to create apps that unlock meaningful insights from data stored in an Amazon S3 data lake.

Starting today, developers can rely on Amazon Rekognition Video to perform video analysis – either in real time or as batch processing – in their apps via the Rekognition API. Apart from detecting faces, celebrities, objects, and potentially inappropriate content, it can also track multiple people and compare faces, even when the faces are partially obscured in the video, for a wide range of use cases such as missing-person or background checks. Amazon Rekognition Video also automatically generates timestamps throughout the video to label faces, activities, time of day, location, and more. AWS will, of course, be adding new labels to the service as it learns from new images and videos daily.
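As a rough illustration of the timestamped labels described above, the sketch below groups detected labels by the second of video in which they appear. It assumes a response shaped like the one returned by the Rekognition GetLabelDetection operation (a `Labels` list whose entries carry a `Timestamp` in milliseconds and a `Label` with `Name` and `Confidence`). The helper name `labels_by_second`, the confidence threshold, and the sample data are hypothetical and are not taken from the announcement.

```python
from collections import defaultdict


def labels_by_second(response, min_confidence=80.0):
    """Group detected label names by the second of video they appear in.

    `response` is assumed to look like a GetLabelDetection result;
    this helper is a hypothetical illustration, not part of any AWS SDK.
    """
    timeline = defaultdict(set)
    for item in response.get("Labels", []):
        label = item["Label"]
        if label["Confidence"] >= min_confidence:
            # Timestamp is in milliseconds from the start of the video.
            timeline[item["Timestamp"] // 1000].add(label["Name"])
    return {second: sorted(names) for second, names in sorted(timeline.items())}


# Made-up sample data, not real service output.
sample_response = {
    "Labels": [
        {"Timestamp": 0, "Label": {"Name": "Person", "Confidence": 98.2}},
        {"Timestamp": 500, "Label": {"Name": "Car", "Confidence": 91.0}},
        {"Timestamp": 2100, "Label": {"Name": "Dog", "Confidence": 62.5}},
        {"Timestamp": 2400, "Label": {"Name": "Person", "Confidence": 95.7}},
    ]
}

timeline = labels_by_second(sample_response)
for second, names in timeline.items():
    print(f"{second}s: {', '.join(names)}")
```

In a real app, the input would come from a completed video analysis job rather than a hand-written dictionary.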

Also available today is Amazon Comprehend, which leverages natural language understanding (NLU) and deep learning techniques to derive key insights from textual data. It extracts people, places, brands, key phrases, and expressed sentiments to determine the meaning and relationships in the text. For example, this can be used to identify positive and negative product reviews, or to group a series of articles by similar subject matter. Comprehend continually learns from a wide range of information sources, while the integration with AWS Glue allows end-to-end analytics of documents and texts from other AWS data sources, such as Redshift, RDS, and DynamoDB.
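The product review example above could be sketched roughly as follows. The function expects a client exposing `detect_sentiment(Text=..., LanguageCode=...)`, which is how boto3's Comprehend client is called, and it reads the `Sentiment` field of the result. The function name `split_reviews_by_sentiment` and the `KeywordStubClient` class are hypothetical; the stub is only an offline stand-in so the sketch runs without AWS credentials, and its keyword matching has nothing to do with how Comprehend actually works.

```python
def split_reviews_by_sentiment(client, reviews, language_code="en"):
    """Sort reviews into POSITIVE, NEGATIVE, and OTHER buckets.

    `client` is assumed to expose a Comprehend-style detect_sentiment call.
    """
    grouped = {"POSITIVE": [], "NEGATIVE": [], "OTHER": []}
    for review in reviews:
        result = client.detect_sentiment(Text=review, LanguageCode=language_code)
        sentiment = result["Sentiment"]
        # NEUTRAL and MIXED results are lumped together as OTHER.
        key = sentiment if sentiment in ("POSITIVE", "NEGATIVE") else "OTHER"
        grouped[key].append(review)
    return grouped


class KeywordStubClient:
    """Offline stand-in for a Comprehend client (NOT the real service)."""

    def detect_sentiment(self, Text, LanguageCode):
        lowered = Text.lower()
        if "love" in lowered:
            return {"Sentiment": "POSITIVE"}
        if "broke" in lowered:
            return {"Sentiment": "NEGATIVE"}
        return {"Sentiment": "NEUTRAL"}


reviews = [
    "I love this blender.",
    "It broke after two days.",
    "Arrived on Tuesday.",
]
grouped = split_reviews_by_sentiment(KeywordStubClient(), reviews)
print(grouped)
```

To try this against the actual service, one would pass `boto3.client("comprehend")` in place of the stub, assuming boto3 is installed and credentials are configured.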

Joining Amazon Polly are Amazon Transcribe and Amazon Translate. The former is an automatic speech recognition service that analyzes audio files in S3 – including MP3, WAV, and low-fidelity audio – and outputs accurate, fully punctuated text of the transcribed speech. The service currently supports English and Spanish, and is expected to recognize multiple speakers and support custom vocabulary and more languages in the near future. The latter uses advanced neural machine translation to deliver more accurate and fluent translations between English and Arabic, French, German, Portuguese, Simplified Chinese, and Spanish. More languages will be supported in 2018.
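To make the S3-based transcription workflow a little more concrete, here is a minimal sketch that builds the parameters for a transcription job from an audio file's S3 location. The keys follow the Transcribe StartTranscriptionJob request (`TranscriptionJobName`, `LanguageCode`, `MediaFormat`, `Media`), but the helper `build_transcription_request`, the bucket, the job name, and the choice to accept only the two formats named above are assumptions made for illustration; the accepted URI form and language codes may differ in the preview.

```python
import os

# Only the formats named in the article; the service may accept others.
SUPPORTED_FORMATS = {"mp3", "wav"}


def build_transcription_request(job_name, s3_uri, language_code="en-US"):
    """Build keyword arguments for a transcription job (hypothetical helper)."""
    # Infer the media format from the file extension, case-insensitively.
    extension = os.path.splitext(s3_uri)[1].lstrip(".").lower()
    if extension not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported media format: {extension!r}")
    return {
        "TranscriptionJobName": job_name,
        "LanguageCode": language_code,
        "MediaFormat": extension,
        "Media": {"MediaFileUri": s3_uri},
    }


# Hypothetical bucket and job name, for illustration only.
request = build_transcription_request(
    "demo-job", "s3://example-bucket/calls/interview.MP3"
)
print(request)
```

With boto3 installed and credentials configured, the resulting dictionary could be passed along as `boto3.client("transcribe").start_transcription_job(**request)`.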

Amazon Transcribe and Amazon Translate are available in preview.