Google develops "SpecAugment," enabling high-accuracy automatic speech recognition without a language model

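Google, a front-runner in machine learning and artificial intelligence (AI), researches automatic speech recognition, the technology behind services such as Cloud Speech-to-Text that recognize speech and convert it to text. Researchers at Google AI have announced "SpecAugment," a technique that improves the performance of state-of-the-art automatic speech recognition models without using a language model.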

[1904.08779] SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
https://arxiv.org/abs/1904.08779

Google AI Blog: SpecAugment: A New Data Augmentation Method for Automatic Speech Recognition
https://ai.googleblog.com/2019/04/specaugment-new-data-augmentation.html

Google's SpecAugment achieves state-of-the-art speech recognition without a language model | VentureBeat
https://venturebeat.com/2019/04/22/googles-specaugment-achieves-state-of-the-art-speech-recognition-without-a-language-model/

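A language model is a mathematical representation of the relationships between words in a language. Speech is, on its own, nothing more than sound, but by learning which word is likely to follow a given sequence of words, a system can turn that sound into meaningful text. For this reason, AI systems for automatic speech recognition have generally needed to be trained with the help of a language model.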
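To make the idea of "learning which word comes after a word sequence" concrete, here is a minimal sketch of a toy bigram language model in Python. It is only an illustration: the function names (`train_bigram`, `next_word_prob`, `pick_transcript`) and the tiny corpus are hypothetical, and this is not how Google's speech recognition models or SpecAugment work.

```python
from collections import Counter, defaultdict

def train_bigram(sentences):
    """Count how often each word follows the previous word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # "<s>" is an assumed start-of-sentence marker.
        tokens = ["<s>"] + sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_prob(counts, prev, word):
    """Estimate P(word | prev) from raw counts (no smoothing)."""
    total = sum(counts[prev].values())
    if total == 0:
        return 0.0
    return counts[prev][word] / total

def pick_transcript(counts, candidates):
    """Choose the candidate word sequence the model finds most likely."""
    def score(sentence):
        tokens = ["<s>"] + sentence.lower().split()
        prob = 1.0
        for prev, nxt in zip(tokens, tokens[1:]):
            prob *= next_word_prob(counts, prev, nxt)
        return prob
    return max(candidates, key=score)

# Hypothetical training text and two similar-sounding candidate transcripts.
corpus = [
    "recognize speech with a model",
    "recognize speech from audio",
    "a nice beach",
]
counts = train_bigram(corpus)
best = pick_transcript(counts, ["recognize speech", "wreck a nice beach"])
print(best)
```

In this toy setup the model prefers the candidate whose word pairs appeared in the training text, which is the role a language model plays when it helps a recognizer choose between similar-sounding transcriptions.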
