Overview

This article explains how to build a web app that synthesizes speech from text using the Text-to-Speech service, part of the Speech service in Azure Cognitive Services. ASP.NET Core 5.0 is used as the web application framework.

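1. Overview of the Text-to-Speech service

1.1 What is the Text-to-Speech service?

The Text-to-Speech service is one of the services provided by the Speech service in Azure Cognitive Services; it converts text into synthesized speech. Besides Text-to-Speech, the Speech service also offers Speech-to-Text, Voice assistants, and other services.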

Speech service overview

https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/overview

Supported languages

https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support

Pricing

https://azure.microsoft.com/ja-jp/pricing/details/cognitive-services/speech-services/

1.2 How to use the Text-to-Speech service

Create a Speech service resource in your Azure subscription, then call the service through the SDK or the REST API. The SDK is available for C#, C++, Java, Python, JavaScript, and other languages.

Text-to-Speech documentation

https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/index-text-to-speech
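The REST API mentioned above can be called without the SDK. As a rough sketch, assuming key-based authentication against the regional endpoint https://<region>.tts.speech.microsoft.com/cognitiveservices/v1 described in the Text-to-Speech REST API reference, the request body is an SSML document and the response is audio in the format named by the X-Microsoft-OutputFormat header. The helper name buildTtsRequest and the User-Agent value are hypothetical; the app built in this article uses the JavaScript SDK instead.

```javascript
// Hypothetical helper: builds (but does not send) a Text-to-Speech REST request.
// Assumes key-based auth and the regional endpoint pattern from the REST API docs.
function buildTtsRequest(region, key, ssml) {
  if (!region || !key) {
    throw new Error('region and key are required');
  }
  return {
    url: `https://${region}.tts.speech.microsoft.com/cognitiveservices/v1`,
    method: 'POST',
    headers: {
      'Ocp-Apim-Subscription-Key': key,           // KEY 1 from the portal
      'Content-Type': 'application/ssml+xml',     // body is SSML
      'X-Microsoft-OutputFormat': 'riff-24khz-16bit-mono-pcm', // WAV output
      'User-Agent': 'TextToSpeechSample',         // arbitrary app name (assumption)
    },
    body: ssml,
  };
}

// Possible use with Node.js 18+ (built-in fetch):
// const req = buildTtsRequest('japaneast', process.env.SPEECH_KEY, ssml);
// const res = await fetch(req.url, { method: req.method, headers: req.headers, body: req.body });
// const wav = Buffer.from(await res.arrayBuffer());
```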

2. Preparation

To prepare for building the web app, install .NET and create a Speech service resource.

2.1 Installing .NET

Install .NET in your development environment. This article uses .NET 5.0.

https://docs.microsoft.com/ja-jp/dotnet/core/install/

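2.2 Creating a Speech service resource

(1) Go to the Azure Portal (https://portal.azure.com) and select [Create a resource].

(2) Type speech in the search box and press Enter, then click [Create] on [Speech] and select [Speech].

(3) Fill in Name, Subscription, Location, Pricing tier, and Resource group, then click [Create]. This article uses Japan East as the location and Free F0 as the pricing tier.

(4) Open the Speech service resource you created and display [Keys and Endpoint] under Resource Management. Note down KEY 1 and the location (region identifier, for example japaneast), because the app uses them later.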

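3. Building the speech synthesis app

We build a web app that synthesizes speech from text using the Speech SDK for JavaScript. The page obtains the Speech service key information from the server, then uses the Speech service to synthesize the entered text and read it aloud.

Note: this article was developed on Linux. If you develop on Windows or another OS, adapt steps such as installing NuGet packages as needed.

3.1 Creating the web app

(1) First, create a base web app and run it.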

$ dotnet new webapp -o TextToSpeech

$ cd TextToSpeech

$ dotnet run

(2) Open a browser and connect to the app at https://localhost:5001.

(3) Press Ctrl+C in the terminal to stop the app.

3.2 Server-side setup

(1) Add the required packages by running the following commands in the terminal.

$ dotnet add package Newtonsoft.Json

$ dotnet add package Microsoft.CognitiveServices.Speech

(2) Add the key information to appsettings.json, entering the KEY 1 and location you noted in 2.2 (4).

appsettings.json

{
  "Logging": {
    "LogLevel": {
      "Default": "Information",
      "Microsoft": "Warning",
      "Microsoft.Hosting.Lifetime": "Information"
    }
  },
  "AzureSpeech": {
    "Key": "Your Key",
    "Region": "Your Region"
  },
  "AllowedHosts": "*"
}
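A malformed appsettings.json (for example, a missing comma or brace) typically makes the app fail at startup, so it can help to check the file first. The following is a hypothetical Node.js helper, not part of the app, that parses the file text and extracts the AzureSpeech section in the same way Speech.cs later reads configs["Key"] and configs["Region"]. JSON.parse is stricter than the .NET JSON configuration loader, which may accept comments, so treat this as a rough check only.

```javascript
// Hypothetical check script (not part of the app): parse appsettings.json text
// and return the AzureSpeech key and region, or throw if they are missing.
function readSpeechSettings(jsonText) {
  // Strip a UTF-8 byte-order mark if the editor saved one
  const text = jsonText.replace(/^\uFEFF/, '');
  // Throws SyntaxError for invalid JSON such as a missing comma
  const settings = JSON.parse(text);
  const section = settings.AzureSpeech;
  if (!section || !section.Key || !section.Region) {
    throw new Error('AzureSpeech.Key and AzureSpeech.Region must be set');
  }
  return { key: section.Key, region: section.Region };
}

// Possible use from the project directory (Node.js):
// const fs = require('fs');
// console.log(readSpeechSettings(fs.readFileSync('appsettings.json', 'utf8')).region);
```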

(3) Create a Speech class for working with the Speech service.

$ mkdir Services

$ touch Services/Speech.cs

Services/Speech.cs

using System.Threading.Tasks;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Configuration;
using Microsoft.CognitiveServices.Speech;

namespace TextToSpeech.Service
{
    public interface ISpeech
    {
        object GetCredential();
    }

    public class Speech : ISpeech
    {
        private readonly ILogger<Speech> _logger;
        private readonly SpeechConfig _speechConfig;

        public Speech(ILogger<Speech> logger, IConfiguration configuration)
        {
            _logger = logger;
            // Read the Speech service credentials from appsettings.json
            var configs = configuration.GetSection("AzureSpeech");
            _speechConfig = SpeechConfig.FromSubscription(configs["Key"], configs["Region"]);
        }

        // Return the key and region as an anonymous object to be serialized for the browser
        public object GetCredential()
        {
            return new { Key = _speechConfig.SubscriptionKey, Region = _speechConfig.Region };
        }
    }
}

(4) Register the service for dependency injection in ConfigureServices in Startup.cs.

Startup.cs

// (omitted)
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using TextToSpeech.Service;

namespace TextToSpeech
{
    // (omitted)

        public void ConfigureServices(IServiceCollection services)
        {
            services.AddRazorPages();
            // Register Speech as the ISpeech implementation, one instance per HTTP request
            services.AddScoped<ISpeech, Speech>();
        }

    // (omitted)

(5) Modify Index.cshtml.cs.

Pages/Index.cshtml.cs

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;
using Microsoft.AspNetCore.Mvc.RazorPages;
using Microsoft.Extensions.Logging;
using TextToSpeech.Service;

namespace TextToSpeech.Pages
{
    public class IndexModel : PageModel
    {
        private readonly ILogger<IndexModel> _logger;
        private readonly ISpeech _speech;

        // Key and region serialized as JSON for the page script
        public string CredStr { get; }

        public IndexModel(ILogger<IndexModel> logger, ISpeech speech)
        {
            _logger = logger;
            _speech = speech;
            var cred = _speech.GetCredential();
            CredStr = Newtonsoft.Json.JsonConvert.SerializeObject(cred);
        }

        public void OnGet()
        {
        }
    }
}

3.3 Client-side setup

(1) Download the Speech SDK for JavaScript (https://aka.ms/csspeech/jsbrowserpackage) and extract it into the wwwroot/lib/speech/ directory.

$ cd ~/Downloads

$ unzip SpeechSDK-JavaScript-1.17.0.zip

$ cd -

$ cp -r ~/Downloads/SpeechSDK-JavaScript-1.17.0 wwwroot/lib/speech

(2) Create the JavaScript file.

$ touch wwwroot/js/speech.js

wwwroot/js/speech.js

'use strict';

let synthesizer = null;
let inText = '';
let language = 'ja-JP';

function synthesizeStart() {
    // Initialize the synthesizer with the credentials passed from the server (cred)
    const speechConfig = SpeechSDK.SpeechConfig.fromSubscription(cred.Key, cred.Region);
    speechConfig.speechSynthesisLanguage = language;
    const audioConfig = SpeechSDK.AudioConfig.fromDefaultSpeakerOutput();
    synthesizer = new SpeechSDK.SpeechSynthesizer(speechConfig, audioConfig);

    inText = $('#intext').val();
    $('#notice').text('');
    $('#spkbtn').prop('disabled', true);

    // Start speech synthesis
    synthesizer.speakTextAsync(
        inText,
        // Called when synthesis finishes, successfully or not
        result => {
            if (result.reason === SpeechSDK.ResultReason.SynthesizingAudioCompleted) {
                $('#notice').text('Speech synthesis completed');
            } else {
                $('#notice').text(result.errorDetails);
            }
            synthesizer.close();
            $('#spkbtn').prop('disabled', false);
        },
        // Called when an error occurs
        error => {
            $('#notice').text(error);
            synthesizer.close();
            $('#spkbtn').prop('disabled', false);
        }
    );
}

// Handle clicks on the Start button
$('#spkbtn').click(function () {
    synthesizeStart();
});

(3) Change Index.cshtml as follows.

Pages/Index.cshtml

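@page
@model IndexModel
@{
    ViewData["Title"] = "Text to speech";
}

<div>
    <label>Speaker output</label>
    <button type="button" id="spkbtn" class="btn btn-primary btn-block">Start</button>
</div>

<div>
    <p>Input text <span id="notice" class="text-danger"></span></p>
    <textarea id="intext" class="form-control" rows="5"></textarea>
</div>

@section Scripts{
    <script src="~/lib/speech/microsoft.cognitiveservices.speech.sdk.bundle.js"></script>
    <script>
        let cred = @Html.Raw(Model.CredStr);
    </script>
    <script src="~/js/speech.js"></script>
}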

3.4 Running and testing the web app

(1) Start the app and connect to it from a browser at https://localhost:5001.

$ dotnet run

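(2) Enter some text in the input text box and click the Start button to begin speech synthesis.

(3) Press Ctrl+C in the terminal to stop the app.

3.5 Explanation

(1) On the server side, we define the Speech class and register it for dependency injection in Startup.cs, so that IndexModel can access a Speech instance. IndexModel retrieves the Speech service key information, serializes it to a JSON string, and stores it in CredStr so that the browser can reference it.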

About dependency injection

https://docs.microsoft.com/ja-jp/aspnet/core/fundamentals/dependency-injection?view=aspnetcore-5.0

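(2) On the client side, the key information in CredStr is read as the cred variable, the synthesizer is initialized, and synthesizer.speakTextAsync() starts speech synthesis. Besides the input text, speakTextAsync() takes a callback that runs when synthesis finishes and a callback that runs when an error occurs.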
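Because speakTextAsync() reports its outcome through two callbacks, it can be convenient to wrap it in a Promise. The sketch below is illustrative and not part of the SDK: speakTextPromise is a hypothetical helper, and the success reason is passed in (in the app, SpeechSDK.ResultReason.SynthesizingAudioCompleted) so the function can also be tried with a stand-in object outside the browser.

```javascript
// Hypothetical helper (not part of the Speech SDK): turns the two-callback
// speakTextAsync(text, onResult, onError) call into a Promise.
// completedReason: the result.reason value that means success
// (SpeechSDK.ResultReason.SynthesizingAudioCompleted in the app).
function speakTextPromise(synthesizer, text, completedReason) {
  return new Promise((resolve, reject) => {
    synthesizer.speakTextAsync(
      text,
      result => {
        // Synthesis finished: release the synthesizer, then settle on the outcome
        synthesizer.close();
        if (result.reason === completedReason) {
          resolve(result);
        } else {
          reject(new Error(result.errorDetails));
        }
      },
      error => {
        // Errors may arrive as plain strings; normalize them to Error objects
        synthesizer.close();
        reject(error instanceof Error ? error : new Error(String(error)));
      }
    );
  });
}

// Possible use inside synthesizeStart() in speech.js (browser, jQuery loaded):
// speakTextPromise(synthesizer, inText, SpeechSDK.ResultReason.SynthesizingAudioCompleted)
//   .then(() => $('#notice').text('Speech synthesis completed'))
//   .catch(err => $('#notice').text(err.message))
//   .finally(() => $('#spkbtn').prop('disabled', false));
```

With this shape, re-enabling the button lives in a single finally() instead of being repeated in both callbacks.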

Speech SDK for JavaScript

https://docs.microsoft.com/ja-jp/javascript/api/microsoft-cognitiveservices-speech-sdk/?view=azure-node-latest

Setting up the Speech SDK for JavaScript

https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/quickstarts/setup-platform?tabs=dotnet%2Clinux%2Cjre%2Cbrowser&pivots=programming-language-javascript

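4. Summary

This article explained how to build an ASP.NET Core web app that synthesizes speech from text using the Text-to-Speech service of the Speech service. Although not implemented here, you can also write the audio to a WAV file or change the voice (speaker). The default settings already produce natural-sounding Japanese, but if needed you can use Speech Synthesis Markup Language (SSML) to adjust the speaking rate, improve pronunciation, and so on.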
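As a sketch of the SSML approach mentioned above, the helper below builds an SSML document that selects a voice and sets the speaking rate with the prosody element; the resulting string could be passed to synthesizer.speakSsmlAsync() in place of speakTextAsync(). The helper names, the voice name ja-JP-NanamiNeural, and the rate values are example assumptions; check the supported-languages page for the voices available to your resource. The input text is XML-escaped because characters such as < and & would otherwise break the document.

```javascript
// Escape characters that are special in XML text and attribute values.
// '&' is replaced first so already-produced entities are not escaped twice.
function escapeXml(s) {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical helper: wrap text in SSML with a voice and a prosody rate.
// Defaults are examples only (assumed voice name; 'medium' is a standard rate keyword).
function buildSsml(text, { lang = 'ja-JP', voice = 'ja-JP-NanamiNeural', rate = 'medium' } = {}) {
  return `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${lang}">` +
    `<voice name="${voice}">` +
    `<prosody rate="${rate}">${escapeXml(text)}</prosody>` +
    `</voice></speak>`;
}

// Possible use in speech.js, speaking 20% slower than normal:
// synthesizer.speakSsmlAsync(buildSsml(inText, { rate: '-20%' }), onResult, onError);
```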

~~ Series articles ~~

[Part 2] Speech synthesis using the Text-to-Speech service of the Azure Cognitive Services Speech service
