Documentation

Text-To-Speech

Along with playing WAV files, you can also use text-to-speech (TTS). There are currently two TTS implementations:

  • Microsoft Azure Speech
  • Google Cloud Text-to-Speech

Setup

Modify your voice.properties file and add the properties for your speech provider:

# Azure Text-To-Speech
tts.azure.subscriptionKey=PutYourSubscriptionKeyHere
tts.azure.region=PutYourRegionHere
tts.azure.voice=en-US-JennyNeural

# Or Google Text-To-Speech
tts.google.credentials.path=YourCredentialFileHere.json
tts.google.gender=Female
tts.google.languageCode=en-US
tts.google.name=

Alternatively, you can set the speech properties in code:

// For Azure
var sipVoiceProperties = new SipVoiceProperties(loggerFactory)
{
    TtsAzureSubscriptionKey = "",
    TtsAzureRegion = "",
    TtsAzureVoice = "en-US-JennyNeural"
};

// For Google
var sipVoiceProperties = new SipVoiceProperties(loggerFactory)
{
    TtsGoogleCredentialsPath = "",
    TtsGoogleGender = "",
    TtsGoogleLanguageCode = "",
    TtsGoogleName = ""
};

Next, instantiate your text-to-speech factory and inject it into the LineManager.

// Set up Azure TTS
var ttsFactory = new AzureTtsFactory(loggerFactory, sipVoiceProperties);
// Or set up Google TTS instead (choose one of the two)
var ttsFactory = new GoogleTtsFactory(loggerFactory, sipVoiceProperties);

// Inject the tts factory into the line manager
using var lineManager = new LineManager(loggerFactory, sipVoiceProperties, sipPlugin, ttsFactory);

PlayTextToSpeech

This method converts the message to speech and plays it as a WAV stream.

var message = "Hello world.";
await line.PlayTextToSpeechAsync(message, cancellationToken);

Text-To-Speech Caching

TTS caching saves the generated audio to a WAV file. Because TTS generation can be costly, caching limits how often it runs: if the WAV file exists and the text hasn't changed, no TTS generation happens and the existing WAV file is used instead.

PlayFile and PlayTextToSpeech handle an ITextToSpeechCache differently:

  • PlayFile generates the WAV file and then plays it by file name. This version supports older Dialogic plugins. ITextToSpeechCache objects must have a WAV file name defined because PlayFile requires one.
  • PlayTextToSpeech still generates the WAV file for caching purposes but passes the WAV stream to the PlayWavStream method. PlayWavStream is only supported by the SipSorcery plugin. ITextToSpeechCache objects do NOT need a WAV file name defined, but without one the audio is not cached.

// get a reference to the TTS caching factory 
var ttsCacheFactory = line.TextToSpeechCacheFactory;

// generate the TTS Cache
var ttsCache = ttsCacheFactory.Create("Thank you for using the IVR Toolkit.", $"{WAV_FILE_LOCATION}/ThankYou.wav");

// pass the cache to PlayFileAsync
await line.PlayFileAsync(ttsCache, cancellationToken);

// alternatively pass the cache to this method
await line.PlayTextToSpeechAsync(ttsCache, cancellationToken);
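The cache rule described above (reuse the WAV file when it exists and the text is unchanged) can be sketched independently of the toolkit. This is only an illustration, not the toolkit's implementation: the GetOrCreateWavAsync helper, the ".sha256" sidecar file used to detect changed text, and the fake synthesizer are all assumptions made for the example.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper (not part of the toolkit): returns the cached audio when
// the WAV file exists and the stored text hash matches, otherwise regenerates.
static async Task<byte[]> GetOrCreateWavAsync(
    string text, string wavPath, Func<string, Task<byte[]>> synthesizeAsync)
{
    // Assumption: a sidecar file next to the WAV records the hash of the text.
    var hashPath = wavPath + ".sha256";
    var textHash = Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(text)));

    if (File.Exists(wavPath) && File.Exists(hashPath)
        && await File.ReadAllTextAsync(hashPath) == textHash)
    {
        return await File.ReadAllBytesAsync(wavPath); // cache hit: no TTS call
    }

    var audio = await synthesizeAsync(text);          // cache miss: generate
    await File.WriteAllBytesAsync(wavPath, audio);
    await File.WriteAllTextAsync(hashPath, textHash);
    return audio;
}

// Stand-in for a real TTS call; it only counts invocations and returns dummy bytes.
var calls = 0;
Func<string, Task<byte[]>> fakeTts = t =>
{
    calls++;
    return Task.FromResult(Encoding.UTF8.GetBytes(t));
};

var path = Path.Combine(Path.GetTempPath(), $"tts-{Guid.NewGuid():N}.wav");
await GetOrCreateWavAsync("Hello", path, fakeTts);   // generates
await GetOrCreateWavAsync("Hello", path, fakeTts);   // served from the file
await GetOrCreateWavAsync("Goodbye", path, fakeTts); // text changed, regenerates
Console.WriteLine(calls); // prints 2
```

Hashing the text rather than storing it keeps the sidecar file small; any other way of detecting changed text would serve the same purpose.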

The message also supports Azure Speech Synthesis Markup Language (SSML). Common interpret-as values:

Value       Description
cardinal    Reads numbers as cardinal numbers (e.g., 123 → "one hundred twenty-three").
ordinal     Reads numbers as ordinal numbers (e.g., 1st → "first").
characters  Reads each character separately (e.g., ABC → "A B C").
spell-out   Spells out the word letter by letter.
date        Reads text as a date (formats: YYYY/MM/DD, MM-DD-YYYY, etc.).
time        Reads text as a time (e.g., 14:30 → "two thirty PM").
telephone   Reads a phone number properly.
address     Reads an address naturally.
digits      Reads numbers as individual digits (e.g., 123 → "one two three").
fraction    Reads fractions properly (e.g., ½ → "one-half").
unit        Reads units of measurement (e.g., 10kg → "ten kilograms").
expletive   Censors explicit words (e.g., "damn" → "d***").

var message = "Thank you for using the <say-as interpret-as='characters'>IVR</say-as> Toolkit.";
await line.PlayTextToSpeechAsync(message, cancellationToken);
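When the text inside a say-as element comes from user data, building the markup by hand is error-prone. The sketch below shows one way to wrap a value and escape XML-reserved characters. SayAs is a hypothetical helper, not a toolkit method, and whether the toolkit requires escaping is an assumption here.

```csharp
using System;
using System.Security;

// Hypothetical helper (not part of the toolkit): wraps text in a say-as element
// and escapes XML-reserved characters such as & and < in the text.
static string SayAs(string interpretAs, string text) =>
    $"<say-as interpret-as='{interpretAs}'>{SecurityElement.Escape(text)}</say-as>";

var message = $"Your confirmation code is {SayAs("characters", "A1B2")}.";
Console.WriteLine(message);
```

The resulting string can be passed wherever a plain message is accepted, for example as the first argument of PlayTextToSpeechAsync shown above.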

Prompt

To use Text-To-Speech with Prompt or MultiTryPrompt, use one of the method overloads that accepts an ITextToSpeechCache. Specify null for the fileName if you don't want to cache the TTS audio.

// get a reference to the TTS caching factory 
var ttsCacheFactory = line.TextToSpeechCacheFactory;

// generate the TTS Cache
var ttsCache = ttsCacheFactory.Create("For this simple demonstration, press <say-as interpret-as='characters'>1234</say-as> followed by the pound key.", $"{WAV_FILE_LOCATION}/Press1234.wav");

// play tts and wait for digits to be pressed
var result = await line.PromptAsync(ttsCache, cancellationToken);

The following example always generates TTS because no fileName is specified:

// get a reference to the TTS caching factory 
var ttsCacheFactory = line.TextToSpeechCacheFactory;

// generate the TTS Cache - do not specify the fileName path
var ttsCache = ttsCacheFactory.Create("For this simple demonstration, press <say-as interpret-as='characters'>1234</say-as> followed by the pound key.");

// play tts and wait for digits to be pressed
var result = await line.PromptAsync(ttsCache, cancellationToken);