Documentation
Text-To-Speech
Along with playing WAV files, you can also use text-to-speech (TTS). There are currently two TTS implementations:
- Microsoft Azure Speech
- Google Cloud Text-to-Speech
Setup
Modify your voice.properties and add your Speech properties:
# Azure Text-To-Speech
tts.azure.subscriptionKey=PutYourSubscriptionKeyHere
tts.azure.region=PutYourRegionHere
tts.azure.voice=en-US-JennyNeural
# Or Google Text-To-Speech
tts.google.credentials.path=YourCredentialFileHere.json
tts.google.gender=Female
tts.google.languageCode=en-US
tts.google.name=
Alternatively, you can also add Speech properties this way:
// For Azure
var sipVoiceProperties = new SipVoiceProperties(loggerFactory)
{
    TtsAzureSubscriptionKey = "",
    TtsAzureRegion = "",
    TtsAzureVoice = "en-US-JennyNeural"
};
// For Google
var sipVoiceProperties = new SipVoiceProperties(loggerFactory)
{
    TtsGoogleCredentialsPath = "",
    TtsGoogleGender = "",
    TtsGoogleLanguageCode = "",
    TtsGoogleName = ""
};
Next, instantiate your text-to-speech factory and inject it into the LineManager.
// set up Azure TTS
var ttsFactory = new AzureTtsFactory(loggerFactory, sipVoiceProperties);
// or set up Google TTS
var ttsFactory = new GoogleTtsFactory(loggerFactory, sipVoiceProperties);
// Inject the tts factory into the line manager
using var lineManager = new LineManager(loggerFactory, sipVoiceProperties, sipPlugin, ttsFactory);
PlayTextToSpeech
This method plays the message as a WAV stream.
var message = "Hello world.";
await line.PlayTextToSpeechAsync(message, cancellationToken);
Text-To-Speech Caching
TTS caching saves the generated audio to a WAV file. TTS generation can be costly, and caching limits how often it happens: if the WAV file exists and the text hasn't changed, no TTS generation takes place and the existing WAV file is used instead.
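The rule above (reuse the WAV file when it exists and the text is unchanged) can be illustrated with a small standalone sketch. This is not the toolkit's implementation, whose internals are not described here. The class name `TtsCacheSketch`, the `.sha256` sidecar file, and the `generate` delegate are all hypothetical, and the code assumes .NET 5 or later.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

// Hypothetical illustration of a text-aware WAV cache; not part of the toolkit.
public static class TtsCacheSketch
{
    // Assumption: a sidecar file next to the WAV records a hash of the text that produced it.
    private static string HashPath(string wavPath) => wavPath + ".sha256";

    private static string Hash(string text) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(text)));

    // The cache is valid only if the WAV exists and was generated from the same text.
    public static bool IsCacheValid(string text, string wavPath) =>
        File.Exists(wavPath)
        && File.Exists(HashPath(wavPath))
        && File.ReadAllText(HashPath(wavPath)) == Hash(text);

    // Returns the cached audio when valid; otherwise calls the (costly) generator and stores the result.
    public static byte[] GetOrGenerate(string text, string wavPath, Func<string, byte[]> generate)
    {
        if (IsCacheValid(text, wavPath))
            return File.ReadAllBytes(wavPath);

        var audio = generate(text);
        File.WriteAllBytes(wavPath, audio);
        File.WriteAllText(HashPath(wavPath), Hash(text));
        return audio;
    }
}
```

In this sketch, calling `GetOrGenerate` twice with the same text and path invokes `generate` only once; changing the text triggers regeneration and overwrites the WAV file.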
The difference between using PlayFile and PlayTextToSpeech with ITextToSpeechCache:
- PlayFile generates the WAV file and plays it by file name. This version supports older Dialogic plugins. ITextToSpeechCache objects must have a WAV file name defined because PlayFile requires one.
- PlayTextToSpeech still generates the WAV file for caching purposes but passes the WAV stream to the PlayWavStream method. PlayWavStream is only supported with the SipSorcery plugin. ITextToSpeechCache objects do not need a WAV file name defined, but without one the audio is not cached.
// get a reference to the TTS caching factory
var ttsCacheFactory = line.TextToSpeechCacheFactory;
// generate the TTS Cache
var ttsCache = ttsCacheFactory.Create("Thank you for using the IVR Toolkit.", $"{WAV_FILE_LOCATION}/ThankYou.wav");
// pass the cache to PlayFileAsync
await line.PlayFileAsync(ttsCache, cancellationToken);
// alternatively pass the cache to this method
await line.PlayTextToSpeechAsync(ttsCache, cancellationToken);
The message also supports Azure Speech Synthesis Markup Language (SSML). Common interpret-as values:
| Value | Description |
| --- | --- |
| cardinal | Reads numbers as cardinal numbers (e.g., 123 → "one hundred twenty-three"). |
| ordinal | Reads numbers as ordinal numbers (e.g., 1st → "first"). |
| characters | Reads each character separately (e.g., ABC → "A B C"). |
| spell-out | Spells out the word letter by letter. |
| date | Reads text as a date (formats: YYYY/MM/DD, MM-DD-YYYY, etc.). |
| time | Reads text as a time (e.g., 14:30 → "two thirty PM"). |
| telephone | Reads a phone number properly. |
| address | Reads an address naturally. |
| digits | Reads numbers as individual digits (e.g., 123 → "one two three"). |
| fraction | Reads fractions properly (e.g., ½ → "one-half"). |
| unit | Reads units of measurement (e.g., 10kg → "ten kilograms"). |
| expletive | Censors explicit words (e.g., "damn" → "d***"). |
var message = "Thank you for using the <say-as interpret-as='characters'>IVR</say-as> Toolkit.";
await line.PlayTextToSpeechAsync(message, cancellationToken);
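Because the message is embedded in markup, characters such as `&` or `<` in the spoken text can produce invalid SSML. The helper below is a minimal sketch of one way to build say-as fragments safely. The `Ssml` class and its `SayAs` method are hypothetical names, not part of the toolkit, and the sketch assumes the toolkit passes the message text through to the speech service without escaping it.

```csharp
using System;
using System.Security;

// Hypothetical helper for building SSML say-as fragments; not part of the toolkit.
public static class Ssml
{
    // Wraps text in a say-as element and escapes XML special characters in both values.
    public static string SayAs(string interpretAs, string text)
    {
        if (string.IsNullOrWhiteSpace(interpretAs))
            throw new ArgumentException("An interpret-as value is required.", nameof(interpretAs));

        // SecurityElement.Escape replaces <, >, &, " and ' with XML entities.
        return $"<say-as interpret-as='{SecurityElement.Escape(interpretAs)}'>"
             + $"{SecurityElement.Escape(text)}</say-as>";
    }
}
```

For example, `Ssml.SayAs("characters", "IVR")` returns `<say-as interpret-as='characters'>IVR</say-as>`, which can be interpolated into the message passed to PlayTextToSpeechAsync.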
Prompt
To use text-to-speech with Prompt or MultiTryPrompt, use one of the method overloads that accepts an ITextToSpeechCache. Specify null for the fileName if you don't want to cache the TTS audio.
// get a reference to the TTS caching factory
var ttsCacheFactory = line.TextToSpeechCacheFactory;
// generate the TTS Cache
var ttsCache = ttsCacheFactory.Create("For this simple demonstration, press <say-as interpret-as='characters'>1234</say-as> followed by the pound key.", $"{WAV_FILE_LOCATION}/Press1234.wav");
// play tts and wait for digits to be pressed
var result = await line.PromptAsync(ttsCache, cancellationToken);
The following call always generates TTS because the fileName is not specified:
// get a reference to the TTS caching factory
var ttsCacheFactory = line.TextToSpeechCacheFactory;
// generate the TTS Cache - do not specify the fileName path
var ttsCache = ttsCacheFactory.Create("For this simple demonstration, press <say-as interpret-as='characters'>1234</say-as> followed by the pound key.");
// play tts and wait for digits to be pressed
var result = await line.PromptAsync(ttsCache, cancellationToken);