Complete guide to Speech Synthesis Markup Language (SSML) for advanced voice control
Add emotional expression to your speech with the mstts:express-as tag.
<mstts:express-as style="STYLE" styledegree="0.01-2">
Your text here
</mstts:express-as>
Soft, quiet whisper tone
<mstts:express-as style="whispering" styledegree="2">
I have a secret
</mstts:express-as>
Happy, upbeat tone
<mstts:express-as style="cheerful">
What a wonderful day!
</mstts:express-as>
Sorrowful, melancholic tone
<mstts:express-as style="sad" styledegree="1.5">
I miss you so much
</mstts:express-as>
Angry, forceful tone
<mstts:express-as style="angry" styledegree="2">
How could you do this?!
</mstts:express-as>
Other styles: excited, friendly, hopeful, terrified, shouting, unfriendly
styledegree (optional): Controls emotion intensity from 0.01 (very subtle) to 2.0 (very strong). Default is 1.0 when omitted.
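When the SSML is generated from code, the styledegree range described above is easy to enforce before the request is sent. The following is a minimal Python sketch; the `express_as` helper is a hypothetical name introduced for illustration, not part of any SDK, and it only builds the markup string.

```python
from xml.sax.saxutils import escape, quoteattr


def express_as(text, style, styledegree=None):
    """Wrap text in an mstts:express-as element (hypothetical helper)."""
    attrs = f"style={quoteattr(style)}"
    if styledegree is not None:
        # Range taken from the guide: 0.01 (very subtle) to 2.0 (very strong).
        if not 0.01 <= styledegree <= 2.0:
            raise ValueError("styledegree must be between 0.01 and 2.0")
        attrs += f' styledegree="{styledegree:g}"'
    # escape() protects &, < and > so the spoken text cannot break the markup.
    return f"<mstts:express-as {attrs}>{escape(text)}</mstts:express-as>"


print(express_as("I have a secret", "whispering", 2))
```

Leaving `styledegree` as `None` omits the attribute entirely, so the service default of 1.0 applies.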
Add silence between words or sentences with the break tag.
Hello <break time="500ms"/> World
Or use strength:
Hello <break strength="medium"/> World
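The two forms of the break tag can be produced by one small function that accepts either a duration or a strength name. This is only a sketch: `pause` and `STRENGTHS` are hypothetical names, and the duration check assumes values written as a number followed by `ms` or `s`, as in the examples in this section.

```python
import re

# Strength names as listed in this guide.
STRENGTHS = {"none", "x-weak", "weak", "medium", "strong", "x-strong"}


def pause(time=None, strength=None):
    """Return a <break/> tag from a duration or a strength name (hypothetical helper)."""
    if (time is None) == (strength is None):
        raise ValueError("pass exactly one of time or strength")
    if time is not None:
        # Durations are a number followed by ms or s, e.g. "500ms" or "1s".
        if not re.fullmatch(r"\d+(\.\d+)?(ms|s)", time):
            raise ValueError(f"invalid duration: {time!r}")
        return f'<break time="{time}"/>'
    if strength not in STRENGTHS:
        raise ValueError(f"invalid strength: {strength!r}")
    return f'<break strength="{strength}"/>'


print(f"Hello {pause(time='500ms')} World")
print(f"Hello {pause(strength='medium')} World")
```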
250ms - Quarter second
500ms - Half second
1s - One second
2s - Two seconds
none - No pause
x-weak - Extra weak
weak, medium, strong
x-strong - Extra strong
Control speaking rate, pitch, and volume with the prosody tag.
<prosody rate="RATE" pitch="PITCH" volume="VOLUME">
Your text here
</prosody>
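Predefined: rate="x-slow", "slow", "medium", "fast", or "x-fast"
Percentage: rate="-50%" to "+100%"
Predefined: pitch="x-low", "low", "medium", "high", or "x-high"
Percentage: pitch="-50%" to "+50%"
Predefined: volume="silent", "x-soft", "soft", "medium", "loud", or "x-loud"
Decibels: volume="-10dB" to "+10dB"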
<prosody rate="slow" pitch="-20%" volume="soft">
This is spoken slowly, with lower pitch, and quietly.
</prosody>
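The numeric ranges given above for rate, pitch, and volume can be checked in code before the markup is built. The sketch below is one possible approach, not an official API: `prosody` and `LIMITS` are hypothetical names, the limits are copied from this guide, and named values such as "slow" or "soft" are passed through without validation.

```python
from xml.sax.saxutils import escape

# attribute -> (unit suffix, lowest value, highest value), as given in this guide
LIMITS = {
    "rate": ("%", -50, 100),
    "pitch": ("%", -50, 50),
    "volume": ("dB", -10, 10),
}


def prosody(text, **settings):
    """Wrap text in a <prosody> element after range-checking relative values."""
    if not settings:
        raise ValueError("give at least one of rate, pitch, volume")
    attrs = []
    for name, value in settings.items():
        if name not in LIMITS:
            raise ValueError(f"unknown prosody attribute: {name}")
        unit, low, high = LIMITS[name]
        if value.endswith(unit):
            # Relative value such as "-20%" or "+5dB": strip the unit and compare.
            number = float(value[: -len(unit)])
            if not low <= number <= high:
                raise ValueError(f"{name}={value!r} is out of range")
        attrs.append(f'{name}="{value}"')
    return f"<prosody {' '.join(attrs)}>{escape(text)}</prosody>"


print(prosody("Now it's slow and deep.", rate="slow", pitch="-20%"))
```

A call such as `prosody("Too high", pitch="+80%")` raises `ValueError` instead of sending a value the guide lists as out of range.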
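<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">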
<voice name="en-US-JennyNeural">
<mstts:express-as style="sad">
I can't believe you're gone.
</mstts:express-as>
<break time="700ms"/>
<mstts:express-as style="hopeful">
But I know we'll meet again someday.
</mstts:express-as>
</voice>
</speak>
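<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">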
<voice name="en-US-AriaNeural">
<mstts:express-as style="whispering">
The door creaked open slowly.
</mstts:express-as>
<break time="1s"/>
<mstts:express-as style="terrified" styledegree="2">
Someone was inside!
</mstts:express-as>
</voice>
</speak>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
<voice name="en-US-GuyNeural">
<prosody rate="fast" pitch="+10%">
This is read quickly with a higher pitch!
</prosody>
<break time="500ms"/>
<prosody rate="slow" pitch="-20%">
Now it's slow and deep.
</prosody>
</voice>
</speak>
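Complete documents like the ones above can also be assembled and sanity-checked in code before they are sent for synthesis. The Python sketch below wraps a fragment in speak and voice elements and parses the result to confirm it is well-formed XML. The `speak` helper is a hypothetical name, and the two namespace URIs are the ones Azure Speech documents for SSML; treat them as an assumption if your service differs.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
MSTTS_NS = "https://www.w3.org/2001/mstts"


def speak(voice, body, lang="en-US"):
    """Wrap an SSML fragment in <speak> and <voice>, then check that it parses."""
    document = (
        f'<speak version="1.0" xmlns="{SSML_NS}" '
        f'xmlns:mstts="{MSTTS_NS}" xml:lang="{lang}">'
        f'<voice name="{voice}">{body}</voice>'
        "</speak>"
    )
    # fromstring raises ET.ParseError on malformed markup,
    # for example an unclosed tag inside body.
    ET.fromstring(document)
    return document


body = (
    '<mstts:express-as style="sad">I can\'t believe you\'re gone.</mstts:express-as>'
    '<break time="700ms"/>'
    '<mstts:express-as style="hopeful">But I know we\'ll meet again someday.</mstts:express-as>'
)
print(speak("en-US-JennyNeural", body))
```

Parsing only proves the markup is well-formed. It does not confirm that the chosen voice supports a given style, which still has to be checked against the voice's documentation.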