📚 SSML Documentation

Complete guide to Speech Synthesis Markup Language for advanced voice control

😊 Emotions & Styles

Add emotional expression to your speech with the mstts:express-as tag.

Syntax:

<mstts:express-as style="STYLE" styledegree="0.01-2"> Your text here </mstts:express-as>

Available Styles:

whispering

Soft, quiet whisper tone

<mstts:express-as style="whispering" styledegree="2"> I have a secret </mstts:express-as>

cheerful

Happy, upbeat tone

<mstts:express-as style="cheerful"> What a wonderful day! </mstts:express-as>

sad

Sorrowful, melancholic tone

<mstts:express-as style="sad" styledegree="1.5"> I miss you so much </mstts:express-as>

angry

Angry, forceful tone

<mstts:express-as style="angry" styledegree="2"> How could you do this?! </mstts:express-as>

Other styles: excited, friendly, hopeful, terrified, shouting, unfriendly. Supported styles vary by voice.

styledegree (optional): Controls emotion intensity from 0.01 (very subtle) to 2.0 (very strong). Default is 1.0 when omitted.
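When SSML is generated from code, the intensity range above can be enforced before the markup is sent. The following is a minimal Python sketch; express_as is a hypothetical helper, not part of any SDK, and clamping out-of-range values (rather than rejecting them) is an assumption made for illustration.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr

# Bounds taken from the styledegree description: 0.01 (very subtle) to 2.0 (very strong).
MIN_DEGREE, MAX_DEGREE = 0.01, 2.0


def express_as(text: str, style: str, styledegree: Optional[float] = None) -> str:
    """Wrap text in an mstts:express-as element (hypothetical helper)."""
    attrs = f"style={quoteattr(style)}"
    if styledegree is not None:
        # Assumption: clamp out-of-range values instead of raising an error.
        clamped = min(max(styledegree, MIN_DEGREE), MAX_DEGREE)
        attrs += f' styledegree="{clamped:g}"'
    # escape() protects characters such as & and < in the spoken text.
    return f"<mstts:express-as {attrs}>{escape(text)}</mstts:express-as>"


print(express_as("I have a secret", "whispering", 5))
# <mstts:express-as style="whispering" styledegree="2">I have a secret</mstts:express-as>
```

Omitting styledegree leaves the attribute out entirely, so the default of 1.0 applies.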

⏸️ Pauses & Breaks

Add silence between words or sentences with the break tag.

Syntax:

Hello <break time="500ms"/> World

Or use strength:

Hello <break strength="medium"/> World

Time Values:

  • 250ms - Quarter second
  • 500ms - Half second
  • 1s - One second
  • 2s - Two seconds

Strength Values:

  • none - No pause
  • x-weak - Extra weak
  • weak, medium, strong
  • x-strong - Extra strong
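The time and strength values listed above can be checked before a break tag is emitted. This is a rough Python sketch; break_tag is a hypothetical helper, and accepting exactly one of the two attributes is a simplification chosen for this example, not a rule taken from the SSML specification.

```python
import re
from typing import Optional

# Strength keywords from the list above.
STRENGTHS = {"none", "x-weak", "weak", "medium", "strong", "x-strong"}
# Durations such as "250ms" or "1s", matching the time values above.
TIME_PATTERN = re.compile(r"^\d+(\.\d+)?(ms|s)$")


def break_tag(time: Optional[str] = None, strength: Optional[str] = None) -> str:
    """Return a <break/> element for a duration or a strength (hypothetical helper)."""
    # Assumption: this sketch accepts exactly one of the two arguments.
    if (time is None) == (strength is None):
        raise ValueError("pass exactly one of time or strength")
    if time is not None:
        if not TIME_PATTERN.match(time):
            raise ValueError(f"invalid time value: {time!r}")
        return f'<break time="{time}"/>'
    if strength not in STRENGTHS:
        raise ValueError(f"invalid strength value: {strength!r}")
    return f'<break strength="{strength}"/>'


print("Hello " + break_tag(time="500ms") + " World")
print("Hello " + break_tag(strength="medium") + " World")
```

Values outside the lists, such as time="soon", raise a ValueError instead of producing markup the synthesizer may reject.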

🎚️ Prosody (Voice Modulation)

Control speaking rate, pitch, and volume with the prosody tag.

Syntax:

<prosody rate="RATE" pitch="PITCH" volume="VOLUME"> Your text here </prosody>

Rate (Speed)

Predefined:

  • x-slow, slow
  • medium (default)
  • fast, x-fast

Percentage:

rate="-50%" to "+100%"

Pitch

Predefined:

  • x-low, low
  • medium (default)
  • high, x-high

Percentage:

pitch="-50%" to "+50%"

Volume

Predefined:

  • silent, x-soft, soft
  • medium (default)
  • loud, x-loud

Decibels:

volume="-10dB" to "+10dB"

Example:

<prosody rate="slow" pitch="-20%" volume="soft"> This is spoken slowly, with lower pitch, and quietly. </prosody>
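The prosody tag can be assembled the same way when markup is generated programmatically. A minimal Python sketch follows; prosody is a hypothetical helper that only emits the attributes it is given and does not validate their values against the ranges above.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


def prosody(text: str,
            rate: Optional[str] = None,
            pitch: Optional[str] = None,
            volume: Optional[str] = None) -> str:
    """Wrap text in a <prosody> element (hypothetical helper)."""
    attrs = {"rate": rate, "pitch": pitch, "volume": volume}
    # Emit only the attributes that were set, in rate/pitch/volume order.
    rendered = "".join(
        f" {name}={quoteattr(value)}"
        for name, value in attrs.items()
        if value is not None
    )
    return f"<prosody{rendered}>{escape(text)}</prosody>"


print(prosody("This is spoken slowly.", rate="slow", pitch="-20%", volume="soft"))
# <prosody rate="slow" pitch="-20%" volume="soft">This is spoken slowly.</prosody>
```

Attributes left as None are omitted, so the synthesizer falls back to its defaults for them.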

✨ Complete Examples

Emotional Storytelling

<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
  <voice name="en-US-JennyNeural">
    <mstts:express-as style="sad">
      I can't believe you're gone.
    </mstts:express-as>
    <break time="700ms"/>
    <mstts:express-as style="hopeful">
      But I know we'll meet again someday.
    </mstts:express-as>
  </voice>
</speak>

Dramatic Reading

<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
  <voice name="en-US-AriaNeural">
    <mstts:express-as style="whispering">
      The door creaked open slowly.
    </mstts:express-as>
    <break time="1s"/>
    <mstts:express-as style="terrified" styledegree="2">
      Someone was inside!
    </mstts:express-as>
  </voice>
</speak>

Voice Modulation

<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="en-US-GuyNeural">
    <prosody rate="fast" pitch="+10%">
      This is read quickly with a higher pitch!
    </prosody>
    <break time="500ms"/>
    <prosody rate="slow" pitch="-20%">
      Now it's slow and deep.
    </prosody>
  </voice>
</speak>
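Before sending a complete document to a synthesizer, it can help to confirm that the markup parses as XML. The sketch below uses Python's standard xml.etree.ElementTree; wrap_speak and is_well_formed are hypothetical helpers, and the namespace URIs are the standard SSML namespace and the mstts namespace used by Azure Speech. Any tag with the mstts: prefix only parses when that prefix is declared on the speak element.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
MSTTS_NS = "https://www.w3.org/2001/mstts"


def wrap_speak(body: str, voice: str, lang: str = "en-US") -> str:
    """Wrap an SSML fragment in <speak> and <voice> elements (hypothetical helper)."""
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" '
        f'xmlns:mstts="{MSTTS_NS}" xml:lang="{lang}">'
        f'<voice name="{voice}">{body}</voice></speak>'
    )


def is_well_formed(ssml: str) -> bool:
    """Return True if the string parses as XML, including prefix declarations."""
    try:
        ET.fromstring(ssml)
        return True
    except ET.ParseError:
        return False


body = (
    '<mstts:express-as style="sad">I can\'t believe you\'re gone.</mstts:express-as>'
    '<break time="700ms"/>'
)
print(is_well_formed(wrap_speak(body, "en-US-JennyNeural")))  # True
print(is_well_formed("<speak>" + body + "</speak>"))          # False: mstts prefix is undeclared
```

This checks XML well-formedness only. It does not verify that a style, voice name, or attribute value is accepted by the speech service.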