About different Microsoft speech systems

gexgd0419 edited this page Feb 27, 2024 · 1 revision

Microsoft has created a few speech systems over the years.

SAPI 5

Microsoft Speech API version 5, or SAPI 5, has been the default built-in Windows speech system since Windows XP. The default voice has been changed and updated several times: Microsoft Sam on Windows XP, Microsoft Anna on Windows 7, and Microsoft Zira Desktop on Windows 10 and later. But these are all SAPI 5 voices.

You can check your installed SAPI 5 voices in Control Panel > Speech (Windows XP), or Control Panel > Speech Recognition > Text to Speech (Windows Vista and later). If you are using Windows 8 or later, make sure to open the "old" Control Panel, not the "new" System Settings.

SAPI 5 works on a wide range of Windows versions, including the latest Windows 11. It is also extensible: developers can make third-party SAPI 5 TTS engines/voices, users can install voices from Microsoft or third parties, and all applications can utilize these installed voices.
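As a rough illustration of how any application can see the installed voices, here is a minimal sketch that lists SAPI 5 voices through the SAPI automation object `SAPI.SpVoice`. It assumes the third-party `pywin32` package is installed; the function name `list_sapi5_voices` is made up for this example. On a non-Windows system, or without `pywin32`, it simply returns an empty list.

```python
import sys


def list_sapi5_voices():
    """Return descriptions of installed SAPI 5 voices, or [] if SAPI is unavailable."""
    if sys.platform != "win32":
        return []  # SAPI 5 exists only on Windows
    try:
        import win32com.client  # third-party package: pywin32 (assumed installed)
    except ImportError:
        return []
    voice = win32com.client.Dispatch("SAPI.SpVoice")
    tokens = voice.GetVoices()  # collection of voice tokens
    return [tokens.Item(i).GetDescription() for i in range(tokens.Count)]


for description in list_sapi5_voices():
    print(description)
```

The voices printed here should match the ones shown in the Control Panel dialog, including any third-party SAPI 5 voices you have installed.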

Windows Runtime / OneCore speech

There's a newer speech system built into Windows 10/11. It was originally made for Windows Mobile/Phone, but it also appears on desktop Windows 10/11 systems to provide speech support for UWP apps. The registry key for this speech system is called Speech_OneCore, so here I will refer to it as "OneCore speech".

To check the OneCore voices, go to the "new" System Settings > Time & language > Speech, where you can choose a default voice and preview each voice, just like in the Speech Properties dialog from Control Panel. But the two lists are different: voices listed in System Settings are OneCore voices, while voices listed in the Control Panel dialog are SAPI 5 voices.
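The same split can be seen in the registry. The sketch below lists the voice token subkeys for both systems using Python's standard `winreg` module. The two paths under `HKEY_LOCAL_MACHINE` are where SAPI 5 and OneCore voices are normally registered; the helper name `list_voice_tokens` is made up for this example, and it returns an empty list on non-Windows systems or when a key is missing. Note that a 32-bit Python on 64-bit Windows may see the redirected 32-bit view of the registry instead.

```python
import sys

# Registry locations (under HKEY_LOCAL_MACHINE) where each system registers voices.
SAPI5_TOKENS = r"SOFTWARE\Microsoft\Speech\Voices\Tokens"
ONECORE_TOKENS = r"SOFTWARE\Microsoft\Speech_OneCore\Voices\Tokens"


def list_voice_tokens(tokens_path):
    """Return the subkey names (voice token IDs) under tokens_path, or []."""
    if sys.platform != "win32":
        return []  # the winreg module exists only on Windows
    import winreg
    try:
        key = winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, tokens_path)
    except OSError:
        return []  # key not present, e.g. no OneCore speech before Windows 10
    with key:
        subkey_count = winreg.QueryInfoKey(key)[0]
        return [winreg.EnumKey(key, i) for i in range(subkey_count)]


for label, path in (("SAPI 5", SAPI5_TOKENS), ("OneCore", ONECORE_TOKENS)):
    print(label, list_voice_tokens(path))
```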

UWP apps can only use OneCore voices, not SAPI 5 voices. Desktop apps can use both, but as they are different systems and provide different APIs, if you want to use both, you will need to write code for both.

Some Microsoft voices have two versions, one for each system: for example, Microsoft Zira is a OneCore voice, and Microsoft Zira Desktop is the corresponding SAPI 5 voice.
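If you already have the voice names from both systems, a small helper can pair them up. This is only a sketch: it assumes the " Desktop" suffix seen in the Zira example applies to other paired voices as well, which is not guaranteed, and the function name and sample lists are made up for illustration.

```python
# Assumption: the SAPI 5 counterpart is named "<OneCore name> Desktop",
# generalized from the Microsoft Zira / Microsoft Zira Desktop pair.
DESKTOP_SUFFIX = " Desktop"


def voices_in_both(sapi5_names, onecore_names):
    """Map each OneCore voice name to its SAPI 5 counterpart, if present."""
    sapi5 = set(sapi5_names)
    return {
        name: name + DESKTOP_SUFFIX
        for name in onecore_names
        if name + DESKTOP_SUFFIX in sapi5
    }


# Hypothetical input lists, not read from a real system.
pairs = voices_in_both(
    ["Microsoft Zira Desktop", "Microsoft Anna"],
    ["Microsoft Zira", "Microsoft Mark"],
)
print(pairs)
```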

As for extensibility, Microsoft says the following in the Windows Runtime SpeechSynthesizer documentation:

Only Microsoft-signed voices installed on the system can be used to generate speech.

So third-party WinRT voices, at least those not signed by Microsoft, cannot work.