What is SpeakRight?

SpeakRight is an open-source Java framework for writing speech recognition applications in VoiceXML. Unlike most proprietary speech-app tools, SpeakRight is code-based. Applications are written in Java using SpeakRight's extensible classes. Java IDEs such as Eclipse provide great debugging, fast Java-aware editing, and refactoring. Dynamic generation of VoiceXML is done using the popular StringTemplate templating framework.

See Getting Started, Tutorial, and Table of Contents

Sunday, February 18, 2007


A prompt text, known as a ptext, defines one or more items using a simple formatting scheme.
Here is a basic prompt that plays some text using TTS (text-to-speech):

"Welcome to Inky's Travel".

PTexts are Java strings. Here's another prompt:

"Welcome to Inky's Travel. {audio:logo.wav}"

This prompt contains two items: a TTS phrase and an audio file. Items are delimited by '{' and '}'. The delimiters are optional for the first item. This is equivalent:

"{Welcome to Inky's Travel. }{audio:logo.wav}"

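The delimiter rule can be sketched in a few lines of Java. This is only an illustration of the rule as described here, not SpeakRight's actual parser: the class PTextSplitter and its split method are hypothetical names, and the sketch assumes braces are balanced and never nested.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of SpeakRight: splits a ptext into its items.
final class PTextSplitter {
    // Text outside braces becomes an item of its own, which is why the
    // braces around the first item are optional.
    // Assumption: braces are balanced and never nested.
    static List<String> split(String ptext) {
        List<String> items = new ArrayList<>();
        int pos = 0;
        while (pos < ptext.length()) {
            int open = ptext.indexOf('{', pos);
            if (open < 0) {
                addIfNotBlank(items, ptext.substring(pos));
                break;
            }
            addIfNotBlank(items, ptext.substring(pos, open));
            int close = ptext.indexOf('}', open);
            if (close < 0) {
                throw new IllegalArgumentException("unclosed '{' in: " + ptext);
            }
            items.add(ptext.substring(open + 1, close));
            pos = close + 1;
        }
        return items;
    }

    // Skips empty or whitespace-only text found between items.
    private static void addIfNotBlank(List<String> items, String text) {
        if (!text.trim().isEmpty()) {
            items.add(text);
        }
    }
}
```

With this sketch, both spellings of the prompt above split into the same two items: the TTS phrase and "audio:logo.wav".
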
PTexts can contain as many items as you want. They are rendered as a prompt tag (or possibly as a nomatch or noinput tag):

<prompt>Welcome to Inky's Travel. <audio src="http://myIPaddress/logo.wav"></audio></prompt>

For convenience, audio items can be specified without the "audio:" prefix. The prefix is optional if the filename ends in ".wav" and contains no whitespace characters. The following is equivalent to the previous prompt:

"{Welcome to Inky's Travel. }{logo.wav}"

You can also add pauses using "." inside an item. Each period represents 250 msec. A pause item contains only periods (otherwise it is treated as TTS). Here's a 750 msec pause:

"{Welcome to Inky's Travel. }{...}{logo.wav}"

Model variables can be used as prompt items with a "$M." prefix. The value of the model variable is rendered.

"The current price is {$M.price}"

The most recent user input can also be played back, like this:

"You chose {$INPUT}"

Fields (a.k.a. member variables) of a flow object can be items by wrapping them in '%'. If a flow class has a member variable int m_numPassengers; then you can play its value in a prompt like this:

"There are {%numPassengers%} passengers"

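One way a field item could be resolved is with Java reflection. The sketch below is an assumption about the mechanism, not SpeakRight's code: FieldItemResolver and BookingFlow are hypothetical names, and the mapping from %numPassengers% to m_numPassengers (adding an "m_" prefix) is inferred from the example above.

```java
import java.lang.reflect.Field;

// Hypothetical helper, not SpeakRight's implementation.
final class FieldItemResolver {
    // Resolves an item such as "%numPassengers%" against a flow object.
    // Assumption: the field is declared on the object's own class with an
    // "m_" prefix, as in the m_numPassengers example.
    static String resolve(Object flow, String item) {
        if (item.length() < 3 || !item.startsWith("%") || !item.endsWith("%")) {
            throw new IllegalArgumentException("not a field item: " + item);
        }
        String name = item.substring(1, item.length() - 1);
        try {
            Field field = flow.getClass().getDeclaredField("m_" + name);
            field.setAccessible(true); // allow reading non-public fields
            return String.valueOf(field.get(flow));
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("no readable field for item: " + item, e);
        }
    }
}

// Illustrative flow class; a real one would implement SpeakRight's flow interface.
final class BookingFlow {
    int m_numPassengers = 3;
}
```

Calling FieldItemResolver.resolve(new BookingFlow(), "%numPassengers%") reads the field and returns its value as text, ready to be spoken as TTS.
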
If you're familiar with SSML, you can use raw prompt items, which have a "raw:" prefix. These are output as is, and can contain SSML tags.

"{raw:That's a <emphasis>big</emphasis> order!}"

Lastly, there are id prompt items, which are references to an external prompt string in an XML file. This is useful for multi-lingual apps, or for changing prompts after deployment. See Prompt Ids and Prompt XML Files.


Let's summarize. There are seven types of prompt items:
  • "audio:" audio prompts
  • "$M." model values
  • "%value%" field values (of currently executing flow object)
  • ".." pause (250 msec for each period)
  • "raw:" raw SSML prompts
  • "id:" id prompts
  • TTS prompt (any prompt item that doesn't match one of the above types is played as TTS)
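
The seven types can be told apart with a handful of string checks. The following is a rough sketch of such a classifier, written from the rules in this post rather than taken from SpeakRight: PromptItemClassifier and Kind are hypothetical names, the model prefix is "$M." as in the {$M.price} example, and $INPUT is left out because the summary does not list it.

```java
// Hypothetical classifier, not SpeakRight's implementation.
final class PromptItemClassifier {
    enum Kind { AUDIO, MODEL, FIELD, PAUSE, RAW, ID, TTS }

    static final int MSEC_PER_PERIOD = 250;

    // Takes the text of one item, without its surrounding braces.
    static Kind classify(String item) {
        if (item.startsWith("audio:")) return Kind.AUDIO;
        if (item.startsWith("$M.")) return Kind.MODEL;
        if (item.startsWith("raw:")) return Kind.RAW;
        if (item.startsWith("id:")) return Kind.ID;
        if (item.matches("%\\w+%")) return Kind.FIELD;
        if (item.matches("\\.+")) return Kind.PAUSE; // periods only
        // The "audio:" prefix is optional for names that end in ".wav"
        // and contain no whitespace.
        if (item.endsWith(".wav") && !item.matches(".*\\s.*")) return Kind.AUDIO;
        return Kind.TTS; // anything else is spoken as text
    }

    // Length of a pause item in milliseconds: 250 msec for each period.
    static int pauseMsec(String pauseItem) {
        return pauseItem.length() * MSEC_PER_PERIOD;
    }
}
```

TTS is deliberately the fallback, which mirrors the rule that any item not matching another type is played as TTS.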

Prompt Conditions

By default, all the prompts in a flow object are played. However, there are occasions when the playing of a prompt needs to be controlled by a condition. Conditions are evaluated when the flow object is executed; if the condition evaluates to false, the prompt is not played.

Condition       Description
none            Always play the prompt.
PlayOnce        Only play the first time the flow is executed. If the flow is re-executed (the same flow object executes more than once in a row), don't play the prompt. PlayOnce is useful in menus where the initial prompt may contain information that should only be played once.
PlayOnceEver    Only play once during the entire session (phone call).
PlayIfEmpty     Only play if the given model variable is empty (""). Useful if you want to play a prompt as long as something has not yet occurred.
PlayIfNotEmpty  Only play if the given model variable is not empty ("").
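
To make the conditions concrete, here is a small sketch of how they might be evaluated. It is not SpeakRight's API: PromptConditions, the Condition enum constants, and the shouldPlay parameters are all hypothetical, and treating a null model value as empty is an added assumption.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical evaluator, not SpeakRight's implementation.
// One instance is assumed to live for one session (phone call).
final class PromptConditions {
    enum Condition { NONE, PLAY_ONCE, PLAY_ONCE_EVER, PLAY_IF_EMPTY, PLAY_IF_NOT_EMPTY }

    // Ids of PlayOnceEver prompts already played during this session.
    private final Set<String> playedThisSession = new HashSet<>();

    // executionCount: how many times in a row the flow object has executed
    // (1 means this is the first execution).
    // modelValue: current value of the model variable named by the condition.
    boolean shouldPlay(String promptId, Condition condition,
                       int executionCount, String modelValue) {
        switch (condition) {
            case PLAY_ONCE:
                return executionCount == 1;
            case PLAY_ONCE_EVER:
                // add() returns false if the id was already recorded
                return playedThisSession.add(promptId);
            case PLAY_IF_EMPTY:
                return modelValue == null || modelValue.isEmpty();
            case PLAY_IF_NOT_EMPTY:
                return modelValue != null && !modelValue.isEmpty();
            default:
                return true; // NONE: always play
        }
    }
}
```

The difference between the two "once" conditions shows up in their state: PlayOnce depends only on the current run of executions, while PlayOnceEver has to remember what was played for the whole call.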

Prompt Rendering

Prompts are rendered using a pipeline of steps. The order of the steps has been chosen to maximize usefulness.

Step  Description
1     Apply condition. If false then return (the prompt is not played).
2     Resolve ids. Read the external XML file and replace each prompt id with its specified prompt text.
3     Evaluate model values.
4     Call fixup handlers in the flow objects. The IFlow method fixupPrompt allows a flow object to tweak TTS prompt items.
5     Merge runs of TTS items into a single item.
6     Do audio matching. An external XML file defines TTS text for which an audio file exists. The text is replaced with the audio file.

The result is a list of TTS and/or audio items that are sent to the page writer.
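
Step 5 is the least obvious of these, so here is a sketch of what merging runs of TTS items could look like. TtsMerger and its Item class are hypothetical and much simpler than whatever SpeakRight uses internally; joining the merged texts with a single space is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of pipeline step 5, not SpeakRight's implementation.
final class TtsMerger {
    // Minimal item model: a TTS flag plus the text (or audio file name).
    static final class Item {
        final boolean tts;
        final String text;

        Item(boolean tts, String text) {
            this.tts = tts;
            this.text = text;
        }
    }

    // Collapses each run of adjacent TTS items into one item.
    // Non-TTS items are passed through and break the run.
    static List<Item> mergeTtsRuns(List<Item> items) {
        List<Item> merged = new ArrayList<>();
        for (Item item : items) {
            int last = merged.size() - 1;
            if (item.tts && last >= 0 && merged.get(last).tts) {
                String joined = merged.get(last).text.trim() + " " + item.text.trim();
                merged.set(last, new Item(true, joined));
            } else {
                merged.add(item);
            }
        }
        return merged;
    }
}
```

If evaluated model values become TTS items, as this sketch assumes, a prompt such as "The current price is {$M.price}" ends up as a single TTS item after this step, which is presumably why merging comes before audio matching.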

Audio matching

Audio matching is a technique that lets you use TTS for the initial speech app development. Once the app is fairly stable, you record audio files for all the prompts. Then you create an audio matching XML file that lists each audio file and the prompt text it replaces. Now when the SpeakRight application runs, matching text is automatically replaced with the audio file. No source code changes are required.

The match is a soft match that ignores case and punctuation. That is, a prompt item "Dr. Smith lives on Maple Dr." would match an audio-match text of "dr smith lives on maple dr".
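
A soft match of this kind can be implemented by normalizing both strings before comparing them. The sketch below is one possible approach, not SpeakRight's code: SoftMatch is a hypothetical class, punctuation here means ASCII punctuation only, and collapsing repeated whitespace is an extra assumption beyond what is described above.

```java
import java.util.Locale;

// Hypothetical sketch of a soft match, not SpeakRight's implementation.
final class SoftMatch {
    // Lower-cases the text, drops ASCII punctuation, and collapses whitespace.
    static String normalize(String text) {
        return text.toLowerCase(Locale.ROOT)
                   .replaceAll("\\p{Punct}", "")
                   .replaceAll("\\s+", " ")
                   .trim();
    }

    // Two texts match if they are equal after normalization.
    static boolean matches(String promptItem, String audioMatchText) {
        return normalize(promptItem).equals(normalize(audioMatchText));
    }
}
```

With this normalization, SoftMatch.matches("Dr. Smith lives on Maple Dr.", "dr smith lives on maple dr") is true, as in the example above.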

Audio matching works at the item level. Do we need to support some tag for spanning multiple items?
