What is SpeakRight?

SpeakRight is an open-source Java framework for writing speech recognition applications in VoiceXML. Unlike most proprietary speech-app tools, SpeakRight is code-based. Applications are written in Java using SpeakRight's extensible classes. Java IDEs such as Eclipse provide great debugging, fast Java-aware editing, and refactoring. Dynamic generation of VoiceXML is done using the popular StringTemplate templating framework.

See Getting Started, Tutorial, and Table of Contents

Sunday, February 18, 2007


Grammars define the spoken phrases that will be recognized, and also the return values (called slots). Grammars are an important abstraction layer in SpeakRight because they separate the values the application receives from the exact words the user says. Synonyms can map to the same value: both "Los Angeles" and "LA" could produce the result city="Los Angeles". Multilingual apps use this feature; the spoken phrases are in the target language, but the result values are in the default language (usually English).
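
To make the synonym idea concrete, here is a minimal sketch in plain Java. In a real application the mapping lives in the grammar and is applied by the recognizer; the class CitySynonyms and its methods are hypothetical and are not part of the SpeakRight API.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration of phrase-to-value mapping; not SpeakRight code.
class CitySynonyms {
    private final Map<String, String> phraseToValue = new HashMap<>();

    // Register a spoken phrase and the slot value it should produce.
    void add(String spokenPhrase, String slotValue) {
        phraseToValue.put(normalize(spokenPhrase), slotValue);
    }

    // Look up the slot value for a recognized phrase, if the phrase is known.
    Optional<String> resolve(String recognizedPhrase) {
        return Optional.ofNullable(phraseToValue.get(normalize(recognizedPhrase)));
    }

    // Assumption: matching ignores case and surrounding whitespace.
    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        CitySynonyms city = new CitySynonyms();
        city.add("Los Angeles", "Los Angeles");
        city.add("LA", "Los Angeles");
        System.out.println("city=" + city.resolve("LA").orElse("(no match)"));
    }
}
```

The application only ever sees the value on the right-hand side, so the spoken phrases can change (or be translated) without touching application code.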

SpeakRight supports three types of grammars: external grammars (referenced by URL), built-in grammars, and inline grammars (which use a simplified GSL format). Grammars work much like prompts. You specify a grammar text, known as a gtext, that uses a simple formatting language:
  • (no prefix). The grammar text is a URL. It can be an absolute or relative URL.
  • inline: prefix. An inline grammar. The prefix is followed by a simplified version of GSL, such as "small medium [large big] (very large)".
  • builtin: prefix. One of VoiceXML's built-in grammars. The prefix is followed by the built-in's name and optional parameters, such as "digits?minlength=3;maxlength=9".
Grammars, like prompts, can have a condition. Currently the only condition is DTMFOnlyMode (explained below).
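
The prefix rules above can be sketched roughly as follows. This is only an illustration of the gtext format described in this post; the class GtextParser and its method names are assumptions, not SpeakRight's own classes.

```java
// Hypothetical helper that classifies a gtext by its prefix; not SpeakRight code.
class GtextParser {
    enum Kind { URL, INLINE, BUILTIN }

    static final String INLINE_PREFIX = "inline:";
    static final String BUILTIN_PREFIX = "builtin:";

    // No recognized prefix means the gtext is treated as a URL.
    static Kind kindOf(String gtext) {
        if (gtext.startsWith(INLINE_PREFIX)) return Kind.INLINE;
        if (gtext.startsWith(BUILTIN_PREFIX)) return Kind.BUILTIN;
        return Kind.URL;
    }

    // Return the gtext with its prefix (if any) removed.
    static String bodyOf(String gtext) {
        switch (kindOf(gtext)) {
            case INLINE:
                return gtext.substring(INLINE_PREFIX.length()).trim();
            case BUILTIN:
                return gtext.substring(BUILTIN_PREFIX.length()).trim();
            default:
                return gtext.trim();
        }
    }

    public static void main(String[] args) {
        String gtext = "inline:small medium [large big] (very large)";
        System.out.println(kindOf(gtext) + " -> " + bodyOf(gtext));
    }
}
```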

When a flow object is rendered, its grammars pass through a pipeline that applies the following steps:
  • check the grammar condition. If false then skip the grammar.
  • parse an inline grammar into its word list
  • parse the builtin grammar
  • convert relative URLs into absolute URLs
External Grammars

The grammar text is a URL. It can be an absolute URL (e.g. http://www.somecompany.com/speechapp7/grammars/fruits.grxml) or a relative URL. Relative URLs (e.g. "grammars/fruits.grxml") are converted into absolute URLs when the grammar is rendered. The servlet's URL is currently used as the base for this conversion.
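
As a rough illustration of the relative-to-absolute conversion, the standard java.net.URI class can resolve a grammar URL against a base URL. How SpeakRight obtains the servlet's URL is not shown here; the class GrammarUrls and the base URL in the example are assumptions.

```java
import java.net.URI;

// Hypothetical helper; uses only the standard java.net.URI resolution rules.
class GrammarUrls {
    // Resolve grammarUrl against baseUrl. Absolute URLs come back unchanged.
    static String toAbsolute(String baseUrl, String grammarUrl) {
        return URI.create(baseUrl).resolve(grammarUrl).toString();
    }

    public static void main(String[] args) {
        // Assumed servlet URL, for illustration only.
        String servletUrl = "http://www.somecompany.com/speechapp7/app";
        System.out.println(toAbsolute(servletUrl, "grammars/fruits.grxml"));
    }
}
```

Note that standard resolution replaces the last path segment of the base, so the relative URL ends up alongside the servlet rather than underneath it.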

The grammar file extension is used to determine the type value for the grammar tag:
  • ".grxml" means type="application/srgs+xml"
  • ".gsl" means type="text/gsl"
  • all other files are assumed to be ABNF SRGS format, type="application/srgs"
A Grammar Editor is helpful, such as the wonderful GRXML editor that comes with the (free) Microsoft SASDK.
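
The extension-to-type rule listed above can be sketched as a small lookup. The class name GrammarMimeTypes is hypothetical, and treating the extension case-insensitively is an assumption that the post does not state.

```java
import java.util.Locale;

// Hypothetical helper mirroring the extension rules listed in this post.
class GrammarMimeTypes {
    static String typeFor(String grammarUrl) {
        // Assumption: extensions are compared without regard to case.
        String lower = grammarUrl.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".grxml")) return "application/srgs+xml";
        if (lower.endsWith(".gsl")) return "text/gsl";
        // Everything else is assumed to be ABNF SRGS.
        return "application/srgs";
    }

    public static void main(String[] args) {
        System.out.println(typeFor("grammars/fruits.grxml"));
    }
}
```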

Built-In Grammars

TBD. Built-ins are part of VoiceXML 2.0, but support for them is optional. They are intended mainly for prototyping, and it's recommended that applications use full, properly tuned grammars.
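
Until this section is written, here is a rough sketch of how the text after the builtin: prefix (for example "digits?minlength=3;maxlength=9") could be split into a name and its parameters. The class BuiltinSpec is hypothetical; it only illustrates the name?key=value;key=value shape and says nothing about how SpeakRight parses it internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for the text that follows the "builtin:" prefix.
class BuiltinSpec {
    final String name;
    final Map<String, String> params = new LinkedHashMap<>();

    BuiltinSpec(String text) {
        int q = text.indexOf('?');
        // Everything before '?' is the built-in's name.
        name = (q < 0) ? text : text.substring(0, q);
        if (q >= 0) {
            // Parameters are key=value pairs separated by ';'.
            for (String pair : text.substring(q + 1).split(";")) {
                if (pair.isEmpty()) continue;
                int eq = pair.indexOf('=');
                if (eq < 0) {
                    throw new IllegalArgumentException("expected key=value: " + pair);
                }
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
    }

    public static void main(String[] args) {
        BuiltinSpec spec = new BuiltinSpec("digits?minlength=3;maxlength=9");
        System.out.println(spec.name + " " + spec.params);
    }
}
```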

Inline Grammars

GSL is (I believe) a proprietary Nuance format. SpeakRight uses a simplified version that currently supports only [ ] and ( ).
A single utterance can contain one or more slots. The simplest directed-dialog VUIs use single-slot questions, such as "How many passengers are there?". SpeakRight supports only single-slot questions for now.
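
As an illustration of turning an inline grammar into its word list, here is a minimal sketch. It assumes that top-level items are alternatives, that [a b] lists alternatives, that (a b) is a single multi-word phrase, and that groups are not nested. SpeakRight's actual parser may differ; the class InlineGrammarSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical flattening of simplified GSL into the phrases it accepts.
class InlineGrammarSketch {
    static List<String> phrases(String gsl) {
        List<String> result = new ArrayList<>();
        StringBuilder phrase = null; // non-null only while inside ( )
        // Pad brackets with spaces so each becomes its own token.
        String padded = gsl.replaceAll("([\\[\\]()])", " $1 ").trim();
        if (padded.isEmpty()) return result;
        for (String token : padded.split("\\s+")) {
            switch (token) {
                case "[":
                case "]":
                    break; // alternatives simply join the top-level list
                case "(":
                    phrase = new StringBuilder();
                    break;
                case ")":
                    if (phrase == null) throw new IllegalArgumentException("unmatched ')'");
                    result.add(phrase.toString());
                    phrase = null;
                    break;
                default:
                    if (phrase == null) {
                        result.add(token);
                    } else {
                        if (phrase.length() > 0) phrase.append(' ');
                        phrase.append(token);
                    }
            }
        }
        if (phrase != null) throw new IllegalArgumentException("unmatched '('");
        return result;
    }

    public static void main(String[] args) {
        System.out.println(phrases("small medium [large big] (very large)"));
    }
}
```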

Grammar Types

Grammars are also classified by input mode, represented by the GrammarType enum:
  • VOICE is for spoken input
  • DTMF is for touchtone digits
The SpeakRight class Question holds up to two grammars, one of each type.

DTMF Only Mode
Speech recognition may not work at all in very noisy environments. Not only will recognition fail, but prompts may never play due to false barge-in. For these reasons, speech applications should be able to fall back to a DTMF-only mode. The user activates this mode by pressing a designated key, usually '*'. Once it is activated, SpeakRight will not render any VOICE grammars, so the VoiceXML engine will listen only for DTMF digits.
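
The effect of DTMF-only mode on rendering can be sketched like this. The GrammarType constants come from this post; the class QuestionSketch, its methods, and the use of plain gtext strings are assumptions made for illustration and do not reflect the real Question class.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Mirrors the enum named in this post; everything else here is hypothetical.
enum GrammarType { VOICE, DTMF }

class QuestionSketch {
    // An EnumMap holds at most one gtext per grammar type.
    private final Map<GrammarType, String> grammars = new EnumMap<>(GrammarType.class);

    void setGrammar(GrammarType type, String gtext) {
        grammars.put(type, gtext);
    }

    // Return the gtexts to render; VOICE grammars are skipped in DTMF-only mode.
    List<String> grammarsToRender(boolean dtmfOnlyMode) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<GrammarType, String> e : grammars.entrySet()) {
            if (dtmfOnlyMode && e.getKey() == GrammarType.VOICE) continue;
            out.add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        QuestionSketch q = new QuestionSketch();
        q.setGrammar(GrammarType.VOICE, "grammars/passengers.grxml");
        q.setGrammar(GrammarType.DTMF, "builtin:digits");
        System.out.println(q.grammarsToRender(true));
    }
}
```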

A grammar represents a series of words, such as "A large pizza please". The application may only care about a few of the words; here, the size word "large" is the only word of importance to the app. These words are attached to named return values called slots. In our pizza example, a slot called "size" would be bound to the words "small", "medium", or "large". Any of those words would fill the slot.
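
The pizza example can be mimicked in a few lines of Java. In practice the recognizer fills slots according to the grammar, so this is only a toy model; the class SlotSketch and its word list are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model of slot filling; a real recognizer does this from the grammar.
class SlotSketch {
    static final List<String> SIZE_WORDS = Arrays.asList("small", "medium", "large");

    // Return the slots filled by an utterance; other words are ignored.
    static Map<String, String> fill(String utterance) {
        Map<String, String> slots = new LinkedHashMap<>();
        for (String word : utterance.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (SIZE_WORDS.contains(word)) {
                slots.put("size", word);
            }
        }
        return slots;
    }

    public static void main(String[] args) {
        System.out.println(fill("A large pizza please"));
    }
}
```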

Slots define the interface between a grammar and a VoiceXML field. The field's name defines the slot that the grammar must fill in order for the field to be filled. For example, a field named "size" expects a slot named "size".

Any grammar that fills the slot "size" can be used.

A single utterance can fill multiple slots, as in "I would like to fly to Atlanta on Friday."
SpeakRight doesn't yet support multi-slot questions.
