SpeakRight supports three types of grammars: external grammars (referenced by URL), built-in grammars, and inline grammars (which use a simplified GSL format). Grammars work much like prompts. You specify a grammar text, known as a gtext, that uses a simple formatting language:
- (no prefix). The grammar text is a URL. It can be an absolute or relative URL.
- inline: prefix. An inline grammar. The prefix is followed by a simplified version of GSL, such as "small medium [large big] (very large)".
- builtin: prefix. One of VoiceXML's built-in grammars. The prefix is followed by something like "digits?minlength=3;maxlength=9"
When a flow object is rendered, the grammars are rendered using a pipeline, that applies the following logic:
- check the grammar condition. If false then skip the grammar.
- parse an inline grammar into its word list
- parse the builtin grammar
- convert relative URLs into absolute URLs
The grammar text is a URL. It can be an absolute URL (eg. http://www.somecompany.com/speechapp7/grammars/fruits.grxml), or a relative URL. Relative URLs (eg. "grammars/fruits.grxml") are converted into absolute URLs when the grammar is rendered. The servlet's URL is currently used for this.
The grammar file extension is used to determine the type value for the grammar tag
- ".grxml" means type="application/srgs+xml"
- ".gsl" means type="text/gsl"
- all other files are assumed to be ABNF SRGS format, type="application/srgs"
Built-In Grammars
TDB. Built-ins are part of VoiceXML 2.0, but optional. They are also intended for prototyping, and it's recommended that applications use full, properly tuned grammars.
Inline Grammars
GSL is (I believe) a propietary Nuance format. SpeakRight uses a simplified version that currently only supports [ ] and ( ).
A single utterance can contain one or more slots. The simplest type of directed dialog VUIs use single slot questions, such as "How many passengers are there?". SR only supports single slot for now.
Grammar Types
There are two types of grammars (represented by the GrammarType enum).
- VOICE is for spoken input
- DTMF is for touchtone digits
DTMF Only Mode
Speech recognition may not work at all in very noisy environments. Not only will recognition fail, but prompts may never play due to false barge-in. For these reasons, speech applications should be able to fall back to a DTMF-only mode. This mode can be activated by the user by pressing a certain key, usually '*'. Once activated, SpeakRight will not render any VOICE grammars. Thus the VoiceXML engine will only listen for DTMF digits.
Slots
A grammar represents a series of words, such as "A large pizza please". The application may only care about a few of the words; here, the size word "large" is the only word of importance to the app. These words are attached tonamed return values called slots. In our pizza example, a slot called "size" would be bound to the words "small", "medium", or "large". Any of those words would fill the slot.
Slots define the interface between a grammar and a VoiceXML field. The field's name (shown below) defines a slot that grammar must fill in order for the field to be filled.
Any grammar that fills the slot "size" can be used.
A single utterance can fill multiple slots, as in "I would like to fly to Atlanta on Friday."
SpeakRight doesn't yet support multi-slot questions..
No comments:
Post a Comment