What is SpeakRight?

SpeakRight is an open-source Java framework for writing speech recognition applications in VoiceXML.Unlike most proprietary speech-app tools, SpeakRight is code-based. Applications are written in Java using SpeakRight's extensible classes. Java IDEs such as Eclipse provide great debugging, fast Java-aware editing, and refactoring. Dynamic generation of VoiceXML is done using the popular StringTemplate templating framework. Read more...

See Getting Started, Tutorial, and Table of Contents

Tuesday, December 4, 2007

The StringTemplate Template Engine

The StringTemplate template engine is a popular choice for generating markup text in Java. It comes from Terrence Parr, the inventor of ANTLR.

SpeakRight uses StringTemplate (ST) for all its VoiceXML generation. When a flow object is rendered, it is first converted into a SpeechPage object. A SpeechPage is not VoiceXML-specific, and allows SpeakRight to output other formats such as SALT, or whatever you want. It's also the glue that ST requires. SpeechPages are rendered using one of the ISpeechPageWriter classes. For testing, an HTML page writer is available. The main page writer though is VoiceXMLPageWriter.

VoiceXMLPageWriter into VoiceXML. A StringTemplate file defines the format for prompts, grammars, fields, forms and other VoiceXML tags. This gives a lot of flexibility. If your VoiceXML platform has special requirements, simply modify the speakright.stg template file.

Matt Raible on web frameworks

Matt Raible has a fascinating video comparing web frameworks. Comparisons are tricky since frameworks are changing rapidly, with multiple releases per year. However, he makes an interesting aside about the (lack of) value of visual IDEs. JSF comes with a drag-and-drop IDE that is "appealing to managers", but "if one wants to develop anything substantial, we're going to have to get down and dirty with the code."

I've been espousing a code-based approach for speech applications for while. Indeed that's the whole premise of the SpeakRight framework. Any substantial app will use dynamically generated code, and not be pages of handwritten markup text.

Things are a bit simpler for speech applications. The following criteria for comparing web frameworks don't apply

  • Bookmarkable URLs.
  • Avoiding the double-POST problem
  • AJAX
  • Massive scalability. Web applications may involve millions of users but speech apps are still orders of magnitude smaller.
  • Page decoration. The vast topic of graphical design doesn't exist in a speech app. Persona is as close as one gets to "decoration".

Monday, December 3, 2007

NBest

NBest is a very useful feature for handling similar sounding words. Normally a speech rec engine finds the grammar rule that is the best match for the user's speech utterance. Large grammars can suffer from substitution errors where the wrong rule is matched: caller says "Boston" but the engine selects "Austin". NBest helps the application sort out this type of ambiguity.

When enabled, NBest is a request to the speech rec engine to return the top N matches, sorted in order of decreasing confidence level. N is usually a small number, such as 4. Remember that the NBest value is a maximum; fewer results may be returned.

In SpeakRight, NBest is enabled using the QuestionFlow method enableNBest.


flow.enableNBest(4); //up to 4 results


When the SRResults come back for that question, you can check for NBest results. The SRResults method hasNBest which indicates that more than one result was returned.

NBest Pruning
The simplest thing an application can do, is check the NBest results in validateInput, and use additional application logic to select the most likely result. This is called NBest pruning. For example, if the user is asked for her account number, each result can be checked against the database. If only one result is a valid account number, the application could assume that's what the caller said.


String probableAccountNumber = "";
int matches = 0;
for (int i = 0; i <>
String value = results.getNBestValue(i);
if (CheckAccountNumber(value)) { //check against the database
probableAccountNumber = value;
matches++;
}
}
if (matches == 1) {
results.replaceInput(probableAccountNumber); //let's use it!
}



NBest Confirmation
A more common use for NBest is to do confirmation. When NBest results are returned, the application confirms each NBest result, stopping as soon as the user says "yes".

C: What city?
H: Boston
C: (returns 'Austin' and 'Boston' as NBest results) Do you want Austin?
H: No
C: Do you want Boston?
H: Yes
C: Great. Flying to Boston on what date?
...

The application may still want to prune the NBest results, re-ordering the results according to the most likely answers. This way the first confirmation question is more likely to be the correct one. This is an important part of NBest -- using additional context information and application logic to improve on the speech rec engine's results.

Pass an NBestConfirmerFlow and the question flow object to a ConfirmationWrapper object that will manage the confirmation process. It will ask the user to confirm values until the caller accepts one (by saying "yes" or whatever your confirmation grammar uses for acceptance). If the caller says "no" to all NBest values, then the question is asked again, and the process repeats. You can override NBestConfirmerFlow to adjust this behaviour.

Note that NBest confirmation is an extension of basic confirmation. A YesNoConfirmerFlow confirms a single result, while a NBestConfirmerFlow confirms multiple results.

ConfirmationWrapper cw = new ConfirmationWrapper(new AskCity(),
new NBestConfirmer("yesno.grxml"));


Skip Lists
A skip list is a list of words that the application will not confirm because the caller has already rejected them. This is an optional feature of NBestConfirmerFlow. Enable it with the enableSkipList method. If the caller says "no" to all NBest values, then the question is asked again. Before beginning confirmation, NBestConfirmerFlow will remove from the new NBest results any values that were rejected during the previous round of confirmation questions. If this results in only a single NBest result, then there is no need for confirmation.

C: What city?
H: Crosston
C: (returns 'Austin' and 'Boston' as NBest results) Do you want Austin?
H: No
C: Do you want Boston?
H: No
C: (asking the question again) Let's try again. What city?
H: Crosston
C: (returns 'Austin' and 'Crosston' and 'Aulston' as NBest results. Austin is removed.)Do you want Crosston?
H: yes
C: Got it. Flying to Crosston on what date?
...

If you don't use a skip list, the application can infuriatingly confirm the same wrong result again and again.

NBest With SROs
SpeakRight Reusable Objects (SROs) are pre-built flow objects for gathering common data such as numbers, dates, etc.

To enable NBest for an SRO, use its enableNBest method. This will use an SROConfirmNBest confirmer object. If you need to use a custom confirmer, call enableNBest followed by setConfirmer to pass in your custom confirmer.


SRONumber flow = new SRONumber("tickets", 1, 10);
flow.enableNBest(4); //up to 4 results