What is SpeakRight?

SpeakRight is an open-source Java framework for writing speech recognition applications in VoiceXML. Unlike most proprietary speech-app tools, SpeakRight is code-based. Applications are written in Java using SpeakRight's extensible classes, so Java IDEs such as Eclipse provide great debugging, fast Java-aware editing, and refactoring. Dynamic generation of VoiceXML is done using the popular StringTemplate templating framework.


Thursday, April 26, 2007

Simpsons Demo

SimpsonsDemo is a simple voting application. It's a working speech application that demonstrates the use of SpeakRight in a non-trivial app. In SimpsonsDemo, users can call in and vote for a character from the Simpsons TV show, one vote per phone call. After voting they can hear the current standings and other information about the application. In order to make the application more complex, each character is assumed to have a related character who they are closest to. For Mister Burns this might be Smithers (and vice versa!).
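The voting rules in this paragraph (one vote per call, and a closest related character for each character) fit in a small data structure. The sketch below is a hypothetical in-memory store, not part of SpeakRight or SimpsonsDemo; VotingStore and its methods are illustrative names, and a real deployment would presumably keep votes in a database.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory store for the SimpsonsDemo voting rules.
class VotingStore {
    private final Map<String, Integer> votes = new HashMap<>();
    private final Map<String, String> related = new HashMap<>();
    private final Set<String> callsThatVoted = new HashSet<>();

    // Registers a character together with its closest related character.
    void addCharacter(String name, String relatedName) {
        votes.putIfAbsent(name, 0);
        related.put(name, relatedName);
    }

    String relatedTo(String name) {
        return related.get(name);
    }

    // Counts the vote only for a known character and only once per call.
    // Returns true if the vote was counted.
    boolean vote(String callId, String name) {
        if (!votes.containsKey(name) || !callsThatVoted.add(callId)) {
            return false;
        }
        votes.merge(name, 1, Integer::sum);
        return true;
    }

    int votesFor(String name) {
        return votes.getOrDefault(name, 0);
    }
}
```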

Development Process

Productivity of developers is important. When code is quick to create, it's quick to change (and test!). This encourages an agile development process.

A SpeakRight application is developed in stages.

Stage 1. The application flow objects are created and wired together. Concentrate on the Model and the overall callflow. Don't worry about grammars and prompts at this point. Use inline grammars and default prompts. Do all testing using the keyboard-based interactive tester (and unit tests). The goal at this stage is to get the callflow logic working. External data access can be done at this stage, or mocked out, and done later in parallel with other stages.

Stage 2. Define the grammars and prompts. Use external grammars that take into account preamble, postamble, and the various ways of saying things ("LA" and "Los Angeles"). Create an application prompt XML file that defines the main and error prompts for your flow objects. Deploy the app as a servlet and test it in HTML mode, where the app can be executed using a web browser. This will flush out missing files and other errors.

Stage 3. Deploy the app to the VoiceXML platform. At this point the callflow logic should already be complete and well tested, and the prompts and grammars defined. All that remains is to listen for mispronunciations, poor prosody, and other VUI-level errors.

The idea is to get as much testing as possible done before deploying to the VoiceXML platform, where testing becomes much slower and more difficult (especially automated testing). Of course, this is not a fixed waterfall approach; you may, for example, need to prototype some VUI design issues on the VoiceXML platform before tackling stage 1.

Description of the Call Flow
When a user calls in, they choose a character by saying the character's name, such as "Mister Burns". The application plays a short description of the character and asks if the user wishes to vote for this character. If yes, the vote is recorded and the user is taken to the main menu. Otherwise the user is asked if they want to hear about the related character. If the user says yes, the related character's description is played and the user has the opportunity to vote for the character as before. If the user chooses not to hear about the related character, they are asked to choose another character.

The main menu has four options.
  • choose character. Select a Simpsons character and vote, as described above.
  • voting results. Hear the voting results. Results are played in sets of three. The top three characters are listed. If the user says 'next', then the next three characters are listed.
  • call statistics. Lists the number of calls, average call duration, and other statistics.
  • speakright. Plays a description of the SpeakRight framework.
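Playing the results in sets of three amounts to paging through a ranked list. A minimal sketch, assuming the vote counts are available as a map from character name to count; ResultsPager is a hypothetical helper, not a SpeakRight class, and breaking ties alphabetically is an arbitrary choice made here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: pages through voting results three at a time,
// highest vote count first.
class ResultsPager {
    static final int PAGE_SIZE = 3;
    private final List<String> ranked;
    private int offset = 0;

    ResultsPager(Map<String, Integer> votes) {
        ranked = new ArrayList<>(votes.keySet());
        ranked.sort((a, b) -> {
            int byVotes = Integer.compare(votes.get(b), votes.get(a)); // descending
            return byVotes != 0 ? byVotes : a.compareTo(b);            // ties by name
        });
    }

    // True while there are characters left to list (i.e. 'next' makes sense).
    boolean hasNext() {
        return offset < ranked.size();
    }

    // Returns the next (up to) three characters; empty when all were listed.
    List<String> next() {
        int end = Math.min(offset + PAGE_SIZE, ranked.size());
        List<String> page = new ArrayList<>(ranked.subList(offset, end));
        offset = end;
        return page;
    }
}
```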
Sample Call
(For brevity, we've omitted examples of error handling for silence and no-recognition errors.)

Computer: Welcome to the Simpsons Demo, where you can vote for your favorite Simpsons character. {pause} Please say the name of a character, such as "Mister Burns".
Human: Homer
C: Homer Simpson is the show's main character. His signature annoyed grunt "D'oh!" has been included in the Oxford English Dictionary. Do you want to vote for Homer?
H: No
C: Do you want to hear about Marge, Homer's wife?
H: Yes
C: Marge Simpson is the well-meaning and patient wife of Homer. Do you want to vote for Marge?
H: Yes.
C: Vote recorded. {pause} Main Menu. You can say 'choose character', 'voting results', 'call statistics', or 'hear about speakright'.
H: call statistics
C: There have been 413 calls with average length of 65 seconds. The average completion is...
H: Hangup


Pseudo-Code for the Call Flow

Let's write the call flow logic as a series of actions with some basic pseudo-code to represent branching and looping. Labels are marked with a trailing colon (A:, B:, and MainMenu:).

Welcome
A: ChooseCharacter
B: SayCharacterInfo
VoteForCharacterYesNo
if yes then goto MainMenu
HearAboutRelatedCharacterYesNo
if yes then goto B else goto A

MainMenu: MainMenu
if 'character' then goto A
else if 'results' then SayVotingResults
else if 'statistics' then SayCallStatistics
else if 'speakright' then SaySpeakRightInfo
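To see how the labels and gotos behave, the pseudo-code can be transcribed into plain Java as a small state machine driven by scripted answers, much like keyboard testing. This is a hypothetical illustration, not SpeakRight code: PseudoCallFlow, the answer words ('yes', 'results', and so on), and the related-character map are assumptions. The pseudo-code does not say what happens after a main menu option is played, so the sketch assumes the caller returns to the main menu.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical transcription of the pseudo-code: each label is a state,
// each "goto" assigns the next label. The run ends (like a hang-up) as
// soon as an answer is needed and the script is empty.
class PseudoCallFlow {
    static List<String> run(List<String> answers, Map<String, String> related) {
        Deque<String> in = new ArrayDeque<>(answers);
        List<String> trace = new ArrayList<>();
        String character = null;
        trace.add("Welcome");
        String label = "A";
        while (true) {
            switch (label) {
                case "A": {
                    trace.add("ChooseCharacter");
                    if (in.isEmpty()) return trace;
                    character = in.poll();
                    label = "B";
                    break;
                }
                case "B": {
                    trace.add("SayCharacterInfo " + character);
                    trace.add("VoteForCharacterYesNo");
                    if (in.isEmpty()) return trace;
                    if (in.poll().equals("yes")) {
                        label = "MainMenu";
                        break;
                    }
                    trace.add("HearAboutRelatedCharacterYesNo");
                    if (in.isEmpty()) return trace;
                    if (in.poll().equals("yes")) {
                        character = related.get(character); // goto B with related char
                        label = "B";
                    } else {
                        label = "A";
                    }
                    break;
                }
                case "MainMenu": {
                    trace.add("MainMenu");
                    if (in.isEmpty()) return trace;
                    String choice = in.poll();
                    if (choice.equals("character")) label = "A";
                    else if (choice.equals("results")) trace.add("SayVotingResults");
                    else if (choice.equals("statistics")) trace.add("SayCallStatistics");
                    else if (choice.equals("speakright")) trace.add("SaySpeakRightInfo");
                    break; // assumption: otherwise stay at the main menu
                }
                default:
                    throw new IllegalStateException("unknown label " + label);
            }
        }
    }
}
```

With the answers from the sample call ("Homer", "no", "yes", "yes", "statistics") and Homer mapped to Marge, the trace follows the same path as the transcript and ends back at MainMenu when the script runs out.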

Writing the Call Flow in Java
In SpeakRight, the pseudo-code for a callflow can be converted into Java code in a fairly simple way. Each action becomes a flow object, and is represented by a class derived from one of the SpeakRight base classes, such as a PromptFlow that plays some audio output. A series of flow objects is executed in sequence.

Where branching is required, the getNext method of a flow object can be overridden to add the branching logic. getNext returns either a flow object to be executed, or an event object which causes execution to jump to a previous point. Event objects and event handlers act like "throw" and "catch" respectively.
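The throw/catch analogy can be made concrete with a toy model. In this sketch (hypothetical types, not SpeakRight's actual getNext signature) a flow returns null when it completes normally, or an event name that propagates outward until an enclosing sequence that handles it catches it. Flows after the throw point are skipped, just as statements after a Java throw are.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy flow: returns null on normal completion, or an event name to "throw".
interface Flow {
    String execute(List<String> log);
}

// Plays a prompt (here: appends it to the log) and completes normally.
class PromptStep implements Flow {
    private final String text;
    PromptStep(String text) { this.text = text; }
    public String execute(List<String> log) {
        log.add(text);
        return null;
    }
}

// Raises an event, like a flow returning an event object.
class ThrowStep implements Flow {
    private final String event;
    ThrowStep(String event) { this.event = event; }
    public String execute(List<String> log) { return event; }
}

// Runs children in order; catches the one event it handles, passes others up.
class SequenceFlow implements Flow {
    private final String catches; // may be null: catches nothing
    private final List<Flow> children;

    SequenceFlow(String catches, Flow... children) {
        this.catches = catches;
        this.children = new ArrayList<>(Arrays.asList(children));
    }

    public String execute(List<String> log) {
        for (Flow child : children) {
            String event = child.execute(log);
            if (event == null) continue;       // normal completion
            if (event.equals(catches)) {       // "catch"
                log.add("caught " + event);
                return null;
            }
            return event;                      // uncaught: propagate outward
        }
        return null;
    }
}
```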

The application's data is stored in the model, a special class generated by the SpeakRight tool MGen. The model can be used to hold user input (such as the currently selected Simpsons character) and retrieved data (e.g. from a database). It can also hold control data that is used to control the execution path.
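MGen's generated code is not shown in this post. As a rough, hypothetical picture of such a model, each model variable can be a typed holder with get, set, and clear, matching the M.nextAction().set(...) style used later in this post. ModelValue and this Model class are assumptions for illustration, not MGen output; treating 0 as "no character selected" follows the check in MainLoop's branching code.

```java
// Hypothetical typed holder for one model variable.
class ModelValue<T> {
    private final T initial;
    private T value;

    ModelValue(T initial) {
        this.initial = initial;
        this.value = initial;
    }

    T get() { return value; }
    void set(T v) { value = v; }
    void clear() { value = initial; } // back to the "unset" value
}

// Hypothetical model with one control variable and one user-input variable.
class Model {
    private final ModelValue<String> nextAction = new ModelValue<>("");
    private final ModelValue<Integer> currentCharacterId = new ModelValue<>(0);

    ModelValue<String> nextAction() { return nextAction; }                 // control data
    ModelValue<Integer> currentCharacterId() { return currentCharacterId; } // user input
}
```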

Let's get started. The outermost flow object represents the entire callflow, and is always derived from SRApp.


public static class SimpsonsDemo extends SRApp
{
    public SimpsonsDemo()
    {
        addPromptFlow("Welcome");  // played once at the start of the call
        add(new MainLoop());       // the rest of the callflow runs in a loop
    }
}

The constructor adds two sub-flows: a welcome prompt and a loop flow object. The welcome is played only once; the remainder of the callflow runs in a loop, since the user can go back and forth between selecting a character and the main menu as many times as he or she likes.

The MainLoop class defines a model variable M. By convention, SpeakRight will inject a value for M at runtime automatically.

public static class MainLoop extends BranchFlow
{
public Model M;

The first method called in a flow object is its onBegin method. Here we initialize the two main model values: nextAction controls what MainLoop does next, and currentCharacterId is the currently chosen Simpsons character.


@Override
public void onBegin() {
    M.nextAction().set("A");          // start by choosing a character
    M.currentCharacterId().clear();   // no character selected yet
}

The next method to be called is branch. If nextAction is set to choose a character ("A"), we build a sequence of sub-flows for selecting and voting for a character. Otherwise we return the main menu flow object.

 
@Override
public IFlow branch() {
    if ("A".equals(M.nextAction().get())) {   // choose-a-character branch
        BasicFlow flow = new BasicFlow();
        if (M.getCurrentCharacterId() == 0) { // no character selected yet?
            flow.add(new AskCharacter());
        }
        flow.add(new PromptFlow("{$M.CurrentCharacterId}")); // describe the character
        flow.add(new VoteYesNo());
        flow.add(new RelatedCharacterYesNo());
        return flow;
    }
    else {
        return new MainMenu();
    }
}

AskCharacter is a flow object that asks the user to enter a character name.


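public static class AskCharacter extends BaseSROQuestion {

    public AskCharacter() {
        super("character");                   // subject word used to build prompts

        m_main1Prompt = "Say the name of a Simpsons character";
        m_slotName = "x";                     // grammar slot holding the result
        m_modelVar = "currentCharacterId";    // model variable the result is bound to
    }
}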
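The slot name and model variable set in AskCharacter suggest how model binding works: the value the grammar puts in slot "x" ends up in the model variable "currentCharacterId". A hypothetical sketch of that copy step, using plain maps in place of the real recognition result and model (SlotBinder is an illustrative name, not a SpeakRight class):

```java
import java.util.Map;

// Hypothetical model binding: copy a recognition slot into a named model variable.
class SlotBinder {
    private final String slotName;  // e.g. "x"
    private final String modelVar;  // e.g. "currentCharacterId"

    SlotBinder(String slotName, String modelVar) {
        this.slotName = slotName;
        this.modelVar = modelVar;
    }

    // Returns true if the slot was present in the result and was bound.
    boolean bind(Map<String, String> slots, Map<String, String> model) {
        String value = slots.get(slotName);
        if (value == null) {
            return false;
        }
        model.put(modelVar, value);
        return true;
    }
}
```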

The voting flow object is a yes/no question. The VoteYesNo class asks the question, and its onYes method handles a 'yes' answer: the vote is recorded and execution jumps to the main menu. The GotoMainLoopEvent event object (MainLoop.GotoEvent in the code) is used to do this.

public static class VoteYesNo extends SROYesNo {
    public VoteYesNo()
    {
        m_main1Prompt = "Do you want to vote for {$M.CharacterName}?";
    }

    @Override
    public IFlow onYes() {
        // recording the vote is not shown in this excerpt
        return new MainLoop.GotoEvent(MainLoop.BRANCH_MAIN_MENU); // jump to the main menu
    }
}


The GotoMainLoopEvent is caught by MainLoop in its onCatchGotoBranchEvent method. Here we set the branching condition M.nextAction.

protected void onCatchGotoBranchEvent(GotoBranchEvent ev)
{
log("branch. action: " + ev.m_action);
M.nextAction().set(ev.m_action);
}

MainLoop is a BranchFlow with loop-forever set, so MainLoop is executed again and, depending on nextAction, runs either the main menu or choose-a-character.
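The loop-and-goto mechanism can be simulated without any framework. In this hypothetical sketch (LoopSketch is not SpeakRight's BranchFlow), each pass of the loop branches on nextAction, and a caught goto event simply updates nextAction for the next pass. Scripted inputs stand in for the caller ('yes' to a vote, or 'character' at the main menu), and the loop ends when the script does, standing in for a hang-up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of a loop-forever branch flow driven by nextAction.
class LoopSketch {
    static final String CHOOSE_CHARACTER = "A";
    static final String MAIN_MENU = "MainMenu";

    private String nextAction;
    private final List<String> trace = new ArrayList<>();

    // Counterpart of onBegin: start by choosing a character.
    LoopSketch() {
        nextAction = CHOOSE_CHARACTER;
    }

    // Counterpart of onCatchGotoBranchEvent: remember where to go next.
    void onCatchGoto(String action) {
        nextAction = action;
    }

    String nextAction() {
        return nextAction;
    }

    // One loop pass per scripted input.
    List<String> run(List<String> inputs) {
        for (String input : inputs) {
            if (CHOOSE_CHARACTER.equals(nextAction)) {
                trace.add("choose-character");
                if (input.equals("yes")) onCatchGoto(MAIN_MENU);     // vote, then goto
            } else {
                trace.add("main-menu");
                if (input.equals("character")) onCatchGoto(CHOOSE_CHARACTER);
            }
        }
        return trace;
    }
}
```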

SimpsonsDemo has an interactive tester (for keyboard testing), and an auto-tester (see Automated Testing).

Thursday, April 19, 2007

SpeakRight Reusable Objects (SROs)

One of the biggest challenges for a speech app framework is to maximise the reuse of common VUI dialogs. Collection of common data elements should be reusable: time, date, currency, numbers, phone numbers, and zip codes. Another area of commonality is user interface elements: login, main menu, "hot word" commands, list traversal, the enter-or-cancel pattern, and confirmation.

SpeakRight provides a set of reusable speech objects called SROs. They are configurable and extensible. Here is a list of the ways SROs can be "tweaked":

  • SROs have a full set of prompts, with main, silence, no-reco, and help prompts. Up to four escalations of each can be defined.
  • any or all prompts can be replaced. Each SRO has a subject, a word such as "flights", which is used to build prompts. Changing the subject word is the simplest way to adjust the prompts. There is an extension point for handling the plurality of subject words ("flight", "flights"). Or the entire prompt can be replaced.
  • prompts can be conditional, such as a prompt that only plays the first time an SRO is executed.
  • prompts can be defined at compile time, at runtime in code, or in external XML files.
  • grammars are replaceable. Inline grammars or grammar files can be used. The only restriction on a grammar is that it uses the slot names required by the SRO.
  • validation code can be added. This server-side code inspects user input and either accepts it or causes the SRO to re-execute.
  • Model binding. An SRO has a model variable name. When set, the user input results are bound to the model (i.e. stored in the model for later use by the app).
  • Command phrases can be added to an SRO. A common data entry pattern is the enter-data-or-say-cancel pattern. SROs have a list of command phrases that you can add to.
  • confirmation can be added. An SRO has a confirmation plug-in that can be used to add various forms of confirmation (explicit, implicit, or confirm-and-correct).
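Several of these tweaks (a subject word, a plurality extension point, up to four escalations) can be illustrated together. A minimal sketch, assuming prompt templates use {subject} and {subjects} placeholders; SroPrompts and the placeholder syntax are hypothetical, not SpeakRight's prompt format. Once the escalations run out, the last prompt is repeated.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical SRO-style prompt set built from a subject word.
class SroPrompts {
    static final int MAX_ESCALATIONS = 4;
    private final String subject;
    private final List<String> templates;

    SroPrompts(String subject, List<String> templates) {
        if (templates.isEmpty() || templates.size() > MAX_ESCALATIONS) {
            throw new IllegalArgumentException("need 1 to " + MAX_ESCALATIONS + " prompts");
        }
        this.subject = subject;
        this.templates = new ArrayList<>(templates);
    }

    // Extension point for plurality; override for irregular words.
    protected String plural(String word) {
        return word.endsWith("s") ? word : word + "s";
    }

    // Prompt for the given 1-based attempt; stays on the last escalation.
    String prompt(int attempt) {
        int index = Math.max(0, Math.min(attempt, templates.size()) - 1);
        return templates.get(index)
                .replace("{subjects}", plural(subject))
                .replace("{subject}", subject);
    }
}
```

Overriding plural covers irregular words, for example an anonymous subclass that returns "people" for "person".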


The current list of SROs is:
  • SROCancelCommand
  • SROChoice
  • SROConfirmYesNo
  • SRODigitString
  • SROListNavigator
  • SRONumber
  • SROOrdinalItem
  • SROTransferCall
  • SROYesNo
Adding new SROs is simple. Code generation (using StringTemplate) generates a base class (e.g. genSRONumber) that manages prompts and grammars. You only need to derive the actual SRO class to add specific logic.
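The generate-then-derive pattern can be sketched in plain Java. Here the base class is hand-written rather than produced by StringTemplate, and GenSroNumberSketch, PartySizeSro, and the validate hook are hypothetical names; the derived class adds only app-specific validation (accept the input, or make the SRO re-execute). The grammar string is VoiceXML's built-in number grammar URI.

```java
// Stand-in for a generated base class: owns the default prompt and grammar.
abstract class GenSroNumberSketch {
    protected String mainPrompt = "Please say a number.";
    protected String grammarUri = "builtin:grammar/number"; // VoiceXML built-in grammar

    // Validation hook: return true to accept, false to make the SRO re-execute.
    protected boolean validate(int value) {
        return true;
    }

    // Handles one recognized value: null means accepted, otherwise the
    // prompt to play when the SRO re-executes.
    final String onResult(int value) {
        return validate(value) ? null : mainPrompt;
    }
}

// Hand-written subclass adding only app-specific logic (hypothetical example).
class PartySizeSro extends GenSroNumberSketch {
    PartySizeSro() {
        mainPrompt = "How many people are in your party?";
    }

    @Override
    protected boolean validate(int value) {
        return value >= 1 && value <= 12;
    }
}
```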