The SpeakRight Framework: May 2007

What is SpeakRight?

SpeakRight is an open-source Java framework for writing speech recognition applications in VoiceXML. Unlike most proprietary speech-app tools, SpeakRight is code-based. Applications are written in Java using SpeakRight's extensible classes. Java IDEs such as Eclipse provide great debugging, fast Java-aware editing, and refactoring. Dynamic generation of VoiceXML is done using the popular StringTemplate templating framework.

See Getting Started, Tutorial, and Table of Contents

Thursday, May 24, 2007

List of Flow Objects

Flow objects are the building blocks of SpeakRight applications. Here is the list of available objects:

BranchFlow          Performs branching in the callflow based on an application-defined condition
ChoiceFlow          Branches based on user input, such as in a menu
DisconnectFlow      Hangs up the call
FlowList            A sequence of flow objects, optionally ending with an AppEvent
GotoUrlFlow         Redirects to an external URL
LoopFlow            Iterates over a sequence of sub-flows
NBestConfirmerFlow  Confirms NBest results
PromptFlow          Plays one or more prompts
QuestionFlow        Asks the user a question. Has built-in error retries for silence and nomatch.
RawContentFlow      Outputs raw VoiceXML
RecordAudioFlow     Records the caller's voice to an audio file
SRApp               The root flow object
TransferFlow        Transfers the call
YesNoConfirmerFlow  Confirms a single result

Additional flow objects can be created by implementing the IFlow interface.

There are also SROs (SpeakRight Reusable Objects) which you can use.

Call Control

The focus of SpeakRight is to be a framework for voice user interfaces. As such, its support for call control is fairly limited. Consider using CCXML for applications with heavy use of call control, or use RawContentFlow to generate platform-specific transfer and conferencing features.

SpeakRight catches the disconnect event (usually connection.disconnect) in order to do a final postback. This results in the DISCONNECT event being thrown and onDisconnect being invoked. SRApp provides a default handler for this.

There are a number of call control flow objects.

DisconnectFlow
This flow object plays a final prompt and hangs up.

SROTransferCall
This flow object transfers the call using one of the VoiceXML 2.1 transfer types: Blind, Bridged, or Consultation. The specified destination parameter is a string, such as

"tel:+2560001112222"

The format of the dial string is often platform specific.

TransferFlow has two prompts. An initial prompt called main is played before the transfer is initiated. A transferFailed prompt is played if the transfer fails to complete. Both of these prompts can be overridden in an app-specific prompt XML file.

If the transfer fails because, for example, the destination is busy, then onTransferFailed is invoked. By default it plays the transferFailed prompt; you can override onTransferFailed to provide your own behaviour.

TransferFlow
This flow object is a low-level object. We recommend the use of SROTransferCall instead.

RawContentFlow
This flow object is an escape hatch. The application can supply any VoiceXML it likes. RawContentFlow may be useful for invoking platform-specific call control features.

Thursday, May 17, 2007

Optional Sub-Flow Objects

It's common in a speech application for some sections of the callflow to be optional: if the user is a preferred customer, do X; or if the app has forceLogin enabled, do Y.

Applications can simply create or not create certain flow objects based on the required logic. However, an alternative solution is provided called optional sub-flows. The BasicFlow and LoopFlow classes have been enhanced. The flow objects they contain (called sub-flows) can indicate that they don't wish to run by returning false from shouldExecute. When this occurs, the sub-flow is skipped and the next sub-flow is executed.

The advantage of optional sub-flows is that the decision on whether to run or not can be deferred until a sub-flow is executed. The initialization code doesn't need to handle this.

However, there is a restriction: the final sub-flow cannot be optional, because if the final sub-flow returns null from its getFirst, there's nothing to execute.

Example code that creates the callflow using a BasicFlow object:

//callflow creation..
BasicFlow flow = new BasicFlow();
flow.add(new PromptFlow("Welcome"));
flow.add(new RegisterUser());
flow.add(new MainMenu());
app.add(flow);

And the class definition for the optional sub-flow looks like this:

class RegisterUser extends BaseSROQuestion {
    public Model M;

    // Run this sub-flow only for users who have not registered yet.
    @Override public boolean shouldExecute() {
        return !M.userHasRegistered;
    }
    //..rest of class definition omitted..
}

Note. I'm not completely happy with this feature. It's handy, but the can't-be-last restriction is easy to forget and will only fail at runtime. Full test coverage is required!
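The skipping behaviour and the can't-be-last restriction can be modelled in a few lines. This is a minimal sketch and not SpeakRight's implementation: SubFlow, SketchFlowList, nextToRun, and OptionalSubFlowDemo are hypothetical names invented for illustration. It shows a container skipping sub-flows whose shouldExecute returns false, and failing when no sub-flow is left to run.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a flow object; not SpeakRight's IFlow.
interface SubFlow {
    String name();
    // Return false to ask the container to skip this sub-flow.
    default boolean shouldExecute() { return true; }
}

// Hypothetical container that models how optional sub-flows are skipped.
class SketchFlowList {
    private final List<SubFlow> subFlows = new ArrayList<>();

    void add(SubFlow flow) { subFlows.add(flow); }

    // Returns the first sub-flow at or after 'start' that wants to run.
    // Throws if every remaining sub-flow opts out, which is what happens
    // when the final sub-flow is optional and declines to execute.
    SubFlow nextToRun(int start) {
        for (int i = start; i < subFlows.size(); i++) {
            SubFlow candidate = subFlows.get(i);
            if (candidate.shouldExecute()) {
                return candidate;
            }
        }
        throw new IllegalStateException("no sub-flow left to execute from index " + start);
    }
}

class OptionalSubFlowDemo {
    // Builds a three-step flow whose middle step runs only when forceLogin is set.
    static SketchFlowList build(final boolean forceLogin) {
        SketchFlowList flow = new SketchFlowList();
        flow.add(() -> "Welcome");
        flow.add(new SubFlow() {
            public String name() { return "Login"; }
            @Override public boolean shouldExecute() { return forceLogin; }
        });
        flow.add(() -> "MainMenu");
        return flow;
    }
}
```

Because the decision is made inside nextToRun, the mistake of putting an optional sub-flow last only shows up when that path is actually executed, which is why test coverage of both the "run" and "skip" cases matters.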

Wednesday, May 16, 2007

Supported Platforms

SpeakRight has been tested on these platforms:

Voxeo Evolution (free hosting site http://community.voxeo.com/). VoiceCenter 5.5
Voxeo Prophecy platform version 8.0 beta

The main mechanism for porting to a new platform is to customize the template file; see StringTemplate template engine.

VoiceXML Tags Supported

Here's an alphabetical list of VoiceXML tags supported by SpeakRight:

assign      name expr                 used to track dialog state
audio       src                       play audio file
block
break       time                      time is in msec
catch       connection.disconnect     so app gets a final postback
disconnect
exit
field       name
filled
form                                  one per page
goto        next                      to go to an external URL
grammar     type src                  type can be "text/gsl", "application/srgs+xml", or "application/srgs"
noinput     count bargein             main prompt(s), one per escalation
nomatch     count bargein             main prompt(s), one per escalation
prompt      count bargein             main prompt(s), one per escalation
submit      next namelist method
transfer    type dest connecttimeout
var         name expr                 used to track dialog state

In addition, the RawContentFlow can be used by an app to output custom VoiceXML. It's useful for features not yet supported by SpeakRight.

You can customize the VoiceXML; see the StringTemplate template engine.

Thursday, May 10, 2007

Version 0.0.3 Released

Latest code drop is available at SourceForge (see Download link on the left). The project has been elevated to alpha status and can be used to build some real apps. The SimpsonsDemo app is an example of this, and is included as part of the release.

Thursday, May 3, 2007

Automated Testing

Computers are good at doing repetitive work. Few things are more repetitive than testing software -- so let the computer do it!

SpeakRight provides an automated tester, SRAutoTester. It runs your callflow in a test harness where user input comes from strings that you provide. It checks the progress through the callflow to validate the application logic. Tests can be run directly on a developer's machine; no VoiceXML platform is needed.

SRAutoTester uses the same test harness as SRInteractiveTester, so anything you can test manually (using the keyboard) you can also auto-test, and vice versa.

The format of the string you give is a set of commands separated by semi-colons, where each command's format is: cmd[~ExpectedCurrentFlowObject]

The commands are

"e" echo. Toggles echoing of VoiceXML to the log on or off.
"g" go. Simulates the current VoiceXML page ending and posting back, which causes SpeakRight to proceed to the next flow object. Can contain user input, such as "go chicago", or, if a slot is set, use ":" like this: "go city:chicago".
"q" quit.

Let's do a quick example. An app begins with a welcome prompt then asks for user id and password. If the user input is a valid login then the app proceeds to "MainMenu" otherwise it plays "LoginFailed".

The commands for testing a good login are: g;g 4552;g 1234;g~MainMenu;q
Let's break that down:

"g" means run the first flow object, which is the welcome prompt
"g 4552" is the user id
"g 1234" is the password
"g~MainMenu" validates that we're at the MainMenu flow object

To test a bad login: g;g 9999;g 9999;g~LoginFailed;q
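To make the command format concrete, here is a minimal sketch of how such a string could be split into commands. It follows the cmd[~ExpectedCurrentFlowObject] format described above, but it is not SRAutoTester's own code: the TestCommand class and its parse method are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for the tester's command string; it mirrors the
// documented format but is not SRAutoTester's own code.
class TestCommand {
    final String cmd;          // the command letter, e.g. "e", "g", or "q"
    final String input;        // user input after the command, or "" if none
    final String expectedFlow; // name after '~', or null if no check was requested

    TestCommand(String cmd, String input, String expectedFlow) {
        this.cmd = cmd;
        this.input = input;
        this.expectedFlow = expectedFlow;
    }

    // Splits a script such as "g;g 4552;g~MainMenu;q" into one TestCommand per segment.
    static List<TestCommand> parse(String script) {
        List<TestCommand> commands = new ArrayList<>();
        for (String segment : script.split(";")) {
            String text = segment.trim();
            if (text.isEmpty()) {
                continue; // tolerate empty segments, e.g. from a doubled semi-colon
            }
            String expected = null;
            int tilde = text.indexOf('~');
            if (tilde >= 0) {
                expected = text.substring(tilde + 1).trim();
                text = text.substring(0, tilde).trim();
            }
            int space = text.indexOf(' ');
            String cmd = space < 0 ? text : text.substring(0, space);
            String input = space < 0 ? "" : text.substring(space + 1).trim();
            commands.add(new TestCommand(cmd, input, expected));
        }
        return commands;
    }
}
```

Parsing the good-login string from above yields five commands: the second carries the input "4552", and the fourth carries the expected flow object "MainMenu" with no input.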

Note that these tests are high-level tests of the callflow. They allow requirements such as "A bad login does not proceed to the Main Menu" to be verified. Testing the details of the VUI prompts and grammars needs to be done as well, either with another SpeakRight tester (TBD) or on the VoiceXML platform itself.

Wednesday, May 2, 2007

Benefits of a Code-Based Approach

Sometime in the 1980s, voice applications (called IVRs) appeared, and drag-and-drop toolkits followed. IVR apps were structured much like a flowchart since the user navigated the callflow using one of 12 DTMF keys. A visual programming model seemed appropriate for IVR development. The tools work well on small projects of up to fifty nodes or so. On larger apps the visual approach breaks down. It becomes hard to navigate an app with hundreds or thousands of nodes. Code changes become tedious; try changing the MaxRetries value from 3 to 4 in all GetDigits nodes in a 200-node app! Also, the architectural weakness of visual programming becomes more apparent as size increases. Its programming model is really a 1960s FORTRAN model based on GOTOs and global variables. Structured programming features, let alone object-oriented features, are simply not supported.

Drag-and-drop toolkits remain viable for DTMF apps because the apps are simple. Speech applications are much more complicated than the equivalent DTMF app. There are roughly nine times as many prompts (escalated versions of the main prompt and silence and nomatch prompts). Confirmation needs to be done since speech recognition is never 100% accurate. Lastly, speech apps are more complicated because, released from the limitations of 12 DTMF keys, they try to do more. This complexity means that speech apps need a more powerful development environment, such as the Java programming language.

The first wave of speech applications was written directly in VoiceXML. Again, this is simple for small apps but doesn't scale. A large app has many VoiceXML files, and the relationship between them is not clearly shown. A login.vxml file may submit its results to main_menu.vxml, but that is not apparent from looking at a list of files. Raw VoiceXML does not have any modern programming constructs such as inheritance or design patterns. Lastly, unit testing and debugging are difficult.

This brings us to the final option: a code-based approach. Write the application in Java.

IDEs are powerful
Use the full power of a good IDE with refactoring support, code assist (AKA Intellisense), unit testing, debugging, and integrated source control. The Eclipse IDE, for example, is used by millions of programmers. It will be improved and extended at a far faster rate than any proprietary toolkit. And Eclipse is free.

Better Debugging
Java IDEs have real debuggers. Enough said.

Better Testing
Java IDEs have excellent unit testing. SpeakRight provides a keyboard-based interactive tester, and an HTML mode for executing an app using an ordinary web browser (HTML is generated instead of VXML).

More tools
There are source code tools for code coverage, profiling, generating documentation and design diagrams. Source code control tools allow important questions such as 'what's changed since last week' to be answered.

Code is flexible
Source code is extremely flexible. Unlike drag-and-drop tools that offer only a few levels of granularity, code can be organized and combined in many ways. Let's look at the ways code can be used.

See also Matt Raible On Web Frameworks

Configuration
An object can be configured by setting its properties. This allows re-usable objects to be customized for each use. The customization can be done in code

flow.m_maxAttempts = 4;

or it can come from external configuration files. SpeakRight allows prompts and grammars to reside in XML files that can be changed post-deployment without having to rebuild the app.

Sub-classing
The DRY principle is Don't Repeat Yourself. DRY reduces the amount of source code and makes modifications simpler. Java inheritance is one way of centralizing common code. Any class deriving from a base class gets the base class's behaviour automatically. In our example concerning GetDigits nodes, the MaxRetries value can be defined in a single place (in the base class). Changing the value in the base class causes the change to ripple down to all derived classes.
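As a minimal sketch of the MaxRetries example, assuming hypothetical class names (GetDigitsBase, GetAccountNumber, and GetPin are invented for illustration and are not SpeakRight classes):

```java
// Hypothetical classes for illustration; they are not part of SpeakRight.
class GetDigitsBase {
    // Defined once here; every subclass picks it up automatically.
    protected int maxRetries = 3;

    int getMaxRetries() { return maxRetries; }
}

// Inherits maxRetries unchanged from the base class.
class GetAccountNumber extends GetDigitsBase { }

class GetPin extends GetDigitsBase {
    GetPin() {
        // A subclass can still override the inherited value where needed.
        maxRetries = 2;
    }
}
```

Changing the 3 to a 4 in GetDigitsBase changes GetAccountNumber and every other subclass that does not set its own value, which is the one-line edit that was so tedious in the 200-node visual app.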

Java inheritance is flexible because values or behaviour can be overridden at any point in the class hierarchy.

Composition
Composition is the process of assembling objects into useful components. Modern frameworks make much use of interfaces and extension points. An extension point allows behaviour to be changed by plugging in different implementations of it. In SpeakRight, confirmation is an extension point where different types of confirmation can be plugged in: yes-no confirmation, confirm-and-correct confirmation, and implicit confirmation.

Extension points increase re-use because the number of options multiply. If you have four types of GetNumber objects and three types of confirmation, you have twelve types of GetNumber-And-Confirm behaviour to choose from.
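The multiplication of options is easy to see in code. The sketch below uses hypothetical names throughout (Confirmer, YesNoConfirmer, ImplicitConfirmer, and GetNumberQuestion are invented here and are not SpeakRight's confirmation API); it only illustrates how a question object can accept any confirmation strategy.

```java
// Hypothetical names throughout; this is not SpeakRight's confirmation API.
interface Confirmer {
    // Returns the prompt used to confirm a recognized value.
    String confirmPrompt(String value);
}

class YesNoConfirmer implements Confirmer {
    public String confirmPrompt(String value) {
        return "Did you say " + value + "?";
    }
}

class ImplicitConfirmer implements Confirmer {
    public String confirmPrompt(String value) {
        return "OK, " + value + ".";
    }
}

class GetNumberQuestion {
    private final String question;
    private final Confirmer confirmer; // the extension point

    GetNumberQuestion(String question, Confirmer confirmer) {
        this.question = question;
        this.confirmer = confirmer;
    }

    // Prompts played for one exchange, given what the recognizer heard.
    String[] promptsFor(String recognized) {
        return new String[] { question, confirmer.confirmPrompt(recognized) };
    }
}
```

Any question class written against the Confirmer interface works with any confirmer, so each new confirmer adds one more variant to every question type without touching the question code.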

A menu is another example. A menu is basically a question followed by a switch statement. Both parts are extension points, which allows flexible menus to be created that still share the common base code.

Refactoring
Refactoring is the process of improving code quality without changing the external behaviour. Common code can be pulled into methods or classes. Interfaces and extension points can be added to increase the flexibility of a class. Code can be packaged into namespaces and libraries. Inherited behaviour can be overridden.

A code-based approach allows all these modern software development techniques to be applied to speech apps.

Changing The Framework
SpeakRight is open-source so everything is available to you for modification.

Everything is a Flow Object
In SpeakRight, the callflow is built out of flow objects. Everything from a low-level "yes-no" question, to a form with multiple fields, up to the app itself is a flow object.

Flow objects participate in generating content (VoiceXML). A flow object is notified of each prompt being rendered, and allowed to modify it. Flow objects can control which VoiceXML is generated, and if needed, the entire VoiceXML rendering can be replaced (it's another extension point).

Flow objects participate in deciding the execution path through the callflow. Because they return an IFlow object to be run next, it's easy to inject additional behaviour when needed. Confirmation is done this way.
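The idea of "return the flow object to run next" can be sketched with a deliberately simplified model. Everything here is an assumption made for illustration: MiniFlow, onComplete, ConfirmFlow, CityQuestion, and MiniRunner are hypothetical names, and the sketch makes no claim about the actual methods of SpeakRight's IFlow interface.

```java
// Hypothetical, simplified model of "a flow object returns the flow to run next".
// It makes no claim about the real IFlow interface.
interface MiniFlow {
    String name();
    // Called when this flow's page has finished; returns the next flow, or null when done.
    MiniFlow onComplete(String userInput);
}

class ConfirmFlow implements MiniFlow {
    private final String value;
    ConfirmFlow(String value) { this.value = value; }
    public String name() { return "Confirm(" + value + ")"; }
    public MiniFlow onComplete(String userInput) {
        return null; // sketch only: accept any answer and finish
    }
}

class CityQuestion implements MiniFlow {
    private final boolean confirmAnswers;
    CityQuestion(boolean confirmAnswers) { this.confirmAnswers = confirmAnswers; }
    public String name() { return "CityQuestion"; }
    public MiniFlow onComplete(String userInput) {
        // Injecting confirmation is just a matter of returning another flow.
        return confirmAnswers ? new ConfirmFlow(userInput) : null;
    }
}

class MiniRunner {
    // Runs flows in sequence, feeding one input per step, and records the path taken.
    static String run(MiniFlow first, String... inputs) {
        StringBuilder path = new StringBuilder();
        MiniFlow current = first;
        int step = 0;
        while (current != null) {
            if (path.length() > 0) {
                path.append(" > ");
            }
            path.append(current.name());
            String input = step < inputs.length ? inputs[step] : "";
            step++;
            current = current.onComplete(input);
        }
        return path.toString();
    }
}
```

The runner never needs to know that confirmation exists; the question decides on its own whether an extra step is inserted into the execution path.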

Consider a VUI dialog for traversing a list of items. The common behaviour is the commands "next" and "previous" (and possibly "first" and "last"). These move to a new item and say its value, or play an error message if the end of the list has been reached. List traversal is a common VUI feature, but difficult to make into a re-usable artifact in a non-code-based approach. With code, however, this is a standard sort of OO design task.

Prompts and grammars are made into fields. Default values are provided but can be overridden or configured using getter and setter methods.
The list is a generic Java Collection, allowing it to be a list of anything. An IItemFormatter interface is created so the rendering of a Java object (string, integer, XML, whatever) into a prompt becomes an extension point. The default formatter just uses toString.
SpeakRight's flow objects are pause-able. This means that a list traverser can pause while another VUI dialog runs, and resume when it finishes. A list traverser can now be a main menu for an app that works on a list of items (such as flights to select from). Additional commands can be added, so that in addition to the traversal commands, the menu can accept additional commands such as "details", "accept", and "search". All of this is built on top of the existing list traversal class; no code duplication is required.

And unlike a drag-and-drop toolkit where a list traversal node has a fixed set of features, there are no restrictions in a code-based approach. You want "next" to wrap-around when it reaches the end of the list? No problem.
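Here is a minimal sketch of that design. The IItemFormatter name comes from the description above, but its single format method is an assumed signature; ListTraverser, WrappingTraverser, handle, and onPastEnd are hypothetical names, and the prompt wording is invented. A List is used rather than a general Collection so items can be reached by index, and the list is assumed to be non-empty.

```java
import java.util.List;

// The interface name comes from the text above; its single method is an assumed signature.
interface IItemFormatter<T> {
    String format(T item);
}

// Hypothetical traverser; not SpeakRight's class. Assumes a non-empty list.
class ListTraverser<T> {
    protected final List<T> items;
    protected int index = 0;
    private final IItemFormatter<T> formatter;

    ListTraverser(List<T> items, IItemFormatter<T> formatter) {
        this.items = items;
        this.formatter = formatter;
    }

    // The default formatter just uses toString.
    ListTraverser(List<T> items) {
        this(items, item -> item.toString());
    }

    // Prompt text for the item the traverser is currently on.
    String current() { return formatter.format(items.get(index)); }

    // Handles one spoken command and returns the prompt text to play.
    String handle(String command) {
        switch (command) {
            case "next":
                if (index + 1 >= items.size()) {
                    return onPastEnd();
                }
                index++;
                return current();
            case "previous":
                if (index == 0) {
                    return "That is the first item.";
                }
                index--;
                return current();
            default:
                return "Sorry, I did not understand.";
        }
    }

    // Overridable behaviour for "next" on the last item.
    protected String onPastEnd() { return "That was the last item."; }
}

class WrappingTraverser<T> extends ListTraverser<T> {
    WrappingTraverser(List<T> items) { super(items); }

    // "next" on the last item wraps around to the first.
    @Override protected String onPastEnd() {
        index = 0;
        return current();
    }
}
```

Wrap-around is a four-line subclass, and a custom formatter is a one-line lambda; neither requires touching the traversal logic itself.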

Less Code, Less Testing
A code-based approach, by promoting re-use and the DRY principle, reduces the size of the application. This has many benefits:

Faster development. Less time spent in tedious repetitive work.
Easier to change. A class hierarchy is like the paragraph styles in a word processor. Rather than sprinkling formatting all over the document, it's kept in a few styles (base classes) where it's easily managed and changed.
Consistent Voice User Interface. Shared code leads to shared behaviour, which leads to a consistent user interface.
Reduced testing. This is a huge gain. When common code is used, it only needs to be tested once, even though it's used multiple times in the app. For example, yes-no confirmation can be plugged in to the confirmation extension point of a flow object. Once you've validated that it works in one flow object, there's no need to test all other flow objects since they share the same (base class) code.

Flexible Development
A code-based approach lacks the artificial boundaries of drag-and-drop toolkits. Development can begin by using existing classes and configuring them as needed. When you find the same VUI dialogs appearing in multiple places, sub-classing can be used to centralize a common configuration, such as a MyGetPassword class. As more new classes are created, they can be combined into a class hierarchy in order to share common code, with extension points added where variability is needed. When classes are re-used in other projects, they can be packaged as a library.

Prompt Ids and Prompt XML Files

Tuning a speech application often involves changing prompts, to re-word a question or improve an error message. This should be possible without having to rebuild the app. SpeakRight uses prompt ids to provide this feature.

A prompt id is a ptext item beginning with "id:", such as "id:outOfRange". When the prompt is rendered, a set of XML files is searched to find an entry for that id. The entry looks like this:

<prompt name="outOfRange" def="true">That value is out of range. </prompt>

The prompt text for the id is a full PText; it can contain references to other ids, for example. It's an error if a prompt id cannot be found in any of the XML files.

Which set of XML files? Well you get to define them using SRRunner.registerPromptFile, usually one per app. The framework itself may register some; each SRO has its own prompt XML. The registration may be permanent (for the life of the app), or temporary (for the current flow object execution). The list of XML files is searched in reverse order so that your XML files are searched first, and framework XML files searched last.
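The reverse search order can be sketched as follows. This is a hypothetical model, not SpeakRight's code: PromptRegistry, register, and lookup are invented names, and each Map stands in for one already-parsed prompt XML file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of the lookup order; each Map stands in for one parsed prompt XML file.
class PromptRegistry {
    private final List<Map<String, String>> files = new ArrayList<>();

    // Files registered later are searched first, so app files override framework files.
    void register(Map<String, String> promptFile) {
        files.add(promptFile);
    }

    // Resolves an id such as "id:outOfRange" to its prompt text, newest file first.
    String lookup(String promptId) {
        String name = promptId.startsWith("id:") ? promptId.substring(3) : promptId;
        for (int i = files.size() - 1; i >= 0; i--) {
            String text = files.get(i).get(name);
            if (text != null) {
                return text;
            }
        }
        // It's an error if the id is not found in any registered file.
        throw new IllegalArgumentException("prompt id not found: " + promptId);
    }
}
```

If the framework's file is registered first and the app's file second, an entry for outOfRange in the app's file wins, while ids that appear only in the framework's file are still found.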

Prompt Groups

Another useful feature is the ability to build an app using rough prompts, and then finalize the prompts later without having to make any code changes. Prompt groups do this. Each flow object has a prompt group. The default value is the flow object's name. Prompt ids are looked up twice. First the prompt group is added as a prefix, so for an id "id:outOfRange" in a flow object "MyMenu" the first lookup is "id:MyMenu.outOfRange". If this prompt id is found, the value in the XML file is used. If not, then a second lookup without the prefix is done, which for our example would be "id:outOfRange".
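The two-step lookup can be sketched like this. PromptGroupLookup and resolve are hypothetical names invented for illustration, and the Map stands in for the contents of the registered prompt XML files; this is not SpeakRight's own lookup code.

```java
import java.util.Map;

// Hypothetical helper showing the two-step lookup; the Map stands in for
// the contents of the registered prompt XML files.
class PromptGroupLookup {
    // Tries "<group>.<name>" first, then falls back to plain "<name>".
    static String resolve(Map<String, String> prompts, String group, String promptId) {
        String name = promptId.startsWith("id:") ? promptId.substring(3) : promptId;

        // First lookup: with the prompt group as a prefix, e.g. "MyMenu.outOfRange".
        String specific = prompts.get(group + "." + name);
        if (specific != null) {
            return specific;
        }

        // Second lookup: without the prefix, e.g. "outOfRange".
        String general = prompts.get(name);
        if (general == null) {
            throw new IllegalArgumentException("prompt id not found: " + promptId);
        }
        return general;
    }
}
```

A flow object with its own group-specific entry gets its own wording, while every other flow object falls back to the shared default text for the same id.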

All SROs use prompt ids with default values (see Prompts in SROs). The default prompts are usually good enough to get your app logic up and tested. Then you can create an app-specific prompt XML file and register it (using SRRunner.registerPromptFile). Now you can define the prompt text for all the flow objects at your leisure. No code changes needed!
