What is SpeakRight?

SpeakRight is an open-source Java framework for writing speech recognition applications in VoiceXML.Unlike most proprietary speech-app tools, SpeakRight is code-based. Applications are written in Java using SpeakRight's extensible classes. Java IDEs such as Eclipse provide great debugging, fast Java-aware editing, and refactoring. Dynamic generation of VoiceXML is done using the popular StringTemplate templating framework. Read more...

See Getting Started, Tutorial, and Table of Contents

Wednesday, May 2, 2007

Benefits of a Code-Based Approach

Sometime in the 1980s, voice applications (called IVRs) appeared. And drag-and-drop toolkits followed. IVR apps were structured much like a flowchart since the user navigated the callflow using one of 12 DTMF keys. A visual programming model seemed appropriate for IVR development. The tools work well on small projects of up to fifty nodes or so. On larger apps the visual approach breaks down. It become hard to navigate an app with hundreds or thousands of nodes. Code changes become tedious; try changing the MaxRetries value from 3 to 4 in all GetDigits nodes in a 200 node app! Also, the architectural weakness of visual programming becomes more apparent as size increases. It's programming model is really a 1960s FORTRAN model based on GOTOs and global variables. Structured programming features, let alone object-oriented features are simply not supported.

Drag-and-drop toolkits remain viable for DTMF apps because the apps are simple. Speech applications are much more complicated that the equivalent DTMF app. There are roughly nine times as many prompts (escalated versions of the main prompt and silence and nomatch prompts). Confirmation needs to be done since speech recognition is never 100% accurate. Lastly, speech apps are more complicated because, released from the limitations of 12 DTMF keys, they try to do more. This complexity means that speech apps need a more powerful development environment, such as the Java programming language.

The first wave of speech applications were written directly in VoiceXML. Again this is simple for small apps but doesn't scale. A large app has many voicexml files, and the relationship between them is not clearly shown. A login.vxml file may submit its results to main_menu.vxml, but that is not apparent in looking at a list of files. Raw VoiceXML does not have any modern programming constructs such as inheritance or design patterns. Lastly, unit testing and debugging are difficult.

This brings us to the final option: a code-based approach. Write the application in Java.

IDEs are powerful
Use the full power of a good IDE with refactoring support, code assist (AKA Intellisense), unit testing, debugging, and integrated source control. The Eclipse IDE, for example, is used by millions of programmers. It will be improved and extended at a far faster rate than any proprietary toolkit. And Eclipse is free.

Better Debugging
Java IDEs have real debuggers. Enough said.

Better Testing
Java IDEs have excellent unit testing. SpeakRight provides a keyboard-based interactive tester, and an HTML mode for executing an app using an ordinary web browser (HTML is generated instead of VXML).

More tools
There are source code tools for code coverage, profiling, generating documentation and design diagrams. Source code control tools allow important questions such as 'what's changed since last week' to be answered.

Code is flexible
Source code is extremely flexible. Unlike drag-and-drop tools that offer only a few levels of granularity, code can be organized and combined in many ways. Let's look at the ways code can be used.

See also Matt Raible On Web Frameworks

Configuration
An object can be configured by settings its properties. This allows re-usable objects to be customized for each use. The customization can be done in code

flow.m_maxAttempts = 4;

or it can come from external configuration files. SpeakRight allows prompts and grammars to reside in XML files that can be changed post-deployment without having to rebuild the app.

Sub-classing
The DRY principle is Don't Repeat Yourself. DRY reduces the amount of source code and makes modifications simpler. Java inheritance is one way of centralizing common code. Any class deriving from a base class gets the base classes' behaviour automatically. In our example concerning GetDigit nodes, the MaxRetries value can be defined a single place (in the base class). Changing the value in the base class causes the change to ripple down to all derived classes.

Java inheritance is flexible because values or behaviour can be overridden at any point in the class heirarchy.

Composition
Composition is the process of assembling together objects into useful components. Modern frameworks make much use of interfaces and extension points. An extension point allows behaviour to be changed by plugging in different implementations of it. In SpeakRight, confirmation is an extension point where different types of confirmation can be plugged-in: yes-no confirmation, confirm-and-correct confirmation, and implicit confimation.

Extension points increase re-use because the number of options multiply. If you have four types of GetNumber objects and three types of confirmation, you have twelve types of GetNumber-And-Confirm behaviour to choose from.

A menu is another example of an extension point. A menu is basically a question followed by a switch statement. Both are extension points that allow flexible menus to be created that still share the common base code.

Refactoring
Refactoring is the process of improving code quality without changing the external behaviour. Common code can be pulled into methods or classes. Interfaces and extension points can be added to increase the flexibility of a class. Code can be packaged into namespaces and libraries. Inherited behaviour can be overridden.

A code-based approach allows all these modern software development techniques to be applied to speech apps.

Changing The Framework
SpeakRight is open-source so everything is available to you for modification.

Everything is a Flow Object
In SpeakRight, the callflow is built out of flow objects. Everything from a low-level "yes-no" question, to a form with multiple fields, up to the app itself are flow objects.

Flow objects participate in generating content (VoiceXML). A flow object is notified of each prompt being rendered, and allowed to modify it. Flow objects can control which VoiceXML is generated, and if needed, the entire VoiceXML rendering can be replaced (it's another extension point).

Flow objects participate in deciding the execution path through the callflow. Because they return an IFlow object to be run next, it's easy to inject additional behaviour when needed. Confirmation is done this way.

Consider a VUI dialog for traversing a list of items. The common behaviour is the commands "next" and "previous" (and possibly "first" and "last"). These move to a new item and say its value, or play an error message if the end of the list has been reached. List traversal is a common VUI feature, but difficult to make into a re-usable artifact in a non-code-based approach. With code however, this is a standard sort of OO design task.
  • Prompts and grammars are made into fields. Default values are provided but can be overridden or configured using getter and setter methods.
  • The list is a generic Java Collection, allowing it to be a list of anything. An IItemFormatter interface is created so the rendering of a Java object (string, integer, XML, whatever) into a prompt becomes an extension point. The default formatter just uses toString.
  • SpeakRight's flow objects are pause-able. This means that a list traverser can pause while another VUI dialog runs, and resume when it finishes. A list traverser can now be a main menu for an app that works on a list of items (such as flights to select from). Additional commands can be added, so that in addition to the traversal commands, the menu can accept additional commands such as "details", "accept", and "search". All of this is built on top of the existing list traversal class; no code duplication is required.
And unlike a drag-and-drop toolkit where a list traversal node has a fixed set of features, there are no restrictions in a code-based approach. You want "next" to wrap-around when it reaches the end of the list? No problem.

Less Code, Less Testing
A code-based approach, by promoting re-use and the DRY principle, reduces the size of the application. This has many benefits:
  • Faster development. Less time spent in tedious repetitive work.
  • Easier to change. A class hierarchy is like the paragraph styles in a word processor. Rather than sprinkling formatting all over the document, it's kept in a few styles (base classes) where it's easily managed and changed.
  • Consistent Voice User Interface. Shared code leads to shared behaviour, which leads to a consistent user interface.
  • Reduced testing. This is a huge gain. When common code is used, it only needs to be tested once, even though its used multiple times in the app. For example, yes-no confirmation can be plugged in to the confirmation extension point of a flow object. Once you've validated that it works in one flow object, there's no need to test all other flow objects since they share the same (base class) code.


Flexible Development
A code-based approach lacks the artificial boundaries of drag-and-drop toolkits. Development can begin by using existing classes and configuring them as needed. When you find the same VUI dialogs appearing in multiple places, sub-classing can be used to centralize a common configuration, such as a MyGetPassword class. As more new classes are created they can be combined into a class heirarchy in order to share common code, with extension points added where variability is needed. When classes are re-used in other projects they can be packaged as a library.

No comments: