Sometime in the 1980s, voice applications (called IVRs) appeared, and drag-and-drop toolkits followed.
IVR apps were structured much like a flowchart, since the user navigated the callflow using one of 12 DTMF keys. A visual programming model seemed appropriate for IVR development. The tools work well on small projects of up to fifty nodes or so. On larger apps the visual approach breaks down. It becomes hard to navigate an app with hundreds or thousands of nodes. Code changes become tedious; try changing the MaxRetries value from 3 to 4 in all GetDigits nodes in a 200-node app! Also, the architectural weakness of visual programming becomes more apparent as size increases. Its programming model is really a 1960s FORTRAN model based on GOTOs and global variables. Structured programming features, let alone object-oriented features, are simply not supported.
Drag-and-drop toolkits remain viable for DTMF apps because the apps are simple. Speech applications are much more complicated than the equivalent DTMF app. There are roughly nine times as many prompts (escalated versions of the main prompt, plus silence and nomatch prompts). Confirmation needs to be done since speech recognition is never 100% accurate. Lastly, speech apps are more complicated because, released from the limitations of 12 DTMF keys, they try to do more. This complexity means that speech apps need a more powerful development environment, such as the Java programming language.
The first wave of speech applications was written directly in VoiceXML. Again this is simple for small apps but doesn't scale. A large app has many VoiceXML files, and the relationship between them is not clearly shown. A login.vxml file may submit its results to main_menu.vxml, but that is not apparent from looking at a list of files. Raw VoiceXML does not have any modern programming constructs such as inheritance or design patterns. Lastly, unit testing and debugging are difficult.
This brings us to the final option: a code-based approach. Write the application in Java.
IDEs are powerful
Use the full power of a good IDE with refactoring support, code assist (AKA Intellisense), unit testing, debugging, and integrated source control. The Eclipse IDE, for example, is used by millions of programmers. It will be improved and extended at a far faster rate than any proprietary toolkit. And Eclipse is free.
Better Debugging
Java IDEs have real debuggers. Enough said.
Better Testing
Java IDEs have excellent unit testing. SpeakRight provides a keyboard-based interactive tester, and an HTML mode for executing an app using an ordinary web browser (HTML is generated instead of VXML).
More tools
There are source code tools for code coverage, profiling, and generating documentation and design diagrams. Source code control tools allow important questions such as 'what's changed since last week' to be answered.
Code is flexible
Source code is extremely flexible. Unlike drag-and-drop tools that offer only a few levels of granularity, code can be organized and combined in many ways. Let's look at the ways code can be used.
See also Matt Raible On Web Frameworks
Configuration
An object can be configured by setting its properties. This allows re-usable objects to be customized for each use. The customization can be done in code
flow.m_maxAttempts = 4;
or it can come from external configuration files.
SpeakRight allows prompts and grammars to reside in XML files that can be changed post-deployment without having to rebuild the app.
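As a rough sketch of both styles of configuration, the example below sets a property in code and then loads the same property from external settings. Everything in it is an assumption made for illustration: GetDigitsFlow is a hypothetical class rather than part of SpeakRight's API, the maxAttempts key is invented, and the standard java.util.Properties class stands in for whatever external file format the real application uses.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical flow object; the class and property names are illustrative only.
class GetDigitsFlow {
    private int maxAttempts = 3; // default used when nothing is configured

    int getMaxAttempts() { return maxAttempts; }

    void setMaxAttempts(int maxAttempts) { this.maxAttempts = maxAttempts; }

    // Applies external settings; keys that are absent leave the defaults alone.
    void configure(Properties props) {
        String value = props.getProperty("maxAttempts");
        if (value != null) {
            maxAttempts = Integer.parseInt(value.trim());
        }
    }
}

class ConfigDemo {
    public static void main(String[] args) throws IOException {
        // 1. Customization done in code.
        GetDigitsFlow inCode = new GetDigitsFlow();
        inCode.setMaxAttempts(4);

        // 2. Customization from external settings.
        //    A string stands in for the contents of a file here.
        Properties props = new Properties();
        props.load(new StringReader("maxAttempts=5\n"));
        GetDigitsFlow fromFile = new GetDigitsFlow();
        fromFile.configure(props);

        System.out.println(inCode.getMaxAttempts() + " " + fromFile.getMaxAttempts());
    }
}
```

Because the second object reads its value at run time, the setting could be changed after deployment without recompiling, which is the point of keeping it outside the code.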
Sub-classing
The DRY principle is Don't Repeat Yourself. DRY reduces the amount of source code and makes modifications simpler. Java inheritance is one way of centralizing common code. Any class deriving from a base class gets the base class's behaviour automatically. In our example concerning GetDigits nodes, the MaxRetries value can be defined in a single place (in the base class). Changing the value in the base class causes the change to ripple down to all derived classes.
Java inheritance is flexible because values or behaviour can be overridden at any point in the class hierarchy.
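The MaxRetries example might look something like the following. The class names are hypothetical and are not taken from SpeakRight; the sketch only shows a value defined once in a base class, inherited by most subclasses, and overridden by one.

```java
// Hypothetical base class: the retry limit is defined in exactly one place.
class BaseGetDigits {
    // Change this value and every subclass that does not override it follows.
    int maxRetries() { return 4; }

    String describe() {
        return getClass().getSimpleName() + " retries=" + maxRetries();
    }
}

// These two inherit the base behaviour unchanged.
class GetAccountNumber extends BaseGetDigits { }

class GetZipCode extends BaseGetDigits { }

// An override at one point in the hierarchy affects only this branch.
class GetPin extends BaseGetDigits {
    @Override
    int maxRetries() { return 2; }
}

class SubclassDemo {
    public static void main(String[] args) {
        BaseGetDigits[] nodes = {
            new GetAccountNumber(), new GetZipCode(), new GetPin()
        };
        for (BaseGetDigits node : nodes) {
            System.out.println(node.describe());
        }
    }
}
```

In a 200-node app built this way, moving from 3 retries to 4 is a one-line edit in BaseGetDigits rather than 200 separate changes.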
Composition
Composition is the process of assembling objects into useful components. Modern frameworks make much use of interfaces and extension points. An extension point allows behaviour to be changed by plugging in different implementations of it. In SpeakRight, confirmation is an extension point where different types of confirmation can be plugged in: yes-no confirmation, confirm-and-correct confirmation, and implicit confirmation.
Extension points increase re-use because the number of options multiplies. If you have four types of GetNumber objects and three types of confirmation, you have twelve types of GetNumber-And-Confirm behaviour to choose from.
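A confirmation extension point could be sketched along these lines. The Confirmer interface, its three implementations, and this GetNumber class are all hypothetical names chosen for the example; SpeakRight's own classes may be shaped quite differently.

```java
// Hypothetical extension point: how a recognized value is confirmed.
interface Confirmer {
    String confirmPrompt(String value);
}

class YesNoConfirmer implements Confirmer {
    public String confirmPrompt(String value) {
        return "You said " + value + ". Is that correct?";
    }
}

class ConfirmAndCorrectConfirmer implements Confirmer {
    public String confirmPrompt(String value) {
        return "I heard " + value + ". Say yes, or say the correct value.";
    }
}

class ImplicitConfirmer implements Confirmer {
    public String confirmPrompt(String value) {
        return "Okay, " + value + ".";
    }
}

// Hypothetical question object that delegates confirmation to a plug-in.
class GetNumber {
    private final String question;
    private Confirmer confirmer = new YesNoConfirmer(); // default, replaceable

    GetNumber(String question) { this.question = question; }

    String getQuestion() { return question; }

    void setConfirmer(Confirmer confirmer) { this.confirmer = confirmer; }

    // Returns the confirmation prompt for a recognized value.
    String confirm(String recognized) {
        return confirmer.confirmPrompt(recognized);
    }
}

class CompositionDemo {
    public static void main(String[] args) {
        GetNumber howMany = new GetNumber("How many tickets?");
        System.out.println(howMany.confirm("4"));
        howMany.setConfirmer(new ImplicitConfirmer());
        System.out.println(howMany.confirm("4"));
    }
}
```

Any GetNumber variant can be paired with any Confirmer, which is where the multiplication of options comes from: neither side needs to know which implementation of the other it has been given.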
A menu is another example. A menu is basically a question followed by a switch statement. Both the question and the switch are extension points, which allows flexible menus to be created that still share the common base code.
Refactoring
Refactoring is the process of improving code quality without changing the external behaviour. Common code can be pulled into methods or classes. Interfaces and extension points can be added to increase the flexibility of a class. Code can be packaged into namespaces and libraries. Inherited behaviour can be overridden.
A code-based approach allows all these modern software development techniques to be applied to speech apps.
Changing The Framework
SpeakRight is open-source so everything is available to you for modification.
Everything is a Flow Object
In SpeakRight, the callflow is built out of flow objects. Everything from a low-level "yes-no" question, to a form with multiple fields, up to the app itself is a flow object.
Flow objects participate in generating content (VoiceXML). A flow object is notified of each prompt being rendered, and allowed to modify it. Flow objects can control which VoiceXML is generated, and if needed, the entire VoiceXML rendering can be replaced (it's another extension point).
Flow objects participate in deciding the execution path through the callflow. Because they return an IFlow object to be run next, it's easy to inject additional behaviour when needed. Confirmation is done this way.
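The idea of returning the next object to run can be illustrated with a stripped-down stand-in. SpeakRight's real IFlow interface is not reproduced here; SimpleFlow, its two methods, and the runner below are hypothetical and exist only to show how a confirmation step can be injected into the path.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-in for a flow object.
interface SimpleFlow {
    String prompt();
    SimpleFlow next(String input); // flow object to run next, or null when done
}

class Goodbye implements SimpleFlow {
    public String prompt() { return "Goodbye."; }
    public SimpleFlow next(String input) { return null; }
}

class AskCity implements SimpleFlow {
    public String prompt() { return "Which city?"; }
    public SimpleFlow next(String input) {
        // Inject a confirmation step instead of moving straight on.
        return new ConfirmCity(input, new Goodbye(), this);
    }
}

class ConfirmCity implements SimpleFlow {
    private final String value;
    private final SimpleFlow onYes;
    private final SimpleFlow onNo;

    ConfirmCity(String value, SimpleFlow onYes, SimpleFlow onNo) {
        this.value = value;
        this.onYes = onYes;
        this.onNo = onNo;
    }

    public String prompt() { return "Did you say " + value + "?"; }

    public SimpleFlow next(String input) {
        return "yes".equalsIgnoreCase(input) ? onYes : onNo;
    }
}

class FlowRunner {
    // Runs a flow against scripted caller inputs; returns the prompts played.
    static List<String> run(SimpleFlow start, List<String> inputs) {
        List<String> played = new ArrayList<>();
        Iterator<String> it = inputs.iterator();
        SimpleFlow current = start;
        while (current != null) {
            played.add(current.prompt());
            if (!it.hasNext()) {
                break; // script ended, as if the caller hung up
            }
            current = current.next(it.next());
        }
        return played;
    }
}
```

Running FlowRunner.run(new AskCity(), List.of("Bostin", "no", "Boston", "yes")) on Java 9 or later plays "Which city?" twice, because the rejected confirmation routes back to the question. The runner itself has no knowledge of confirmation; the flow objects decide the path.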
Consider a VUI dialog for traversing a list of items. The common behaviour is the commands "next" and "previous" (and possibly "first" and "last"). These move to a new item and say its value, or play an error message if the end of the list has been reached. List traversal is a common VUI feature, but difficult to make into a re-usable artifact in a non-code-based approach. With code however, this is a standard sort of OO design task.
- Prompts and grammars are made into fields. Default values are provided but can be overridden or configured using getter and setter methods.
- The list is a generic Java Collection, allowing it to be a list of anything. An IItemFormatter interface is created so the rendering of a Java object (string, integer, XML, whatever) into a prompt becomes an extension point. The default formatter just uses toString.
- SpeakRight's flow objects are pause-able. This means that a list traverser can pause while another VUI dialog runs, and resume when it finishes. A list traverser can now be a main menu for an app that works on a list of items (such as flights to select from). Additional commands can be added, so that in addition to the traversal commands, the menu can accept additional commands such as "details", "accept", and "search". All of this is built on top of the existing list traversal class; no code duplication is required.
And unlike a drag-and-drop toolkit where a list traversal node has a fixed set of features, there are no restrictions in a code-based approach. You want "next" to wrap around when it reaches the end of the list? No problem.
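The design points above can be sketched in a few dozen lines. This is not SpeakRight's list traversal class: ItemFormatter (a simplified stand-in for the IItemFormatter mentioned above), ListTraverser, and WrappingListTraverser are hypothetical, pausing is left out, and the list is assumed to be non-empty.

```java
import java.util.List;

// Hypothetical extension point: renders one item into prompt text.
interface ItemFormatter<T> {
    String format(T item);
}

class ListTraverser<T> {
    private final List<T> items; // assumed non-empty
    private int index = 0;
    private ItemFormatter<T> formatter = item -> String.valueOf(item); // default: toString
    private String endOfListPrompt = "There are no more items.";

    ListTraverser(List<T> items) { this.items = items; }

    void setFormatter(ItemFormatter<T> formatter) { this.formatter = formatter; }

    void setEndOfListPrompt(String prompt) { this.endOfListPrompt = prompt; }

    String current() { return formatter.format(items.get(index)); }

    // Handles a spoken command and returns the prompt to play.
    String handle(String command) {
        switch (command) {
            case "next":     return moveTo(index + 1);
            case "previous": return moveTo(index - 1);
            case "first":    return moveTo(0);
            case "last":     return moveTo(items.size() - 1);
            default:         return "Sorry, I didn't understand.";
        }
    }

    // Overridable hook: subclasses can change what happens at the ends.
    protected String moveTo(int newIndex) {
        if (newIndex < 0 || newIndex >= items.size()) {
            return endOfListPrompt;
        }
        index = newIndex;
        return current();
    }
}

// Variation: moving past either end wraps around instead of reporting an error.
class WrappingListTraverser<T> extends ListTraverser<T> {
    private final int size;

    WrappingListTraverser(List<T> items) {
        super(items);
        this.size = items.size();
    }

    @Override
    protected String moveTo(int newIndex) {
        return super.moveTo(Math.floorMod(newIndex, size));
    }
}
```

The wrap-around variant is a four-line override of one hook. Note that in this sketch it wraps "previous" as well as "next"; restricting it to "next" would mean overriding handle instead.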
Less Code, Less Testing
A code-based approach, by promoting re-use and the DRY principle, reduces the size of the application. This has many benefits:
- Faster development. Less time spent in tedious repetitive work.
- Easier to change. A class hierarchy is like the paragraph styles in a word processor. Rather than sprinkling formatting all over the document, it's kept in a few styles (base classes) where it's easily managed and changed.
- Consistent Voice User Interface. Shared code leads to shared behaviour, which leads to a consistent user interface.
- Reduced testing. This is a huge gain. When common code is used, it only needs to be tested once, even though it's used multiple times in the app. For example, yes-no confirmation can be plugged in to the confirmation extension point of a flow object. Once you've validated that it works in one flow object, there's no need to test all other flow objects since they share the same (base class) code.
Flexible Development
A code-based approach lacks the artificial boundaries of drag-and-drop toolkits. Development can begin by using existing classes and configuring them as needed. When you find the same VUI dialogs appearing in multiple places, sub-classing can be used to centralize a common configuration, such as a MyGetPassword class. As more new classes are created they can be combined into a class hierarchy in order to share common code, with extension points added where variability is needed. When classes are re-used in other projects they can be packaged as a library.
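The first step of that progression, from configuring an existing class to centralizing the configuration in a subclass, might look like this. DigitsQuestion and its properties are hypothetical, and MyGetPassword is only a guess at what such a class could contain.

```java
// Hypothetical general-purpose digit collector with configurable properties.
class DigitsQuestion {
    private String prompt = "Please enter the digits.";
    private int minDigits = 1;
    private int maxDigits = 20;
    private int maxAttempts = 3;

    final void setPrompt(String prompt) { this.prompt = prompt; }

    final void setLength(int minDigits, int maxDigits) {
        this.minDigits = minDigits;
        this.maxDigits = maxDigits;
    }

    final void setMaxAttempts(int maxAttempts) { this.maxAttempts = maxAttempts; }

    String getPrompt() { return prompt; }

    int getMaxAttempts() { return maxAttempts; }

    // True when the input is all digits and within the configured length.
    boolean accepts(String input) {
        return input.matches("\\d{" + minDigits + "," + maxDigits + "}");
    }
}

// Centralizes a configuration that would otherwise be repeated at every use.
class MyGetPassword extends DigitsQuestion {
    MyGetPassword() {
        setPrompt("Please enter your four-digit password.");
        setLength(4, 4);
        setMaxAttempts(2);
    }
}

class FlexibleDemo {
    public static void main(String[] args) {
        DigitsQuestion password = new MyGetPassword();
        System.out.println(password.getPrompt());
        System.out.println(password.accepts("1234"));
    }
}
```

Every dialog that needs a password now creates a MyGetPassword, so a policy change such as moving to six digits is made in one constructor.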