The SpeakRight Framework

What is SpeakRight?

SpeakRight is an open-source Java framework for writing speech recognition applications in VoiceXML. Unlike most proprietary speech-app tools, SpeakRight is code-based. Applications are written in Java using SpeakRight's extensible classes. Java IDEs such as Eclipse provide great debugging, fast Java-aware editing, and refactoring. Dynamic generation of VoiceXML is done using the popular StringTemplate templating framework. Read more...

See Getting Started, Tutorial, and Table of Contents

Friday, February 1, 2008

Bottom-up Programming

Peter Bell's blog is a great guide to domain-specific programming, application frameworks, and DSLs. His article Bottom-Up Programming really captures a style of programming I like. It refers to a Paul Graham article that contrasts the old top-down style of Structured Programming with the bottom-up programming often used in LISP. LISP, like Smalltalk, allows you to extend the language itself. LISP programmers would simply add to the language as they needed new features or mechanisms. The resulting solution grew from the bottom up until it could express cleanly and concisely what the domain problem required.

This ties in to the Agile notion of "listening to the code". Code a solution normally and then look at the code for common patterns. Extract those into re-usable classes and components. Repeat. You end up with a framework for building a family of related programs. Most programmers work on related products in a particular area (telephony, graphics, web commerce, whatever), so this framework-building is rarely wasted effort.

Bottom-up programming also ties into Dijkstra's notion that programmers produce about the same number of lines of code regardless of the language. So the higher a level at which you can work, the more productive you can be.

SpeakRight shares this philosophy: provide a framework that lets programmers write speech apps at a higher level, and let them extend the framework themselves.

Tuesday, December 4, 2007

The StringTemplate Template Engine

The StringTemplate template engine is a popular choice for generating markup text in Java. It comes from Terence Parr, the inventor of ANTLR.

SpeakRight uses StringTemplate (ST) for all its VoiceXML generation. When a flow object is rendered, it is first converted into a SpeechPage object. A SpeechPage is not VoiceXML-specific, and allows SpeakRight to output other formats such as SALT, or whatever you want. It's also the glue that ST requires. SpeechPages are rendered using one of the ISpeechPageWriter classes. For testing, an HTML page writer is available. The main page writer though is VoiceXMLPageWriter.

VoiceXMLPageWriter renders a SpeechPage into VoiceXML. A StringTemplate file defines the format for prompts, grammars, fields, forms and other VoiceXML tags. This gives a lot of flexibility. If your VoiceXML platform has special requirements, simply modify the speakright.stg template file.
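To make the idea of a format-neutral page with interchangeable writers concrete, here is a minimal sketch. It does not use SpeakRight or StringTemplate; Page, PageWriter, VoiceXmlWriter and HtmlWriter are hypothetical stand-ins invented for illustration, and the markup is built by hand rather than from a template file.

```java
import java.util.Arrays;
import java.util.List;

public class PageWriterSketch {
    // Hypothetical stand-in for a format-neutral page (not the real SpeechPage class).
    static class Page {
        final List<String> prompts;
        final String grammarUrl; // may be null when the page has no grammar

        Page(List<String> prompts, String grammarUrl) {
            this.prompts = prompts;
            this.grammarUrl = grammarUrl;
        }
    }

    // Hypothetical stand-in for the page writer role.
    interface PageWriter {
        String render(Page page);
    }

    // Emits a bare-bones VoiceXML form. No XML escaping is done, for brevity.
    static class VoiceXmlWriter implements PageWriter {
        public String render(Page page) {
            StringBuilder sb = new StringBuilder(
                    "<vxml version=\"2.0\"><form><field name=\"answer\">");
            for (String prompt : page.prompts) {
                sb.append("<prompt>").append(prompt).append("</prompt>");
            }
            if (page.grammarUrl != null) {
                sb.append("<grammar src=\"").append(page.grammarUrl).append("\"/>");
            }
            return sb.append("</field></form></vxml>").toString();
        }
    }

    // Emits the same page as HTML, handy for eyeballing during testing.
    static class HtmlWriter implements PageWriter {
        public String render(Page page) {
            StringBuilder sb = new StringBuilder("<html><body>");
            for (String prompt : page.prompts) {
                sb.append("<p>").append(prompt).append("</p>");
            }
            return sb.append("</body></html>").toString();
        }
    }

    public static void main(String[] args) {
        Page page = new Page(Arrays.asList("What city?"), "city.grxml");
        System.out.println(new VoiceXmlWriter().render(page));
        System.out.println(new HtmlWriter().render(page));
    }
}
```

The point of the split is that the callflow only ever produces a Page; swapping the output format means swapping the writer, not touching the flow objects.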

Matt Raible on web frameworks

Matt Raible has a fascinating video comparing web frameworks. Comparisons are tricky since frameworks are changing rapidly, with multiple releases per year. However, he makes an interesting aside about the (lack of) value of visual IDEs. JSF comes with a drag-and-drop IDE that is "appealing to managers", but "if one wants to develop anything substantial, we're going to have to get down and dirty with the code."

I've been espousing a code-based approach for speech applications for a while. Indeed that's the whole premise of the SpeakRight framework. Any substantial app will use dynamically generated code, and not be pages of handwritten markup text.

Things are a bit simpler for speech applications. The following criteria for comparing web frameworks don't apply:

Bookmarkable URLs.
Avoiding the double-POST problem
AJAX
Massive scalability. Web applications may involve millions of users but speech apps are still orders of magnitude smaller.
Page decoration. The vast topic of graphical design doesn't exist in a speech app. Persona is as close as one gets to "decoration".

Monday, December 3, 2007

NBest

NBest is a very useful feature for handling similar sounding words. Normally a speech rec engine finds the grammar rule that is the best match for the user's speech utterance. Large grammars can suffer from substitution errors where the wrong rule is matched: caller says "Boston" but the engine selects "Austin". NBest helps the application sort out this type of ambiguity.

When enabled, NBest is a request to the speech rec engine to return the top N matches, sorted in order of decreasing confidence level. N is usually a small number, such as 4. Remember that the NBest value is a maximum; fewer results may be returned.
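As a rough illustration of what "top N, sorted by decreasing confidence" means, here is a small standalone sketch. The Candidate class and topN method are hypothetical names of mine, not part of SpeakRight or any speech engine, and the confidence scores are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NBestSketch {
    // Hypothetical recognition candidate: a matched value plus a confidence score.
    static class Candidate {
        final String value;
        final double confidence;

        Candidate(String value, double confidence) {
            this.value = value;
            this.confidence = confidence;
        }
    }

    // Returns at most n candidates, highest confidence first.
    static List<Candidate> topN(List<Candidate> all, int n) {
        List<Candidate> sorted = new ArrayList<>(all);
        sorted.sort(Comparator.comparingDouble((Candidate c) -> c.confidence).reversed());
        // n is a maximum: if fewer candidates exist, fewer are returned.
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    public static void main(String[] args) {
        List<Candidate> heard = Arrays.asList(
                new Candidate("Austin", 0.62),
                new Candidate("Boston", 0.71));
        for (Candidate c : topN(heard, 4)) {
            System.out.println(c.value + " " + c.confidence);
        }
    }
}
```

Asking for 4 results here still yields only 2, which is why application code should never assume the list has exactly N entries.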

In SpeakRight, NBest is enabled using the QuestionFlow method enableNBest.


flow.enableNBest(4); //up to 4 results

When the SRResults come back for that question, you can check for NBest results. The SRResults method hasNBest indicates whether more than one result was returned.

NBest Pruning
The simplest thing an application can do is check the NBest results in validateInput, and use additional application logic to select the most likely result. This is called NBest pruning. For example, if the user is asked for her account number, each result can be checked against the database. If only one result is a valid account number, the application could assume that's what the caller said.


String probableAccountNumber = "";
int matches = 0;
// count is assumed to hold the number of NBest results returned
for (int i = 0; i < count; i++) {
    String value = results.getNBestValue(i);
    if (CheckAccountNumber(value)) {   // check against the database
        probableAccountNumber = value;
        matches++;
    }
}
if (matches == 1) {
    results.replaceInput(probableAccountNumber); // exactly one valid match, so use it
}

NBest Confirmation
A more common use for NBest is to do confirmation. When NBest results are returned, the application confirms each NBest result, stopping as soon as the user says "yes".

C: What city?
H: Boston
C: (returns 'Austin' and 'Boston' as NBest results) Do you want Austin?
H: No
C: Do you want Boston?
H: Yes
C: Great. Flying to Boston on what date?
...

The application may still want to prune the NBest results, re-ordering the results according to the most likely answers. This way the first confirmation question is more likely to be the correct one. This is an important part of NBest -- using additional context information and application logic to improve on the speech rec engine's results.
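Here is one way the re-ordering step could look, as a standalone sketch. The reorder method and the idea of a "likely" set (say, cities this caller has flown to before) are assumptions of mine for illustration; SpeakRight does not prescribe how you rank results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ReorderSketch {
    // Moves values the application considers likely to the front.
    // List.sort is stable, so the engine's confidence order is kept within each group.
    static List<String> reorder(List<String> nbest, Set<String> likely) {
        List<String> result = new ArrayList<>(nbest);
        result.sort(Comparator.comparing((String v) -> !likely.contains(v)));
        return result;
    }

    public static void main(String[] args) {
        List<String> nbest = Arrays.asList("Austin", "Boston", "Aulston");
        Set<String> likely = new HashSet<>(Arrays.asList("Boston"));
        System.out.println(reorder(nbest, likely));
    }
}
```

With "Boston" marked as likely, it becomes the first value confirmed even though the engine ranked "Austin" higher.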

Pass an NBestConfirmerFlow and the question flow object to a ConfirmationWrapper object that will manage the confirmation process. It will ask the user to confirm values until the caller accepts one (by saying "yes" or whatever your confirmation grammar uses for acceptance). If the caller says "no" to all NBest values, then the question is asked again, and the process repeats. You can override NBestConfirmerFlow to adjust this behaviour.

Note that NBest confirmation is an extension of basic confirmation. A YesNoConfirmerFlow confirms a single result, while an NBestConfirmerFlow confirms multiple results.

ConfirmationWrapper cw = new ConfirmationWrapper(new AskCity(),
        new NBestConfirmerFlow("yesno.grxml"));
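The confirmation loop itself is simple enough to sketch without the framework. The code below is not how ConfirmationWrapper is implemented; confirm and the callerSaysYes predicate are hypothetical names standing in for "ask the caller and listen for yes".

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

public class ConfirmSketch {
    // Offers each NBest value in order and stops at the first one the caller accepts.
    // An empty result means every value was rejected and the question should be re-asked.
    static Optional<String> confirm(List<String> nbest, Predicate<String> callerSaysYes) {
        for (String candidate : nbest) {
            if (callerSaysYes.test(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> nbest = Arrays.asList("Austin", "Boston");
        // Simulated caller who wants Boston: says "no" to Austin, "yes" to Boston.
        System.out.println(confirm(nbest, city -> city.equals("Boston")));
    }
}
```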

Skip Lists
A skip list is a list of words that the application will not confirm because the caller has already rejected them. This is an optional feature of NBestConfirmerFlow. Enable it with the enableSkipList method. If the caller says "no" to all NBest values, then the question is asked again. Before beginning confirmation, NBestConfirmerFlow will remove from the new NBest results any values that were rejected during the previous round of confirmation questions. If this results in only a single NBest result, then there is no need for confirmation.

C: What city?
H: Crosston
C: (returns 'Austin' and 'Boston' as NBest results) Do you want Austin?
H: No
C: Do you want Boston?
H: No
C: (asking the question again) Let's try again. What city?
H: Crosston
C: (returns 'Austin', 'Crosston' and 'Aulston' as NBest results. Austin is removed.) Do you want Crosston?
H: Yes
C: Got it. Flying to Crosston on what date?
...

If you don't use a skip list, the application can infuriatingly confirm the same wrong result again and again.
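The filtering step of a skip list can be sketched in a few lines. This is an illustration only, not NBestConfirmerFlow's actual code; applySkipList and needsConfirmation are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SkipListSketch {
    // Drops any NBest value the caller already rejected in an earlier round.
    static List<String> applySkipList(List<String> nbest, Set<String> rejected) {
        List<String> kept = new ArrayList<>();
        for (String value : nbest) {
            if (!rejected.contains(value)) {
                kept.add(value);
            }
        }
        return kept;
    }

    // With a single survivor there is nothing left to disambiguate.
    static boolean needsConfirmation(List<String> remaining) {
        return remaining.size() > 1;
    }

    public static void main(String[] args) {
        Set<String> rejected = new HashSet<>(Arrays.asList("Austin", "Boston"));
        List<String> secondRound = Arrays.asList("Austin", "Crosston", "Aulston");
        List<String> remaining = applySkipList(secondRound, rejected);
        System.out.println(remaining + " confirm? " + needsConfirmation(remaining));
    }
}
```

In the dialog above, the second round still has two survivors (Crosston and Aulston), so the application confirms, starting with Crosston.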

NBest With SROs
SpeakRight Reusable Objects (SROs) are pre-built flow objects for gathering common data such as numbers, dates, etc.

To enable NBest for an SRO, use its enableNBest method. This will use an SROConfirmNBest confirmer object. If you need to use a custom confirmer, call enableNBest followed by setConfirmer to pass in your custom confirmer.


SRONumber flow = new SRONumber("tickets", 1, 10);
flow.enableNBest(4); //up to 4 results

Friday, November 30, 2007

Version 0.1.4 now available

The latest release is available here. Transfer and Record are now supported. There are several new flow object classes, including RawContentFlow (roll your own VoiceXML), and GotoUrlFlow (transfer to another VoiceXML application).

Content-logging is a new feature that's helpful during development -- VoiceXML content is dumped to text files so you can see what the rendered VoiceXML looks like.

Some code refactoring has also been done. Flow object classes are now in the package org.speakright.core.flows.

Enjoy.

Thursday, June 14, 2007

Initialization

SpeakRight apps normally run in three different environments: in a JUnit test, in the interactive tester, and most importantly in a servlet. You can avoid problems by creating a single piece of initialization code that is used across all environments. This piece is called the app factory. It should be derived from SRFactory, which performs standard initialization.

Your class should override onCreateRunner and onInitRunner to do additional initialization, such as:

create and attach a model object
register prompt file(s)
set the extension point factory
other things. For example, the SimpsonsDemo app records votes in a text file, and its Voting object needs to be initialized with the path.

Initialization is done using the createRunner method of SRFactory:

public SRRunner createRunner(String projectDir, String returnUrl, String baseUrl, ISRServlet servlet);

The projectDir is a path to the application's base directory, which usually has sub-directories audio, grammar, and sro.

The two URLs are only needed in a servlet environment. returnUrl is the URL that the VoiceXML page should post back to. baseUrl is used to generate URLs for audio and grammars.

servlet can be null. It's an extension point that allows the servlet to do extra initialization.
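The app factory is essentially the template-method pattern: the base class does the standard setup and calls hooks that your subclass overrides. The sketch below shows the shape of that idea with hypothetical stand-in classes. Runner, BaseFactory and the hook signatures are my own guesses for illustration; the real SRFactory and SRRunner APIs may differ.

```java
import java.util.ArrayList;
import java.util.List;

public class FactorySketch {
    // Hypothetical stand-in for a runner; not the real SRRunner.
    static class Runner {
        final String projectDir;
        Object model;
        final List<String> promptFiles = new ArrayList<>();

        Runner(String projectDir) {
            this.projectDir = projectDir;
        }
    }

    // Stand-in base factory: performs standard setup, then calls the two hooks in order.
    static class BaseFactory {
        final Runner createRunner(String projectDir) {
            Runner runner = new Runner(projectDir);
            onCreateRunner(runner);
            onInitRunner(runner);
            return runner;
        }

        // Hooks do nothing by default; subclasses override what they need.
        protected void onCreateRunner(Runner runner) {}

        protected void onInitRunner(Runner runner) {}
    }

    // The one place where app-specific initialization lives.
    static class AppFactory extends BaseFactory {
        @Override
        protected void onCreateRunner(Runner runner) {
            runner.model = new Object(); // create and attach a model object
        }

        @Override
        protected void onInitRunner(Runner runner) {
            runner.promptFiles.add(runner.projectDir + "/prompts.xml"); // register a prompt file
        }
    }

    public static void main(String[] args) {
        Runner runner = new AppFactory().createRunner("/apps/demo");
        System.out.println(runner.promptFiles);
    }
}
```

Because a JUnit test, a console tester and a servlet would all go through the same createRunner call, none of them can forget a setup step.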

Now let's look at each environment in turn.

JUnit

In a unit test, the dependencies can be visualized like this, from top to bottom:

JUnit test class
App (your callflow)
SRRunner
SRFactory or your derived class
SRConfig

Use your app factory to create a runner:

AppFactory factory = new AppFactory();
SRRunner run = factory.createRunner();

Then run your application using the start and proceed methods of SRRunner.

If your app uses properties in the srf.properties file, you need to initialize SRConfig first. JUnit 4 has a per-class initializer annotation called @BeforeClass:

@BeforeClass
public static void initConfig() {
    SRConfig.init("C:\\source\\app2\\", "srf.properties");
}

Interactive tester

The interactive tester is a console app. The dependencies can be visualized like this, from top to bottom:

App (your callflow)
SRInteractiveTester
SRRunner
SRFactory or your derived class
SRConfig

SRInteractiveTester inits SRConfig for you.

SRInteractiveTester tester = new SRInteractiveTester();

AppFactory factory = new AppFactory();
SRRunner runner = factory.createRunner(appDir, "http://def.com", "", null);

App app = new App();
tester.init(app, runner);
tester.run();

Servlet

In a servlet, the dependencies can be visualized like this, from top to bottom:

Servlet
App (your callflow)
SRRunner
SRServletRunner
SRFactory or your derived class
SRConfig

In a servlet the SRServletRunner class is used. You pass your app factory and it does initialization, including SRConfig. The SRRunner's project directory is set to the directory corresponding to the web app's "/" url.

The code in doGet should be:

SRServletRunner runner = new SRServletRunner(new AppFactory(), null, request, response, "GET");

if (runner.isNewSession()) {
    SRRunner run = runner.createNewSRRunner(this);

    IFlow flow = new App();
    runner.startApp(flow);
}
else {
    runner.continueApp();
}

The code in doPost should be:

SRServletRunner runner = new SRServletRunner(new AppFactory(), null, request, response, "POST");

if (runner.isNewSession()) {
    // error: a POST should never start a new session
    runner.log("can't get new session in a POST!!");
}
else {
    runner.continueApp();
}

SRConfig

SRConfig provides access to an srf.properties file. Properties are often used by the constructors of flow objects. Therefore it's important to initialize SRConfig early:

SRConfig.init(path, "srf.properties");

For console apps or JUnit, a hard-coded path is used. For servlets, this is done for you by SRServletRunner, which uses the directory corresponding to the web app's "/" base url.

Currently the SpeakRight framework itself does not use any properties, but applications are free to.
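For readers who have not used Java properties files, here is a minimal sketch of the "initialize once, read anywhere" idea using only java.util.Properties. It is not SRConfig's implementation; ConfigSketch, its methods and the votes.path key are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class ConfigSketch {
    private static Properties props = new Properties();

    // Loads properties once, early, so later constructors can read them.
    static void init(Reader source) {
        Properties loaded = new Properties();
        try {
            loaded.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        props = loaded;
    }

    // Returns the property value, or the fallback when the key is missing.
    static String get(String key, String fallback) {
        return props.getProperty(key, fallback);
    }

    public static void main(String[] args) {
        // A real app would pass a FileReader for the properties file;
        // a StringReader keeps this sketch self-contained.
        init(new StringReader("votes.path=/tmp/votes.txt\n"));
        System.out.println(get("votes.path", "votes.txt"));
        System.out.println(get("missing.key", "default"));
    }
}
```

If a flow object's constructor called get before init ran, it would silently see the fallback value, which is exactly the kind of problem early initialization avoids.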

Thursday, May 24, 2007

List of Flow Objects

Flow objects are the building blocks of SpeakRight applications. Here is the list of available objects:

BranchFlow: Performs branching in the callflow based on an application-defined condition.
ChoiceFlow: Branches based on user input, such as in a menu.
DisconnectFlow: Hangs up the call.
FlowList: A sequence of flow objects, optionally ending with an AppEvent.
GotoUrlFlow: Redirects to an external URL.
LoopFlow: Iterates over a sequence of sub-flows.
NBestConfirmerFlow: Confirms NBest results.
PromptFlow: Plays one or more prompts.
QuestionFlow: Asks the user a question. Has built-in error retries for silence and nomatch.
RawContentFlow: Outputs raw VoiceXML.
RecordAudioFlow: Records the caller's voice to an audio file.
SRApp: The root flow object.
TransferFlow: Transfers the call.
YesNoConfirmerFlow: Confirms a single result.

Additional flow objects can be created by implementing the IFlow interface.

There are also SROs (SpeakRight Reusable Objects) which you can use.