What is SpeakRight?
SpeakRight is an open-source Java framework for writing speech recognition applications in VoiceXML. Unlike most proprietary speech-app tools, SpeakRight is code-based. Applications are written in Java using SpeakRight's extensible classes. Java IDEs such as Eclipse provide great debugging, fast Java-aware editing, and refactoring. Dynamic generation of VoiceXML is done using the popular StringTemplate templating framework.
Tuesday, December 4, 2007
The StringTemplate Template Engine
SpeakRight uses StringTemplate (ST) for all its VoiceXML generation. When a flow object is rendered, it is first converted into a SpeechPage object. A SpeechPage is not VoiceXML-specific, and allows SpeakRight to output other formats such as SALT, or whatever you want. It's also the glue that ST requires. SpeechPages are rendered using one of the ISpeechPageWriter classes. For testing, an HTML page writer is available. The main page writer though is VoiceXMLPageWriter.
VoiceXMLPageWriter renders the SpeechPage into VoiceXML. A StringTemplate file defines the format for prompts, grammars, fields, forms, and other VoiceXML tags. This gives a lot of flexibility: if your VoiceXML platform has special requirements, simply modify the speakright.stg template file.
Matt Raible on web frameworks
I've been espousing a code-based approach for speech applications for a while. Indeed, that's the whole premise of the SpeakRight framework. Any substantial app will use dynamically generated code, not pages of handwritten markup text.
Things are a bit simpler for speech applications. The following criteria for comparing web frameworks don't apply:
- Bookmarkable URLs.
- Avoiding the double-POST problem
- AJAX
- Massive scalability. Web applications may involve millions of users but speech apps are still orders of magnitude smaller.
- Page decoration. The vast topic of graphical design doesn't exist in a speech app. Persona is as close as one gets to "decoration".
Monday, December 3, 2007
NBest
When enabled, NBest is a request to the speech rec engine to return the top N matches, sorted in order of decreasing confidence level. N is usually a small number, such as 4. Remember that the NBest value is a maximum; fewer results may be returned.
In SpeakRight, NBest is enabled using the QuestionFlow method enableNBest.
flow.enableNBest(4); //up to 4 results
When the SRResults come back for that question, you can check for NBest results. The SRResults method hasNBest indicates whether more than one result was returned.
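For instance, a handler might guard its NBest logic like this (a minimal sketch; only hasNBest comes from the framework as described above):
if (results.hasNBest()) {
    //more than one recognition hypothesis was returned; prune or confirm them (see below)
}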
NBest Pruning
The simplest thing an application can do is check the NBest results in validateInput and use additional application logic to select the most likely result. This is called NBest pruning. For example, if the user is asked for her account number, each result can be checked against the database. If only one result is a valid account number, the application could assume that's what the caller said.
String probableAccountNumber = "";
int matches = 0;
for (int i = 0; i < results.getNBestCount(); i++) { //iterate over the NBest results (count accessor name assumed)
    String value = results.getNBestValue(i);
    if (CheckAccountNumber(value)) { //check against the database
        probableAccountNumber = value;
        matches++;
    }
}
if (matches == 1) {
    results.replaceInput(probableAccountNumber); //let's use it!
}
NBest Confirmation
A more common use for NBest is to do confirmation. When NBest results are returned, the application confirms each NBest result, stopping as soon as the user says "yes".
C: What city?
H: Boston
C: (returns 'Austin' and 'Boston' as NBest results) Do you want Austin?
H: No
C: Do you want Boston?
H: Yes
C: Great. Flying to Boston on what date?
...
The application may still want to prune the NBest results, re-ordering the results according to the most likely answers. This way the first confirmation question is more likely to be the correct one. This is an important part of NBest -- using additional context information and application logic to improve on the speech rec engine's results.
Pass an NBestConfirmerFlow and the question flow object to a ConfirmationWrapper object that will manage the confirmation process. It will ask the user to confirm values until the caller accepts one (by saying "yes" or whatever your confirmation grammar uses for acceptance). If the caller says "no" to all NBest values, then the question is asked again, and the process repeats. You can override NBestConfirmerFlow to adjust this behaviour.
Note that NBest confirmation is an extension of basic confirmation. A YesNoConfirmerFlow confirms a single result, while a NBestConfirmerFlow confirms multiple results.
ConfirmationWrapper cw = new ConfirmationWrapper(new AskCity(),
new NBestConfirmer("yesno.grxml"));
Skip Lists
A skip list is a list of words that the application will not confirm because the caller has already rejected them. This is an optional feature of NBestConfirmerFlow. Enable it with the enableSkipList method. If the caller says "no" to all NBest values, then the question is asked again. Before beginning confirmation, NBestConfirmerFlow will remove from the new NBest results any values that were rejected during the previous round of confirmation questions. If this results in only a single NBest result, then there is no need for confirmation.
C: What city?
H: Crosston
C: (returns 'Austin' and 'Boston' as NBest results) Do you want Austin?
H: No
C: Do you want Boston?
H: No
C: (asking the question again) Let's try again. What city?
H: Crosston
C: (returns 'Austin', 'Crosston', and 'Aulston' as NBest results. Austin is removed.) Do you want Crosston?
H: yes
C: Got it. Flying to Crosston on what date?
...
If you don't use a skip list, the application can infuriatingly confirm the same wrong result again and again.
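In code, enabling the skip list might look like this (a sketch: enableSkipList is named above, but its exact signature is an assumption):
NBestConfirmer confirmer = new NBestConfirmer("yesno.grxml");
confirmer.enableSkipList(); //assumed to take no arguments
ConfirmationWrapper cw = new ConfirmationWrapper(new AskCity(), confirmer);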
NBest With SROs
SpeakRight Reusable Objects (SROs) are pre-built flow objects for gathering common data such as numbers, dates, etc.
To enable NBest for an SRO, use its enableNBest method. This will use an SROConfirmNBest confirmer object. If you need to use a custom confirmer, call enableNBest followed by setConfirmer to pass in your custom confirmer.
SRONumber flow = new SRONumber("tickets", 1, 10);
flow.enableNBest(4); //up to 4 results
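For a custom confirmer, the same code with setConfirmer added might look like this sketch (MyNBestConfirmer is a hypothetical subclass of NBestConfirmerFlow):
SRONumber flow = new SRONumber("tickets", 1, 10);
flow.enableNBest(4);
flow.setConfirmer(new MyNBestConfirmer()); //hypothetical custom confirmer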
Friday, November 30, 2007
Version 0.1.4 now available
Content logging is a new feature that's helpful during development -- VoiceXML content is dumped to text files so you can see what the rendered pages look like.
Some code refactoring has also been done. Flow object classes are now in the package org.speakright.core.flows.
Enjoy.
Thursday, June 14, 2007
Initialization
Your app factory class should override onCreateRunner and onInitRunner to do additional initialization (a sketch follows the list), such as:
- create and attach a model object
- register prompt file(s)
- set the extension point factory
- other things. For example, the SimpsonsDemo app records votes in a text file, and its Voting object needs to be initialized with the path to that file.
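A rough sketch of such a factory is below; onInitRunner's parameter and the prompt file path are assumptions, while registerPromptFile is described in the Prompt Ids post.
public class AppFactory extends SRFactory {
    @Override
    protected void onInitRunner(SRRunner run) {
        run.registerPromptFile("prompts/app_prompts.xml"); //register app-specific prompts (path is hypothetical)
        //attach a model object, set the extension point factory, etc.
    }
}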
The factory's createRunner method has this signature:
public SRRunner createRunner(String projectDir, String returnUrl, String baseUrl, ISRServlet servlet);
The projectDir is a path to the application's base directory, which usually has sub-directories audio, grammar, and sro.
The two URLs are only needed in a servlet environment. returnUrl is the URL that the VoiceXML page should postback to. baseUrl is used to generate URLs for audio and grammars.
servlet can be null. It's an extension point that allows the servlet to do extra initialization.
Now let's look at each environment in turn.
JUnit
In a unit test, the dependencies can be visualized like this, from top to bottom:
JUnit test class
App (your callflow)
SRRunner
SRFactory or your derived class
SRConfig
Use your app factory to create a runner:
AppFactory factory = new AppFactory();
SRRunner run = factory.createRunner();
Then run your application using the start and proceed methods of SRRunner.
If your app uses properties in the srf.properties file, you need to initialize SRConfig first. JUnit 4 has a per-class initializer called @BeforeClass
@BeforeClass static public void redirectStderr() {
SRConfig.init("C:\\source\\app2\\", "srf.properties");
}
Interactive tester
The interactive tester is a console app. Its dependencies can be visualized like this, from top to bottom:
App (your callflow)
SRInteractiveTester
SRRunner
SRFactory or your derived class
SRConfig
SRInteractiveTester inits SRConfig for you.
SRInteractiveTester tester = new SRInteractiveTester();
AppFactory factory = new AppFactory();
SRRunner run = factory.createRunner(appDir, "http://def.com", "", null);
App app = new App();
tester.init(app, run);
tester.run();
Servlet
In a servlet, the dependencies can be visualized like this, from top to bottom:
Servlet
App (your callflow)
SRRunner
SRServletRunner
SRFactory or your derived class
SRConfig
In a servlet, the SRServletRunner class is used. You pass it your app factory and it does the initialization, including SRConfig. The SRRunner's project directory is set to the directory corresponding to the web app's "/" URL.
The code in doGet should be
SRServletRunner runner = new SRServletRunner(new AppFactory(), null, request, response, "GET");
if (runner.isNewSession()) {
SRRunner run = runner.createNewSRRunner(this);
IFlow flow = new App();
runner.startApp(flow);
}
else {
runner.continueApp();
}
The code in doPost should be
SRServletRunner runner = new SRServletRunner(new AppFactory(), null, request, response, "POST");
if (runner.isNewSession()) {
//err!!
runner.log("can't get new session in a POST!!");
}
else {
runner.continueApp();
}
SRConfig
SRConfig provides access to an srf.properties file. Properties are often used by the constructors of flow objects. Therefore it's important to initialize SRConfig early:
SRConfig.init(path, "srf.properties");
For console apps or JUnit, a hard-coded path is used. For servlets, this is done for you by SRServletRunner, which uses the directory corresponding to the web app's "/" base url.
Currently the SpeakRight framework itself does not use any properties, but applications are free to.
Thursday, May 24, 2007
List of Flow Objects
- BranchFlow Performs branching in the callflow based on an application-defined condition
- ChoiceFlow Branches based on user input, such as in a menu
- DisconnectFlow Hangs up the call
- FlowList A sequence of flow objects, optionally ending with an AppEvent
- GotoUrlFlow Redirects to an external URL
- LoopFlow Iterates over a sequence of sub-flows
- NBestConfirmerFlow Confirms NBest results
- PromptFlow Plays one or more prompts
- QuestionFlow Asks the user a question. Has built-in error retries for silence and nomatch.
- RawContentFlow Outputs raw VoiceXML
- RecordAudioFlow Records the caller's voice to an audio file
- SRApp The root flow object
- TransferFlow Transfers the call
- YesNoConfirmerFlow Confirms a single result
There are also SROs (SpeakRight Reusable Objects) which you can use.
Call Control
SpeakRight catches the disconnect event (usually connection.disconnect) in order to do a final postback. This results in the DISCONNECT event being thrown and onDisconnect being invoked. SRApp provides a default handler for this.
There are a number of call control flow objects.
DisconnectFlow
This flow object plays a final prompt and hangs up.
SROTransferCall
This flow object transfers the call using one of the VoiceXML 2.1 transfer types: Blind, Bridged, or Consultation. The specified destination parameter is a string, such as
"tel:+2560001112222"The format of the dial string is often platform specific.
TransferFlow has two prompts. An initial prompt called main is played before the transfer is initiated. A transferFailed prompt is played if the transfer fails to complete. Both of these prompts can be overridden in an app-specific prompt XML file.
If the transfer fails because, for example, the destination is busy, then onTransferFailed is invoked. It plays the transferFailed prompt. Or you can override onTransferFailed and provide your own behaviour.
TransferFlow
This flow object is a low-level object. We recommend the use of SROTransferCall instead.
RawContentFlow
This flow object is an escape hatch. The application can supply any VoiceXML it likes. RawContentFlow may be useful for invoking platform-specific call control features.
Thursday, May 17, 2007
Optional Sub-Flow Objects
Applications can simply create or not create certain flow objects based on the required logic. However, an alternative solution is provided called optional sub-flows. The BasicFlow and LoopFlow classes have been enhanced. The flow objects they contain (called sub-flows) can indicate that they don't wish to run by returning false from shouldExecute. When this occurs, the sub-flow is skipped and the next sub-flow is executed.
The advantage of optional sub-flows is that the decision on whether to run or not can be deferred until a sub-flow is executed. The initialization code doesn't need to handle this.
However, there is a restriction: the final sub-flow cannot be optional, because if the final sub-flow returns null from its getFirst, there is nothing to execute.
Example code that creates the callflow using a BasicFlow object:
//callflow creation..
BasicFlow flow = new BasicFlow();
flow.add(new PromptFlow("Welcome"));
flow.add(new RegisterUser());
flow.add(new MainMenu());
app.add(flow);
And the class definition for the optional sub-flow looks like this:
class RegisterUser extends BaseSROQuestion {
    public Model M;
    @Override public boolean shouldExecute() {
        return !M.userHasRegistered; //only run registration if the user hasn't registered yet
    }
    //..rest of class definition omitted..
Note: I'm not completely happy with this feature. It's handy, but the can't-be-last restriction will be easy to forget and will only fail at runtime. Full test coverage is required!
Wednesday, May 16, 2007
Supported Platforms
- Voxeo Evolution (free hosting site http://community.voxeo.com/). VoiceCenter 5.5
- Voxeo Prophecy platform version 8.0 beta
VoiceXML Tags Supported
- assign (name, expr): used to track dialog state
- audio (src): play an audio file
- block
- break (time): time is in msec
- catch (connection.disconnect): so the app gets a final postback
- disconnect
- exit
- field (name)
- filled
- form: one per page
- goto (next): to go to an external URL
- grammar (type, src): type can be "text/gsl", "application/srgs+xml", or "application/srgs"
- noinput (count, bargein): main prompt(s), one per escalation
- nomatch (count, bargein): main prompt(s), one per escalation
- prompt (count, bargein): main prompt(s), one per escalation
- submit (next, namelist, method)
- transfer (type, dest, connecttimeout)
- var (name, expr): used to track dialog state
You can customize the VoiceXML; see the StringTemplate template engine.
Thursday, May 10, 2007
Version 0.0.3 Released
Thursday, May 3, 2007
Automated Testing
SpeakRight provides an automated tester SRAutoTester. It runs your callflow in a test harness where user input comes from strings that you provide. It checks the progress through the callflow to validate the application logic. Tests can be run directly on a developer's machine, no VoiceXML platform is needed.
SRAutoTester uses the same test harness as SRInteractiveTester, so you can test manually (using the keyboard) and auto-test, and vice versa.
The format of the string you give is a set of commands separated by semi-colons, where each command's format is: cmd[~ExpectedCurrentFlowObject]
The commands are
- "e" echo. toggle echo to log of VoiceXML on/off
- "g" go. simulate the current VoiceXML page ending and posting back. Causes SpeakRight to proceed to the next flow object. Can contain user input, such as "go chicago" or if a slot is set use ":" like this "go city:chicago"
- "q" quit
The commands for testing a good login are: g;g 4552;g 1234;g~MainMenu;q
Let's break that down:
- "g" means run the first flow object, which is the welcome prompt
- "g 4552" is the user id
- "g 1234" is the password
- "g~MainMenu" validates that we're at the MainMenu flow object
Note that these tests are high-level tests of the callflow. They let you verify requirements such as "A bad login does not proceed to the Main Menu". Testing the details of the VUI prompts and grammars needs to be done as well, either with another SpeakRight tester (TBD) or on the VoiceXML platform itself.
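As a sketch only (SRAutoTester is named above, but its exact constructor and method names are assumptions here), running the login test might look like:
SRAutoTester tester = new SRAutoTester(new App()); //constructor form assumed
boolean passed = tester.run("g;g 4552;g 1234;g~MainMenu;q"); //runs the command string and reports success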
Wednesday, May 2, 2007
Benefits of a Code-Based Approach
Drag-and-drop toolkits remain viable for DTMF apps because the apps are simple. Speech applications are much more complicated than the equivalent DTMF app. There are roughly nine times as many prompts (escalated versions of the main prompt, plus silence and nomatch prompts). Confirmation needs to be done since speech recognition is never 100% accurate. Lastly, speech apps are more complicated because, released from the limitations of 12 DTMF keys, they try to do more. This complexity means that speech apps need a more powerful development environment, such as the Java programming language.
The first wave of speech applications was written directly in VoiceXML. Again, this is simple for small apps but doesn't scale. A large app has many VoiceXML files, and the relationships between them are not clearly shown. A login.vxml file may submit its results to main_menu.vxml, but that is not apparent from looking at a list of files. Raw VoiceXML does not have any modern programming constructs such as inheritance or design patterns. Lastly, unit testing and debugging are difficult.
This brings us to the final option: a code-based approach. Write the application in Java.
IDEs are powerful
Use the full power of a good IDE with refactoring support, code assist (AKA Intellisense), unit testing, debugging, and integrated source control. The Eclipse IDE, for example, is used by millions of programmers. It will be improved and extended at a far faster rate than any proprietary toolkit. And Eclipse is free.
Better Debugging
Java IDEs have real debuggers. Enough said.
Better Testing
Java IDEs have excellent unit testing. SpeakRight provides a keyboard-based interactive tester, and an HTML mode for executing an app using an ordinary web browser (HTML is generated instead of VXML).
More tools
There are source code tools for code coverage, profiling, generating documentation and design diagrams. Source code control tools allow important questions such as 'what's changed since last week' to be answered.
Code is flexible
Source code is extremely flexible. Unlike drag-and-drop tools that offer only a few levels of granularity, code can be organized and combined in many ways. Let's look at the ways code can be used.
See also Matt Raible On Web Frameworks
Configuration
An object can be configured by setting its properties. This allows re-usable objects to be customized for each use. The customization can be done in code:
flow.m_maxAttempts = 4;
or it can come from external configuration files. SpeakRight allows prompts and grammars to reside in XML files that can be changed post-deployment without having to rebuild the app.
Sub-classing
The DRY principle is Don't Repeat Yourself. DRY reduces the amount of source code and makes modifications simpler. Java inheritance is one way of centralizing common code. Any class deriving from a base class gets the base class's behaviour automatically. For example, with GetDigit nodes, the MaxRetries value can be defined in a single place (the base class). Changing the value in the base class causes the change to ripple down to all derived classes.
Java inheritance is flexible because values or behaviour can be overridden at any point in the class hierarchy.
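As an illustrative sketch of this (BaseGetDigits and GetPin are hypothetical classes; m_maxAttempts mirrors the property shown earlier):
public class BaseGetDigits extends QuestionFlow {
    public BaseGetDigits() {
        m_maxAttempts = 4; //the shared retry policy is defined once, here
    }
}
public class GetPin extends BaseGetDigits {
    //inherits the retry policy; only PIN-specific prompts and grammar go here
}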
Composition
Composition is the process of assembling objects into useful components. Modern frameworks make much use of interfaces and extension points. An extension point allows behaviour to be changed by plugging in different implementations of it. In SpeakRight, confirmation is an extension point where different types of confirmation can be plugged in: yes-no confirmation, confirm-and-correct confirmation, and implicit confirmation.
Extension points increase re-use because the number of options multiplies. If you have four types of GetNumber objects and three types of confirmation, you have twelve types of GetNumber-and-confirm behaviour to choose from.
A menu is another example of an extension point. A menu is basically a question followed by a switch statement. Both are extension points that allow flexible menus to be created that still share the common base code.
Refactoring
Refactoring is the process of improving code quality without changing the external behaviour. Common code can be pulled into methods or classes. Interfaces and extension points can be added to increase the flexibility of a class. Code can be packaged into namespaces and libraries. Inherited behaviour can be overridden.
A code-based approach allows all these modern software development techniques to be applied to speech apps.
Changing The Framework
SpeakRight is open-source so everything is available to you for modification.
Everything is a Flow Object
In SpeakRight, the callflow is built out of flow objects. Everything from a low-level "yes-no" question, to a form with multiple fields, up to the app itself are flow objects.
Flow objects participate in generating content (VoiceXML). A flow object is notified of each prompt being rendered, and allowed to modify it. Flow objects can control which VoiceXML is generated, and if needed, the entire VoiceXML rendering can be replaced (it's another extension point).
Flow objects participate in deciding the execution path through the callflow. Because they return an IFlow object to be run next, it's easy to inject additional behaviour when needed. Confirmation is done this way.
Consider a VUI dialog for traversing a list of items. The common behaviour is the commands "next" and "previous" (and possibly "first" and "last"). These move to a new item and say its value, or play an error message if the end of the list has been reached. List traversal is a common VUI feature, but difficult to make into a re-usable artifact in a non-code-based approach. With code however, this is a standard sort of OO design task.
- Prompts and grammars are made into fields. Default values are provided but can be overridden or configured using getter and setter methods.
- The list is a generic Java Collection, allowing it to be a list of anything. An IItemFormatter interface is created so the rendering of a Java object (string, integer, XML, whatever) into a prompt becomes an extension point (see the sketch after this list). The default formatter just uses toString.
- SpeakRight's flow objects are pause-able. This means that a list traverser can pause while another VUI dialog runs, and resume when it finishes. A list traverser can now be a main menu for an app that works on a list of items (such as flights to select from). Additional commands can be added, so that in addition to the traversal commands, the menu can accept additional commands such as "details", "accept", and "search". All of this is built on top of the existing list traversal class; no code duplication is required.
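A sketch of the IItemFormatter extension point mentioned in the list above (the method name and default implementation are assumptions):
public interface IItemFormatter {
    String format(Object item); //render one list item as prompt text
}
//the default formatter described above: just use toString
public class ToStringFormatter implements IItemFormatter {
    public String format(Object item) {
        return item.toString();
    }
}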
Less Code, Less Testing
A code-based approach, by promoting re-use and the DRY principle, reduces the size of the application. This has many benefits:
- Faster development. Less time spent in tedious repetitive work.
- Easier to change. A class hierarchy is like the paragraph styles in a word processor. Rather than sprinkling formatting all over the document, it's kept in a few styles (base classes) where it's easily managed and changed.
- Consistent Voice User Interface. Shared code leads to shared behaviour, which leads to a consistent user interface.
- Reduced testing. This is a huge gain. When common code is used, it only needs to be tested once, even though it's used multiple times in the app. For example, yes-no confirmation can be plugged into the confirmation extension point of a flow object. Once you've validated that it works in one flow object, there's no need to test all other flow objects, since they share the same (base class) code.
Flexible Development
A code-based approach lacks the artificial boundaries of drag-and-drop toolkits. Development can begin by using existing classes and configuring them as needed. When you find the same VUI dialogs appearing in multiple places, sub-classing can be used to centralize a common configuration, such as a MyGetPassword class. As new classes are created they can be combined into a class hierarchy in order to share common code, with extension points added where variability is needed. When classes are re-used in other projects they can be packaged as a library.
Prompt Ids and Prompt XML Files
A prompt id is a ptext item beginning with "id:", such as "id:outOfRange". When the prompt is rendered, a set of XML files are searched to find an entry for that id. The entry looks like this:
<prompt name="outOfRange" def="true">That value is out of range. </prompt>The prompt text for the id is a full PText; it can contain references to other ids, for example. It's an error if a prompt id cannot be found in any of the XML files.
Which set of XML files? Well you get to define them using SRRunner.registerPromptFile, usually one per app. The framework itself may register some; each SRO has its own prompt XML. The registration may be permanent (for the life of the app), or temporary (for the current flow object execution). The list of XML files is searched in reverse order so that your XML files are searched first, and framework XML files searched last.
Prompt Groups
Another useful feature is the ability to build an app using rough prompts, and then finalize the prompts later without having to do any code changes. Prompt Groups does this. Each flow object has a prompt group. The default value is the flow object's name. Prompt ids are looked up twice. First the prefix is added, so for an id "id:outOfRange" in a flow object "MyMenu" the first lookup is "id:MyMenu.outOfRange". If this prompt id is found, the value in the XML file is used. If not, then a second lookup without the prefix is done, which for our example would be "id:outOfRange".
All SROs use prompt ids with default values (see Prompts in SROs). The default prompts are usually good enough to get your app logic up and tested. Then you can create an app-specific prompt XML file and register it (using SRRunner.registerPromptFile). Now you can define the prompt text for all the flow objects at your leisure. No code changes needed!
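A small sketch of that workflow (the file path is hypothetical; registerPromptFile, the "id:" syntax, and m_main1Prompt all appear elsewhere in these posts):
run.registerPromptFile("prompts/app_prompts.xml"); //register the app-specific prompt XML file
//inside a flow object, reference the prompt by id instead of hard-coding text
m_main1Prompt = "id:outOfRange"; //looked up with the flow object's prompt group prefix first, then without it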
Thursday, April 26, 2007
Simpsons Demo
Development Process
Productivity of developers is important. When code is quick to create, it's quick to change (and test!). This encourages an agile development process.
A SpeakRight application is developed in stages.
Stage 1. The application flow objects are created and wired together. Concentrate on the Model and the overall callflow. Don't worry about grammars and prompts at this point. Use inline grammars and default prompts. Do all testing using the keyboard-based interactive tester (and unit tests). The goal at this stage is to get the callflow logic working. External data access can be done at this stage, or mocked out, and done later in parallel with other stages.
Stage 2. Define the grammars and prompts. Use external grammars that take into consideration pre-amble, post-amble, and various ways of saying things ("LA" and "Los Angeles"). An application prompt XML file should be created that defines the main and error prompts for your flow objects. Deploy the app as a servlet and test using the HTML mode, where the app can be executed using a web browser. This will flush out missing files and other errors.
Stage 3. Deploy the app to the VoiceXML platform. At this point the callflow logic should already be complete and well tested, and the prompts and grammars defined. All that remains is to listen for mispronunciations, poor prosody, and other VUI-level errors.
The idea is to get as much testing done before deploying to the VoiceXML platform, where testing becomes much slower and more difficult (especially automated testing). Of course, it's not a fixed waterfall approach; you may, for example, need to prototype some VUI design issues on the VoiceXML platform before tackling stage 1.
Description of the Call Flow
When a user calls in, they choose a character by saying the character's name, such as "Mister Burns". The application plays a short description of the character and asks if the user wishes to vote for this character. If yes, the vote is recorded and the user is taken to the main menu. Otherwise the user is asked if they want to hear about the related character. If the user says yes, the related character's description is played and the user has the opportunity to vote for that character as before. If the user chooses not to hear about the related character, then they are asked to choose another character.
The main menu has four options.
- choose character. Select a Simpsons character and vote, as described above.
- voting results. Hear the voting results. Results are played in sets of three. The top three characters are listed; if the user says 'next', the next three characters are listed.
- call statistics. Lists the number of calls, average call duration, and other statistics.
- speakright. Plays a description of the SpeakRight framework.
(For brevity we've left off examples of error handling for silence and nomatch errors.)
Computer: Welcome to the Simpsons Demo where you can vote for your favorite Simpson's character. {pause} Please say the name of a character, such as "Mister Burns".
Human: Homer
C: Homer Simpson is the show's main character. His signature annoyed grunt "D'oh!" has been included in the Oxford English Dictionary. Do you want to vote for Homer?
H: No
C: Do you want to hear about Marge, Homer's wife?
H: Yes
C: Marge Simpson is the well-meaning and patient wife of Homer. Do you want to vote for Marge?
H: Yes.
C: Vote recorded. {pause} Main Menu. You can say 'choose character', 'voting results', 'call statistics', or 'hear about speakright'.
H: call statistics
C: There have been 413 calls with average length of 65 seconds. The average completion is...
H: Hangup
Pseudo-Code for the Call Flow
Let's write the call flow logic as a series of actions with some basic pseudo-code to represent branching and looping. Labels are marked in bold.
Welcome
A: ChooseCharacter
B: SayCharacterInfo
VoteForCharacterYesNo
if yes then goto MainMenu
HearAboutRelatedCharacterYesNo
if yes then goto B else goto A
MainMenu: MainMenu
if 'character' then goto A
else if 'results' then SayVotingResults
else if 'statistics' then SayCallStatistics
else if 'speakright' then SaySpeakRightInfo
Writing the Call Flow in Java
In SpeakRight, the pseudo-code for a callflow can be converted into Java code in a fairly simple way. Each action becomes a flow object, and is represented by a class derived from one of the SpeakRight base classes, such as a PromptFlow that plays some audio output. A series of flow objects are executed in sequence.
Where branching is required, the getNext method of a flow object can be overridden to add the branching logic. getNext returns either a flow object to be executed, or an event object which causes execution to jump to a previous point. Event objects and event handlers act like "throw" and "catch" respectively.
The application's data is stored in the model, a special class generated by the SpeakRight tool MGen. The model can be used to hold user input (such as the currently selected Simpson's character) and retrieved data (eg. from a database). It can also hold control data that is used to control the execution path.
Let's get started. The outermost flow object represents the entire callflow, and is always derived from SRApp.
public static class SimpsonsDemo extends SRApp
{
public SimpsonsDemo()
{
addPromptFlow("Welcome");
add(new MainLoop());
}
}
The constructor adds two sub-flows: a welcome prompt and a loop flow object. The welcome is only played once and the remainder of the callflow is done in a loop since the user can go back and forth from selecting a character to the main menu as many times as he or she likes.
The MainLoop class defines a model variable M. By convention, SpeakRight will inject a value for M at runtime automatically.
public static class MainLoop extends BranchFlow
{
public Model M;
The first method called in a flow object is its onBegin method. Here we initialize the two main model values. nextAction is used to control what MainLoop does next, and currentCharacterId is the currently chosen Simpsons' character.
@Override
public void onBegin() {
M.nextAction().set("A");
M.CurrentCharacterId().clear();
}
The next method to be called is branch. If nextAction is set to choose a character, we build a sequence of sub-flows for selecting and voting for a character. Otherwise we return the main menu flow object.
@Override
public IFlow branch() {
if (M.nextAction().get().equals("A")) {
BasicFlow flow = new BasicFlow();
if (M.getCurrentCharacterId() == 0) { //no char selected?
flow.add(new AskCharacter());
}
flow.add(new PromptFlow("{$M.CurrentCharacterId}"));
flow.add(new VoteYesNo());
flow.add(new RelatedCharacterYesNo());
return flow;
}
else {
MainMenu menu = new MainMenu();
return menu;
}
}
AskCharacter is a flow object that asks the user to enter a character name.
public static class AskCharacter extends BaseSROQuestion {
public AskCharacter() {
super("character");
m_main1Prompt = "Say the name of a Simpson's character";
m_slotName = "x";
m_modelVar = "currentCharacterId";
}
}
The voting flow object is a yes/no question. The VoteYesNo class asks the question and handles the result in its onYes method: if 'yes' is input, the vote is recorded and we jump to the main menu. A goto event object (MainLoop.GotoEvent in the code below) is used to do this.
public static class VoteYesNo extends SROYesNo {
public VoteYesNo()
{
m_main1Prompt = "Do you want to vote for {$M.CharacterName}";
}
@Override
public IFlow onYes() {
return new MainLoop.GotoEvent(MainLoop.BRANCH_MAIN_MENU);
}
}
The goto event is caught by MainLoop in its onCatchGotoBranchEvent method. Here we set the branching condition M.nextAction.
MainLoop is a BranchFlow with its loop-forever flag set, so MainLoop will be executed again and, depending on nextAction, do either the main menu or choose-a-character.
protected void onCatchGotoBranchEvent(GotoBranchEvent ev)
{
    log("branch. action: " + ev.m_action);
    M.nextAction().set(ev.m_action);
}
SimpsonsDemo has an interactive tester (for keyboard testing), and an auto-tester (see Automated Testing).
Thursday, April 19, 2007
SpeakRight Reusable Objects (SROs)
SpeakRight provides a set of reusable speech objects called SROs. They are configurable and extensible. Here is a list of the ways SROs can be "tweaked" (a combined example follows the list):
- SROs have a full set of prompts, with main, silence, no-reco, and help prompts. Up to four escalations of each can be defined.
- any or all prompts can be replaced. Each SRO has a subject, a word such as "flights", which is used to build prompts. Changing the subject word is the simplest way to adjust the prompts. There is an extension point for handling the plurality of subject words ("flight", "flights"). Or the entire prompt can be replaced.
- prompts can be conditional, such as a prompt that only plays the first time an SRO is executed.
- prompts can be defined at compile time, at runtime in code, or in external XML files.
- grammars are replaceable. Inline grammars or grammar files can be used. The only restriction on a grammar is that it uses the slot names required by the SRO.
- validation code can be added. This server-side code inspects user input and either accepts it or causes the SRO to re-execute.
- Model binding. An SRO has a model variable name. When set, the user input results are bound to the model (i.e. stored in the model for later use by the app)
- Command phrases can be added to an SRO. A common data entry pattern is the enter-data-or-say-cancel pattern. SROs have a list of command phrases that you can add to.
- confirmation can be added. An SRO has a confirmation plug-in that can be used to add various forms of confirmation (explicit, implicit, or confirm-and-correct).
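Here is a combined sketch of several of these tweaks, using classes and fields shown elsewhere in these posts (the prompt wording and model variable name are just examples):
SRONumber sro = new SRONumber("tickets", 1, 10);        //subject "tickets", valid range 1 to 10
sro.m_main1Prompt = "How many tickets would you like?"; //replace the main prompt
sro.m_modelVar = "numTickets";                          //bind the result to a model variable (field access assumed)
sro.setConfirmer(new SROConfirmYesNo("tickets"));       //plug in explicit yes/no confirmation
flow.add(sro);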
The current list of SROs is:
- SROCancelCommand
- SROChoice
- SROConfirmYesNo
- SRODigitString
- SROListNavigator
- SRONumber
- SROOrdinalItem
- SROTransferCall
- SROYesNo
Thursday, March 29, 2007
Prompts in SROs
SROGen also generates an XML file holding the default prompts. This file is deployed and is read at runtime to load the SRO prompt fields. You can modify this file on a production system to modify the default prompts for an SRO; no re-compile (or restart) is necessary.
Here's what a prompt definition in the XML file looks like:
<prompt name="main1" def="no">What {%subject%} would you like?</prompt>
The prompt id main1 has a corresponding field in the SRO class called m_main1Prompt. A derived class, or an SR app, can modify the field using get/set methods.
This works hand in hand with the PText feature {%fieldName%} for extracting values from a field.
Sub-Prompts
The one thing missing so far is the ability to define conditional prompts, such as a play-once prompt. You can do it in code, of course, but that's not very flexible:
if (executionCount() == 1) {
m_main1Prompt = "Let's get started. " + m_main1Prompt;
}
To remedy this, SROs allow multiple sub-prompts to be defined for a single prompt, such as the MAIN prompt.
<prompt name="main1welc" group="MAIN" cond="once_ever">Let's get started.</prompt>
<prompt name="main1" group="MAIN" def="no">What {%subject%} would you like?</prompt>
Each prompt tag results in a field being created. Multiple prompts in the same group are rendered as a single VoiceXML prompt, in the order they occur in the XML file. The neat thing is that conditions can now be applied to individual sub-prompts. The cond attribute defines a condition. Here, "once_ever" means a play-once-ever condition. The first time the SRO is executed, the prompt will be: "Let's get started. What flight would you like?". On subsequent executions, the prompt is "What flight would you like?".
Implementation note: Sub-prompts are implemented using the m_subIndex field of Prompt. When prompts are rendered (in a PromptSet), all the rendered items are gathered together in the first sub-prompt. But since each sub-prompt is an independent Prompt object, its rendering can be enabled or disabled by its condition.
Tuesday, March 6, 2007
Release 0.0.2 is out
Here's a simple app to ask for the number of tickets in a travel application:
SRApp flow = new SRApp();
SRONumber sro = new SRONumber("tickets", 1, 10);
sro.setConfirmer(new SROConfirmYesNo("tickets"));
flow.add(sro);
flow.addPrompt("You said {$INPUT}");
From this, the following dialog is possible:
Computer: How many tickets would you like?
Human:
Computer: I didn't hear you. How many tickets would you like?
Human:
Computer: I still didn't hear you. How many tickets would you like?
Human: twelve
Computer: Sorry, I'm looking for a number between one and ten. How many tickets would you like?
Human: two
Computer: Do you want two?
Human: yes
Computer: You said two.
As you can see, escalating error prompts are given and user input is validated against the range given to the SRONumber flow object. Utterances below 80% confidence are confirmed, and finally the user's input is played back to them.
This release uses an extremely simple grammar in SRONumber (a built-in grammar for one to nine). You're free to replace it with your own, and of course the next release will improve on this.
Enjoy :)
Saturday, March 3, 2007
Testing
Unit Tests
Both JUnit and XMLUnit are supported. Here is a basic test that creates an app and runs it, feeding the required user input. Notice the use of TrailWrapper, a flow object that tracks execution in a trail of flow object names. The test shown below is in org.speakright.sro.tests and uses the MockSpeechPageWriter. This mock object remembers the rendered SpeechPage object which can then be checked to see if the expected grammars and prompts are there.
@Test public void testConfirmNo()
{
log("--------testConfirmNo--------");
SRApp flow = createApp(true);
TrailWrapper wrap1 = new TrailWrapper(flow);
SRInstance run = StartIt(wrap1);
Proceed(run, "2", "num", 40); //question with low confidence
Proceed(run, "no"); //reject the confirmation
Proceed(run, "8", "num"); //question again
Proceed(run, ""); //you said...
assertEquals("fail", false, run.isFailed());
assertEquals("fin", true, run.isFinished());
assertEquals("start", true, run.isStarted());
ChkTrail(run, "SROQuantity;ConfYNFlow;SROQuantity;PFlow");
assertEquals("city", "8", M.city().get());
}
XMLUnit can also be used. See the TestRender.java file in org.speakright.core.tests. XML comparison is more fussy (and slower), but you can check the actual VoiceXML.
SRInteractiveTester
Next up is the SRInteractiveTester class. Use it in a console app for executing a SpeakRight app interactively from the keyboard. See InteractiveTester.java in org.speakright.core.tests for an example. Run this file as a Java application.
Here are the commands it uses:
SpeakRight ITester..
RUNNING App2.........
1> ???
available cmds:
q -- quit
go -- run or proceed
bye -- simulate a DISCONNECT
echo -- toggle echo of generated content
version -- display version
status -- show interpreter status
out -- turn on file output of each page in tmpfiles dir
ret -- set return url
gramurl -- gram base dir
prompturl -- prompt base dir
html -- switch to HTML output
vxml -- switch to VXML output
The out command causes each rendered VoiceXML page to be output as a file (page1.vxml, page2.vxml, etc).
Running Servlet in HTML mode
Once you're satisfied with your app, it's time to test it inside a real servlet. Write your servlet, the one that will output VoiceXML. However, when you run it, add the CGI param "mode=html", like this:
http://localhost:8080/MyServlet3/App1?mode=html
MyServlet3 is the name of your dynamic web project, and App1 is the servlet inside it.
SpeakRight will render HTML instead of VoiceXML. Pressing the Next or Submit button simulates the VoiceXML platform returning results.
Running Servlet in VoiceXML mode
OK, time for the real thing. Point your VoiceXML platform at your servlet's URL, like this:
http://www.someplace.com/MyServlet3/App1
Some platforms (such as Voxeo's) have a real-time debugger that shows events and log messages as they occur.
You can also use the log4j log file that SpeakRight writes. Here's a sample:
03-03 11:32:12.015 [or24] INFO srf - SR: startApp
03-03 11:32:12.015 [or24] INFO srf - START: MyApp
03-03 11:32:12.015 [or24] INFO srf - push MyApp
03-03 11:32:12.015 [or24] INFO srf - push PFlow
03-03 11:32:12.015 [or24] INFO srf - EXEC PFlow
03-03 11:32:12.031 [or24] DEBUG srf - prompt (1 items): Welcome to Joe's Pizza
03-03 11:32:12.359 [or24] INFO srf - SR: writing content.
03-03 11:32:12.375 [or24] INFO srf - SR: saving state.
03-03 11:32:12.375 [or24] INFO srf - SR: done.
03-03 11:32:14.109 [or25] INFO srf - SR: doing POST
See Automated Testing.
Friday, March 2, 2007
Servlets
SpeakRight provides a class, SRServletRunner, that does most of the work. Here's how to use it in the doGet method of a servlet
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
SRServletRunner runner = new SRServletRunner(new AppFactory(), this, request, response, "GET");
if (runner.isNewSession()) {
SRRunner run = runner.createNewSRRunner(this);
IFlow flow = new App();
runner.startApp(flow);
} else {
runner.logger().log("contine in GET!!");
runner.continueApp();
}
}
First we pass the request and response objects into SRServletRunner, along with a string (used for logging) and our app factory (see Initialization). Then we check if this is a new session. If it is, we create our SpeakRight application and call startApp; otherwise we call continueApp. SRServletRunner manages passivating and re-activating the SpeakRight runtime between HTTP requests.
Logging is done using log4j.
Monday, February 26, 2007
First Release!
This is my first open-source project. It's been an intense learning curve, but getting this far is due to some fine OSS software: Eclipse, Subversion, SourceForge, StringTemplate, Junit, XMLUnit, Skype, and the Voxeo community.
Saturday, February 24, 2007
Control Flow, Errors, and Event Handling
Execution of a sequence of flow objects is done by a flow object having a list of sub-flow objects. Every time its getNext method is called it returns the next object from the list.
Conditional flow is done by adding logic in getNext, to return one flow object or another based on some condition.
Looping is done by having getNext return its sub-flows more than once, iterating over them multiple times.
Error Handling
Callflows can have errors, such as user errors (failing to say anything recognizable) and application errors (missing files, db errors, etc). SpeakRight manages error handling separately from the getFirst/getNext mechanism. getNext handles the error-free case. If an error occurs, one of the IFlow error handling methods is called:
IFlow OnNoInput(current, results); //app was expecting input and none was provided by the user
IFlow OnException(current, results); //a generic failure such as exception being thrown.
IFlow OnDisconnect(current, results); //user terminated the interaction (usually by hanging up the phone; how this works in multimodal is TBD)
IFlow OnHalt(current, results); //system is stopping
IFlow OnValidateFailed(current, results);
Note that a number of things that aren't really errors are handled this way. The goal is to keep the "nexting" logic clean, and handle everything else separately.
Errors are handled in a similar manner to exceptions: a search up the flow stack is done for an error handler. If the current flow doesn't have one, then its parent is tried. It's a runtime error if no error handler is found.
The outermost flow is usually a class derived from SRApp. SRApp provides error handlers with default behaviour: they play a prompt indicating that a problem has occurred and transfer the call to an operator.
Catch and Throw
The basic flow of control in a SpeakRight app is nesting of flow objects. These behave like subroutine calls; when the nested flow finishes, the parent flow resumes execution. Sometimes a non-local transfer of control is needed. SpeakRight supports a generic throw and catch approach. A flow can throw a custom flow event, which MUST be caught by a flow above it in the flow stack.
return new ThrowFlowEvent("abc");
and the catch looks like any other error handler
IFlow OnCatch(current, results, thrownEvent);
Note: like all other handlers, a flow object can catch its own thrown event. It may seem weird, but this lets developers move code around easily.
Some control flow is possible in execute; for example, a flow can throw a flow event from execute if a db error happens. However, in this case a flow object cannot catch its own flow event, since that would cause execute to be called again, leading to infinite recursion.
Update: See also Optional Sub-Flow Objects
GotoUrlFlow
The GotoUrlFlow flow object is used to redirect the VoiceXML browser to an external URL. It is used to redirect to static VoiceXML pages or to another application.
Internal Architecture
- Application code
- Web server
- VoiceXML browser
- VoiceXML platform
- Telephony hardware, VOIP stack
SpeakRight lives in the application code layer, typically in a servlet. The SpeakRight runtime dynamically generates VoiceXML pages, one per HTTP request. Between requests, the runtime is stateless, in the same sense as a "stateless bean". State is saved in the servlet session and restored on each HTTP request.
The SpeakRight framework is a set of Java classes specifically designed for writing speech rec applications. Although VoiceXML uses a web architecture similar to HTML's, the needs of a speech app are very different (see Why Speech is Hard, TBD).
SpeakRight has a Model-View-Controller (MVC) architecture similar to GUI frameworks. In GUIs, a control represents the view and controller, and controls can be combined using nesting to produce larger GUI elements. In SpeakRight, a flow object represents the view and controller, and flow objects can be combined using nesting to produce larger dialog elements. Flow objects can be customized by setting their properties (getter/setter methods), and extended through inheritance and extension points. For instance, the confirmation strategy used by a flow object is represented by another flow object, and various types of confirmation can be plugged in.
Flow objects contain sub-flow objects. The application is simply the top-level flow object.
Flow objects implement the IFlow interface. The basics of this interface are
IFlow getFirst();
IFlow getNext(IFlow current, SRResults results);
void execute(ExecutionContext context);
getFirst returns the first flow object to be run. A flow object with sub-flows would return its first sub-flow object. A leaf object (one with no sub-flows) returns itself. (See also Optional Sub-Flow Objects.)
getNext returns the next flow object to be run. It is passed the results of the previous flow object to help it decide. The results contain user input and other events sent by the VoiceXML platform.
In the execute method, the flow object renders itself into a VoiceXML page (see also the StringTemplate template engine).
Execution uses a flow stack. An application starts by pushing the application flow object (the outer-most flow object) onto the stack. Pushing a flow object is known as activation. If the application object's getFirst returns a sub-flow then the sub-flow is pushed onto the stack. This process continues until a leaf object is encountered. At this point all the flow objects on the stack are considered "active". Now the runtime executes the top-most stack object, calling its execute method. The rendered content (a VoiceXML page) is sent to the VoiceXML platform.
When the results of the VoiceXML page are returned, the runtime gives them to the top-most flow object in the stack, by calling its getNext method. This method can do one of three things:
- return null to indicate it has finished. A finished flow object is popped off the stack and the next flow-object is executed.
- return itself to indicate it wants to execute again.
- return a sub-flow, which is activated (pushed onto the stack).
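As a sketch, a getNext implementation covering the three outcomes above might look like this (the helper methods are hypothetical):
@Override
public IFlow getNext(IFlow current, SRResults results) {
    if (allSubFlowsDone()) {
        return null;      //finished: this flow object is popped off the stack
    }
    else if (needToReAsk(results)) {
        return this;      //re-execute this flow object
    }
    return nextSubFlow(); //activate a sub-flow (it gets pushed onto the stack)
}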
Table of Contents
- What is SpeakRight?
- The Benefits of a Code-Based Approach
- Features
- FAQ
- Tutorial, Simpsons Demo
- SourceForge, Download
SpeakRight documentation
- Getting Started
- Tutorial and Simpsons Demo
- Javadocs
- Architecture
- Internal Architecture
- Servlets
- Extension Points
- Performance
- StringTemplate template engine
- Initialization
- Flow Objects
- Prompts, Grammars
- Prompts in SROs
- DTMFOnlyMode
- Internationalization
- NBest
- Call Control
- Control Flow, Errors and Event Handling
- HotWords
- SpeakRight Reusable Objects (SROs)
- VoiceXML
- Testing
- Project Plan
- Wish List
- Contributors
- Powered By
Sunday, February 18, 2007
Grammars
SpeakRight supports three types of grammars: external grammars (referenced by URL), built-in grammars, and inline grammars (which use a simplified GSL format). Grammars work much like prompts. You specify a grammar text, known as a gtext, that uses a simple formatting language:
- (no prefix). The grammar text is a URL. It can be an absolute or relative URL.
- inline: prefix. An inline grammar. The prefix is followed by a simplified version of GSL, such as "small medium [large big] (very large)".
- builtin: prefix. One of VoiceXML's built-in grammars. The prefix is followed by something like "digits?minlength=3;maxlength=9"
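As a sketch, here are the three gtext styles set on a question flow; the setGrammar method name is an assumption, while the gtext strings follow the formats above.
flow.setGrammar("grammars/fruits.grxml");                  //external grammar, relative URL
flow.setGrammar("inline:small medium [large big]");        //inline simplified-GSL grammar
flow.setGrammar("builtin:digits?minlength=3;maxlength=9"); //VoiceXML built-in grammar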
When a flow object is rendered, the grammars are rendered using a pipeline that applies the following logic:
- check the grammar condition. If false then skip the grammar.
- parse an inline grammar into its word list
- parse the builtin grammar
- convert relative URLs into absolute URLs
The grammar text is a URL. It can be an absolute URL (eg. http://www.somecompany.com/speechapp7/grammars/fruits.grxml), or a relative URL. Relative URLs (eg. "grammars/fruits.grxml") are converted into absolute URLs when the grammar is rendered. The servlet's URL is currently used for this.
The grammar file extension is used to determine the type value for the grammar tag
- ".grxml" means type="application/srgs+xml"
- ".gsl" means type="text/gsl"
- all other files are assumed to be ABNF SRGS format, type="application/srgs"
Built-In Grammars
TBD. Built-ins are part of VoiceXML 2.0, but optional. They are also intended for prototyping, and it's recommended that applications use full, properly tuned grammars.
Inline Grammars
GSL is (I believe) a proprietary Nuance format. SpeakRight uses a simplified version that currently only supports [ ] and ( ).
A single utterance can contain one or more slots. The simplest type of directed dialog VUIs use single slot questions, such as "How many passengers are there?". SR only supports single slot for now.
Grammar Types
There are two types of grammars (represented by the GrammarType enum).
- VOICE is for spoken input
- DTMF is for touchtone digits
DTMF Only Mode
Speech recognition may not work at all in very noisy environments. Not only will recognition fail, but prompts may never play due to false barge-in. For these reasons, speech applications should be able to fall back to a DTMF-only mode. This mode can be activated by the user by pressing a certain key, usually '*'. Once activated, SpeakRight will not render any VOICE grammars. Thus the VoiceXML engine will only listen for DTMF digits.
Slots
A grammar represents a series of words, such as "A large pizza please". The application may only care about a few of the words; here, the size word "large" is the only word of importance to the app. These words are attached to named return values called slots. In our pizza example, a slot called "size" would be bound to the words "small", "medium", or "large". Any of those words would fill the slot.
Slots define the interface between a grammar and a VoiceXML field. The field's name defines a slot that the grammar must fill in order for the field to be filled.
Any grammar that fills the slot "size" can be used.
A single utterance can fill multiple slots, as in "I would like to fly to Atlanta on Friday."
SpeakRight doesn't yet support multi-slot questions.
Prompts
Here is a basic prompt that plays some text using TTS (text-to-speech):
"Welcome to Inky's Travel".
PTexts are Java strings. Here's another prompt:
"Welcome to Inky's Travel. {audio:logo.wav}"
This prompt contains two items: a TTS phrase and an audio file. Items are delimited by '{' and '}'. The delimiters are optional for the first item. This is equivalent:
"{Welcome to Inky's Travel. }{audio:logo.wav}"
PTexts can contain as many items as you want. They will be rendered as a prompt:
<prompt>Welcome to Inky's Travel. <audio src="http://myIPaddress/logo.wav"></audio>
</prompt>
For convenience, audio items can be specified without the "audio:" prefix. The following is equivalent to the previous prompt. The prefix is optional if the filename ends in ".wav" and contains no whitespace characters.
"{Welcome to Inky's Travel. }{logo.wav}"
You can add pauses as well using "." inside an item. Each period represents 250 msec. Pause items contain only periods (otherwise they're considered as TTS). Here's a 750 msec pause.
"{Welcome to Inky's Travel. }{...}{logo.wav}"
Model variables can be prompt items by using a "$M." prefix. The value of the model is rendered.
"The current price is {$M.price}"
"You chose {$INPUT}"
Fields (aka member variables) of a flow object can also be prompt items, by wrapping them in '%'. If a flow class has a member variable int m_numPassengers; then you can play its value in a prompt like this:
"You have {%m_numPassengers%} passengers"
If you're familiar with SSML, then you can use raw prompt items, which have a "raw:" prefix. These are output as-is and can contain SSML tags.
Lastly, there are id prompt items, which are references to an external prompt string in an XML file. This is useful for multi-lingual apps, or for changing prompts after deployment. See Prompt Ids and Prompt XML Files.
"id:sayPrice"
"audio:" audio prompts - "M$." model values
- "%value%" field values (of currently executing flow object)
- ".." pause (250 msec for each period)
- "raw:" raw SSML prompts
- "id:" id prompts
- TTS prompt (any prompt item that doesn't match one of the above types is played as TTS)
Prompt Conditions
By default, all the prompts in a flow object are played. However there are occasions when the playing of a prompt needs to be controlled by a condition. Conditions are evaluated when the flow object is executed; if the condition returns false the prompt is not played.
Condition | Description
none | always play the prompt
PlayOnce | only play the first time the flow is executed. If the flow is re-executed (the same flow object executes more than once in a row), don't play the prompt. PlayOnce is useful in menus where the initial prompt may contain information that should only be played once.
PlayOnceEver | only play once during the entire session (phone call)
PlayIfEmpty | only play if the given model variable is empty (""). Useful if you want to play a prompt as long as something has not yet occurred.
PlayIfNotEmpty | only play if the given model variable is not empty ("")
Prompt Rendering
Prompts are rendered using a pipeline of steps. The order of the steps has been chosen to maximize usefulness.
Step | Description
1 | Apply the condition. If false then return.
2 | Resolve ids. Read the external XML files and replace each prompt id with its specified prompt text.
3 | Evaluate model values.
4 | Call fixup handlers in the flow objects. The IFlow method fixupPrompt allows a flow object to tweak TTS prompt items.
5 | Merge runs of TTS items into a single item.
6 | Do audio matching. An external XML file defines TTS text for which an audio file exists. The text is replaced with the audio file.
The result is a list of TTS and/or audio items that are sent to the page writer.
Audio matching
Audio matching is a technique that lets you use TTS for initial speech app development. Once the app is fairly stable, record audio files for all the prompts. Then you create an audio-matching XML file that lists each audio file and the prompt text it replaces. Now when the SpeakRight application runs, matching text is automatically replaced with the audio file. No source code changes are required.
The match is a soft match that ignores case and punctuation. That is, a prompt item "Dr. Smith lives on Maple Dr." would match an audio-match text of "dr smith lives on maple dr".
Audio matching works at the item level. Do we need to support some tag for spanning multiple items?