What is SpeakRight?

SpeakRight is an open-source Java framework for writing speech recognition applications in VoiceXML.Unlike most proprietary speech-app tools, SpeakRight is code-based. Applications are written in Java using SpeakRight's extensible classes. Java IDEs such as Eclipse provide great debugging, fast Java-aware editing, and refactoring. Dynamic generation of VoiceXML is done using the popular StringTemplate templating framework. Read more...

See Getting Started, Tutorial, and Table of Contents
Powered By Blogger

Thursday, March 29, 2007

Prompts in SROs

SpeakRight speech objects (SROs) offer a highly-reusable approach to prompts. Recall that SRO classes are generated by the SROGen tool. Each SRO has an XML file that defines what code will be generated. The XML file contains the grammars and prompts for the SRO. You can modify this file and regenerate an SRO.

SROGen also generates an XML file holding the default prompts. This file is deployed and is read at runtime to load the SRO prompt fields. You can modify this file on a production system to modify the default prompts for an SRO; no re-compile (or restart) is necessary.

Here's what a prompt definition in the XML file looks like:

<prompt name="main1" def="no">What {%subject%} would you like?</prompt>

The prompt id main1, has a corresponding field in the SRO class called m_main1Prompt. A derived class, or a SR app, can modify the field using get/set methods.

This works fairly well with the PText feature {%fieldName%} for extracting values from a field.

Sub-Prompts
The one thing missing so far is the ability to define conditional prompts, such as a play-once prompt. You can do it in code of course, but that's not very flexible

if (executionCount() == 1) {
m_main1Prompt = "Let's get started. " + m_main1Prompt;
}

To remedy this, SROs support multiple sub-prompts to be defined for a single prompt, such as the MAIN prompt.

<prompt name="main1welc" group="MAIN" cond="once_ever">Let's get started.</prompt>
<prompt name="main1" group="MAIN" def="no">What {%subject%} would you like?</prompt>

Each prompt tag results in a field being created. Multiple prompts in the same group are rendered as a single VoiceXML prompt, in the order they occur in the XML file. The neat thing is that conditions can now be applied to individual sub-prompts. The cond attribute defines a condition. Here, "once_ever" means a play-once-ever condition. The first time the SRO is executed, the prompt will be: "Let's get started. What flight would you like?". On subsequent executions, the prompt is "What flight would you like?".

Implementation note: Sub-prompts are implemented using the m_subIndex field of Prompt. When prompts are rendered (in a PromptSet), all the rendered items are gathered together in the first sub-prompt. But since each sub-prompt is an independent Prompt object, its rendering can be enabled or disabled by its condition.

Tuesday, March 6, 2007

Release 0.0.2 is out

This second release is a bit more real. First two SROs (SRONumber and SROConfirmYesNo) are available. And confirmation and validation are working.

Here's a simple app to ask for the number of tickets in a travel application

SRApp flow = new SRApp();

SRONumber sro = new SRONumber("tickets", 1, 10);
sro.setConfirmer(new SROConfirmYesNo("tickets"));

flow.add(sro);
flow.addPrompt("You said {$INPUT}");

From this, the following dialog is possible

Computer: How many tickets would you like?
Human:
Computer: I didn't hear you. How many tickets would you like?
Human:
Computer: I still didn't hear you. How many tickets would you like?
Human: twelve
Computer: Sorry, I'm looking for a number between one and ten. How many tickets would you like?
Human: two
Computer: Do you want two?
Human: yes
Computer: You said two.

As you can see, escalating error prompts are given and user input is validated against the range given to the SRONumber flow object. Utterances below 80% confidence are confirmed, and finally the user's input is played back to them.

This release uses extremely simple grammar in SRONumber (a built-in grammar for one to nine). You're free to replace it with your own, and of course the next release will improve on this.

Enjoy :)

Saturday, March 3, 2007

Testing

Testing is an important part of SpeakRight. Running applications on an VoiceXML platform is not easy or quick to do. It's essential to be able to test your app under more programmer-friendly conditions.

Unit Tests

Both JUnit and XMLUnit are supported. Here is a basic test that creates an app and runs it, feeding the required user input. Notice the use of TrailWrapper, a flow object that tracks execution in a trail of flow object names. The test shown below is in org.speakright.sro.tests and uses the MockSpeechPageWriter. This mock object remembers the rendered SpeechPage object which can then be checked to see if the expected grammars and prompts are there.

@Test public void testConfirmNo()
{
log("--------testConfirmNo--------");
SRApp flow = createApp(true);
TrailWrapper wrap1 = new TrailWrapper(flow);

SRInstance run = StartIt(wrap1);
Proceed(run, "2", "num", 40); //question with low confidence
Proceed(run, "no"); //reject the confirmation
Proceed(run, "8", "num"); //question again
Proceed(run, ""); //you said...

assertEquals("fail", false, run.isFailed());
assertEquals("fin", true, run.isFinished());
assertEquals("start", true, run.isStarted());

ChkTrail(run, "SROQuantity;ConfYNFlow;SROQuantity;PFlow");
assertEquals("city", "8", M.city().get());
}

XMLUnit can also be used. See the TestRender.java file in org.speakright.core.tests. XML comparison is more fussy (and slower), but you can check the actual VoiceXML.

ISRInteractiveTester

Next up is the ISRInteractiveTester class. Use it in a console-app for executing a SpeakRight app interactively from the keyboard. See InteractiveTester.java in org.speakright.core.tests for an example. Run this file as a Java application.

Here are the commands it uses:

SpeakRight ITester..
RUNNING App2.........
1> ???
available cmds:
q -- quit
go -- run or proceed
bye -- simulate a DISCONNECT
echo -- toggle echo of generated content
version -- display version
status -- show interpreter status
out -- turn on file output of each page in tmpfiles dir
ret -- set return url
gramurl -- gram base dir
prompturl -- prompt base dir
html -- switch to HTML output
vxml -- switch to VXML output

The out command causes each rendered VoiceXML page to be output as a file (page1.vxml, page2.vxml, etc).

Running Servlet in HTML mode

Once you're satisfied with your app, it's time to test it inside a real servlet. Write your servlet; the one that will output VoiceXML. However when you run it, add the CGI param "mode=html", like this:

http:localhost:8080/MyServlet3/App1?mode=html

MyServlet3
is the name of your dynamic web project, and App1 is the servlet inside it.

SpeakRight will render into HTML inside VoiceXML. A sample page is shown on the left. Pressing the Next or Submit button simulates the VoiceXML platform returning results.



















Running Servlet in VoiceXML mode


OK, time for the real thing. Point your VoiceXML platform at your servlet's URL, like this:

http://www.someplace.com/MyServlet3/App1

Some platforms (such as Voxeo's) have a real-time debugger that shows events and log messages as they occur.

You can also use the log4j log file that SpeakRight writes. Here's a sample:

03-03 11:32:12.015 [or24] INFO srf - SR: startApp
03-03 11:32:12.015 [or24] INFO srf - START: MyApp
03-03 11:32:12.015 [or24] INFO srf - push MyApp
03-03 11:32:12.015 [or24] INFO srf - push PFlow
03-03 11:32:12.015 [or24] INFO srf - EXEC PFlow
03-03 11:32:12.031 [or24] DEBUG srf - prompt (1 items): Welcome to Joe's Pizza
03-03 11:32:12.359 [or24] INFO srf - SR: writing content.
03-03 11:32:12.375 [or24] INFO srf - SR: saving state.
03-03 11:32:12.375 [or24] INFO srf - SR: done.
03-03 11:32:14.109 [or25] INFO srf - SR: doing POST

Automated Testing

See Automated Testing.

Friday, March 2, 2007

Servlets

Let's consider how to enclose a SpeakRight application in a Java servlet. Servlets are a portable API for responding to HTTP requests. Tomcat or other types of web servers (such as TBD) support servlets.

SpeakRight provides a class, SRServletRunner, that does most of the work. Here's how to use it in the doGet method of a servlet

protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
SRServletRunner runner = new SRServletRunner(new AppFactory(), this, request, response, "GET");

if (runner.isNewSession()) {
SRRunner run = runner.createNewSRRunner(this);
IFlow flow = new App();
runner.startApp(flow);
} else {
runner.logger().log("contine in GET!!");
runner.continueApp();
}
}

First we pass the request and response objects into SRServletRunner, along with a string (used for logging) and our app factory (see Initialization). Then we check if this is a new session. If it is then we create our SpeakRight application and call startApp; otherwise we call continueApp. SRServletRunner manages passifying and re-activating the SpeakRight runtime between HTTP requests.

Logging is done using log4j.