idnits 2.17.1 

draft-ietf-speechsc-mrcpv2-20.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** The document seems to lack a License Notice according IETF Trust
     Provisions of 28 Dec 2009, Section 6.b.i or Provisions of 12 Sep 2009
     Section 6.b -- however, there's a paragraph with a matching beginning.
     Boilerplate error?

     (You're using the IETF Trust Provisions' Section 6.b License Notice from
     12 Feb 2009 rather than one of the newer Notices.  See
     https://trustee.ietf.org/license-info/.)


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 5 instances of lines with non-RFC2606-compliant FQDNs in the
     document.

  -- The document has examples using IPv4 documentation addresses according
     to RFC6890, but does not use any IPv6 documentation addresses.  Maybe
     there should be IPv6 examples, too?


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document seems to contain a disclaimer for pre-RFC5378 work, and may
     have content which was first submitted before 10 November 2008.  The
     disclaimer is necessary when there are original authors that you have
     been unable to contact, or if some do not wish to grant the BCP78 rights
     to the IETF Trust.  If you are able to get all authors (current and
     original) to grant those rights, you can and should remove the
     disclaimer; otherwise, the disclaimer is needed and you can ignore this
     comment. (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (August 11, 2009) is 5371 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFCXXXX' is mentioned on line 7636, but not defined

  ** Obsolete normative reference: RFC 2326 (Obsoleted by RFC 7826)

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)

  ** Obsolete normative reference: RFC 2616 (Obsoleted by RFC 7230, RFC 7231,
     RFC 7232, RFC 7233, RFC 7234, RFC 7235)

  ** Obsolete normative reference: RFC 4572 (Obsoleted by RFC 8122)

  ** Obsolete normative reference: RFC 3388 (Obsoleted by RFC 5888)

  ** Obsolete normative reference: RFC 2109 (Obsoleted by RFC 2965)

  ** Obsolete normative reference: RFC 2965 (Obsoleted by RFC 6265)

  ** Obsolete normative reference: RFC 4646 (Obsoleted by RFC 5646)

  ** Obsolete normative reference: RFC 5226 (Obsoleted by RFC 8126)

  ** Obsolete normative reference: RFC 4288 (Obsoleted by RFC 6838)

  ** Obsolete normative reference: RFC 4395 (Obsoleted by RFC 7595)

  ** Downref: Normative reference to an Experimental RFC: RFC 2483


     Summary: 13 errors (**), 0 flaws (~~), 3 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	SPEECHSC                                                   S. Shanmugham
3	Internet-Draft                                       Cisco Systems, Inc.
4	Intended status: Standards Track                              D. Burnett
5	Expires: February 12, 2010                                         Voxeo
6	                                                         August 11, 2009

8	           Media Resource Control Protocol Version 2 (MRCPv2)
9	                     draft-ietf-speechsc-mrcpv2-20

11	Status of this Memo

13	   This Internet-Draft is submitted to IETF in full conformance with the
14	   provisions of BCP 78 and BCP 79.  This document may contain material
15	   from IETF Documents or IETF Contributions published or made publicly
16	   available before November 10, 2008.  The person(s) controlling the
17	   copyright in some of this material may not have granted the IETF
18	   Trust the right to allow modifications of such material outside the
19	   IETF Standards Process.  Without obtaining an adequate license from
20	   the person(s) controlling the copyright in such materials, this
21	   document may not be modified outside the IETF Standards Process, and
22	   derivative works of it may not be created outside the IETF Standards
23	   Process, except to format it for publication as an RFC or to
24	   translate it into languages other than English.

26	   Internet-Drafts are working documents of the Internet Engineering
27	   Task Force (IETF), its areas, and its working groups.  Note that
28	   other groups may also distribute working documents as Internet-
29	   Drafts.

31	   Internet-Drafts are draft documents valid for a maximum of six months
32	   and may be updated, replaced, or obsoleted by other documents at any
33	   time.  It is inappropriate to use Internet-Drafts as reference
34	   material or to cite them other than as "work in progress."

36	   The list of current Internet-Drafts can be accessed at
37	   http://www.ietf.org/ietf/1id-abstracts.txt.

39	   The list of Internet-Draft Shadow Directories can be accessed at
40	   http://www.ietf.org/shadow.html.

42	   This Internet-Draft will expire on February 12, 2010.

44	Copyright Notice

46	   Copyright (c) 2009 IETF Trust and the persons identified as the
47	   document authors.  All rights reserved.

49	   This document is subject to BCP 78 and the IETF Trust's Legal
50	   Provisions Relating to IETF Documents in effect on the date of
51	   publication of this document (http://trustee.ietf.org/license-info).
52	   Please review these documents carefully, as they describe your rights
53	   and restrictions with respect to this document.

55	Abstract

57	   The MRCPv2 protocol allows client hosts to control media service
58	   resources such as speech synthesizers, recognizers, verifiers and
59	   identifiers residing in servers on the network.  MRCPv2 is not a
60	   "stand-alone" protocol - it relies on other protocols, such as the
61	   Session Initiation Protocol (SIP), to rendezvous MRCPv2 clients and
62	   servers and manage sessions between them, and the Session Description
63	   Protocol (SDP) to describe, discover and exchange capabilities.  It
64	   also depends on SIP and SDP to establish the media sessions and
65	   associated parameters between the media source or sink and the media
66	   server.  Once this is done, the MRCPv2 protocol exchange operates
67	   over the control session established above, allowing the client to
68	   control the media processing resources on the speech resource server.

70	Table of Contents

72	   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   9
73	   2.  Document Conventions  . . . . . . . . . . . . . . . . . . . .  10
74	     2.1.   Definitions  . . . . . . . . . . . . . . . . . . . . . .  10
75	     2.2.   State-Machine Diagrams . . . . . . . . . . . . . . . . .  11
76	   3.  Architecture  . . . . . . . . . . . . . . . . . . . . . . . .  11
77	     3.1.   MRCPv2 Media Resource Types  . . . . . . . . . . . . . .  12
78	     3.2.   Server and Resource Addressing . . . . . . . . . . . . .  13
79	   4.  MRCPv2 Protocol Basics  . . . . . . . . . . . . . . . . . . .  14
80	     4.1.   Connecting to the Server . . . . . . . . . . . . . . . .  14
81	     4.2.   Managing Resource Control Channels . . . . . . . . . . .  14
82	     4.3.   Media Streams and RTP Ports  . . . . . . . . . . . . . .  21
83	     4.4.   MRCPv2 Message Transport . . . . . . . . . . . . . . . .  23
84	   5.  MRCPv2 Specification  . . . . . . . . . . . . . . . . . . . .  23
85	     5.1.   Common Protocol Elements . . . . . . . . . . . . . . . .  24
86	     5.2.   Request  . . . . . . . . . . . . . . . . . . . . . . . .  25
87	     5.3.   Response . . . . . . . . . . . . . . . . . . . . . . . .  26
88	     5.4.   Status Codes . . . . . . . . . . . . . . . . . . . . . .  27
89	     5.5.   Events . . . . . . . . . . . . . . . . . . . . . . . . .  28
90	   6.  MRCPv2 Generic Methods, Headers, and Result Structure . . . .  29
91	     6.1.   Generic Methods  . . . . . . . . . . . . . . . . . . . .  30
92	       6.1.1.   SET-PARAMS . . . . . . . . . . . . . . . . . . . . .  30
93	       6.1.2.   GET-PARAMS . . . . . . . . . . . . . . . . . . . . .  31
94	     6.2.   Generic Message Headers  . . . . . . . . . . . . . . . .  31
95	       6.2.1.   Channel-Identifier . . . . . . . . . . . . . . . . .  33
96	       6.2.2.   Accept . . . . . . . . . . . . . . . . . . . . . . .  33
97	       6.2.3.   Active-Request-Id-List . . . . . . . . . . . . . . .  34
98	       6.2.4.   Proxy-Sync-Id  . . . . . . . . . . . . . . . . . . .  34
99	       6.2.5.   Accept-Charset . . . . . . . . . . . . . . . . . . .  35
100	       6.2.6.   Content-Type . . . . . . . . . . . . . . . . . . . .  35
101	       6.2.7.   Content-ID . . . . . . . . . . . . . . . . . . . . .  35
102	       6.2.8.   Content-Base . . . . . . . . . . . . . . . . . . . .  35
103	       6.2.9.   Content-Encoding . . . . . . . . . . . . . . . . . .  35
104	       6.2.10.  Content-Location . . . . . . . . . . . . . . . . . .  36
105	       6.2.11.  Content-Length . . . . . . . . . . . . . . . . . . .  37
106	       6.2.12.  Fetch Timeout  . . . . . . . . . . . . . . . . . . .  37
107	       6.2.13.  Cache-Control  . . . . . . . . . . . . . . . . . . .  37
108	       6.2.14.  Logging-Tag  . . . . . . . . . . . . . . . . . . . .  39
109	       6.2.15.  Set-Cookie and Set-Cookie2 . . . . . . . . . . . . .  39
110	       6.2.16.  Vendor Specific Parameters . . . . . . . . . . . . .  41
111	     6.3.   Generic Result Structure . . . . . . . . . . . . . . . .  41
112	       6.3.1.   Natural Language Semantics Markup Language . . . . .  42
113	   7.  Resource Discovery  . . . . . . . . . . . . . . . . . . . . .  43
114	   8.  Speech Synthesizer Resource . . . . . . . . . . . . . . . . .  45
115	     8.1.   Synthesizer State Machine  . . . . . . . . . . . . . . .  45
116	     8.2.   Synthesizer Methods  . . . . . . . . . . . . . . . . . .  46
117	     8.3.   Synthesizer Events . . . . . . . . . . . . . . . . . . .  46
118	     8.4.   Synthesizer Header Fields  . . . . . . . . . . . . . . .  47
119	       8.4.1.   Jump-Size  . . . . . . . . . . . . . . . . . . . . .  47
120	       8.4.2.   Kill-On-Barge-In . . . . . . . . . . . . . . . . . .  48
121	       8.4.3.   Speaker Profile  . . . . . . . . . . . . . . . . . .  48
122	       8.4.4.   Completion Cause . . . . . . . . . . . . . . . . . .  49
123	       8.4.5.   Completion Reason  . . . . . . . . . . . . . . . . .  49
124	       8.4.6.   Voice-Parameter  . . . . . . . . . . . . . . . . . .  50
125	       8.4.7.   Prosody-Parameters . . . . . . . . . . . . . . . . .  50
126	       8.4.8.   Speech Marker  . . . . . . . . . . . . . . . . . . .  51
127	       8.4.9.   Speech Language  . . . . . . . . . . . . . . . . . .  52
128	       8.4.10.  Fetch Hint . . . . . . . . . . . . . . . . . . . . .  52
129	       8.4.11.  Audio Fetch Hint . . . . . . . . . . . . . . . . . .  52
130	       8.4.12.  Failed URI . . . . . . . . . . . . . . . . . . . . .  53
131	       8.4.13.  Failed URI Cause . . . . . . . . . . . . . . . . . .  53
132	       8.4.14.  Speak Restart  . . . . . . . . . . . . . . . . . . .  53
133	       8.4.15.  Speak Length . . . . . . . . . . . . . . . . . . . .  53
134	       8.4.16.  Load-Lexicon . . . . . . . . . . . . . . . . . . . .  54
135	       8.4.17.  Lexicon-Search-Order . . . . . . . . . . . . . . . .  54
136	     8.5.   Synthesizer Message Body . . . . . . . . . . . . . . . .  54
137	       8.5.1.   Synthesizer Speech Data  . . . . . . . . . . . . . .  54
138	       8.5.2.   Lexicon Data . . . . . . . . . . . . . . . . . . . .  57
139	     8.6.   SPEAK Method . . . . . . . . . . . . . . . . . . . . . .  58
140	     8.7.   STOP . . . . . . . . . . . . . . . . . . . . . . . . . .  60
141	     8.8.   BARGE-IN-OCCURED . . . . . . . . . . . . . . . . . . . .  61
142	     8.9.   PAUSE  . . . . . . . . . . . . . . . . . . . . . . . . .  63
143	     8.10.  RESUME . . . . . . . . . . . . . . . . . . . . . . . . .  64
144	     8.11.  CONTROL  . . . . . . . . . . . . . . . . . . . . . . . .  66
145	     8.12.  SPEAK-COMPLETE . . . . . . . . . . . . . . . . . . . . .  68
146	     8.13.  SPEECH-MARKER  . . . . . . . . . . . . . . . . . . . . .  69
147	     8.14.  DEFINE-LEXICON . . . . . . . . . . . . . . . . . . . . .  71
148	   9.  Speech Recognizer Resource  . . . . . . . . . . . . . . . . .  71
149	     9.1.   Recognizer State Machine . . . . . . . . . . . . . . . .  73
150	     9.2.   Recognizer Methods . . . . . . . . . . . . . . . . . . .  73
151	     9.3.   Recognizer Events  . . . . . . . . . . . . . . . . . . .  74
152	     9.4.   Recognizer Header Fields . . . . . . . . . . . . . . . .  74
153	       9.4.1.   Confidence Threshold . . . . . . . . . . . . . . . .  76
154	       9.4.2.   Sensitivity Level  . . . . . . . . . . . . . . . . .  76
155	       9.4.3.   Speed Vs Accuracy  . . . . . . . . . . . . . . . . .  77
156	       9.4.4.   N Best List Length . . . . . . . . . . . . . . . . .  77
157	       9.4.5.   Input Type . . . . . . . . . . . . . . . . . . . . .  77
158	       9.4.6.   No Input Timeout . . . . . . . . . . . . . . . . . .  77
159	       9.4.7.   Recognition Timeout  . . . . . . . . . . . . . . . .  78
160	       9.4.8.   Waveform URI . . . . . . . . . . . . . . . . . . . .  78
161	       9.4.9.   Media Type . . . . . . . . . . . . . . . . . . . . .  79
162	       9.4.10.  Input-Waveform-URI . . . . . . . . . . . . . . . . .  79
163	       9.4.11.  Completion Cause . . . . . . . . . . . . . . . . . .  79
164	       9.4.12.  Completion Reason  . . . . . . . . . . . . . . . . .  81
165	       9.4.13.  Recognizer Context Block . . . . . . . . . . . . . .  81
166	       9.4.14.  Start Input Timers . . . . . . . . . . . . . . . . .  82
167	       9.4.15.  Speech Complete Timeout  . . . . . . . . . . . . . .  82
168	       9.4.16.  Speech Incomplete Timeout  . . . . . . . . . . . . .  83
169	       9.4.17.  DTMF Interdigit Timeout  . . . . . . . . . . . . . .  83
170	       9.4.18.  DTMF Term Timeout  . . . . . . . . . . . . . . . . .  84
171	       9.4.19.  DTMF-Term-Char . . . . . . . . . . . . . . . . . . .  84
172	       9.4.20.  Failed URI . . . . . . . . . . . . . . . . . . . . .  84
173	       9.4.21.  Failed URI Cause . . . . . . . . . . . . . . . . . .  84
174	       9.4.22.  Save Waveform  . . . . . . . . . . . . . . . . . . .  85
175	       9.4.23.  New Audio Channel  . . . . . . . . . . . . . . . . .  85
176	       9.4.24.  Speech-Language  . . . . . . . . . . . . . . . . . .  85
177	       9.4.25.  Ver-Buffer-Utterance . . . . . . . . . . . . . . . .  86
178	       9.4.26.  Recognition-Mode . . . . . . . . . . . . . . . . . .  86
179	       9.4.27.  Cancel-If-Queue  . . . . . . . . . . . . . . . . . .  86
180	       9.4.28.  Hotword-Max-Duration . . . . . . . . . . . . . . . .  87
181	       9.4.29.  Hotword-Min-Duration . . . . . . . . . . . . . . . .  87
182	       9.4.30.  Interpret-Text . . . . . . . . . . . . . . . . . . .  87
183	       9.4.31.  DTMF-Buffer-Time . . . . . . . . . . . . . . . . . .  87
184	       9.4.32.  Clear-DTMF-Buffer  . . . . . . . . . . . . . . . . .  88
185	       9.4.33.  Early-No-Match . . . . . . . . . . . . . . . . . . .  88
186	       9.4.34.  Num-Min-Consistent-Pronunciations  . . . . . . . . .  88
187	       9.4.35.  Consistency-Threshold  . . . . . . . . . . . . . . .  89
188	       9.4.36.  Clash-Threshold  . . . . . . . . . . . . . . . . . .  89
189	       9.4.37.  Personal-Grammar-URI . . . . . . . . . . . . . . . .  89
190	       9.4.38.  Enroll-Utterance . . . . . . . . . . . . . . . . . .  89
191	       9.4.39.  Phrase-Id  . . . . . . . . . . . . . . . . . . . . .  90
192	       9.4.40.  Phrase-NL  . . . . . . . . . . . . . . . . . . . . .  90
193	       9.4.41.  Weight . . . . . . . . . . . . . . . . . . . . . . .  90
194	       9.4.42.  Save-Best-Waveform . . . . . . . . . . . . . . . . .  91
195	       9.4.43.  New-Phrase-Id  . . . . . . . . . . . . . . . . . . .  91
196	       9.4.44.  Confusable-Phrases-URI . . . . . . . . . . . . . . .  91
197	       9.4.45.  Abort-Phrase-Enrollment  . . . . . . . . . . . . . .  91
198	     9.5.   Recognizer Message Body  . . . . . . . . . . . . . . . .  91
199	       9.5.1.   Recognizer Grammar Data  . . . . . . . . . . . . . .  92
200	       9.5.2.   Recognizer Result Data . . . . . . . . . . . . . . .  96
201	       9.5.3.   Enrollment Result Data . . . . . . . . . . . . . . .  97
202	       9.5.4.   Recognizer Context Block . . . . . . . . . . . . . .  97
203	     9.6.   Recognizer Results . . . . . . . . . . . . . . . . . . .  97
204	       9.6.1.   Markup Functions . . . . . . . . . . . . . . . . . .  98
205	       9.6.2.   Overview of Recognizer Result Elements and their
206	                Relationships  . . . . . . . . . . . . . . . . . . .  99
207	       9.6.3.   Elements and Attributes  . . . . . . . . . . . . . .  99
208	     9.7.   Enrollment Results . . . . . . . . . . . . . . . . . . . 104
209	       9.7.1.   NUM-CLASHES Element  . . . . . . . . . . . . . . . . 104
210	       9.7.2.   NUM-GOOD-REPETITIONS Element . . . . . . . . . . . . 105
211	       9.7.3.   NUM-REPETITIONS-STILL-NEEDED Element . . . . . . . . 105
212	       9.7.4.   CONSISTENCY-STATUS Element . . . . . . . . . . . . . 105
213	       9.7.5.   CLASH-PHRASE-IDS Element . . . . . . . . . . . . . . 105
214	       9.7.6.   TRANSCRIPTIONS Element . . . . . . . . . . . . . . . 105
215	       9.7.7.   CONFUSABLE-PHRASES Element . . . . . . . . . . . . . 105
216	     9.8.   DEFINE-GRAMMAR . . . . . . . . . . . . . . . . . . . . . 105
217	     9.9.   RECOGNIZE  . . . . . . . . . . . . . . . . . . . . . . . 109
218	     9.10.  STOP . . . . . . . . . . . . . . . . . . . . . . . . . . 115
219	     9.11.  GET-RESULT . . . . . . . . . . . . . . . . . . . . . . . 116
220	     9.12.  START-OF-INPUT . . . . . . . . . . . . . . . . . . . . . 117
221	     9.13.  START-INPUT-TIMERS . . . . . . . . . . . . . . . . . . . 118
222	     9.14.  RECOGNITION-COMPLETE . . . . . . . . . . . . . . . . . . 118
223	     9.15.  START-PHRASE-ENROLLMENT  . . . . . . . . . . . . . . . . 120
224	     9.16.  ENROLLMENT-ROLLBACK  . . . . . . . . . . . . . . . . . . 121
225	     9.17.  END-PHRASE-ENROLLMENT  . . . . . . . . . . . . . . . . . 122
226	     9.18.  MODIFY-PHRASE  . . . . . . . . . . . . . . . . . . . . . 122
227	     9.19.  DELETE-PHRASE  . . . . . . . . . . . . . . . . . . . . . 123
228	     9.20.  INTERPRET  . . . . . . . . . . . . . . . . . . . . . . . 123
229	     9.21.  INTERPRETATION-COMPLETE  . . . . . . . . . . . . . . . . 124
230	     9.22.  DTMF Detection . . . . . . . . . . . . . . . . . . . . . 126
231	   10. Recorder Resource . . . . . . . . . . . . . . . . . . . . . . 126
232	     10.1.  Recorder State Machine . . . . . . . . . . . . . . . . . 127
233	     10.2.  Recorder Methods . . . . . . . . . . . . . . . . . . . . 127
234	     10.3.  Recorder Events  . . . . . . . . . . . . . . . . . . . . 127
235	     10.4.  Recorder Header Fields . . . . . . . . . . . . . . . . . 127
236	       10.4.1.  Sensitivity Level  . . . . . . . . . . . . . . . . . 128
237	       10.4.2.  No Input Timeout . . . . . . . . . . . . . . . . . . 128
238	       10.4.3.  Completion Cause . . . . . . . . . . . . . . . . . . 128
239	       10.4.4.  Completion Reason  . . . . . . . . . . . . . . . . . 129
240	       10.4.5.  Failed URI . . . . . . . . . . . . . . . . . . . . . 129
241	       10.4.6.  Failed URI Cause . . . . . . . . . . . . . . . . . . 129
242	       10.4.7.  Record URI . . . . . . . . . . . . . . . . . . . . . 130
243	       10.4.8.  Media Type . . . . . . . . . . . . . . . . . . . . . 130
244	       10.4.9.  Max Time . . . . . . . . . . . . . . . . . . . . . . 130
245	       10.4.10. Trim-Length  . . . . . . . . . . . . . . . . . . . . 131
246	       10.4.11. Final Silence  . . . . . . . . . . . . . . . . . . . 131
247	       10.4.12. Capture On Speech  . . . . . . . . . . . . . . . . . 131
248	       10.4.13. Ver-Buffer-Utterance . . . . . . . . . . . . . . . . 131
249	       10.4.14. Start Input Timers . . . . . . . . . . . . . . . . . 132
250	       10.4.15. New Audio Channel  . . . . . . . . . . . . . . . . . 132
251	     10.5.  Recorder Message Body  . . . . . . . . . . . . . . . . . 132
252	     10.6.  RECORD . . . . . . . . . . . . . . . . . . . . . . . . . 132
253	     10.7.  STOP . . . . . . . . . . . . . . . . . . . . . . . . . . 133
254	     10.8.  RECORD-COMPLETE  . . . . . . . . . . . . . . . . . . . . 134
255	     10.9.  START-INPUT-TIMERS . . . . . . . . . . . . . . . . . . . 135
256	     10.10. START-OF-INPUT . . . . . . . . . . . . . . . . . . . . . 135
257	   11. Speaker Verification and Identification . . . . . . . . . . . 136
258	     11.1.  Speaker Verification State Machine . . . . . . . . . . . 137
259	     11.2.  Speaker Verification Methods . . . . . . . . . . . . . . 139
260	     11.3.  Verification Events  . . . . . . . . . . . . . . . . . . 140
261	     11.4.  Verification Header Fields . . . . . . . . . . . . . . . 140
262	       11.4.1.  Repository-URI . . . . . . . . . . . . . . . . . . . 141
263	       11.4.2.  Voiceprint-Identifier  . . . . . . . . . . . . . . . 141
264	       11.4.3.  Verification-Mode  . . . . . . . . . . . . . . . . . 142
265	       11.4.4.  Adapt-Model  . . . . . . . . . . . . . . . . . . . . 143
266	       11.4.5.  Abort-Model  . . . . . . . . . . . . . . . . . . . . 143
267	       11.4.6.  Min-Verification-Score . . . . . . . . . . . . . . . 143
268	       11.4.7.  Num-Min-Verification-Phrases . . . . . . . . . . . . 143
269	       11.4.8.  Num-Max-Verification-Phrases . . . . . . . . . . . . 144
270	       11.4.9.  No-Input-Timeout . . . . . . . . . . . . . . . . . . 144
271	       11.4.10. Save-Waveform  . . . . . . . . . . . . . . . . . . . 144
272	       11.4.11. Media Type . . . . . . . . . . . . . . . . . . . . . 145
273	       11.4.12. Waveform-URI . . . . . . . . . . . . . . . . . . . . 145
274	       11.4.13. Voiceprint-Exists  . . . . . . . . . . . . . . . . . 145
275	       11.4.14. Ver-Buffer-Utterance . . . . . . . . . . . . . . . . 146
276	       11.4.15. Input-Waveform-Uri . . . . . . . . . . . . . . . . . 146
277	       11.4.16. Completion-Cause . . . . . . . . . . . . . . . . . . 146
278	       11.4.17. Completion Reason  . . . . . . . . . . . . . . . . . 148
279	       11.4.18. Speech Complete Timeout  . . . . . . . . . . . . . . 148
280	       11.4.19. New Audio Channel  . . . . . . . . . . . . . . . . . 148
281	       11.4.20. Abort-Verification . . . . . . . . . . . . . . . . . 148
282	       11.4.21. Start Input Timers . . . . . . . . . . . . . . . . . 148
283	     11.5.  Verification Message Body  . . . . . . . . . . . . . . . 149
284	       11.5.1.  Verification Result Data . . . . . . . . . . . . . . 149
285	       11.5.2.  Verification Result Elements . . . . . . . . . . . . 149
286	     11.6.  START-SESSION  . . . . . . . . . . . . . . . . . . . . . 153
287	     11.7.  END-SESSION  . . . . . . . . . . . . . . . . . . . . . . 154
288	     11.8.  QUERY-VOICEPRINT . . . . . . . . . . . . . . . . . . . . 155
289	     11.9.  DELETE-VOICEPRINT  . . . . . . . . . . . . . . . . . . . 156
290	     11.10. VERIFY . . . . . . . . . . . . . . . . . . . . . . . . . 157
291	     11.11. VERIFY-FROM-BUFFER . . . . . . . . . . . . . . . . . . . 157
292	     11.12. VERIFY-ROLLBACK  . . . . . . . . . . . . . . . . . . . . 160
293	     11.13. STOP . . . . . . . . . . . . . . . . . . . . . . . . . . 160
294	     11.14. START-INPUT-TIMERS . . . . . . . . . . . . . . . . . . . 161
295	     11.15. VERIFICATION-COMPLETE  . . . . . . . . . . . . . . . . . 162
296	     11.16. START-OF-INPUT . . . . . . . . . . . . . . . . . . . . . 162
297	     11.17. CLEAR-BUFFER . . . . . . . . . . . . . . . . . . . . . . 163
298	     11.18. GET-INTERMEDIATE-RESULT  . . . . . . . . . . . . . . . . 163
299	   12. Security Considerations . . . . . . . . . . . . . . . . . . . 164
300	     12.1.  Rendezvous and Session Establishment . . . . . . . . . . 165
301	     12.2.  Control channel protection . . . . . . . . . . . . . . . 165
302	     12.3.  Media session protection . . . . . . . . . . . . . . . . 165
303	     12.4.  Indirect Content Access  . . . . . . . . . . . . . . . . 165
304	     12.5.  Protection of stored media . . . . . . . . . . . . . . . 166
305	   13. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 166
306	     13.1.  New registries . . . . . . . . . . . . . . . . . . . . . 166
307	       13.1.1.  MRCPv2 resource types  . . . . . . . . . . . . . . . 166
308	       13.1.2.  MRCPv2 methods and events  . . . . . . . . . . . . . 167
309	       13.1.3.  MRCPv2 headers . . . . . . . . . . . . . . . . . . . 168
310	       13.1.4.  MRCPv2 status codes  . . . . . . . . . . . . . . . . 171
311	       13.1.5.  Grammar Reference List Parameters  . . . . . . . . . 171
312	       13.1.6.  MRCPv2 vendor-specific parameters  . . . . . . . . . 171
313	     13.2.  NLSML-related registrations  . . . . . . . . . . . . . . 172
314	       13.2.1.  application/nlsml+xml Media Type registration  . . . 172
315	     13.3.  NLSML XML Schema registration  . . . . . . . . . . . . . 173
316	     13.4.  MRCPv2 XML Namespace registration  . . . . . . . . . . . 173
317	     13.5.  text Media Type Registrations  . . . . . . . . . . . . . 173
318	       13.5.1.  text/grammar-ref-list  . . . . . . . . . . . . . . . 173
319	     13.6.  session URL scheme registration  . . . . . . . . . . . . 174
320	     13.7.  SDP parameter registrations  . . . . . . . . . . . . . . 175
321	       13.7.1.  sub-registry "proto" . . . . . . . . . . . . . . . . 175
322	       13.7.2.  sub-registry "att-field (session-level)" . . . . . . 176
323	       13.7.3.  sub-registry "att-field (media-level)" . . . . . . . 176
324	   14. Examples  . . . . . . . . . . . . . . . . . . . . . . . . . . 177
325	     14.1.  Message Flow . . . . . . . . . . . . . . . . . . . . . . 177
326	     14.2.  Recognition Result Examples  . . . . . . . . . . . . . . 186
327	       14.2.1.  Simple ASR Ambiguity . . . . . . . . . . . . . . . . 186
328	       14.2.2.  Mixed Initiative . . . . . . . . . . . . . . . . . . 187
329	       14.2.3.  DTMF Input . . . . . . . . . . . . . . . . . . . . . 188
330	       14.2.4.  Interpreting Meta-Dialog and Meta-Task Utterances  . 188
331	       14.2.5.  Anaphora and Deixis  . . . . . . . . . . . . . . . . 189
332	       14.2.6.  Distinguishing Individual Items from Sets with
333	                One Member . . . . . . . . . . . . . . . . . . . . . 190
334	       14.2.7.  Extensibility  . . . . . . . . . . . . . . . . . . . 191
335	   15. ABNF Normative Definition . . . . . . . . . . . . . . . . . . 191
336	   16. XML Schemas . . . . . . . . . . . . . . . . . . . . . . . . . 206
337	     16.1.  NLSML Schema Definition  . . . . . . . . . . . . . . . . 206
338	     16.2.  Enrollment Results Schema Definition . . . . . . . . . . 207
339	     16.3.  Verification Results Schema Definition . . . . . . . . . 208
340	   17. References  . . . . . . . . . . . . . . . . . . . . . . . . . 212
341	     17.1.  Normative References . . . . . . . . . . . . . . . . . . 212
342	     17.2.  Informative References . . . . . . . . . . . . . . . . . 214
343	   Appendix A.  Contributors . . . . . . . . . . . . . . . . . . . . 216
344	   Appendix B.  Acknowledgements . . . . . . . . . . . . . . . . . . 217
345	   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . . 217

347	1.  Introduction

349	   The MRCPv2 protocol is designed to allow a client device to control
350	   media processing resources on the network.  These media
351	   processing resources include speech recognition engines, speech
352	   synthesis engines, and speaker verification and identification
353	   engines.  MRCPv2 enables the implementation of distributed
354	   Interactive Voice Response platforms using VoiceXML
355	   [W3C.REC-voicexml20-20040316] browsers or other client applications
356	   while maintaining separate back-end speech processing capabilities on
357	   specialized speech processing servers.  MRCPv2 is based on the
358	   earlier Media Resource Control Protocol (MRCP) [RFC4463] developed
359	   jointly by Cisco Systems, Inc., Nuance Communications, and
360	   Speechworks Inc.

362	   The protocol requirements of SPEECHSC [RFC4313] include that the
363	   solution be capable of reaching a media processing server and setting
364	   up communication channels to the media resources, and sending and
365	   receiving control messages and media streams to/from the server.  The
366	   Session Initiation Protocol (SIP) [RFC3261] meets these requirements.

368	   Note that the above-mentioned requirements document, RFC 4313, goes
369	   into detail on alternatives to SIP, such as RTSP [RFC2326], and why
370	   MRCPv2 does not use RTSP, even though the proprietary version of MRCP
371	   did run over RTSP.

373	   MRCPv2 leverages these capabilities by building upon SIP and the
374	   Session Description Protocol (SDP) [RFC4566].  MRCPv2 uses SIP to
375	   set up and tear down media and control sessions with the server.  In
376	   addition, the client can use a SIP re-INVITE method (an INVITE dialog
377	   sent within an existing SIP Session) to change the characteristics of
378	   these media and control sessions while maintaining the SIP dialog
379	   between the client and server.  SDP is used to describe the
380	   parameters of the media sessions associated with that dialog.  It is
381	   mandatory to support SIP as the session establishment protocol to
382	   ensure interoperability.  Other protocols can be used for session
383	   establishment by prior agreement.  This document only describes the
384	   use of SIP and SDP.

386	   MRCPv2 uses SIP and SDP to create the speech client/server dialog and
387	   set up the media channels to the server.  It also uses SIP and SDP to
388	   establish MRCPv2 control sessions between the client and the server
389	   for each media processing resource required for that dialog.  The
390	   MRCPv2 protocol exchange between the client and the media resource is
391	   carried on that control session.  MRCPv2 protocol exchanges do not
392	   change the state of the SIP dialog, the media sessions, or other
393	   parameters of the dialog initiated via SIP.  They control and affect
394	   the state of the media processing resource associated with the MRCPv2
395	   session(s).

397	   MRCPv2 defines the messages to control the different media processing
398	   resources and the state machines required to guide their operation.
399	   It also describes how these messages are carried over a transport
400	   layer protocol such as TCP or TLS (Note: SCTP is a viable transport
401	   for MRCPv2 as well, but the mapping onto SCTP is not described in
402	   this specification).

404	2.  Document Conventions

406	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
407	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
408	   document are to be interpreted as described in RFC2119 [RFC2119].

410	   Since many of the definitions and syntax are identical to HTTP/1.1
411	   (RFC2616 [RFC2616]), this specification refers to the section where
412	   they are defined rather than copying it.  For brevity, [HX.Y] is to
413	   be taken to refer to Section X.Y of RFC2616.

415	   All the mechanisms specified in this document are described in both
416	   prose and an augmented Backus-Naur form (ABNF [RFC5234]).

418	   The complete message format in ABNF form is provided in Section 15
419	   and is the normative format definition.

421	2.1.  Definitions

423	   Media Resource
424	                  An entity on the speech processing server that can be
425	                  controlled through the MRCPv2 protocol.
426	   MRCP Server
427	                  Aggregate of one or more "Media Resource" entities on
428	                  a Server, exposed through the MRCPv2 protocol
429	                  ("Server" for short).
430	   MRCP Client
431	                  An entity controlling one or more Media Resources
432	                  through the MRCPv2 protocol ("Client" for short).
433	   DTMF
434	                  Dual Tone Multi-Frequency; a method of transmitting
435	                  key presses in-band, either as actual tones (Q.23
436	                  [Q.23]) or as named tone events (RFC4733 [RFC4733]).
437	   Endpointing
438	                  The process of automatically detecting the beginning
439	                  and end of speech in an audio stream.  This is
440	                  critical both for speech recognition and for automated
441	                  recording as one would find in voice mail systems.

443	   Hotword Mode
444	                  A mode of speech recognition where a stream of
445	                  utterances is evaluated for match against a small set
446	                  of command words.  This is generally employed to
447	                  either trigger some action, or to control the
448	                  subsequent grammar to be used for further recognition.

450	2.2.  State-Machine Diagrams

452	   The state-machine diagrams in this document do not show every
453	   possible method call.  Rather, they reflect the state of the resource
454	   based on the methods that have moved to IN-PROGRESS or COMPLETE
455	   states (see Section 5.3).  Note that since PENDING requests
456	   essentially have not affected the resource yet and are in queue to be
457	   processed, they are not reflected in the state-machine diagrams.

459	3.  Architecture

461	   A system using MRCPv2 consists of a client that requires the
462	   generation and/or consumption of media streams and a media resource
463	   server that has the resources or "engines" to process these streams
464	   as input or generate these streams as output.  The client uses SIP
465	   and SDP to establish an MRCPv2 control channel with the server to use
466	   its media processing resources.  MRCPv2 servers are addressed using
467	   SIP URIs.

469	   The session management protocol (SIP) uses SDP with the offer/answer
470	   model described in RFC3264 [RFC3264] to set up the MRCPv2 control
471	   channels and describe their characteristics.  A separate MRCPv2
472	   session is needed to control each of the media processing resources
473	   associated with the SIP dialog between the client and server.  Within
474	   a SIP dialog, the individual resource control channels for the
475	   different resources are added or removed through SDP offer/answer
476	   carried in a SIP re-INVITE transaction.

478	   The server, through the SDP exchange, provides the client with an
479	   unambiguous channel identifier and a TCP port number.  The client MAY
480	   then open a new TCP connection with the server on this port number.
481	   Multiple MRCPv2 channels can share a TCP connection between the
482	   client and the server.  All MRCPv2 messages exchanged between the
483	   client and the server carry the specified channel identifier that the
484	   server MUST ensure is unambiguous among all MRCPv2 control channels
485	   that are active on that server.  The client uses this channel
486	   identifier to indicate the media processing resource associated with
487	   that channel.  For information on message framing, see Section 5.

489	   The session management protocol (SIP) also establishes the media
490	   sessions between the client (or other source/sink of media) and the
491	   MRCPv2 server using SDP m-lines.  One or more media processing
492	   resources may share a media session under a SIP session, or each
493	   media processing resource may have its own media session.

495	   The following diagram shows the general architecture of a system that
496	   uses MRCPv2.  To simplify the diagram only a few resources are shown.

498	     MRCPv2 client                   MRCPv2 Media Resource Server
499	|--------------------|            |------------------------------------|
500	||------------------||            ||----------------------------------||
501	|| Application Layer||            ||Synthesis|Recognition|Verification||
502	||------------------||            || Engine  |  Engine   |   Engine   ||
503	||Media Resource API||            ||    ||   |    ||     |    ||      ||
504	||------------------||            ||Synthesis|Recognizer |  Verifier  ||
505	|| SIP  |  MRCPv2   ||            ||Resource | Resource  |  Resource  ||
506	||Stack |           ||            ||     Media Resource Management    ||
507	||      |           ||            ||----------------------------------||
508	||------------------||            ||   SIP  |        MRCPv2           ||
509	||   TCP/IP Stack   ||---MRCPv2---||  Stack |                         ||
510	||                  ||            ||----------------------------------||
511	||------------------||----SIP-----||           TCP/IP Stack           ||
512	|--------------------|            ||                                  ||
513	         |                        ||----------------------------------||
514	        SIP                       |------------------------------------|
515	         |                          /
516	|-------------------|             RTP
517	|                   |             /
518	| Media Source/Sink |------------/
519	|                   |
520	|-------------------|

522	                      Figure 1: Architectural Diagram

524	3.1.  MRCPv2 Media Resource Types

526	   An MRCPv2 server may offer one or more of the following media
527	   processing resources to its clients.
528	   Basic Synthesizer
529	                  A speech synthesizer resource with very limited
530	                  capabilities, that can generate its media stream
531	                  exclusively from concatenated audio clips.  The speech
532	                  data is described using a limited subset of SSML
533	                  [W3C.REC-speech-synthesis-20040907] elements.  A basic
534	                  synthesizer MUST support the SSML tags <speak>,
535	                  <audio>, <say-as> and <mark>.

537	   Speech Synthesizer
538	                  A full capability speech synthesis resource capable of
539	                  rendering speech from text.  Such a synthesizer MUST
540	                  have full SSML [W3C.REC-speech-synthesis-20040907]
541	                  support.
542	   Recorder
543	                  A resource capable of recording audio and providing a
544	                  URI pointer to the recording.  A recorder MUST provide
545	                  some endpointing capabilities for suppressing silence
546	                  at the beginning and end of a recording, and MAY also
547	                  suppress silence in the middle of a recording.  If
548	                  such suppression is done, the recorder MUST maintain
549	                  timing metadata to indicate the actual time stamps of
550	                  the recorded media.
551	   DTMF Recognizer
552	                  A recognition resource capable of extracting and
553	                  interpreting DTMF digits in a media stream and
554	                  matching them against a supplied digit grammar.  It
555	                  could also do a semantic interpretation based on
556	                  semantic tags in the grammar.
557	   Speech Recognizer
558	                  A full speech recognition resource that is capable of
559	                  receiving a media stream containing audio and
560	                  interpreting it to recognition results.  It also has a
561	                  natural language semantic interpreter to post-process
562	                  the recognized data according to the semantic data in
563	                  the grammar and provide semantic results along with
564	                  the recognized input.  The recognizer may also support
565	                  enrolled grammars, where the client can enroll and
566	                  create new personal grammars for use in future
567	                  recognition operations.
568	   Speaker Verifier
569	                  A resource capable of verifying the authenticity of a
570	                  claimed identity by matching a media stream containing
571	                  spoken input to a pre-existing voiceprint.  This may
572	                  also involve matching the caller's voice against more
573	                  than one voiceprint, also called multi-verification or
574	                  speaker identification.

576	3.2.  Server and Resource Addressing

578	   The MRCPv2 server is a generic SIP server, and is thus addressed by a
579	   SIP URI.

581	   For example:

583	        sip:mrcpv2@example.net

585	4.  MRCPv2 Protocol Basics

587	   MRCPv2 requires a connection-oriented transport layer protocol such
588	   as TCP or SCTP to guarantee reliable sequencing and delivery of
589	   MRCPv2 control messages between the client and the server.  In order
590	   to meet the requirements for security enumerated in SpeechSC
591	   Requirements [RFC4313], clients and servers MUST implement TLS as
592	   well.  One or more connections between the client and the server can
593	   be shared among different MRCPv2 channels to the server.  The
594	   individual messages carry the channel identifier to differentiate
595	   messages on different channels.  MRCPv2 protocol encoding is text
596	   based with mechanisms to carry embedded binary data.  This allows
597	   arbitrary data like recognition grammars, recognition results,
598	   synthesizer speech markup etc. to be carried in MRCPv2 messages.  For
599	   information on message framing, see Section 5.

601	4.1.  Connecting to the Server

603	   MRCPv2 employs a session establishment and management protocol such
604	   as SIP in conjunction with SDP.  The client reaches an MRCPv2 server
605	   using conventional INVITE and other SIP requests for establishing,
606	   maintaining, and terminating SIP dialogs.  The SDP offer/answer
607	   exchange model over SIP is used to establish a resource control
608	   channel for each resource.  The SDP offer/answer exchange is also
609	   used to establish media sessions between the server and the source or
610	   sink of audio.

612	4.2.  Managing Resource Control Channels

614	   The client needs a separate MRCPv2 resource control channel to
615	   control each media processing resource under the SIP dialog.  A
616	   unique channel identifier string identifies these resource control
617	   channels.  The channel identifier is an unambiguous, opaque string
618	   followed by an "@", then by a string token specifying the type of
619	   resource.  The server generates the channel identifier and MUST make
620	   sure it does not clash with the identifier of any other MRCP channel
621	   currently allocated by that server.  MRCPv2 defines the following
622	   IANA-registered types of media processing resources.  Additional
623	   resource types, their associated methods/events and state machines
624	   may be added as described below in Section 13.

626	          +---------------+----------------------+--------------+
627	          | Resource Type | Resource Description | Described in |
628	          +---------------+----------------------+--------------+
629	          | speechrecog   | Speech Recognizer    | Section 9    |
630	          | dtmfrecog     | DTMF Recognizer      | Section 9    |
631	          | speechsynth   | Speech Synthesizer   | Section 8    |
632	          | basicsynth    | Basic Synthesizer    | Section 8    |
633	          | speakverify   | Speaker Verification | Section 11   |
634	          | recorder      | Speech Recorder      | Section 10   |
635	          +---------------+----------------------+--------------+

637	                              Resource Types
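
   As a non-normative illustration of the channel identifier layout
   described above (an opaque string, an "@", then the resource type
   token), the following Python sketch splits an identifier and checks
   the token against the table.  The function names are hypothetical
   and are not defined by this specification.

```python
# Resource type tokens copied from the Resource Types table above.
RESOURCE_TYPES = {
    "speechrecog", "dtmfrecog", "speechsynth",
    "basicsynth", "speakverify", "recorder",
}


def split_channel_id(channel_id):
    """Split '<opaque>@<resource-type>' into (opaque, resource_type).

    Hypothetical helper: raises ValueError when either part is missing.
    """
    opaque, sep, resource = channel_id.rpartition("@")
    if not sep or not opaque or not resource:
        raise ValueError("expected <opaque>@<resource-type>: %r" % channel_id)
    return opaque, resource


def is_known_resource(resource):
    """True if the token is one of the six types listed in the table."""
    return resource in RESOURCE_TYPES
```

   For instance, split_channel_id("32AECB234338@speechsynth") yields
   the pair ("32AECB234338", "speechsynth").  Because further resource
   types may be registered, an unknown token is reported separately
   rather than treated as a syntax error.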

639	   The SIP INVITE or re-INVITE transaction and the SDP offer/answer
640	   exchange it carries contain m-lines describing the resource control
641	   channel to be allocated.  There MUST be one SDP m-line for each
642	   MRCPv2 resource to be used in the session.  This m-line MUST have a
643	   media type field of "application" and a transport type field of
644	   either "TCP/MRCPv2" or "TCP/TLS/MRCPv2".  (The usage of SCTP with
645	   MRCPv2 may be addressed in a future specification).  The port number
646	   field of the m-line MUST contain the "discard" port of the transport
647	   protocol (port 9 for TCP) in the SDP offer from the client and MUST
648	   contain the TCP listen port on the server in the SDP answer.  The
649	   client may then either set up a TCP or TLS connection to that server
650	   port or share an already established connection to that port.  Since
651	   MRCPv2 allows multiple sessions to share the same TCP connection,
652	   multiple m-lines in a single SDP document may share the same port
653	   field value; MRCPv2 servers MUST NOT assume any relationship between
654	   resources using the same port other than the sharing of the
655	   communication channel.

657	   MRCPv2 resources do not use the port or format field of the m-line to
658	   distinguish themselves from other resources using the same channel.
659	   The client MUST specify the resource type identifier in the resource
660	   attribute associated with the control m-line of the SDP offer.  The
661	   server MUST respond with the full Channel-Identifier (which includes
662	   the resource type identifier and an unambiguous string) in the
663	   "channel" attribute associated with the control m-line of the SDP
664	   answer.  To remain backwards compatible with conventional SDP usage,
665	   the format field of the m-line MUST have the arbitrarily-selected
666	   value of "1".
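
   The m-line constraints above can be checked mechanically.  The
   following non-normative Python sketch assumes the m-line has already
   been isolated as a single text line; the function name and the
   wording of the messages are illustrative only.  Port 0 is accepted
   in an offer because it is used to de-allocate a channel.

```python
ALLOWED_TRANSPORTS = ("TCP/MRCPv2", "TCP/TLS/MRCPv2")


def check_control_mline(mline, is_offer):
    """Return a list of problems found in one control m-line (empty if none)."""
    fields = mline.strip().split()
    if len(fields) != 4 or not fields[0].startswith("m="):
        return ["not an m-line with exactly four fields"]
    media, port, transport, fmt = fields[0][2:], fields[1], fields[2], fields[3]
    problems = []
    if media != "application":
        problems.append("media type must be 'application'")
    if transport not in ALLOWED_TRANSPORTS:
        problems.append("transport must be TCP/MRCPv2 or TCP/TLS/MRCPv2")
    if fmt != "1":
        problems.append("format field must be '1'")
    # An offer carries the discard port (9); port 0 de-allocates a channel.
    if is_offer and port not in ("9", "0"):
        problems.append("offer must use the discard port 9")
    return problems
```

   The check looks only at the m-line itself; the "resource" and
   "channel" attributes would need a separate pass over the a= lines
   that follow it.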

668	   When the client wants to add a media processing resource to the
669	   session, it issues a SIP re-INVITE transaction.  The SDP offer/answer
670	   exchange carried by this SIP transaction contains one or more
671	   additional control m-lines for the new resources to be allocated to
672	   the session.  The server, on seeing the new m-line, allocates the
673	   resources (if they are available) and responds with a corresponding
674	   control m-line in the SDP answer carried in the SIP response.  If the
675	   new resources are not available, existing media processing going on
676	   before the re-INVITE will continue as it was before.

678	   The a=setup attribute, as described in RFC4145 [RFC4145], MUST be
679	   "active" for the offer from the client and MUST be "passive" for the
680	   answer from the MRCPv2 server.  The a=connection attribute MUST have
681	   a value of "new" on the very first control m-line offer from the
682	   client to an MRCPv2 server.  Subsequent control m-line offers from
683	   the client to the MRCP server MAY contain "new" or "existing",
684	   depending on whether the client wants to set up a new connection or
685	   share an existing connection, respectively.  If the client specifies
686	   a value of "new", the server MUST respond with a value of "new".  If
687	   the client specifies a value of "existing", the server MAY respond
688	   with a value of "existing" if it prefers to share an existing
689	   connection or can answer with a value of "new", in which case the
690	   client MUST initiate a new transport connection.
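
   The a=connection rules above reduce to a small decision function.
   The sketch below is non-normative; the "server_prefers_sharing" flag
   is an assumed stand-in for whatever local policy a server applies.

```python
def answer_connection(offered, server_prefers_sharing):
    """Pick the a=connection value for the server's SDP answer."""
    if offered == "new":
        # The server MUST answer "new" when the client offers "new".
        return "new"
    if offered == "existing":
        # The server MAY share an existing connection or insist on a new one.
        return "existing" if server_prefers_sharing else "new"
    raise ValueError("unknown a=connection value: %r" % offered)


def client_must_connect(answered):
    """True when the answer obliges the client to open a new connection."""
    return answered == "new"
```

   A client that offered "existing" but received "new" therefore opens
   a fresh transport connection to the port given in the answer.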

692	   When the client wants to de-allocate the resource from this session,
693	   it issues a SIP re-INVITE transaction with the server.  The SDP MUST
694	   offer the control m-line with port 0.  The server MUST then answer
695	   the control m-line with a response of port 0.  This de-allocates the
696	   associated MRCPv2 identifier and resource.  The server MUST NOT close
697	   the TCP, SCTP or TLS connection if it is currently being shared among
698	   multiple MRCP channels.  When all MRCP channels that may be sharing
699	   the connection are released and/or the associated SIP dialog is
700	   terminated, the client or server terminates the connection.
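
   One way to honor the rule that a shared connection stays open while
   any channel still uses it is to track the channels per connection.
   This is a minimal, non-normative sketch; the class and its members
   are hypothetical, and no real socket handling is shown.

```python
class SharedConnection:
    """Tracks which MRCPv2 channels currently share one transport connection."""

    def __init__(self):
        self.channels = set()
        self.open = True

    def allocate(self, channel_id):
        """Record a channel that was added through an SDP offer/answer."""
        self.channels.add(channel_id)

    def release(self, channel_id):
        """Drop one channel; mark the connection closed once none remain.

        Returns True while the connection must stay open.
        """
        self.channels.discard(channel_id)
        if not self.channels:
            self.open = False
        return self.open
```

   Releasing the recognizer channel in the examples that follow would
   leave the synthesizer channel in the set, so the connection would
   remain open.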

702	   All servers MUST support TLS.  Servers MAY support TCP without TLS in
703	   physically secure environments.  It is up to the client, through the
704	   SDP offer, to choose which transport it wants to use for an MRCPv2
705	   session.  Aside from the exceptions given above, when using TCP the
706	   m-lines MUST conform to RFC4145 [RFC4145], which describes the usage
707	   of SDP for connection-oriented transport.  When using TLS the SDP
708	   m-line for the control pipe MUST conform to comedia over TLS
709	   [RFC4572], which specifies the usage of SDP for establishing a secure
710	   connection-oriented transport over TLS.

712	   This first example shows the power of using SIP to route to the
713	   appropriate resource.  In the example, note the use of a request to a
714	   domain's speech server service in the INVITE to
715	   mresources@example.com.  The SIP routing machinery in the domain
716	   locates the actual server, mresources@server.example.com, which gets
717	   returned in the 200 OK.  Note that "cmid" is defined in Section 4.3.

719	   This example exchange adds a resource control channel for a
720	   synthesizer.  Since a synthesizer also generates an audio stream,
721	   this interaction also creates a receive-only RTP media session for
722	   the server to send audio to.  The SIP dialog with the media source/
723	   sink is independent of MRCP and is not shown.

725	   C->S:  INVITE sip:mresources@example.com SIP/2.0
726	          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
727	           branch=z9hG4bK74bf9
728	          Max-Forwards:6
729	          To:MediaServer <sip:mresources@example.com>
730	          From:sarvi <sip:sarvi@example.com>;tag=1928301774
731	          Call-ID:a84b4c76e66710
732	          CSeq:314161 INVITE
733	          Contact:<sip:sarvi@client.example.com>
734	          Content-Type:application/sdp
735	          Content-Length:...

737	          v=0
738	          o=sarvi 2890844526 2890842808 IN IP4 192.0.2.4
739	          s=-
740	          c=IN IP4 192.0.2.12
741	          m=application 9 TCP/MRCPv2 1
742	          a=setup:active
743	          a=connection:new
744	          a=resource:speechsynth
745	          a=cmid:1
746	          m=audio 49170 RTP/AVP 0
747	          a=rtpmap:0 pcmu/8000
748	          a=recvonly
749	          a=mid:1

751	   S->C:  SIP/2.0 200 OK
752	          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
753	           branch=z9hG4bK74bf9
754	          To:MediaServer <sip:mresources@example.com>;tag=62784
755	          From:sarvi <sip:sarvi@example.com>;tag=1928301774
756	          Call-ID:a84b4c76e66710
757	          CSeq:314161 INVITE
758	          Contact:<sip:mresources@server.example.com>
759	          Content-Type:application/sdp
760	          Content-Length:...

762	          v=0
763	          o=- 2890844526 2890842808 IN IP4 192.0.2.4
764	          s=-
765	          c=IN IP4 192.0.2.11
766	          m=application 32416 TCP/MRCPv2 1
767	          a=setup:passive
768	          a=connection:new
769	          a=channel:32AECB234338@speechsynth
770	          a=cmid:1
771	          m=audio 48260 RTP/AVP 0
772	          a=rtpmap:0 pcmu/8000
773	          a=sendonly
774	          a=mid:1

776	   C->S:  ACK sip:mresources@server.example.com SIP/2.0
777	          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
778	           branch=z9hG4bK74bf9
779	          Max-Forwards:6
780	          To:MediaServer <sip:mresources@example.com>;tag=62784
781	          From:Sarvi <sip:sarvi@example.com>;tag=1928301774
782	          Call-ID:a84b4c76e66710
783	          CSeq:314161 ACK
784	          Content-Length:...

786	                 Example: Add Synthesizer Control Channel

788	   This example exchange continues from the previous figure and
789	   allocates an additional resource control channel for a recognizer.
790	   Since a recognizer would need to receive an audio stream for
791	   recognition, this interaction also updates the audio stream to
792	   sendrecv, making it a 2-way RTP media session.

794	   C->S:  INVITE sip:mresources@server.example.com SIP/2.0
795	          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
796	           branch=z9hG4bK74bf9
797	          Max-Forwards:6
798	          To:MediaServer <sip:mresources@example.com>;tag=62784
799	          From:sarvi <sip:sarvi@example.com>;tag=1928301774
800	          Call-ID:a84b4c76e66710
801	          CSeq:314163 INVITE
802	          Contact:<sip:sarvi@client.example.com>
803	          Content-Type:application/sdp
804	          Content-Length:...

806	          v=0
807	          o=sarvi 2890844526 2890842809 IN IP4 192.0.2.4
808	          s=-
809	          c=IN IP4 192.0.2.12
810	          m=application 9 TCP/MRCPv2 1
811	          a=setup:active
812	          a=connection:existing
813	          a=resource:speechsynth
814	          a=cmid:1
815	          m=audio 49170 RTP/AVP 0 96
816	          a=rtpmap:0 pcmu/8000
817	          a=rtpmap:96 telephone-event/8000
818	          a=fmtp:96 0-15
819	          a=sendrecv
820	          a=mid:1
821	          m=application 9 TCP/MRCPv2 1
822	          a=setup:active
823	          a=connection:existing
824	          a=resource:speechrecog
825	          a=cmid:1

827	   S->C:  SIP/2.0 200 OK
828	          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
829	           branch=z9hG4bK74bf9
830	          To:MediaServer <sip:mresources@example.com>;tag=62784
831	          From:sarvi <sip:sarvi@example.com>;tag=1928301774
832	          Call-ID:a84b4c76e66710
833	          CSeq:314163 INVITE
834	          Contact:<sip:mresources@server.example.com>
835	          Content-Type:application/sdp
836	          Content-Length:...

838	          v=0
839	          o=sarvi 2890844526 2890842809 IN IP4 192.0.2.4
840	          s=-
841	          c=IN IP4 192.0.2.11
842	          m=application 32416 TCP/MRCPv2 1
843	          a=setup:passive
844	          a=connection:existing
845	          a=channel:32AECB234338@speechsynth
846	          a=cmid:1
847	          m=audio 48260 RTP/AVP 0 96
848	          a=rtpmap:0 pcmu/8000
849	          a=rtpmap:96 telephone-event/8000
850	          a=fmtp:96 0-15
851	          a=sendrecv
852	          a=mid:1
853	          m=application 32416 TCP/MRCPv2 1
854	          a=setup:passive
855	          a=connection:existing
856	          a=channel:32AECB234338@speechrecog
857	          a=cmid:1

859	   C->S:  ACK sip:mresources@server.example.com SIP/2.0
860	          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
861	           branch=z9hG4bK74bf9

863	          Max-Forwards:6
864	          To:MediaServer <sip:mresources@example.com>;tag=62784
865	          From:Sarvi <sip:sarvi@example.com>;tag=1928301774
866	          Call-ID:a84b4c76e66710
867	          CSeq:314164 ACK
868	          Content-Length:...

870	                          Add Recognizer example

872	   This example exchange continues from the previous figure and de-
873	   allocates the recognizer channel.  Since a recognizer no longer needs to
874	   receive an audio stream, this interaction also updates the RTP media
875	   session to recvonly.

877	   C->S:  INVITE sip:mresources@server.example.com SIP/2.0
878	          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
879	           branch=z9hG4bK74bf9
880	          Max-Forwards:6
881	          To:MediaServer <sip:mresources@example.com>;tag=62784
882	          From:sarvi <sip:sarvi@example.com>;tag=1928301774
883	          Call-ID:a84b4c76e66710
884	          CSeq:314164 INVITE
885	          Contact:<sip:sarvi@client.example.com>
886	          Content-Type:application/sdp
887	          Content-Length:...

889	          v=0
890	          o=sarvi 2890844526 2890842809 IN IP4 192.0.2.4
891	          s=-
892	          c=IN IP4 192.0.2.12
893	          m=application 9 TCP/MRCPv2 1
894	          a=resource:speechsynth
895	          a=cmid:1
896	          m=audio 49170 RTP/AVP 0
897	          a=rtpmap:0 pcmu/8000
898	          a=recvonly
899	          a=mid:1
900	          m=application 0 TCP/MRCPv2 1
901	          a=resource:speechrecog
902	          a=cmid:1

904	   S->C:  SIP/2.0 200 OK
905	          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
906	           branch=z9hG4bK74bf9
907	          To:MediaServer <sip:mresources@example.com>;tag=62784
908	          From:sarvi <sip:sarvi@example.com>;tag=1928301774
909	          Call-ID:a84b4c76e66710
910	          CSeq:314164 INVITE
911	          Contact:<sip:mresources@server.example.com>
912	          Content-Type:application/sdp
913	          Content-Length:...

915	          v=0
916	          o=sarvi 2890844526 2890842809 IN IP4 192.0.2.4
917	          s=-
918	          c=IN IP4 192.0.2.11
919	          m=application 32416 TCP/MRCPv2 1
920	          a=channel:32AECB234338@speechsynth
921	          a=cmid:1
922	          m=audio 48260 RTP/AVP 0
923	          a=rtpmap:0 pcmu/8000
924	          a=sendonly
925	          a=mid:1
926	          m=application 0 TCP/MRCPv2 1
927	          a=channel:32AECB234338@speechrecog
928	          a=cmid:1

930	   C->S:  ACK sip:mresources@server.example.com SIP/2.0
931	          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
932	           branch=z9hG4bK74bf9
933	          Max-Forwards:6
934	          To:MediaServer <sip:mresources@example.com>;tag=62784
935	          From:Sarvi <sip:sarvi@example.com>;tag=1928301774
936	          Call-ID:a84b4c76e66710
937	          CSeq:314164 ACK
938	          Content-Length:...

940	                       Deallocate Recognizer example

942	4.3.  Media Streams and RTP Ports

944	   Since MRCPv2 resources either generate or consume media streams, the
945	   client or the server needs to associate media sessions with their
946	   corresponding resource or resources.  More than one resource could be
947	   associated with a single media session or each resource could be
948	   assigned a separate media session.  Also note that more than one
949	   media session can be associated with a single resource if need be,
950	   but this scenario is not useful for the current set of resources.
951	   For example, a synthesizer and a recognizer could be associated with
952	   the same media session (m=audio line), if it is opened in "sendrecv"
953	   mode.  Alternatively, the recognizer could have its own "sendonly"
954	   audio session and the synthesizer could have its own "recvonly" audio
955	   session.

957	   The association between control channels and their corresponding
958	   media sessions is established using a new "resource channel media
959	   identifier" media-level attribute ("cmid").  Valid values of this
960	   attribute are the values of the "mid" attribute defined in RFC3388
961	   [RFC3388].  If there is more than 1 audio m-line, then each audio
962	   m-line MUST have a "mid" attribute.  Each control m-line MAY have one
963	   or more "cmid" attributes that match the resource control channel to
964	   the "mid" attributes of the audio m-lines it is associated with.
965	   Note that if a control m-line does not have a "cmid" attribute it
966	   will not be associated with any media.  The operations on such a
967	   resource will hence be limited.  For example, if it was a recognizer
968	   resource, the RECOGNIZE method requires associated media to
969	   process while the INTERPRET method does not.  The formatting of the
970	   "cmid" attribute is described by the following ABNF:

972	   cmid-attribute = "a=cmid:" identification-tag
973	   identification-tag = token

975	   To allow this flexible mapping of media sessions to MRCPv2 control
976	   channels, a single audio m-line can be associated with multiple
977	   resources or each resource can have its own audio m-line.  For
978	   example, if the client wants to allocate a recognizer and a
979	   synthesizer and associate them with a single 2-way audio pipe, the
980	   SDP offer would contain two control m-lines and a single audio m-line
981	   with an attribute of "sendrecv".  Each of the control m-lines would
982	   have a "cmid" attribute whose value matches the "mid" of the audio
983	   m-line.  If, on the other hand, the client wants to allocate a
984	   recognizer and a synthesizer each with its own separate audio pipe,
985	   the SDP offer would carry two control m-lines (one for the recognizer
986	   and another for the synthesizer) and two audio m-lines (one with the
987	   attribute "sendonly" and another with attribute "recvonly").  The
988	   "cmid" attribute of the recognizer control m-line would match the
989	   "mid" value of the "sendonly" audio m-line and the "cmid" attribute
990	   of the synthesizer control m-line would match the "mid" attribute of
991	   the "recvonly" m-line.
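
   The "cmid" to "mid" matching described above can be sketched as
   follows.  This non-normative Python example assumes a simplified SDP
   reader that keeps only m= and a= lines; the function names are
   hypothetical, and real SDP parsing involves considerably more.

```python
def media_sections(sdp):
    """Split SDP text into one section per m-line (session part is dropped)."""
    sections = []
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            sections.append({"m": line[2:], "attrs": []})
        elif line.startswith("a=") and sections:
            sections[-1]["attrs"].append(line[2:])
    return sections


def attr_values(section, name):
    """Return the values of every 'a=<name>:<value>' attribute in a section."""
    prefix = name + ":"
    return [a[len(prefix):] for a in section["attrs"] if a.startswith(prefix)]


def associate(sdp):
    """Map each resource type to the audio m-lines its cmid values point at."""
    sections = media_sections(sdp)
    audio_by_mid = {}
    for section in sections:
        if section["m"].startswith("audio "):
            for mid in attr_values(section, "mid"):
                audio_by_mid[mid] = section["m"]
    result = {}
    for section in sections:
        if section["m"].startswith("application "):
            cmids = attr_values(section, "cmid")
            for resource in attr_values(section, "resource"):
                result[resource] = [audio_by_mid[c] for c in cmids
                                    if c in audio_by_mid]
    return result
```

   A control m-line without any "cmid" attribute maps to an empty list,
   which corresponds to a resource that is not associated with any
   media.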

993	   When a server receives media (e.g. audio) on a media session that is
994	   associated with more than one media processing resource, it is the
995	   responsibility of the server to receive and fork the media to the
996	   resources that need to consume it.  If multiple resources in an
997	   MRCPv2 session are generating audio (or other media) to be sent on a
998	   single associated media session, it is the responsibility of the
999	   server to either multiplex the multiple streams onto the single RTP
1000	   session or contain an embedded RTP mixer (see RFC3550 [RFC3550]) to
1001	   combine the multiple streams into one.  In the former case, the media
1002	   stream will contain RTP packets generated by different sources, and
1003	   hence the packets will have different Synchronization Source
1004	   identifiers (SSRCs).  In the latter case, the RTP packets will
1005	   contain multiple Contributing Source identifiers (CSRCs) for the
1006	   streams combined by the mixer.  An MRCPv2 implementation MUST either
1007	   multiplex or mix unless it cannot correctly do either, in which case
1008	   the server MUST disallow the client from associating multiple such
1009	   resources to a single audio pipe by rejecting the SDP offer with a
1010	   SIP 501 "Not Implemented" error.

1012	4.4.  MRCPv2 Message Transport

1014	   The MRCPv2 messages defined in this document are transported over a
1015	   TCP, TLS or SCTP (in the future) connection between the client and
1016	   the server.  The method for setting up this transport connection and
1017	   the resource control channel is discussed in Section 4.1 and
1018	   Section 4.2.  Multiple resource control channels between a client and
1019	   a server that belong to different SIP dialogs can share one or more
1020	   TLS, TCP or SCTP connections between them; the server and client MUST
1021	   support this mode of operation.  The individual MRCPv2 messages carry
1022	   the MRCPv2 channel identifier in their Channel-Identifier header,
1023	   which MUST be used to differentiate MRCPv2 messages from different
1024	   resource channels (see Section 6.2.1 for details).  All MRCPv2
1025	   servers MUST support TLS.  Servers MAY support TCP without TLS in
1026	   physically secure environments.  It is up to the client to choose
1027	   which mode of transport it wants to use for an MRCPv2 session.

1029	   Most examples from here on show only the MRCPv2 messages and do not
1030	   show the SIP messages and headers that may have been used to
1031	   establish the MRCPv2 control channel.

1033	5.  MRCPv2 Specification

1035	   MRCPv2 messages are textual using the ISO 10646 character set in the
1036	   UTF-8 encoding (RFC3629 [RFC3629]) to allow many different languages
1037	   to be represented.  However, to assist in compact representations,
1038	   MRCPv2 also allows message bodies to be represented in other
1039	   character sets such as ISO 8859-1.  This may be useful for languages
1040	   such as Chinese where the default character set for most documents is
1041	   not UTF-8.  The MRCPv2 protocol headers (the first line of an MRCP
1042	   message) and header names use only the US-ASCII subset of UTF-8.
1043	   Internationalization only applies to certain fields like grammar,
1044	   results, speech markup, etc., and not to MRCPv2 as a whole.

1046	   Lines are terminated by CRLF.  Also, some parameters in the message
1047	   may contain binary data or a record spanning multiple lines.  Such
1048	   fields have a length value associated with the parameter, which
1049	   indicates the number of octets immediately following the parameter.

1051	5.1.  Common Protocol Elements

1053	   The MRCPv2 message set consists of requests from the client to the
1054	   server, responses from the server to the client and asynchronous
1055	   events from the server to the client.  All these messages consist of
1056	   a start-line, one or more headers, an empty line (i.e. a line with
1057	   nothing preceding the CRLF) indicating the end of the header fields,
1058	   and an optional message body.

1060	   generic-message  =    start-line
1061	                         message-header
1062	                         CRLF
1063	                         [ message-body ]

1065	   start-line       =    request-line / response-line / event-line

1067	   message-header   =    1*(generic-header / resource-header)

1069	   resource-header  =    recognizer-header
1070	                    /    synthesizer-header
1071	                    /    recorder-header
1072	                    /    verifier-header

1074	   The message-body contains resource-specific and message-specific
1075	   data.  The actual Media Types used to carry the data are specified
1076	   later in the sections defining the individual messages.  Generic
1077	   headers are described in Section 6.2.

1079	   If a message contains a message body, the message MUST contain
1080	   content-headers indicating the Media Type and encoding of the data in
1081	   the message body.

1083	   Request, response and event messages (described in following
1084	   sections) include the version of MRCP that the message conforms to.
1085	   Version compatibility rules follow [H3.1] regarding version ordering,
1086	   compliance requirements, and upgrading of version numbers.  The
1087	   version information is indicated by "MRCP" (as opposed to "HTTP" in
1088	   [H3.1]) or "MRCP/2.0" (as opposed to "HTTP/1.1" in [H3.1]).  To be
1089	   compliant with this specification, clients and servers sending MRCPv2
1090	   messages MUST indicate an mrcp-version of "MRCP/2.0".

1092	   mrcp-version   =    "MRCP" "/" 1*2DIGIT "." 1*2DIGIT

1094	   The message-length field specifies the length of the message in
1095	   bytes, including the start-line, and MUST be the second token from the
1096	   beginning of the message.  This is to make the framing and parsing of
1097	   the message simpler to do.  This field specifies the length of the
1098	   message including data that may be encoded into the body of the
1099	   message.  Note that this value MAY be printed as a fixed-length
1100	   integer that is zero-padded in front in order to eliminate or reduce
1101	   inefficiency in cases where the message-length value would change as
1102	   a result of the length of the message-length token itself.

1104	   message-length =    1*19DIGIT
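
Because the message-length token is itself counted in the length it reports, a sender has to settle on a value that is consistent with its own width. The following Python sketch shows one way to do that by iterating until the length stops changing. The function name `build_request` and the list-of-pairs header representation are illustrative assumptions, not part of this specification.

```python
def build_request(method, request_id, headers, body=b""):
    """Serialize an MRCPv2 request (hypothetical helper).

    headers is a list of (name, value) pairs.  The message-length token
    counts every octet of the message, including the start-line and the
    token itself, so it is found by iterating to a fixed point.
    """
    header_block = "".join(f"{name}: {value}\r\n" for name, value in headers)
    # Everything that follows the message-length token.
    tail = f" {method} {request_id}\r\n{header_block}\r\n".encode("utf-8") + body

    length = 0
    while True:
        message = f"MRCP/2.0 {length}".encode("ascii") + tail
        if len(message) == length:
            return message
        # The guess was wrong (the token changed width); try again.
        length = len(message)


msg = build_request(
    "SET-PARAMS",
    543256,
    [("Channel-Identifier", "32AECB23433802@speechsynth"),
     ("Voice-gender", "female")],
)
```

The loop terminates after a few iterations because the total length can only grow when the token gains a digit. A sender that prefers a single pass could instead emit a zero-padded fixed-width token, as the text above permits.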

1106	   All MRCPv2 requests, responses and events MUST carry the Channel-
1107	   Identifier header so the server or client can differentiate messages
1108	   from different control channels that may share the same transport
1109	   connection.
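
As an illustration of how a receiver might use these two fields together, the Python sketch below splits a buffer read from a shared connection into complete messages using the message-length token, then looks up the Channel-Identifier of each message. The names `split_messages` and `channel_of` are hypothetical, and error handling for malformed input is deliberately minimal.

```python
def split_messages(buffer):
    """Split buffered bytes into complete MRCPv2 messages.

    Returns (messages, leftover); leftover holds a trailing partial
    message that needs more data from the connection.
    """
    messages = []
    while True:
        parts = buffer.split(b" ", 2)
        if len(parts) < 3:
            break  # message-length token not fully received yet
        length = int(parts[1])  # second token of the start-line
        if len(buffer) < length:
            break  # body or headers still in flight
        messages.append(buffer[:length])
        buffer = buffer[length:]
    return messages, buffer


def channel_of(message):
    """Return the Channel-Identifier value of one complete message."""
    for line in message.split(b"\r\n")[1:]:
        if line == b"":
            break  # empty line: end of headers
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"channel-identifier":
            return value.strip().decode("ascii")
    raise ValueError("Channel-Identifier header missing")
```

A dispatcher could then route each message to per-channel state keyed by the returned identifier.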

1111	   In the resource-specific header descriptions in sections 8-11, a
1112	   header is disallowed on a method (request, response, or event) for
1113	   that resource unless specifically listed as being allowed.  Also, the
1114	   phrasing "This header MAY occur on method X" indicates that the
1115	   header is allowed on that method but is not required to be used in
1116	   every instance of that method.

1118	5.2.  Request

1120	   An MRCPv2 request consists of a Request line followed by message
1121	   headers and an optional message body containing data specific to the
1122	   request message.

1124	   The Request message from a client to the server includes within the
1125	   first line the method to be applied, a method tag for that request
1126	   and the version of the protocol in use.

1128	   request-line   =    mrcp-version SP message-length SP method-name
1129	                       SP request-id CRLF

1131	   The mrcp-version field is the MRCP protocol version that is being
1132	   used by the client.

1134	   The message-length field specifies the length of the message,
1135	   including the start-line.

1137	   Details about the mrcp-version and message-length fields are given in
1138	   Section 5.1.

1140	   The method-name field identifies the specific request that the client
1141	   is making to the server.  Each resource supports a subset of the
1142	   MRCPv2 methods.  The subset for each resource is defined in the
1143	   section of the specification for the corresponding resource.

1145	   method-name    =    generic-method
1146	                  /    synthesizer-method
1147	                  /    recorder-method
1148	                  /    recognizer-method
1149	                  /    verifier-method

1151	   The request-id field is a unique identifier representable as an
1152	   unsigned 32-bit integer created by the client and sent to the server.
1153	   Consecutive requests within an MRCP session MUST utilize
1154	   monotonically increasing request-id's.  The request-id space is
1155	   linear (i.e., not modulo 2^32), so the space does not wrap and validity
1156	   can be checked with a simple unsigned comparison operation.  The
1157	   client may choose any initial value for its first request, but a
1158	   small integer is RECOMMENDED to avoid exhausting the space in long
1159	   sessions.  If the server receives duplicate or out-of-order requests
1160	   the server MUST reject the request with a response code of 410.
1161	   Since request-id's are scoped to the MRCP session, they are unique
1162	   across all TCP connections and all resource channels in the session.

1164	   The server resource MUST use the client-assigned identifier in its
1165	   response to the request.  If the request does not complete
1166	   synchronously, future asynchronous events associated with this
1167	   request MUST carry the client-assigned request-id.

1169	   request-id     =    1*10DIGIT
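
The request-id rules above can be checked on the server with a single comparison per request. The Python sketch below is one possible shape for that check, assuming one tracker object per MRCP session. The class name `RequestIdTracker` is hypothetical, and the choice to raise an exception for syntactically invalid tokens is an assumption, since this section only specifies the 410 response for duplicate or out-of-order values.

```python
import re

MAX_REQUEST_ID = 2**32 - 1            # unsigned 32-bit upper bound
_REQUEST_ID = re.compile(r"[0-9]{1,10}")  # request-id = 1*10DIGIT


class RequestIdTracker:
    """Tracks the last request-id accepted in one MRCP session."""

    def __init__(self):
        self.last_seen = None

    def accept(self, token):
        """Return True if the id is acceptable; False means respond 410."""
        if not _REQUEST_ID.fullmatch(token):
            raise ValueError("not a request-id token")
        value = int(token)
        if value > MAX_REQUEST_ID:
            raise ValueError("request-id exceeds 32 bits")
        if self.last_seen is not None and value <= self.last_seen:
            return False  # duplicate or out of order
        self.last_seen = value
        return True
```

Because the space does not wrap, the tracker never needs to reason about rollover.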

1171	5.3.  Response

1173	   After receiving and interpreting the request message for a method,
1174	   the server resource responds with an MRCPv2 response message.  The
1175	   response consists of a response line followed by message headers and
1176	   an optional message body containing data specific to the method.

1178	   response-line  =    mrcp-version SP message-length SP request-id
1179	                                    SP status-code SP request-state CRLF

1181	   The mrcp-version field MUST contain the version of the request if
1182	   supported; otherwise, it MUST contain the highest version of the
1183	   MRCPv2 protocol supported by the server.

1185	   The message-length field specifies the length of the message,
1186	   including the start-line.

1188	   Details about the mrcp-version and message-length fields are given in
1189	   Section 5.1.

1191	   The request-id used in the response MUST match the one sent in the
1192	   corresponding request message.

1194	   The status-code field is a 3-digit code representing the success or
1195	   failure or other status of the request.

1197	   The request-state field indicates if the action initiated by the
1198	   Request is PENDING, IN-PROGRESS or COMPLETE.  The COMPLETE status
1199	   means that the Request was processed to completion and that there
1200	   will be no more events or other messages from that resource to the
1201	   client with that request-id.  The PENDING status means that the
1202	   request has been placed on a queue and will be processed in first-in-
1203	   first-out order.  The IN-PROGRESS status means that the request is
1204	   being processed and is not yet complete.  A PENDING or IN-PROGRESS
1205	   status indicates that further Event messages may be delivered with
1206	   that request-id.

1208	   request-state    =  "COMPLETE"
1209	                    /  "IN-PROGRESS"
1210	                    /  "PENDING"

1212	5.4.  Status Codes

1214	   The status codes are classified under the Success (2XX) codes, Client
1215	   Failure (4XX) codes, and Server Failure (5XX) codes.

1217	                               Success Codes

1219	        +------------+--------------------------------------------+
1220	        | Code       | Meaning                                    |
1221	        +------------+--------------------------------------------+
1222	        | 200        | Success                                    |
1223	        | 201        | Success with some optional headers ignored |
1224	        +------------+--------------------------------------------+

1226	                                Success 2xx

1228	                         Client Failure 4xx Codes

1230	   +------------+------------------------------------------------------+
1231	   | Code       | Meaning                                              |
1232	   +------------+------------------------------------------------------+
1233	   | 401        | Method not allowed                                   |
1234	   | 402        | Method not valid in this state                       |
1235	   | 403        | Unsupported Header                                   |
1236	   | 404        | Illegal Value for Header.  This is the error for a   |
1237	   |            | syntax violation.                                    |
1238	   | 405        | Resource not allocated for this session or does not  |
1239	   |            | exist                                                |
1240	   | 406        | Mandatory Header Missing                             |
1241	   | 407        | Method or Operation Failed (e.g., Grammar            |
1242	   |            | compilation failed in the recognizer.  Detailed      |
1243	   |            | cause codes may be available through a resource      |
1244	   |            | specific header.)                                    |
1245	   | 408        | Unrecognized or unsupported message entity           |
1246	   | 409        | Unsupported Header Value.  This is a value that is   |
1247	   |            | syntactically legal but exceeds the implementation's |
1248	   |            | capabilities or expectations.                        |
1249	   | 410        | Non-Monotonic or Out of order sequence number in     |
1250	   |            | request.                                             |
1251	   | 411-420    | Reserved for future assignment                       |
1252	   +------------+------------------------------------------------------+

1254	                            Client Failure 4xx

1256	                         Server Failure 5xx Codes

1258	   +------------+------------------------------------------------------+
1259	   | Code       | Meaning                                              |
1260	   +------------+------------------------------------------------------+
1261	   | 501        | Server Internal Error                                |
1262	   | 502        | Protocol Version not supported                       |
1263	   | 503        | Proxy Timeout.  The MRCP Proxy did not receive a     |
1264	   |            | response from the MRCP server.                       |
1265	   | 504        | Message too large                                    |
1266	   +------------+------------------------------------------------------+

1268	                            Server Failure 5xx

1270	5.5.  Events

1272	   The server resource may need to communicate a change in state or the
1273	   occurrence of a certain event to the client.  These messages are used
1274	   when a request does not complete immediately and the response returns
1275	   a status of PENDING or IN-PROGRESS.  The intermediate results and
1276	   events of the request are indicated to the client through the event
1277	   message from the server.  The event message consists of an event
1278	   header line followed by message headers and an optional message body
1279	   containing data specific to the event message.  The header line has
1280	   the request-id of the corresponding request and a request-state value.  The
1281	   request-state value is COMPLETE if the request is done and this was
1282	   the last event, else it is IN-PROGRESS.

1284	   event-line       =  mrcp-version SP message-length SP event-name
1285	                                    SP request-id SP request-state CRLF

1287	   The mrcp-version used here is identical to the one used in the
1288	   Request/Response Line and indicates the version of the MRCPv2
1289	   protocol running on the server.

1291	   The message-length field specifies the length of the message,
1292	   including the start-line.

1294	   Details about the mrcp-version and message-length fields are given in
1295	   Section 5.1.

1297	   The event-name identifies the nature of the event generated by the
1298	   media resource.  The set of valid event names depends on the resource
1299	   generating it.  See the corresponding resource-specific section of
1300	   the document.

1302	   event-name       =  synthesizer-event
1303	                    /  recognizer-event
1304	                    /  recorder-event
1305	                    /  verifier-event

1307	   The request-id used in the event MUST match the one sent in the
1308	   request that caused this event.

1310	   The request-state indicates whether the Request/Command causing this
1311	   event is complete or still in progress, and is the same as the one
1312	   mentioned in Section 5.3.  The final event for a request has a
1313	   COMPLETE status indicating the completion of the request.
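
The three start-line forms (request-line, response-line and event-line) can be told apart by token count and by whether the third token is numeric. The Python sketch below illustrates this under the assumption that method and event names are never all digits; the function name `parse_start_line` and the dictionary layout it returns are hypothetical.

```python
import re

_VERSION = re.compile(r"MRCP/[0-9]{1,2}\.[0-9]{1,2}")
_DIGITS = re.compile(r"[0-9]+")
_STATUS = re.compile(r"[0-9]{3}")
REQUEST_STATES = {"COMPLETE", "IN-PROGRESS", "PENDING"}


def parse_start_line(line):
    """Classify and parse one MRCPv2 start-line into a dict."""
    tokens = line.rstrip("\r\n").split(" ")
    if len(tokens) not in (4, 5):
        raise ValueError("unexpected number of tokens")
    if not _VERSION.fullmatch(tokens[0]) or not _DIGITS.fullmatch(tokens[1]):
        raise ValueError("bad mrcp-version or message-length")
    common = {"version": tokens[0], "length": int(tokens[1])}

    if len(tokens) == 4:
        # request-line: version length method-name request-id
        if not _DIGITS.fullmatch(tokens[3]):
            raise ValueError("bad request-id")
        return {**common, "kind": "request",
                "method": tokens[2], "request_id": int(tokens[3])}

    if tokens[4] not in REQUEST_STATES:
        raise ValueError("bad request-state")
    if _DIGITS.fullmatch(tokens[2]):
        # response-line: version length request-id status-code request-state
        if not _STATUS.fullmatch(tokens[3]):
            raise ValueError("bad status-code")
        return {**common, "kind": "response",
                "request_id": int(tokens[2]),
                "status_code": int(tokens[3]),
                "request_state": tokens[4]}
    # event-line: version length event-name request-id request-state
    if not _DIGITS.fullmatch(tokens[3]):
        raise ValueError("bad request-id")
    return {**common, "kind": "event", "event_name": tokens[2],
            "request_id": int(tokens[3]), "request_state": tokens[4]}
```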

1315	6.  MRCPv2 Generic Methods, Headers, and Result Structure

1317	   MRCPv2 supports a set of methods and headers that are common to all
1318	   resources.  These are discussed here; resource-specific methods and
1319	   headers are discussed in the corresponding resource-specific section
1320	   of the document.

1322	6.1.  Generic Methods

1324	   MRCPv2 supports two generic methods for reading and writing the state
1325	   associated with a resource.

1327	   generic-method      =    "SET-PARAMS"
1328	                       /    "GET-PARAMS"

1330	   These are described in the following sub-sections.

1332	6.1.1.  SET-PARAMS

1334	   The "SET-PARAMS" method, from the client to the server, tells the
1335	   MRCPv2 resource to define parameters for the session, such as voice
1336	   characteristics and prosody on synthesizers, recognition timers on
1337	   recognizers, etc.  If the server accepts and sets all parameters it
1338	   MUST return a Response-Status of 200.  If it chooses to ignore some
1339	   optional headers that can be safely ignored without affecting
1340	   operation of the server it MUST return 201.

1342	   If one or more of the headers being sent is incorrect, error 403,
1343	   404, or 409 MUST be returned as follows:
1344	   o  If one or more of the headers being set has an illegal value, the
1345	      server MUST reject the request with a 404 Illegal Value for
1346	      Header.
1347	   o  If one or more of the headers being set is unsupported for the
1348	      resource, the server MUST reject the request with a 403
1349	      Unsupported Header, except as described in the next paragraph.
1350	   o  If one or more of the headers being set has an unsupported value,
1351	      the server MUST reject the request with a 409 Unsupported Header
1352	      Value, except as described in the next paragraph.

1354	   If both error 404 and another error have occurred, only error 404
1355	   MUST be returned.  If both errors 403 and 409 have occurred, but not
1356	   error 404, only error 403 MUST be returned.
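
The precedence among the SET-PARAMS status codes can be summarized in a few lines. The Python sketch below assumes the server has already classified each header into one of four hypothetical problem labels; the labels and the function name `set_params_status` are illustrative only.

```python
def set_params_status(problems):
    """Pick the SET-PARAMS status code from the problems found.

    problems is an iterable of labels (assumed names):
      "illegal-value"      -> 404 Illegal Value for Header
      "unsupported-header" -> 403 Unsupported Header
      "unsupported-value"  -> 409 Unsupported Header Value
      "ignored-optional"   -> 201 (optional headers safely ignored)
    """
    found = set(problems)
    if "illegal-value" in found:
        return 404  # 404 takes precedence over any other error
    if "unsupported-header" in found:
        return 403  # 403 takes precedence over 409
    if "unsupported-value" in found:
        return 409
    if "ignored-optional" in found:
        return 201
    return 200
```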

1358	   If error 403, 404, or 409 is returned, the response MUST include the
1359	   bad or unsupported headers and their values exactly as they were sent
1360	   from the client.  Session parameters modified using "SET-PARAMS" do
1361	   not override parameters explicitly specified on individual requests
1362	   or requests that are IN-PROGRESS.

1364	   C->S:  MRCP/2.0 124 SET-PARAMS 543256
1365	          Channel-Identifier:32AECB23433802@speechsynth
1366	          Voice-gender:female
1367	          Voice-variant:3

1369	   S->C:  MRCP/2.0 47 543256 200 COMPLETE
1370	          Channel-Identifier:32AECB23433802@speechsynth

1372	6.1.2.  GET-PARAMS

1374	   The "GET-PARAMS" method, from the client to the server, asks the
1375	   MRCPv2 resource for its current session parameters, such as voice
1376	   characteristics and prosody on synthesizers, recognition-timer on
1377	   recognizers, etc.  For every empty header field the client sends in
1378	   the request, the server MUST include the corresponding headers and
1379	   their values in the response.  If no parameter headers are specified
1380	   by the client then the server MUST return all the settable parameters
1381	   and their values in the corresponding headers of the response,
1382	   including vendor-specific parameters.  Such wild-card parameter
1383	   requests can be very processing-intensive, since the number of
1384	   settable parameters can be large depending on the implementation.
1385	   Hence, it is RECOMMENDED that the client not use the wildcard
1386	   "GET-PARAMS" operation very often.  Note that "GET-PARAMS" returns
1387	   header values that apply to the whole session and not values that
1388	   have a request level scope.

1390	   If all of the headers requested are supported, the server MUST return
1391	   a Response-Status of 200.  If some of the headers being retrieved are
1392	   unsupported for the resource, the server MUST reject the request with
1393	   a 403 Unsupported Header.  Such a response MUST include the (empty)
1394	   unsupported headers exactly as they were sent from the client.
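
The GET-PARAMS rules above reduce to three cases: a wildcard request, a request naming only supported headers, and a request naming at least one unsupported header. The Python sketch below models them, assuming the session parameters are held in a dictionary keyed by lower-cased header name and that a header counts as "supported" exactly when it has an entry there. Both assumptions, and the name `handle_get_params`, are illustrative.

```python
def handle_get_params(requested, session_params):
    """Return (status_code, response_headers) for a GET-PARAMS request.

    requested: header names sent empty by the client (may be empty).
    session_params: dict of lower-cased header name -> current value.
    """
    names = [name.lower() for name in requested]
    if not names:
        # Wildcard: return every settable parameter.
        return 200, dict(session_params)
    unsupported = [name for name in names if name not in session_params]
    if unsupported:
        # Echo the unsupported headers back, still empty.
        return 403, {name: "" for name in unsupported}
    return 200, {name: session_params[name] for name in names}
```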

1396	   C->S:   MRCP/2.0 136 GET-PARAMS 543256
1397	           Channel-Identifier:32AECB23433802@speechsynth
1398	           Voice-gender:
1399	           Voice-variant:
1400	           Vendor-Specific-Parameters:com.example.param1;
1401	                         com.example.param2

1403	   S->C:   MRCP/2.0 163 543256 200 COMPLETE
1404	           Channel-Identifier:32AECB23433802@speechsynth
1405	           Voice-gender:female
1406	           Voice-variant:3
1407	           Vendor-Specific-Parameters:com.example.param1="Company Name";
1408	                         com.example.param2="124324234@example.com"

1410	6.2.  Generic Message Headers

1412	   All MRCPv2 headers, which include both the generic-headers defined in
1413	   the following sub-sections and the resource-specific headers defined
1414	   later, follow the same generic format as that given in Section 3.1 of
1415	   RFC5322 [RFC5322].  Each header consists of a name followed by a
1416	   colon (":") and the value.  Header names are case-insensitive.  The
1417	   value MAY be preceded by any amount of LWS, though a single SP is
1418	   preferred.  Headers may extend over multiple lines by preceding each
1419	   extra line with at least one SP or HT.

1421	   message-header = field-name ":" [ field-value ]
1422	   field-name     = token
1423	   field-value    = *LWS field-content *( CRLF 1*LWS field-content)
1424	   field-content  = <the OCTETs making up the field-value
1425	                       and consisting of either *TEXT or combinations
1426	                       of token, separators, and quoted-string>

1428	   The field-content does not include any leading or trailing LWS (i.e.
1429	   linear white space occurring before the first non-whitespace
1430	   character of the field-value or after the last non-whitespace
1431	   character of the field-value).  Such leading or trailing LWS MAY be
1432	   removed without changing the semantics of the field value.  Any LWS
1433	   that occurs between field-content MAY be replaced with a single SP
1434	   before interpreting the field value or forwarding the message
1435	   downstream.
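
A receiver that follows the folding and white-space rules above might parse a header block roughly as in the Python sketch below. It unfolds continuation lines, replaces the fold with a single SP, strips leading and trailing LWS from each value, and lower-cases names because they are case-insensitive. The function name `parse_headers` is hypothetical, and the sketch does not validate field-name against the token grammar.

```python
def parse_headers(block):
    """Parse a CRLF-delimited header block into (name, value) pairs."""
    unfolded = []
    for line in block.split("\r\n"):
        if line == "":
            break  # empty line marks the end of the headers
        if line[0] in " \t" and unfolded:
            # Continuation line: join to the previous header with one SP.
            unfolded[-1] = unfolded[-1].rstrip(" \t") + " " + line.strip(" \t")
        else:
            unfolded.append(line)

    headers = []
    for line in unfolded:
        name, colon, value = line.partition(":")
        if not colon:
            raise ValueError("header line without ':'")
        headers.append((name.strip().lower(), value.strip(" \t")))
    return headers
```

The result is an ordered list rather than a dictionary so that repeated header names keep their original order.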

1437	   MRCPv2 servers and clients MUST NOT depend on header order.  It is
1438	   "good practice" to send general-header fields first, followed by
1439	   request-header or response-header fields, and ending with the entity-
1440	   header fields.  However, MRCPv2 servers and clients MUST be prepared
1441	   to process the headers in any order.  The only exception to this rule
1442	   is when there are multiple headers with the same header name in a
1443	   message.

1445	   Multiple headers with the same name MAY be present in a message if
1446	   and only if the entire value for that header is defined as a comma-
1447	   separated list [i.e., #(values)].

1449	   Since vendor-specific parameters may be order-dependent, it MUST be
1450	   possible to combine multiple headers of the same name into one
1451	   "header:value" pair without changing the semantics of the message, by
1452	   appending each subsequent value to the first, each separated by a
1453	   comma.  The order in which headers with the same name are received is
1454	   therefore significant to the interpretation of the combined header
1455	   value, and thus an intermediary MUST NOT change the order of these
1456	   values when a message is forwarded.
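
The combining rule can be illustrated with a short Python sketch that merges repeated headers in arrival order. It assumes headers are already available as an ordered list of (name, value) pairs; the function name `combine_headers` is hypothetical.

```python
def combine_headers(headers):
    """Merge same-name headers into one comma-separated value each.

    Values are appended in the order received, which must be preserved.
    """
    combined = {}
    for name, value in headers:
        key = name.lower()  # header names are case-insensitive
        if key in combined:
            combined[key] = combined[key] + "," + value
        else:
            combined[key] = value
    return combined
```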

1458	   generic-header      =    channel-identifier
1459	                       /    accept
1460	                       /    active-request-id-list
1461	                       /    proxy-sync-id
1462	                       /    accept-charset
1463	                       /    content-type
1464	                       /    content-id
1465	                       /    content-base
1466	                       /    content-encoding
1467	                       /    content-location
1468	                       /    content-length
1469	                       /    fetch-timeout
1470	                       /    cache-control
1471	                       /    logging-tag
1472	                       /    set-cookie
1473	                       /    set-cookie2
1474	                       /    vendor-specific

1476	6.2.1.  Channel-Identifier

1478	   All MRCPv2 requests, responses and events MUST contain the Channel-
1479	   Identifier header.  The value is allocated by the server when a
1480	   control channel is added to the session and communicated to the
1481	   client by the "a=channel" attribute in the SDP answer from the
1482	   server.  The header value consists of 2 parts separated by the '@'
1483	   symbol.  The first part is an unambiguous string identifying the
1484	   MRCPv2 session.  The second part is a string token which specifies
1485	   one of the media processing resource types listed in Section 3.1.
1486	   The unambiguous string (first part) MUST be unique among the resource
1487	   instances managed by the server and is common to all resource
1488	   channels with that server established through a single SIP dialog.

1490	   channel-identifier  = "Channel-Identifier" ":" channel-id CRLF
1491	   channel-id          = 1*alphanum "@" 1*alphanum
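
As a small illustration of the channel-id grammar, the Python sketch below splits a value into its session part and resource-type part. The function name `parse_channel_id` is hypothetical, and the sketch does not check the second part against the resource types listed in Section 3.1.

```python
import re

# channel-id = 1*alphanum "@" 1*alphanum
_CHANNEL_ID = re.compile(r"([A-Za-z0-9]+)@([A-Za-z0-9]+)")


def parse_channel_id(value):
    """Return (session_part, resource_type) for a channel-id value."""
    match = _CHANNEL_ID.fullmatch(value)
    if match is None:
        raise ValueError("malformed channel-id")
    return match.group(1), match.group(2)
```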

1493	6.2.2.  Accept

1495	   The Accept header field follows the syntax defined in [H14.1].  The
1496	   semantics are also identical, with the exception that if no Accept
1497	   header field is present, the server MUST assume a default value that
1498	   is specific to the resource type that is being controlled.  This
1499	   default value can be changed for a resource on a session by sending
1500	   this header in a SET-PARAMS method.  The current default value of
1501	   this header for a resource in a session can be found through a GET-
1502	   PARAMS method.  This header MAY occur on any request.

1504	6.2.3.  Active-Request-Id-List

1506	   In a request, this header indicates the list of request-ids to which
1507	   the request applies.  This is useful when there are multiple requests
1508	   that are PENDING or IN-PROGRESS and the client wants this request to
1509	   apply to one or more of these specifically.

1511	   In a response, this header returns the list of request-ids that the
1512	   method modified or affected.  There could be one or more requests in
1513	   a request-state of PENDING or IN-PROGRESS.  When a method affecting
1514	   one or more PENDING or IN-PROGRESS requests is sent from the client
1515	   to the server, the response MUST contain the list of request-ids that
1516	   were affected or modified by this command in its header.

1518	   The active-request-id-list is only used in requests and responses,
1519	   not in events.

1521	   For example, if a "STOP" request with no active-request-id-list is
1522	   sent to a synthesizer resource which has one or more "SPEAK" requests
1523	   in the PENDING or IN-PROGRESS state, all "SPEAK" requests MUST be
1524	   cancelled, including the one IN-PROGRESS.  The response to the "STOP"
1525	   request contains in the active-request-id-list the request-ids of all
1526	   the "SPEAK" requests that were terminated.  After sending the STOP
1527	   response, the server MUST NOT send any SPEAK-COMPLETE or RECOGNITION-
1528	   COMPLETE events for the terminated requests.

1530	   active-request-id-list  =  "Active-Request-Id-List" ":"
1531	                              request-id *("," request-id) CRLF
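
The "STOP" behaviour described above can be modelled with a short Python sketch. It assumes the resource keeps its PENDING and IN-PROGRESS request-ids in an ordered list, and that request-ids named by the client but no longer active are silently skipped; that second point is an assumption of this sketch rather than something stated in this section. The function names are hypothetical.

```python
def apply_stop(active_ids, requested_ids=None):
    """Return (terminated, remaining) request-id lists for a STOP.

    active_ids: ids currently PENDING or IN-PROGRESS, in queue order.
    requested_ids: the request's Active-Request-Id-List, or None when
    the header is absent (in which case every active request stops).
    """
    if requested_ids is None:
        targets = set(active_ids)
    else:
        targets = set(requested_ids) & set(active_ids)
    terminated = [rid for rid in active_ids if rid in targets]
    remaining = [rid for rid in active_ids if rid not in targets]
    return terminated, remaining


def format_active_request_id_list(ids):
    """Render the header line; the grammar requires at least one id."""
    if not ids:
        raise ValueError("active-request-id-list needs at least one id")
    return "Active-Request-Id-List:" + ",".join(str(i) for i in ids) + "\r\n"
```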

1533	6.2.4.  Proxy-Sync-Id

1535	   When any server resource generates a barge-in-able event, it also
1536	   generates a unique tag.  The tag is sent as this header's value in an
1537	   event to the client.  The client then acts as an intermediary among
1538	   the server resources and sends a BARGE-IN-OCCURRED method to the
1539	   synthesizer server resource with the Proxy-Sync-Id it received from
1540	   the server resource.  When the recognizer and synthesizer resources
1541	   are part of the same session, they may choose to work together to
1542	   achieve quicker interaction and response.  Here the proxy-sync-id
1543	   helps the resource receiving the event, intermediated by the client,
1544	   to decide if this event has been processed through a direct
1545	   interaction of the resources.  This header MAY occur only on events
1546	   and the BARGE-IN-OCCURRED method.

1548	   proxy-sync-id    =  "Proxy-Sync-Id" ":" 1*VCHAR CRLF

1550	6.2.5.  Accept-Charset

1552	   See [H14.2].  This specifies the acceptable character sets for
1553	   entities returned in the response or events associated with this
1554	   request.  This is useful in specifying the character set to use in
1555	   the NLSML results of a "RECOGNITION-COMPLETE" event.  This header is
1556	   only used on requests.

1558	6.2.6.  Content-Type

1560	   See [H14.17].  MRCPv2 supports a restricted set of registered Media
1561	   Types for content, including speech markup, grammar, and recognition
1562	   results.  The content types applicable to each MRCPv2 resource-type
1563	   are specified in the corresponding section of the document.  The
1564	   multipart content type "multipart/mixed" is supported to
1565	   communicate several of the above-mentioned content types, in which case
1566	   the body parts MUST NOT contain any MRCPv2 specific headers.  This
1567	   header MAY occur on all messages.

1569	6.2.7.  Content-ID

1571	   This header contains an ID or name for the content by which it can be
1572	   referenced.  This header operates according to the specification in
1573	   RFC2392 [RFC2392] and is required for content disambiguation in
1574	   multi-part messages.  In MRCPv2 whenever the associated content is
1575	   stored, by either the client or the server, it MUST be retrievable
1576	   using this ID.  Such content can be referenced later in a session by
1577	   addressing it with the "session:" URI scheme described in
1578	   Section 13.6.  This header MAY occur on all messages.

1580	6.2.8.  Content-Base

1582	   The content-base entity-header may be used to specify the base URI
1583	   for resolving relative URLs within the entity.

1585	   content-base      = "Content-Base" ":" absoluteURI CRLF

1587	   Note, however, that the base URI of the contents within the entity-
1588	   body may be redefined within that entity-body.  An example of this
1589	   would be multi-part media, which in turn can have multiple entities
1590	   within it.  This header MAY occur on all messages.

1592	6.2.9.  Content-Encoding

1594	   The content-encoding entity-header is used as a modifier to the
1595	   media-type.  When present, its value indicates what additional
1596	   content encoding has been applied to the entity-body, and thus what
1597	   decoding mechanisms must be applied in order to obtain the media-type
1598	   referenced by the content-type header.  Content-encoding is primarily
1599	   used to allow a document to be compressed without losing the identity
1600	   of its underlying media type.  Note that the SDP session can be used
1601	   to determine accepted encodings (see Section 7).  This header MAY
1602	   occur on all messages.

1604	   content-encoding  = "Content-Encoding" ":"
1605	                       *WSP content-coding
1606	                       *(*WSP "," *WSP content-coding *WSP )
1607	                       CRLF

1609	   Content-coding is defined in [H3.5].  An example of its use is
1610	   Content-Encoding:gzip

1612	   If multiple encodings have been applied to an entity, the content
1613	   encodings MUST be listed in the order in which they were applied.
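
Since encodings are listed in the order they were applied, a receiver undoes them in reverse. The Python sketch below shows this for a small, assumed set of content-codings ("gzip", "deflate" in its zlib form, and "identity"); which codings a real implementation accepts is outside the scope of this sketch, and the name `decode_body` is hypothetical.

```python
import gzip
import zlib

# Assumed set of supported content-codings for this sketch.
_DECODERS = {
    "gzip": gzip.decompress,
    "deflate": zlib.decompress,
    "identity": lambda data: data,
}


def decode_body(body, content_encoding):
    """Undo the listed content-codings, last applied first."""
    codings = [c.strip().lower() for c in content_encoding.split(",") if c.strip()]
    for coding in reversed(codings):
        try:
            decoder = _DECODERS[coding]
        except KeyError:
            raise ValueError("unsupported content-coding: " + coding)
        body = decoder(body)
    return body
```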

1615	6.2.10.  Content-Location

1617	   The content-location entity-header MAY be used to supply the resource
1618	   location for the entity enclosed in the message when that entity is
1619	   accessible from a location separate from the requested resource's
1620	   URI.  Refer to [H14.14].

1622	   content-location  =  "Content-Location" ":"
1623	                        ( absoluteURI / relativeURI ) CRLF

1625	   The content-location value is a statement of the location of the
1626	   resource corresponding to this particular entity at the time of the
1627	   request.  This header is provided for optimization purposes only.
1628	   The receiver of this header MAY assume that the entity being sent is
1629	   identical to what would have been retrieved or might already have
1630	   been retrieved from the content-location URI.

1632	   For example, if the client provided a grammar markup inline, and it
1633	   had previously retrieved it from a certain URI, that URI can be
1634	   provided as part of the entity, using the content-location header.
1635	   This allows a resource like the recognizer to look into its cache to
1636	   see if this grammar was previously retrieved, compiled and cached.
1637	   In this case, it might optimize by using the previously compiled
1638	   grammar object.

1640	   If the content-location is a relative URI, the relative URI is
1641	   interpreted relative to the content-base URI.  This header MAY occur
1642	   on all messages.

1644	6.2.11.  Content-Length

1646	   This header contains the length of the content of the message body
1647	   (i.e. after the double CRLF following the last header field).  Unlike
1648	   HTTP, it MUST be included in all messages that carry content beyond
1649	   the header portion of the message.  If it is missing, a default value
1650	   of zero is assumed.  Otherwise, it is interpreted according to
1651	   [H14.13].  When a message having no use for a message body contains
1652	   one, i.e. the Content-Length is non-zero, the receiver MUST ignore
1653	   the content of the message body.  This header MAY occur on all
1654	   messages.
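
As a rough sketch of the rules above, the following Python helpers
compute the value a sender would place in Content-Length and apply the
default of zero on receipt.  The function names and the dictionary
representation of parsed headers are assumptions made for this example.

```python
def body_octets(body: str, charset: str = "utf-8") -> int:
    """Return the length of a message body in octets, not characters."""
    return len(body.encode(charset))


def declared_length(headers: dict) -> int:
    """Return the declared body length; a missing header means zero."""
    value = headers.get("Content-Length")
    return int(value) if value is not None else 0
```

Counting octets rather than characters matters whenever the body
contains non-ASCII text, since one character may occupy several octets.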

1656	6.2.12.  Fetch Timeout

1658	   When the recognizer or synthesizer needs to fetch documents or other
1659	   resources, this header controls the corresponding URI access
1660	   properties.  This defines the timeout for content that the server may
1661	   need to fetch over the network.  The value is interpreted to be in
1662	   milliseconds and ranges from 0 to an implementation-specific maximum
1663	   value.  The default value for this header is implementation-specific.
1664	   This header MAY occur in "DEFINE-GRAMMAR", "RECOGNIZE", "SPEAK",
1665	   "SET-PARAMS" or "GET-PARAMS".

1667	   fetch-timeout       =   "Fetch-Timeout" ":" 1*19DIGIT CRLF
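
A minimal Python sketch of a parser for this header follows.  It
matches the ABNF above literally (no whitespace around the colon, 1 to
19 digits, terminating CRLF); the function name is hypothetical and a
real implementation may need to be more lenient.

```python
import re
from typing import Optional

# Header names in ABNF literals are case-insensitive.
_FETCH_TIMEOUT = re.compile(r"Fetch-Timeout:([0-9]{1,19})\r\n", re.IGNORECASE)


def parse_fetch_timeout(line: str) -> Optional[int]:
    """Return the timeout in milliseconds, or None if the line is malformed."""
    match = _FETCH_TIMEOUT.fullmatch(line)
    return int(match.group(1)) if match else None
```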

1669	6.2.13.  Cache-Control

1671	   If the server implements content caching, it MUST adhere to the cache
1672	   correctness rules of HTTP 1.1 [RFC2616] when accessing and caching
1673	   stored content.  In particular, the "expires" and "cache-control"
1674	   headers of the cached URI or document MUST be honored and take
1675	   precedence over the Cache-Control defaults set by this header.  The
1676	   cache-control directives are used to define the default caching
1677	   algorithms on the server for the session or request.  The scope of
1678	   the directive is based on the method it is sent on.  If the
1679	   directives are sent on a "SET-PARAMS" method, they apply to all
1680	   requests for external documents the server makes during that session,
1681	   unless overridden by a cache-control header on an individual request.
1682	   If the directives are sent on any other request, they apply only to
1683	   external document requests the server makes for that request.  An
1684	   empty cache-control header on the "GET-PARAMS" method is a request
1685	   for the server to return the current cache-control directives setting
1686	   on the server.  This header MAY occur only on requests.

1688	   cache-control       = "Cache-Control" ":" cache-directive
1689	                         *("," *LWS cache-directive) CRLF

1691	   cache-directive     = "max-age" "=" delta-seconds
1692	                       / "max-stale" [ "=" delta-seconds ]
1693	                       / "min-fresh" "=" delta-seconds

1695	   delta-seconds       = 1*19DIGIT

1697	   Here delta-seconds is a decimal time value specifying the number of
1698	   seconds since the instant the message response or data was received
1699	   by the server.

1701	   The cache-directives allow the client to ask the server to override
1702	   the default cache expiration mechanisms.
1703	   max-age        Indicates that the client can tolerate the server
1704	                  using content whose age is no greater than the
1705	                  specified time in seconds.  Unless a max-stale
1706	                  directive is also included, the client is not willing
1707	                  to accept a response based on stale data.
1708	   min-fresh      Indicates that the client is willing to accept a
1709	                  server response with cached data whose expiration is
1710	                  no less than its current age plus the specified time
1711	                  in seconds.  If the server's cache time to live
1712	                  exceeds the client-supplied min-fresh value, the
1713	                  server MUST NOT utilize cached content.
1714	   max-stale      Indicates that the client is willing to allow a server
1715	                  to utilize cached data that has exceeded its
1716	                  expiration time.  If max-stale is assigned a value,
1717	                  then the client is willing to allow the server to use
1718	                  cached data that has exceeded its expiration time by
1719	                  no more than the specified number of seconds.  If no
1720	                  value is assigned to max-stale, then the client is
1721	                  willing to allow the server to use stale data of any
1722	                  age.

1724	   The server cache MAY be requested to use stale response/data without
1725	   validation, but only if this does not conflict with any "MUST"-level
1726	   requirements concerning cache validation (e.g., a "must-revalidate"
1727	   cache-control directive in the HTTP 1.1 specification pertaining to
1728	   the corresponding URI).

1730	   If both the MRCPv2 cache-control directive and the cached entry on
1731	   the server include "max-age" directives, then the lesser of the two
1732	   values is used for determining the freshness of the cached entry for
1733	   that request.
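
The directive syntax and the "lesser of the two values" rule can be
sketched as follows.  This is one possible reading, not a normative
algorithm: the function names are hypothetical, and the parser takes
the header value only (the text after "Cache-Control:").

```python
import re
from typing import Dict, Optional

_DIRECTIVE = re.compile(
    r"(max-age|max-stale|min-fresh)(?:=([0-9]{1,19}))?", re.IGNORECASE
)


def parse_cache_control(value: str) -> Dict[str, Optional[int]]:
    """Parse a comma-separated list of MRCPv2 cache directives."""
    directives: Dict[str, Optional[int]] = {}
    for item in value.split(","):
        match = _DIRECTIVE.fullmatch(item.strip())
        if match is None:
            raise ValueError(f"unknown cache directive: {item!r}")
        name, arg = match.group(1).lower(), match.group(2)
        # Only max-stale may appear without a delta-seconds value.
        if arg is None and name != "max-stale":
            raise ValueError(f"{name} requires a value")
        directives[name] = int(arg) if arg is not None else None
    return directives


def effective_max_age(request_max_age: Optional[int],
                      cached_max_age: Optional[int]) -> Optional[int]:
    """Return the lesser max-age when both are present, else whichever exists."""
    present = [v for v in (request_max_age, cached_max_age) if v is not None]
    return min(present) if present else None
```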

1735	6.2.14.  Logging-Tag

1737	   This header MAY be sent as part of a "SET-PARAMS"/"GET-PARAMS" method
1738	   to set or retrieve the logging tag for logs generated by the server.
1739	   Once set, the value persists until a new value is set or the session
1740	   ends.  The MRCPv2 server MAY provide a mechanism to subset its output
1741	   logs so that system administrators can examine or extract only the
1742	   log file portion during which the logging tag was set to a certain
1743	   value.

1745	   It is RECOMMENDED that clients have some identifying information in
1746	   the logging tag, so that one can determine which client request
1747	   generated a given log message at the server.

1749	   logging-tag    = "Logging-Tag" ":" 1*UTFCHAR CRLF

1751	6.2.15.  Set-Cookie and Set-Cookie2

1753	   Since the associated HTTP client on an MRCPv2 server fetches
1754	   documents for processing on behalf of the MRCPv2 client, the cookie
1755	   store in the HTTP client of the MRCPv2 server is treated as an
1756	   extension of the cookie store in the HTTP client of the MRCPv2
1757	   client.  This requires that the MRCPv2 client and server be able to
1758	   synchronize their common cookie store as needed.  To enable the
1759	   MRCPv2 client to push its stored cookies to the MRCPv2 server and get
1760	   new cookies from the MRCPv2 server stored back to the MRCPv2 client,
1761	   the set-cookie and set-cookie2 entity-header fields MAY be included
1762	   in MRCPv2 requests to update the cookie store on a server and be
1763	   returned in final MRCPv2 responses or events to subsequently update
1764	   the client's own cookie store.  The stored cookies on the server
1765	   persist for the duration of the MRCPv2 session and MUST be destroyed
1766	   at the end of the session.  To ensure support for the type of cookie
1767	   header dictated by the HTTP origin server, MRCPv2 clients and servers
1768	   MUST support both the set-cookie and set-cookie2 entity header
1769	   fields.

1771	   set-cookie      =       "Set-Cookie:" cookies CRLF
1772	   cookies         =       cookie *("," *LWS cookie)
1773	   cookie          =       attribute "=" value *(";" cookie-av)
1774	   cookie-av       =       "Comment" "=" value
1775	                   /       "Domain" "=" value
1776	                   /       "Max-Age" "=" value
1777	                   /       "Path" "=" value
1778	                   /       "Secure"
1779	                   /       "Version" "=" 1*19DIGIT
1780	                   /       "Age" "=" delta-seconds

1782	   set-cookie2     =       "Set-Cookie2:" cookies2 CRLF
1783	   cookies2        =       cookie2 *("," *LWS cookie2)
1784	   cookie2         =       attribute "=" value *(";" cookie-av2)
1785	   cookie-av2      =       "Comment" "=" value
1786	                   /       "CommentURL" "=" DQUOTE uri DQUOTE
1787	                   /       "Discard"
1788	                   /       "Domain" "=" value
1789	                   /       "Max-Age" "=" value
1790	                   /       "Path" "=" value
1791	                   /       "Port" [ "=" DQUOTE portlist DQUOTE ]
1792	                   /       "Secure"
1793	                   /       "Version" "=" 1*19DIGIT
1794	                   /       "Age" "=" delta-seconds
1795	   portlist        =       portnum *("," *LWS portnum)
1796	   portnum         =       1*19DIGIT

1798	   The set-cookie and set-cookie2 headers are specified in RFC2109
1799	   [RFC2109] and RFC2965 [RFC2965], respectively.  The "Age" attribute
1800	   is introduced in this specification to indicate the age of the cookie
1801	   and is optional.  An MRCPv2 client or server MUST calculate the age
1802	   of the cookie according to the age calculation rules in the HTTP/1.1
1803	   specification [RFC2616] and append the "Age" attribute accordingly.

1805	   The MRCPv2 client or server MUST supply defaults for the Domain and
1806	   Path attributes if omitted by the HTTP origin server as specified in
1807	   RFC2109 (set-cookie) and RFC2965 (set-cookie2).  Note that there is
1808	   no leading dot present in the Domain attribute value in this case.
1809	   Although an explicitly specified Domain value received via the HTTP
1810	   protocol may be modified to include a leading dot, an MRCPv2 client
1811	   or server MUST NOT modify the Domain value when received via the
1812	   MRCPv2 protocol.

1814	   An MRCPv2 client or server MAY combine multiple cookie headers of the
1815	   same type into a single "field-name:field-value" pair as described in
1816	   Section 6.2.

1818	   The set-cookie and set-cookie2 headers MAY be specified in any
1819	   request that subsequently results in the server performing an HTTP
1820	   access.  When a server receives new cookie information from an HTTP
1821	   origin server, and assuming the cookie store is modified according to
1822	   RFC2109 or RFC2965, the server MUST return the new cookie information
1823	   in the MRCPv2 COMPLETE response or event as appropriate to allow the
1824	   client to update its own cookie store.

1826	   The "SET-PARAMS" request MAY specify the set-cookie and set-cookie2
1827	   headers to update the cookie store on a server.  The GET-PARAMS
1828	   request MAY be used to return the entire cookie store of "Set-Cookie"
1829	   or "Set-Cookie2" type cookies to the client.

1831	6.2.16.  Vendor Specific Parameters

1833	   This set of headers allows for the client to set or retrieve Vendor
1834	   Specific parameters.

1836	   vendor-specific          =    "Vendor-Specific-Parameters" ":"
1837	                                 [vendor-specific-av-pair
1838	                                 *(";" vendor-specific-av-pair)] CRLF

1840	   vendor-specific-av-pair  = vendor-av-pair-name "="
1841	                              value

1843	   Headers of this form MAY be sent in any method (request) and are used
1844	   to manage implementation-specific parameters on the server side.  The
1845	   vendor-av-pair-name follows the reverse Internet Domain Name
1846	   convention (see Section 13.1.6 for syntax and registration
1847	   information).  The value of the vendor attribute is specified after
1848	   the "=" symbol and MAY be quoted.  For example:

1850	   com.example.companyA.paramxyz=256
1851	   com.example.companyA.paramabc=High
1852	   com.example.companyB.paramxyz=Low

1854	   When used in GET-PARAMS to get the current value of these parameters
1855	   from the server, this header value may contain a semicolon-separated
1856	   list of implementation-specific attribute names.
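
The following Python sketch splits a Vendor-Specific-Parameters value
into name/value pairs.  It is a simplification under stated
assumptions: the function name is hypothetical, quoted values are
assumed not to contain semicolons, and a bare name (as sent in
GET-PARAMS) is mapped to None.

```python
from typing import Dict, Optional


def parse_vendor_specific(value: str) -> Dict[str, Optional[str]]:
    """Parse 'name=value;name=value' or a ';'-separated list of bare names."""
    params: Dict[str, Optional[str]] = {}
    for pair in value.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        name, sep, raw = pair.partition("=")
        raw = raw.strip()
        # Remove one layer of surrounding double quotes, if present.
        if len(raw) >= 2 and raw[0] == raw[-1] == '"':
            raw = raw[1:-1]
        params[name.strip()] = raw if sep else None
    return params
```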

1858	6.3.  Generic Result Structure

1860	   Result data from the server for the Recognizer and Verification
1861	   resources is carried as a typed media entity in the MRCPv2 message
1862	   body of various events.  The Natural Language Semantics Markup
1863	   Language (NLSML), an XML markup based on an early draft from the W3C,
1864	   is the default standard for returning results back to the client.

1866	   Hence, all servers implementing these resource types MUST support the
1867	   Media Type application/nlsml+xml.  When the Extensible MultiModal
1868	   Annotation [W3C.REC-emma-20090210] specification being developed at
1869	   the W3C has reached a stable standards state, it can be used to
1870	   return results as well.  This can be done by negotiating the format
1871	   at session establishment time with SDP (a=resultformat:application/
1872	   emma+xml) or with SIP (Allow/Accept).  With SIP, for example, if a
1873	   client wants results in EMMA, an MRCPv2 proxy can route the request
1874	   to a server that supports EMMA by inspecting the SIP headers, rather
1875	   than having to inspect the SDP.

1877	   MRCPv2 uses this representation to convey content among the clients
1878	   and servers that generate and make use of the markup.  MRCPv2 uses
1879	   NLSML specifically to convey recognition, enrollment, and
1880	   verification results between the corresponding resource on the MRCPv2
1881	   server and the MRCPv2 client.  Details of this result format are
1882	   fully described in Section 6.3.1.

1884	   Content-Type:application/nlsml+xml
1885	   Content-Length:...

1887	   <?xml version="1.0"?>
1888	   <result xmlns="http://www.ietf.org/xml/ns/mrcpv2"
1889	           xmlns:ex="http://www.example.com/example"
1890	           grammar="http://theYesNoGrammar">
1891	       <interpretation>
1892	           <instance>
1893	                   <ex:response>yes</ex:response>
1894	           </instance>
1895	           <input>ok</input>
1896	       </interpretation>
1897	   </result>

1899	                              Result Example
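
A client might read a result like the one above with a
namespace-aware XML parser.  The Python sketch below uses the standard
library's ElementTree; the function name is hypothetical, and it
extracts only the <input> text of each <interpretation>, ignoring the
application-specific content under <instance>.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace URI taken from the result example; the prefix is local to this code.
NS = {"mrcp": "http://www.ietf.org/xml/ns/mrcpv2"}


def extract_inputs(nlsml: str) -> List[str]:
    """Return the <input> text of every <interpretation> in a <result>."""
    root = ET.fromstring(nlsml)
    return [
        interp.findtext("mrcp:input", default="", namespaces=NS)
        for interp in root.findall("mrcp:interpretation", NS)
    ]
```

Because the NLSML elements live in the MRCPv2 namespace, lookups by
bare element name would find nothing; the namespace map is required.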

1901	6.3.1.  Natural Language Semantics Markup Language

1903	   The Natural Language Semantics Markup Language (NLSML) is an XML data
1904	   structure with elements and attributes designed to carry result
1905	   information from recognizer (including enrollment) and verification
1906	   resources.  The normative definition of NLSML is the RelaxNG schema
1907	   in Section 16.1.  Note that the elements and attributes of this
1908	   format are defined in the MRCPv2 namespace.  In the result structure,
1909	   they must either be prefixed by a namespace prefix declared within
1910	   the result or must be children of an element identified as belonging
1911	   to the respective namespace.  For details on how to use XML
1912	   Namespaces, see [W3C.REC-xml-names11-20040204].  Section 2 of
1913	   [W3C.REC-xml-names11-20040204] provides details on how to declare
1914	   namespaces and namespace prefixes.

1916	   The root element of NLSML is <result>.  Optional child elements are
1917	   <interpretation>, <enrollment-result>, and <verification-result>, at
1918	   least one of which must be present.  A single <result> may contain
1919	   all of the optional child elements.  Details of the <result> and
1920	   <interpretation> elements and their subelements and attributes can be
1921	   found in Section 9.6.  Details of the <enrollment-result> element and
1922	   its subelements can be found in Section 9.7.  Details of the
1923	   <verification-result> element and its subelements can be found in
1924	   Section 11.5.2.

1926	7.  Resource Discovery

1928	   Server resources may be discovered and their capabilities learned by
1929	   clients through standard SIP machinery.  The client can issue a SIP
1930	   OPTIONS transaction to a server, which has the effect of requesting
1931	   the capabilities of the server.  The server MUST respond to such a
1932	   request with an SDP-encoded description of its capabilities according
1933	   to RFC3264 [RFC3264].  The MRCPv2 capabilities are described by a
1934	   single m-line containing the media type "application" and transport
1935	   type "TCP/TLS/MRCPv2" or "TCP/MRCPv2".  There MUST be one "resource"
1936	   attribute for each media resource that the server supports with the
1937	   resource type identifier as its value.

1939	   The SDP description MUST also contain m-lines describing the audio
1940	   capabilities and the coders the server supports.

1942	   In this example, the client uses the SIP OPTIONS method to query the
1943	   capabilities of the MRCPv2 server.

1945	   C->S:
1946	        OPTIONS sip:mrcp@server.example.com SIP/2.0
1947	        Max-Forwards:6
1948	        To:<sip:mrcp@example.com>;tag=62784
1949	        From:Sarvi <sip:sarvi@example.com>;tag=1928301774
1950	        Call-ID:a84b4c76e66710
1951	        CSeq:63104 OPTIONS
1952	        Contact:<sip:sarvi@client.example.com>
1953	        Accept:application/sdp
1954	        Content-Length:...

1956	   S->C:
1957	        SIP/2.0 200 OK
1958	        To:<sip:mrcp@example.com>;tag=62784
1959	        From:Sarvi <sip:sarvi@example.com>;tag=1928301774
1960	        Call-ID:a84b4c76e66710
1961	        CSeq:63104 OPTIONS
1962	        Contact:<sip:mrcp@server.example.com>
1963	        Allow:INVITE, ACK, CANCEL, OPTIONS, BYE
1964	        Accept:application/sdp
1965	        Accept-Encoding:gzip
1966	        Accept-Language:en
1967	        Supported:foo
1968	        Content-Type:application/sdp
1969	        Content-Length:...

1971	        v=0
1972	        o=sarvi 2890844526 2890842807 IN IP4 192.0.2.4
1973	        s=
1974	        i=MRCPv2 server capabilities
1975	        c=IN IP4 192.0.2.12/127
1976	        m=application 0 TCP/MRCPv2 1
1977	        a=resource:speechsynth
1978	        a=resource:speechrecog
1979	        a=resource:speakverify
1980	        m=audio 0 RTP/AVP 0 3
1981	        a=rtpmap:0 PCMU/8000
1982	        a=rtpmap:3 GSM/8000

1984	         Using SIP OPTIONS for MRCPv2 Server Capability Discovery
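
On the client side, the advertised resources can be collected from
the SDP answer with a small amount of line-oriented parsing.  The
Python sketch below is illustrative only: the function name is
hypothetical, and it handles just the m-line and "resource" attribute
described in this section rather than SDP in general.

```python
from typing import List

_MRCP_TRANSPORTS = ("TCP/MRCPv2", "TCP/TLS/MRCPv2")


def mrcp_resources(sdp: str) -> List[str]:
    """Return the resource types listed under the MRCPv2 application m-line."""
    resources: List[str] = []
    in_mrcp_media = False
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...
            fields = line[2:].split()
            in_mrcp_media = (
                len(fields) >= 3
                and fields[0] == "application"
                and fields[2] in _MRCP_TRANSPORTS
            )
        elif in_mrcp_media and line.startswith("a=resource:"):
            resources.append(line[len("a=resource:"):])
    return resources
```

Applied to the SDP body in the example above, this would yield the
three resource types speechsynth, speechrecog, and speakverify.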

1986	8.  Speech Synthesizer Resource

1988	   This resource processes text markup provided by the client and
1989	   generates a stream of synthesized speech in real-time.  Depending
1990	   upon the server implementation and capability of this resource, the
1991	   client can also dictate parameters of the synthesized speech such as
1992	   voice characteristics, speaker speed, etc.

1994	   The synthesizer resource is controlled by MRCPv2 requests from the
1995	   client.  Similarly, the resource can respond to these requests or
1996	   generate asynchronous events to the client to indicate conditions of
1997	   interest to the client during the generation of the synthesized
1998	   speech stream.

2000	   This section applies for the following resource types:
2001	      speechsynth
2002	      basicsynth

2004	   The capabilities of these resources are defined in Section 3.1.

2006	8.1.  Synthesizer State Machine

2008	   The synthesizer maintains a state machine to process MRCPv2 requests
2009	   from the client.  The state transitions shown below describe the
2010	   states of the synthesizer and reflect the state of the request at the
2011	   head of the synthesizer resource queue.  A "SPEAK" request in the
2012	   PENDING state can be deleted or stopped by a "STOP" request without
2013	   affecting the state of the resource.

2015	   Idle                    Speaking                  Paused
2016	   State                   State                     State
2017	     |                        |                          |
2018	     |----------SPEAK-------->|                 |--------|
2019	     |<------STOP-------------|             CONTROL      |
2020	     |<----SPEAK-COMPLETE-----|                 |------->|
2021	     |<----BARGE-IN-OCCURRED--|                          |
2022	     |              |---------|                          |
2023	     |          CONTROL       |-----------PAUSE--------->|
2024	     |              |-------->|<----------RESUME---------|
2025	     |                        |               |----------|
2026	     |----------|             |              PAUSE       |
2027	     |    BARGE-IN-OCCURRED   |               |--------->|
2028	     |<---------|             |----------|               |
2029	     |                        |      SPEECH-MARKER       |
2030	     |                        |<---------|               |
2031	     |----------|             |----------|               |
2032	     |         STOP           |       RESUME             |
2033	     |          |             |<---------|               |
2034	     |<---------|             |                          |
2035	     |<---------------------STOP-------------------------|
2036	     |----------|             |                          |
2037	     |     DEFINE-LEXICON     |                          |
2038	     |          |             |                          |
2039	     |<---------|             |                          |
2040	     |<---------------BARGE-IN-OCCURRED------------------|

2042	                         Synthesizer State Machine
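
One way to read the diagram above is as a transition table keyed by
the current state and the method or event observed.  The Python sketch
below encodes that reading; the state names, the table, and the
function are assumptions of this example, and the diagram remains the
authoritative description.

```python
# (current state, method or event) -> next state, as read from the diagram.
TRANSITIONS = {
    ("idle", "SPEAK"): "speaking",
    ("idle", "STOP"): "idle",
    ("idle", "BARGE-IN-OCCURRED"): "idle",
    ("idle", "DEFINE-LEXICON"): "idle",
    ("speaking", "STOP"): "idle",
    ("speaking", "SPEAK-COMPLETE"): "idle",
    ("speaking", "BARGE-IN-OCCURRED"): "idle",
    ("speaking", "CONTROL"): "speaking",
    ("speaking", "SPEECH-MARKER"): "speaking",
    ("speaking", "RESUME"): "speaking",
    ("speaking", "PAUSE"): "paused",
    ("paused", "RESUME"): "speaking",
    ("paused", "CONTROL"): "paused",
    ("paused", "PAUSE"): "paused",
    ("paused", "STOP"): "idle",
    ("paused", "BARGE-IN-OCCURRED"): "idle",
}


def next_state(state: str, trigger: str) -> str:
    """Return the state reached from 'state' on 'trigger'."""
    try:
        return TRANSITIONS[(state, trigger)]
    except KeyError:
        raise ValueError(f"{trigger} is not shown for state {state}") from None
```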

2044	8.2.  Synthesizer Methods

2046	   The synthesizer supports the following methods.

2048	   synthesizer-method   =  "SPEAK"
2049	                        /  "STOP"
2050	                        /  "PAUSE"
2051	                        /  "RESUME"
2052	                        /  "BARGE-IN-OCCURRED"
2053	                        /  "CONTROL"
2054	                        /  "DEFINE-LEXICON"

2056	8.3.  Synthesizer Events

2058	   The synthesizer may generate the following events.

2060	   synthesizer-event    =  "SPEECH-MARKER"
2061	                        /  "SPEAK-COMPLETE"

2063	8.4.  Synthesizer Header Fields

2065	   A synthesizer method may contain headers containing request options
2066	   and information to augment the Request, Response or Event it is
2067	   associated with.

2069	   synthesizer-header  =  jump-size
2070	                       /  kill-on-barge-in
2071	                       /  speaker-profile
2072	                       /  completion-cause
2073	                       /  completion-reason
2074	                       /  voice-parameter
2075	                       /  prosody-parameter
2076	                       /  speech-marker
2077	                       /  speech-language
2078	                       /  fetch-hint
2079	                       /  audio-fetch-hint
2080	                       /  failed-uri
2081	                       /  failed-uri-cause
2082	                       /  speak-restart
2083	                       /  speak-length
2084	                       /  load-lexicon
2085	                       /  lexicon-search-order

2087	8.4.1.  Jump-Size

2089	   This header MAY be specified in a CONTROL method and controls the
2090	   amount to jump forward or backward in an active "SPEAK" request.  A +
2091	   or - indicates a relative value to what is being currently played.
2092	   This header MAY also be specified in a "SPEAK" request as a desired
2093	   offset into the synthesized speech.  In this case, the synthesizer
2094	   MUST begin speaking from this amount of time into the speech markup.
2095	   Note that an offset that extends beyond the end of the produced
2096	   speech will result in audio of length zero.  The different speech
2097	   length units supported are dependent on the synthesizer
2098	   implementation.  If the synthesizer resource does not support a unit
2099	   or the operation, the resource MUST respond with a status code of 409
2100	   "Unsupported Header Value".

2102	   jump-size             =   "Jump-Size" ":" speech-length-value CRLF

2104	   speech-length-value   =   numeric-speech-length
2105	                         /   text-speech-length

2107	   text-speech-length    =   1*UTFCHAR SP "Tag"

2109	   numeric-speech-length =    ("+" / "-") positive-speech-length

2111	   positive-speech-length =   1*19DIGIT SP numeric-speech-unit

2113	   numeric-speech-unit   =   "Second"
2114	                         /   "Word"
2115	                         /   "Sentence"
2116	                         /   "Paragraph"
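
The Jump-Size value grammar above can be sketched as a pair of regular
expressions.  This Python example is illustrative: the function name
and the tuple it returns are assumptions, and it takes the header
value only (the text after "Jump-Size:").

```python
import re
from typing import Optional, Tuple, Union

_NUMERIC = re.compile(
    r"([+-])([0-9]{1,19}) (Second|Word|Sentence|Paragraph)", re.IGNORECASE
)
_TEXT = re.compile(r"(.+) Tag", re.IGNORECASE)


def parse_jump_size(value: str) -> Tuple[str, Union[int, str], Optional[str]]:
    """Return ('numeric', signed amount, unit) or ('tag', marker name, None)."""
    match = _NUMERIC.fullmatch(value)
    if match:
        amount = int(match.group(2))
        if match.group(1) == "-":
            amount = -amount
        return ("numeric", amount, match.group(3).capitalize())
    match = _TEXT.fullmatch(value)
    if match:
        return ("tag", match.group(1), None)
    raise ValueError(f"malformed Jump-Size value: {value!r}")
```

A synthesizer that does not support the parsed unit would then answer
with status code 409, as described above.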

2118	8.4.2.  Kill-On-Barge-In

2120	   This header MAY be sent as part of the "SPEAK" method to enable kill-
2121	   on-barge-in support.  If enabled, the "SPEAK" method is interrupted
2122	   by DTMF input detected by a signal detector resource or by the start
2123	   of speech sensed or recognized by the speech recognizer resource.

2125	   kill-on-barge-in      =   "Kill-On-Barge-In" ":" BOOLEAN CRLF

2127	   The client MUST send a BARGE-IN-OCCURRED method to the synthesizer
2128	   resource when it receives a barge-in-able event from any source.
2129	   This source could be a synthesizer resource or signal detector
2130	   resource and MAY be either local or distributed.  If this header is
2131	   not specified in a "SPEAK" request or explicitly set by a
2132	   "SET-PARAMS", the default value for this header is "true".

2134	   If the recognizer or signal detector resource is on the same server
2135	   as the synthesizer and both are part of the same session, the server
2136	   MAY work with both to provide internal notification to the
2137	   synthesizer so that audio may be stopped without having to wait for
2138	   the client's BARGE-IN-OCCURRED event.

2140	8.4.3.  Speaker Profile

2142	   This header MAY be part of the "SET-PARAMS"/"GET-PARAMS" or "SPEAK"
2143	   request from the client to the server and specifies a URI which
2144	   references the profile of the speaker.  Speaker profiles are
2145	   collections of voice parameters such as gender, accent, etc.

2147	   speaker-profile       =   "Speaker-Profile" ":" uri CRLF

2149	8.4.4.  Completion Cause

2151	   This header MUST be specified in a "SPEAK-COMPLETE" event coming from
2152	   the synthesizer resource to the client.  This indicates the reason
2153	   the "SPEAK" request completed.

2155	   completion-cause      =   "Completion-Cause" ":" 3DIGIT SP
2156	                             1*VCHAR CRLF

2158	   +------------+-----------------------+------------------------------+
2159	   | Cause-Code | Cause-Name            | Description                  |
2160	   +------------+-----------------------+------------------------------+
2161	   | 000        | normal                | SPEAK completed normally.    |
2162	   | 001        | barge-in              | SPEAK request was terminated |
2163	   |            |                       | because of barge-in.         |
2164	   | 002        | parse-failure         | SPEAK request terminated     |
2165	   |            |                       | because of a failure to      |
2166	   |            |                       | parse the speech markup      |
2167	   |            |                       | text.                        |
2168	   | 003        | uri-failure           | SPEAK request terminated     |
2169	   |            |                       | because access to one of the |
2170	   |            |                       | URIs failed.                 |
2171	   | 004        | error                 | SPEAK request terminated     |
2172	   |            |                       | prematurely due to           |
2173	   |            |                       | synthesizer error.           |
2174	   | 005        | language-unsupported  | Language not supported.      |
2175	   | 006        | lexicon-load-failure  | Lexicon loading failed.      |
2176	   | 007        | cancelled             | A prior SPEAK request failed |
2177	   |            |                       | while this one was still in  |
2178	   |            |                       | the queue.                   |
2179	   +------------+-----------------------+------------------------------+

2181	                Synthesizer Resource Completion Cause Codes
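
A client might map the Completion-Cause value onto the table above as
in the following Python sketch.  The dictionary and function names are
hypothetical; the codes and names are copied from the table, and the
function takes the header value only.

```python
import re
from typing import Tuple

# Cause codes and names from the synthesizer completion cause table.
SYNTH_CAUSES = {
    0: "normal",
    1: "barge-in",
    2: "parse-failure",
    3: "uri-failure",
    4: "error",
    5: "language-unsupported",
    6: "lexicon-load-failure",
    7: "cancelled",
}

# 3DIGIT SP 1*VCHAR, where VCHAR is a visible ASCII character.
_CAUSE = re.compile(r"([0-9]{3}) ([\x21-\x7E]+)")


def parse_completion_cause(value: str) -> Tuple[int, str]:
    """Return (cause code, cause name) from a Completion-Cause value."""
    match = _CAUSE.fullmatch(value)
    if match is None:
        raise ValueError(f"malformed Completion-Cause value: {value!r}")
    return int(match.group(1)), match.group(2)
```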

2183	8.4.5.  Completion Reason

2185	   This header MAY be specified in a "SPEAK-COMPLETE" event coming from
2186	   the synthesizer resource to the client.  This contains the reason
2187	   text behind the "SPEAK" request completion.  This header communicates
2188	   text describing the reason for the failure, such as an error in
2189	   parsing the speech markup text.

2191	   completion-reason   =   "Completion-Reason" ":"
2192	                           quoted-string CRLF

2194	   The completion reason text is provided for client use in logs and for
2195	   debugging and instrumentation purposes.  Clients MUST NOT interpret
2196	   the completion reason text.

2198	8.4.6.  Voice-Parameter

2200	   This set of headers defines the voice of the speaker.

2202	   voice-parameter    =   voice-gender
2203	                       /   voice-age
2204	                       /   voice-variant
2205	                       /   voice-name

2207	   voice-gender        =   "Voice-Gender:" voice-gender-value CRLF
2208	   voice-gender-value  =   "male"
2209	                       /   "female"
2210	                       /   "neutral"
2211	   voice-age           =   "Voice-Age:" 1*3DIGIT CRLF
2212	   voice-variant       =   "Voice-Variant:" 1*19DIGIT CRLF
2213	   voice-name          =   "Voice-Name:"
2214	                           1*UTFCHAR *(1*WSP 1*UTFCHAR) CRLF

2216	   The Voice- parameters are derived from the similarly-named attributes
2217	   of the voice element specified in W3C's Speech Synthesis Markup
2218	   Language Specification [W3C.REC-speech-synthesis-20040907].  Legal
2219	   values for these parameters are as defined in that specification.

2221	   These headers MAY be sent in "SET-PARAMS"/"GET-PARAMS" requests to
2222	   define/get default values for the entire session or MAY be sent in
2223	   the "SPEAK" request to define default values for that speak request.
2224	   Note that SSML content can itself set these values internal to the
2225	   SSML document, of course.

2227	   Voice parameter headers MAY also be sent in a CONTROL method to
2228	   affect a "SPEAK" request in progress and change its behavior on the
2229	   fly.  If the synthesizer resource does not support this operation, it
2230	   MUST reject the request with a status of 403 "Unsupported Header".

2232	8.4.7.  Prosody-Parameters

2234	   This set of headers defines the prosody of the speech.

2236	   prosody-parameter   =   "Prosody-" prosody-param-name ":"
2237	                           prosody-param-value CRLF

2239	   prosody-param-name is any one of the attribute names under the
2240	   prosody element specified in W3C's Speech Synthesis Markup Language
2241	   Specification [W3C.REC-speech-synthesis-20040907].  The prosody-
2242	   param-value is any one of the value choices of the corresponding
2243	   prosody element attribute specified in the above section.

2245	   These headers MAY be sent in "SET-PARAMS"/"GET-PARAMS" requests to
2246	   define/get default values for the entire session or MAY be sent in
2247	   the "SPEAK" request to define default values for that speak request.
2248	   Furthermore, these attributes can be part of the speech text marked
2249	   up in SSML.

2251	   The prosody parameter headers in the "SET-PARAMS" or "SPEAK" request
2252	   only apply if the speech data is of type text/plain and does not use
2253	   a speech markup format.

2255	   These prosody parameter headers MAY also be sent in a CONTROL method
2256	   to affect a "SPEAK" request in progress and change its behavior on
2257	   the fly.  If the synthesizer resource does not support this
2258	   operation, it MUST respond to the client with a status of 403
2259	   "Unsupported Header".

2261	8.4.8.  Speech Marker

2263	   This header contains timestamp information in a "timestamp" field.
2264	   This is an NTP timestamp, a 64 bit number in decimal form.  It MUST
2265	   be synced with the RTP timestamp of the media stream through RTCP.

2267	   Markers are bookmarks that are defined within the markup.  Most
2268	   speech markup formats provide mechanisms to embed marker fields
2269	   within speech texts.  The synthesizer generates SPEECH-MARKER events
2270	   when it reaches these marker fields.  This header MUST be part of the
2271	   SPEECH-MARKER event and contain the marker tag value after the
2272	   timestamp, separated by a semicolon.  In these events the timestamp
2273	   marks the time the text corresponding to the marker was emitted as
2274	   speech by the synthesizer.

2276	   This header MUST also be returned in responses to STOP, CONTROL, and
2277	   BARGE-IN-OCCURRED methods, in the "SPEAK-COMPLETE" event, and in an
2278	   IN-PROGRESS SPEAK response.  In these messages, if any markers have
2279	   been encountered for the current SPEAK, the marker tag value MUST be
2280	   the last embedded marker encountered.  If no markers have yet been
2281	   encountered for the current SPEAK, only the timestamp is REQUIRED.
2282	   Note that in these events the purpose of this header is to provide
2283	   timestamp information associated with important events within the
2284	   lifecycle of a request (start of SPEAK processing, end of SPEAK
2285	   processing, receipt of CONTROL/STOP/BARGE-IN-OCCURRED).

2287	   timestamp           =   "timestamp" "=" time-stamp-value

2289	   time-stamp-value    =   1*20DIGIT

2291	   speech-marker       =   "Speech-Marker" ":"
2292	                           timestamp
2293	                           [";" 1*(UTFCHAR / %x20)] CRLF
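
   As an illustration only, the ABNF above can be read by a small
   parser such as the following Python sketch.  The function name
   parse_speech_marker is hypothetical and not defined by MRCPv2, and
   the sketch assumes that everything after the first semicolon is the
   marker tag.

```python
import re

# Mirrors the ABNF above: "timestamp=" 1*20DIGIT, then an optional
# ";" followed by the marker tag value.
_SPEECH_MARKER = re.compile(r"timestamp=([0-9]{1,20})(?:;(.+))?")


def parse_speech_marker(value):
    """Return (timestamp, tag) for a Speech-Marker header value.

    tag is None when no marker has been encountered yet.
    (Hypothetical helper, for illustration only.)
    """
    match = _SPEECH_MARKER.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"malformed Speech-Marker value: {value!r}")
    return int(match.group(1)), match.group(2)
```

   For a response sent before any marker has been reached, the tag
   component is None; in a SPEECH-MARKER event it carries the marker
   name from the markup.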

2295	8.4.9.  Speech Language

2297	   This header specifies the default language of the speech data if the
2298	   language is not specified in the markup.  The value of this header
2299	   MUST be a language tag as defined in RFC4646 [RFC4646].  The header
2300	   MAY occur in "SPEAK", "SET-PARAMS" or "GET-PARAMS" requests.

2302	   speech-language     =   "Speech-Language" ":" 1*VCHAR CRLF

2304	8.4.10.  Fetch Hint

2306	   When the synthesizer needs to fetch documents or other resources like
2307	   speech markup or audio files, this header controls the corresponding
2308	   URI access properties.  This provides client policy on when the
2309	   synthesizer should retrieve content from the server.  A value of
2310	   "prefetch" indicates the content MAY be downloaded when the request
2311	   is received, whereas "safe" indicates that content MUST NOT be
2312	   downloaded until actually referenced.  The default value is
2313	   "prefetch".  This header MAY occur in "SPEAK", "SET-PARAMS" or
2314	   "GET-PARAMS" requests.

2316	   fetch-hint          =   "Fetch-Hint" ":" 1*ALPHA CRLF

2318	8.4.11.  Audio Fetch Hint

2320	   When the synthesizer needs to fetch documents or other resources like
2321	   speech audio files, this header controls the corresponding URI access
2322	   properties.  This provides client policy on whether or not the
2323	   synthesizer may attempt to optimize speech by pre-fetching audio.
2324	   The value is either "safe" to say that audio is only fetched when it
2325	   is referenced, never before; "prefetch" to permit, but not require,
2326	   the implementation to pre-fetch the audio; or "stream" to allow it to
2327	   stream the audio fetches.  The default value is "prefetch".  This
2328	   header MAY occur in "SPEAK", "SET-PARAMS" or "GET-PARAMS" requests.

2330	   audio-fetch-hint    =   "Audio-Fetch-Hint" ":" 1*ALPHA CRLF
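
   The value sets and defaults described for Fetch-Hint and
   Audio-Fetch-Hint could be checked along the following lines.  This
   is a sketch under assumptions: the helper name resolve_fetch_hint is
   hypothetical, and treating the value as case-insensitive is a choice
   made here, not something stated in the text above.

```python
# Permitted values per header, as listed in the two sections above.
FETCH_HINT_VALUES = {
    "Fetch-Hint": {"prefetch", "safe"},
    "Audio-Fetch-Hint": {"prefetch", "safe", "stream"},
}

# Both headers default to "prefetch" when absent.
DEFAULT_HINT = "prefetch"


def resolve_fetch_hint(header, value=None):
    """Return the effective hint for `header`, applying the default.

    (Hypothetical helper; case-insensitive matching is an assumption.)
    """
    if value is None:
        return DEFAULT_HINT
    hint = value.strip().lower()
    if hint not in FETCH_HINT_VALUES[header]:
        raise ValueError(f"{value!r} is not a valid {header} value")
    return hint
```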

2332	8.4.12.  Failed URI

2334	   When a synthesizer method needs a synthesizer to fetch or access a
2335	   URI and the access fails, the server SHOULD provide the failed URI in
2336	   this header in the method response, unless there are multiple URI
2337	   failures, in which case one of the failed URIs MUST be provided in
2338	   this header in the method response.

2340	   failed-uri          =   "Failed-URI" ":" Uri CRLF

2342	8.4.13.  Failed URI Cause

2344	   When a synthesizer method needs a synthesizer to fetch or access a
2345	   URI and the access fails, the server MUST provide the URI-specific or
2346	   protocol-specific response code for the URI in the Failed-URI header
2347	   in the method response through this header.  The value encoding is
2348	   UTF-8 to accommodate any access protocol, some of which might have a
2349	   response string instead of a numeric response code.

2350	   failed-uri-cause    =   "Failed-URI-Cause" ":" 1*UTFCHAR CRLF

2352	8.4.14.  Speak Restart

2354	   When a CONTROL request to jump backward is issued to a currently
2355	   speaking synthesizer resource, and the target jump point is before
2356	   the start of the current "SPEAK" request, the current "SPEAK" request
2357	   MUST restart from the beginning of its speech data and the response
2358	   to the CONTROL request MUST contain this header indicating a restart.

2360	   speak-restart       =   "Speak-Restart" ":" BOOLEAN CRLF

2362	8.4.15.  Speak Length

2364	   This header MAY be specified in a CONTROL method to control the
2365	   length of speech to speak, relative to the current speaking point in
2366	   the currently active "SPEAK" request.  If numeric, the value MUST be
2367	   a positive integer.  If a header with a Tag unit is specified, then
2368	   the speech output continues until the tag is reached or the "SPEAK"
2369	   request completes, whichever comes first.  This header MAY be
2370	   specified in a "SPEAK" request to indicate the length to speak from
2371	   the speech data and is relative to the point in speech that the
2372	   "SPEAK" request starts.  The different speech length units supported
2373	   are synthesizer implementation dependent.  If a server does not
2374	   support the specified unit, the resource MUST respond with a status
2375	   code of 409 "Unsupported Header Value".

2377	   speak-length          =   "Speak-Length" ":" positive-length-value
2378	                             CRLF

2380	   positive-length-value =   positive-speech-length
2381	                         /   text-speech-length

2383	   text-speech-length    =   1*UTFCHAR SP "Tag"

2385	   positive-speech-length =  1*19DIGIT SP numeric-speech-unit

2387	   numeric-speech-unit   =   "Second"
2388	                         /   "Word"
2389	                         /   "Sentence"
2390	                         /   "Paragraph"
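
   The following Python sketch shows one way a receiver might parse a
   Speak-Length value according to the ABNF above.  The function name
   parse_speak_length and the tuple layout of its result are
   hypothetical.  Unit names are matched case-insensitively, as ABNF
   literal strings are, and normalized to the spelling used in the
   grammar.

```python
import re

# positive-speech-length = 1*19DIGIT SP numeric-speech-unit
_NUMERIC = re.compile(
    r"([0-9]{1,19}) (Second|Word|Sentence|Paragraph)", re.IGNORECASE
)
# text-speech-length = 1*UTFCHAR SP "Tag"
_TAG = re.compile(r"(.+) Tag", re.IGNORECASE)


def parse_speak_length(value):
    """Return ("numeric", count, unit) or ("tag", name).

    (Hypothetical helper, for illustration only.)
    """
    numeric = _NUMERIC.fullmatch(value)
    if numeric:
        count = int(numeric.group(1))
        if count <= 0:
            # The text above requires a positive integer.
            raise ValueError("Speak-Length count must be positive")
        return ("numeric", count, numeric.group(2).capitalize())
    tag = _TAG.fullmatch(value)
    if tag:
        return ("tag", tag.group(1))
    raise ValueError(f"malformed Speak-Length value: {value!r}")
```

   A server that parses the value successfully but does not support
   the resulting unit would still answer with status code 409, as
   described above.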

2392	8.4.16.  Load-Lexicon

2394	   This header is used to indicate whether a lexicon has to be loaded or
2395	   unloaded.  The default value for this header is "true".  This header
2396	   MAY be specified in a DEFINE-LEXICON method.

2398	   load-lexicon       =   "Load-Lexicon" ":" BOOLEAN CRLF

2400	8.4.17.  Lexicon-Search-Order

2402	   This header is used to specify a list of active Lexicon URIs and the
2403	   search order among the active lexicons.  Lexicons specified within
2404	   the SSML document take precedence over the lexicons specified in this
2405	   header.  This header MAY be specified in the SPEAK, SET-PARAMS, and
2406	   GET-PARAMS methods.

2408	   lexicon-search-order =   "Lexicon-Search-Order" ":"
2409	             "<" absoluteURI ">" *(" " "<" absoluteURI ">") CRLF
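
   A possible way to extract the ordered lexicon URIs from this header
   value is sketched below in Python.  The helper name
   parse_lexicon_search_order is hypothetical; the sketch does not
   validate that each entry is a well-formed absolute URI.

```python
import re


def parse_lexicon_search_order(value):
    """Return the lexicon URIs in search order.

    (Hypothetical helper.)  Accepts "<uri>" entries separated by single
    spaces, as in the ABNF above, and rejects anything else.
    """
    uris = re.findall(r"<([^<>]+)>", value)
    rebuilt = " ".join(f"<{uri}>" for uri in uris)
    if not uris or rebuilt != value.strip():
        raise ValueError(f"malformed Lexicon-Search-Order: {value!r}")
    return uris
```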

2411	8.5.  Synthesizer Message Body

2413	   A synthesizer message may contain additional information associated
2414	   with the Request, Response or Event in its message body.

2416	8.5.1.  Synthesizer Speech Data

2418	   Marked-up text for the synthesizer to speak is specified as a typed
2419	   media entity in the message body.  The speech data to be spoken by
2420	   the synthesizer can be specified inline by embedding the data in the
2421	   message body or by reference by providing a URI for accessing the
2422	   data.  In either case, the data and the format used to mark up the
2423	   speech need to be of a content type supported by the server.

2425	   All MRCPv2 servers containing synthesizer resources MUST support both
2426	   plain text speech data and W3C's Speech Synthesis Markup Language
2427	   [W3C.REC-speech-synthesis-20040907] and hence MUST support the Media
2428	   Types text/plain and application/ssml+xml.  Other formats MAY be
2429	   supported.

2431	   If the speech data is to be fetched by URI reference, the Media Type
2432	   text/uri-list RFC2483 [RFC2483] is used to indicate one or more URIs
2433	   that, when dereferenced, will contain the content to be spoken.  If a
2434	   list of speech URIs is specified, speech data provided by each URI
2435	   MUST be spoken in the order in which the URIs are specified in the
2436	   content.

2438	   A mix of URI and inline speech data may be indicated through the
2439	   multipart/mixed Media Type.  Embedded within the multipart there MAY
2440	   be content for the text/uri-list, application/ssml+xml and/or text/
2441	   plain media types.  The character set and encoding used in the speech
2442	   data is specified according to standard Media Type definitions.  The
2443	   multi-part content MAY also contain actual audio data in .wav or Sun
2444	   audio format.  Clients may have recorded audio clips stored in memory
2445	   or on a local device and wish to play them as part of the "SPEAK"
2446	   request.  The audio portions MAY be sent by the client as part of the
2447	   multi-part content block.  This audio is referenced in the speech
2448	   markup data that is another part in the multi-part content block
2449	   according to the multipart/mixed Media Type specification.

2451	   Content-Type:text/uri-list
2452	   Content-Length:...

2454	   http://www.example.com/ASR-Introduction.ssml
2455	   http://www.example.com/ASR-Document-Part1.ssml
2456	   http://www.example.com/ASR-Document-Part2.ssml
2457	   http://www.example.com/ASR-Conclusion.ssml

2459	                             URI List Example
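
   The entity shown in the URI list example could be assembled as in
   the following Python sketch, which fills in the Content-Length that
   the example elides.  The function name build_uri_list_entity is
   hypothetical, and the sketch assumes that each URI is terminated by
   CRLF within the text/uri-list body.

```python
def build_uri_list_entity(uris):
    """Return entity headers plus a text/uri-list body, as bytes.

    (Hypothetical helper.)  Content-Length counts the octets of the
    body only, one URI per CRLF-terminated line.
    """
    body = "".join(uri + "\r\n" for uri in uris).encode("utf-8")
    headers = (
        "Content-Type:text/uri-list\r\n"
        f"Content-Length:{len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return headers + body
```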

2461	   Content-Type:application/ssml+xml
2462	   Content-Length:...

2464	   <?xml version="1.0"?>
2465	        <speak version="1.0"
2466	               xmlns="http://www.w3.org/2001/10/synthesis"
2467	               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
2468	               xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
2469	                   http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
2470	               xml:lang="en-US">
2471	          <p>
2472	            <s>You have 4 new messages.</s>
2473	            <s>The first is from Stephanie Williams
2474	            and arrived at <break/>
2475	            <say-as interpret-as="vxml:time">0345p</say-as>.</s>

2477	            <s>The subject is <prosody
2478	            rate="-20%">ski trip</prosody></s>
2479	         </p>
2480	        </speak>

2482	                               SSML Example

2484	   Content-Type:multipart/mixed; boundary="break"

2486	   --break
2487	   Content-Type:text/uri-list
2488	   Content-Length:...

2490	   http://www.example.com/ASR-Introduction.ssml
2491	   http://www.example.com/ASR-Document-Part1.ssml
2492	   http://www.example.com/ASR-Document-Part2.ssml
2493	   http://www.example.com/ASR-Conclusion.ssml

2495	   --break
2496	   Content-Type:application/ssml+xml
2497	   Content-Length:...

2499	   <?xml version="1.0"?>
2500	       <speak version="1.0"
2501	              xmlns="http://www.w3.org/2001/10/synthesis"
2502	              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
2503	              xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
2504	                   http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
2505	              xml:lang="en-US">
2506	          <p>
2507	            <s>You have 4 new messages.</s>
2508	            <s>The first is from Stephanie Williams
2509	            and arrived at <break/>
2510	            <say-as interpret-as="vxml:time">0345p</say-as>.</s>

2512	            <s>The subject is <prosody
2513	            rate="-20%">ski trip</prosody></s>
2514	          </p>
2515	       </speak>
2516	   --break--

2518	                             Multipart Example

2520	8.5.2.  Lexicon Data

2522	   Synthesizer lexicon data from the client to the server can be
2523	   provided inline or by reference.  Either way they are carried as
2524	   typed media in the message body of the MRCPv2 request message.

2526	   When a lexicon is specified in-line in the message, the client MUST
2527	   provide a Content-ID for that lexicon as part of the content headers.
2528	   The server MUST store the lexicon associated with that Content-ID for
2529	   the duration of the session.  A stored lexicon can be overwritten by
2530	   defining a new lexicon with the same Content-ID.  Lexicons that have
2531	   been associated with a Content-ID can be referenced through the
2532	   "session:" URI scheme (see Section 13.6).

2534	   If lexicon data is specified by external URI reference, the Media
2535	   Type text/uri-list RFC2483 [RFC2483] is used to list the one or more
2536	   URIs that may be dereferenced to obtain the lexicon data.  All MRCPv2
2537	   servers MUST support the HTTP and HTTPS URI access mechanisms, and
2538	   MAY support other mechanisms.

2540	   If the data in the message body consists of a mix of URI and inline
2541	   lexicon data the multipart/mixed Media Type is used.  The character
2542	   set and encoding used in the lexicon data may be specified according
2543	   to standard Media Type definitions.

2545	8.6.  SPEAK Method

2547	   The "SPEAK" Request provides the synthesizer resource with the speech
2548	   text and initiates speech synthesis and streaming.  The "SPEAK"
2549	   method can carry voice and prosody headers that alter the behavior of
2550	   the voice being synthesized, as well as a typed media message body
2551	   containing the actual marked-up text to be spoken.

2553	   The SPEAK method implementation MUST do a fetch of all external URIs
2554	   that are part of that operation.  If caching is implemented, this URI
2555	   fetching MUST conform to the cache control hints and parameter
2556	   headers associated with the method in deciding whether it is to be
2557	   fetched from cache or from the external server.  If these hints/
2558	   parameters are not specified in the method, the values set for the
2559	   session using SET-PARAMS/GET-PARAMS apply.  If they were not set for
2560	   the session, their default values apply.

2562	   When applying voice parameters there are 3 levels of precedence.  The
2563	   highest precedence goes to those specified in the speech markup text,
2564	   followed by those specified in the headers of the "SPEAK" request and
2565	   hence apply for that "SPEAK" request only, followed by the session
2566	   default values which can be set using the "SET-PARAMS" request and
2567	   apply for subsequent methods invoked during the session.
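
   The three precedence levels can be modeled as a layered lookup.  The
   Python sketch below is illustrative only: the function name
   effective_voice_params and the use of plain dictionaries to hold
   parameter values are assumptions, not part of MRCPv2.

```python
from collections import ChainMap


def effective_voice_params(markup, speak_headers, session_defaults):
    """Merge the three parameter sources by precedence.

    (Hypothetical helper.)  Values from the speech markup win over
    SPEAK request headers, which win over session defaults set with
    SET-PARAMS.  ChainMap consults its mappings left to right.
    """
    return dict(ChainMap(markup, speak_headers, session_defaults))
```

   For example, a rate given in the markup overrides a rate given in
   the SPEAK headers, while a volume given only as a session default
   still applies.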

2569	   If the resource was idle at the time the "SPEAK" request arrived at
2570	   the server and the "SPEAK" method is being actively processed, the
2571	   resource responds immediately with a success status code and a
2572	   request-state of IN-PROGRESS.

2574	   If the resource is in the speaking or paused state when the "SPEAK"
2575	   method arrives at the server, i.e. it is in the middle of processing
2576	   a previous "SPEAK" request, the status returns success with a
2577	   request-state of PENDING.  The server places the "SPEAK" request in
2578	   the synthesizer resource request queue.  The request queue operates
2579	   strictly FIFO: requests are processed serially in order of receipt.

2581	   If the current SPEAK fails, all SPEAK methods in the pending queue
2582	   are cancelled and each generates a SPEAK-COMPLETE event with a
2583	   Completion-Cause of "cancelled".

2585	   For the synthesizer resource, "SPEAK" is the only method that can
2586	   return a request-state of IN-PROGRESS or PENDING.  When the text has
2587	   been synthesized and played into the media stream, the resource
2588	   issues a "SPEAK-COMPLETE" event with the request-id of the "SPEAK"
2589	   request and a request-state of COMPLETE.
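
   The queueing behavior described above can be sketched as a small
   state model.  The class and method names below are hypothetical, the
   "error" cause label is a placeholder rather than a defined
   Completion-Cause value, and the sketch tracks request-ids only; it
   performs no synthesis.

```python
from collections import deque


class SynthesizerResource:
    """Toy model of the SPEAK request queue (illustration only)."""

    def __init__(self):
        self.active = None      # request-id that is IN-PROGRESS, if any
        self.pending = deque()  # PENDING request-ids, oldest first

    def speak(self, request_id):
        """Accept a SPEAK and return the request-state of the response."""
        if self.active is None:
            self.active = request_id
            return "IN-PROGRESS"
        self.pending.append(request_id)
        return "PENDING"

    def complete_current(self):
        """Finish the active SPEAK normally; promote the next in FIFO order."""
        finished = self.active
        self.active = self.pending.popleft() if self.pending else None
        return (finished, "normal")

    def fail_current(self, cause="error"):
        """Fail the active SPEAK; every pending SPEAK is cancelled.

        Returns one (request-id, cause) pair per SPEAK-COMPLETE event.
        """
        events = [(self.active, cause)]
        events += [(rid, "cancelled") for rid in self.pending]
        self.pending.clear()
        self.active = None
        return events
```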

2591	   C->S: MRCP/2.0 489 SPEAK 543257
2592	         Channel-Identifier:32AECB23433802@speechsynth
2593	         Voice-gender:neutral
2594	         Voice-Age:25
2595	         Prosody-volume:medium
2596	         Content-Type:application/ssml+xml
2597	         Content-Length:...

2599	         <?xml version="1.0"?>
2600	            <speak version="1.0"
2601	                xmlns="http://www.w3.org/2001/10/synthesis"
2602	                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
2603	                xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
2604	                   http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
2605	                xml:lang="en-US">
2606	            <p>
2607	             <s>You have 4 new messages.</s>
2608	             <s>The first is from Stephanie Williams and arrived at
2609	                <break/>
2610	                <say-as interpret-as="vxml:time">0345p</say-as>.
2611	                </s>
2612	             <s>The subject is
2613	                    <prosody rate="-20%">ski trip</prosody>
2614	             </s>
2615	            </p>
2616	           </speak>

2618	   S->C: MRCP/2.0 28 543257 200 IN-PROGRESS
2619	         Channel-Identifier:32AECB23433802@speechsynth
2620	         Speech-Marker:timestamp=857206027059

2622	   S->C: MRCP/2.0 79 SPEAK-COMPLETE 543257 COMPLETE
2623	         Channel-Identifier:32AECB23433802@speechsynth
2624	         Completion-Cause:000 normal
2625	         Speech-Marker:timestamp=857206027059

2626	                               SPEAK Example

2628	8.7.  STOP

2630	   The "STOP" method from the client to the server tells the synthesizer
2631	   resource to stop speaking if it is speaking something.

2633	   The "STOP" request can be sent with an active-request-id-list header
2634	   to stop the zero or more specific "SPEAK" requests that may be in
2635	   queue and return a response code of 200 (Success).  If no active-
2636	   request-id-list header is sent in the "STOP" request the server
2637	   terminates all outstanding "SPEAK" requests.

2639	   If a "STOP" request successfully terminated one or more PENDING or
2640	   IN-PROGRESS "SPEAK" requests, then the response MUST contain an
2641	   active-request-id-list header enumerating the "SPEAK" request-ids
2642	   that were terminated.  Otherwise there is no active-request-id-list
2643	   header in the response.  No "SPEAK-COMPLETE" events are sent for such
2644	   terminated requests.

2646	   If a "SPEAK" request that was IN-PROGRESS and speaking was stopped,
2647	   the next pending "SPEAK" request, if any, becomes IN-PROGRESS at the
2648	   resource and enters the speaking state.

2650	   If a "SPEAK" request that was IN-PROGRESS and paused was stopped, the
2651	   next pending "SPEAK" request, if any, becomes IN-PROGRESS and enters
2652	   the paused state.

2654	   C->S: MRCP/2.0 423 SPEAK 543258
2655	         Channel-Identifier:32AECB23433802@speechsynth
2656	         Content-Type:application/ssml+xml
2657	         Content-Length:...

2659	         <?xml version="1.0"?>
2660	           <speak version="1.0"
2661	                xmlns="http://www.w3.org/2001/10/synthesis"
2662	                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
2663	                xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
2664	                   http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
2665	                xml:lang="en-US">
2666	            <p>
2667	             <s>You have 4 new messages.</s>
2668	             <s>The first is from Stephanie Williams and arrived at
2669	                <break/>
2670	                <say-as interpret-as="vxml:time">0345p</say-as>.</s>
2671	             <s>The subject is
2672	                 <prosody rate="-20%">ski trip</prosody></s>
2673	            </p>
2674	           </speak>

2676	   S->C: MRCP/2.0 48 543258 200 IN-PROGRESS
2677	         Channel-Identifier:32AECB23433802@speechsynth
2678	         Speech-Marker:timestamp=857206027059

2680	   C->S: MRCP/2.0 44 STOP 543259
2681	         Channel-Identifier:32AECB23433802@speechsynth

2683	   S->C: MRCP/2.0 66 543259 200 COMPLETE
2684	         Channel-Identifier:32AECB23433802@speechsynth
2685	         Active-Request-Id-List:543258
2686	         Speech-Marker:timestamp=857206039059

2688	                               STOP Example
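
   The STOP rules in this section can be summarized by the following
   Python sketch.  The function name handle_stop and its argument
   layout are hypothetical; the sketch operates on request-ids only.

```python
def handle_stop(active, pending, request_ids=None):
    """Apply a STOP request to a snapshot of the resource.

    (Hypothetical helper.)
    active      -- request-id that is IN-PROGRESS, or None
    pending     -- PENDING request-ids, oldest first
    request_ids -- ids named in the request's active-request-id-list,
                   or None when the header is absent (stop everything)

    Returns (terminated, new_active, new_pending).  The response
    carries an active-request-id-list only when `terminated` is not
    empty, and no SPEAK-COMPLETE events are sent for those requests.
    """
    outstanding = ([active] if active is not None else []) + list(pending)
    if request_ids is None:
        terminated = list(outstanding)
    else:
        terminated = [rid for rid in outstanding if rid in request_ids]
    remaining = [rid for rid in outstanding if rid not in terminated]
    # If the active request was stopped, the oldest survivor is promoted.
    new_active = remaining[0] if remaining else None
    return terminated, new_active, remaining[1:]
```

   A promoted request enters the speaking or the paused state,
   whichever state the stopped request was in, as described above.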

2690	8.8.  BARGE-IN-OCCURRED

2692	   The BARGE-IN-OCCURRED method, when used with the synthesizer
2693	   resource, provides a client which has detected a barge-in-able event
2694	   a means to communicate the occurrence of the event to the synthesizer
2695	   resource.

2697	   This method is useful in two scenarios:

2698	   1.  The client has detected DTMF digits in the input media or some
2699	       other barge-in-able event and wants to communicate that to the
2700	       synthesizer resource.

2702	   2.  The recognizer resource and the synthesizer resource are in
2703	       different servers.  In this case the client acts as an
2704	       intermediary for the two servers.  It receives an event from the
2705	       recognition resource and sends a BARGE-IN-OCCURRED request to the
2706	       synthesizer.  In such cases, the BARGE-IN-OCCURRED method would
2707	       also have a proxy-sync-id header received from the resource
2708	       generating the original event.

2710	   If a "SPEAK" request is active with kill-on-barge-in enabled, and the
2711	   BARGE-IN-OCCURRED event is received, the synthesizer MUST immediately
2712	   stop streaming out audio.  It MUST also terminate any speech requests
2713	   queued behind the current active one, irrespective of whether they
2714	   have barge-in enabled or not.  If a barge-in-able "SPEAK" request was
2715	   playing and it was terminated, the response MUST contain an
2716	   active-request-id-list header listing the request-ids of all "SPEAK"
2717	   requests that were terminated.  The server generates no
2718	   "SPEAK-COMPLETE" events for these requests.

2720	   If there were no "SPEAK" requests terminated by the synthesizer
2721	   resource as a result of the BARGE-IN-OCCURRED method, the server
2722	   responds to the BARGE-IN-OCCURRED with a 200 success which MUST NOT
2723	   contain an active-request-id-list header.

2725	   If the synthesizer and recognizer resources are part of the same
2726	   MRCPv2 session, they can be optimized for a quicker kill-on-barge-in
2727	   response if the recognizer and synthesizer interact directly.  In
2728	   these cases, the client MUST still react to a START-OF-INPUT event
2729	   from the recognizer by invoking the BARGE-IN-OCCURRED method to the
2730	   synthesizer.  The client MUST invoke the BARGE-IN-OCCURRED if it has
2731	   any outstanding requests to the synthesizer resource in either the
2732	   PENDING or IN-PROGRESS state.

2734	   C->S: MRCP/2.0 433 SPEAK 543258
2735	         Channel-Identifier:32AECB23433802@speechsynth
2736	         Voice-gender:neutral
2737	         Voice-Age:25
2738	         Prosody-volume:medium
2739	         Content-Type:application/ssml+xml
2740	         Content-Length:...

2742	         <?xml version="1.0"?>
2743	           <speak version="1.0"
2744	                xmlns="http://www.w3.org/2001/10/synthesis"
2745	                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
2746	                xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
2747	                   http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
2748	                xml:lang="en-US">
2749	            <p>
2750	             <s>You have 4 new messages.</s>
2751	             <s>The first is from Stephanie Williams and arrived at
2752	                <break/>
2753	                <say-as interpret-as="vxml:time">0345p</say-as>.</s>
2754	             <s>The subject is
2755	                <prosody rate="-20%">ski trip</prosody></s>
2756	            </p>
2757	           </speak>

2759	   S->C: MRCP/2.0 47 543258 200 IN-PROGRESS
2760	         Channel-Identifier:32AECB23433802@speechsynth
2761	         Speech-Marker:timestamp=857206027059

2763	   C->S: MRCP/2.0 69 BARGE-IN-OCCURRED 543259
2764	         Channel-Identifier:32AECB23433802@speechsynth
2765	         Proxy-Sync-Id:987654321

2767	   S->C: MRCP/2.0 72 543259 200 COMPLETE
2768	         Channel-Identifier:32AECB23433802@speechsynth
2769	         Active-Request-Id-List:543258
2770	         Speech-Marker:timestamp=857206039059

2772	                         BARGE-IN-OCCURRED Example

2774	8.9.  PAUSE

2776	   The PAUSE method from the client to the server tells the synthesizer
2777	   resource to pause speech output if it is speaking something.  If a
2778	   PAUSE method is issued on a session when a "SPEAK" is not active the
2779	   server MUST respond with a status of 402 "Method not valid in this
2780	   state".  If a PAUSE method is issued on a session when a "SPEAK" is
2781	   active and paused the server MUST respond with a status of 200
2782	   "Success".  If a "SPEAK" request was active the server MUST return an
2783	   active-request-id-list header with the request-id of the "SPEAK"
2784	   request that was paused.

2786	   C->S: MRCP/2.0 434 SPEAK 543258
2787	         Channel-Identifier:32AECB23433802@speechsynth
2788	         Voice-gender:neutral
2789	         Voice-Age:25
2790	         Prosody-volume:medium
2791	         Content-Type:application/ssml+xml
2792	         Content-Length:...

2794	         <?xml version="1.0"?>
2795	           <speak version="1.0"
2796	                xmlns="http://www.w3.org/2001/10/synthesis"
2797	                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
2798	                xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
2799	                   http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
2800	                xml:lang="en-US">
2801	            <p>
2802	             <s>You have 4 new messages.</s>
2803	             <s>The first is from Stephanie Williams and arrived at
2804	                <break/>
2805	                <say-as interpret-as="vxml:time">0345p</say-as>.</s>

2807	             <s>The subject is
2808	                <prosody rate="-20%">ski trip</prosody></s>
2809	            </p>
2810	           </speak>

2812	   S->C: MRCP/2.0 48 543258 200 IN-PROGRESS
2813	         Channel-Identifier:32AECB23433802@speechsynth
2814	         Speech-Marker:timestamp=857206027059

2816	   C->S: MRCP/2.0 43 PAUSE 543259
2817	         Channel-Identifier:32AECB23433802@speechsynth

2819	   S->C: MRCP/2.0 68 543259 200 COMPLETE
2820	         Channel-Identifier:32AECB23433802@speechsynth
2821	         Active-Request-Id-List:543258

2823	                               PAUSE Example

2825	8.10.  RESUME

2827	   The RESUME method from the client to the server tells a paused
2828	   synthesizer resource to resume speaking.  If a RESUME request is
2829	   issued on a session with no active "SPEAK" request, the server MUST
2830	   respond with a status of 402 "Method not valid in this state".  If a
2831	   RESUME request is issued on a session with an active "SPEAK" request
2832	   that is speaking (i.e., not paused) the server MUST respond with a
2833	   status of 200 "Success".  If a "SPEAK" request was paused the server
2834	   MUST return an active-request-id-list header with the request-id of
2835	   the "SPEAK" request that was resumed.

2837	   C->S: MRCP/2.0 434 SPEAK 543258
2838	         Channel-Identifier:32AECB23433802@speechsynth
2839	         Voice-gender:neutral
2840	         Voice-age:25
2841	         Prosody-volume:medium
2842	         Content-Type:application/ssml+xml
2843	         Content-Length:...

2845	         <?xml version="1.0"?>
2846	           <speak version="1.0"
2847	                xmlns="http://www.w3.org/2001/10/synthesis"
2848	                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
2849	                xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
2850	                   http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
2851	                xml:lang="en-US">
2852	            <p>
2853	             <s>You have 4 new messages.</s>
2854	             <s>The first is from Stephanie Williams and arrived at
2855	                <break/>
2856	                <say-as interpret-as="vxml:time">0345p</say-as>.</s>
2857	             <s>The subject is
2858	                <prosody rate="-20%">ski trip</prosody></s>
2859	            </p>
2860	           </speak>

2862	   S->C: MRCP/2.0 48 543258 200 IN-PROGRESS
2863	         Channel-Identifier:32AECB23433802@speechsynth
2864	         Speech-Marker:timestamp=857206027059

2866	   C->S: MRCP/2.0 44 PAUSE 543259
2867	         Channel-Identifier:32AECB23433802@speechsynth

2869	   S->C: MRCP/2.0 47 543259 200 COMPLETE
2870	         Channel-Identifier:32AECB23433802@speechsynth
2871	         Active-Request-Id-List:543258

2873	   C->S: MRCP/2.0 44 RESUME 543260
2874	         Channel-Identifier:32AECB23433802@speechsynth

2876	   S->C: MRCP/2.0 66 543260 200 COMPLETE
2877	         Channel-Identifier:32AECB23433802@speechsynth
2878	         Active-Request-Id-List:543258

2880	                              RESUME Example
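
   The status rules for PAUSE (Section 8.9) and RESUME can be sketched
   as two small functions.  The names handle_pause and handle_resume
   and the three state labels are hypothetical.  The text does not say
   whether a RESUME on an already speaking request returns an
   active-request-id-list; this sketch assumes it does not.

```python
def handle_pause(state, active_id):
    """Return (status, active_request_id_list, new_state) for PAUSE.

    (Hypothetical helper.)  state is "idle", "speaking", or "paused".
    """
    if state == "idle":
        return 402, None, "idle"  # Method not valid in this state
    # Pausing an already paused SPEAK still succeeds.
    return 200, [active_id], "paused"


def handle_resume(state, active_id):
    """Return (status, active_request_id_list, new_state) for RESUME."""
    if state == "idle":
        return 402, None, "idle"  # Method not valid in this state
    if state == "paused":
        return 200, [active_id], "speaking"
    # Already speaking: success; omitting the list is an assumption.
    return 200, None, "speaking"
```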

2882	8.11.  CONTROL

2884	   The CONTROL method from the client to the server tells a synthesizer
2885	   that is speaking to modify what it is speaking on the fly.  This
2886	   method is used to request the synthesizer to jump forward or backward
2887	   in what it is speaking, change speaker rate, speaker parameters, etc.
2888	   It affects only the currently IN-PROGRESS "SPEAK" request.  Depending
2889	   on the implementation and capability of the synthesizer resource it
2890	   may or may not support the various modifications indicated by headers
2891	   in the CONTROL request.

2893	   When a client invokes a CONTROL method to jump forward and the
2894	   operation goes beyond the end of the active "SPEAK" method's text,
2895	   the CONTROL request still succeeds.  The active "SPEAK" request
2896	   completes and returns a "SPEAK-COMPLETE" event following the response
2897	   to the CONTROL method.  If there are more "SPEAK" requests in the
2898	   queue, the synthesizer resource starts at the beginning of the next
2899	   "SPEAK" request in the queue.

2901	   When a client invokes a CONTROL method to jump backward and the
2902	   operation jumps to the beginning or beyond the beginning of the
2903	   speech data of the active "SPEAK" method, the CONTROL request still
2904	   succeeds.  The response to the CONTROL request contains the speak-
2905	   restart header, and the active "SPEAK" request restarts from the
2906	   beginning of its speech data.

2908	   These two behaviors can be used to rewind or fast-forward across
2909	   multiple speech requests, if the client wants to break up a speech
2910	   markup text into multiple "SPEAK" requests.

2912	   If a "SPEAK" request was active when the CONTROL method was received
2913	   the server MUST return an active-request-id-list header with the
2914	   Request-id of the "SPEAK" request that was active.
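
   The jump behavior described in this section can be sketched as
   follows in Python.  The function name apply_jump is hypothetical,
   and positions are assumed to be counted in the same unit as the
   jump (for example, words) from the start of the active "SPEAK"
   request's speech data.

```python
def apply_jump(position, length, jump):
    """Return (new_position, speak_restart, speak_complete).

    (Hypothetical helper.)
    position -- current speaking point within the active SPEAK
    length   -- total length of the active SPEAK's speech data
    jump     -- signed jump size; negative values jump backward

    In every case the CONTROL request itself succeeds.
    """
    target = position + jump
    if jump < 0 and target <= 0:
        # At or before the beginning: restart and report Speak-Restart.
        return 0, True, False
    if jump > 0 and target > length:
        # Beyond the end: the SPEAK completes and a SPEAK-COMPLETE
        # event follows the CONTROL response.
        return length, False, True
    return target, False, False
```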

2916	   C->S: MRCP/2.0 434 SPEAK 543258
2917	         Channel-Identifier:32AECB23433802@speechsynth
2918	         Voice-gender:neutral
2919	         Voice-age:25
2920	         Prosody-volume:medium
2921	         Content-Type:application/ssml+xml
2922	         Content-Length:...

2924	         <?xml version="1.0"?>
2925	           <speak version="1.0"
2926	                xmlns="http://www.w3.org/2001/10/synthesis"
2927	                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
2928	                xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
2929	                   http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
2930	                xml:lang="en-US">
2931	            <p>
2932	             <s>You have 4 new messages.</s>
2933	             <s>The first is from Stephanie Williams
2934	                and arrived at <break/>
2935	                <say-as interpret-as="vxml:time">0345p</say-as>.</s>

2937	             <s>The subject is <prosody
2938	                rate="-20%">ski trip</prosody></s>
2939	            </p>
2940	           </speak>

2942	   S->C: MRCP/2.0 47 543258 200 IN-PROGRESS
2943	         Channel-Identifier:32AECB23433802@speechsynth
2944	         Speech-Marker:timestamp=857205016059

2946	   C->S: MRCP/2.0 63 CONTROL 543259
2947	         Channel-Identifier:32AECB23433802@speechsynth
2948	         Prosody-rate:fast

2950	   S->C: MRCP/2.0 67 543259 200 COMPLETE
2951	         Channel-Identifier:32AECB23433802@speechsynth
2952	         Active-Request-Id-List:543258
2953	         Speech-Marker:timestamp=857206027059

2955	   C->S: MRCP/2.0 68 CONTROL 543260
2956	         Channel-Identifier:32AECB23433802@speechsynth
2957	         Jump-Size:-15 Words

2959	   S->C: MRCP/2.0 69 543260 200 COMPLETE
2960	         Channel-Identifier:32AECB23433802@speechsynth
2961	         Active-Request-Id-List:543258
2962	         Speech-Marker:timestamp=857206039059
2963	                              CONTROL Example

2965	8.12.  SPEAK-COMPLETE

2967	   This is an Event message from the synthesizer resource to the client
2968	   indicating that the corresponding "SPEAK" request was completed.  The
2969	   request-id header matches the request-id of the "SPEAK" request that
2970	   initiated the speech that just completed.  The request-state field is
2971	   set to COMPLETE by the server, indicating that this is the last event
2972	   with the corresponding request-id.  The completion-cause header
2973	   specifies the cause code pertaining to the status and reason of
2974	   request completion, such as whether the "SPEAK" completed normally
2975	   or because of an error, kill-on-barge-in, etc.

2977	   C->S: MRCP/2.0 434 SPEAK 543260
2978	         Channel-Identifier:32AECB23433802@speechsynth
2979	         Voice-gender:neutral
2980	         Voice-age:25
2981	         Prosody-volume:medium
2982	         Content-Type:application/ssml+xml
2983	         Content-Length:...

2985	         <?xml version="1.0"?>
2986	           <speak version="1.0"
2987	                xmlns="http://www.w3.org/2001/10/synthesis"
2988	                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
2989	                xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
2990	                   http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
2991	                xml:lang="en-US">
2992	            <p>
2993	             <s>You have 4 new messages.</s>
2994	             <s>The first is from Stephanie Williams
2995	                and arrived at <break/>
2996	                <say-as interpret-as="vxml:time">0345p</say-as>.</s>
2997	             <s>The subject is
2998	                <prosody rate="-20%">ski trip</prosody></s>
2999	            </p>
3000	           </speak>

3002	   S->C: MRCP/2.0 48 543260 200 IN-PROGRESS
3003	         Channel-Identifier:32AECB23433802@speechsynth
3004	         Speech-Marker:timestamp=857206027059

3006	   S->C: MRCP/2.0 73 SPEAK-COMPLETE 543260 COMPLETE
3007	         Channel-Identifier:32AECB23433802@speechsynth
3008	         Completion-Cause:000 normal
3009	         Speech-Marker:timestamp=857206039059
3010	                          SPEAK-COMPLETE Example
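
   The three start-line shapes used in the examples above (request,
   response, and event) can be told apart by token count and by whether
   the third token is numeric.  The following is a minimal sketch in
   Python, inferred only from the example messages shown here; the
   function name and the returned dictionary keys are hypothetical and
   are not part of MRCPv2.

```python
from typing import Dict, Union


def classify_start_line(line: str) -> Dict[str, Union[str, int]]:
    """Classify an MRCPv2 start-line as a request, response, or event."""
    parts = line.split()
    if len(parts) not in (4, 5) or not parts[0].startswith("MRCP/"):
        raise ValueError(f"not an MRCPv2 start-line: {line!r}")
    info: Dict[str, Union[str, int]] = {
        "version": parts[0],
        "length": int(parts[1]),
    }
    if len(parts) == 4:
        # Request: version, length, method name, request-id
        info.update(kind="request", name=parts[2], request_id=int(parts[3]))
    elif parts[2].isdigit():
        # Response: version, length, request-id, status code, request-state
        info.update(kind="response", request_id=int(parts[2]),
                    status=int(parts[3]), state=parts[4])
    else:
        # Event: version, length, event name, request-id, request-state
        info.update(kind="event", name=parts[2],
                    request_id=int(parts[3]), state=parts[4])
    return info
```

   For instance, the SPEAK-COMPLETE start-line above would be reported
   as an event whose request-state is COMPLETE.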

3012	8.13.  SPEECH-MARKER

3014	   This is an event generated by the synthesizer resource to the client
3015	   when the synthesizer encounters a marker tag in the speech markup it
3016	   is currently processing.  The request-id field in the header matches
3017	   the corresponding "SPEAK" request.  The request-state field indicates
3018	   IN-PROGRESS as the speech is still not complete.  The value of the
3019	   speech marker tag hit, describing where the synthesizer is in the
3020	   speech markup, is returned in the speech-marker header, along with an
3021	   NTP timestamp indicating the instant in the output speech stream that
3022	   the marker was encountered.  The SPEECH-MARKER event MUST also be
3023	   generated with a null marker value and output NTP timestamp when a
3024	   SPEAK request in Pending-State (i.e., in the queue) changes state to
3025	   IN-PROGRESS and starts speaking.  The NTP timestamp MUST be
3026	   synchronized with the RTP timestamp used to generate the speech
3027	   stream through standard RTCP machinery.

3029	   C->S: MRCP/2.0 434 SPEAK 543261
3030	         Channel-Identifier:32AECB23433802@speechsynth
3031	         Voice-gender:neutral
3032	         Voice-age:25
3033	         Prosody-volume:medium
3034	         Content-Type:application/ssml+xml
3035	         Content-Length:...

3037	         <?xml version="1.0"?>
3038	           <speak version="1.0"
3039	                xmlns="http://www.w3.org/2001/10/synthesis"
3040	                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
3041	                xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
3042	                   http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
3043	                xml:lang="en-US">
3044	            <p>
3045	             <s>You have 4 new messages.</s>
3046	             <s>The first is from Stephanie Williams
3047	                and arrived at <break/>
3048	                <say-as interpret-as="vxml:time">0345p</say-as>.</s>
3049	                <mark name="here"/>
3050	             <s>The subject is
3051	                <prosody rate="-20%">ski trip</prosody>
3052	             </s>
3053	             <mark name="ANSWER"/>
3054	            </p>
3055	           </speak>

3057	   S->C: MRCP/2.0 48 543261 200 IN-PROGRESS
3058	         Channel-Identifier:32AECB23433802@speechsynth
3059	         Speech-Marker:timestamp=857205015059

3061	   S->C: MRCP/2.0 73 SPEECH-MARKER 543261 IN-PROGRESS
3062	         Channel-Identifier:32AECB23433802@speechsynth
3063	         Speech-Marker:timestamp=857206027059;here

3065	   S->C: MRCP/2.0 74 SPEECH-MARKER 543261 IN-PROGRESS
3066	         Channel-Identifier:32AECB23433802@speechsynth
3067	         Speech-Marker:timestamp=857206039059;ANSWER

3069	   S->C: MRCP/2.0 73 SPEAK-COMPLETE 543261 COMPLETE
3070	         Channel-Identifier:32AECB23433802@speechsynth
3071	         Completion-Cause:000 normal
3072	         Speech-Marker:timestamp=857207689259;ANSWER

3074	                           SPEECH-MARKER Example
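
   A client might split the Speech-Marker values shown above into the
   timestamp and the optional marker name as sketched below.  This is
   an illustrative Python sketch based only on the value shapes in the
   example ("timestamp=<digits>" optionally followed by ";<marker>");
   the type and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class SpeechMarker(NamedTuple):
    timestamp: int         # timestamp value exactly as carried in the header
    marker: Optional[str]  # None when no marker value is present


def parse_speech_marker(value: str) -> SpeechMarker:
    """Parse the value part of a Speech-Marker header."""
    first, sep, rest = value.strip().partition(";")
    name, eq, digits = first.partition("=")
    digits = digits.strip()
    if (name.strip() != "timestamp" or not eq
            or not (digits.isascii() and digits.isdigit())):
        raise ValueError(f"malformed Speech-Marker value: {value!r}")
    # A missing or empty marker is treated as the null marker value.
    marker = rest.strip() if sep and rest.strip() else None
    return SpeechMarker(int(digits), marker)
```

   A value without a marker, such as the one in the IN-PROGRESS
   response, yields a marker of None in this sketch.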

3076	8.14.  DEFINE-LEXICON

3078	   The DEFINE-LEXICON method, from the client to the server, provides a
3079	   lexicon and tells the server to load, unload, activate or deactivate
3080	   the lexicon.

3082	   If the server resource is in the speaking or paused state, the server
3083	   MUST respond with status code 402 (Method not valid in this state).

3085	   If the resource is in the idle state and is able to successfully
3086	   load/unload/activate/deactivate the lexicon, the server MUST return
3087	   a success status code and the request-state MUST be COMPLETE.

3089	   If the synthesizer could not define the lexicon for some reason, for
3090	   example because the download failed or the lexicon was in an
3091	   unsupported form, the server MUST respond with a failure status code
3092	   of 407, and a Completion-Cause header describing the failure reason.
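
   The three DEFINE-LEXICON outcomes described above can be summarized
   as a small decision function.  The Python sketch below is
   illustrative only.  It assumes 200 as the success code and assumes
   that failure responses also carry the COMPLETE request-state,
   neither of which is stated in this section; the function name and
   the shape of the return value are hypothetical.

```python
from typing import Tuple


def define_lexicon_response(resource_state: str,
                            lexicon_ok: bool) -> Tuple[int, str, bool]:
    """Return (status code, request-state, Completion-Cause required)."""
    if resource_state in ("speaking", "paused"):
        # Method not valid in this state.
        return 402, "COMPLETE", False
    if resource_state != "idle":
        raise ValueError(f"unknown synthesizer state: {resource_state!r}")
    if lexicon_ok:
        # Assumption: 200 is used as the success status code.
        return 200, "COMPLETE", False
    # Lexicon could not be defined; Completion-Cause describes the reason.
    return 407, "COMPLETE", True
```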

3094	9.  Speech Recognizer Resource

3096	   The speech recognizer resource receives an incoming voice stream and
3097	   provides the client with an interpretation of what was spoken in
3098	   textual form.

3100	   The recognizer resource is controlled by MRCPv2 requests from the
3101	   client.  The recognizer resource can both respond to these requests
3102	   and generate asynchronous events to the client to indicate conditions
3103	   of interest during the processing of the method.

3105	   This section applies to the following resource types.
3106	   1.  speechrecog
3107	   2.  dtmfrecog

3109	   The difference between the above two resources is in their level of
3110	   support for recognition grammars.  The "dtmfrecog" resource type is
3111	   capable of recognizing only DTMF digits and hence accepts only DTMF
3112	   grammars.  It only generates barge-in for DTMF inputs and ignores
3113	   speech.  The "speechrecog" resource type can recognize regular speech
3114	   as well as DTMF digits and hence MUST support grammars describing
3115	   either speech or DTMF.  This resource generates barge-in events for
3116	   speech and/or DTMF.  By analyzing the grammars that are activated by
3117	   the RECOGNIZE method, it determines if a barge-in should occur for
3118	   speech and/or DTMF.  When the recognizer decides it needs to generate
3119	   barge-in it also generates a START-OF-INPUT event to the client.  The
3120	   recognition resource may support recognition in the normal or hotword
3121	   modes or both (although note that a single speechrecog resource does
3122	   not perform normal and hotword mode recognition simultaneously).  For
3123	   implementations where a single recognition resource does not support
3124	   both modes, or simultaneous normal and hotword recognition is
3125	   desired, the two modes can be invoked through separate resources
3126	   allocated to the same SIP dialog (with different MRCP session
3127	   identifiers) that share the RTP audio feed.

3129	   The capabilities of the recognition resource are enumerated below:

3131	   Normal Mode Recognition  Normal mode recognition tries to match all
3132	      of the speech or DTMF against the grammar and returns a no-match
3133	      status if the input fails to match or the method times out.
3134	   Hotword Mode Recognition  Hotword mode is where the recognizer looks
3135	      for a match against specific speech grammar or DTMF sequence and
3136	      ignores speech or DTMF that does not match.  The recognition
3137	      completes only for a successful match of grammar or if the client
3138	      cancels the request or if there is a no-input or recognition
3139	      timeout.
3140	   Voice Enrolled Grammars  A recognition resource may optionally
3141	      support Voice Enrolled Grammars.  With this functionality,
3142	      enrollment is performed using a person's voice.  For example, a
3143	      list of contacts can be created and maintained by recording each
3144	      contact's name in the caller's voice.  This technique is
3145	      sometimes also called speaker-dependent recognition.
3146	   Interpretation  A recognition resource may be employed strictly for
3147	      its natural language interpretation capabilities by supplying it
3148	      with a text string as input instead of speech.  In this mode the
3149	      resource takes text as input and produces an "interpretation" of
3150	      the input according to the supplied grammar.

3152	   Voice Enrollment has the concept of an enrollment session.  A session
3153	   to add a new phrase to a personal grammar involves the initial
3154	   enrollment followed by a repeat of enough utterances before
3155	   committing the new phrase to the personal grammar.  Each time an
3156	   utterance is recorded, it is compared for similarity with the other
3157	   samples and a clash test is performed against other entries in the
3158	   personal grammar to ensure there are no similar and confusable
3159	   entries.

3161	   Enrollment is done using a recognizer resource.  Controlling which
3162	   utterances are to be considered for enrollment of a new phrase is
3163	   done by setting a header (see Section 9.4.39) in the RECOGNIZE
3164	   request.

3166	   Interpretation is accomplished through the INTERPRET method
3167	   (Section 9.20) and the interpret-text header (Section 9.4.30).

3169	9.1.  Recognizer State Machine

3171	   The recognizer resource maintains a state machine to process MRCPv2
3172	   requests from the client.

3174	   Idle                   Recognizing               Recognized
3175	   State                  State                     State
3176	    |                       |                          |
3177	    |---------RECOGNIZE---->|---RECOGNITION-COMPLETE-->|
3178	    |<------STOP------------|<-----RECOGNIZE-----------|
3179	    |                       |                          |
3180	    |              |--------|              |-----------|
3181	    |       START-OF-INPUT  |       GET-RESULT         |
3182	    |              |------->|              |---------->|
3183	    |------------|          |                          |
3184	    |      DEFINE-GRAMMAR   |----------|               |
3185	    |<-----------|          | START-INPUT-TIMERS       |
3186	    |                       |<---------|               |
3187	    |------|                |                          |
3188	    |  INTERPRET            |                          |
3189	    |<-----|                |------|                   |
3190	    |                       |   RECOGNIZE              |
3191	    |-------|               |<-----|                   |
3192	    |      STOP                                        |
3193	    |<------|                                          |
3194	    |<-------------------STOP--------------------------|
3195	    |<-------------------DEFINE-GRAMMAR----------------|

3197	                         Recognizer State Machine
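
   One reading of the diagram above is a transition table keyed by the
   current state and the method or event name.  The Python sketch below
   encodes that reading; it is illustrative, not normative.  Rejecting
   any (state, message) pair that the diagram does not draw is an
   assumption of this sketch, and the names used are hypothetical.

```python
IDLE, RECOGNIZING, RECOGNIZED = "idle", "recognizing", "recognized"

# (current state, method or event) -> next state, as drawn in the diagram.
TRANSITIONS = {
    (IDLE, "RECOGNIZE"): RECOGNIZING,
    (IDLE, "DEFINE-GRAMMAR"): IDLE,
    (IDLE, "INTERPRET"): IDLE,
    (IDLE, "STOP"): IDLE,
    (RECOGNIZING, "RECOGNITION-COMPLETE"): RECOGNIZED,
    (RECOGNIZING, "STOP"): IDLE,
    (RECOGNIZING, "START-OF-INPUT"): RECOGNIZING,
    (RECOGNIZING, "START-INPUT-TIMERS"): RECOGNIZING,
    (RECOGNIZING, "RECOGNIZE"): RECOGNIZING,
    (RECOGNIZED, "RECOGNIZE"): RECOGNIZING,
    (RECOGNIZED, "GET-RESULT"): RECOGNIZED,
    (RECOGNIZED, "STOP"): IDLE,
    (RECOGNIZED, "DEFINE-GRAMMAR"): IDLE,
}


def next_state(state: str, message: str) -> str:
    """Return the state reached from `state` on `message`."""
    try:
        return TRANSITIONS[(state, message)]
    except KeyError:
        # Assumption: pairs not shown in the diagram are treated as invalid.
        raise ValueError(f"{message} is not drawn for state {state}") from None
```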

3199	   If a recognition resource supports voice enrolled grammars, starting
3200	   an enrollment session does not change the state of the recognizer
3201	   resource.  Once an enrollment session is started, then utterances are
3202	   enrolled by calling the RECOGNIZE method repeatedly.  The state of
3203	   the speech recognizer resource goes from IDLE to RECOGNIZING state
3204	   each time RECOGNIZE is called.

3206	9.2.  Recognizer Methods

3208	   The recognizer supports the following methods.

3210	   recognizer-method    =  recog-only-method
3211	                        /  enrollment-method

3213	   recog-only-method    =  "DEFINE-GRAMMAR"
3214	                        /  "RECOGNIZE"
3215	                        /  "INTERPRET"
3216	                        /  "GET-RESULT"
3217	                        /  "START-INPUT-TIMERS"
3218	                        /  "STOP"

3220	   It is OPTIONAL for a recognizer resource to support voice enrolled
3221	   grammars.  If the recognizer resource does support voice enrolled
3222	   grammars it MUST support the following methods.

3224	   enrollment-method    =  "START-PHRASE-ENROLLMENT"
3225	                        /  "ENROLLMENT-ROLLBACK"
3226	                        /  "END-PHRASE-ENROLLMENT"
3227	                        /  "MODIFY-PHRASE"
3228	                        /  "DELETE-PHRASE"

3230	9.3.  Recognizer Events

3232	   The recognizer may generate the following events.

3234	   recognizer-event     =  "START-OF-INPUT"
3235	                        /  "RECOGNITION-COMPLETE"
3236	                        /  "INTERPRETATION-COMPLETE"

3238	9.4.  Recognizer Header Fields

3240	   A recognizer message may contain headers containing request options
3241	   and information to augment the Method, Response or Event message it
3242	   is associated with.

3244	   recognizer-header    =  recog-only-header
3245	                        /  enrollment-header

3247	   recog-only-header    =  confidence-threshold
3248	                        /  sensitivity-level
3249	                        /  speed-vs-accuracy
3250	                        /  n-best-list-length
3251	                        /  no-input-timeout
3252	                        /  input-type
3253	                        /  recognition-timeout
3254	                        /  waveform-uri
3255	                        /  input-waveform-uri
3256	                        /  completion-cause
3257	                        /  completion-reason
3258	                        /  recognizer-context-block
3259	                        /  start-input-timers
3260	                        /  speech-complete-timeout
3261	                        /  speech-incomplete-timeout
3262	                        /  dtmf-interdigit-timeout
3263	                        /  dtmf-term-timeout
3264	                        /  dtmf-term-char
3265	                        /  failed-uri
3266	                        /  failed-uri-cause
3267	                        /  save-waveform
3268	                        /  media-type
3269	                        /  new-audio-channel
3270	                        /  speech-language
3271	                        /  ver-buffer-utterance
3272	                        /  recognition-mode
3273	                        /  cancel-if-queue
3274	                        /  hotword-max-duration
3275	                        /  hotword-min-duration
3276	                        /  interpret-text
3277	                        /  dtmf-buffer-time
3278	                        /  clear-dtmf-buffer
3279	                        /  early-no-match

3281	   If a recognition resource supports voice enrolled grammars, the
3282	   following headers are also used.

3284	   enrollment-header    =  num-min-consistent-pronunciations
3285	                        /  consistency-threshold
3286	                        /  clash-threshold
3287	                        /  personal-grammar-uri
3288	                        /  enroll-utterance
3289	                        /  phrase-id
3290	                        /  phrase-nl
3291	                        /  weight
3292	                        /  save-best-waveform
3293	                        /  new-phrase-id
3294	                        /  confusable-phrases-uri
3295	                        /  abort-phrase-enrollment

3297	   For enrollment-specific headers that can appear as part of
3298	   "SET-PARAMS" or "GET-PARAMS" methods, the following general rule
3299	   applies: the START-PHRASE-ENROLLMENT method must be invoked before
3300	   these headers may be set through the "SET-PARAMS" method or retrieved
3301	   through the "GET-PARAMS" method.

3303	   Note that the Waveform-URI header of the Recognizer resource can also
3304	   appear in the response to the END-PHRASE-ENROLLMENT method.

3306	9.4.1.  Confidence Threshold

3308	   When a recognition resource recognizes or matches a spoken phrase
3309	   with some portion of the grammar, it associates a confidence level
3310	   with that match.  The confidence-threshold header tells the
3311	   recognizer resource what confidence level the client considers a
3312	   successful match.  This is a float value from 0.0 to 1.0 indicating
3313	   the recognizer's confidence in the recognition.  If the recognizer
3314	   determines that there is no candidate match with a confidence that is
3315	   greater than the confidence threshold, then it MUST return no-match
3316	   as the recognition result.  This header MAY occur in RECOGNIZE,
3317	   "SET-PARAMS" or "GET-PARAMS".  The default value for this header is
3318	   implementation specific.

3320	   confidence-threshold     =  "Confidence-Threshold" ":" FLOAT CRLF

3322	9.4.2.  Sensitivity Level

3324	   To filter out background noise and not mistake it for speech, the
3325	   recognizer may support a variable level of sound sensitivity.  The
3326	   sensitivity-level header is a float value between 0.0 and 1.0 and
3327	   allows the client to set the sensitivity level for the recognizer.
3328	   This header MAY occur in RECOGNIZE, "SET-PARAMS" or "GET-PARAMS".  A
3329	   higher value for this header means higher sensitivity.  The default
3330	   value for this header is implementation specific.

3332	   sensitivity-level        =  "Sensitivity-Level" ":" FLOAT CRLF

3334	9.4.3.  Speed Vs Accuracy

3336	   Depending on the implementation and capability of the recognizer
3337	   resource it may be tunable towards Performance or Accuracy.  Higher
3338	   accuracy may mean more processing and higher CPU utilization, meaning
3339	   fewer active sessions per server and vice versa.  The value is a
3340	   float between 0.0 and 1.0.  A value of 0.0 means fastest recognition.
3341	   A value of 1.0 means best accuracy.  This header MAY occur in
3342	   RECOGNIZE, "SET-PARAMS" or "GET-PARAMS".  The default value for this
3343	   header is implementation specific.

3345	   speed-vs-accuracy        =  "Speed-Vs-Accuracy" ":" FLOAT CRLF
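
   Confidence-Threshold, Sensitivity-Level, and Speed-Vs-Accuracy all
   carry a float between 0.0 and 1.0.  A client could range-check such
   a value before building the header line, as in the Python sketch
   below.  The helper name is hypothetical, and rendering the value
   with two decimal places is an assumption of this sketch; the FLOAT
   production itself is defined elsewhere in this document.

```python
def unit_float_header(name: str, value: float) -> str:
    """Build a header line for a float-valued header limited to 0.0..1.0."""
    if not 0.0 <= value <= 1.0:
        # Also rejects NaN, since every comparison with NaN is false.
        raise ValueError(f"{name} must be between 0.0 and 1.0, got {value!r}")
    # Assumption: two decimal places are enough precision for the example.
    return f"{name}:{value:.2f}\r\n"
```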

3347	9.4.4.  N Best List Length

3349	   When the recognizer matches an incoming stream with the grammar, it
3350	   may come up with more than one alternative match because of
3351	   confidence levels in certain words or conversation paths.  If this
3352	   header is not specified, by default, the recognition resource returns
3353	   only the best match above the confidence threshold.  The client, by
3354	   setting this header, can ask the recognition resource to send it more
3355	   than 1 alternative.  All alternatives must still be above the
3356	   confidence-threshold.  A value greater than one does not guarantee
3357	   that the recognizer will provide the requested number of
3358	   alternatives.  This header MAY occur in RECOGNIZE, "SET-PARAMS" or
3359	   "GET-PARAMS".  The minimum value for this header is 1.  The default
3360	   value for this header is 1.

3362	   n-best-list-length       =  "N-Best-List-Length" ":" 1*19DIGIT CRLF
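
   The interaction between Confidence-Threshold and N-Best-List-Length
   can be illustrated as follows.  This Python sketch assumes the
   recognizer already holds a list of (text, confidence) candidates,
   which is a hypothetical internal representation.  It follows the
   text literally by keeping only candidates strictly above the
   threshold; an empty result corresponds to no-match.

```python
from typing import List, Sequence, Tuple

Candidate = Tuple[str, float]  # (recognized text, confidence in 0.0..1.0)


def select_n_best(candidates: Sequence[Candidate],
                  confidence_threshold: float,
                  n_best_list_length: int = 1) -> List[Candidate]:
    """Return up to n_best_list_length candidates above the threshold."""
    if n_best_list_length < 1:
        raise ValueError("N-Best-List-Length has a minimum value of 1")
    # Every alternative must still be above the confidence threshold.
    above = [c for c in candidates if c[1] > confidence_threshold]
    # Best match first; fewer than the requested number may be returned.
    above.sort(key=lambda c: c[1], reverse=True)
    return above[:n_best_list_length]
```

   As the text notes, asking for more alternatives does not guarantee
   that many are returned; the list is simply shorter when too few
   candidates clear the threshold.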

3364	9.4.5.  Input Type

3366	   When the recognizer detects barge-in-able input and generates a
3367	   START-OF-INPUT event, that event MUST carry this header field to
3368	   specify whether the input causing the barge-in was DTMF or speech.

3370	   input-type         =  "Input-Type" ":"  inputs CRLF
3371	   inputs             =  "speech" / "dtmf"

3373	9.4.6.  No Input Timeout

3375	   When recognition is started and there is no speech detected for a
3376	   certain period of time, the recognizer can send a RECOGNITION-
3377	   COMPLETE event to the client with a Completion-Cause of "no-input-
3378	   timeout" and terminate the recognition operation.  The client can use
3379	   the no-input-timeout header to set this timeout.  The value is in
3380	   milliseconds and may range from 0 to an implementation specific
3381	   maximum value.  This header MAY occur in RECOGNIZE, "SET-PARAMS" or
3382	   "GET-PARAMS".  The default value is implementation specific.

3384	   no-input-timeout         =  "No-Input-Timeout" ":" 1*19DIGIT CRLF

3386	9.4.7.  Recognition Timeout

3388	   When recognition is started and there is no match for a certain
3389	   period of time, the recognizer can send a RECOGNITION-COMPLETE event
3390	   to the client and terminate the recognition operation.  The
3391	   Recognition-Timeout header allows the client to set this timeout
3392	   value.  The value is in milliseconds.  The value for this header
3393	   ranges from 0 to an implementation specific maximum value.  The
3394	   default value is 10 seconds.  This header MAY occur in RECOGNIZE,
3395	   SET-PARAMS or GET-PARAMS.

3397	   recognition-timeout      =  "Recognition-Timeout" ":" 1*19DIGIT CRLF

3399	9.4.8.  Waveform URI

3401	   If the Save-Waveform header is set to true, the recognizer MUST
3402	   record the incoming audio stream of the recognition into a stored
3403	   form and provide a URI for the client to access it.  This header MUST
3404	   be present in the RECOGNITION-COMPLETE event if the Save-Waveform
3405	   header was set to true.  The value of the header MUST be empty if
3406	   there was some error condition preventing the server from recording.
3407	   Otherwise, the URI generated by the server MUST be unambiguous across
3408	   the server and all its recognition sessions.  The content associated
3409	   with the URI MUST be available to the client until the MRCPv2 session
3410	   terminates.

3412	   Similarly, if the Save-Best-Waveform header is set to true, the
3413	   recognizer MUST save the audio stream for the best repetition of the
3414	   phrase that was used during the enrollment session.  The recognizer
3415	   MUST then record the recognized audio and make it available to the
3416	   client by returning a URI in the Waveform-URI header in the response
3417	   to the END-PHRASE-ENROLLMENT method.  The value of the header MUST be
3418	   empty if there was some error condition preventing the server from
3419	   recording.  Otherwise, the URI generated by the server MUST be
3420	   unambiguous across the server and all its recognition sessions.  The
3421	   content associated with the URI MUST be available to the client until
3422	   the MRCPv2 session terminates.

3424	   The server MUST also return the size in bytes and the duration in
3425	   milliseconds of the recorded audio waveform as parameters associated
3426	   with the header.

3428	   waveform-uri             =  "Waveform-URI" ":" ["<" Uri ">"
3429	                               ";" "size" "=" 1*19DIGIT
3430	                               ";" "duration" "=" 1*19DIGIT] CRLF
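
   A client-side reading of the waveform-uri rule above might look like
   the Python sketch below.  It treats an empty value as "recording
   failed", as described in the text, and is otherwise strict about the
   parameter order given in the ABNF.  The type and function names are
   hypothetical, and the pattern for the URI itself is deliberately
   loose rather than a full URI grammar.

```python
import re
from typing import NamedTuple, Optional

# "<" Uri ">" ";" "size" "=" 1*19DIGIT ";" "duration" "=" 1*19DIGIT
_WAVEFORM_RE = re.compile(
    r"<(?P<uri>[^<>]+)>;size=(?P<size>[0-9]{1,19});"
    r"duration=(?P<duration>[0-9]{1,19})"
)


class Waveform(NamedTuple):
    uri: str
    size_bytes: int
    duration_ms: int


def parse_waveform_uri(value: str) -> Optional[Waveform]:
    """Parse a Waveform-URI header value; None means nothing was recorded."""
    value = value.strip()
    if not value:
        return None
    match = _WAVEFORM_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"malformed Waveform-URI value: {value!r}")
    return Waveform(match.group("uri"),
                    int(match.group("size")),
                    int(match.group("duration")))
```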

3432	9.4.9.  Media Type

3434	   This header MAY be specified in the SET-PARAMS, GET-PARAMS or the
3435	   RECOGNIZE methods and tells the server resource the Media Type in
3436	   which to store captured audio or video such as the one captured and
3437	   returned by the Waveform-URI header.

3439	   media-type               =  "Media-Type" ":" media-type-value
3440	                               CRLF

3442	9.4.10.  Input-Waveform-URI

3444	   This optional header specifies a URI pointing to audio content to be
3445	   processed by the RECOGNIZE operation.  This enables the client to
3446	   request recognition from a specified buffer or audio file.

3448	   input-waveform-uri       =  "Input-Waveform-URI" ":" Uri CRLF

3450	9.4.11.  Completion Cause

3452	   This header MUST be part of a RECOGNITION-COMPLETE event coming from
3453	   the recognizer resource to the client.  It indicates the reason
3454	   behind the RECOGNIZE method completion.  This header MUST be sent in
3455	   the DEFINE-GRAMMAR and RECOGNIZE responses, if they return with a
3456	   failure status and a COMPLETE state.

3458	   completion-cause         =  "Completion-Cause" ":" 3DIGIT SP
3459	                               1*VCHAR CRLF

3461	   +---------+--------------------------+------------------------------+
3462	   | Cause   | Cause-Name               | Description                  |
3463	   | Code    |                          |                              |
3464	   +---------+--------------------------+------------------------------+
3465	   | 000     | success                  | RECOGNIZE completed with a   |
3466	   |         |                          | match or DEFINE-GRAMMAR      |
3467	   |         |                          | succeeded in downloading and |
3468	   |         |                          | compiling the grammar        |
3469	   | 001     | no-match                 | RECOGNIZE completed, but no  |
3470	   |         |                          | match was found              |
3471	   | 002     | no-input-timeout         | RECOGNIZE completed without  |
3472	   |         |                          | a match due to a             |
3473	   |         |                          | no-input-timeout             |
3474	   | 003     | hotword-maxtime          | RECOGNIZE in hotword mode    |
3475	   |         |                          | completed without a match    |
3476	   |         |                          | due to a recognition-timeout |
3477	   | 004     | grammar-load-failure     | RECOGNIZE failed due to      |
3478	   |         |                          | grammar load failure.        |
3479	   | 005     | grammar-compilation-fail | RECOGNIZE failed due to      |
3480	   |         | ure                      | grammar compilation failure. |
3481	   | 006     | recognizer-error         | RECOGNIZE request terminated |
3482	   |         |                          | prematurely due to a         |
3483	   |         |                          | recognizer error.            |
3484	   | 007     | speech-too-early         | RECOGNIZE request terminated |
3485	   |         |                          | because speech was too       |
3486	   |         |                          | early.  This happens when    |
3487	   |         |                          | the audio stream is already  |
3488	   |         |                          | "in-speech" when the         |
3489	   |         |                          | RECOGNIZE request was        |
3490	   |         |                          | received.                    |
3491	   | 008     | success-maxtime          | RECOGNIZE request terminated |
3492	   |         |                          | because speech was too long  |
3493	   |         |                          | but whatever was spoken till |
3494	   |         |                          | that point was a full match. |
3495	   | 009     | uri-failure              | Failure accessing a URI.     |
3496	   | 010     | language-unsupported     | Language not supported.      |
3497	   | 011     | cancelled                | A new RECOGNIZE cancelled    |
3498	   |         |                          | this one, or a prior         |
3499	   |         |                          | RECOGNIZE failed while this  |
3500	   |         |                          | one was still in the queue.  |
3501	   | 012     | semantics-failure        | Recognition succeeded but    |
3502	   |         |                          | semantic interpretation of   |
3503	   |         |                          | the recognized input failed. |
3504	   |         |                          | The RECOGNITION-COMPLETE     |
3505	   |         |                          | event MUST contain the       |
3506	   |         |                          | Recognition result with only |
3507	   |         |                          | input text and no            |
3508	   |         |                          | interpretation.              |
3509	   | 013     | partial-match            | Speech Incomplete timeout    |
3510	   |         |                          | expired before there was a   |
3511	   |         |                          | full match.  But whatever    |
3512	   |         |                          | was spoken till that         |
3513	   |         |                          | point was a partial match to |
3514	   |         |                          | one or more grammars.        |
3515	   | 014     | partial-match-maxtime    | The Recognition-Timer        |
3516	   |         |                          | expired before full match    |
3517	   |         |                          | was achieved.  But whatever  |
3518	   |         |                          | was spoken till that point   |
3519	   |         |                          | was a partial match to one   |
3520	   |         |                          | or more grammars.            |
3521	   | 015     | no-match-maxtime         | The Recognition-Timer        |
3522	   |         |                          | expired.  Whatever was       |
3523	   |         |                          | spoken till that point       |
3524	   |         |                          | did not match any of the     |
3525	   |         |                          | grammars.  This cause        |
3526	   |         |                          | could also be returned if    |
3527	   |         |                          | the recognizer does not      |
3528	   |         |                          | support detecting partial    |
3529	   |         |                          | grammar matches.             |
3530	   | 016     | grammar-definition-failu | Any DEFINE-GRAMMAR error     |
3531	   |         | re                       | other than                   |
3532	   |         |                          | grammar-load-failure and     |
3533	   |         |                          | grammar-compilation-failure. |
3534	   +---------+--------------------------+------------------------------+
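
   The table above maps directly onto a lookup keyed by the numeric
   cause code.  The Python sketch below parses a Completion-Cause value
   according to the ABNF (three digits, a space, then the cause name)
   and exposes the table as a dictionary.  The identifiers are
   hypothetical; the codes and names are those listed in the table.

```python
import re
from typing import Tuple

RECOGNIZER_CAUSES = {
    0: "success",
    1: "no-match",
    2: "no-input-timeout",
    3: "hotword-maxtime",
    4: "grammar-load-failure",
    5: "grammar-compilation-failure",
    6: "recognizer-error",
    7: "speech-too-early",
    8: "success-maxtime",
    9: "uri-failure",
    10: "language-unsupported",
    11: "cancelled",
    12: "semantics-failure",
    13: "partial-match",
    14: "partial-match-maxtime",
    15: "no-match-maxtime",
    16: "grammar-definition-failure",
}

# 3DIGIT SP 1*VCHAR, where VCHAR is a visible ASCII character.
_CAUSE_RE = re.compile(r"([0-9]{3}) ([\x21-\x7e]+)")


def parse_completion_cause(value: str) -> Tuple[int, str]:
    """Return (cause code, cause name) from a Completion-Cause value."""
    match = _CAUSE_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"malformed Completion-Cause value: {value!r}")
    return int(match.group(1)), match.group(2)
```

   A client would typically branch on the numeric code and use the
   name only for logging.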

3536	9.4.12.  Completion Reason

3538	   This header MAY be specified in a RECOGNITION-COMPLETE event coming
3539	   from the recognizer resource to the client.  This contains the reason
3540	   text behind the RECOGNIZE request completion.  The server uses this
3541	   header to communicate text describing the reason for the failure,
3542	   such as the specific error encountered in parsing a grammar markup.

3544	   The completion reason text is provided for client use in logs and for
3545	   debugging and instrumentation purposes.  Clients MUST NOT interpret
3546	   the completion reason text.

3548	   completion-reason        =  "Completion-Reason" ":"
3549	                               quoted-string CRLF

3551	9.4.13.  Recognizer Context Block

3553	   This header MAY be sent as part of the "SET-PARAMS" or "GET-PARAMS"
3554	   request.  If the "GET-PARAMS" method contains this header with no
3555	   value, then it is a request to the recognizer to return the
3556	   recognizer context block.  The response to such a message MAY contain
3557	   a recognizer context block as a typed media message body.  If the
3558	   server returns a recognizer context block, the response MUST contain
3559	   this header and its value MUST match the Content-ID of the
3560	   corresponding media block.

3562	   If the "SET-PARAMS" method contains this header, it MUST also contain
3563	   a message body containing the recognizer context data, and a
3564	   Content-ID matching this header value.  This Content-ID MUST match
3565	   the Content-ID that came with the context data during the
3566	   "GET-PARAMS" operation.

3568	   An implementation choosing to use this mechanism to hand off
3569	   recognizer context data between servers MUST distinguish its
3570	   implementation-specific block of data by using an IANA-registered
3571	   content type in the IANA Media Type vendor tree.

3573	   recognizer-context-block  =  "Recognizer-Context-Block" ":"
3574	                                1*VCHAR CRLF

3576	9.4.14.  Start Input Timers

3578	   This header MAY be sent as part of the RECOGNIZE request.  A value of
3579	   false tells the recognizer to start recognition, but not to start the
3580	   no-input timer yet.  The recognizer MUST NOT start the timers until
3581	   the client sends a START-INPUT-TIMERS request to the recognizer.
3582	   This is useful in the scenario when the recognizer and synthesizer
3583	   engines are not part of the same session.  In such configurations,
3584	   when a kill-on-barge-in prompt is being played, the client wants the
3585	   RECOGNIZE request to be simultaneously active so that it can detect
3586	   and implement kill-on-barge-in.  However, the recognizer ought not
3587	   start the no-input timers until the prompt is finished.  The default
3588	   value is "true".

3590	   start-input-timers  =  "Start-Input-Timers" ":" BOOLEAN CRLF

3592	9.4.15.  Speech Complete Timeout

3594	   This header specifies the length of silence required following user
3595	   speech before the speech recognizer finalizes a result (either
3596	   accepting it or generating a nomatch event).  The speech-complete-
3597	   timeout value applies when the recognizer currently has a complete
3598	   match against an active grammar, and specifies how long the
3599	   recognizer MUST wait for more input before declaring a match.  By
3600	   contrast, the incomplete timeout is used when the speech is an
3601	   incomplete match to an active grammar.  The value is in milliseconds.

3603	  speech-complete-timeout = "Speech-Complete-Timeout" ":" 1*19DIGIT CRLF

3604	   A long speech-complete-timeout value delays the result to the client
3605	   and therefore makes the application's response to a user slow.  A
3606	   short speech-complete-timeout may lead to an utterance being broken
3607	   up inappropriately.  Reasonable speech complete timeout values are
3608	   typically in the range of 0.3 seconds to 1.0 seconds.  The value for
3609	   this header ranges from 0 to an implementation specific maximum
3610	   value.  The default value for this header is implementation specific.
3611	   This header MAY occur in RECOGNIZE, "SET-PARAMS" or "GET-PARAMS".

3613	9.4.16.  Speech Incomplete Timeout

3615	   This header specifies the required length of silence following user
3616	   speech after which a recognizer finalizes a result.  The incomplete
3617	   timeout applies when the speech prior to the silence is an incomplete
3618	   match of all active grammars.  In this case, once the timeout is
3619	   triggered, the partial result is rejected (with a Completion-Cause of
3620	   "partial-match").  The value is in milliseconds.  The value for this
3621	   header ranges from 0 to an implementation specific maximum value.
3622	   The default value for this header is implementation specific.

3624	   speech-incomplete-timeout = "Speech-Incomplete-Timeout" ":" 1*19DIGIT
3625	                                CRLF

3627	   The speech-incomplete-timeout also applies when the speech prior to
3628	   the silence is a complete match of an active grammar, but where it is
3629	   possible to speak further and still match the grammar.  By contrast,
3630	   the complete timeout is used when the speech is a complete match to
3631	   an active grammar and no further spoken words can continue to
3632	   represent a match.

3634	   A long speech-incomplete-timeout value delays the result to the
3635	   client and therefore makes the application's response to a user slow.
3636	   A short speech-incomplete-timeout may lead to an utterance being
3637	   broken up inappropriately.

3639	   The speech-incomplete-timeout is usually longer than the speech-
3640	   complete-timeout to allow users to pause mid-utterance (for example,
3641	   to breathe).  This header MAY occur in RECOGNIZE, "SET-PARAMS" or
3642	   "GET-PARAMS".
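
   As an informal illustration of the two silence timeouts described in
   Sections 9.4.15 and 9.4.16, the following Python sketch selects the
   timeout that governs a given match state.  The MatchState names and
   the function name are hypothetical and are not defined by this
   specification; a real recognizer derives the match state from its
   grammar search.

```python
from enum import Enum


class MatchState(Enum):
    """Hypothetical summary of how the speech so far relates to the grammars."""
    INCOMPLETE = "incomplete"                    # no complete match yet
    COMPLETE_EXTENDABLE = "complete-extendable"  # complete, more words could match
    COMPLETE_FINAL = "complete-final"            # complete, nothing further can match


def applicable_silence_timeout_ms(state, speech_complete_ms, speech_incomplete_ms):
    """Return the silence timeout (in milliseconds) that governs this state."""
    if state is MatchState.COMPLETE_FINAL:
        # Complete match that cannot be extended: Speech-Complete-Timeout.
        return speech_complete_ms
    # Incomplete matches, and complete matches that could still be
    # extended by further speech, use Speech-Incomplete-Timeout.
    return speech_incomplete_ms
```

   With a Speech-Complete-Timeout of 500 and a Speech-Incomplete-Timeout
   of 1500, only the COMPLETE_FINAL state yields 500; the other two
   states yield 1500.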

3644	9.4.17.  DTMF Interdigit Timeout

3646	   This header specifies the inter-digit timeout value to use when
3647	   recognizing DTMF input.  The value is in milliseconds.  The value for
3648	   this header ranges from 0 to an implementation specific maximum
3649	   value.  The default value is 5 seconds.  This header MAY occur in
3650	   RECOGNIZE, "SET-PARAMS" or "GET-PARAMS".

3652	  dtmf-interdigit-timeout = "DTMF-Interdigit-Timeout" ":" 1*19DIGIT CRLF

3654	9.4.18.  DTMF Term Timeout

3656	   This header specifies the terminating timeout to use when recognizing
3657	   DTMF input.  The DTMF-Term-Timeout applies only when no additional
3658	   input is allowed by the grammar; otherwise, the DTMF-Interdigit-
3659	   Timeout applies.  The value is in milliseconds.  The value for this
3660	   header ranges from 0 to an implementation specific maximum value.
3661	   The default value is 10 seconds.  This header MAY occur in RECOGNIZE,
3662	   "SET-PARAMS" or "GET-PARAMS".

3664	   dtmf-term-timeout        =  "DTMF-Term-Timeout" ":" 1*19DIGIT CRLF
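
   The relationship between DTMF-Interdigit-Timeout and
   DTMF-Term-Timeout can be sketched as follows.  This is an informal
   Python illustration only; the function and constant names are
   hypothetical, and the flag indicating whether the grammar allows
   more input is assumed to come from the recognizer's grammar state.

```python
DEFAULT_DTMF_INTERDIGIT_TIMEOUT_MS = 5000   # default of 5 seconds (Section 9.4.17)
DEFAULT_DTMF_TERM_TIMEOUT_MS = 10000        # default of 10 seconds (Section 9.4.18)


def applicable_dtmf_timeout_ms(grammar_allows_more_input,
                               interdigit_ms=DEFAULT_DTMF_INTERDIGIT_TIMEOUT_MS,
                               term_ms=DEFAULT_DTMF_TERM_TIMEOUT_MS):
    """Pick the DTMF timeout (in milliseconds) for the current grammar state."""
    if grammar_allows_more_input:
        # The grammar can still accept digits: the interdigit timeout applies.
        return interdigit_ms
    # No additional input is allowed by the grammar: the term timeout applies.
    return term_ms
```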

3666	9.4.19.  DTMF-Term-Char

3668	   This header specifies the terminating DTMF character for DTMF input
3669	   recognition.  The default value is NULL which is indicated by an
3670	   empty header value.  This header MAY occur in RECOGNIZE, "SET-PARAMS"
3671	   or "GET-PARAMS".

3673	   dtmf-term-char           =  "DTMF-Term-Char" ":" VCHAR CRLF

3675	9.4.20.  Failed URI

3677	   When a recognizer needs to fetch or access a URI and the access fails
3678	   the server SHOULD provide the failed URI in this header in the method
3679	   response, unless there are multiple URI failures, in which case one
3680	   of the failed URIs MUST be provided in this header in the method
3681	   response.

3683	   failed-uri               =  "Failed-URI" ":" Uri CRLF

3685	9.4.21.  Failed URI Cause

3687	   When a recognizer method needs a recognizer to fetch or access a URI
3688	   and the access fails the server MUST provide the URI specific or
3689	   protocol specific response code for the URI in the Failed-URI header
3690	   through this header in the method response.  The value encoding is
3691	   UTF-8 to accommodate any access protocol, some of which might have a
3692	   response string instead of a numeric response code.

3694	   failed-uri-cause         =  "Failed-URI-Cause" ":" 1*UTFCHAR CRLF

3696	9.4.22.  Save Waveform

3698	   This header allows the client to request the recognizer resource to
3699	   save the audio input to the recognizer.  The recognizer resource MUST
3700	   then attempt to record the recognized audio, without endpointing, and
3701	   make it available to the client in the form of a URI returned in the
3702	   Waveform-URI header in the RECOGNITION-COMPLETE event.  If there was
3703	   an error in recording the stream or the audio content is otherwise
3704	   not available, the recognizer MUST return an empty Waveform-URI
3705	   header.  The default value for this field is "false".  This header
3706	   MAY occur in RECOGNIZE, "SET-PARAMS" or "GET-PARAMS".

3708	   save-waveform            =  "Save-Waveform" ":" BOOLEAN CRLF

3710	9.4.23.  New Audio Channel

3712	   This header MAY be specified in a RECOGNIZE request and allows the
3713	   client to tell the server that, from this point on, further input
3714	   audio comes from a different audio source, channel or speaker.  If
3715	   the recognition resource had collected any input statistics or
3716	   adaptation state, the recognition resource MUST do what is
3717	   appropriate for the specific recognition technology, which includes
3718	   but is not limited to discarding any collected input statistics or
3719	   adaptation state before starting the RECOGNIZE request.  Note that if
3720	   there are multiple resources that are sharing a media pipe and are
3721	   collecting or using this data, and the client issues this header to
3722	   one of the resources, the reset operation applies to all resources
3723	   that use the shared media stream.  This helps in a number of use
3724	   cases, including where the client wishes to reuse an open recognition
3725	   session with an existing media session for multiple telephone calls.

3727	   new-audio-channel        =  "New-Audio-Channel" ":" BOOLEAN
3728	                               CRLF

3730	9.4.24.  Speech-Language

3732	   This header specifies the language of recognition grammar data within
3733	   a session or request, if it is not specified within the data.  The
3734	   value of this header MUST conform to RFC 4646 [RFC4646].
3735	   This MAY occur in DEFINE-GRAMMAR, RECOGNIZE, "SET-PARAMS" or
3736	   "GET-PARAMS" request.

3738	   speech-language          =  "Speech-Language" ":" 1*VCHAR CRLF

3740	9.4.25.  Ver-Buffer-Utterance

3742	   This header lets the client request the server to buffer the
3743	   utterance associated with this recognition request into a buffer
3744	   available to a co-resident verification resource.  The buffer is
3745	   shared across resources within a session and is allocated when a
3746	   verification resource is added to this session.  The client MUST NOT
3747	   send this header unless a verification resource is instantiated for
3748	   the session.  The buffer is released when the verification resource
3749	   is released from the session.

3751	9.4.26.  Recognition-Mode

3753	   This header specifies what mode the RECOGNIZE method will operate in.
3754	   The value choices are "normal" or "hotword".  If the value is
3755	   "normal", the RECOGNIZE starts matching speech and DTMF to the
3756	   grammars specified in the RECOGNIZE request.  If any portion of the
3757	   speech does not match the grammar, the RECOGNIZE command completes
3758	   with a no-match status.  Timers may be active to detect speech in the
3759	   audio (see Section 9.4.14), so the RECOGNIZE method may complete
3760	   because of a timeout waiting for speech.  If the value of this header
3761	   is "hotword", the RECOGNIZE method operates in hotword mode, where it
3762	   only looks for the particular keywords or DTMF sequences specified in
3763	   the grammar and ignores silence or other speech in the audio stream.
3764	   The default value for this header is "normal".  This header MAY occur
3765	   on the RECOGNIZE method.

3767	   recognition-mode         =  "Recognition-Mode" ":" 1*ALPHA CRLF

3769	9.4.27.  Cancel-If-Queue

3771	   This header specifies what will happen if the client attempts to
3772	   invoke another RECOGNIZE method when this RECOGNIZE request is
3773	   already in progress for the resource.  The value for this header is
3774	   Boolean.  A value of "true" means the server MUST terminate this
3775	   RECOGNIZE request, with a Completion-Cause of "cancelled", if the
3776	   client issues another RECOGNIZE request for the same resource.  A
3777	   value of "false" for this header indicates to the server that this
3778	   RECOGNIZE request will continue to completion and if the client
3779	   issues more RECOGNIZE requests to the same resource, they are queued.
3780	   When the currently active RECOGNIZE request is stopped or completes
3781	   with a successful match, the first RECOGNIZE method in the queue
3782	   becomes active.  If the current RECOGNIZE fails, all RECOGNIZE
3783	   methods in the pending queue are cancelled and each generates a
3784	   RECOGNITION-COMPLETE event with a Completion-Cause of "cancelled".
3785	   This header MUST be present in every RECOGNIZE request.  There is no
3786	   default value.

3788	   cancel-if-queue          =  "Cancel-If-Queue" ":" BOOLEAN CRLF
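
   The queuing rules above can be modeled with a small Python sketch.
   It is a toy model under stated assumptions, not a server
   implementation: the class and method names are hypothetical, the
   caller states whether a completion counts as a failure, and the
   sketch assumes that after the active request is cancelled the next
   request in line (here the newly received one) becomes active.

```python
from collections import deque


class RecognizeQueue:
    """Toy model of the RECOGNIZE queue of one recognizer resource."""

    def __init__(self):
        self.active = None       # (request_id, cancel_if_queue) or None
        self.pending = deque()   # queued (request_id, cancel_if_queue) tuples
        self.events = []         # (request_id, completion_cause), in order

    def recognize(self, request_id, cancel_if_queue):
        """Model the arrival of a new RECOGNIZE request."""
        request = (request_id, cancel_if_queue)
        if self.active is None:
            self.active = request
        elif self.active[1]:
            # The active request carried Cancel-If-Queue: true.
            self.events.append((self.active[0], "cancelled"))
            self.pending.append(request)
            self.active = self.pending.popleft()   # assumption, see lead-in
        else:
            # The active request carried Cancel-If-Queue: false.
            self.pending.append(request)

    def complete(self, cause, failed=False):
        """Model completion (or stop) of the active request."""
        if self.active is None:
            raise RuntimeError("no active RECOGNIZE request")
        self.events.append((self.active[0], cause))
        if failed:
            # A failure cancels every request still waiting in the queue.
            while self.pending:
                self.events.append((self.pending.popleft()[0], "cancelled"))
            self.active = None
        else:
            self.active = self.pending.popleft() if self.pending else None
```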

3790	9.4.28.  Hotword-Max-Duration

3792	   This header MAY be sent in a hotword mode RECOGNIZE request.  It
3793	   specifies the maximum length of an utterance that will
3794	   be considered for Hotword recognition.  This header, along with
3795	   Hotword-Min-Duration, can be used to tune performance by preventing
3796	   the recognizer from evaluating utterances that are too short or too
3797	   long to be one of the hotwords in the grammar(s).  The value is in
3798	   milliseconds.  The default is implementation dependent.  If present
3799	   in a RECOGNIZE request specifying a mode other than "hotword", the
3800	   header is ignored.

3802	   hotword-max-duration     =  "Hotword-Max-Duration" ":" 1*19DIGIT
3803	                               CRLF

3805	9.4.29.  Hotword-Min-Duration

3807	   This header MAY be sent in a hotword mode RECOGNIZE request.  It
3808	   specifies the minimum length of an utterance that will
3809	   be considered for Hotword recognition.  This header, along with
3810	   Hotword-Max-Duration, can be used to tune performance by preventing
3811	   the recognizer from evaluating utterances that are too short or too
3812	   long to be one of the hotwords in the grammar(s).  The value is in
3813	   milliseconds.  The default value is implementation dependent.  If
3814	   present in a RECOGNIZE request specifying a mode other than
3815	   "hotword", the header is ignored.

3817	   hotword-min-duration     =  "Hotword-Min-Duration" ":" 1*19DIGIT CRLF
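
   The combined effect of Hotword-Min-Duration and Hotword-Max-Duration
   can be sketched in Python as a simple duration window.  The function
   name is hypothetical, and treating the bounds as inclusive is an
   assumption; this specification does not say how an utterance whose
   length equals a bound is handled.

```python
def within_hotword_window(utterance_ms, min_ms=None, max_ms=None):
    """Return True if an utterance is a candidate for hotword matching.

    All durations are in milliseconds.  A bound of None means the
    corresponding header was not supplied.
    """
    if min_ms is not None and utterance_ms < min_ms:
        return False   # too short to be one of the hotwords
    if max_ms is not None and utterance_ms > max_ms:
        return False   # too long to be one of the hotwords
    return True
```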

3819	9.4.30.  Interpret-Text

3821	   The value of this header is used to provide a pointer to the text for
3822	   which a natural language interpretation is desired.  The value is
3823	   either a URI or text.  If the value is a URI, it MUST be a Content-ID
3824	   that refers to an entity of type text/plain in the body of the
3825	   message.  Otherwise, the server MUST treat the value as the text to
3826	   be interpreted.  This header MUST be used when invoking the INTERPRET
3827	   method.

3829	   interpret-text           =  "Interpret-Text" ":" 1*VCHAR CRLF

3831	9.4.31.  DTMF-Buffer-Time

3833	   This header MAY be specified in a GET-PARAMS or SET-PARAMS method and
3834	   is used to specify the size in time, in milliseconds, of the
3835	   type-ahead buffer for the recognizer.  This is the buffer that
3836	   collects DTMF digits as they are pressed even when there is no
3837	   RECOGNIZE command active.  When a subsequent RECOGNIZE method is
3838	   received it MAY look to this buffer to match the RECOGNIZE request.
3839	   If the digits in the buffer are not sufficient, it can continue to
3840	   listen for more digits to match the grammar.  The default size of this
3841	   DTMF buffer is platform specific.

3843	   dtmf-buffer-time  =  "DTMF-Buffer-Time" ":" 1*19DIGIT CRLF

3845	9.4.32.  Clear-DTMF-Buffer

3847	   This header MAY be specified in a RECOGNIZE method and is used to
3848	   tell the recognizer to clear the DTMF type-ahead buffer before
3849	   starting recognition.  The default value of this header is FALSE,
3850	   which does not clear the type-ahead buffer before starting the
3851	   RECOGNIZE method.  If this header is specified to be TRUE, then the
3852	   recognizer will clear the DTMF buffer before starting recognition.
3853	   This means digits pressed by the caller before the RECOGNIZE command
3854	   was issued are discarded.

3856	   clear-dtmf-buffer  = "Clear-DTMF-Buffer" ":" BOOLEAN CRLF
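
   The interaction between DTMF-Buffer-Time and Clear-DTMF-Buffer can
   be sketched in Python as follows.  This is a toy model: the class
   and method names are hypothetical, and the sketch assumes that the
   buffer "size in time" means that only digits pressed within the last
   DTMF-Buffer-Time milliseconds are retained, which this specification
   does not state explicitly.

```python
class DtmfTypeAheadBuffer:
    """Toy model of a recognizer's DTMF type-ahead buffer."""

    def __init__(self, buffer_time_ms):
        self.buffer_time_ms = buffer_time_ms
        self._digits = []   # (timestamp_ms, digit) pairs, oldest first

    def press(self, digit, now_ms):
        """Record a digit pressed while no RECOGNIZE is active."""
        self._digits.append((now_ms, digit))

    def start_recognize(self, now_ms, clear_dtmf_buffer=False):
        """Return the buffered digits offered to a new RECOGNIZE request."""
        if clear_dtmf_buffer:
            # Clear-DTMF-Buffer: true discards digits pressed earlier.
            self._digits.clear()
            return ""
        cutoff = now_ms - self.buffer_time_ms
        kept = [digit for (stamp, digit) in self._digits if stamp >= cutoff]
        self._digits.clear()
        return "".join(kept)
```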

3858	9.4.33.  Early-No-Match

3860	   This header MAY be specified in a RECOGNIZE method and is used to
3861	   tell the recognizer that it MUST NOT wait for the end of speech
3862	   before processing the collected speech to match active grammars.  A
3863	   value of TRUE indicates the recognizer MUST do early matching.  The
3864	   default value for this header if not specified is FALSE.  If the
3865	   recognizer does not support the processing of the collected audio
3866	   before the end of speech this header field can be safely ignored.

3868	   early-no-match  = "Early-No-Match" ":" BOOLEAN CRLF

3870	9.4.34.  Num-Min-Consistent-Pronunciations

3872	   This header MAY be specified in a START-PHRASE-ENROLLMENT,
3873	   "SET-PARAMS", or "GET-PARAMS" method and is used to specify the
3874	   minimum number of consistent pronunciations that must be obtained to
3875	   voice enroll a new phrase.  The minimum value is 1.  The default
3876	   value is implementation specific and MAY be greater than 1.

3878	   num-min-consistent-pronunciations  =
3879	                 "Num-Min-Consistent-Pronunciations" ":" 1*19DIGIT CRLF

3881	9.4.35.  Consistency-Threshold

3883	   This header MAY be sent as part of the START-PHRASE-ENROLLMENT,
3884	   "SET-PARAMS", or "GET-PARAMS" method.  Used during voice-enrollment,
3885	   this header specifies how similar to a previously enrolled
3886	   pronunciation of the same phrase an utterance needs to be in order to
3887	   be considered "consistent."  The higher the threshold, the closer the
3888	   match between an utterance and previous pronunciations must be for
3889	   the pronunciation to be considered consistent.  The range for this
3890	   threshold is a float value between 0.0 and 1.0.  The default value
3891	   for this header is implementation specific.

3893	   consistency-threshold    =  "Consistency-Threshold" ":" FLOAT CRLF

3895	9.4.36.  Clash-Threshold

3897	   This header MAY be sent as part of the START-PHRASE-ENROLLMENT,
3898	   "SET-PARAMS", or "GET-PARAMS" method.  Used during voice-enrollment,
3899	   this header specifies how similar the pronunciations of two different
3900	   phrases can be before they are considered to be clashing.  For
3901	   example, pronunciations of phrases such as "John Smith" and "Jon
3902	   Smits" may be so similar that they are difficult to distinguish
3903	   correctly.  A smaller threshold reduces the number of clashes
3904	   detected.  The range for this threshold is a float value between 0.0
3905	   and 1.0.  The default value for this header is implementation
3906	   specific.  Clash testing can be turned off completely by setting the
3907	   Clash-Threshold header value to 0.

3909	   clash-threshold          =  "Clash-Threshold" ":" FLOAT CRLF
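
   A client might validate a Consistency-Threshold or Clash-Threshold
   value before sending it.  The Python sketch below is illustrative
   only: the function names are hypothetical, and the accepted textual
   form (digits with an optional fractional part) is an assumption
   about the FLOAT rule, which is defined elsewhere in this document.

```python
import re

# Assumed textual form: digits with an optional fraction, e.g. "0", "0.5", ".25".
_FLOAT_RE = re.compile(r"\d+(\.\d*)?|\.\d+")


def parse_threshold(header_value):
    """Parse a threshold header value and check the 0.0 to 1.0 range."""
    text = header_value.strip()
    if not _FLOAT_RE.fullmatch(text):
        raise ValueError("not a FLOAT value: %r" % header_value)
    value = float(text)
    if not 0.0 <= value <= 1.0:
        raise ValueError("threshold out of range: %r" % header_value)
    return value


def clash_testing_enabled(clash_threshold):
    """A Clash-Threshold of 0 turns clash testing off completely."""
    return clash_threshold > 0.0
```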

3911	9.4.37.  Personal-Grammar-URI

3913	   This header specifies the speaker-trained grammar to be used or
3914	   referenced during enrollment operations.  Phrases are added to this
3915	   grammar during enrollment.  For example, a contact list for user
3916	   "Jeff" could be stored at the Personal-Grammar-URI
3917	   "http://myserver.example.com/myenrollmentdb/jeff-list".  The
3918	   generated grammar syntax MAY be implementation specific.  There is no
3919	   default value for this header.  This header MAY be sent as part of
3920	   the START-PHRASE-ENROLLMENT, "SET-PARAMS", or "GET-PARAMS" method.

3922	   personal-grammar-uri     =  "Personal-Grammar-URI" ":" Uri CRLF

3924	9.4.38.  Enroll-Utterance

3926	   This header MAY be specified in the RECOGNIZE method.  If this header
3927	   is set to "true" and an Enrollment is active, the RECOGNIZE command
3928	   MUST add the collected utterance to the personal grammar that is
3929	   being enrolled.  The way in which this occurs is engine-specific and
3930	   may be an area of future standardization.  The default value for this
3931	   header is "false".

3933	   enroll-utterance     =  "Enroll-Utterance" ":" BOOLEAN CRLF

3935	9.4.39.  Phrase-Id

3937	   This header in a request identifies a phrase in an existing personal
3938	   grammar for which enrollment is desired.  It is also returned to the
3939	   client in the RECOGNITION-COMPLETE event.  This header MAY occur in
3940	   START-PHRASE-ENROLLMENT, MODIFY-PHRASE or DELETE-PHRASE requests.
3941	   There is no default value for this header.

3943	   phrase-id                =  "Phrase-ID" ":" 1*VCHAR CRLF

3945	9.4.40.  Phrase-NL

3947	   This string specifies the interpreted text to be returned when the
3948	   phrase is recognized.  This header MAY occur in START-PHRASE-
3949	   ENROLLMENT and MODIFY-PHRASE requests.  There is no default value for
3950	   this header.

3952	   phrase-nl                =  "Phrase-NL" ":" 1*UTFCHAR CRLF

3954	9.4.41.  Weight

3956	   The value of this header represents the occurrence likelihood of a
3957	   phrase in an enrolled grammar.  When using grammar enrollment, the
3958	   system is essentially constructing a grammar segment consisting of a
3959	   list of possible match phrases.  This can be thought of to be similar
3960	   to the dynamic construction of a <one-of> tag in the W3C grammar
3961	   specification.  Each enrolled-phrase becomes an item in the list that
3962	   can be matched against spoken input similar to the <item> within a
3963	   <one-of> list.  This header allows you to assign a weight to the
3964	   phrase (i.e., <item> entry) in the <one-of> list that is enrolled.
3965	   Grammar weights are normalized to a sum of one at grammar compilation
3966	   time, so a weight value of 1 for each phrase in an enrolled grammar
3967	   list indicates all items in that list have the same weight.  This
3968	   header MAY occur in START-PHRASE-ENROLLMENT and MODIFY-PHRASE
3969	   requests.  The default value for this header is implementation
3970	   specific.

3972	   weight                   =  "Weight" ":" weight-value CRLF
3973	   weight-value             =  FLOAT
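
   The normalization of weights mentioned above can be worked through
   with a short Python sketch.  The function name and the phrase
   identifiers are hypothetical; the sketch only shows the arithmetic
   of scaling the weights so that they sum to one.

```python
def normalize_weights(weights):
    """Scale phrase weights so that they sum to one.

    `weights` maps a phrase identifier to its Weight header value.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {phrase: weight / total for phrase, weight in weights.items()}
```

   For example, two phrases that each carry a weight of 1 end up with
   0.5 each, and weights of 3 and 1 normalize to 0.75 and 0.25.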

3975	9.4.42.  Save-Best-Waveform

3977	   This header allows the client to request the recognizer resource to
3978	   save the audio stream for the best repetition of the phrase that was
3979	   used during the enrollment session.  The recognizer MUST attempt to
3980	   record the recognized audio and make it available to the client in
3981	   the form of a URI returned in the Waveform-URI header in the response
3982	   to the END-PHRASE-ENROLLMENT method.  If there was an error in
3983	   recording the stream or the audio data is otherwise not available,
3984	   the recognizer MUST return an empty Waveform-URI header.  This header
3985	   MAY occur in the START-PHRASE-ENROLLMENT, SET-PARAMS, and GET-PARAMS
3986	   methods.

3988	   save-best-waveform  =  "Save-Best-Waveform" ":" BOOLEAN CRLF

3990	9.4.43.  New-Phrase-Id

3992	   This header replaces the id used to identify the phrase in a personal
3993	   grammar.  The recognizer returns the new id when using an enrollment
3994	   grammar.  This header MAY occur in MODIFY-PHRASE requests.

3996	   new-phrase-id            =  "New-Phrase-ID" ":" 1*VCHAR CRLF

3998	9.4.44.  Confusable-Phrases-URI

4000	   This header specifies a grammar that defines invalid phrases for
4001	   enrollment.  For example, typical applications do not allow an
4002	   enrolled phrase that is also a command word.  This header MAY occur
4003	   in RECOGNIZE requests that are part of an enrollment session.

4005	   confusable-phrases-uri   =  "Confusable-Phrases-URI" ":" Uri CRLF

4007	9.4.45.  Abort-Phrase-Enrollment

4009	   This header can optionally be specified in the END-PHRASE-ENROLLMENT
4010	   method to abort the phrase enrollment, rather than committing the
4011	   phrase to the personal grammar.

4013	   abort-phrase-enrollment  =  "Abort-Phrase-Enrollment" ":"
4014	                               BOOLEAN CRLF

4016	9.5.  Recognizer Message Body

4018	   A recognizer message may carry additional data associated with the
4019	   request, response or event.  The client may provide the grammar to be
4020	   recognized in DEFINE-GRAMMAR or RECOGNIZE requests.  When one or more
4021	   grammars are specified using the DEFINE-GRAMMAR method, the server
4022	   MUST attempt to fetch, compile and optimize the grammar before
4023	   returning a response to the DEFINE-GRAMMAR method.  A RECOGNIZE
4024	   request MUST completely specify the grammars to be active during the
4025	   recognition operation, except when the RECOGNIZE method is being used
4026	   to enroll a grammar.  During grammar enrollment, such grammars are
4027	   optional.  The server resource may send the recognition results in
4028	   the RECOGNITION-COMPLETE event or the GET-RESULT response.  Grammars
4029	   and recognition results are carried in the message body of the
4030	   corresponding MRCPv2 messages.

4032	9.5.1.  Recognizer Grammar Data

4034	   Recognizer grammar data from the client to the server can be provided
4035	   inline or by reference.  Either way, grammar data is carried as typed
4036	   media entities in the message body of the RECOGNIZE or DEFINE-GRAMMAR
4037	   request.  All MRCPv2 servers MUST accept grammars in the XML form
4038	   (Media Type application/srgs+xml) of the W3C's XML-based Speech
4039	   Grammar Markup Format (SRGS) [W3C.REC-speech-grammar-20040316] and
4040	   MAY accept grammars in other formats.  Examples include but are not
4041	   limited to:

4042	   o  the ABNF form (Media Type application/srgs) of SRGS

4043	   o  Sun's Java Speech Grammar Format [refs.javaSpeechGrammarFormat]

4044	   Additionally, MRCPv2 servers MAY support the Semantic Interpretation
4045	   for Speech Recognition (SISR)
4046	   [W3C.REC-semantic-interpretation-20070405] specification.

4048	   When a grammar is specified inline in the request, the client MUST
4049	   provide a Content-ID for that grammar as part of the content headers.
4050	   If there is no space on the server to store the inline grammar, the
4051	   request MUST return with a Completion-Cause code of 016 grammar-
4052	   definition-failure.  Otherwise, the server MUST associate the inline
4053	   grammar block with that Content-ID and MUST store it on the server
4054	   for the duration of the session.  However, if the Content-ID is
4055	   redefined later in the session through a subsequent DEFINE-GRAMMAR,
4056	   the inline grammar previously associated with the Content-ID MUST be
4057	   freed.  If the Content-ID is redefined through a subsequent DEFINE-
4058	   GRAMMAR with an empty message body (i.e. no grammar definition), then
4059	   in addition to freeing any grammar previously associated with the
4060	   Content-ID the server MUST clear all bindings and associations to the
4061	   Content-ID.  Unless and until subsequently redefined, this URI MUST
4062	   be interpreted by the server as one that has never been set.

4064	   Grammars that have been associated with a Content-ID can be
4065	   referenced through the "session:" URI scheme (see Section 13.6).  For
4066	   example:
4067	   session:help@root-level.store
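
   As an informal illustration, a client could derive such a reference
   from the Content-ID it assigned to an inline grammar.  The Python
   function below is a hypothetical helper; it simply strips the
   optional angle brackets and prefixes the "session:" scheme, and it
   does not validate the identifier against Section 13.6.

```python
def session_uri_from_content_id(content_id):
    """Build a "session:" URI from a Content-ID header value."""
    cid = content_id.strip()
    if cid.startswith("<") and cid.endswith(">"):
        cid = cid[1:-1]   # drop the surrounding angle brackets
    if not cid:
        raise ValueError("empty Content-ID")
    return "session:" + cid
```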

4069	   Grammar data MAY be specified using external URI references.  To do
4070	   so, the client uses a body of Media Type text/uri-list (see RFC 2483
4072	   [RFC2483]) to list the one or more URIs that point to the grammar
4073	   data.  The client can use a body of Media Type text/grammar-ref-list
4074	   if it wants to assign weights to the list of grammar URIs.  All
4075	   MRCPv2 servers MUST support grammar access using the HTTP and HTTPS
4076	   URI schemes.

4078	   If the grammar data the client wishes to be used on a request
4079	   consists of a mix of URI and inline grammar data, the client uses the
4080	   multipart/mixed Media Type to enclose the text/uri-list,
4081	   application/srgs, or application/srgs+xml content entities.  The
4082	   character set and encoding used in the grammar data are specified
4083	   according to standard Media Type definitions.

4085	   When more than one grammar URI or inline grammar block is specified
4086	   in a message body of the RECOGNIZE request, the server interprets
4087	   this as a list of grammar alternatives to match against.

4089	   Content-Type:application/srgs+xml
4090	   Content-ID:<request1@form-level.store>
4091	   Content-Length:...

4093	   <?xml version="1.0"?>

4095	   <!-- the default grammar language is US English -->
4096	   <grammar xmlns="http://www.w3.org/2001/06/grammar"
4097	            xml:lang="en-US" version="1.0" root="request">

4099	   <!-- single language attachment to tokens -->
4100	         <rule id="yes">
4101	               <one-of>
4102	                     <item xml:lang="fr-CA">oui</item>
4103	                     <item xml:lang="en-US">yes</item>
4104	               </one-of>
4105	         </rule>

4107	   <!-- single language attachment to a rule expansion -->
4108	         <rule id="request">
4109	               may I speak to
4110	               <one-of xml:lang="fr-CA">
4111	                     <item>Michel Tremblay</item>
4112	                     <item>Andre Roy</item>
4113	               </one-of>
4114	         </rule>

4116	         <!-- multiple language attachment to a token -->
4117	         <rule id="people1">
4118	               <token lexicon="en-US,fr-CA"> Robert </token>
4119	         </rule>

4121	         <!-- the equivalent single-language attachment expansion -->
4122	         <rule id="people2">
4123	               <one-of>
4124	                     <item xml:lang="en-US">Robert</item>
4125	                     <item xml:lang="fr-CA">Robert</item>
4126	               </one-of>
4127	         </rule>

4129	         </grammar>

4131	                           SRGS Grammar Example

4133	   Content-Type:text/uri-list
4134	   Content-Length:...

4136	   session:help@root-level.store
4137	   http://www.example.com/Directory-Name-List.grxml
4138	   http://www.example.com/Department-List.grxml
4139	   http://www.example.com/TAC-Contact-List.grxml
4140	   session:menu1@menu-level.store

4142	                         Grammar Reference Example
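
   A client could assemble a grammar reference body like the one above
   with a few lines of Python.  This is a minimal sketch: the function
   name is hypothetical, each URI is assumed to be terminated by CRLF,
   and Content-Length is assumed to count the octets of the body only.

```python
def build_uri_list_entity(uris):
    """Return the content headers and body for a text/uri-list entity."""
    body = "".join(uri + "\r\n" for uri in uris).encode("utf-8")
    headers = (
        "Content-Type:text/uri-list\r\n"
        "Content-Length:%d\r\n"
        "\r\n" % len(body)
    )
    return headers.encode("ascii") + body
```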

4144	   Content-Type:multipart/mixed; boundary="break"

4146	   --break
4147	   Content-Type:text/uri-list
4148	   Content-Length:...

4150	   http://www.example.com/Directory-Name-List.grxml
4151	   http://www.example.com/Department-List.grxml
4152	   http://www.example.com/TAC-Contact-List.grxml

4154	   --break
4155	   Content-Type:application/srgs+xml
4156	   Content-ID:<request1@form-level.store>
4157	   Content-Length:...

4159	   <?xml version="1.0"?>

4161	   <!-- the default grammar language is US English -->
4162	   <grammar xmlns="http://www.w3.org/2001/06/grammar"
4163	            xml:lang="en-US" version="1.0">

4165	   <!-- single language attachment to tokens -->
4166	         <rule id="yes">
4167	               <one-of>
4168	                     <item xml:lang="fr-CA">oui</item>
4169	                     <item xml:lang="en-US">yes</item>
4170	               </one-of>
4171	         </rule>

4173	   <!-- single language attachment to a rule expansion -->
4174	         <rule id="request">
4175	               may I speak to
4176	               <one-of xml:lang="fr-CA">
4177	                     <item>Michel Tremblay</item>
4178	                     <item>Andre Roy</item>
4179	               </one-of>
4181	         </rule>

4183	         <!-- multiple language attachment to a token -->
4184	         <rule id="people1">
4185	               <token lexicon="en-US,fr-CA"> Robert </token>
4186	         </rule>

4188	         <!-- the equivalent single-language attachment expansion -->
4189	         <rule id="people2">
4190	               <one-of>
4191	                     <item xml:lang="en-US">Robert</item>
4192	                     <item xml:lang="fr-CA">Robert</item>
4193	               </one-of>
4194	         </rule>

4196	         </grammar>
4197	   --break--

4199	                      Mixed Grammar Reference Example

4201	9.5.2.  Recognizer Result Data

4203	   Recognition results are returned to the client in the message body of
4204	   the RECOGNITION-COMPLETE event or the GET-RESULT response message as
4205	   described in Section 6.3.  Element and attribute descriptions for
4206	   the recognition portion of the NLSML format are provided in
4207	   Section 9.6 with a normative definition of the schema in
4208	   Section 16.1.

4210	   Content-Type:application/nlsml+xml
4211	   Content-Length:...

4213	   <?xml version="1.0"?>
4214	   <result xmlns="http://www.ietf.org/xml/ns/mrcpv2"
4215	           xmlns:ex="http://www.example.com/example"
4216	           grammar="http://www.example.com/theYesNoGrammar">
4217	       <interpretation>
4218	           <instance>
4219	                   <ex:response>yes</ex:response>
4220	           </instance>
4221	           <input>ok</input>
4222	       </interpretation>
4223	   </result>

4225	                              Result Example

4227	9.5.3.  Enrollment Result Data

4229	   Enrollment results are returned to the client in the message body of
4230	   the RECOGNITION-COMPLETE event as described in Section 6.3.  Element
4231	   and attribute descriptions for the enrollment portion of the NLSML
4232	   format are provided in Section 9.7 with a normative definition of the
4233	   schema in Section 16.2.

4235	9.5.4.  Recognizer Context Block

4237	   When a client changes servers while operating on behalf of the
4238	   same incoming communication session, this header allows the client to
4239	   collect a block of opaque data from one server and provide it to
4240	   another server.  This capability is desirable if the client needs
4241	   different language support or because the server issued a redirect.
4242	   Here the first recognizer resource may have collected acoustic and
4243	   other data during its execution of recognition methods.  After a
4244	   server switch, communicating this data may allow the recognition
4245	   resource on the new server to provide better recognition.  This block
4246	   of data is implementation-specific and MUST be carried as Media Type
4247	   application/octets in the body of the message.

4249	   This block of data is communicated in the "SET-PARAMS" and
4250	   "GET-PARAMS" method/response messages.  In the "GET-PARAMS" method,
4251	   if an empty recognizer-context-block header is present, then the
4252	   recognizer SHOULD return its vendor-specific context block, if any,
4253	   in the message body as an entity of Media Type application/octets
4254	   with a specific Content-ID.  The Content-ID value MUST also be
4255	   specified in the recognizer-context-block header in the "GET-PARAMS"
4256	   response.  The "SET-PARAMS" request wishing to provide this vendor-
4257	   specific data MUST send it in the message body as a typed entity with
4258	   the same Content-ID that it received from the "GET-PARAMS".  The
4259	   Content-ID MUST also be sent in the recognizer-context-block header
4260	   of the "SET-PARAMS" message.

4262	   Each speech recognition implementation choosing to use this mechanism
4263	   to hand off recognizer context data among servers MUST distinguish
4264	   its implementation-specific block of data from other implementations
4265	   by choosing a Content-ID that is recognizable among the participating
4266	   servers and unlikely to collide with values chosen by another
4267	   implementation.

4269	9.6.  Recognizer Results

4271	   The recognizer portion of NLSML (see Section 6.3.1) represents
4272	   information automatically extracted from a user's utterances by a
4273	   semantic interpretation component, where "utterance" is to be taken
4274	   in the general sense of a meaningful user input in any modality
4275	   supported by the MRCPv2 implementation.

4277	9.6.1.  Markup Functions

4279	   MRCPv2 recognition resources employ the Natural Language Semantics
4280	   Markup Language to interpret natural language speech input and to
4281	   format the interpretation for consumption by an MRCPv2 client.

4283	   The elements of the markup fall into the following general functional
4284	   categories: Interpretation, Side Information, and Multi-Modal
4285	   Integration.

4287	9.6.1.1.  Interpretation

4289	   Elements and attributes represent the semantics of a user's
4290	   utterance, including the <result>, <interpretation>, and <instance>
4291	   elements.  The <result> element contains the full result of
4292	   processing one utterance.  It may contain multiple <interpretation>
4293	   elements if the interpretation of the utterance results in multiple
4294	   alternative meanings due to uncertainty in speech recognition or
4295	   natural language understanding.  There are at least two reasons for
4296	   providing multiple interpretations:
4297	   1.  the client application might have additional information, for
4298	       example, information from a database, that would allow it to
4299	       select a preferred interpretation from among the possible
4300	       interpretations returned from the semantic interpreter.
4301	   2.  a client-based dialog manager (e.g., VXML) that was unable to
4302	       select between several competing interpretations could use this
4303	       information to go back to the user and find out what was
4304	       intended.  For example, it could issue a "SPEAK" request to a
4305	       synthesizer resource to emit "Did you say 'Boston' or 'Austin'?"

4307	9.6.1.2.  Side Information

4309	   These are elements and attributes representing additional information
4310	   about the interpretation, over and above the interpretation itself.
4311	   Side information includes:
4312	   1.  Whether an interpretation was achieved (the <nomatch> element)
4313	       and the system's confidence in an interpretation (the
4314	       "confidence" attribute of <interpretation>).
4315	   2.  Alternative interpretations (<interpretation>)
4316	   3.  Input formats and ASR information: the <input> element,
4317	       representing the input to the semantic interpreter.

4319	9.6.1.3.  Multi-Modal Integration

4321	   When more than one modality is available for input, the
4322	   interpretation of the inputs needs to be coordinated.  The "mode"
4323	   attribute of <input> supports this by indicating whether the
4324	   utterance was input by speech, dtmf, pointing, etc.  The
4325	   "timestamp-start" and "timestamp-end" attributes of <input>
4326	   also provide for temporal coordination by indicating when inputs
4327	   occurred.

4329	9.6.2.  Overview of Recognizer Result Elements and their Relationships

4331	   The recognizer elements in NLSML fall into two categories:
4332	   1.  description of the input that was processed.
4333	   2.  description of the meaning which was extracted from the input.
4334	   Next to each element are its attributes.  In addition, some elements
4335	   can contain multiple instances of other elements.  For example, a
4336	   <result> can contain multiple <interpretation> elements, each of
4337	   which is taken to be an alternative.  Similarly, <input> can
4338	   contain multiple child <input> elements, which are taken to be
4339	   cumulative.  To illustrate the basic usage of these elements,
4340	   consider the utterance "ok" (interpreted as "yes").  The example
4341	   illustrates how that utterance and its interpretation would be
4342	   represented in the NL Semantics markup.

4344	   <?xml version="1.0"?>
4345	   <result xmlns="http://www.ietf.org/xml/ns/mrcpv2"
4346	           xmlns:ex="http://www.example.com/example"
4347	           grammar="http://www.example.com/theYesNoGrammar">
4348	     <interpretation>
4349	        <instance>
4350	           <ex:response>yes</ex:response>
4351	         </instance>
4352	       <input>ok</input>
4353	     </interpretation>
4354	   </result>

4356	   This example includes only the minimum required information.  There
4357	   is an overall <result> element which includes one interpretation and
4358	   an input element.  The interpretation contains the application-
4359	   specific element "<response>" which is the semantically interpreted
4360	   result.
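
	   As a rough illustration of how a client might consume such a
	   result, the following Python sketch extracts the input text and
	   the instance slots from each interpretation.  The function and
	   variable names (summarize, local_name, NLSML) are hypothetical
	   and not defined by MRCPv2; only the namespace URI and element
	   names are taken from the example above.

```python
import xml.etree.ElementTree as ET

# Namespace of the NLSML elements, as used in the example above.
NS = {"nl": "http://www.ietf.org/xml/ns/mrcpv2"}

NLSML = """<?xml version="1.0"?>
<result xmlns="http://www.ietf.org/xml/ns/mrcpv2"
        xmlns:ex="http://www.example.com/example"
        grammar="http://www.example.com/theYesNoGrammar">
  <interpretation>
    <instance>
      <ex:response>yes</ex:response>
    </instance>
    <input>ok</input>
  </interpretation>
</result>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tags."""
    return tag.rsplit("}", 1)[-1]


def summarize(nlsml_text):
    """Return one dict per <interpretation>: its input text and slots."""
    root = ET.fromstring(nlsml_text)
    summaries = []
    for interp in root.findall("nl:interpretation", NS):
        instance = interp.find("nl:instance", NS)
        inp = interp.find("nl:input", NS)
        slots = {}
        if instance is not None:
            # Application-specific children of <instance> become slots.
            slots = {local_name(c.tag): (c.text or "").strip()
                     for c in instance}
        text = (inp.text or "").strip() if inp is not None else None
        summaries.append({"input": text, "slots": slots})
    return summaries


print(summarize(NLSML))
```

	   The sketch only reads the first level of children under
	   <instance>; nested application-specific structures would need a
	   recursive walk.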

4362	9.6.3.  Elements and Attributes

4364	9.6.3.1.  RESULT Root Element

4366	   The root element of the markup is <result>.  The <result> element
4367	   includes one or more <interpretation> elements.  Multiple
4368	   interpretations can result from ambiguities in the input or in the
4369	   semantic interpretation.  If the "grammar" attribute does not apply
4370	   to all of the interpretations in the result it can be overridden for
4371	   individual interpretations at the <interpretation> level.

4373	   Attributes:
4374	   1.  grammar: The grammar or recognition rule matched by this result.
4375	       The format of the grammar attribute will match the rule reference
4376	       semantics defined in the grammar specification.  Specifically,
4377	       the rule reference is in the external XML form for grammar rule
4378	       references.  The markup interpreter needs to know the grammar
4379	       rule that is matched by the utterance because multiple rules may
4380	       be simultaneously active.  The value is the grammar URI used by
4381	       the markup interpreter to specify the grammar.  The grammar can
4382	       be overridden by a grammar attribute in the <interpretation>
4383	       element if the input was ambiguous as to which grammar it
4384	       matched.  If all interpretation elements within the result
4385	       element carry their own grammar attributes, the attribute
4386	       can be dropped from the result element.

4388	   <?xml version="1.0"?>
4389	   <result xmlns="http://www.ietf.org/xml/ns/mrcpv2"
4390	           grammar="http://www.example.com/grammar">
4391	     <interpretation>
4392	      ....
4393	     </interpretation>
4394	   </result>

4396	9.6.3.2.  INTERPRETATION Element

4398	   An <interpretation> element contains a single semantic
4399	   interpretation.

4401	   Attributes:
4402	   1.  confidence: A float value from 0.0-1.0 indicating the semantic
4403	       analyzer's confidence in this interpretation.  A value of 1.0
4404	       indicates maximum confidence.  The values are implementation-
4405	       dependent, but are intended to align with the value
4406	       interpretation for the confidence MRCPv2 header defined in
4407	       Section 9.4.1.  This attribute is optional.
4408	   2.  grammar: The grammar or recognition rule matched by this
4409	       interpretation (if needed to override the grammar specification
4410	       at the <interpretation> level).  This attribute is only needed
4411	       under <interpretation> if it is necessary to override a grammar
4412	       that was defined at the <result> level.  Note that the grammar
4413	       attribute for the interpretation element is OPTIONAL if and only
4414	       if the grammar attribute is specified in the result element.

4416	   Interpretations MUST be sorted best-first by some measure of
4417	   "goodness".  The goodness measure is "confidence" if present,
4418	   otherwise, it is some implementation-specific indication of quality.

4420	   The grammar is expected to be specified most frequently at the
4421	   <result> level.  However, it can be overridden at the
4422	   <interpretation> level because it is possible that different
4423	   interpretations may match different grammar rules.
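
	   The override and ordering rules above can be sketched in Python
	   as follows.  This is only an illustration under stated
	   assumptions: the helper names (effective_grammars, is_best_first)
	   and the sample document, including the second grammar URI, are
	   hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"nl": "http://www.ietf.org/xml/ns/mrcpv2"}

# Hypothetical two-interpretation result; the second interpretation
# overrides the grammar given at the <result> level.
SAMPLE = """<result xmlns="http://www.ietf.org/xml/ns/mrcpv2"
        grammar="http://www.example.com/grammar">
  <interpretation confidence="0.75">
    <instance/><input>Boston</input>
  </interpretation>
  <interpretation confidence="0.40"
                  grammar="http://www.example.com/other">
    <instance/><input>Austin</input>
  </interpretation>
</result>"""


def effective_grammars(nlsml_text):
    """Return (grammar, confidence) per interpretation, in document order.

    An <interpretation> grammar attribute overrides the <result> one.
    Confidence is None when the optional attribute is absent.
    """
    root = ET.fromstring(nlsml_text)
    default = root.get("grammar")
    pairs = []
    for interp in root.findall("nl:interpretation", NS):
        conf = interp.get("confidence")
        pairs.append((interp.get("grammar", default),
                      float(conf) if conf is not None else None))
    return pairs


def is_best_first(pairs):
    """Check best-first ordering when every confidence is present.

    If any confidence is missing, the goodness measure is
    implementation-specific and cannot be checked here, so return True.
    """
    confs = [c for _, c in pairs]
    if any(c is None for c in confs):
        return True
    return all(a >= b for a, b in zip(confs, confs[1:]))


pairs = effective_grammars(SAMPLE)
print(pairs, is_best_first(pairs))
```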

4425	   The <interpretation> element includes an optional <input> element
4426	   which contains the input being analyzed, and an <instance> element
4427	   containing the interpretation of the utterance.

4429	   <interpretation confidence="0.75"
4430	                   grammar="http://www.example.com/grammar">
4431	       ...
4432	   </interpretation>

4434	9.6.3.3.  INSTANCE Element

4436	   The <instance> element contains the interpretation of the utterance.
4437	   When the Semantic Interpretation for Speech Recognition format is
4438	   used, the <instance> element contains the XML serialization of the
4439	   ECMAScript result using the approach defined in that specification.
4440	   When there is semantic markup in the grammar that does not create
4441	   semantic objects, but instead only does a semantic translation of a
4442	   portion of the input, such as translating "coke" to "coca-cola", the
4443	   instance contains the whole input but with the translation applied.
4444	   The NLSML looks like the markup in Figure 2 below.  If there are no
4445	   semantic objects created, nor any semantic translation, the instance
4446	   value is the same as the input value.

4448	   Attributes:
4449	   1.  confidence: Each element of the instance may have a confidence
4450	       attribute, defined in the NL semantics namespace.  The confidence
4451	       attribute contains a float value in the range from 0.0-1.0
4452	       reflecting the system's confidence in the analysis of that slot.
4453	       A value of 1.0 indicates maximum confidence.  The values are
4454	       implementation-dependent, but are intended to align with the
4455	       value interpretation for the confidence MRCPv2 header defined in
4456	       Section 9.4.1.  This attribute is optional.

4458	   <instance>
4459	     <nameAddress>
4460	         <street confidence="0.75">123 Maple Street</street>
4461	         <city>Mill Valley</city>
4462	         <state>CA</state>
4463	         <zip>90952</zip>
4464	     </nameAddress>
4465	   </instance>
4466	   <input>
4467	     My address is 123 Maple Street,
4468	     Mill Valley, California, 90952
4469	   </input>

4471	   <instance>
4472	       I would like to buy a coca-cola
4473	   </instance>
4474	   <input>
4475	     I would like to buy a coke
4476	   </input>

4478	                          Figure 2: NLSML Example

4480	9.6.3.4.  INPUT Element

4482	   The <input> element is the text representation of a user's input.  It
4483	   includes an optional "confidence" attribute which indicates the
4484	   recognizer's confidence in the recognition result (as opposed to the
4485	   confidence in the interpretation, which is indicated by the
4486	   "confidence" attribute of <interpretation>).  Optional "timestamp-
4487	   start" and "timestamp-end" attributes indicate the start and end
4488	   times of a spoken utterance, in ISO 8601 format.

4490	   Attributes:
4491	   1.  timestamp-start: The time at which the input began. (optional)
4492	   2.  timestamp-end: The time at which the input ended. (optional)
4493	   3.  mode: The modality of the input, for example, speech, dtmf, etc.
4494	       (optional)
4495	   4.  confidence: the confidence of the recognizer in the correctness
4496	       of the input in the range 0.0 to 1.0 (optional)
4497	   Note that it may not make sense for temporally overlapping inputs to
4498	   have the same mode; however, this constraint is not expected to be
4499	   enforced by implementations.

4501	   When there is no time zone designator, ISO 8601 time representations
4502	   default to local time.

4504	   There are three possible formats for the <input> element.

4506	   1.  The <input> element can contain simple text:
4507	   <input>onions</input>
4508	       A future possibility is for <input> to contain not only text but
4509	       additional markup that represents prosodic information that was
4510	       contained in the original utterance and extracted by the speech
4511	       recognizer.  This depends on the availability of ASRs that are
4512	       capable of producing prosodic information.  MRCPv2 clients MUST
4513	       be prepared to receive such markup and MAY make use of it.
4514	   2.  An <input> tag can also contain additional <input> tags.  Having
4515	       additional input elements allows the representation to support
4516	       future multi-modal inputs as well as finer-grained speech
4517	       information, such as timestamps for individual words and word-
4518	       level confidences.
4519	   <input>
4520	        <input mode="speech" confidence="0.5"
4521	            timestamp-start="2000-04-03T00:00:00"
4522	            timestamp-end="2000-04-03T00:00:00.2">fried</input>
4523	        <input mode="speech" confidence="1.0"
4524	            timestamp-start="2000-04-03T00:00:00.25"
4525	            timestamp-end="2000-04-03T00:00:00.6">onions</input>
4526	   </input>
4527	   3.  Finally, the <input> element can contain <nomatch> and <noinput>
4528	       elements, which describe situations in which the speech
4529	       recognizer received input that it was unable to process, or did
4530	       not receive any input at all, respectively.

4532	9.6.3.5.  NOMATCH Element

4534	   The <nomatch> element under <input> is used to indicate that the
4535	   semantic interpreter was unable to successfully match any input with
4536	   confidence above the threshold.  It can optionally contain the text
4537	   of the best of the (rejected) matches.

4539	   <interpretation>
4540	      <instance/>
4541	      <input confidence="0.1">
4542	         <nomatch/>
4543	      </input>
4544	   </interpretation>
4545	   <interpretation>
4546	      <instance/>
4547	      <input mode="speech" confidence="0.1">
4548	        <nomatch>I want to go to New York</nomatch>
4549	      </input>
4550	   </interpretation>

4552	9.6.3.6.  NOINPUT Element

4554	   <noinput> indicates that there was no input - a timeout occurred in
4555	   the speech recognizer due to silence.
4556	   <interpretation>
4557	      <instance/>
4558	      <input>
4559	         <noinput/>
4560	      </input>
4561	   </interpretation>

4563	   If there are multiple levels of inputs, the most natural place for
4564	   <nomatch> and <noinput> elements to appear is under the highest level
4565	   of <input> for <noinput>, and under the appropriate level of
4566	   <interpretation> for <nomatch>.  So <noinput> means "no input at all"
4567	   and <nomatch> means "no match in speech modality" or "no match in
4568	   dtmf modality".  For example, to represent garbled speech combined
4569	   with dtmf "1 2 3 4", the markup would be:
4570	   <input>
4571	      <input mode="speech"><nomatch/></input>
4572	      <input mode="dtmf">1 2 3 4</input>
4573	   </input>

4575	   Note: while <noinput> could be represented as an attribute of input,
4576	   <nomatch> cannot, since it could potentially include PCDATA content
4577	   with the best match.  For parallelism, <noinput> is also an element.
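
	   The three <input> forms and the <nomatch>/<noinput> cases
	   described above can be told apart with a small recursive walk.
	   The Python sketch below is one possible approach; the function
	   name describe_input and the returned dictionary keys are
	   hypothetical, and the sample is the garbled-speech-plus-dtmf
	   markup shown above with the NLSML namespace added.

```python
import xml.etree.ElementTree as ET

NS = {"nl": "http://www.ietf.org/xml/ns/mrcpv2"}

SAMPLE = """<input xmlns="http://www.ietf.org/xml/ns/mrcpv2">
   <input mode="speech"><nomatch/></input>
   <input mode="dtmf">1 2 3 4</input>
</input>"""


def describe_input(inp):
    """Classify one <input> element and recurse into nested <input>s."""
    if inp.find("nl:noinput", NS) is not None:
        return {"kind": "noinput"}
    nomatch = inp.find("nl:nomatch", NS)
    if nomatch is not None:
        # <nomatch> may optionally carry the best rejected match as text.
        best = (nomatch.text or "").strip() or None
        return {"kind": "nomatch", "mode": inp.get("mode"), "best": best}
    parts = inp.findall("nl:input", NS)
    if parts:
        return {"kind": "composite",
                "parts": [describe_input(p) for p in parts]}
    return {"kind": "text", "mode": inp.get("mode"),
            "text": (inp.text or "").strip()}


print(describe_input(ET.fromstring(SAMPLE)))
```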

4579	9.7.  Enrollment Results

4581	   All enrollment elements are contained within a single <enrollment-
4582	   result> element under <result>.  The elements are described below and
4583	   have the schema defined in Section 16.2.  The following elements are
4584	   defined:

4586	   1.  num-clashes
4587	   2.  num-good-repetitions
4588	   3.  num-repetitions-still-needed
4589	   4.  consistency-status
4590	   5.  clash-phrase-ids
4591	   6.  transcriptions
4592	   7.  confusable-phrases

4594	9.7.1.  NUM-CLASHES Element

4596	   The <num-clashes> element contains the number of clashes that this
4597	   pronunciation has with other pronunciations in an active enrollment
4598	   session.  The associated Clash-Threshold header determines the
4599	   sensitivity of the clash measurement.  Note that clash testing can be
4600	   turned off completely by setting the Clash-Threshold header value to
4601	   0.

4603	9.7.2.  NUM-GOOD-REPETITIONS Element

4605	   The <num-good-repetitions> element contains the number of consistent
4606	   pronunciations obtained so far in an active enrollment session.

4608	9.7.3.  NUM-REPETITIONS-STILL-NEEDED Element

4610	   The <num-repetitions-still-needed> element contains the number of
4611	   consistent pronunciations that must still be obtained before the new
4612	   phrase can be added to the enrollment grammar.  The number of
4613	   consistent pronunciations required is specified by the client in the
4614	   request header Num-Min-Consistent-Pronunciations.  The returned value
4615	   must be 0 before the client can successfully commit a phrase to the
4616	   grammar by ending the enrollment session.

4618	9.7.4.  CONSISTENCY-STATUS Element

4620	   The <consistency-status> element is used to indicate how consistent
4621	   the repetitions are when learning a new phrase.  It can have the
4622	   values of consistent, inconsistent, and undecided.

4624	9.7.5.  CLASH-PHRASE-IDS Element

4626	   The <clash-phrase-ids> element contains the phrase ids of clashing
4627	   pronunciation(s), if any.  This element is absent if there are no
4628	   clashes.

4630	9.7.6.  TRANSCRIPTIONS Element

4632	   The <transcriptions> element contains the transcriptions returned in
4633	   the last repetition of the phrase being enrolled.

4635	9.7.7.  CONFUSABLE-PHRASES Element

4637	   The <confusable-phrases> element contains a list of phrases from a
4638	   command grammar that are confusable with the phrase being added to
4639	   the personal grammar.  This element may be absent if there are no
4640	   confusable phrases.

4642	9.8.  DEFINE-GRAMMAR

4644	   The DEFINE-GRAMMAR method, from the client to the server, provides
4645	   one or more grammars and requests the server to access, fetch, and
4646	   compile the grammars as needed.  The DEFINE-GRAMMAR method
4647	   implementation MUST do a fetch of all external URIs that are part of
4648	   that operation.  If caching is implemented, this URI fetching MUST
4649	   conform to the cache control hints and parameter headers associated
4650	   with the method in deciding whether it should be fetched from cache
4651	   or from the external server.  If these hints/parameters are not
4652	   specified in the method, the values set for the session using SET-
4653	   PARAMS/GET-PARAMS apply.  If they were not set for the session, their
4654	   default values apply.

4656	   If the server resource is in the recognition state, the server
4657	   MUST respond to the DEFINE-GRAMMAR request with a failure status.

4659	   If the resource is in the idle state and is able to successfully
4660	   process the supplied grammars, the server MUST return a success code
4661	   status and the request-state MUST be COMPLETE.

4663	   If the recognizer resource could not define the grammar for some
4664	   reason, for example if the download failed, the grammar failed to
4665	   compile, or the grammar was in an unsupported form, the MRCPv2
4666	   response for the DEFINE-GRAMMAR method MUST contain a failure status
4667	   code of 407, and contain a completion-cause header describing the
4668	   failure reason.

4670	   C->S:MRCP/2.0 589 DEFINE-GRAMMAR 543257
4671	   Channel-Identifier:32AECB23433801@speechrecog
4672	   Content-Type:application/srgs+xml
4673	   Content-ID:<request1@form-level.store>
4674	   Content-Length:...

4676	   <?xml version="1.0"?>

4678	   <!-- the default grammar language is US English -->
4679	   <grammar xmlns="http://www.w3.org/2001/06/grammar"
4680	            xml:lang="en-US" version="1.0">

4682	   <!-- single language attachment to tokens -->
4683	         <rule id="yes">
4684	               <one-of>
4685	                     <item xml:lang="fr-CA">oui</item>
4686	                     <item xml:lang="en-US">yes</item>
4687	               </one-of>
4688	         </rule>

4690	   <!-- single language attachment to a rule expansion -->
4691	         <rule id="request">
4692	               may I speak to
4693	               <one-of xml:lang="fr-CA">
4694	                     <item>Michel Tremblay</item>
4695	                     <item>Andre Roy</item>

4697	               </one-of>
4698	         </rule>

4700	         </grammar>

4702	   S->C:MRCP/2.0 73 543257 200 COMPLETE
4703	   Channel-Identifier:32AECB23433801@speechrecog
4704	   Completion-Cause:000 success

4706	   C->S:MRCP/2.0 334 DEFINE-GRAMMAR 543258
4707	   Channel-Identifier:32AECB23433801@speechrecog
4708	   Content-Type:application/srgs+xml
4709	   Content-ID:<helpgrammar@root-level.store>
4710	   Content-Length:...

4712	   <?xml version="1.0"?>

4714	   <!-- the default grammar language is US English -->
4715	   <grammar xmlns="http://www.w3.org/2001/06/grammar"
4716	            xml:lang="en-US" version="1.0">

4718	         <rule id="request">
4719	               I need help
4720	         </rule>

4722	   S->C:MRCP/2.0 73 543258 200 COMPLETE
4723	   Channel-Identifier:32AECB23433801@speechrecog
4724	   Completion-Cause:000 success

4726	   C->S:MRCP/2.0 723 DEFINE-GRAMMAR 543259
4727	   Channel-Identifier:32AECB23433801@speechrecog
4728	   Content-Type:application/srgs+xml
4729	   Content-ID:<request2@field-level.store>
4730	   Content-Length:...

4732	   <?xml version="1.0" encoding="UTF-8"?>

4734	   <!DOCTYPE grammar PUBLIC "-//W3C//DTD GRAMMAR 1.0//EN"
4735	                     "http://www.w3.org/TR/speech-grammar/grammar.dtd">

4737	   <grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en"
4738	   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
4739	          xsi:schemaLocation="http://www.w3.org/2001/06/grammar
4740	              http://www.w3.org/TR/speech-grammar/grammar.xsd"
4741	              version="1.0" mode="voice" root="basicCmd">

4743	   <meta name="author" content="Stephanie Williams"/>
4744	   <rule id="basicCmd" scope="public">
4745	     <example> please move the window </example>
4746	     <example> open a file </example>

4748	     <ruleref
4749	       uri="http://grammar.example.com/politeness.grxml#startPolite"/>

4751	     <ruleref uri="#command"/>
4752	     <ruleref
4753	       uri="http://grammar.example.com/politeness.grxml#endPolite"/>
4754	   </rule>

4756	   <rule id="command">
4757	     <ruleref uri="#action"/> <ruleref uri="#object"/>
4758	   </rule>

4760	   <rule id="action">
4761	      <one-of>
4762	         <item weight="10"> open   <tag>open</tag>   </item>
4763	         <item weight="2">  close  <tag>close</tag>  </item>
4764	         <item weight="1">  delete <tag>delete</tag> </item>
4765	         <item weight="1">  move   <tag>move</tag>   </item>
4766	      </one-of>
4767	   </rule>

4769	   <rule id="object">
4770	     <item repeat="0-1">
4771	       <one-of>
4772	         <item> the </item>
4773	         <item> a </item>
4774	       </one-of>
4775	     </item>

4777	     <one-of>
4778	         <item> window </item>
4779	         <item> file </item>
4780	         <item> menu </item>
4781	     </one-of>
4782	   </rule>

4784	   </grammar>

4786	   S->C:MRCP/2.0 69 543259 200 COMPLETE
4787	   Channel-Identifier:32AECB23433801@speechrecog
4788	   Completion-Cause:000 success

4790	   C->S:MRCP/2.0 155 RECOGNIZE 543260
4791	   Channel-Identifier:32AECB23433801@speechrecog
4792	   N-Best-List-Length:2
4793	   Content-Type:text/uri-list
4794	   Content-Length:...

4796	   session:request1@form-level.store
4797	   session:request2@field-level.store
4798	   session:helpgrammar@root-level.store

4800	   S->C:MRCP/2.0 48 543260 200 IN-PROGRESS
4801	   Channel-Identifier:32AECB23433801@speechrecog

4803	   S->C:MRCP/2.0 48 START-OF-INPUT 543260 IN-PROGRESS
4804	   Channel-Identifier:32AECB23433801@speechrecog

4806	   S->C:MRCP/2.0 486 RECOGNITION-COMPLETE 543260 COMPLETE
4807	   Channel-Identifier:32AECB23433801@speechrecog
4808	   Completion-Cause:000 success
4809	   Waveform-URI:<http://web.media.com/session123/audio.wav>;
4810	                size=124535;duration=2340
4811	   Content-Type:application/nlsml+xml
4812	   Content-Length:...

4814	   <?xml version="1.0"?>
4815	   <result xmlns="http://www.ietf.org/xml/ns/mrcpv2"
4816	           xmlns:ex="http://www.example.com/example"
4817	           grammar="session:request1@form-level.store">
4818	           <interpretation>
4819	               <instance name="Person">
4820	               <ex:Person>
4821	                   <ex:Name> Andre Roy </ex:Name>
4822	               </ex:Person>
4823	            </instance>
4824	            <input>   may I speak to Andre Roy </input>
4825	       </interpretation>
4826	   </result>

4828	                          Define Grammar Example

4830	9.9.  RECOGNIZE

4832	   The RECOGNIZE method from the client to the server requests the
4833	   recognizer to start recognition and provides it with one or more
4834	   grammar references for grammars to match against the input media.
4835	   The RECOGNIZE method can carry headers to control the sensitivity,
4836	   confidence level and the level of detail in results provided by the
4837	   recognizer.  These header values override the current values set by a
4838	   previous "SET-PARAMS" method.

4840	   The RECOGNIZE method can request the recognizer resource to operate
4841	   in normal or hotword mode as specified by the Recognition-Mode
4842	   header.  The default value is "normal".  If the resource could not
4843	   start a recognition, the server MUST respond with a failure status
4844	   code of 407 and a completion-cause header in the response describing
4845	   the cause of failure.

4847	   The RECOGNIZE request uses the message body to specify the grammars
4848	   applicable to the request.  The active grammar(s) for the request can
4849	   be specified in one of 3 ways.  If the client needs to explicitly
4850	   control grammar weights for the recognition operation, it must employ
4851	   method 3 below.  The order of these grammars specifies the precedence
4852	   of the grammars which is used when more than one grammar in the list
4853	   matches the speech; in this case, the grammar with the higher
4854	   precedence is returned as a match.  This precedence capability is
4855	   useful in applications like VoiceXML browsers to order grammars
4856	   specified at the dialog, document and root level of a VoiceXML
4857	   application.
4858	   1.  The grammar may be placed directly in the message body as typed
4859	       content.  If more than one grammar is included in the body, the
4860	       order of inclusion controls the corresponding precedence for the
4861	       grammars during recognition, with earlier grammars in the body
4862	       having a higher precedence than later ones.
4863	   2.  The body may contain a list of grammar URIs specified in content
4864	       of Media Type text/uri-list [RFC2483].  The order of the
4865	       URIs determines the corresponding precedence for the grammars
4866	       during recognition, with highest-precedence first and decreasing
4867	       for each URI thereafter.
4868	   3.  The body may contain a list of grammar URIs specified in content
4869	       of Media Type text/grammar-ref-list.  This type defines a list of
4870	       grammar URIs and allows each grammar URI to be assigned a weight
4871	       in the list.  This weight has the same meaning as the W3C grammar
4872	       weights.
4873	   In addition to performing recognition on the input, the recognizer
4874	   may also enroll the collected utterance in a personal grammar if the
4875	   Enroll-Utterance header is set to true and an Enrollment is active
4876	   (via an earlier execution of the START-PHRASE-ENROLLMENT method).  If
4877	   so, and if the RECOGNIZE request contains a Content-ID header, then
4878	   the resulting grammar (which includes the personal grammar as a sub-
4879	   grammar) can be referenced through the "session:" URI scheme (see
4880	   Section 13.6).
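
	   For the second body format listed above (text/uri-list), the
	   client only needs to emit the grammar URIs in precedence order.
	   The Python sketch below is illustrative: the function name
	   uri_list_body is hypothetical, and the CRLF line termination
	   follows the text/uri-list definition rather than anything stated
	   in this section.

```python
def uri_list_body(grammar_uris):
    """Build a text/uri-list body; earlier URIs have higher precedence."""
    uris = list(grammar_uris)
    if not uris:
        raise ValueError("at least one grammar URI is required")
    # One URI per line, each line terminated by CRLF.
    return "".join(uri + "\r\n" for uri in uris)


body = uri_list_body([
    "session:request1@form-level.store",   # highest precedence
    "session:request2@field-level.store",  # lower precedence
])
content_length = len(body.encode("utf-8"))  # Content-Length counts octets
print(repr(body), content_length)
```

	   Explicit grammar weights are not expressible this way; as noted
	   above, a client needing them has to use text/grammar-ref-list
	   instead.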

4882	   If the resource was able to successfully start the recognition, the
4883	   server MUST return a success code and a request-state of IN-PROGRESS.
4884	   This means that the recognizer is active and that the client MUST be
4885	   prepared to receive further events with this request-id.

4887	   If the resource was able to queue the request, the server MUST return
4888	   a success code and request-state of PENDING.  This means that the
4889	   recognizer is currently active with another request and that this
4890	   request has been queued for processing.

4892	   If the resource could not start a recognition, the server MUST
4893	   respond with a failure status code of 407 and a completion-cause
4894	   header in the response describing the cause of failure.

4896	   For the recognizer resource, RECOGNIZE is the only request that
4897	   returns a request-state of IN-PROGRESS, meaning that recognition is
4898	   in progress.  When the recognition completes by matching one of the
4899	   grammar alternatives or by a time-out without a match or for some
4900	   other reason, the recognizer resource MUST send the client a
4901	   RECOGNITION-COMPLETE event with the result of the recognition and a
4902	   request-state of COMPLETE.

4904	   Large grammars can take a long time for the server to compile.  For
4905	   grammars which are used repeatedly, the client can improve server
4906	   performance by issuing a DEFINE-GRAMMAR request with the grammar
4907	   ahead of time.  In such a case the client can issue the RECOGNIZE
4908	   request and reference the grammar through the "session:" URI scheme
4909	   (see Section 13.6).  This also applies in general if the client wants
4910	   to repeat recognition with a previous inline grammar.

4912	   The RECOGNIZE method implementation MUST do a fetch of all external
4913	   URIs that are part of that operation.  If caching is implemented,
4914	   this URI fetching MUST conform to the cache control hints and
4915	   parameter headers associated with the method in deciding whether it
4916	   should be fetched from cache or from the external server.  If these
4917	   hints/parameters are not specified in the method, the values set for
4918	   the session using SET-PARAMS/GET-PARAMS apply.  If they were not set
4919	   for the session, their default values apply.
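
   The lookup order described above (request headers, then session
   values, then defaults) can be sketched as follows.  This is only an
   illustration: the function name, the dictionaries, and the default
   value shown are assumptions and are not defined by this document.

```python
def resolve_fetch_param(name, request_headers, session_params, defaults):
    """Pick a value: request header first, then session, then default."""
    for source in (request_headers, session_params, defaults):
        if name in source:
            return source[name]
    raise KeyError(name)


# Hypothetical values, for illustration only.
DEFAULTS = {"Fetch-Timeout": 10000}
session = {"Fetch-Timeout": 5000}

from_request = resolve_fetch_param(
    "Fetch-Timeout", {"Fetch-Timeout": 20}, session, DEFAULTS)
from_session = resolve_fetch_param("Fetch-Timeout", {}, session, DEFAULTS)
from_default = resolve_fetch_param("Fetch-Timeout", {}, {}, DEFAULTS)
```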

4921	   Note that since the audio and the messages are carried over separate
4922	   communication paths there may be a race condition between the start
4923	   of the flow of audio and the receipt of the RECOGNIZE method.  For
4924	   example, if an audio flow is started by the client at the same time
4925	   as the RECOGNIZE method is sent, either the audio or the RECOGNIZE
4926	   can arrive at the recognizer first.  As another example, the client
4927	   may choose to continuously send audio to the Server and signal the
4928	   Server to recognize using the RECOGNIZE method.  A number of
4929	   mechanisms exist to resolve this condition and the mechanism chosen
4930	   is left to the implementer of the recognition resource.  The
4931	   recognizer can expect the media to start flowing when it receives the
4932	   recognize request, but MUST NOT buffer anything it receives
4933	   beforehand.

4935	   When a RECOGNIZE method has been received, the recognition is
4936	   initiated on the stream.  The No-Input-Timer MUST be started at this
4937	   time if the Start-Input-Timers header is specified as "true".  If
4938	   this header is set to "false", the No-Input-Timer MUST be started
4939	   when the recognizer receives the START-INPUT-TIMERS method from the
4940	   client.  The Recognition-Timer MUST be started when the recognition
4941	   resource detects speech or a DTMF digit in the media stream.
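
   A minimal sketch of these timer start rules is given below.  The
   class and method names are hypothetical, and real timers are
   reduced to boolean "running" flags for brevity.

```python
class InputTimers:
    """Tracks which recognizer timers have been started."""

    def __init__(self):
        self.no_input_running = False
        self.recognition_running = False

    def on_recognize(self, start_input_timers):
        # Start-Input-Timers "true": No-Input-Timer starts with RECOGNIZE.
        # Start-Input-Timers "false": it waits for START-INPUT-TIMERS.
        self.no_input_running = bool(start_input_timers)

    def on_start_input_timers(self):
        # The client sent the START-INPUT-TIMERS method.
        self.no_input_running = True

    def on_input_detected(self):
        # Speech or a DTMF digit was detected in the media stream.
        self.recognition_running = True


timers = InputTimers()
timers.on_recognize(start_input_timers=False)
deferred = timers.no_input_running      # still False at this point
timers.on_start_input_timers()
started = timers.no_input_running       # now True
```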

4943	   Non-Hotword mode recognition:

4945	   When the recognition resource detects speech or a DTMF digit in the
4946	   media stream it MUST send the START-OF-INPUT event.  When enough
4947	   speech has been collected for the server to process, the recognizer
4948	   can try to match the collected speech with the active grammars.  If
4949	   the speech collected at this point fully matches with any of the
4950	   active grammars, the Speech-Complete-Timer is started.  If it matches
4951	   partially with one or more of the active grammars, with more speech
4952	   needed before a full match is achieved, then the Speech-Incomplete-
4953	   Timer is started.

4955	   1.  When the No-Input-Timer expires, the recognizer MUST complete
4956	   with a Completion-Cause code of "no-input-timeout".

4958	   2.  The recognizer MUST support detecting a no-match condition upon
4959	   detecting end of speech.  The recognizer MAY support detecting a no-
4960	   match condition before waiting for end-of-speech.  If this is
4961	   supported, this capability is enabled by setting the "Early-No-Match"
4962	   header to "true".  Upon detecting a no-match condition the RECOGNIZE
4963	   MUST return with "no-match".

4965	   3.  When the Speech-Incomplete-Timer expires the recognizer SHOULD
4966	   complete with a Completion-Cause code of "partial-match", unless the
4967	   recognizer cannot differentiate a partial-match in which case it MUST
4968	   return a Completion-Cause code of "no-match".  The recognizer MAY
4969	   return results for the partially matched grammar.

4971	   4.  When the Speech-Complete-Timer expires the recognizer MUST
4972	   complete with a Completion-Cause code of "success".

4974	   5.  When the Recognition-Timer expires one of the following MUST
4975	   happen:

4977	   5.1 If there was a partial-match the recognizer SHOULD complete with
4978	   a Completion-Cause code of "partial-match-maxtime", unless the
4979	   recognizer cannot differentiate a partial-match in which case it MUST
4980	   complete with a Completion-Cause code of "no-match-maxtime".  The
4981	   recognizer MAY return results for the partially matched grammar.

4983	   5.2 If there was a full-match the recognizer MUST complete with a
4984	   Completion-Cause code of "success-maxtime".

4986	   5.3 If there was no match the recognizer MUST complete with a
4987	   Completion-Cause code of "no-match-maxtime".
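
   The non-Hotword rules above can be summarized as a mapping from the
   expired timer and the current match state to a Completion-Cause
   name.  The sketch below is illustrative only: the function and
   parameter names are hypothetical, and where the text says SHOULD
   the sketch simply follows the recommended behavior.

```python
def completion_cause(timer, match="none", can_tell_partial=True):
    """Map an expired timer to a Completion-Cause name.

    timer: name of the timer that expired.
    match: 'full', 'partial' or 'none' at the time of expiry.
    can_tell_partial: False if the recognizer cannot differentiate
        a partial match from no match.
    """
    if timer == "No-Input-Timer":
        return "no-input-timeout"
    if timer == "Speech-Complete-Timer":
        return "success"
    if timer == "Speech-Incomplete-Timer":
        return "partial-match" if can_tell_partial else "no-match"
    if timer == "Recognition-Timer":
        if match == "full":
            return "success-maxtime"
        if match == "partial" and can_tell_partial:
            return "partial-match-maxtime"
        return "no-match-maxtime"
    raise ValueError("unknown timer: " + timer)
```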

4989	   Hotword mode recognition:

4991	   Note that for Hotword mode recognition the START-OF-INPUT event is
4992	   not generated when speech or a DTMF digit is detected.

4994	   1.  When the No-Input-Timer expires, the recognizer MUST complete
4995	   with a Completion-Cause code of "no-input-timeout".

4997	   2.  When there is a match at any time, the RECOGNIZE completes with a
4998	   Completion-Cause code of "success".

5000	   3.  When the Recognition-Timer expires and there is not a match, the
5001	   RECOGNIZE MUST complete with a Completion-Cause code of "hotword-
5002	   maxtime".

5004	   4.  When the Recognition-Timer expires and there is a match, the
5005	   RECOGNIZE MUST complete with a Completion-Cause code of "success-
5006	   maxtime".

5008	   5.  When the Recognition-Timer is running but the detected speech/
5009	   DTMF has not resulted in a match, the Recognition-Timer MUST be
5010	   stopped and reset.  It MUST then be restarted when speech/DTMF is
5011	   again detected.
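
   The Hotword rules above can be sketched in the same way.  The
   function name and the event labels are hypothetical.  A return
   value of None means the request stays IN-PROGRESS; in that case
   the caller is expected to stop and reset the Recognition-Timer
   and restart it when speech/DTMF is next detected (rule 5).

```python
def hotword_completion_cause(event, matched):
    """Return a Completion-Cause name, or None to keep listening.

    event: 'no-input-timer', 'recognition-timer' or 'input'.
    matched: True if the detected input matched the grammar.
    """
    if event == "no-input-timer":
        return "no-input-timeout"
    if event == "recognition-timer":
        return "success-maxtime" if matched else "hotword-maxtime"
    if event == "input":
        # A match at any time completes the request; otherwise the
        # Recognition-Timer is reset and recognition continues.
        return "success" if matched else None
    raise ValueError("unknown event: " + event)
```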

5013	   C->S:MRCP/2.0 479 RECOGNIZE 543257
5014	   Channel-Identifier:32AECB23433801@speechrecog
5015	   Confidence-Threshold:0.9
5016	   Content-Type:application/srgs+xml
5017	   Content-ID:<request1@form-level.store>
5018	   Content-Length:...

5020	   <?xml version="1.0"?>

5022	   <!-- the default grammar language is US English -->
5023	   <grammar xmlns="http://www.w3.org/2001/06/grammar"
5024	            xml:lang="en-US" version="1.0" root="request">

5026	   <!-- single language attachment to tokens -->
5027	       <rule id="yes">
5028	               <one-of>
5029	                     <item xml:lang="fr-CA">oui</item>
5030	                     <item xml:lang="en-US">yes</item>
5031	               </one-of>

5033	         </rule>

5035	   <!-- single language attachment to a rule expansion -->
5036	         <rule id="request">
5037	               may I speak to
5038	               <one-of xml:lang="fr-CA">
5039	                     <item>Michel Tremblay</item>
5040	                     <item>Andre Roy</item>
5041	               </one-of>
5042	         </rule>

5044	     </grammar>

5046	   S->C: MRCP/2.0 48 543257 200 IN-PROGRESS
5047	   Channel-Identifier:32AECB23433801@speechrecog

5049	   S->C:MRCP/2.0 49 START-OF-INPUT 543257 IN-PROGRESS
5050	   Channel-Identifier:32AECB23433801@speechrecog

5052	   S->C:MRCP/2.0 467 RECOGNITION-COMPLETE 543257 COMPLETE
5053	   Channel-Identifier:32AECB23433801@speechrecog
5054	   Completion-Cause:000 success
5055	   Waveform-URI:<http://web.media.com/session123/audio.wav>;
5056	                 size=424252;duration=2543
5057	   Content-Type:application/nlsml+xml
5058	   Content-Length:...

5060	   <?xml version="1.0"?>
5061	   <result xmlns="http://www.ietf.org/xml/ns/mrcpv2"
5062	           xmlns:ex="http://www.example.com/example"
5063	           grammar="session:request1@form-level.store">
5064	       <interpretation>
5065	           <instance name="Person">
5066	               <ex:Person>
5067	                   <ex:Name> Andre Roy </ex:Name>
5068	               </ex:Person>
5069	           </instance>
5070	               <input>   may I speak to Andre Roy </input>
5071	       </interpretation>
5072	   </result>

5074	                             RECOGNIZE Example

5076	   C->S:   MRCP/2.0 479 RECOGNIZE 543257
5077	           Channel-Identifier:32AECB23433801@speechrecog
5078	           Confidence-Threshold:0.9
5079	           Fetch-Timeout:20
5080	           Content-Type:application/srgs+xml
5081	           Content-Length:...

5083	           <?xml version="1.0"?>
5084	           <grammar xmlns="http://www.w3.org/2001/06/grammar"
5085	                    xml:lang="en-US" version="1.0" mode="voice"
5086	                    root="rule_list">
5087	            <rule id="rule_list" scope="public">
5088	                <one-of>
5089	                    <item weight="10"><ruleref uri=
5090	               "http://grammar.example.com/world-cities.grxml#canada"/>
5091	                    </item>
5092	                    <item weight="1.5"><ruleref uri=
5093	               "http://grammar.example.com/world-cities.grxml#america"/>
5094	                    </item>
5095	                    <item weight="0.5"><ruleref uri=
5096	               "http://grammar.example.com/world-cities.grxml#india"/>
5097	                    </item>
5098	                </one-of>
5099	            </rule>
5100	           </grammar>

5102	                         Second RECOGNIZE Example

5104	9.10.  STOP

5106	   The "STOP" method from the client to the server tells the resource to
5107	   stop recognition if a request is active.  If a RECOGNIZE request is
5108	   active and the "STOP" request successfully terminated it, then the
5109	   response contains an Active-Request-Id-List header containing the
5110	   request-id of the RECOGNIZE request that was terminated.  In this
5111	   case, no RECOGNITION-COMPLETE event is sent for the terminated
5112	   request.  If there was no recognition active, then the response MUST
5113	   NOT contain an Active-Request-Id-List header.  Either way, the
5114	   response MUST contain a status of 200 (Success).

5116	   C->S:   MRCP/2.0 573 RECOGNIZE 543257
5117	           Channel-Identifier:32AECB23433801@speechrecog
5118	           Confidence-Threshold:0.9
5119	           Content-Type:application/srgs+xml
5120	           Content-ID:<request1@form-level.store>
5121	           Content-Length:...

5123	           <?xml version="1.0"?>

5125	           <!-- the default grammar language is US English -->
5126	           <grammar xmlns="http://www.w3.org/2001/06/grammar"
5127	                    xml:lang="en-US" version="1.0" root="request">

5129	           <!-- single language attachment to tokens -->
5130	               <rule id="yes">
5131	                   <one-of>
5132	                         <item xml:lang="fr-CA">oui</item>
5133	                         <item xml:lang="en-US">yes</item>
5134	                   </one-of>
5135	               </rule>

5137	           <!-- single language attachment to a rule expansion -->
5138	               <rule id="request">
5139	               may I speak to
5140	                   <one-of xml:lang="fr-CA">
5141	                         <item>Michel Tremblay</item>
5142	                         <item>Andre Roy</item>
5143	                   </one-of>
5144	               </rule>
5145	           </grammar>

5147	   S->C:   MRCP/2.0 47 543257 200 IN-PROGRESS
5148	           Channel-Identifier:32AECB23433801@speechrecog

5150	   C->S:   MRCP/2.0 28 STOP 543258
5151	           Channel-Identifier:32AECB23433801@speechrecog

5153	   S->C:   MRCP/2.0 67 543258 200 COMPLETE
5154	           Channel-Identifier:32AECB23433801@speechrecog
5155	           Active-Request-Id-List:543257
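
   A server-side sketch of the STOP response rule follows: the status
   is always 200, and the Active-Request-Id-List header appears only
   when a RECOGNIZE request was actually terminated.  The function
   name and the tuple it returns are hypothetical conveniences, not
   part of the protocol.

```python
def stop_response(channel_id, active_request_id=None):
    """Build (status, request_state, headers) for a STOP response."""
    headers = {"Channel-Identifier": channel_id}
    if active_request_id is not None:
        # A RECOGNIZE request was terminated: report its request-id.
        headers["Active-Request-Id-List"] = str(active_request_id)
    # Either way the status is 200 (Success).
    return 200, "COMPLETE", headers


terminated = stop_response("32AECB23433801@speechrecog", 543257)
idle = stop_response("32AECB23433801@speechrecog")
```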

5157	9.11.  GET-RESULT

5159	   The GET-RESULT method from the client to the server may be issued
5160	   when the recognizer resource is in the recognized state.  This
5161	   request allows the client to retrieve results for a completed
5162	   recognition.  This is useful if the client decides it wants more
5163	   alternatives or more information.  When the server receives this
5164	   request it re-computes and returns the results according to the
5165	   recognition constraints provided in the GET-RESULT request.

5167	   The GET-RESULT request can specify constraints such as a different
5168	   confidence-threshold, or n-best-list-length.  This capability is
5169	   optional for MRCPv2 servers and the automatic speech recognition
5170	   engine in the server MAY return a status of unsupported feature.

5172	   C->S:   MRCP/2.0 73 GET-RESULT 543257
5173	           Channel-Identifier:32AECB23433801@speechrecog
5174	           Confidence-Threshold:0.9

5176	   S->C:   MRCP/2.0 487 543257 200 COMPLETE
5177	           Channel-Identifier:32AECB23433801@speechrecog
5178	           Content-Type:application/nlsml+xml
5179	           Content-Length:...

5181	           <?xml version="1.0"?>
5182	           <result xmlns="http://www.ietf.org/xml/ns/mrcpv2"
5183	                   xmlns:ex="http://www.example.com/example"
5184	                   grammar="session:request1@form-level.store">
5185	               <interpretation>
5186	                   <instance name="Person">
5187	                       <ex:Person>
5188	                           <ex:Name> Andre Roy </ex:Name>
5189	                       </ex:Person>
5190	                   </instance>
5191	                   <input>   may I speak to Andre Roy </input>
5192	               </interpretation>
5193	           </result>

5195	9.12.  START-OF-INPUT

5197	   This is an event from the server to the client indicating that the
5198	   recognition resource has detected speech or a DTMF digit in the media
5199	   stream.  This event is useful in implementing kill-on-barge-in
5200	   scenarios when a synthesizer resource is in a different session from
5201	   the recognizer resource and hence is not aware of an incoming audio
5202	   source.  In these cases, it is up to the client to act as an
5203	   intermediary and respond to this event by issuing a BARGE-IN-OCCURRED
5204	   method to the synthesizer resource.  The recognizer resource also
5205	   MUST send a Proxy-Sync-Id header with a unique value for this event.

5207	   This event MUST be generated by the server irrespective of whether
5208	   the synthesizer and recognizer are on the same server or not.

5210	9.13.  START-INPUT-TIMERS

5212	   This request is sent from the client to the recognition resource when
5213	   it knows that a kill-on-barge-in prompt has finished playing.  This
5214	   is useful in the scenario when the recognition and synthesizer
5215	   engines are not in the same session.  When a kill-on-barge-in prompt
5216	   is being played, the client may want a RECOGNIZE request to be
5217	   simultaneously active so that it can detect and implement kill on
5218	   barge-in.  But at the same time the client doesn't want the
5219	   recognizer to start the no-input timers until the prompt is finished.
5220	   The Start-Input-Timers header in the RECOGNIZE request allows the
5221	   client to say whether the timers should be started immediately or
5222	   not.  If not, the recognizer resource MUST NOT start the timers until
5223	   the client sends a START-INPUT-TIMERS method to the recognizer.

5225	9.14.  RECOGNITION-COMPLETE

5227	   This is an Event from the recognizer resource to the client
5228	   indicating that the recognition completed.  The recognition result is
5229	   sent in the body of the MRCPv2 message.  The request-state field MUST
5230	   be COMPLETE indicating that this is the last event with that
5231	   request-id, and that the request with that request-id is now
5232	   complete.  The server MUST maintain the recognizer context containing
5233	   the results and the audio waveform input of that recognition until
5234	   the next RECOGNIZE request is issued for that resource or the session
5235	   terminates.  A URI to the audio waveform MAY be returned to the
5236	   client in a Waveform-URI header in the RECOGNITION-COMPLETE event.
5237	   The client can use this URI to retrieve or play back the audio.

5239	   Note that if an enrollment session was active, the RECOGNITION-
5240	   COMPLETE event can contain either recognition or enrollment results
5241	   depending on what was spoken.

5243	   C->S:   MRCP/2.0 487 RECOGNIZE 543257
5244	           Channel-Identifier:32AECB23433801@speechrecog
5245	           Confidence-Threshold:0.9
5246	           Content-Type:application/srgs+xml
5247	           Content-ID:<request1@form-level.store>
5248	           Content-Length:...

5250	           <?xml version="1.0"?>

5252	           <!-- the default grammar language is US English -->
5253	           <grammar xmlns="http://www.w3.org/2001/06/grammar"
5254	                    xml:lang="en-US" version="1.0" root="request">

5256	           <!-- single language attachment to tokens -->
5257	               <rule id="yes">
5258	                      <one-of>
5259	                          <item xml:lang="fr-CA">oui</item>
5260	                          <item xml:lang="en-US">yes</item>
5261	                      </one-of>
5262	                 </rule>

5264	           <!-- single language attachment to a rule expansion -->
5265	                 <rule id="request">
5266	                     may I speak to
5267	                      <one-of xml:lang="fr-CA">
5268	                             <item>Michel Tremblay</item>
5269	                             <item>Andre Roy</item>
5270	                      </one-of>
5271	                 </rule>
5272	           </grammar>

5274	   S->C:   MRCP/2.0 48 543257 200 IN-PROGRESS
5275	           Channel-Identifier:32AECB23433801@speechrecog

5277	   S->C:   MRCP/2.0 49 START-OF-INPUT 543257 IN-PROGRESS
5278	           Channel-Identifier:32AECB23433801@speechrecog

5280	   S->C:   MRCP/2.0 465 RECOGNITION-COMPLETE 543257 COMPLETE
5281	           Channel-Identifier:32AECB23433801@speechrecog
5282	           Completion-Cause:000 success
5283	           Waveform-URI:<http://web.media.com/session123/audio.wav>;
5284	                        size=342456;duration=25435
5285	           Content-Type:application/nlsml+xml
5286	           Content-Length:...

5288	           <?xml version="1.0"?>
5289	           <result xmlns="http://www.ietf.org/xml/ns/mrcpv2"
5290	                   xmlns:ex="http://www.example.com/example"
5291	                   grammar="session:request1@form-level.store">
5292	               <interpretation>
5293	                   <instance name="Person">
5294	                       <ex:Person>
5295	                           <ex:Name> Andre Roy </ex:Name>
5296	                       </ex:Person>
5297	                   </instance>
5298	                   <input>   may I speak to Andre Roy </input>
5299	               </interpretation>
5300	           </result>

5302	   S->C:   MRCP/2.0 465 RECOGNITION-COMPLETE 543257 COMPLETE
5303	           Channel-Identifier:32AECB23433801@speechrecog
5304	           Completion-Cause:000 success
5305	           Content-Type:application/nlsml+xml
5306	           Content-Length:...

5308	           <?xml version="1.0"?>
5309	           <result xmlns="http://www.ietf.org/xml/ns/mrcpv2"
5310	                   grammar="Personal-Grammar-URI">
5311	               <enrollment-result>
5312	                   <num-clashes> 2 </num-clashes>
5313	                   <num-good-repetitions> 1 </num-good-repetitions>
5314	                   <num-repetitions-still-needed>
5315	                      1
5316	                   </num-repetitions-still-needed>
5317	                   <consistency-status> consistent </consistency-status>
5318	                   <clash-phrase-ids>
5319	                       <item> Jeff </item> <item> Andre </item>
5320	                   </clash-phrase-ids>
5321	                   <transcriptions>
5322	                        <item> m ay b r ow k er </item>
5323	                        <item> m ax r aa k ah </item>
5324	                   </transcriptions>
5325	                   <confusable-phrases>
5326	                        <item>
5327	                             <phrase> call </phrase>
5328	                             <confusion-level> 10 </confusion-level>
5329	                        </item>
5330	                   </confusable-phrases>
5331	               </enrollment-result>
5332	           </result>

5334	9.15.  START-PHRASE-ENROLLMENT

5336	   The START-PHRASE-ENROLLMENT method from the client to the server
5337	   starts a new phrase enrollment session during which the client may
5338	   call RECOGNIZE multiple times to enroll a new utterance in a grammar.
5339	   An enrollment session consists of a set of calls to RECOGNIZE in
5340	   which the caller speaks a phrase several times so the system can
5341	   "learn" it.  The phrase is then added to a personal grammar (speaker-
5342	   trained grammar), so that the system can recognize it later.

5344	   Only one phrase enrollment session may be active at a time for a
5345	   resource.  The Personal-Grammar-URI identifies the grammar that is
5346	   used during enrollment to store the personal list of phrases.  Once
5347	   RECOGNIZE is called, the result is returned in a RECOGNITION-COMPLETE
5348	   event and may contain either an enrollment result OR a recognition
5349	   result for a regular recognition.

5351	   Calling END-PHRASE-ENROLLMENT ends the ongoing phrase enrollment
5352	   session, which is typically done after a sequence of successful calls
5353	   to RECOGNIZE.  This method can be called to commit the new phrase to
5354	   the personal grammar or to abort the phrase enrollment session.

5356	   The Personal-Grammar-URI, which specifies the grammar to contain the
5357	   new enrolled phrase, is created if it does not exist.  Also, the
5358	   personal grammar may ONLY contain phrases added via a phrase
5359	   enrollment session.

5361	   The Phrase-ID passed to this method is used to identify this phrase
5362	   in the grammar and will be returned as the speech input when doing a
5363	   RECOGNIZE on the grammar.  The Phrase-NL similarly is returned in a
5364	   RECOGNITION-COMPLETE event in the same manner as other NL in a
5365	   grammar.  The tag-format of this NL is implementation specific.

5367	   If the client has specified Save-Best-Waveform as true, then the
5368	   response after ending the phrase enrollment session MUST contain the
5369	   location/URI of a recording of the best repetition of the learned
5370	   phrase.

5372	   C->S:     MRCP/2.0 123 START-PHRASE-ENROLLMENT 543258
5373	           Channel-Identifier:32AECB23433801@speechrecog
5374	           Num-Min-Consistent-Pronunciations:2
5375	           Consistency-Threshold:30
5376	           Clash-Threshold:12
5377	           Personal-Grammar-URI:<personal grammar uri>
5378	           Phrase-Id:<phrase id>
5379	           Phrase-NL:<NL phrase>
5380	           Weight:1
5381	           Save-Best-Waveform:true

5383	   S->C:    MRCP/2.0 49 543258 200 COMPLETE
5384	           Channel-Identifier:32AECB23433801@speechrecog

5386	9.16.  ENROLLMENT-ROLLBACK

5388	   The ENROLLMENT-ROLLBACK method discards the last live utterance from
5389	   the RECOGNIZE operation.  The client can invoke this method when the
5390	   caller provides undesirable input such as non-speech noises, side-
5391	   speech, commands, utterances from the RECOGNIZE grammar, etc.  Note
5392	   that this method does not provide a stack of rollback states.
5393	   Executing ENROLLMENT-ROLLBACK twice in succession without an
5394	   intervening recognition operation has no effect the second time.

5396	   C->S:   MRCP/2.0 49 ENROLLMENT-ROLLBACK 543261
5397	           Channel-Identifier:32AECB23433801@speechrecog

5399	   S->C:    MRCP/2.0 49 543261 200 COMPLETE
5400	           Channel-Identifier:32AECB23433801@speechrecog
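
   The single-level rollback behavior can be modeled as below.  This
   is a sketch under assumed names (EnrollmentSession, add_utterance,
   rollback); it only shows that a second rollback without an
   intervening recognition changes nothing.

```python
class EnrollmentSession:
    """Keeps collected utterances with one level of rollback."""

    def __init__(self):
        self.utterances = []
        self._can_rollback = False

    def add_utterance(self, utterance):
        # Called after each RECOGNIZE that collected an utterance.
        self.utterances.append(utterance)
        self._can_rollback = True

    def rollback(self):
        # Only the last live utterance can be discarded; there is no
        # stack of rollback states.
        if self._can_rollback:
            self.utterances.pop()
            self._can_rollback = False


session = EnrollmentSession()
session.add_utterance("first take")
session.add_utterance("cough")
session.rollback()   # discards "cough"
session.rollback()   # no effect the second time
```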

5402	9.17.  END-PHRASE-ENROLLMENT

5404	   The END-PHRASE-ENROLLMENT method may be called ONLY during an active
5405	   phrase enrollment session.  It MUST NOT be called during an ongoing
5406	   RECOGNIZE operation.  To commit the new phrase in the grammar, the
5407	   client MAY call this method once successive calls to RECOGNIZE have
5408	   succeeded and Num-Repetitions-Still-Needed has been returned as 0 in
5409	   the RECOGNITION-COMPLETE event.  Alternatively, the client can abort
5410	   the phrase enrollment session by calling this method with the Abort-
5411	   Phrase-Enrollment header.

5413	   If the client has specified Save-Best-Waveform as true in the START-
5414	   PHRASE-ENROLLMENT request, then the response MUST contain the
5415	   location/URI of a recording of the best repetition of the learned
5416	   phrase.

5418	  C->S:     MRCP/2.0 49 END-PHRASE-ENROLLMENT 543262
5419	          Channel-Identifier:32AECB23433801@speechrecog

5421	  S->C:    MRCP/2.0 123 543262 200 COMPLETE
5422	          Channel-Identifier:32AECB23433801@speechrecog
5423	          Waveform-URI:<http://mediaserver.com/recordings/file1324.wav>;
5424	                       size=242453;duration=25432
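
   On the client side, deciding when to commit can be sketched by
   reading num-repetitions-still-needed from the enrollment result of
   the last RECOGNITION-COMPLETE event.  The helper names and the
   trimmed sample body are hypothetical; the namespace and element
   names are taken from the enrollment result example in this
   document.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://www.ietf.org/xml/ns/mrcpv2"}


def repetitions_still_needed(nlsml):
    """Extract num-repetitions-still-needed from an enrollment result."""
    root = ET.fromstring(nlsml)
    node = root.find(
        "m:enrollment-result/m:num-repetitions-still-needed", NS)
    if node is None or node.text is None:
        raise ValueError("no enrollment result in body")
    return int(node.text.strip())


def ready_to_commit(nlsml):
    # The client may commit once no further repetitions are needed.
    return repetitions_still_needed(nlsml) == 0


# Trimmed, hypothetical result body for illustration.
sample = (
    '<result xmlns="http://www.ietf.org/xml/ns/mrcpv2" '
    'grammar="Personal-Grammar-URI">'
    "<enrollment-result>"
    "<num-repetitions-still-needed> 1 </num-repetitions-still-needed>"
    "</enrollment-result></result>"
)
needed = repetitions_still_needed(sample)
```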

5426	9.18.  MODIFY-PHRASE

5428	   The MODIFY-PHRASE method sent from the client to the server is used
5429	   to change the phrase ID, NL phrase and/or weight for a given phrase
5430	   in a personal grammar.

5432	   If no fields are supplied then calling this method has no effect.

5434	   C->S:     MRCP/2.0 123 MODIFY-PHRASE 543265
5435	           Channel-Identifier:32AECB23433801@speechrecog
5436	           Personal-Grammar-URI:<personal grammar uri>
5437	           Phrase-Id:<phrase id>
5438	           New-Phrase-Id:<new phrase id>
5439	           Phrase-NL:<NL phrase>
5440	           Weight:1

5442	   S->C:    MRCP/2.0 49 543265 200 COMPLETE
5443	           Channel-Identifier:32AECB23433801@speechrecog

5445	9.19.  DELETE-PHRASE

5447	   The DELETE-PHRASE method sent from the client to the server is used
5448	   to delete a phrase in a personal grammar added through voice
5449	   enrollment or text enrollment.  If the specified phrase does not
5450	   exist, this method has no effect.

5452	   C->S:     MRCP/2.0 123 DELETE-PHRASE 543266
5453	           Channel-Identifier:32AECB23433801@speechrecog
5454	           Personal-Grammar-URI:<personal grammar uri>
5455	           Phrase-Id:<phrase id>

5457	   S->C:    MRCP/2.0 49 543266 200 COMPLETE
5458	           Channel-Identifier:32AECB23433801@speechrecog

5460	9.20.  INTERPRET

5462	   The INTERPRET method from the client to the server takes as input an
5463	   interpret-text header containing the text for which the semantic
5464	   interpretation is desired, and returns, via the INTERPRETATION-
5465	   COMPLETE event, an interpretation result which is very similar to the
5466	   one returned from a RECOGNIZE method invocation.  Only portions of
5467	   the result relevant to acoustic matching are excluded from the
5468	   result.  The interpret-text header MUST be included in the INTERPRET
5469	   request.

5471	   Recognizer grammar data is treated in the same way as it is when
5472	   issuing a RECOGNIZE method call.

5474	   If a RECOGNIZE, RECORD or another INTERPRET operation is already in
5475	   progress for the resource, the server MUST reject the request with a
5476	   response having a status code of 402, "Method not valid in this
5477	   state", and a COMPLETE request state.

5479	   C->S:    MRCP/2.0 123 INTERPRET 543266
5480	           Channel-Identifier:32AECB23433801@speechrecog
5481	           Interpret-Text:may I speak to Andre Roy
5482	           Content-Type:application/srgs+xml
5483	           Content-ID:<request1@form-level.store>
5484	           Content-Length:...

5486	           <?xml version="1.0"?>
5487	           <!-- the default grammar language is US English -->
5488	           <grammar xmlns="http://www.w3.org/2001/06/grammar"
5489	                    xml:lang="en-US" version="1.0" root="request">
5490	           <!-- single language attachment to tokens -->
5491	               <rule id="yes">
5492	                   <one-of>
5493	                       <item xml:lang="fr-CA">oui</item>
5494	                       <item xml:lang="en-US">yes</item>
5495	                   </one-of>
5496	               </rule>

5498	           <!-- single language attachment to a rule expansion -->
5499	               <rule id="request">
5500	                   may I speak to
5501	                   <one-of xml:lang="fr-CA">
5502	                       <item>Michel Tremblay</item>
5503	                       <item>Andre Roy</item>
5504	                   </one-of>
5505	               </rule>
5506	           </grammar>

5508	   S->C:    MRCP/2.0 49 543266 200 IN-PROGRESS
5509	           Channel-Identifier:32AECB23433801@speechrecog

5511	   S->C:    MRCP/2.0 49 INTERPRETATION-COMPLETE 543266 COMPLETE
5512	           Channel-Identifier:32AECB23433801@speechrecog
5513	           Completion-Cause:000 success
5514	           Content-Type:application/nlsml+xml
5515	           Content-Length:...

5517	           <?xml version="1.0"?>
5518	           <result xmlns="http://www.ietf.org/xml/ns/mrcpv2"
5519	                   xmlns:ex="http://www.example.com/example"
5520	                   grammar="session:request1@form-level.store">
5521	               <interpretation>
5522	                   <instance name="Person">
5523	                       <ex:Person>
5524	                           <ex:Name> Andre Roy </ex:Name>
5525	                       </ex:Person>
5526	                   </instance>
5527	                   <input>   may I speak to Andre Roy </input>
5528	               </interpretation>
5529	           </result>

5531	9.21.  INTERPRETATION-COMPLETE

5533	   This event from the recognition resource to the client indicates that
5534	   the INTERPRET operation is complete.  The interpretation result is
5535	   sent in the body of the MRCP message.  The request state MUST be set
5536	   to COMPLETE.

5538	   The completion-cause header MUST be included in this event and MUST
5539	   be set to an appropriate value from the list of cause codes.

5541	   C->S:    MRCP/2.0 123 INTERPRET 543266
5542	           Channel-Identifier:32AECB23433801@speechrecog
5543	           Interpret-Text:may I speak to Andre Roy
5544	           Content-Type:application/srgs+xml
5545	           Content-ID:<request1@form-level.store>
5546	           Content-Length:...

5548	           <?xml version="1.0"?>
5549	           <!-- the default grammar language is US English -->
5550	           <grammar xmlns="http://www.w3.org/2001/06/grammar"
5551	                    xml:lang="en-US" version="1.0" root="request">
5552	           <!-- single language attachment to tokens -->
5553	               <rule id="yes">
5554	                   <one-of>
5555	                       <item xml:lang="fr-CA">oui</item>
5556	                       <item xml:lang="en-US">yes</item>
5557	                   </one-of>
5558	               </rule>

5560	           <!-- single language attachment to a rule expansion -->
5561	               <rule id="request">
5562	                   may I speak to
5563	                   <one-of xml:lang="fr-CA">
5564	                       <item>Michel Tremblay</item>
5565	                       <item>Andre Roy</item>
5566	                   </one-of>
5567	               </rule>
5568	           </grammar>

5570	   S->C:    MRCP/2.0 49 543266 200 IN-PROGRESS
5571	           Channel-Identifier:32AECB23433801@speechrecog

5573	   S->C:    MRCP/2.0 49 INTERPRETATION-COMPLETE 543266 COMPLETE
5574	           Channel-Identifier:32AECB23433801@speechrecog
5575	           Completion-Cause:000 success
5576	           Content-Type:application/nlsml+xml
5577	           Content-Length:...

5579	           <?xml version="1.0"?>
5580	           <result xmlns="http://www.ietf.org/xml/ns/mrcpv2"
5581	                   xmlns:ex="http://www.example.com/example"
5582	                   grammar="session:request1@form-level.store">
5583	               <interpretation>
5584	                   <instance name="Person">
5585	                       <ex:Person>
5586	                           <ex:Name> Andre Roy </ex:Name>

5588	                       </ex:Person>
5589	                   </instance>
5590	                   <input>   may I speak to Andre Roy </input>
5591	               </interpretation>
5592	           </result>

5594	9.22.  DTMF Detection

5596	   Digits received as DTMF tones are delivered to the recognition
5597	   resource in the MRCPv2 server in the RTP stream according to RFC4733
5598	   [RFC4733].  The automatic speech recognizer (ASR) MUST support
5599	   RFC4733 to recognize digits and it MAY support recognizing DTMF tones
5600	   [Q.23] in the audio.

5602	10.  Recorder Resource

5604	   This resource captures received audio and video and stores it as
5605	   content pointed to by a URI.  The main uses of a recorder are
5606	   1.  to capture speech audio that may be submitted for recognition at
5607	       a later time, and
5608	   2.  to record voice or video mails.
5609	   Both of these applications require functionality beyond that
5610	   specified by protocols such as RTSP.  This includes audio endpointing
5611	   (i.e., detecting speech or silence).  Support for video is optional
5612	   and is mainly intended for capturing video mails that may require
5613	   the speech or audio processing mentioned above.

5615	   A recorder MUST provide some endpointing capabilities for suppressing
5616	   silence at the beginning and end of a recording, and MAY also
5617	   suppress silence in the middle of a recording.  If such suppression
5618	   is done, the recorder MUST maintain timing metadata to indicate the
5619	   actual time stamps of the recorded media.

5621	10.1.  Recorder State Machine

5623	   Idle                   Recording
5624	   State                  State
5625	    |                       |
5626	    |---------RECORD------->|
5627	    |                       |
5628	    |<------STOP------------|
5629	    |                       |
5630	    |<--RECORD-COMPLETE-----|
5631	    |                       |
5632	    |              |--------|
5633	    |       START-OF-INPUT  |
5634	    |              |------->|
5635	    |                       |
5636	    |              |--------|
5637	    |    START-INPUT-TIMERS |
5638	    |              |------->|
5639	    |                       |

5641	                          Recorder State Machine
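
   The diagram above can be read as a transition table.  The following
   is a minimal Python sketch of that reading, not a normative
   implementation.  The names RecorderState, TRANSITIONS, and
   next_state are hypothetical.  Accepting STOP in the Idle state is an
   assumption taken from Section 10.7, which says a STOP response
   always carries status 200; the diagram itself does not draw it.

```python
from enum import Enum


class RecorderState(Enum):
    IDLE = "idle"
    RECORDING = "recording"


# (current state, message) -> next state, read off the diagram above.
# Messages not listed for a state are treated as invalid in that state.
TRANSITIONS = {
    (RecorderState.IDLE, "RECORD"): RecorderState.RECORDING,
    # Assumption from Section 10.7: STOP succeeds even when idle.
    (RecorderState.IDLE, "STOP"): RecorderState.IDLE,
    (RecorderState.RECORDING, "STOP"): RecorderState.IDLE,
    (RecorderState.RECORDING, "RECORD-COMPLETE"): RecorderState.IDLE,
    (RecorderState.RECORDING, "START-OF-INPUT"): RecorderState.RECORDING,
    (RecorderState.RECORDING, "START-INPUT-TIMERS"): RecorderState.RECORDING,
}


def next_state(state, message):
    """Return the next state, or raise ValueError for an invalid message."""
    try:
        return TRANSITIONS[(state, message)]
    except KeyError:
        raise ValueError(
            f"{message} is not valid in state {state.name}"
        ) from None
```

   A server built this way could map the ValueError raised for a second
   RECORD in the Recording state to the 402 response described in
   Section 10.6.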

5643	10.2.  Recorder Methods

5645	   The recorder resource supports the following methods.

5647	   recorder-Method      =  "RECORD"
5648	                        /  "STOP"
5649	                        /  "START-INPUT-TIMERS"

5651	10.3.  Recorder Events

5653	   The recorder resource may generate the following events.

5655	   recorder-Event       =  "START-OF-INPUT"
5656	                        /  "RECORD-COMPLETE"

5658	10.4.  Recorder Header Fields

5660	   Method invocations for the recorder resource may contain resource-
5661	   specific headers containing request options and information to
5662	   augment the Method, Response or Event message it is associated with.

5664	   recorder-header      =  sensitivity-level
5665	                        /  no-input-timeout
5666	                        /  completion-cause
5667	                        /  completion-reason
5668	                        /  failed-uri
5669	                        /  failed-uri-cause
5670	                        /  record-uri
5671	                        /  media-type
5672	                        /  max-time
5673	                        /  trim-length
5674	                        /  final-silence
5675	                        /  capture-on-speech
5676	                        /  ver-buffer-utterance
5677	                        /  start-input-timers
5678	                        /  new-audio-channel

5680	10.4.1.  Sensitivity Level

5682	   To filter out background noise and not mistake it for speech, the
5683	   recorder may support a variable level of sound sensitivity.  The
5684	   sensitivity-level header is a float value between 0.0 and 1.0 and
5685	   allows the client to set the sensitivity level for the recorder.
5686	   This header MAY occur in RECORD, "SET-PARAMS" or "GET-PARAMS".  A
5687	   higher value for this header means higher sensitivity.  The default
5688	   value for this header is implementation specific.

5690	   sensitivity-level    =     "Sensitivity-Level" ":" FLOAT CRLF

5692	10.4.2.  No Input Timeout

5694	   When recording is started and there is no speech detected for a
5695	   certain period of time, the recorder can send a RECORD-COMPLETE event
5696	   to the client and terminate the record operation.  The no-input-
5697	   timeout header can set this timeout value.  The value is in
5698	   milliseconds.  This header MAY occur in RECORD, "SET-PARAMS" or
5699	   "GET-PARAMS".  The value for this header ranges from 0 to an
5700	   implementation specific maximum value.  The default value for this
5701	   header is implementation specific.

5703	   no-input-timeout    =     "No-Input-Timeout" ":" 1*19DIGIT CRLF

5705	10.4.3.  Completion Cause

5707	   This header MUST be part of a RECORD-COMPLETE event from the recorder
5708	   resource to the client.  This indicates the reason behind the RECORD
5709	   method completion.  This header MUST be sent in the RECORD responses
5710	   if they return with a failure status and a COMPLETE state.

5712	   completion-cause         =  "Completion-Cause" ":" 3DIGIT SP
5713	                               1*VCHAR CRLF

5715	   +------------+-----------------------+------------------------------+
5716	   | Cause-Code | Cause-Name            | Description                  |
5717	   +------------+-----------------------+------------------------------+
5718	   | 000        | success-silence       | RECORD completed with a      |
5719	   |            |                       | silence at the end           |
5720	   | 001        | success-maxtime       | RECORD completed after       |
5721	   |            |                       | reaching maximum recording   |
5722	   |            |                       | time specified in record     |
5723	   |            |                       | method.                      |
5724	   | 002        | noinput-timeout       | RECORD failed due to no      |
5725	   |            |                       | input                        |
5726	   | 003        | uri-failure           | Failure accessing the record |
5727	   |            |                       | URI.                         |
5728	   | 004        | error                 | RECORD request terminated    |
5729	   |            |                       | prematurely due to a         |
5730	   |            |                       | recorder error.              |
5731	   +------------+-----------------------+------------------------------+
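
   As an illustration of the ABNF and table above, the following Python
   sketch parses a Completion-Cause header line and checks the cause
   name against the table.  The names RECORDER_CAUSES and
   parse_completion_cause are hypothetical, and tolerating optional
   whitespace after the colon is an assumption that goes beyond the
   grammar shown here.

```python
import re

# Cause codes and names copied from the table above.
RECORDER_CAUSES = {
    "000": "success-silence",
    "001": "success-maxtime",
    "002": "noinput-timeout",
    "003": "uri-failure",
    "004": "error",
}

# 3DIGIT SP 1*VCHAR; VCHAR is the visible ASCII range 0x21-0x7E.
_CAUSE_RE = re.compile(
    r"^Completion-Cause:[ \t]*([0-9]{3}) ([\x21-\x7E]+)$", re.IGNORECASE
)


def parse_completion_cause(line):
    """Return (code, name, known) for one Completion-Cause header line.

    'known' is True only when the code is in the recorder table and the
    name on the wire matches the table entry for that code.
    """
    match = _CAUSE_RE.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"malformed Completion-Cause header: {line!r}")
    code, name = match.groups()
    return code, name, RECORDER_CAUSES.get(code) == name
```

   A check like this flags a message whose cause name does not match
   its code, such as "000 success" on a recorder channel.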

5733	10.4.4.  Completion Reason

5735	   This header MAY be present in a RECORD-COMPLETE event sent from the
5736	   recorder resource to the client.  It carries text describing the
5737	   reason for the RECORD request completion, for example the reason for
5738	   a failure.

5740	   The completion reason text is provided for client use in logs and for
5741	   debugging and instrumentation purposes.  Clients MUST NOT interpret
5742	   the completion reason text.

5744	   completion-reason        =  "Completion-Reason" ":"
5745	                               quoted-string CRLF

5747	10.4.5.  Failed URI

5749	   When a recorder method needs to post the audio to a URI and access to
5750	   the URI fails, the server MUST provide the failed URI in this header
5751	   in the method response.

5753	   failed-uri               =  "Failed-URI" ":" Uri CRLF

5755	10.4.6.  Failed URI Cause

5757	   When a recorder method needs to post the audio to a URI and access to
5758	   the URI fails, the server MUST provide the URI specific or protocol
5759	   specific response code through this header in the method response.

5761	   The value encoding is UTF-8 to accommodate any access protocol, some
5762	   of which might have a response string instead of a numeric response
5763	   code.

5765	   failed-uri-cause         =  "Failed-URI-Cause" ":" 1*UTFCHAR
5766	                               CRLF

5768	10.4.7.  Record URI

5770	   When a recorder method contains this header, the server MUST capture
5771	   the audio and store it.  If the header is present but specified with
5772	   no value, the server MUST store the content locally and generate a
5773	   URI that points to it.  This URI is then returned in either the
5774	   "STOP" response or the RECORD-COMPLETE event.  If the header in the
5775	   RECORD method specifies a URI, the server MUST attempt to capture and
5776	   store the audio at that location.  If this header is not specified in
5777	   the RECORD request, the server MUST capture the audio and send it in
5778	   the "STOP" response or the RECORD-COMPLETE event as a message body.
5779	   In this case, the response carrying the audio content would have this
5780	   header with a cid value pointing to the Content-ID in the message
5781	   body.

5783	   The server MUST also return the size in bytes and the duration in
5784	   milliseconds of the recorded audio wave-form as parameters associated
5785	   with the header.

5787	   record-uri               =  "Record-URI" ":" ["<" Uri ">"
5788	                               ";" "size" "=" 1*19DIGIT
5789	                               ";" "duration" "=" 1*19DIGIT] CRLF
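
   The following Python sketch shows one way a client might parse a
   Record-URI header value.  It is deliberately more lenient than the
   ABNF above: it accepts a bare "<Uri>" without parameters, as used in
   the request examples of this section, and it tolerates folding
   whitespace after a ";".  Both leniencies are assumptions, and the
   names RecordUri and parse_record_uri are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class RecordUri(NamedTuple):
    uri: Optional[str]          # None when the header carries no value
    size: Optional[int]         # size in bytes, when present
    duration_ms: Optional[int]  # duration in milliseconds, when present


_RECORD_URI_RE = re.compile(
    r"^<(?P<uri>[^<>]*)>"
    r"(?:;\s*size=(?P<size>[0-9]{1,19})"
    r";\s*duration=(?P<duration>[0-9]{1,19}))?$"
)


def parse_record_uri(value):
    """Parse the text that follows 'Record-URI:' in a header."""
    value = value.strip()
    if not value:
        # Header present with no value: the server picks the location.
        return RecordUri(None, None, None)
    match = _RECORD_URI_RE.match(value)
    if match is None:
        raise ValueError(f"malformed Record-URI value: {value!r}")
    size = match.group("size")
    duration = match.group("duration")
    return RecordUri(
        match.group("uri"),
        int(size) if size is not None else None,
        int(duration) if duration is not None else None,
    )
```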

5791	10.4.8.  Media Type

5793	   A RECORD method MUST contain this header, which specifies to the
5794	   server the Media Type of the captured audio or video.

5796	   media-type               =  "Media-Type" ":" media-type-value
5797	                               CRLF

5799	10.4.9.  Max Time

5801	   This header specifies the maximum length of the recording in
5802	   milliseconds, measured from the time the actual capture and store
5803	   begins, which is not necessarily the time the RECORD method is
5804	   received.  It specifies the duration before silence suppression,
5805	   if any, has been applied by the recorder resource.
5806	   After this time, the recording stops and the server MUST return a
5807	   RECORD-COMPLETE event to the client having a request-state of
5808	   "COMPLETE".  This header MAY occur in RECORD, "SET-PARAMS" or
5809	   "GET-PARAMS".  The value for this header ranges from 0 to an
5810	   implementation specific maximum value.  A value of zero means
5811	   infinity and hence the recording continues until one or more of the
5812	   other stop conditions are met.  The default value for this header is
5813	   0.

5815	   max-time                 =  "Max-Time" ":" 1*19DIGIT CRLF

5817	10.4.10.  Trim-Length

5819	   This header MAY be sent on a STOP method and specifies the length of
5820	   audio to be trimmed from the end of the recording after the stop.
5821	   The length is interpreted to be in milliseconds.  The default value
5822	   for this header is 0.

5824	   trim-length                 =  "Trim-Length" ":" 1*19DIGIT CRLF

5826	10.4.11.  Final Silence

5828	   When the recorder is started and the actual capture begins, this
5829	   header specifies the length of silence in the audio that is to be
5830	   interpreted as the end of the recording.  This header MAY occur in
5831	   RECORD, "SET-PARAMS" or "GET-PARAMS".  The value for this header
5832	   ranges from 0 to an implementation specific maximum value and is
5833	   interpreted to be in milliseconds.  A value of zero means infinity
5834	   and hence the recording will continue until one of the other stop
5835	   conditions is met.  The default value for this header is
5836	   implementation specific.

5838	   final-silence            =  "Final-Silence" ":" 1*19DIGIT CRLF
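
   Max-Time and Final-Silence both treat zero as "no limit".  The
   following Python sketch shows how a recorder might evaluate the two
   stop conditions.  The function name stop_cause, its parameters, and
   the choice to test Max-Time before Final-Silence are assumptions
   made for illustration; the zero defaults here are placeholders, as
   the real Final-Silence default is implementation specific.

```python
def stop_cause(captured_ms, trailing_silence_ms,
               max_time_ms=0, final_silence_ms=0):
    """Return the cause name that ends the recording, or None.

    captured_ms          time since capture actually began
    trailing_silence_ms  length of the current run of silence
    max_time_ms          Max-Time value; 0 means no time limit
    final_silence_ms     Final-Silence value; 0 means never stop on silence
    """
    if max_time_ms > 0 and captured_ms >= max_time_ms:
        return "success-maxtime"
    if final_silence_ms > 0 and trailing_silence_ms >= final_silence_ms:
        return "success-silence"
    return None  # keep recording until another stop condition is met
```

   With Max-Time:6000 and Final-Silence:300, a call such as
   stop_cause(1000, 300, 6000, 300) reports a stop on silence, while
   leaving both values at zero never stops on either condition.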

5840	10.4.12.  Capture On Speech

5842	   If false, the recorder MUST start capturing immediately when started.
5843	   If true, the recorder MUST wait for the endpointing functionality to
5844	   detect speech before it starts capturing.  This header MAY occur in
5845	   the RECORD, "SET-PARAMS" or "GET-PARAMS".  The value for this header
5846	   is a Boolean.  The default value for this header is false.

5848	   capture-on-speech        =  "Capture-On-Speech" ":" BOOLEAN CRLF

5850	10.4.13.  Ver-Buffer-Utterance

5852	   This header is the same as the one described for the Verification
5853	   resource (see Section 11.4.14).  This tells the server to buffer the
5854	   utterance associated with this recording request into the
5855	   verification buffer.  Sending this header is permitted only if a
5856	   verification buffer exists for the session.  This buffer is shared
5857	   across resources within a session.  It is instantiated when a
5858	   verification resource is added to the session and is released when
5859	   the verification resource is released from the session.

5861	10.4.14.  Start Input Timers

5863	   This header MAY be sent as part of the RECORD request.  A value of
5864	   false tells the recorder resource to start the operation, but not to
5865	   start the no-input timer until the client sends a START-INPUT-TIMERS
5866	   request to the recorder resource.  This is useful in the scenario
5867	   when the recorder and synthesizer resources are not part of the same
5868	   session.  When a kill-on-barge-in prompt is being played, the client
5869	   may want the RECORD request to be simultaneously active so that it
5870	   can detect and implement kill-on-barge-in.  But at the same time the
5871	   client doesn't want the recorder resource to start the no-input
5872	   timers until the prompt is finished.  The default value is "true".

5874	   start-input-timers       =  "Start-Input-Timers" ":"
5875	                               BOOLEAN CRLF

5877	10.4.15.  New Audio Channel

5879	   This header is the same as the one described for the Recognizer
5880	   resource (see Section 9.4.23).

5882	10.5.  Recorder Message Body

5884	   If the RECORD request did not have a Record-URI header, the "STOP"
5885	   response or the RECORD-COMPLETE event MUST contain a message body
5886	   carrying the captured audio.  In this case, the message carrying the
5887	   audio content has a Record-URI header with a cid value pointing to
5888	   the message body entity that contains the recorded audio.

5890	10.6.  RECORD

5892	   The RECORD request places the recorder resource in the Recording
5893	   state.  Depending on the headers specified in the RECORD method, the
5894	   resource may start recording the audio immediately or wait for the
5895	   endpointing functionality to detect speech in the audio.  The audio
5896	   is then made available to the client either in the message body or as
5897	   specified by Record-URI.

5899	   The server MUST support the HTTPS URI scheme and MAY support other
5900	   schemes.  Note that due to the sensitive nature of voice recordings,
5901	   any protocols used for dereferencing SHOULD employ integrity and
5902	   confidentiality.

5904	   If a RECORD operation is already in progress, invoking this method
5905	   causes the server to issue a response having a status code of 402,
5906	   "Method not valid in this state", and a COMPLETE request state.

5908	   If the Record-URI is not valid, a status code of 404, "Illegal
5909	   Value for Header", is returned in the response.  If it is impossible
5910	   for the server to create the requested stored content, a status code
5911	   of 407, "Method or Operation Failed", is returned.

5913	   If the type specified in the Media-Type header is not supported, the
5914	   server MUST respond with a status code of 409, "Unsupported Header
5915	   Value", with the Media-Type header in its response.

5917	   When the recording operation is initiated, the response indicates an
5918	   IN-PROGRESS request state.  The server MAY generate a subsequent
5919	   START-OF-INPUT event when speech is detected.  Upon completion of the
5920	   recording operation, the server generates a RECORD-COMPLETE event.

5922	   C->S:  MRCP/2.0 386 RECORD 543257
5923	          Channel-Identifier:32AECB23433802@recorder
5924	          Record-URI:<file://mediaserver/recordings/myfile.wav>
5925	          Capture-On-Speech:true
5926	          Final-Silence:300
5927	          Max-Time:6000

5929	   S->C:  MRCP/2.0 48 543257 200 IN-PROGRESS
5930	          Channel-Identifier:32AECB23433802@recorder

5932	   S->C:  MRCP/2.0 49 START-OF-INPUT 543257 IN-PROGRESS
5933	          Channel-Identifier:32AECB23433802@recorder

5935	   S->C:  MRCP/2.0 54 RECORD-COMPLETE 543257 COMPLETE
5936	          Channel-Identifier:32AECB23433802@recorder
5937	          Completion-Cause:000 success-silence
5938	          Record-URI:<file://mediaserver/recordings/myfile.wav>;
5939	                     size=242552;duration=25645

5941	                              RECORD Example
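
   The length values in the example above are illustrative.  The
   following Python sketch builds a request and computes the length
   token, assuming (per the message-format rules elsewhere in this
   specification) that the message-length counts every octet of the
   message, including the start-line and therefore its own digits.  The
   function name build_request and its parameters are hypothetical.

```python
def build_request(method, request_id, headers, body=b""):
    """Serialize an MRCPv2 request with a self-consistent length token.

    headers is a list of (name, value) pairs.  The caller is expected to
    supply Content-Type and Content-Length when a body is present.
    """
    header_block = "".join(f"{name}:{value}\r\n" for name, value in headers)
    rest = f" {method} {request_id}\r\n{header_block}\r\n".encode("utf-8")
    rest += body
    prefix = b"MRCP/2.0 "

    # The length token counts its own digits, so iterate to a fixed
    # point: adding a digit can push the total past a power of ten.
    length = len(prefix) + len(rest) + 1
    while True:
        total = len(prefix) + len(str(length)) + len(rest)
        if total == length:
            break
        length = total
    return prefix + str(length).encode("ascii") + rest
```

   For example, build_request("RECORD", 543257, [("Channel-Identifier",
   "32AECB23433802@recorder"), ("Capture-On-Speech", "true")]) yields a
   message whose second token equals its total size in octets.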

5943	10.7.  STOP

5945	   The "STOP" method moves the recorder from the recording state back to
5946	   the idle state.  If a RECORD request is active and the "STOP" request
5947	   successfully terminated it, then the STOP response MUST contain an
5948	   active-request-id-list header containing the "RECORD" request-id that
5949	   was terminated.  In this case, no RECORD-COMPLETE event is sent for
5950	   the terminated request.  If there was no recording active, then the
5951	   response MUST NOT contain an active-request-id-list header.  If the
5952	   recording was a success the "STOP" response MUST contain a Record-URI
5953	   header pointing to the recorded audio content or to a typed entity
5954	   in the body of the "STOP" response containing the recorded audio.
5955	   The "STOP" method may have a Trim-Length header, in which case the
5956	   specified length of audio is trimmed from the end of the recording
5957	   after the stop.  In any case, the response MUST contain a status of
5958	   200 (Success).

5960	   C->S:  MRCP/2.0 386 RECORD 543257
5961	          Channel-Identifier:32AECB23433802@recorder
5962	          Record-URI:<file://mediaserver/recordings/myfile.wav>
5963	          Capture-On-Speech:true
5964	          Final-Silence:300
5965	          Max-Time:6000

5967	   S->C:  MRCP/2.0 48 543257 200 IN-PROGRESS
5968	          Channel-Identifier:32AECB23433802@recorder

5970	   S->C:  MRCP/2.0 49 START-OF-INPUT 543257 IN-PROGRESS
5971	          Channel-Identifier:32AECB23433802@recorder

5973	   C->S:  MRCP/2.0 386 STOP 543258
5974	          Channel-Identifier:32AECB23433802@recorder
5975	          Trim-Length:200

5977	   S->C:  MRCP/2.0 48 543258 200 COMPLETE
5978	          Channel-Identifier:32AECB23433802@recorder
5979	          Record-URI:<file://mediaserver/recordings/myfile.wav>;
5980	                     size=324253;duration=24561
5981	          Active-Request-Id-List:543257

5983	                               STOP Example

5985	10.8.  RECORD-COMPLETE

5987	   If the recording completes due to no-input, silence after speech, or
5988	   max-time, the server MUST generate the RECORD-COMPLETE event to the
5989	   client with a request-state of "COMPLETE".  If the recording was a
5990	   success the RECORD-COMPLETE event contains a Record-URI header
5991	   pointing to the recorded audio file on the server or to a typed
5992	   entity in the message body containing the recorded audio.

5994	   C->S:  MRCP/2.0 386 RECORD 543257
5995	          Channel-Identifier:32AECB23433802@recorder
5996	          Record-URI:<file://mediaserver/recordings/myfile.wav>
5997	          Capture-On-Speech:true
5998	          Final-Silence:300
5999	          Max-Time:6000

6001	   S->C:  MRCP/2.0 48 543257 200 IN-PROGRESS
6002	          Channel-Identifier:32AECB23433802@recorder

6004	   S->C:  MRCP/2.0 49 START-OF-INPUT 543257 IN-PROGRESS
6005	          Channel-Identifier:32AECB23433802@recorder

6007	   S->C:  MRCP/2.0 48 RECORD-COMPLETE 543257 COMPLETE
6008	          Channel-Identifier:32AECB23433802@recorder
6009	          Completion-Cause:000 success-silence
6010	          Record-URI:<file://mediaserver/recordings/myfile.wav>;
6011	                     size=325325;duration=24652

6013	                          RECORD-COMPLETE Example

6015	10.9.  START-INPUT-TIMERS

6017	   This request is sent from the client to the recorder resource when it
6018	   discovers that a kill-on-barge-in prompt has finished playing.  This
6019	   is useful in the scenario when the recorder and synthesizer resources
6020	   are not in the same MRCPv2 session.  When a kill-on-barge-in prompt
6021	   is being played, the client wants the RECORD request to be
6022	   simultaneously active so that it can detect and implement kill-on-
6023	   barge-in.  But at the same time the client doesn't want the recorder
6024	   resource to start the no-input timers until the prompt is finished.
6025	   The Start-Input-Timers header in the RECORD request allows the client
6026	   to say if the timers should be started or not.  In the above case the
6027	   recorder resource does not start the timers until the client sends a
6028	   START-INPUT-TIMERS method to the recorder.

6030	10.10.  START-OF-INPUT

6032	   The START-OF-INPUT event is returned from the server to the client
6033	   once the server has detected speech.  This event is always returned
6034	   by the recorder resource when speech has been detected.  The
6035	   recorder resource also MUST send a Proxy-Sync-Id header with a unique
6036	   value for this event.

6038	   S->C:  MRCP/2.0 49 START-OF-INPUT 543259 IN-PROGRESS
6039	          Channel-Identifier:32AECB23433801@recorder
6040	          Proxy-Sync-Id:987654321
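
   The examples in this section use three start-line layouts: requests
   (method, request-id), responses (request-id, status code, request
   state), and events (event name, request-id, request state).  The
   following Python sketch tells them apart.  The layouts are inferred
   from the examples here and should be checked against the
   message-format grammar; the function name classify_start_line and
   the returned dictionary keys are hypothetical.

```python
def _is_number(token):
    return token.isascii() and token.isdigit()


def classify_start_line(line):
    """Classify one MRCPv2 start line as a request, response, or event."""
    tokens = line.split()
    if len(tokens) not in (4, 5) or tokens[0] != "MRCP/2.0":
        raise ValueError(f"not an MRCP/2.0 start line: {line!r}")
    # tokens[1] is the message length; it is not validated here.
    if len(tokens) == 4 and _is_number(tokens[3]):
        return {"kind": "request", "name": tokens[2],
                "request_id": int(tokens[3])}
    if len(tokens) == 5 and _is_number(tokens[2]) and _is_number(tokens[3]):
        return {"kind": "response", "request_id": int(tokens[2]),
                "status": int(tokens[3]), "state": tokens[4]}
    if len(tokens) == 5 and _is_number(tokens[3]):
        return {"kind": "event", "name": tokens[2],
                "request_id": int(tokens[3]), "state": tokens[4]}
    raise ValueError(f"unrecognized start line layout: {line!r}")
```

   A strict version check like the one above rejects a start line that
   begins with a mistyped version token such as "MRCP/2/0".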

6042	11.  Speaker Verification and Identification

6044	   This section describes the methods, responses and events employed by
6045	   MRCPv2 for doing Speaker Verification / Identification.

6047	   Speaker verification is a voice authentication methodology that can
6048	   be used to identify the speaker in order to grant the user access to
6049	   sensitive information and transactions.  Because speech is a
6050	   biometric, a number of essential security considerations related to
6051	   biometric authentication technologies apply to its implementation and
6052	   usage.  Implementers should carefully read Section 12 in this
6053	   document and the corresponding section of Speechsc Requirements
6054	   [RFC4313].

6056	   In speaker verification, a recorded utterance is compared to a
6057	   previously stored voiceprint which is in turn associated with a
6058	   claimed identity for that user.  Verification typically consists of
6059	   two phases: a designation phase to establish the claimed identity of
6060	   the caller and an execution phase in which a voiceprint is either
6061	   created (training) or used to authenticate the claimed identity
6062	   (verification).

6064	   Speaker identification is the process of associating an unknown
6065	   speaker with a member in a population.  It does not employ a claim of
6066	   identity.  When an individual claims to belong to a group (e.g., one
6067	   of the owners of a joint bank account) a group authentication is
6068	   performed.  This is generally implemented as a kind of verification
6069	   involving comparison with more than one voice model.  It is sometimes
6070	   called 'multi-verification.'  If the individual speaker can be
6071	   identified from the group, this may be useful for applications where
6072	   multiple users share the same access privileges to some data or
6073	   application.  Speaker identification and group authentication are
6074	   also done in two phases, a designation phase and an execution phase.
6075	   Note that from a functionality standpoint identification can be
6076	   thought of as a special case of group authentication (if the
6077	   individual is identified) where the group is the entire population,
6078	   although the implementation of speaker identification may be
6079	   different from the way group authentication is performed.  To
6080	   accommodate single-voiceprint verification, verification against
6081	   multiple voiceprints, group authentication, and identification, this
6082	   specification provides a single set of methods that can take a list
6083	   of identifiers, called "voiceprint identifiers", and return a list of
6084	   identifiers, with a score for each representing how well the input
6085	   speech matched each identifier.  The input and output lists of
6086	   identifiers do not have to match, allowing a vendor-specific group
6087	   identifier to be used as input to indicate that identification is to
6088	   be performed.  In this specification, the terms "Identification" and
6089	   "Multi-verification" are used to indicate that the input represents a
6090	   group (potentially the entire population) and that results for
6091	   multiple voiceprints may be returned.

6093	   It is possible for a speaker verification resource to share the same
6094	   session with a recognizer resource or to operate independently.  In
6095	   order to share the same session, the verification and recognizer
6096	   resources MUST be allocated from within the same SIP dialog.
6097	   Otherwise, an independent verification resource, running on the same
6098	   physical server or a separate one, will be set up.  Note that in
6099	   addition to allowing both resources to be allocated in the same
6100	   INVITE, it is possible to allocate one initially and the other later
6101	   via a re-INVITE.

6103	   Some of the speaker verification methods, described below, apply only
6104	   to a specific mode of operation.

6106	   The verification resource has a verification buffer associated with
6107	   it (see Section 11.4.14).  This allows the storage of speech
6108	   utterances for the purposes of verification, identification or
6109	   training from the buffered speech.  This buffer is owned by the
6110	   verification resource but other input resources such as the
6111	   recognition resource or recorder resource may write to it.  This
6112	   allows the speech received as part of a recognition or recording
6113	   operation to be later used for verification, identification or
6114	   training.  Access to the buffer is limited to one operation at a
6115	   time.  Hence, while the resource is performing a read, write, or
6116	   delete operation such as a RECOGNIZE with ver-buffer-utterance turned
6117	   on, another operation involving the buffer fails with a status of
6118	   402.  The verification buffer can be cleared by a CLEAR-BUFFER
6119	   request from the client and is freed when the verification resource
6120	   is deallocated or the session with the server terminates.
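
   The one-operation-at-a-time rule can be sketched with a
   non-blocking lock.  The following Python fragment is an illustration
   only: the class name VerificationBuffer and its methods are
   hypothetical, and returning bare status integers stands in for
   building real MRCPv2 responses.

```python
import threading


class VerificationBuffer:
    """Session-wide buffer that admits one operation at a time."""

    def __init__(self):
        self._lock = threading.Lock()
        self._utterances = []

    def _run(self, operation):
        # Do not wait for the lock: a busy buffer fails the new
        # operation with status 402 instead of queueing it.
        if not self._lock.acquire(blocking=False):
            return 402
        try:
            operation()
            return 200
        finally:
            self._lock.release()

    def write(self, utterance):
        """Append an utterance, e.g. from RECOGNIZE or RECORD."""
        return self._run(lambda: self._utterances.append(utterance))

    def clear(self):
        """Discard all buffered utterances, as CLEAR-BUFFER would."""
        return self._run(self._utterances.clear)
```

   A real server would hold the lock for the whole duration of the
   operation that uses the buffer, not only for the list update shown
   here.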

6122	   The verification buffer is different from collecting waveforms and
6123	   processing them using either the real time audio stream or stored
6124	   audio, because this buffering mechanism does not simply accumulate
6125	   speech to a buffer.  The verification buffer may contain additional
6126	   information gathered by the recognition resource that serves to
6127	   improve verification performance.

6129	11.1.  Speaker Verification State Machine

6131	   Speaker verification may operate in a training or a verification
6132	   session.  Starting one of these sessions does not change the state of
6133	   the verification resource, i.e., it remains idle.  Once a
6134	   verification or training session is started, utterances are trained
6135	   or verified by calling the VERIFY or VERIFY-FROM-BUFFER method.  The
6136	   state of the verification resource goes from IDLE to VERIFYING state
6137	   each time VERIFY or VERIFY-FROM-BUFFER is called.

6139	     Idle              Session Opened       Verifying/Training
6140	     State             State                State
6141	      |                   |                         |
6142	      |--START-SESSION--->|                         |
6143	      |                   |                         |
6144	      |                   |----------|              |
6145	      |                   |     START-SESSION       |
6146	      |                   |<---------|              |
6147	      |                   |                         |
6148	      |<--END-SESSION-----|                         |
6149	      |                   |                         |
6150	      |                   |---------VERIFY--------->|
6151	      |                   |                         |
6152	      |                   |---VERIFY-FROM-BUFFER--->|
6153	      |                   |                         |
6154	      |                   |----------|              |
6155	      |                   |  VERIFY-ROLLBACK        |
6156	      |                   |<---------|              |
6157	      |                   |                         |
6158	      |                   |                |--------|
6159	      |                   | GET-INTERMEDIATE-RESULT |
6160	      |                   |                |------->|
6161	      |                   |                         |
6162	      |                   |                |--------|
6163	      |                   |     START-INPUT-TIMERS  |
6164	      |                   |                |------->|
6165	      |                   |                         |
6166	      |                   |                |--------|
6167	      |                   |         START-OF-INPUT  |
6168	      |                   |                |------->|
6169	      |                   |                         |
6170	      |                   |<-VERIFICATION-COMPLETE--|
6171	      |                   |                         |
6172	      |                   |<--------STOP------------|
6173	      |                   |                         |
6174	      |                   |----------|              |
6175	      |                   |         STOP            |
6176	      |                   |<---------|              |
6177	      |                   |                         |
6178	      |----------|        |                         |
6179	      |         STOP      |                         |
6180	      |<---------|        |                         |
6181	      |                   |----------|              |
6182	      |                   |    CLEAR-BUFFER         |
6183	      |                   |<---------|              |
6184	      |                   |                         |
6185	      |----------|        |                         |
6186	      |   CLEAR-BUFFER    |                         |
6187	      |<---------|        |                         |
6188	      |                   |                         |
6189	      |                   |----------|              |
6190	      |                   |   QUERY-VOICEPRINT      |
6191	      |                   |<---------|              |
6192	      |                   |                         |
6193	      |----------|        |                         |
6194	      | QUERY-VOICEPRINT  |                         |
6195	      |<---------|        |                         |
6196	      |                   |                         |
6197	      |                   |----------|              |
6198	      |                   |  DELETE-VOICEPRINT      |
6199	      |                   |<---------|              |
6200	      |                   |                         |
6201	      |----------|        |                         |
6202	      | DELETE-VOICEPRINT |                         |
6203	      |<---------|        |                         |

6205	                    Verification Resource State Machine

6207	11.2.  Speaker Verification Methods

6209	   The verification resource supports the following methods.

6211	   verification-method      =  "START-SESSION"
6212	                            / "END-SESSION"
6213	                            / "QUERY-VOICEPRINT"
6214	                            / "DELETE-VOICEPRINT"
6215	                            / "VERIFY"
6216	                            / "VERIFY-FROM-BUFFER"
6217	                            / "VERIFY-ROLLBACK"
6218	                            / "STOP"
6219	                            / "CLEAR-BUFFER"
6220	                            / "START-INPUT-TIMERS"
6221	                            / "GET-INTERMEDIATE-RESULT"

6223	   These methods allow the client to control the mode and target of
6224	   verification or identification operations within the context of a
6225	   session.  All the verification input operations that occur within a
6226	   session may be used to create, update, or validate against the
6227	   voiceprint specified during the session.  At the beginning of each
6228	   session the verification resource is reset to the state it had prior
6229	   to any previous verification session.

6231	   Verification/identification operations can be executed against live
6232	   or buffered audio.  The verification resource provides methods for
6233	   collecting and evaluating live audio data, and methods for
6234	   controlling the verification resource and adjusting its configured
6235	   behavior.

6237	   There are no dedicated methods for collecting buffered audio data.
6238	   This is accomplished by calling VERIFY, RECOGNIZE, or RECORD as
6239	   appropriate for the resource, with the Ver-Buffer-Utterance header.
6240	   Then, when the following method is called, verification is performed
6241	   using the set of buffered audio:
6242	   1.  VERIFY-FROM-BUFFER

6244	   The following methods are used for verification of live audio
6245	   utterances:
6246	   1.  VERIFY
6247	   2.  START-INPUT-TIMERS

6249	   The following methods are used for configuring the verification
6250	   resource and for establishing resource states:
6251	   1.  START-SESSION
6252	   2.  END-SESSION
6253	   3.  QUERY-VOICEPRINT
6254	   4.  DELETE-VOICEPRINT
6255	   5.  VERIFY-ROLLBACK
6256	   6.  STOP
6257	   7.  CLEAR-BUFFER

6259	   The following method allows the client to poll a verification in
6260	   progress for intermediate results:
6261	   1.  GET-INTERMEDIATE-RESULT

6263	11.3.  Verification Events

6265	   The verification resource generates the following events.

6267	   verification-event       =  "VERIFICATION-COMPLETE"
6268	                            /  "START-OF-INPUT"

6270	11.4.  Verification Header Fields

6272	   A verification resource message may contain headers containing
6273	   request options and information to augment the Request, Response or
6274	   Event message it is associated with.

6276	   verification-header      =  repository-uri
6277	                            /  voiceprint-identifier
6278	                            /  verification-mode
6279	                            /  adapt-model
6280	                            /  abort-model
6281	                            /  min-verification-score
6282	                            /  num-min-verification-phrases
6283	                            /  num-max-verification-phrases
6284	                            /  no-input-timeout
6285	                            /  save-waveform
6286	                            /  media-type
6287	                            /  waveform-uri
6288	                            /  voiceprint-exists
6289	                            /  ver-buffer-utterance
6290	                            /  input-waveform-uri
6291	                            /  completion-cause
6292	                            /  completion-reason
6293	                            /  speech-complete-timeout
6294	                            /  new-audio-channel
6295	                            /  abort-verification
6296	                            /  start-input-timers

6298	11.4.1.  Repository-URI

6300	   This header specifies the voiceprint repository to be used or
6301	   referenced during speaker verification or identification operations.
6302	   This header is required in the START-SESSION, QUERY-VOICEPRINT and
6303	   DELETE-VOICEPRINT methods.

6305	   repository-uri           =  "Repository-URI" ":" Uri CRLF

6307	11.4.2.  Voiceprint-Identifier

6309	   This header specifies the claimed identity for verification
6310	   applications.  The claimed identity may be used to specify an
6311	   existing voiceprint or to establish a new voiceprint.  This header is
6312	   required in the QUERY-VOICEPRINT and DELETE-VOICEPRINT methods.  The
6313	   Voiceprint-Identifier is required in the START-SESSION method for
6314	   verification operations.  For Identification or Multi-Verification
6315	   operations this header may contain a list of voiceprint identifiers
6316	   separated by semicolons.  For identification operations the client
6317	   can also specify a voiceprint group identifier instead of a list of
6318	   voiceprint identifiers.

6320	   voiceprint-identifier        =  "Voiceprint-Identifier" ":"
6321	                                   1*VCHAR "." 1*VCHAR
6322	                                   *[";" 1*VCHAR "." 1*VCHAR] CRLF
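
   The following non-normative Python sketch illustrates one way a
   client might split and check a Voiceprint-Identifier value before
   sending it.  The function name is hypothetical, and the check is a
   simplified reading of the ABNF above: each identifier must consist
   of VCHAR characters and contain a "." with at least one character
   on each side.

```python
def parse_voiceprint_identifier(value):
    """Split a Voiceprint-Identifier value into individual identifiers.

    Hypothetical helper, not part of MRCPv2. Raises ValueError when an
    identifier does not look like 1*VCHAR "." 1*VCHAR.
    """
    identifiers = value.strip().split(";")
    for vid in identifiers:
        # VCHAR is the visible ASCII range %x21-7E (no spaces).
        if not all(0x21 <= ord(ch) <= 0x7E for ch in vid):
            raise ValueError("non-VCHAR character in %r" % vid)
        # A "." must appear with at least one character before and after.
        if "." not in vid[1:-1]:
            raise ValueError("identifier %r lacks an interior '.'" % vid)
    return identifiers
```

   A single identifier yields a one-element list; a Multi-Verification
   or Identification list yields one entry per identifier.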

6324	11.4.3.  Verification-Mode

6326	   This header specifies the mode of the verification resource and is
6327	   set by the START-SESSION method.  Acceptable values indicate whether
6328	   the verification session will train a voiceprint ("train") or verify/
6329	   identify using an existing voiceprint ("verify").

6331	   Training and verification sessions both require the voiceprint
6332	   Repository-URI to be specified in the START-SESSION.  In many usage
6333	   scenarios, however, the system does not know the speaker's claimed
6334	   identity until a recognition operation has, for example, recognized
6335	   an account number to which the user desires access.  In order to
6336	   allow the first few utterances of a dialog to be both recognized and
6337	   verified, the verification resource on the MRCPv2 server retains a
6338	   buffer.  In this buffer, the MRCPv2 server accumulates recognized
6339	   utterances.  The client can later execute a verification method and
6340	   apply the buffered utterances to the current verification session.

6342	   Some voice user interfaces may require additional user input that
6343	   should not be subject to verification.  For example, the user's input
6344	   may have been recognized with low confidence and thus require a
6345	   confirmation cycle.  In such cases, the client should not execute the
6346	   VERIFY or VERIFY-FROM-BUFFER methods to collect and analyze the
6347	   caller's input.  A separate recognizer resource can analyze the
6348	   caller's response without any participation by the verification
6349	   resource.

6351	   Once the following conditions have been met:
6352	   1.  Voiceprint identity has been successfully established through the
6353	       voiceprint identifier headers of the START-SESSION method, and
6354	   2.  the verification mode has been set to one of "train" or "verify",
6355	   the verification resource may begin providing verification
6356	   information during verification operations.  If the verification
6357	   resource does not reach one of the two major states ("train" or
6358	   "verify"), it MUST report an error condition in the MRCPv2 status
6359	   code to indicate why the verification resource is not ready for the
6360	   corresponding usage.

6362	   The value of verification-mode is persistent within a verification
6363	   session.  If the client attempts to change the mode during a
6364	   verification session, the verification resource reports an error and
6365	   the mode retains its current value.

6367	   verification-mode            =  "Verification-Mode" ":"
6368	                                   verification-mode-string

6370	   verification-mode-string     =  "train"
6371	                                /  "verify"

6373	11.4.4.  Adapt-Model

6375	   This header indicates the desired behavior of the verification
6376	   resource after a successful verification operation.  If the value of
6377	   this header is "true", the server SHOULD use audio collected during
6378	   the verification session to update the voiceprint to account for
6379	   ongoing changes in a speaker's incoming speech characteristics,
6380	   unless local policy prohibits updating the voiceprint.  If the value
6381	   is "false" (the default), the server MUST NOT update the voiceprint.
6382	   This header MAY occur in the START-SESSION method.

6384	   adapt-model              = "Adapt-Model" ":" BOOLEAN CRLF

6386	11.4.5.  Abort-Model

6388	   The Abort-Model header indicates the desired behavior of the
6389	   verification resource upon session termination.  If the value of this
6390	   header is "true", the server MUST discard any pending changes to a
6391	   voiceprint due to verification training or verification adaptation.
6392	   If the value is "false" (the default), the server MUST commit any
6393	   pending changes for a training session or a successful verification
6394	   session to the voiceprint repository.  A value of "true" for Abort-
6395	   Model overrides a value of "true" for the Adapt-Model header.  This
6396	   header MAY occur in the END-SESSION method.

6398	   abort-model             = "Abort-Model" ":" BOOLEAN CRLF
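
   The interaction between Adapt-Model and Abort-Model can be
   summarized by the following non-normative Python sketch.  The
   function and parameter names are hypothetical.  The sketch assumes
   that a verify-mode session has pending changes only when adaptation
   was requested and verification succeeded, and it ignores any local
   policy that prohibits adaptation.

```python
def should_commit(mode, succeeded, adapt_model=False, abort_model=False):
    """Return True if pending voiceprint changes would be committed
    at END-SESSION (illustrative only)."""
    if mode not in ("train", "verify"):
        raise ValueError("mode must be 'train' or 'verify'")
    if abort_model:
        # Abort-Model "true" discards pending changes and overrides
        # Adapt-Model "true".
        return False
    if mode == "train":
        # Training sessions commit by default.
        return True
    # Verify mode: assume changes exist only if adaptation was
    # requested and the verification was successful.
    return bool(adapt_model and succeeded)
```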

6400	11.4.6.  Min-Verification-Score

6402	   The Min-Verification-Score header, when used with a verification
6403	   resource through a "SET-PARAMS", "GET-PARAMS" or START-SESSION
6404	   method, determines the minimum verification score for which a
6405	   verification decision of "accepted" may be declared by the server.
6406	   The value is a float between -1.0 and 1.0.  The default value for
6407	   this header is implementation specific.

6411	   min-verification-score  = "Min-Verification-Score" ":"
6412	                             [ %x2D ] FLOAT CRLF

6414	11.4.7.  Num-Min-Verification-Phrases

6416	   The Num-Min-Verification-Phrases header is used to specify the
6417	   minimum number of valid utterances before a positive decision is
6418	   given for verification.  The value for this header is an integer and
6419	   the default value is 1.  The verification resource MUST NOT declare a
6420	   verification 'accepted' unless Num-Min-Verification-Phrases valid
6421	   utterances have been received.  The minimum value is 1.  This header
6422	   MAY occur in START-SESSION, "SET-PARAMS" or "GET-PARAMS".

6424	   num-min-verification-phrases =  "Num-Min-Verification-Phrases" ":"
6425	                                   1*19DIGIT CRLF

6427	11.4.8.  Num-Max-Verification-Phrases

6429	   The Num-Max-Verification-Phrases header is used to specify the number
6430	   of valid utterances required before a decision is forced for
6431	   verification.  The verification resource MUST NOT return a decision
6432	   of 'undecided' once Num-Max-Verification-Phrases have been collected
6433	   and used to determine a verification score.  The value for this
6434	   header is an integer and the minimum value is 1.  The default value
6435	   is implementation-specific.  This header MAY occur in START-SESSION,
6436	   "SET-PARAMS" or "GET-PARAMS".

6438	   num-max-verification-phrases =  "Num-Max-Verification-Phrases" ":"
6439	                                    1*19DIGIT CRLF
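
   The constraints that Min-Verification-Score, Num-Min-Verification-
   Phrases, and Num-Max-Verification-Phrases place on the decision can
   be sketched as follows.  This Python fragment is non-normative: the
   function name is hypothetical, real engines use their own decision
   logic, and treating a missing maximum as "never force a decision" is
   an assumption, since the default is implementation specific.

```python
def decide(score, valid_utterances, min_score, num_min=1, num_max=None):
    """Illustrative decision rule honoring the three header limits."""
    if not -1.0 <= min_score <= 1.0:
        raise ValueError("Min-Verification-Score must be in [-1.0, 1.0]")
    if num_min < 1 or (num_max is not None and num_max < 1):
        raise ValueError("phrase counts have a minimum value of 1")
    # "accepted" needs enough valid utterances and a high enough score.
    if valid_utterances >= num_min and score >= min_score:
        return "accepted"
    # Once the maximum is reached, "undecided" is no longer allowed.
    if num_max is not None and valid_utterances >= num_max:
        return "rejected"
    return "undecided"
```

   For example, with a minimum score of 0.8 and Num-Min-Verification-
   Phrases set to 2, a single utterance scoring 0.9 remains
   "undecided" in this sketch.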

6441	11.4.9.  No-Input-Timeout

6443	   The No-Input-Timeout header sets the length of time from the start of
6444	   the verification timers (see START-INPUT-TIMERS) until the
6445	   declaration of a no-input event in the VERIFICATION-COMPLETE server
6446	   event message.  The value is in milliseconds.  This header MAY occur
6447	   in VERIFY, "SET-PARAMS" or "GET-PARAMS".  The value for this header
6448	   ranges from 0 to an implementation specific maximum value.  The
6449	   default value for this header is implementation specific.

6451	   no-input-timeout         = "No-Input-Timeout" ":" 1*19DIGIT CRLF

6453	11.4.10.  Save-Waveform

6455	   This header allows the client to request the verification resource to
6456	   save the audio stream that was used for verification/identification.
6457	   The verification resource MUST attempt to record the audio and make
6458	   it available to the client in the form of a URI returned in the
6459	   Waveform-URI header in the VERIFICATION-COMPLETE event.  If there was
6460	   an error in recording the stream or the audio content is otherwise
6461	   not available, the verification resource MUST return an empty
6462	   Waveform-URI header.  The default value for this header is "false".
6463	   This header MAY appear in the VERIFY method.  Note that this header
6464	   does not appear in the VERIFY-FROM-BUFFER method since it only
6465	   controls whether or not to save the waveform for live verification /
6466	   identification operations.

6468	   save-waveform            =  "Save-Waveform" ":" BOOLEAN CRLF

6470	11.4.11.  Media-Type

6472	   This header MAY be specified in the SET-PARAMS, GET-PARAMS or the
6473	   VERIFY methods and tells the server resource the Media Type of the
6474	   captured audio or video such as the one captured and returned by the
6475	   Waveform-URI header.

6477	   media-type               =  "Media-Type" ":" media-type-value
6478	                               CRLF

6480	11.4.12.  Waveform-URI

6482	   If the Save-Waveform header is set to true, the verification resource
6483	   MUST attempt to record the incoming audio stream of the verification
6484	   into a file and provide a URI for the client to access it.  This
6485	   header MUST be present in the VERIFICATION-COMPLETE event if the
6486	   Save-Waveform header was set to true by the client.  The value of the
6487	   header MUST be empty if there was some error condition preventing the
6488	   server from recording.  Otherwise, the URI generated by the server
6489	   MUST be globally unique across the server and all its verification
6490	   sessions.  The content MUST be available via the URI until the
6491	   verification session ends.  Since the Save-Waveform header applies
6492	   only to live verification / identification operations, the server can
6493	   return the Waveform-URI only in the VERIFICATION-COMPLETE event for
6494	   live verification / identification operations.

6496	   The server MUST also return the size in bytes and the duration in
6497	   milliseconds of the recorded audio waveform as parameters associated
6498	   with the header.

6500	   waveform-uri             =  "Waveform-URI" ":" ["<" Uri ">"
6501	                               ";" "size" "=" 1*19DIGIT
6502	                               ";" "duration" "=" 1*19DIGIT] CRLF
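
   A client might parse the Waveform-URI value along the lines of the
   following non-normative Python sketch.  The function name and the
   returned dictionary keys are hypothetical.  An empty value, which
   the server returns when recording failed, is mapped to None.

```python
import re

# Mirrors the ABNF above: "<" Uri ">" ";size=" digits ";duration=" digits
_WAVEFORM_RE = re.compile(r"^<([^>]+)>;size=(\d{1,19});duration=(\d{1,19})$")

def parse_waveform_uri(value):
    """Parse the value part of a Waveform-URI header (illustrative)."""
    value = value.strip()
    if not value:
        # Empty header: the server could not record the audio.
        return None
    match = _WAVEFORM_RE.match(value)
    if match is None:
        raise ValueError("malformed Waveform-URI value: %r" % value)
    return {
        "uri": match.group(1),
        "size_bytes": int(match.group(2)),
        "duration_ms": int(match.group(3)),
    }
```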

6504	11.4.13.  Voiceprint-Exists

6506	   This header MUST be returned in QUERY-VOICEPRINT and DELETE-
6507	   VOICEPRINT responses.  This is the status of the voiceprint specified
6508	   in the QUERY-VOICEPRINT method.  For the DELETE-VOICEPRINT method
6509	   this header indicates the status of the voiceprint at the moment the
6510	   method execution started.

6512	   voiceprint-exists    =  "Voiceprint-Exists" ":" BOOLEAN CRLF

6514	11.4.14.  Ver-Buffer-Utterance

6516	   This header is used to indicate that this utterance could be later
6517	   considered for Speaker Verification.  This way, a client can request
6518	   the server to buffer utterances while doing regular recognition or
6519	   verification activities and speaker verification can later be
6520	   requested on the buffered utterances.  This header is OPTIONAL in the
6521	   RECOGNIZE, VERIFY and RECORD methods.  The default value for this
6522	   header is "false".

6524	   ver-buffer-utterance     = "Ver-Buffer-Utterance" ":" BOOLEAN
6525	                              CRLF

6527	11.4.15.  Input-Waveform-URI

6529	   This header specifies stored audio content that the client requests
6530	   the server to fetch and process according to the current verification
6531	   mode, either to train the voiceprint or verify a claimed identity.
6532	   This header enables the client to implement the buffering use case
6533	   where the recognizer and verification resources are in different
6534	   sessions and the verification buffer technique cannot be used.  It
6535	   MAY be specified on the VERIFY request.

6537	   input-waveform-uri           =  "Input-Waveform-URI" ":" Uri CRLF

6539	11.4.16.  Completion-Cause

6541	   This header MUST be part of a VERIFICATION-COMPLETE event from the
6542	   verification resource to the client.  This indicates the cause of
6543	   VERIFY or VERIFY-FROM-BUFFER method completion.  This header MUST be
6544	   sent in the VERIFY, VERIFY-FROM-BUFFER, and QUERY-VOICEPRINT
6545	   responses, if they return with a failure status and a COMPLETE state.

6547	   completion-cause         = "Completion-Cause" ":" 3DIGIT SP
6548	                              1*VCHAR CRLF

6550	   +------------+--------------------------+---------------------------+
6551	   | Cause-Code | Cause-Name               | Description               |
6552	   +------------+--------------------------+---------------------------+
6553	   | 000        | success                  | VERIFY or                 |
6554	   |            |                          | VERIFY-FROM-BUFFER        |
6555	   |            |                          | request completed         |
6556	   |            |                          | successfully.  The verify |
6557	   |            |                          | decision can be           |
6558	   |            |                          | "accepted", "rejected",   |
6559	   |            |                          | or "undecided".           |
6560	   | 001        | error                    | VERIFY or                 |
6561	   |            |                          | VERIFY-FROM-BUFFER        |
6562	   |            |                          | request terminated        |
6563	   |            |                          | prematurely due to a      |
6564	   |            |                          | verification resource or  |
6565	   |            |                          | system error.             |
6566	   | 002        | no-input-timeout         | VERIFY request completed  |
6567	   |            |                          | with no result due to a   |
6568	   |            |                          | no-input-timeout.         |
6569	   | 003        | too-much-speech-timeout  | VERIFY request completed  |
6570	   |            |                          | with no result due to too |
6571	   |            |                          | much speech.              |
6572	   | 004        | speech-too-early         | VERIFY request completed  |
6573	   |            |                          | with no result due to     |
6574	   |            |                          | early speech.             |
6575	   | 005        | buffer-empty             | VERIFY-FROM-BUFFER        |
6576	   |            |                          | request completed with no |
6577	   |            |                          | result due to empty       |
6578	   |            |                          | buffer.                   |
6579	   | 006        | out-of-sequence          | Verification operation    |
6580	   |            |                          | failed due to             |
6581	   |            |                          | out-of-sequence method    |
6582	   |            |                          | invocations.  For example |
6583	   |            |                          | calling VERIFY before     |
6584	   |            |                          | QUERY-VOICEPRINT.         |
6585	   | 007        | repository-uri-failure   | Failure accessing         |
6586	   |            |                          | Repository URI.           |
6587	   | 008        | repository-uri-missing   | Repository-uri is not     |
6588	   |            |                          | specified.                |
6589	   | 009        | voiceprint-id-missing    | Voiceprint-identification |
6590	   |            |                          | is not specified.         |
6591	   | 010        | voiceprint-id-not-exist  | Voiceprint-identification |
6592	   |            |                          | does not exist in the     |
6593	   |            |                          | voiceprint repository.    |
6594	   | 011        | speech-not-usable        | VERIFY request completed  |
6595	   |            |                          | with no result because    |
6596	   |            |                          | the speech was not usable |
6597	   |            |                          | (too noisy, too short,    |
6598	   |            |                          | etc.)                     |
6599	   +------------+--------------------------+---------------------------+
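
   As a non-normative illustration, the table above can be held in a
   lookup structure and used to check a received Completion-Cause
   value.  The Python names below are hypothetical.  The sketch rejects
   values whose cause name does not match the cause code in the table.

```python
# Cause codes and names copied from the table above.
VERIFICATION_CAUSES = {
    0: "success",
    1: "error",
    2: "no-input-timeout",
    3: "too-much-speech-timeout",
    4: "speech-too-early",
    5: "buffer-empty",
    6: "out-of-sequence",
    7: "repository-uri-failure",
    8: "repository-uri-missing",
    9: "voiceprint-id-missing",
    10: "voiceprint-id-not-exist",
    11: "speech-not-usable",
}

def parse_completion_cause(value):
    """Parse '3DIGIT SP cause-name' into (code, name) (illustrative)."""
    code_text, _, name = value.strip().partition(" ")
    if len(code_text) != 3 or not (code_text.isascii() and code_text.isdigit()):
        raise ValueError("cause code must be exactly three digits")
    code = int(code_text)
    if VERIFICATION_CAUSES.get(code) != name:
        raise ValueError("unknown or mismatched cause: %r" % value)
    return code, name
```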

6601	11.4.17.  Completion-Reason

6603	   This header MAY be specified in a VERIFICATION-COMPLETE event coming
6604	   from the verification resource to the client.  It contains text
6605	   describing why the VERIFY request completed, in particular the
6606	   reason for a failure.

6608	   The completion reason text is provided for client use in logs and for
6609	   debugging and instrumentation purposes.  Clients MUST NOT interpret
6610	   the completion reason text.

6612	   completion-reason        =  "Completion-Reason" ":"
6613	                               quoted-string CRLF

6615	11.4.18.  Speech-Complete-Timeout

6617	   This header is the same as the one described for the Recognizer
6618	   resource.  See Section 9.4.15.  This header MAY occur in VERIFY, SET-
6619	   PARAMS, or GET-PARAMS.

6621	11.4.19.  New-Audio-Channel

6623	   This header is the same as the one described for the Recognizer
6624	   resource.  See Section 9.4.23.  This header MAY be specified in a
6625	   VERIFY request.

6627	11.4.20.  Abort-Verification

6629	   This header MUST be sent in a "STOP" request to indicate whether or
6630	   not to abort a VERIFY method in progress.  A value of "true" requests
6631	   the server to discard the results.  A value of "false" requests the
6632	   server to return in the "STOP" response the verification results
6633	   obtained up to the point it received the "STOP" request.

6635	   abort-verification   =  "Abort-Verification" ":" BOOLEAN CRLF

6637	11.4.21.  Start-Input-Timers

6639	   This header MAY be sent as part of a VERIFY request.  A value of
6640	   false tells the verification resource to start the VERIFY operation,
6641	   but not to start the no-input timer yet.  The verification resource
6642	   MUST NOT start the timers until the client sends a START-INPUT-TIMERS
6643	   request to the resource.  This is useful in the scenario when the
6644	   verifier and synthesizer resources are not part of the same session.
6645	   In this scenario, when a kill-on-barge-in prompt is being played, the
6646	   client may want the VERIFY request to be simultaneously active so
6647	   that it can detect and implement kill-on-barge-in.  But at the same
6648	   time the client doesn't want the verification resource to start the
6649	   no-input timers until the prompt is finished.  The default value is
6650	   "true".

6652	   start-input-timers       =  "Start-Input-Timers" ":"
6653	                               BOOLEAN CRLF

6655	11.5.  Verification Message Body

6657	   A verification response or event message may carry additional data as
6658	   described in the following subsection.

6660	11.5.1.  Verification Result Data

6662	   Verification results are returned to the client in the message body
6663	   of the VERIFICATION-COMPLETE event or the GET-INTERMEDIATE-RESULT
6664	   response message as described in Section 6.3.  Element and attribute
6665	   descriptions for the verification portion of the NLSML format are
6666	   provided in Section 11.5.2 with a normative definition of the schema
6667	   in Section 16.3.

6669	11.5.2.  Verification Result Elements

6671	   All verification elements are contained within a single
6672	   <verification-result> element under <result>.  The elements are
6673	   described below and have the schema defined in Section 16.3.  The
6674	   following elements are defined:

6676	   1.   Voiceprint
6677	   2.   Incremental
6678	   3.   Cumulative
6679	   4.   Decision
6680	   5.   Utterance-Length
6681	   6.   Device
6682	   7.   Gender
6683	   8.   Adapted
6684	   9.   Verification-Score
6685	   10.  Vendor-Specific-Results

6687	11.5.2.1.  Voiceprint

6689	   This element in the verification results provides information on how
6690	   the speech data matched a single voiceprint.  The result data
6691	   returned may have more than one such entity in the case of
6692	   Identification or Multi-Verification.  Each "<voiceprint>" element
6693	   and the XML data within the element describe verification result
6694	   information for how well the speech data matched that particular
6695	   voiceprint.  The list of voiceprint element data are ordered
6696	   according to their cumulative verification match scores, with the
6697	   highest score first.

6699	11.5.2.2.  Cumulative

6701	   Within each "<voiceprint>" element there MUST be a "<cumulative>"
6702	   element with the cumulative scores of how well multiple utterances
6703	   matched the voiceprint.

6705	11.5.2.3.  Incremental

6707	   The first "<voiceprint>" element MAY contain an "<incremental>"
6708	   element with the incremental scores of how well the last utterance
6709	   matched the voiceprint.

6711	11.5.2.4.  Decision

6713	   This element is found within the "<incremental>" or "<cumulative>"
6714	   element within the verification results.  Its value indicates the
6715	   verification decision.  It can have the values of "accepted",
6716	   "rejected" or "undecided".

6718	11.5.2.5.  Utterance-Length

6720	   This element MAY occur within either the "<incremental>" or
6721	   "<cumulative>" elements within the first "<voiceprint>" element.  Its
6722	   value indicates the length in milliseconds of the last utterance or
6723	   of the accumulated set of utterances, respectively.

6725	11.5.2.6.  Device

6727	   This element is found within the incremental or cumulative element
6728	   within the verification results.  Its value indicates the apparent
6729	   type of device used by the caller as determined by the verification
6730	   resource.  It can have the values of "cellular-phone", "electret-
6731	   phone", "carbon-button-phone", or "unknown".

6733	11.5.2.7.  Gender

6735	   This element is found within the incremental or cumulative element
6736	   within the verification results.  Its value indicates the apparent
6737	   gender of the speaker as determined by the verification resource.  It
6738	   can have the values of "male", "female" or "unknown".

6740	11.5.2.8.  Adapted

6742	   This element is found within the first "<voiceprint>" element within
6743	   the verification results.  When verification is trying to confirm the
6744	   voiceprint, this indicates if the voiceprint has been adapted as a
6745	   consequence of analyzing the source utterances.  It is not returned
6746	   during verification training.  The value can be "true" or "false".

6748	11.5.2.9.  Verification-Score

6750	   This element is found within the incremental or cumulative element
6751	   within the verification results.  Its value indicates the score of
6752	   the last utterance as determined by verification.

6754	   During verification, the higher the score the more likely it is that
6755	   the speaker is the same one as the one who spoke the voiceprint
6756	   utterances.  During training, the higher the score the more likely
6757	   the speaker is to have spoken all of the analyzed utterances.  The
6758	   value is a floating-point number between -1.0 and 1.0.  If there are
6759	   no such utterances, the score is -1.  Note that the verification
6760	   score is not a probability value.

6762	11.5.2.10.  Vendor-Specific-Results

6764	   Verification results may contain implementation specific data which
6765	   augment the information provided by the MRCPv2-defined elements.
6766	   These may be useful to clients who have private knowledge of how to
6767	   interpret these schema extensions.  Implementation specific additions
6768	   to the verification results schema MUST belong to the vendor's own
6769	   namespace.  In the result structure, they must either be indicated by
6770	   a namespace prefix declared within the result or must be children of
6771	   an element identified as belonging to the respective namespace.

6773	   The following example shows the results of three voiceprints.  Note
6774	   that the first one has crossed the verification score threshold, and
6775	   the speaker has been accepted.  The voiceprint was also adapted with
6776	   the most recent utterance.

6778	   <?xml version="1.0"?>
6779	   <result xmlns="http://www.ietf.org/xml/ns/mrcpv2"
6780	           grammar="What-Grammar-URI">
6781	     <verification-result>
6782	       <voiceprint id="johnsmith">
6783	         <adapted> true </adapted>
6784	         <incremental>
6785	           <utterance-length> 500 </utterance-length>
6786	           <device> cellular-phone </device>
6787	           <gender> male </gender>
6788	           <decision> accepted </decision>
6789	           <verification-score> 0.98514 </verification-score>
6790	         </incremental>
6791	         <cumulative>
6792	           <utterance-length> 10000 </utterance-length>
6793	           <device> cellular-phone </device>
6794	           <gender> male </gender>
6795	           <decision> accepted </decision>
6796	           <verification-score> 0.96725 </verification-score>
6797	         </cumulative>
6798	       </voiceprint>
6799	       <voiceprint id="marysmith">
6800	         <cumulative>
6801	           <verification-score> 0.93410 </verification-score>
6802	         </cumulative>
6803	       </voiceprint>
6804	       <voiceprint id="juniorsmith">
6805	         <cumulative>
6806	           <verification-score> 0.74209 </verification-score>
6807	         </cumulative>
6808	       </voiceprint>
6809	     </verification-result>
6810	   </result>

6812	                      Verification Results Example 1

6814	   In this next example, the verifier has enough information to decide
6815	   to reject the speaker.

6817	   <?xml version="1.0"?>
6818	   <result xmlns="http://www.ietf.org/xml/ns/mrcpv2"
6819	           xmlns:xmpl="http://www.example.org/2003/12/mrcpv2"
6820	           grammar="What-Grammar-URI">
6821	     <verification-result>
6822	       <voiceprint id="johnsmith">
6823	         <incremental>
6824	           <utterance-length> 500 </utterance-length>
6825	           <device> cellular-phone </device>
6826	           <gender> male </gender>
6827	           <verification-score> 0.88514 </verification-score>
6828	           <xmpl:raspiness> high </xmpl:raspiness>
6829	           <xmpl:emotion> sadness </xmpl:emotion>
6830	         </incremental>
6831	         <cumulative>
6832	           <utterance-length> 10000 </utterance-length>
6833	           <device> cellular-phone </device>
6834	           <gender> male </gender>
6835	           <decision> rejected </decision>
6836	           <verification-score> 0.9345 </verification-score>
6837	         </cumulative>
6838	       </voiceprint>
6839	     </verification-result>
6840	   </result>

6842	                      Verification Results Example 2
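
   The following non-normative Python sketch shows how a client might
   read the cumulative score and decision for each voiceprint from a
   result body like the examples above.  It uses the standard
   xml.etree.ElementTree module.  The function name and the shortened
   sample document are hypothetical, and the sketch does not validate
   the body against the schema.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://www.ietf.org/xml/ns/mrcpv2"}

SAMPLE = """<result xmlns="http://www.ietf.org/xml/ns/mrcpv2">
  <verification-result>
    <voiceprint id="johnsmith">
      <cumulative>
        <decision> accepted </decision>
        <verification-score> 0.96725 </verification-score>
      </cumulative>
    </voiceprint>
    <voiceprint id="marysmith">
      <cumulative>
        <verification-score> 0.93410 </verification-score>
      </cumulative>
    </voiceprint>
  </verification-result>
</result>"""

def summarize(xml_text):
    """Return (id, cumulative score, decision or None) per voiceprint."""
    root = ET.fromstring(xml_text)
    rows = []
    for vp in root.findall("m:verification-result/m:voiceprint", NS):
        cumulative = vp.find("m:cumulative", NS)
        if cumulative is None:
            continue  # <cumulative> is required; skip malformed entries
        score = cumulative.findtext("m:verification-score", namespaces=NS)
        decision = cumulative.findtext("m:decision", namespaces=NS)
        rows.append((
            vp.get("id"),
            float(score) if score is not None else None,
            decision.strip() if decision is not None else None,
        ))
    return rows
```

   Element values are stripped of surrounding whitespace because the
   examples pad them with spaces.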

6844	11.6.  START-SESSION

6846	   The START-SESSION method starts a Speaker Verification or
6847	   Identification session.  Execution of this method places the
6848	   verification resource into its initial state.  If this method is
6849	   called during an ongoing verification session, the previous session
6850	   is implicitly aborted.  If this method is invoked when VERIFY or
6851	   VERIFY-FROM-BUFFER is active, the method fails and the server returns
6852	   a status code of 402.

6854	   Upon completion of the START-SESSION method, the verification
6855	   resource MUST have terminated any ongoing verification session, and
6856	   cleared any voiceprint designation.

6858	   A verification session is associated with the voiceprint repository
6859	   to be used during the session.  This is specified through the
6860	   "Repository-URI" header (see Section 11.4.1).

6862	   The START-SESSION method also establishes, through the Voiceprint-
6863	   Identifier header, which voiceprints are to be matched or trained
6864	   during the verification session.  If this is an Identification
6865	   session or if the client wants to do Multi-Verification, the
6866	   Voiceprint-Identifier header contains a list of semicolon-separated
6867	   voiceprint identifiers.

6869	   The header "Adapt-Model" may also be present in the START-SESSION
6870	   request to indicate whether or not to adapt a voiceprint based on
6871	   data collected during the session (if the voiceprint verification
6872	   phase succeeds).  By default, the voiceprint model MUST NOT be
6873	   adapted with data from a verification session.

6875	   The START-SESSION method also determines whether the session trains
6876	   or verifies a voiceprint.  Hence, the Verification-Mode header MUST
6877	   be sent in every START-SESSION request.  The value of the
6878	   Verification-Mode header MUST be either "train" or "verify".

6880	   Before a verification/identification session is started, only VERIFY-
6881	   ROLLBACK and generic "SET-PARAMS" and "GET-PARAMS" operations may be
6882	   performed on the verification resource.  The server MUST return 402
6883	   "Method not valid in this state" for all other verification
6884	   operations.

6886	   A verification resource may only have a single session active at one
6887	   time.
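
   The state rules above can be illustrated with a small, non-normative
   Python model.  The class VerifierResource and its method names are
   hypothetical; only the 200 and 402 status codes and the rules they
   encode are taken from the text above.  Methods other than
   START-SESSION and VERIFY are deliberately left out of the sketch.

```python
class VerifierResource:
    """Toy model of START-SESSION and VERIFY state handling."""

    def __init__(self):
        self.session_active = False
        self.verify_active = False
        self.mode = None
        self.voiceprints = []

    def start_session(self, mode, voiceprints):
        # START-SESSION fails with 402 while VERIFY or
        # VERIFY-FROM-BUFFER is active.
        if self.verify_active:
            return 402
        # Any previous session is implicitly aborted and its voiceprint
        # designation cleared; only one session exists at a time.
        self.session_active = True
        self.mode = mode
        self.voiceprints = list(voiceprints)
        return 200

    def verify(self):
        # VERIFY is not valid before a session has been started.
        if not self.session_active:
            return 402
        self.verify_active = True
        return 200

    def verification_complete(self):
        # Called when the server has sent VERIFICATION-COMPLETE.
        self.verify_active = False


resource = VerifierResource()
print(resource.verify())                                     # no session
print(resource.start_session("verify", ["johnsmith.voiceprint"]))
print(resource.verify())
print(resource.start_session("train", ["marysmith.voiceprint"]))
```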

6889	   C->S:  MRCP/2.0 123 START-SESSION 314161
6890	          Channel-Identifier:32AECB23433801@speakverify
6891	          Repository-URI:http://www.example.com/voiceprintdbase/
6892	          Verification-Mode:verify
6893	          Voiceprint-Identifier:johnsmith.voiceprint
6894	          Adapt-Model:true

6896	   S->C:  MRCP/2.0 49 314161 200 COMPLETE
6897	          Channel-Identifier:32AECB23433801@speakverify
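
   The following Python sketch shows how a client might assemble a
   START-SESSION request like the one above.  It assumes, based on the
   message format defined elsewhere in this specification, that the
   message-length token counts every octet of the message including the
   start-line, and that each header line and the header block end with
   CRLF.  The helper names build_request and start_session_request are
   hypothetical.

```python
def build_request(method, request_id, headers, body=b""):
    """Serialize a request, computing a self-consistent message-length."""
    header_block = "".join(f"{name}:{value}\r\n" for name, value in headers)
    tail = header_block.encode("utf-8") + b"\r\n" + body
    # The length field counts its own digits, so iterate until the
    # value written in the start-line equals the total size.
    length = 0
    while True:
        start = f"MRCP/2.0 {length} {method} {request_id}\r\n".encode("ascii")
        total = len(start) + len(tail)
        if total == length:
            return start + tail
        length = total


def start_session_request(request_id, channel, repository_uri, mode,
                          voiceprint_ids, adapt_model=None):
    """Build START-SESSION; Verification-Mode is always included."""
    if mode not in ("train", "verify"):
        raise ValueError('Verification-Mode must be "train" or "verify"')
    headers = [
        ("Channel-Identifier", channel),
        ("Repository-URI", repository_uri),
        ("Verification-Mode", mode),
        # Several identifiers are joined with semicolons.
        ("Voiceprint-Identifier", ";".join(voiceprint_ids)),
    ]
    if adapt_model is not None:
        headers.append(("Adapt-Model", "true" if adapt_model else "false"))
    return build_request("START-SESSION", request_id, headers)


message = start_session_request(
    314161,
    "32AECB23433801@speakverify",
    "http://www.example.com/voiceprintdbase/",
    "verify",
    ["johnsmith.voiceprint", "marysmith.voiceprint"],
    adapt_model=True,
)
print(message.decode("utf-8"))
```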

6899	11.7.  END-SESSION

6901	   The END-SESSION method terminates an ongoing verification session and
6902	   releases the verification voiceprint resources.  The session may
6903	   terminate in one of three ways:
6904	   a.  abort - the voiceprint adaptation or creation may be aborted so
6905	       that the voiceprint remains unchanged (or is not created).
6906	   b.  commit - when terminating a voiceprint training session, the new
6907	       voiceprint is committed to the repository.
6908	   c.  adapt - an existing voiceprint is modified using a successful
6909	       verification.

6911	   The header "Abort-Model" MAY be included in the END-SESSION request
6912	   to control whether or not to abort any pending changes to the
6913	   voiceprint.  The default behavior is to commit (not abort) any
6914	   pending changes to the designated voiceprint.

6916	   The END-SESSION method may be safely executed multiple times without
6917	   first executing the START-SESSION method.  Any additional executions
6918	   of this method without an intervening use of the START-SESSION method
6919	   have no effect on the verification resource.

6921	   The following example assumes there is either a training session or a
6922	   verification session in progress.

6924	   C->S:  MRCP/2.0 123 END-SESSION 314174
6925	          Channel-Identifier:32AECB23433801@speakverify
6926	          Abort-Model:true

6928	   S->C:  MRCP/2.0 49 314174 200 COMPLETE
6929	          Channel-Identifier:32AECB23433801@speakverify

6931	11.8.  QUERY-VOICEPRINT

6933	   The QUERY-VOICEPRINT method is used to get status information on a
6934	   particular voiceprint and can be used by the client to ascertain if a
6935	   voiceprint or repository exists and if it contains trained
6936	   voiceprints.

6938	   The response to the QUERY-VOICEPRINT request contains an indication
6939	   of the status of the designated voiceprint in the "Voiceprint-Exists"
6940	   header, allowing the client to determine whether to use the current
6941	   voiceprint for verification, train a new voiceprint, or choose a
6942	   different voiceprint.

6944	   A voiceprint is completely specified by providing a repository
6945	   location and a voiceprint identifier.  The particular voiceprint or
6946	   identity within the repository is specified by a string identifier
6947	   that is unique within the repository.  The "Voiceprint-Identifier"
6948	   header carries this unique voiceprint identifier within a given
6949	   repository.

6951	   The following example assumes a verification session is in progress
6952	   and the voiceprint exists in the voiceprint repository.

6954	   C->S:  MRCP/2.0 123 QUERY-VOICEPRINT 314168
6955	          Channel-Identifier:32AECB23433801@speakverify
6956	          Repository-URI:http://www.example.com/voiceprints/
6957	          Voiceprint-Identifier:johnsmith.voiceprint

6959	   S->C:  MRCP/2.0 123 314168 200 COMPLETE
6960	          Channel-Identifier:32AECB23433801@speakverify
6961	          Repository-URI:http://www.example.com/voiceprints/
6962	          Voiceprint-Identifier:johnsmith.voiceprint
6963	          Voiceprint-Exists:true

6965	   The following example assumes that the URI provided in the
6966	   'Repository-URI' header is a bad URI.

6968	   C->S:  MRCP/2.0 123 QUERY-VOICEPRINT 314168
6969	          Channel-Identifier:32AECB23433801@speakverify
6970	          Repository-URI:http://www.example.com/bad-uri/
6971	          Voiceprint-Identifier:johnsmith.voiceprint

6973	   S->C:  MRCP/2.0 123 314168 405 COMPLETE
6974	          Channel-Identifier:32AECB23433801@speakverify
6975	          Repository-URI:http://www.example.com/bad-uri/
6976	          Voiceprint-Identifier:johnsmith.voiceprint
6977	          Completion-Cause:007 repository-uri-failure
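
   A client typically needs to tell these two outcomes apart.  The
   Python sketch below, which is illustrative only, parses a response
   in the layout shown above and classifies it.  The function names
   parse_response and voiceprint_status and the labels "exists",
   "absent", and "failed" are assumptions of this sketch.

```python
def parse_response(message: str) -> dict:
    """Split a response into start-line fields, headers, and body."""
    head, _, body = message.partition("\r\n\r\n")
    lines = head.split("\r\n")
    _version, _length, request_id, status, state = lines[0].split()
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; URI values contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {
        "request_id": int(request_id),
        "status": int(status),
        "state": state,
        "headers": headers,
        "body": body,
    }


def voiceprint_status(response: dict) -> tuple:
    """Return (label, detail) for a QUERY-VOICEPRINT response."""
    headers = response["headers"]
    if response["status"] != 200:
        return ("failed", headers.get("completion-cause", ""))
    exists = headers.get("voiceprint-exists", "").lower() == "true"
    label = "exists" if exists else "absent"
    return (label, headers.get("voiceprint-identifier", ""))


FOUND = "\r\n".join([
    "MRCP/2.0 123 314168 200 COMPLETE",
    "Channel-Identifier:32AECB23433801@speakverify",
    "Repository-URI:http://www.example.com/voiceprints/",
    "Voiceprint-Identifier:johnsmith.voiceprint",
    "Voiceprint-Exists:true",
    "", "",
])
BAD_URI = "\r\n".join([
    "MRCP/2.0 123 314168 405 COMPLETE",
    "Channel-Identifier:32AECB23433801@speakverify",
    "Repository-URI:http://www.example.com/bad-uri/",
    "Voiceprint-Identifier:johnsmith.voiceprint",
    "Completion-Cause:007 repository-uri-failure",
    "", "",
])

print(voiceprint_status(parse_response(FOUND)))
print(voiceprint_status(parse_response(BAD_URI)))
```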

6979	11.9.  DELETE-VOICEPRINT

6981	   The DELETE-VOICEPRINT method removes a voiceprint from a repository.
6982	   This method MUST carry the Repository-URI and Voiceprint-Identifier
6983	   header fields.

6985	   If the corresponding voiceprint does not exist, the DELETE-VOICEPRINT
6986	   method MUST return a 200 status code.

6988	   The following example demonstrates a DELETE-VOICEPRINT operation to
6989	   remove a specific voiceprint.

6991	   C->S:  MRCP/2.0 123 DELETE-VOICEPRINT 314168
6992	          Channel-Identifier:32AECB23433801@speakverify
6993	          Repository-URI:http://www.example.com/voiceprints/
6994	          Voiceprint-Identifier:johnsmith.voiceprint

6996	   S->C:  MRCP/2.0 49 314168 200 COMPLETE
6997	          Channel-Identifier:32AECB23433801@speakverify

6999	11.10.  VERIFY

7001	   The VERIFY method is used to request the verification resource to
7002	   either train/adapt the voiceprint or to verify/identify a claimed
7003	   identity.  If the voiceprint is new or was deleted by a previous
7004	   DELETE-VOICEPRINT method, the VERIFY method trains the voiceprint.
7005	   If the voiceprint already exists, it is adapted and not retrained by
7006	   the VERIFY command.

7008	   C->S:  MRCP/2.0 49 VERIFY 543260
7009	          Channel-Identifier:32AECB23433801@speakverify

7011	   S->C:  MRCP/2.0 49 543260 200 IN-PROGRESS
7012	          Channel-Identifier:32AECB23433801@speakverify

7014	   When the VERIFY request completes, the MRCPv2 server MUST send a
7015	   'VERIFICATION-COMPLETE' event to the client.

7017	11.11.  VERIFY-FROM-BUFFER

7019	   The VERIFY-FROM-BUFFER method directs the verification resource to
7020	   verify buffered audio against a voiceprint.  Only one VERIFY or
7021	   VERIFY-FROM-BUFFER method may be active for a verification resource
7022	   at a time.

7024	   The buffered audio is not consumed by this method and thus VERIFY-
7025	   FROM-BUFFER may be invoked multiple times by the client to attempt
7026	   verification against different voiceprints.

7028	   For the VERIFY-FROM-BUFFER method, the server MAY return
7029	   an "IN-PROGRESS" response before the "VERIFICATION-COMPLETE" event.

7031	   When the VERIFY-FROM-BUFFER method is invoked and the verification
7032	   buffer is in use by another resource sharing it, the server MUST
7033	   return an IN-PROGRESS response and wait until the buffer is available
7034	   to it.  The verification buffer is owned by the verification resource
7035	   but is shared with write access from other input resources on the
7036	   same session.  Hence, it is considered to be in use if there is a
7037	   read or write operation such as a RECORD or RECOGNIZE with the Ver-
7038	   Buffer-Utterance header set to "true" on a resource that shares this
7039	   buffer.  Note that if a RECORD or RECOGNIZE method returns with a
7040	   failure cause code, the VERIFY-FROM-BUFFER request waiting to process
7041	   that buffer MUST also fail with a Completion-Cause of 005 (buffer-
7042	   empty).
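
   The buffer-sharing rules above can be sketched as a small Python
   model.  This is a simplified illustration, not server code: the
   class VerificationBuffer, its fields, and the strings returned by
   next_step are hypothetical, and concurrency is reduced to a counter
   of active writers.

```python
from dataclasses import dataclass


@dataclass
class VerificationBuffer:
    """Buffer owned by the verifier and written by other resources."""
    audio: bytes = b""
    active_writers: int = 0
    last_write_failed: bool = False

    def begin_write(self):
        # For example, RECORD or RECOGNIZE with Ver-Buffer-Utterance
        # set to "true" on a resource that shares this buffer.
        self.active_writers += 1
        self.last_write_failed = False

    def end_write(self, audio: bytes, failed: bool = False):
        self.active_writers -= 1
        if failed:
            self.last_write_failed = True
        else:
            self.audio += audio

    @property
    def in_use(self) -> bool:
        return self.active_writers > 0


def next_step(buffer: VerificationBuffer) -> str:
    """Decide what a VERIFY-FROM-BUFFER request should do next."""
    if buffer.in_use:
        return "wait"            # server has returned IN-PROGRESS
    if buffer.last_write_failed:
        return "fail 005 buffer-empty"
    return "verify"              # buffered audio is read, not consumed


buffer = VerificationBuffer()
buffer.begin_write()
print(next_step(buffer))
buffer.end_write(b"utterance")
print(next_step(buffer))
```

   Because next_step never clears the audio field, the same buffered
   utterance can be checked against several voiceprints in turn.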

7044	   The following example illustrates the usage of some buffering
7045	   methods.  In this scenario, the client first performs a live
7046	   verification, but the utterance is rejected.  Meanwhile, the
7047	   utterance is also saved to the audio buffer.  Then, another
7048	   voiceprint is used to do verification against the audio buffer and
7049	   the utterance is accepted.  For the example, we assume both Num-Min-
7050	   Verification-Phrases and Num-Max-Verification-Phrases are 1.

7052	   C->S:  MRCP/2.0 123 START-SESSION 314161
7053	          Channel-Identifier:32AECB23433801@speakverify
7054	          Verification-Mode:verify
7055	          Adapt-Model:true
7056	          Repository-URI:http://www.example.com/voiceprints
7057	          Voiceprint-Identifier:johnsmith.voiceprint

7059	   S->C:  MRCP/2.0 49 314161 200 COMPLETE
7060	          Channel-Identifier:32AECB23433801@speakverify

7062	   C->S:  MRCP/2.0 123 VERIFY 314162
7063	          Channel-Identifier:32AECB23433801@speakverify
7064	          Ver-Buffer-Utterance:true

7066	   S->C:  MRCP/2.0 49 314162 200 IN-PROGRESS
7067	          Channel-Identifier:32AECB23433801@speakverify

7069	   S->C:  MRCP/2.0 123 VERIFICATION-COMPLETE 314162 COMPLETE
7070	          Channel-Identifier:32AECB23433801@speakverify
7071	          Completion-Cause:000 success
7072	          Content-Type:application/nlsml+xml
7073	          Content-Length:...

7075	          <?xml version="1.0"?>
7076	          <result xmlns="http://www.ietf.org/xml/ns/mrcpv2"
7077	                  grammar="What-Grammar-URI">
7078	            <verification-result>
7079	              <voiceprint id="johnsmith">
7080	                <incremental>
7081	                  <utterance-length> 500 </utterance-length>
7082	                  <device> cellular-phone </device>
7083	                  <gender> female </gender>
7084	                  <decision> rejected </decision>
7085	                  <verification-score> 0.05465 </verification-score>
7086	                </incremental>
7087	                <cumulative>
7088	                  <utterance-length> 500 </utterance-length>
7089	                  <device> cellular-phone </device>
7090	                  <gender> female </gender>
7091	                  <decision> rejected </decision>
7092	                  <verification-score> 0.05465 </verification-score>
7093	                </cumulative>
7094	              </voiceprint>

7096	            </verification-result>
7097	          </result>

7099	   C->S:  MRCP/2.0 123 QUERY-VOICEPRINT 314163
7100	          Channel-Identifier:32AECB23433801@speakverify
7101	          Repository-URI:http://www.example.com/voiceprints/
7102	          Voiceprint-Identifier:johnsmith.voiceprint

7104	   S->C:  MRCP/2.0 123 314163 200 COMPLETE
7105	          Channel-Identifier:32AECB23433801@speakverify
7106	          Repository-URI:http://www.example.com/voiceprints/
7107	          Voiceprint-Identifier:johnsmith.voiceprint
7108	          Voiceprint-Exists:true

7110	   C->S:  MRCP/2.0 123 START-SESSION 314164
7111	          Channel-Identifier:32AECB23433801@speakverify
7112	          Verification-Mode:verify
7113	          Adapt-Model:true
7114	          Repository-URI:http://www.example.com/voiceprints
7115	          Voiceprint-Identifier:marysmith.voiceprint

7117	   S->C:  MRCP/2.0 49 314164 200 COMPLETE
7118	          Channel-Identifier:32AECB23433801@speakverify

7120	   C->S:  MRCP/2.0 123 VERIFY-FROM-BUFFER 314165
7121	          Channel-Identifier:32AECB23433801@speakverify

7123	   S->C:  MRCP/2.0 49 314165 200 IN-PROGRESS
7124	          Channel-Identifier:32AECB23433801@speakverify

7126	   S->C:  MRCP/2.0 123 VERIFICATION-COMPLETE 314165 COMPLETE
7127	          Channel-Identifier:32AECB23433801@speakverify
7128	          Completion-Cause:000 success
7129	          Content-Type:application/nlsml+xml
7130	          Content-Length:...

7132	          <?xml version="1.0"?>
7133	          <result xmlns="http://www.ietf.org/xml/ns/mrcpv2"
7134	                  grammar="What-Grammar-URI">
7135	            <verification-result>
7136	              <voiceprint id="marysmith">
7137	                <incremental>
7138	                  <utterance-length> 1000 </utterance-length>
7139	                  <device> cellular-phone </device>
7140	                  <gender> female </gender>
7141	                  <decision> accepted </decision>
7142	                  <verification-score> 0.98 </verification-score>
7143	                </incremental>
7144	                <cumulative>
7145	                  <utterance-length> 1000 </utterance-length>
7146	                  <device> cellular-phone </device>
7147	                  <gender> female </gender>
7148	                  <decision> accepted </decision>
7149	                  <verification-score> 0.98 </verification-score>
7150	                </cumulative>
7151	              </voiceprint>
7152	            </verification-result>
7153	          </result>

7155	   C->S:  MRCP/2.0 49 END-SESSION 314166
7156	          Channel-Identifier:32AECB23433801@speakverify

7158	   S->C:  MRCP/2.0 49 314166 200 COMPLETE
7159	          Channel-Identifier:32AECB23433801@speakverify

7161	                        VERIFY-FROM-BUFFER example

7163	11.12.  VERIFY-ROLLBACK

7165	   The VERIFY-ROLLBACK method discards the last buffered utterance or
7166	   discards the last live utterances (when the mode is "train" or
7167	   "verify").  The client should invoke this method when the user
7168	   provides undesirable input such as non-speech noises, side-speech,
7169	   out-of-grammar utterances, commands, etc.  Note that this method does
7170	   not provide a stack of rollback states.  If VERIFY-ROLLBACK is
7171	   executed twice in succession without an intervening recognition
7172	   operation, the second invocation has no effect.

7174	   C->S:  MRCP/2.0 49 VERIFY-ROLLBACK 314165
7175	          Channel-Identifier:32AECB23433801@speakverify

7177	   S->C:  MRCP/2.0 49 314165 200 COMPLETE
7178	          Channel-Identifier:32AECB23433801@speakverify

7180	                          VERIFY-ROLLBACK Example
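
   The single-level nature of the rollback can be pictured with the
   following Python sketch.  The class UtteranceHistory is hypothetical
   and stands in for whatever state a server keeps; it only shows that
   one utterance at most is eligible for rollback at any time.

```python
class UtteranceHistory:
    """Keeps utterances with a single rollback slot, not a stack."""

    def __init__(self):
        self.committed = []   # can no longer be rolled back
        self.last = None      # the only utterance eligible for rollback

    def add(self, utterance):
        # A new utterance makes the previous one permanent.
        if self.last is not None:
            self.committed.append(self.last)
        self.last = utterance

    def rollback(self) -> bool:
        """Discard the last utterance; return False if none is pending."""
        if self.last is None:
            return False
        self.last = None
        return True

    def utterances(self):
        pending = [self.last] if self.last is not None else []
        return self.committed + pending


history = UtteranceHistory()
history.add("utterance-1")
history.add("cough")
print(history.rollback())    # discards "cough"
print(history.rollback())    # nothing left to discard
print(history.utterances())
```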

7182	11.13.  STOP

7184	   The "STOP" method from the client to the server tells the
7185	   verification resource to stop the VERIFY or VERIFY-FROM-BUFFER
7186	   request if one is active.  If such a request is active and the "STOP"
7187	   request successfully terminated it, then the response header contains
7188	   an active-request-id-list header containing the request-id of the
7189	   VERIFY or VERIFY-FROM-BUFFER request that was terminated.  In this
7190	   case, no VERIFICATION-COMPLETE event is sent for the terminated
7191	   request.  If there was no verify request active, then the response
7192	   MUST NOT contain an active-request-id-list header.  Either way the
7193	   response MUST contain a status of 200 (Success).

7195	   The "STOP" method can carry an "Abort-Verification" header which
7196	   specifies if the verification result until that point should be
7197	   discarded or returned.  If this header is not present or if the value
7198	   is "true", the verification result is discarded and the "STOP"
7199	   response does not contain any result data.  If the header is present
7200	   and its value is "false", the "STOP" response MUST contain a
7201	   "Completion-Cause" header and carry the Verification result data in
7202	   its body.

7204	   An aborted VERIFY request does an automatic roll-back and hence does
7205	   not affect the cumulative score.  A VERIFY request that was stopped
7206	   with no "Abort-Verification" header or with the "Abort-Verification"
7207	   header set to "false" does affect cumulative scores and would need to
7208	   be explicitly rolled back if the client does not want the
7209	   verification result considered in the cumulative scores.

7211	   The following example assumes a voiceprint identity has already been
7212	   established.

7214	   C->S:  MRCP/2.0 123 VERIFY 314177
7215	          Channel-Identifier:32AECB23433801@speakverify

7217	   S->C:  MRCP/2.0 49 314177 200 IN-PROGRESS
7218	          Channel-Identifier:32AECB23433801@speakverify

7220	   C->S:  MRCP/2.0 49 STOP 314178
7221	          Channel-Identifier:32AECB23433801@speakverify

7223	   S->C:  MRCP/2.0 123 314178 200 COMPLETE
7224	          Channel-Identifier:32AECB23433801@speakverify
7225	          Active-Request-Id-List:314177

7227	                         STOP verification Example
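
   The response rules in the first two paragraphs of this section can
   be summarized in a short Python sketch.  It is non-normative: the
   function stop_response and its parameters are hypothetical, and the
   Completion-Cause value passed in the usage lines is only a
   placeholder.

```python
def stop_response(active_request_id=None, abort_verification=None,
                  completion_cause=None, partial_result=None):
    """Return (status, headers, body) for a STOP request.

    abort_verification is None when the header is absent, otherwise
    True or False.
    """
    headers = {}
    body = None
    if active_request_id is not None:
        # A VERIFY or VERIFY-FROM-BUFFER request was terminated.
        headers["Active-Request-Id-List"] = str(active_request_id)
        if abort_verification is False:
            # Results collected so far are returned to the client.
            headers["Completion-Cause"] = completion_cause
            body = partial_result
        # Header absent or "true": the result is discarded.
    # The status is 200 whether or not a request was active.
    return 200, headers, body


print(stop_response())
print(stop_response(active_request_id=314177))
print(stop_response(active_request_id=314177, abort_verification=False,
                    completion_cause="000 success",
                    partial_result="<result>...</result>"))
```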

7229	11.14.  START-INPUT-TIMERS

7231	   This request is sent from the client to the verification resource to
7232	   start the no-input timer, usually once the client has ascertained
7233	   that any audio prompts to the user have played to completion.

7235	   C->S:  MRCP/2.0 49 START-INPUT-TIMERS 543260
7236	          Channel-Identifier:32AECB23433801@speakverify

7238	   S->C:  MRCP/2.0 49 543260 200 COMPLETE
7239	          Channel-Identifier:32AECB23433801@speakverify

7241	11.15.  VERIFICATION-COMPLETE

7243	   The VERIFICATION-COMPLETE event follows a call to VERIFY or VERIFY-
7244	   FROM-BUFFER and is used to communicate the verification results to
7245	   the client.  The event message body contains only verification
7246	   results.

7248	   S->C:  MRCP/2.0 123 VERIFICATION-COMPLETE 543259 COMPLETE
	          Channel-Identifier:32AECB23433801@speakverify
7249	          Completion-Cause:000 success
7250	          Content-Type:application/nlsml+xml
7251	          Content-Length:...

7253	          <?xml version="1.0"?>
7254	          <result xmlns="http://www.ietf.org/xml/ns/mrcpv2"
7255	                  grammar="What-Grammar-URI">
7256	            <verification-result>
7257	              <voiceprint id="johnsmith">
7258	                <incremental>
7259	                  <utterance-length> 500 </utterance-length>
7260	                  <device> cellular-phone </device>
7261	                  <gender> male </gender>
7262	                  <decision> accepted </decision>
7263	                  <verification-score> 0.85 </verification-score>
7264	                </incremental>
7265	                <cumulative>
7266	                  <utterance-length> 1500 </utterance-length>
7267	                  <device> cellular-phone </device>
7268	                  <gender> male </gender>
7269	                  <decision> accepted </decision>
7270	                  <verification-score> 0.75 </verification-score>
7271	                </cumulative>
7272	              </voiceprint>
7273	            </verification-result>
7274	          </result>

7276	11.16.  START-OF-INPUT

7278	   The START-OF-INPUT event is returned from the server to the client
7279	   once the server has detected speech.  This event is always returned
7280	   by the verification resource when speech has been detected,
7281	   irrespective of whether the recognizer and verification resources
7282	   share the same session or not.

7284	   S->C:  MRCP/2.0 49 START-OF-INPUT 543259 IN-PROGRESS
7285	          Channel-Identifier:32AECB23433801@speakverify

7287	11.17.  CLEAR-BUFFER

7289	   The CLEAR-BUFFER method can be used to clear the verification buffer.
7290	   This buffer holds speech captured during recognition, record, or
7291	   verification operations for later use by VERIFY-FROM-BUFFER.
7292	   As noted before, the buffer associated with the verification resource
7293	   is shared by other input resources like recognizers and recorders.
7294	   Hence, a CLEAR-BUFFER request fails if the verification buffer is in
7295	   use.  This can happen when any one of the input resources that shares
7296	   this buffer has an active read or write operation such as RECORD,
7297	   RECOGNIZE or VERIFY with the Ver-Buffer-Utterance header set to
7298	   "true".

7300	   C->S:  MRCP/2.0 49 CLEAR-BUFFER 543260
7301	          Channel-Identifier:32AECB23433801@speakverify

7303	   S->C:  MRCP/2.0 49 543260 200 COMPLETE
7304	          Channel-Identifier:32AECB23433801@speakverify

7306	11.18.  GET-INTERMEDIATE-RESULT

7308	   A client can use the GET-INTERMEDIATE-RESULT method to poll for
7309	   intermediate results of a verification request that is in progress.
7310	   Invoking this method does not change the state of the resource.  The
7311	   verification resource collects the accumulated verification results
7312	   and returns the information in the method response.  The message body
7313	   in the response to a GET-INTERMEDIATE-RESULT request contains only
7314	   verification results.  The method response MUST NOT contain a
7315	   Completion-Cause header as the request is not yet complete.  If the
7316	   resource does not have a verification in progress, the response has
7317	   a 402 failure code and no result in the body.

7319	   C->S:  MRCP/2.0 49 GET-INTERMEDIATE-RESULT 543260
7320	          Channel-Identifier:32AECB23433801@speakverify

7322	   S->C:  MRCP/2.0 49 543260 200 COMPLETE
7323	          Channel-Identifier:32AECB23433801@speakverify
7324	          Content-Type:application/nlsml+xml
7325	          Content-Length:...

7327	          <?xml version="1.0"?>
7328	          <result xmlns="http://www.ietf.org/xml/ns/mrcpv2"
7329	                  grammar="What-Grammar-URI">
7330	            <verification-result>
7331	              <voiceprint id="marysmith">
7332	                <incremental>
7333	                  <utterance-length> 50 </utterance-length>
7334	                  <device> cellular-phone </device>
7335	                  <gender> female </gender>
7336	                  <decision> undecided </decision>
7337	                  <verification-score> 0.85 </verification-score>
7338	                </incremental>
7339	                <cumulative>
7340	                  <utterance-length> 150 </utterance-length>
7341	                  <device> cellular-phone </device>
7342	                  <gender> female </gender>
7343	                  <decision> undecided </decision>
7344	                  <verification-score> 0.65 </verification-score>
7345	                </cumulative>
7346	              </voiceprint>
7347	            </verification-result>
7348	          </result>
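
   As a non-normative illustration of the two outcomes described in
   this section, the Python sketch below builds the status, headers,
   and body of a GET-INTERMEDIATE-RESULT response.  The function name
   intermediate_result_response and its parameters are hypothetical.

```python
def intermediate_result_response(in_progress: bool, results_xml: str = ""):
    """Return (status, headers, body) for GET-INTERMEDIATE-RESULT."""
    if not in_progress:
        # No verification is running: 402 and no result in the body.
        return 402, {}, b""
    body = results_xml.encode("utf-8")
    headers = {
        "Content-Type": "application/nlsml+xml",
        "Content-Length": str(len(body)),
    }
    # No Completion-Cause header: the VERIFY request is not complete.
    return 200, headers, body


print(intermediate_result_response(False))
print(intermediate_result_response(True, "<result>...</result>"))
```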

7350	12.  Security Considerations

7352	   MRCPv2 is designed to comply with the security-related requirements
7353	   documented in SpeechSC Requirements [RFC4313].  Implementers and
7354	   users of MRCPv2 are strongly encouraged to read the Security
7355	   Considerations section of [RFC4313], because that document contains
7356	   discussion of a number of important security issues associated with
7357	   the utilization of speech as biometric authentication technology, and
7358	   on the threats against systems which store recorded speech, contain
7359	   large corpora of voiceprints, and send and receive sensitive
7360	   information based on voice input to a recognizer or speech output
7361	   from a synthesizer.  Specific security measures employed by MRCPv2
7362	   are summarized in the following subsections.  See the corresponding
7363	   sections of this specification for how the security-related machinery
7364	   is invoked by individual protocol operations.

7366	12.1.  Rendezvous and Session Establishment

7368	   MRCPv2 control sessions are established as media sessions described
7369	   by SDP within the context of a SIP dialog.  In order to ensure secure
7370	   rendezvous between MRCPv2 clients and servers, the following are
7371	   required:

7373	   1.  The SIP implementation in MRCPv2 clients and servers MUST support
7374	       digest authentication.
7375	   2.  The SIP implementation in MRCPv2 clients and servers SHOULD
7376	       employ SIPS: URIs.
7377	   3.  If media stream cryptographic keying is done through SDP (e.g.,
7378	       using [RFC4568]), the MRCPv2 clients and servers MUST employ
7379	       SIPS:.

7381	12.2.  Control channel protection

7383	   Sensitive data is carried over the MRCPv2 control channel.  This
7384	   includes things like the output of speech recognition operations,
7385	   speaker verification results, input to text-to-speech conversion,
7386	   etc.  For this reason MRCPv2 servers must be properly authenticated
7387	   and the control channel must permit the use of both confidentiality
7388	   and integrity for the data.  To ensure control channel protection,
7389	   MRCPv2 clients and servers MUST support TLS and SHOULD utilize it by
7390	   default unless alternative control channel protection is used.
7391	   Alternative control channel protection MAY be used if desired (e.g.,
7392	   IPsec).
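
   For illustration only, a client written in Python might protect the
   control channel with the standard ssl module as sketched below.  The
   function names, the host and port arguments, and the choice of TLS
   1.2 as the minimum version are assumptions of this sketch; this
   specification does not mandate them.

```python
import socket
import ssl


def control_channel_context(ca_file=None) -> ssl.SSLContext:
    """Create a client context that authenticates the server."""
    # The default client context requires a valid server certificate
    # and checks that it matches the host name.
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH,
                                         cafile=ca_file)
    # Assumption of this sketch, not a requirement of the text above.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def open_control_channel(host: str, port: int, ca_file=None) -> ssl.SSLSocket:
    """Connect to a (hypothetical) MRCPv2 server over TLS."""
    raw = socket.create_connection((host, port))
    context = control_channel_context(ca_file)
    return context.wrap_socket(raw, server_hostname=host)
```

   The returned socket provides both confidentiality and integrity for
   the MRCPv2 messages written to it.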

7394	12.3.  Media session protection

7396	   Sensitive data is also carried on media sessions terminating on
7397	   MRCPv2 servers (the other end of a media channel may or may not be on
7398	   the MRCPv2 client).  This data includes the user's spoken utterances
7399	   and the output of text-to-speech operations.  MRCPv2 servers MUST
7400	   support a security mechanism for protection of audio media sessions.
7401	   MRCPv2 clients that originate or consume audio similarly MUST support
7402	   a security mechanism for protection of the audio.  If appropriate,
7403	   usage of the Secure Real-time Transport Protocol (SRTP) [RFC3711] is
7404	   recommended.

7406	12.4.  Indirect Content Access

7408	   MRCPv2 employs content indirection extensively.  Content may be
7409	   fetched and/or stored based on URI-addressing on systems other than
7410	   the MRCPv2 client or server.  Not all of the stored content is
7411	   necessarily sensitive (e.g., grammar definitions, XML schemas), but
7412	   the majority generally needs protection, and some indirect content,
7413	   such as voice recordings and voiceprints, is extremely sensitive and
7414	   must always be protected.  MRCPv2 clients and servers MUST implement
7415	   HTTPS for indirect content access, and SHOULD employ secure access
7416	   for all sensitive indirect content.  Other secure URI schemes, such
7417	   as FTPS or SIPS, MAY also be used.  See Section 6.2.15 for the
7418	   headers used to transfer cookie information between the MRCPv2
7419	   client and server if needed for authentication.

7421	12.5.  Protection of stored media

7423	   MRCPv2 applications often require the use of stored media.  Voice
7424	   recordings are both stored (e.g. for diagnosis and system tuning),
7425	   and fetched (for replaying utterances into multiple MRCPv2
7426	   resources).  Voiceprints are fundamental to the speaker
7427	   identification and verification functions.  This data can be
7428	   extremely sensitive and can present substantial privacy and
7429	   impersonation risks if stolen.  Systems employing MRCPv2 should be
7430	   deployed in ways that minimize these risks.  The SpeechSC
7431	   Requirements [RFC4313] contains a more extensive discussion of these
7432	   risks and ways they may be mitigated.

7434	13.  IANA Considerations

7436	13.1.  New registries

7438	   This section describes the name spaces (registries) for MRCPv2 that
7439	   IANA is requested to create and maintain.  Assignment/registration
7440	   policies are described in RFC 5226 [RFC5226].

7442	13.1.1.  MRCPv2 resource types

7444	   IANA SHALL create a new name space of "MRCPv2 resource types".  All
7445	   maintenance within and additions to the contents of this name space
7446	   MUST be according to the "Standards Action" registration policy.  The
7447	   initial contents of the registry, defined in Section 4.2, are given
7448	   below:

7449	   Resource type  Resource description  Reference
7450	   -------------  --------------------  ---------
7451	   speechrecog    Speech Recognizer     [RFCXXXX]
7452	   dtmfrecog      DTMF Recognizer       [RFCXXXX]
7453	   speechsynth    Speech Synthesizer    [RFCXXXX]
7454	   basicsynth     Basic Synthesizer     [RFCXXXX]
7455	   speakverify    Speaker Verification  [RFCXXXX]
7456	   recorder       Speech Recorder       [RFCXXXX]

7458	13.1.2.  MRCPv2 methods and events

7460	   IANA SHALL create a new name space of "MRCPv2 methods and events".
7461	   All maintenance within and additions to the contents of this name
7462	   space MUST be according to the "Standards Action" registration
7463	   policy.  The initial contents of the registry, defined by the
7464	   "method-name" BNF in Section 5.2 and the "event-name" BNF in
7465	   Section 5.5, are given below.

7467	   Name                     Resource type  Method/Event  Reference
7468	   ----                     -------------  ------------  ---------
7469	   SET-PARAMS               Synthesizer    Method        [RFCXXXX]
7470	   GET-PARAMS               Synthesizer    Method        [RFCXXXX]
7471	   SPEAK                    Synthesizer    Method        [RFCXXXX]
7472	   STOP                     Synthesizer    Method        [RFCXXXX]
7473	   PAUSE                    Synthesizer    Method        [RFCXXXX]
7474	   RESUME                   Synthesizer    Method        [RFCXXXX]
7475	   BARGE-IN-OCCURRED        Synthesizer    Method        [RFCXXXX]
7476	   CONTROL                  Synthesizer    Method        [RFCXXXX]
7477	   DEFINE-LEXICON           Synthesizer    Method        [RFCXXXX]
7478	   DEFINE-GRAMMAR           Recognizer     Method        [RFCXXXX]
7479	   RECOGNIZE                Recognizer     Method        [RFCXXXX]
7480	   INTERPRET                Recognizer     Method        [RFCXXXX]
7481	   GET-RESULT               Recognizer     Method        [RFCXXXX]
7482	   START-INPUT-TIMERS       Recognizer     Method        [RFCXXXX]
7483	   STOP                     Recognizer     Method        [RFCXXXX]
7484	   START-PHRASE-ENROLLMENT  Recognizer     Method        [RFCXXXX]
7485	   ENROLLMENT-ROLLBACK      Recognizer     Method        [RFCXXXX]
7486	   END-PHRASE-ENROLLMENT    Recognizer     Method        [RFCXXXX]
7487	   MODIFY-PHRASE            Recognizer     Method        [RFCXXXX]
7488	   DELETE-PHRASE            Recognizer     Method        [RFCXXXX]
7489	   RECORD                   Recorder       Method        [RFCXXXX]
7490	   STOP                     Recorder       Method        [RFCXXXX]
7491	   START-SESSION            Verifier       Method        [RFCXXXX]
7492	   END-SESSION              Verifier       Method        [RFCXXXX]
7493	   QUERY-VOICEPRINT         Verifier       Method        [RFCXXXX]
7494	   DELETE-VOICEPRINT        Verifier       Method        [RFCXXXX]
7495	   VERIFY                   Verifier       Method        [RFCXXXX]
7496	   VERIFY-FROM-BUFFER       Verifier       Method        [RFCXXXX]
7497	   VERIFY-ROLLBACK          Verifier       Method        [RFCXXXX]
7498	   STOP                     Verifier       Method        [RFCXXXX]
7499	   START-INPUT-TIMERS       Verifier       Method        [RFCXXXX]
7500	   GET-INTERMEDIATE-RESULT  Verifier       Method        [RFCXXXX]
7501	   SPEECH-MARKER            Synthesizer    Event         [RFCXXXX]
7502	   SPEAK-COMPLETE           Synthesizer    Event         [RFCXXXX]
7503	   START-OF-INPUT           Recognizer     Event         [RFCXXXX]
7504	   RECOGNITION-COMPLETE     Recognizer     Event         [RFCXXXX]
7505	   INTERPRETATION-COMPLETE  Recognizer     Event         [RFCXXXX]
7506	   START-OF-INPUT           Recorder       Event         [RFCXXXX]
7507	   RECORD-COMPLETE          Recorder       Event         [RFCXXXX]
7508	   VERIFICATION-COMPLETE    Verifier       Event         [RFCXXXX]
7509	   START-OF-INPUT           Verifier       Event         [RFCXXXX]

7511	13.1.3.  MRCPv2 headers

7513	   IANA SHALL create a new name space of "MRCPv2 headers".  All
7514	   maintenance within and additions to the contents of this name space
7515	   MUST be according to the "Standards Action" registration policy.  The
7516	   initial contents of the registry, defined by the "message-header" BNF
7517	   in Section 5.1, are given below.  Note that the values permitted for
7518	   the "Vendor-Specific-Parameters" parameter are managed according to a
7519	   different policy.  See Section 13.1.6.
7520	   Name                               Resource type    Reference
7521	   ----                               -------------    ---------
7522	   channel-identifier                 Generic          [RFCXXXX]
7523	   accept                             Generic          [RFC2616]
7524	   active-request-id-list             Generic          [RFCXXXX]
7525	   proxy-sync-id                      Generic          [RFCXXXX]
7526	   accept-charset                     Generic          [RFC2616]
7527	   content-type                       Generic          [RFCXXXX]
7528	   content-id               Generic  [RFC2392, RFC2046, and RFC5322]
7529	   content-base                       Generic          [RFCXXXX]
7530	   content-encoding                   Generic          [RFCXXXX]
7531	   content-location                   Generic          [RFCXXXX]
7532	   content-length                     Generic          [RFCXXXX]
7533	   fetch-timeout                      Generic          [RFCXXXX]
7534	   cache-control                      Generic          [RFCXXXX]
7535	   logging-tag                        Generic          [RFCXXXX]
7536	   set-cookie                         Generic          [RFCXXXX]
7537	   set-cookie2                        Generic          [RFCXXXX]
7538	   vendor-specific                    Generic          [RFCXXXX]
7539	   jump-size                          Synthesizer      [RFCXXXX]
7540	   kill-on-barge-in                   Synthesizer      [RFCXXXX]
7541	   speaker-profile                    Synthesizer      [RFCXXXX]
7542	   completion-cause                   Synthesizer      [RFCXXXX]
7543	   completion-reason                  Synthesizer      [RFCXXXX]
7544	   voice-parameter                    Synthesizer      [RFCXXXX]
7545	   prosody-parameter                  Synthesizer      [RFCXXXX]
7546	   speech-marker                      Synthesizer      [RFCXXXX]
7547	   speech-language                    Synthesizer      [RFCXXXX]
7548	   fetch-hint                         Synthesizer      [RFCXXXX]
7549	   audio-fetch-hint                   Synthesizer      [RFCXXXX]
7550	   failed-uri                         Synthesizer      [RFCXXXX]
7551	   failed-uri-cause                   Synthesizer      [RFCXXXX]
7552	   speak-restart                      Synthesizer      [RFCXXXX]
7553	   speak-length                       Synthesizer      [RFCXXXX]
7554	   load-lexicon                       Synthesizer      [RFCXXXX]
7555	   lexicon-search-order               Synthesizer      [RFCXXXX]
7556	   confidence-threshold               Recognizer       [RFCXXXX]
7557	   sensitivity-level                  Recognizer       [RFCXXXX]
7558	   speed-vs-accuracy                  Recognizer       [RFCXXXX]
7559	   n-best-list-length                 Recognizer       [RFCXXXX]
7560	   input-type                         Recognizer       [RFCXXXX]
7561	   no-input-timeout                   Recognizer       [RFCXXXX]
7562	   recognition-timeout                Recognizer       [RFCXXXX]
7563	   waveform-uri                       Recognizer       [RFCXXXX]
7564	   input-waveform-uri                 Recognizer       [RFCXXXX]
7565	   completion-cause                   Recognizer       [RFCXXXX]
7566	   completion-reason                  Recognizer       [RFCXXXX]
7567	   recognizer-context-block           Recognizer       [RFCXXXX]
7568	   start-input-timers                 Recognizer       [RFCXXXX]
7569	   speech-complete-timeout            Recognizer       [RFCXXXX]
7570	   speech-incomplete-timeout          Recognizer       [RFCXXXX]
7571	   dtmf-interdigit-timeout            Recognizer       [RFCXXXX]
7572	   dtmf-term-timeout                  Recognizer       [RFCXXXX]
7573	   dtmf-term-char                     Recognizer       [RFCXXXX]
7574	   failed-uri                         Recognizer       [RFCXXXX]
7575	   failed-uri-cause                   Recognizer       [RFCXXXX]
7576	   save-waveform                      Recognizer       [RFCXXXX]
7577	   media-type                         Recognizer       [RFCXXXX]
7578	   new-audio-channel                  Recognizer       [RFCXXXX]
7579	   speech-language                    Recognizer       [RFCXXXX]
7580	   ver-buffer-utterance               Recognizer       [RFCXXXX]
7581	   recognition-mode                   Recognizer       [RFCXXXX]
7582	   cancel-if-queue                    Recognizer       [RFCXXXX]
7583	   hotword-max-duration               Recognizer       [RFCXXXX]
7584	   hotword-min-duration               Recognizer       [RFCXXXX]
7585	   interpret-text                     Recognizer       [RFCXXXX]
7586	   dtmf-buffer-time                   Recognizer       [RFCXXXX]
7587	   clear-dtmf-buffer                  Recognizer       [RFCXXXX]
7588	   early-no-match                     Recognizer       [RFCXXXX]
7589	   num-min-consistent-pronunciations  Recognizer       [RFCXXXX]
7590	   consistency-threshold              Recognizer       [RFCXXXX]
7591	   clash-threshold                    Recognizer       [RFCXXXX]
7592	   personal-grammar-uri               Recognizer       [RFCXXXX]
7593	   enroll-utterance                   Recognizer       [RFCXXXX]
7594	   phrase-id                          Recognizer       [RFCXXXX]
7595	   phrase-nl                          Recognizer       [RFCXXXX]
7596	   weight                             Recognizer       [RFCXXXX]
7597	   save-best-waveform                 Recognizer       [RFCXXXX]
7598	   new-phrase-id                      Recognizer       [RFCXXXX]
7599	   confusable-phrases-uri             Recognizer       [RFCXXXX]
7600	   abort-phrase-enrollment            Recognizer       [RFCXXXX]
7601	   sensitivity-level                  Recorder         [RFCXXXX]
7602	   no-input-timeout                   Recorder         [RFCXXXX]
7603	   completion-cause                   Recorder         [RFCXXXX]
7604	   failed-uri                         Recorder         [RFCXXXX]
7605	   failed-uri-cause                   Recorder         [RFCXXXX]
7606	   record-uri                         Recorder         [RFCXXXX]
7607	   media-type                         Recorder         [RFCXXXX]
7608	   max-time                           Recorder         [RFCXXXX]
7609	   trim-length                        Recorder         [RFCXXXX]
7610	   final-silence                      Recorder         [RFCXXXX]
7611	   capture-on-speech                  Recorder         [RFCXXXX]
7612	   new-audio-channel                  Recorder         [RFCXXXX]
7613	   start-input-timers                 Recorder         [RFCXXXX]
7614	   input-type                         Recorder         [RFCXXXX]
7615	   repository-uri                     Verifier         [RFCXXXX]
7616	   voiceprint-identifier              Verifier         [RFCXXXX]
7617	   verification-mode                  Verifier         [RFCXXXX]
7618	   adapt-model                        Verifier         [RFCXXXX]
7619	   abort-model                        Verifier         [RFCXXXX]
7620	   min-verification-score             Verifier         [RFCXXXX]
7621	   num-min-verification-phrases       Verifier         [RFCXXXX]
7622	   num-max-verification-phrases       Verifier         [RFCXXXX]
7623	   no-input-timeout                   Verifier         [RFCXXXX]
7624	   save-waveform                      Verifier         [RFCXXXX]
7625	   media-type                         Verifier         [RFCXXXX]
7626	   waveform-uri                       Verifier         [RFCXXXX]
7627	   voiceprint-exists                  Verifier         [RFCXXXX]
7628	   ver-buffer-utterance               Verifier         [RFCXXXX]
7629	   input-waveform-uri                 Verifier         [RFCXXXX]
7630	   completion-cause                   Verifier         [RFCXXXX]
7631	   completion-reason                  Verifier         [RFCXXXX]
7632	   speech-complete-timeout            Verifier         [RFCXXXX]
7633	   new-audio-channel                  Verifier         [RFCXXXX]
7634	   abort-verification                 Verifier         [RFCXXXX]
7635	   start-input-timers                 Verifier         [RFCXXXX]
7636	   input-type                         Verifier         [RFCXXXX]

7638	13.1.4.  MRCPv2 status codes

7640	   IANA SHALL create a new name space of "MRCPv2 status codes" with the
7641	   initial values that are defined in Section 5.4.  All maintenance within
7642	   and additions to the contents of this name space MUST be according to
7643	   the "Specification Required with Expert Review" registration policy.

7645	13.1.5.  Grammar Reference List Parameters

7647	   IANA SHALL create a new name space of "Grammar Reference List
7648	   Parameters".  All maintenance within and additions to the contents of
7649	   this name space MUST be according to the "Specification Required with
7650	   Expert Review" registration policy.  There is only one initial
7651	   parameter, "weight", which is defined in Section 13.5.1 and
7652	   Section 9.9.

7654	13.1.6.  MRCPv2 vendor-specific parameters

7656	   IANA SHALL create a new name space of "MRCPv2 vendor-specific
7657	   parameters".  All maintenance within and additions to the contents of
7658	   this name space MUST be according to the "Hierarchical Allocation"
7659	   registration policy as follows.  Each name (corresponding to the
7660	   "vendor-av-pair-name" ABNF production) MUST satisfy the syntax
7661	   requirements of Internet Domain Names as described in section 2.3.1
7662	   of RFC1035 [RFC1035] (and as updated or obsoleted by successive
7663	   RFCs), with one exception: the order of the domain names is reversed.
7664	   For example, a vendor-specific parameter "foo" by example.com would
7665	   have the form "com.example.foo".  The first, or top-level domain, is
7666	   restricted to exactly the set of Top-Level Internet Domains defined
7667	   by IANA and will be updated by IANA when and only when that set
7668	   changes.  The second-level and all subdomains within the parameter
7669	   name MUST be allocated according to the "Expert Review" policy.  The
7670	   Designated Expert MAY advise IANA to allow delegation of subdomains
7671	   to the requester.  As a general guideline, the Designated Expert is
7672	   encouraged to manage the allocation of corporate, organizational, or
7673	   institutional names and delegate all subdomains accordingly.  For
7674	   example, the Designated Expert MAY allocate "com.example" and
7675	   delegate all subdomains of that name to the organization represented
7676	   by the Internet domain name "example.com".  For simplicity, the
7677	   Designated Expert is encouraged to perform allocations according to
7678	   the existing allocations of Internet domain names to organizations,
7679	   institutions, corporations, etc.

7681	   The registry contains a list of vendor-registered parameters, where
7682	   each defined parameter is associated with a reference to an RFC
7683	   defining it.  The registry is initially empty.
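
   The naming rule above can be illustrated with a minimal sketch.  It
   assumes that each label follows the RFC 1035 section 2.3.1 syntax
   (a letter first, letters, digits, or hyphens inside, a letter or
   digit last, at most 63 characters).  The function names and the
   "depth" argument are hypothetical and are not defined by this
   specification.

```python
import re

# One DNS label per RFC 1035 section 2.3.1: starts with a letter, ends with
# a letter or digit, may contain hyphens in between, at most 63 characters.
_LABEL = re.compile(r"^[A-Za-z](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")


def vendor_param_name(domain: str, param: str) -> str:
    """Build a reversed-domain parameter name such as com.example.foo."""
    # Reverse the labels of the Internet domain, then append the parameter.
    labels = domain.split(".")[::-1] + param.split(".")
    for label in labels:
        if not _LABEL.match(label):
            raise ValueError(f"invalid label: {label!r}")
    return ".".join(labels)


def owning_domain(name: str, depth: int = 2) -> str:
    """Recover the Internet domain that a parameter name was delegated to.

    `depth` (an assumption of this sketch) is the number of leading labels
    that form the allocated prefix, for example 2 for "com.example".
    """
    labels = name.split(".")
    if len(labels) <= depth:
        raise ValueError("name has no parameter part")
    return ".".join(reversed(labels[:depth]))
```

   With these definitions, vendor_param_name("example.com", "foo")
   yields "com.example.foo", matching the example in the text above.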

7685	13.2.  NLSML-related registrations

7687	13.2.1.  application/nlsml+xml Media Type registration

7689	   IANA is requested to register the following Media Type according to
7690	   the process defined in RFC4288 [RFC4288].
7691	   To:  ietf-types@iana.org
7692	   Subject:  Registration of media type application/nlsml+xml
7693	   MIME media type name:  application
7694	   MIME subtype name:  nlsml+xml
7695	   Required parameters:  none
7696	   Optional parameters:
7697	      charset:  All of the considerations described in RFC3023 also
7698	         apply to the application/nlsml+xml media type.
7699	   Encoding considerations:  All of the considerations described in
7700	      RFC3023 also apply to the application/nlsml+xml media type.
7701	   Security considerations:  As with HTML, NLSML documents contain links
7702	      to other data stores (grammars, verification resources, etc.).
7703	      Unlike HTML, however, the data stores are not treated as media to
7704	      be rendered.  Nevertheless, linked files may themselves have
7705	      security considerations, which would be those of the individual
7706	      registered types.  Additionally, this media type has all of the
7707	      security considerations described in RFC3023.
7708	   Interoperability considerations:  Although an NLSML document is
7709	      itself a complete XML document, for a fuller interpretation of the
7710	      content a receiver of an NLSML document may wish to access
7711	      resources linked to by the document.  The inability of an NLSML
7712	      processor to access or process such linked resources could result
7713	      in different behavior by the ultimate consumer of the data.
7714	   Published specification:  RFCXXXX
7715	   Applications which use this media type:  MRCPv2 clients and servers
7716	   Additional information:  none
7717	   Magic number(s):  There is no single initial byte sequence that is
7718	      always present for NLSML files.
7719	   Person & email address to contact for further information:  Sarvi
7720	      Shanmugham, sarvi@cisco.com
7721	   Intended usage:  This media type is expected to be used only in
7722	      conjunction with MRCPv2.

7724	13.3.  NLSML XML Schema registration

7726	   IANA is requested to register and maintain the following XML Schema.
7727	   Information provided follows the template in RFC3688 [RFC3688].
7728	   XML element type:  schema
7729	   URI:  http://www.ietf.org/xml/schema/mrcpv2
7730	   Registrant Contact:  IESG
7731	   XML:  See Section 16.1.

7733	13.4.  MRCPv2 XML Namespace registration

7735	   IANA is requested to register and maintain the following XML
7736	   Namespace.  Information provided follows the template in RFC3688
7737	   [RFC3688].
7738	   XML element type:  ns
7739	   URI:  http://www.ietf.org/xml/ns/mrcpv2
7740	   Registrant Contact:  IESG
7741	   XML:  RFCXXXX

7743	13.5.  text Media Type Registrations

7745	   IANA is requested to register the following text Media Types
7746	   according to the process defined in RFC 4288 [RFC4288].

7748	13.5.1.  text/grammar-ref-list

7750	   To:  ietf-types@iana.org
7751	   Subject:  Registration of media type text/grammar-ref-list
7752	   MIME media type name:  text
7753	   MIME subtype name:  grammar-ref-list
7754	   Required parameters:  none
7755	   Optional parameters:  none
7756	   Encoding considerations:  Depending on the transfer protocol, a
7757	      transfer encoding may be necessary to deal with very long lines.
7758	   Security considerations:  This media type contains URIs which may
7759	      represent references to external resources.  As these resources
7760	      are assumed to be speech recognition grammars, similar
7761	      considerations as for the media types "application/srgs" and
7762	      "application/srgs+xml" apply.
7763	   Interoperability considerations:  '>' must be percent encoded in URIs
7764	      according to RFC3986.
7765	   Published specification:  The RECOGNIZE method of the MRCP protocol
7766	      performs a recognition operation that matches input against a set
7767	      of grammars.  When matching against more than one grammar, it is
7768	      sometimes necessary to use different weights for the individual
7769	      grammars.  These weights are not a property of the grammar
7770	      resource itself but qualify the reference to that grammar for the
7771	      particular recognition operation initiated by the RECOGNIZE
7772	      method.  The format of the proposed text/grammar-ref-list media
7773	      type is as follows: body = *reference where reference = "<" uri
7774	      ">" [parameters] CRLF parameters = ";" parameter *(";" parameter)
7775	      and parameter = attribute "=" value.  This specification currently
7776	      only defines a 'weight' parameter, but new parameters may be added
7777	      through the "Grammar Reference List Parameters" IANA registry
7778	      established through this specification.  Example:
7779	      <http://example.com/grammars/field1.gram>
7780	      <http://example.com/grammars/field2.gram>;weight="0.85"
7781	      <session:field3@form-level.store>;weight="0.9"
7782	      <http://example.com/grammars/universals.gram>;weight="0.75"
7783	   Applications which use this media type:  MRCPv2 clients and servers
7784	   Additional information:  none
7785	   Magic number(s):  none
7786	   Person & email address to contact for further information:  Sarvi
7787	      Shanmugham, sarvi@cisco.com
7788	   Intended usage:  This media type is expected to be used only in
7789	      conjunction with MRCPv2.
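
   As an illustration of the text/grammar-ref-list format described in
   this registration, the following is a minimal parsing sketch.  It
   assumes that parameter values do not themselves contain ";" and that
   surrounding double quotes are not part of the value.  The function
   name is hypothetical.

```python
import re
from typing import Dict, List, Tuple

# reference = "<" uri ">" [parameters]
# parameters = ";" parameter *(";" parameter), parameter = attribute "=" value
# '>' never appears inside the URI because it must be percent-encoded.
_REFERENCE = re.compile(r'^<([^<>]+)>((?:;[^;=]+=[^;]*)*)$')


def parse_grammar_ref_list(body: str) -> List[Tuple[str, Dict[str, str]]]:
    """Return (uri, parameters) pairs from a grammar-ref-list body."""
    refs = []
    for line in body.split("\r\n"):  # each reference ends with CRLF
        if not line:
            continue  # skip the empty string after the final CRLF
        match = _REFERENCE.match(line)
        if match is None:
            raise ValueError(f"malformed reference: {line!r}")
        uri, raw_params = match.group(1), match.group(2)
        params = {}
        # raw_params starts with ";", so the first split element is empty.
        for item in raw_params.split(";")[1:]:
            attribute, _, value = item.partition("=")
            params[attribute] = value.strip('"')
        refs.append((uri, params))
    return refs
```

   Applied to the example in the registration above, the first entry
   carries no parameters and the remaining three each carry a single
   "weight" parameter.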

7791	13.6.  session URL scheme registration

7793	   IANA is requested to register the following new URI scheme.  The
7794	   information below follows the template given in RFC4395 [RFC4395].

7796	   URL scheme name:  "session"
7797	   URL scheme syntax:  The syntax of this scheme is identical to that
7798	      defined for the "cid" scheme in section 2 of RFC2392.
7799	   Character encoding considerations:  URI values are limited to the US-
7800	      ASCII character set.
7801	   Intended usage:  The URI is intended to identify a data resource
7802	      previously given to the network computing resource.  The purpose
7803	      of this scheme is to permit access to the specific resource for
7804	      the lifetime of the session with the entity storing the resource.
7805	      The media type of the resource can vary.  There is no explicit
7806	      mechanism for communication of the media type.  This scheme is
7807	      currently widely used internally by existing implementations, and
7808	      the registration is intended to provide information in the rare
7809	      (and unfortunate) case that the scheme is used elsewhere.  The
7810	      scheme SHOULD NOT be used for open internet protocols.
7811	   Applications and/or protocols which use this URL scheme name:  This
7812	      scheme name is used by MRCPv2 clients and servers.
7813	   Interoperability considerations:
7814	      The character set for URLs is restricted to US-ASCII.  Note that
7815	      none of the resources are accessible after the MRCPv2 session
7816	      ends, hence the name of the scheme.  For clients who establish one
7817	      MRCPv2 session only for the entire speech application being
7818	      implemented this is sufficient, but clients who create, terminate,
7819	      and recreate MRCPv2 sessions for performance or scalability reasons
7820	      will lose access to resources established in the earlier
7821	      session(s).
7822	   Security considerations:  The URIs defined here provide an
7823	      identification mechanism only.  Given that the communication
7824	      channel between client and server is secure, that the server
7825	      correctly accesses the resource associated with the URI, and that
7826	      the server ensures session-only lifetime and access for each URI,
7827	      the only remaining security issues are those of the types of media
7828	      referred to by the URI.
7829	   Relevant publications:  This specification, particularly
7830	      Section 6.2.7, Section 8.5.2, Section 9.5.1, and Section 9.9.
7831	   Contact for further information:  Sarvi Shanmugham, sarvi@cisco.com
7832	   Author/Change controller:  IESG
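
   Since the "session" scheme borrows the syntax of the "cid" scheme, a
   session URI can be split in the same way as a cid URI.  The sketch
   below is illustrative only.  It assumes the simple "local@domain"
   form seen in this document's examples, and the function name is
   hypothetical.

```python
from typing import Tuple
from urllib.parse import unquote


def parse_session_uri(uri: str) -> Tuple[str, str]:
    """Split a session URI into its (local part, domain) pair."""
    if not uri.isascii():
        # URI values are limited to the US-ASCII character set.
        raise ValueError("session URIs must be US-ASCII")
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "session":
        raise ValueError(f"not a session URI: {uri!r}")
    # Split on the last "@", as for an addr-spec.
    local, at, domain = rest.rpartition("@")
    if not at or not local or not domain:
        raise ValueError(f"expected local@domain in {uri!r}")
    # Percent-encoded octets are decoded for comparison purposes.
    return unquote(local), unquote(domain)
```

   For example, "session:field3@form-level.store" splits into the local
   part "field3" and the domain "form-level.store".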

7834	13.7.  SDP parameter registrations

7836	   IANA is requested to register the following SDP parameter values.
7837	   The information for each follows the template given in RFC4566
7838	   [RFC4566], Appendix B.

7840	13.7.1.  sub-registry "proto"

7842	   "TCP/MRCPv2" value of the "proto" parameter
7843	   Contact name, email address and telephone number:  Sarvi Shanmugham,
7844	      sarvi@cisco.com, +1.408.902.3875
7845	   Name being registered (as it will appear in SDP):  TCP/MRCPv2
7846	   Long-form name in English:  MRCPv2 over TCP
7847	   Type of name:  proto
7848	   Explanation of name:  This name represents the MRCPv2 protocol
7849	      carried over TCP.
7850	   Reference to specification of name:  RFCXXXX
7851	   "TCP/TLS/MRCPv2" value of the "proto" parameter
7852	   Contact name, email address and telephone number:  Sarvi Shanmugham,
7853	      sarvi@cisco.com, +1.408.902.3875
7854	   Name being registered (as it will appear in SDP):  TCP/TLS/MRCPv2
7855	   Long-form name in English:  MRCPv2 over TLS over TCP
7856	   Type of name:  proto
7857	   Explanation of name:  This name represents the MRCPv2 protocol
7858	      carried over TLS over TCP.
7859	   Reference to specification of name:  RFCXXXX

7861	13.7.2.  sub-registry "att-field (session-level)"

7863	   "resource" value of the "att-field" parameter
7864	   Contact name, email address and telephone number:  Sarvi Shanmugham,
7865	      sarvi@cisco.com, +1.408.902.3875
7866	   Attribute name (as it will appear in SDP):  resource
7867	   Long-form attribute name in English:  MRCPv2 resource type
7868	   Type of attribute:  media-level
7869	   Subject to charset attribute?  no
7870	   Explanation of attribute:  See Section 4.2 of RFCXXXX for description
7871	      and examples.
7872	   Specification of appropriate attribute values:  See
7873	      Section 13.1.1 of RFCXXXX.
7874	   "channel" value of the "att-field" parameter
7875	   Contact name, email address and telephone number:  Sarvi Shanmugham,
7876	      sarvi@cisco.com, +1.408.902.3875
7877	   Attribute name (as it will appear in SDP):  channel
7878	   Long-form attribute name in English:  MRCPv2 resource channel
7879	      identifier
7880	   Type of attribute:  media-level
7881	   Subject to charset attribute?  no
7882	   Explanation of attribute:  See Section 4.2 of RFCXXXX for description
7883	      and examples.
7884	   Specification of appropriate attribute values:  See Section 4.2 and
7885	      the "channel-id" ABNF production rules of RFCXXXX.

7887	13.7.3.  sub-registry "att-field (media-level)"

7889	   "cmid" value of the "att-field" parameter
7890	   Contact name, email address and telephone number:  Sarvi Shanmugham,
7891	      sarvi@cisco.com, +1.408.902.3875
7892	   Attribute name (as it will appear in SDP):  cmid
7893	   Long-form attribute name in English:  MRCPv2 resource channel media
7894	      identifier
7895	   Type of attribute:  media-level
7896	   Subject to charset attribute?  no
7897	   Explanation of attribute:  See Section 4.3 of RFCXXXX for description
7898	      and examples.
7899	   Specification of appropriate attribute values:  See Section 4.3 and
7900	      the "cmid-attribute" ABNF production rules of RFCXXXX.

7902	14.  Examples

7904	14.1.  Message Flow

7906	   The following is an example of a typical MRCPv2 session of speech
7907	   synthesis and recognition between a client and a server.

7909	   The figure below illustrates opening a session to the MRCPv2 server.
7910	   This exchange does not allocate a resource or set up media.  It
7911	   simply establishes a SIP session with the MRCPv2 server.

7913	   C->S:
7914	          INVITE sip:mresources@example.com SIP/2.0
7915	          Max-Forwards:6
7916	          To:MediaServer <sip:mresources@example.com>
7917	          From:sarvi <sip:sarvi@example.com>;tag=1928301774
7918	          Call-ID:a84b4c76e66710
7919	          CSeq:314159 INVITE
7920	          Contact:<sip:sarvi@client.example.com>
7921	          Content-Type:application/sdp
7922	          Content-Length:...

7924	          v=0
7925	          o=sarvi 2890844526 2890842807 IN IP4 192.0.2.4
7926	          s=Set up MRCPv2 control and audio
7927	          i=Initial contact
7928	          c=IN IP4 192.0.2.12

7930	   S->C:
7931	          SIP/2.0 200 OK
7932	          To:MediaServer <sip:mresources@example.com>;tag=62784
7933	          From:sarvi <sip:sarvi@example.com>;tag=1928301774
7934	          Call-ID:a84b4c76e66710
7935	          CSeq:314159 INVITE
7936	          Contact:<sip:mresources@server.example.com>
7937	          Content-Type:application/sdp
7938	          Content-Length:...

7940	          v=0
7941	          o=sarvi 2890844526 2890842807 IN IP4 192.0.2.4
7942	          s=Set up MRCPv2 control and audio
7943	          i=Initial contact
7944	          c=IN IP4 192.0.2.11

7946	   C->S:
7947	          ACK sip:mresources@server.example.com SIP/2.0
7948	          Max-Forwards:6
7949	          To:MediaServer <sip:mresources@example.com>;tag=62784
7950	          From:sarvi <sip:sarvi@example.com>;tag=1928301774
7951	          Call-ID:a84b4c76e66710
7952	          CSeq:314159 ACK
7953	          Content-Length:...

7955	   The client requests the server to create a synthesizer resource control
7956	   channel to do speech synthesis.  This also adds a media pipe to send
7957	   the generated speech.  Note that in this example, the client requests
7958	   a new MRCPv2 TCP pipe between the client and the server.  In the
7959	   following requests, the client will ask to use the existing
7960	   connection.

7962	   C->S:
7963	          INVITE sip:mresources@server.example.com SIP/2.0
7964	          Max-Forwards:6
7965	          To:MediaServer <sip:mresources@example.com>;tag=62784
7966	          From:sarvi <sip:sarvi@example.com>;tag=1928301774
7967	          Call-ID:a84b4c76e66710
7968	          CSeq:314161 INVITE
7969	          Contact:<sip:sarvi@client.example.com>
7970	          Content-Type:application/sdp
7971	          Content-Length:...

7973	          v=0
7974	          o=sarvi 2890844526 2890842808 IN IP4 192.0.2.4
7975	          s=Set up MRCPv2 control and audio
7976	          i=Add TCP channel, synthesizer and one-way audio
7977	          c=IN IP4 192.0.2.12
7978	          m=application 9  TCP/MRCPv2 1
7979	          a=setup:active
7980	          a=connection:new
7981	          a=resource:speechsynth
7982	          a=cmid:1
7983	          m=audio 49170 RTP/AVP 0 96
7984	          a=rtpmap:0 pcmu/8000
7985	          a=recvonly
7986	          a=mid:1

7988	   S->C:
7989	          SIP/2.0 200 OK
7990	          To:MediaServer <sip:mresources@example.com>;tag=62784
7991	          From:sarvi <sip:sarvi@example.com>;tag=1928301774
7992	          Call-ID:a84b4c76e66710
7993	          CSeq:314161 INVITE
7994	          Contact:<sip:mresources@server.example.com>
7995	          Content-Type:application/sdp
7996	          Content-Length:...

7998	          v=0
7999	          o=sarvi 2890844526 2890842808 IN IP4 192.0.2.4
8000	          s=Set up MRCPv2 control and audio
8001	          i=Add TCP channel, synthesizer and one-way audio
8002	          c=IN IP4 192.0.2.11
8003	          m=application 32416  TCP/MRCPv2 1
8004	          a=setup:passive
8005	          a=connection:new
8006	          a=channel:32AECB23433801@speechsynth
8007	          a=cmid:1
8008	          m=audio 48260 RTP/AVP 0
8009	          a=rtpmap:0 pcmu/8000
8010	          a=sendonly
8011	          a=mid:1

8013	   C->S:
8014	          ACK sip:mresources@server.example.com SIP/2.0
8015	          Max-Forwards:6
8016	          To:MediaServer <sip:mresources@example.com>;tag=62784
8017	          From:sarvi <sip:sarvi@example.com>;tag=1928301774
8018	          Call-ID:a84b4c76e66710
8019	          CSeq:314161 ACK
8020	          Content-Length:...

8022	   This exchange allocates an additional resource control channel for a
8023	   recognizer.  Since a recognizer would need to receive an audio stream
8024	   for recognition, this interaction also updates the audio stream to
8025	   sendrecv, making it a two-way audio stream.

8027	   C->S:
8028	          INVITE sip:mresources@server.example.com SIP/2.0
8029	          Max-Forwards:6
8030	          To:MediaServer <sip:mresources@example.com>;tag=62784
8031	          From:sarvi <sip:sarvi@example.com>;tag=1928301774
8032	          Call-ID:a84b4c76e66710
8033	          CSeq:314163 INVITE
8034	          Contact:<sip:sarvi@client.example.com>
8035	          Content-Type:application/sdp
8036	          Content-Length:...

8038	          v=0
8039	          o=sarvi 2890844526 2890842809 IN IP4 192.0.2.4
8040	          s=Set up MRCPv2 control and audio
8041	          i=Add recognizer and duplex the audio
8042	          c=IN IP4 192.0.2.12
8043	          m=application 9  TCP/MRCPv2 1
8044	          a=setup:active
8045	          a=connection:existing
8046	          a=resource:speechsynth
8047	          a=cmid:1
8048	          m=audio 49170 RTP/AVP 0 96
8049	          a=rtpmap:0 pcmu/8000
8050	          a=recvonly
8051	          a=mid:1
8052	          m=application 9  TCP/MRCPv2 1
8053	          a=setup:active
8054	          a=connection:existing
8055	          a=resource:speechrecog
8056	          a=cmid:2
8057	          m=audio 49180 RTP/AVP 0 96
8058	          a=rtpmap:0 pcmu/8000
8059	          a=rtpmap:96 telephone-event/8000
8060	          a=fmtp:96 0-15
8061	          a=sendonly
8062	          a=mid:2

8064	   S->C:
8065	          SIP/2.0 200 OK
8066	          To:MediaServer <sip:mresources@example.com>;tag=62784
8067	          From:sarvi <sip:sarvi@example.com>;tag=1928301774
8068	          Call-ID:a84b4c76e66710
8069	          CSeq:314163 INVITE
8070	          Contact:<sip:mresources@server.example.com>
8071	          Content-Type:application/sdp
8072	          Content-Length:...

8074	          v=0
8075	          o=sarvi 2890844526 2890842809 IN IP4 192.0.2.4
8076	          s=Set up MRCPv2 control and audio
8077	          i=Add recognizer and duplex the audio
8078	          c=IN IP4 192.0.2.11
8079	          m=application 32416  TCP/MRCPv2 1
8080	          a=channel:32AECB23433801@speechsynth
8081	          a=cmid:1
8082	          m=audio 48260 RTP/AVP 0
8083	          a=rtpmap:0 pcmu/8000
8084	          a=sendonly
8085	          a=mid:1
8086	          m=application 32416  TCP/MRCPv2 1
8087	          a=channel:32AECB23433801@speechrecog
8088	          a=cmid:2
8089	          m=audio 48260 RTP/AVP 0
8090	          a=rtpmap:0 pcmu/8000
8091	          a=rtpmap:96 telephone-event/8000
8092	          a=fmtp:96 0-15
8093	          a=recvonly
8094	          a=mid:2

8096	   C->S:
8097	          ACK sip:mresources@server.example.com SIP/2.0
8098	          Max-Forwards:6
8099	          To:MediaServer <sip:mresources@example.com>;tag=62784
8100	          From:sarvi <sip:sarvi@example.com>;tag=1928301774
8101	          Call-ID:a84b4c76e66710
8102	          CSeq:314163 ACK
8103	          Content-Length:...
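
   In the SDP bodies above, each control channel m-line carries an
   "a=cmid" attribute whose value matches the "a=mid" attribute of the
   audio m-line it uses.  The following is a minimal sketch of how a
   receiver might resolve that association.  The helper names are
   hypothetical and the parsing handles only the attributes seen in
   these examples.

```python
def split_media_sections(sdp):
    """Return one dict per m-line, holding its fields and a= attributes."""
    sections = []
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            sections.append({"m": line[2:].split(), "attrs": []})
        elif line.startswith("a=") and sections:
            # Attributes after an m-line belong to that media section.
            sections[-1]["attrs"].append(line[2:])
    return sections


def attr_value(section, name):
    """Return the value of the first 'name:value' attribute, or None."""
    prefix = name + ":"
    for attr in section["attrs"]:
        if attr.startswith(prefix):
            return attr[len(prefix):]
    return None


def control_to_audio_port(sdp):
    """Map each channel (or requested resource) to its audio port."""
    sections = split_media_sections(sdp)
    audio_by_mid = {
        attr_value(s, "mid"): s for s in sections if s["m"][0] == "audio"
    }
    result = {}
    for s in sections:
        if s["m"][0] != "application":
            continue
        # Answers carry a=channel, offers carry a=resource.
        label = attr_value(s, "channel") or attr_value(s, "resource")
        media = audio_by_mid.get(attr_value(s, "cmid"))
        if label is not None and media is not None:
            result[label] = media["m"][1]
    return result
```

   Applied to the client's last offer above, this associates the
   "speechsynth" resource with audio port 49170 and the "speechrecog"
   resource with audio port 49180.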

8105	   An MRCPv2 "SPEAK" request initiates speech.

8107	   C->S:
8108	          MRCP/2.0 386 SPEAK 543257
8109	          Channel-Identifier:32AECB23433801@speechsynth
8110	          Kill-On-Barge-In:false
8111	          Voice-gender:neutral
8112	          Voice-age:25
8113	          Prosody-volume:medium
8114	          Content-Type:application/ssml+xml
8115	          Content-Length:...

8117	          <?xml version="1.0"?>
8118	          <speak version="1.0"
8119	                 xmlns="http://www.w3.org/2001/10/synthesis"
8120	                 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
8121	                 xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
8122	                 http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
8123	                 xml:lang="en-US">
8124	            <p>
8125	              <s>You have 4 new messages.</s>
8126	              <s>The first is from Stephanie Williams
8127	                <mark name="Stephanie"/>
8128	                and arrived at <break/>
8129	                <say-as interpret-as="vxml:time">0345p</say-as>.</s>
8130	              <s>The subject is <prosody
8131	                 rate="-20%">ski trip</prosody></s>
8132	            </p>
8133	          </speak>

8135	   S->C:
8136	          MRCP/2.0 49 543257 200 IN-PROGRESS
8137	          Channel-Identifier:32AECB23433801@speechsynth
8138	          Speech-Marker:timestamp=857205015059

8140	   The synthesizer hits the special marker in the message to be spoken
8141	   and faithfully informs the client of the event.

8143	   S->C:  MRCP/2.0 46 SPEECH-MARKER 543257 IN-PROGRESS
8144	          Channel-Identifier:32AECB23433801@speechsynth
8145	          Speech-Marker:timestamp=857206027059;Stephanie

8147	   The synthesizer finishes with the "SPEAK" request.

8149	   S->C:  MRCP/2.0 48 SPEAK-COMPLETE 543257 COMPLETE
8150	          Channel-Identifier:32AECB23433801@speechsynth
8151	          Speech-Marker:timestamp=857207685213;Stephanie

8153	   The recognizer is issued a request to listen for the customer
8154	   choices.

8156	   C->S:  MRCP/2.0 343 RECOGNIZE 543258
8157	          Channel-Identifier:32AECB23433801@speechrecog
8158	          Content-Type:application/srgs+xml
8159	          Content-Length:...

8161	          <?xml version="1.0"?>
8162	          <!-- the default grammar language is US English -->
8163	          <grammar xmlns="http://www.w3.org/2001/06/grammar"
8164	                   xml:lang="en-US" version="1.0" root="request">
8165	          <!-- single language attachment to a rule expansion -->
8166	            <rule id="request">
8167	              Can I speak to
8168	              <one-of xml:lang="fr-CA">
8169	                <item>Michel Tremblay</item>
8170	                <item>Andre Roy</item>
8171	              </one-of>
8172	            </rule>
8173	          </grammar>

8175	   S->C:  MRCP/2.0 49 543258 200 IN-PROGRESS
8176	          Channel-Identifier:32AECB23433801@speechrecog

8178	   The client issues the next MRCPv2 "SPEAK" method.  When playing a
8179	   prompt with kill-on-barge-in enabled and asking for input, it is
8180	   generally RECOMMENDED that the client issue the RECOGNIZE request
8181	   ahead of the "SPEAK" request for optimum performance and user
8182	   experience.  This ensures that the recognizer is online before the
8183	   prompt starts playing, so the beginning of the user's speech is not
8184	   truncated (especially for power users).

8186	   C->S:  MRCP/2.0 289 SPEAK 543259
8187	          Channel-Identifier:32AECB23433801@speechsynth
8188	          Kill-On-Barge-In:true
8189	          Content-Type:application/ssml+xml
8190	          Content-Length:...

8192	          <?xml version="1.0"?>
8193	          <speak version="1.0"
8194	                 xmlns="http://www.w3.org/2001/10/synthesis"
8195	                 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
8196	                 xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
8197	                 http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
8198	                 xml:lang="en-US">
8199	            <p>
8200	              <s>Welcome to ABC corporation.</s>
8201	              <s>Who would you like to talk to?</s>
8202	            </p>
8203	          </speak>

8205	   S->C:  MRCP/2.0 52 543259 200 IN-PROGRESS
8206	          Channel-Identifier:32AECB23433801@speechsynth
8207	          Speech-Marker:timestamp=857207696314

8209	   Since the last "SPEAK" request had Kill-On-Barge-In set to "true",
8210	   the speech synthesizer is interrupted when the user starts speaking,
8211	   and the client is notified.

8213	   Now, since the recognition and synthesizer resources are in the same
8214	   session, they may have worked with each other to deliver kill-on-
8215	   barge-in.  Whether or not the synthesizer and recognizer are in the
8216	   same session, the recognizer MUST generate the START-OF-INPUT event
8217	   to the client.

8219	   The client MUST then blindly turn around and issue a BARGE-IN-
8220	   OCCURRED method to the synthesizer resource (if a "SPEAK" request was
8221	   active).  The synthesizer, if kill-on-barge-in was enabled on the
8222	   current "SPEAK" request, would have then interrupted it and issued a
8223	   SPEAK-COMPLETE event to the client.

8225	   The completion-cause code indicates whether this is a normal
8226	   completion or a kill-on-barge-in interruption.

8228	   S->C:  MRCP/2.0 49 START-OF-INPUT 543258 IN-PROGRESS
8229	          Channel-Identifier:32AECB23433801@speechrecog
8230	          Proxy-Sync-Id:987654321

8232	   C->S:  MRCP/2.0 69 BARGE-IN-OCCURRED 543259
8233	          Channel-Identifier:32AECB23433801@speechsynth
8234	          Proxy-Sync-Id:987654321

8236	   S->C:  MRCP/2.0 72 543259 200 COMPLETE
8237	          Channel-Identifier:32AECB23433801@speechsynth
8238	          Active-Request-Id-List:543258
8239	          Speech-Marker:timestamp=857206096314

8241	   S->C:  MRCP/2.0 73 SPEAK-COMPLETE 543259 COMPLETE
8242	          Channel-Identifier:32AECB23433801@speechsynth
8243	          Completion-Cause:001 barge-in
8244	          Speech-Marker:timestamp=857207685213
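
The client side of the barge-in exchange above can be sketched as
follows.  This is a minimal illustration, not a normative
implementation: the function names and the request-id value are
hypothetical, and it assumes that message-length counts every octet of
the message, including the start-line that carries it.

```python
def build_mrcp_request(method, request_id, headers):
    """Serialize a body-less MRCPv2 request (hypothetical helper)."""
    header_block = "".join(f"{name}:{value}\r\n" for name, value in headers)
    length = 0
    while True:
        # The length field counts its own digits, so iterate to a fixed point.
        start_line = f"MRCP/2.0 {length} {method} {request_id}\r\n"
        message = (start_line + header_block + "\r\n").encode("utf-8")
        if len(message) == length:
            return message
        length = len(message)


def on_start_of_input(event_headers, synth_channel, next_request_id):
    """Answer a recognizer START-OF-INPUT with a BARGE-IN-OCCURRED request."""
    headers = [("Channel-Identifier", synth_channel)]
    # Echo Proxy-Sync-Id, as the example exchange does, so both messages
    # can be correlated.
    if "Proxy-Sync-Id" in event_headers:
        headers.append(("Proxy-Sync-Id", event_headers["Proxy-Sync-Id"]))
    return build_mrcp_request("BARGE-IN-OCCURRED", next_request_id, headers)


msg = on_start_of_input(
    {"Channel-Identifier": "32AECB23433801@speechrecog",
     "Proxy-Sync-Id": "987654321"},
    synth_channel="32AECB23433801@speechsynth",
    next_request_id=543260,  # assumed value, chosen only for illustration
)
print(msg.decode("utf-8"))
```

The client sends the request without inspecting the synthesizer state;
the synthesizer decides whether an active "SPEAK" request is actually
interrupted.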

8246	   The recognition resource matched the spoken stream to a grammar and
8247	   generated results.  The result of the recognition is returned by the
8248	   server as part of the RECOGNITION-COMPLETE event.

8250	   S->C:  MRCP/2.0 412 RECOGNITION-COMPLETE 543258 COMPLETE
8251	          Channel-Identifier:32AECB23433801@speechrecog
8252	          Completion-Cause:000 success
8253	          Waveform-URI:<http://web.media.com/session123/audio.wav>;
8254	                       size=423523;duration=25432
8255	          Content-Type:application/nlsml+xml
8256	          Content-Length:...

8258	          <?xml version="1.0"?>
8259	          <result xmlns="http://www.ietf.org/xml/ns/mrcpv2"
8260	                  xmlns:ex="http://www.example.com/example"
8261	                  grammar="session:request1@form-level.store">
8262	              <interpretation>
8263	                  <instance name="Person">
8264	                      <ex:Person>
8265	                          <ex:Name> Andre Roy </ex:Name>
8266	                      </ex:Person>
8267	                  </instance>
8268	                  <input>   may I speak to Andre Roy </input>
8269	              </interpretation>
8270	          </result>

8272	   When the client wants to tear down the whole session and all its
8273	   resources, it MUST issue a SIP BYE to close the SIP session.  This
8274	   will de-allocate all the control channels and resources allocated
8275	   under the session.

8277	   C->S:  BYE sip:mresources@server.example.com SIP/2.0
8278	          Max-Forwards:6
8279	          From:Sarvi <sip:sarvi@example.com>;tag=1928301774
8280	          To:MediaServer <sip:mresources@example.com>;tag=62784
8281	          Call-ID:a84b4c76e66710
8282	          CSeq:231 BYE
8283	          Content-Length:...

8285	14.2.  Recognition Result Examples

8287	14.2.1.  Simple ASR Ambiguity

8289	   System: To which city will you be traveling?
8290	   User:   I want to go to Pittsburgh.

8292	   <?xml version="1.0"?>
8293	   <result xmlns="http://www.ietf.org/xml/ns/mrcpv2"
8294	           xmlns:ex="http://www.example.com/example"
8295	           grammar="http://www.example.com/flight">
8296	     <interpretation confidence="0.6">
8297	        <instance>
8298	           <ex:airline>
8299	              <ex:to_city>Pittsburgh</ex:to_city>
8300	           </ex:airline>
8301	        </instance>
8302	        <input mode="speech">
8303	           I want to go to Pittsburgh
8304	        </input>
8305	     </interpretation>
8306	     <interpretation confidence="0.4">
8307	        <instance>
8308	           <ex:airline>
8309	              <ex:to_city>Stockholm</ex:to_city>
8310	           </ex:airline>
8311	        </instance>
8312	        <input>I want to go to Stockholm</input>
8313	     </interpretation>
8314	   </result>
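
A client receiving such an N-best result typically picks one
interpretation.  The sketch below selects the one with the highest
confidence; the function name is hypothetical, treating a missing
confidence attribute as 0 is an assumption, and the embedded document
is a well-formed rendering of the example above.

```python
import xml.etree.ElementTree as ET

NS = {
    "nl": "http://www.ietf.org/xml/ns/mrcpv2",
    "ex": "http://www.example.com/example",
}

SAMPLE = """<?xml version="1.0"?>
<result xmlns="http://www.ietf.org/xml/ns/mrcpv2"
        xmlns:ex="http://www.example.com/example"
        grammar="http://www.example.com/flight">
  <interpretation confidence="0.6">
    <instance>
      <ex:airline><ex:to_city>Pittsburgh</ex:to_city></ex:airline>
    </instance>
    <input mode="speech">
      I want to go to Pittsburgh
    </input>
  </interpretation>
  <interpretation confidence="0.4">
    <instance>
      <ex:airline><ex:to_city>Stockholm</ex:to_city></ex:airline>
    </instance>
    <input>I want to go to Stockholm</input>
  </interpretation>
</result>"""


def best_interpretation(nlsml_text):
    """Return (confidence, to_city, input text) of the top interpretation."""
    root = ET.fromstring(nlsml_text)
    candidates = root.findall("nl:interpretation", NS)
    # Assumption: an absent confidence attribute ranks lowest.
    best = max(candidates, key=lambda e: float(e.get("confidence", "0")))
    city = best.findtext("nl:instance/ex:airline/ex:to_city", namespaces=NS)
    heard = best.findtext("nl:input", default="", namespaces=NS)
    return float(best.get("confidence", "0")), city, " ".join(heard.split())


print(best_interpretation(SAMPLE))
```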

8316	14.2.2.  Mixed Initiative

8318	   System: What would you like?
8319	   User:   I would like 2 pizzas, one with pepperoni and cheese,
8320	           one with sausage and a bottle of coke, to go.

8322	   This example includes an order object which in turn contains objects
8323	   named "food_item", "drink_item" and "delivery_method".  The
8324	   representation assumes there are no ambiguities in the speech or
8325	   natural language processing.  Note that this representation also
8326	   assumes some level of intra-sentential anaphora resolution, i.e., to
8327	   resolve the two "one's" as "pizza".

8329	   <?xml version="1.0"?>
8330	   <nl:result xmlns:nl="http://www.ietf.org/xml/ns/mrcpv2"
8331	              xmlns="http://www.example.com/example"
8332	              grammar="http://www.example.com/foodorder">
8333	     <nl:interpretation confidence="1.0" >
8334	        <nl:instance>
8335	         <order>
8336	           <food_item confidence="1.0">
8337	             <pizza>
8338	               <ingredients confidence="1.0">
8339	                 pepperoni
8340	               </ingredients>
8341	               <ingredients confidence="1.0">
8342	                 cheese
8343	               </ingredients>
8344	             </pizza>
8345	             <pizza>
8346	               <ingredients>sausage</ingredients>
8347	             </pizza>
8348	           </food_item>
8349	           <drink_item confidence="1.0">
8350	             <size>2-liter</size>
8351	           </drink_item>
8352	           <delivery_method>to go</delivery_method>
8353	         </order>
8354	       </nl:instance>
8355	       <nl:input mode="speech">I would like 2 pizzas,
8356	            one with pepperoni and cheese, one with sausage
8357	            and a bottle of coke, to go.
8358	       </nl:input>
8359	     </nl:interpretation>
8360	   </nl:result>

8362	14.2.3.  DTMF Input

8364	   A combination of DTMF input and speech is represented using nested
8365	   input elements.  For example:
8366	   User: My pin is (dtmf 1 2 3 4)

8368	   <input>
8369	     <input mode="speech" confidence="1.0"
8370	        timestamp-start="2000-04-03T0:00:00"
8371	        timestamp-end="2000-04-03T0:00:01.5">My pin is
8372	     </input>
8373	     <input mode="dtmf" confidence="1.0"
8374	        timestamp-start="2000-04-03T0:00:01.5"
8375	        timestamp-end="2000-04-03T0:00:02.0">1 2 3 4
8376	     </input>
8377	   </input>

8379	   Note that grammars that recognize mixtures of speech and DTMF are not
8380	   currently possible in VoiceXML; however this representation may be
8381	   needed for other applications of NLSML, and it may be introduced in
8382	   future versions of VoiceXML.
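
One way a client might consume nested input elements is to flatten
them into an ordered list of (mode, text) pairs.  The following is a
sketch under that assumption; the function name is hypothetical, and
the fragment is parsed without a namespace because the example above
shows it without one.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<input>
  <input mode="speech" confidence="1.0"
     timestamp-start="2000-04-03T0:00:00"
     timestamp-end="2000-04-03T0:00:01.5">My pin is
  </input>
  <input mode="dtmf" confidence="1.0"
     timestamp-start="2000-04-03T0:00:01.5"
     timestamp-end="2000-04-03T0:00:02.0">1 2 3 4
  </input>
</input>"""


def flatten_input(elem):
    """Return [(mode, text), ...] for the leaf <input> elements, in order."""
    children = elem.findall("input")
    if not children:
        # Leaf element: report its mode (None if absent) and trimmed text.
        return [(elem.get("mode"), " ".join((elem.text or "").split()))]
    parts = []
    for child in children:
        parts.extend(flatten_input(child))
    return parts


print(flatten_input(ET.fromstring(SAMPLE)))
```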

8384	14.2.4.  Interpreting Meta-Dialog and Meta-Task Utterances

8386	   Natural language communication makes use of meta-dialog and meta-task
8387	   utterances.  This specification is flexible enough so that meta
8388	   utterances can be represented on an application-specific basis
8389	   without requiring other standard markup.

8391	   Here are two examples of how meta-task and meta-dialog utterances
8392	   might be represented.

8394	System: What toppings do you want on your pizza?
8395	User:   What toppings do you have?

8397	<interpretation grammar="http://www.example.com/toppings">
8398	   <instance>
8399	      <question>
8400	         <questioned_item>toppings</questioned_item>
8401	         <questioned_property>
8402	          availability
8403	         </questioned_property>
8404	      </question>
8405	   </instance>
8406	   <input mode="speech">
8407	     what toppings do you have?
8408	   </input>
8409	</interpretation>

8411	User:   slow down.

8413	<interpretation grammar="http://www.example.com/generalCommandsGrammar">
8414	   <instance>
8415	    <command>
8416	       <action>reduce speech rate</action>
8417	       <doer>system</doer>
8418	    </command>
8419	   </instance>
8420	  <input mode="speech">slow down</input>
8421	</interpretation>

8423	14.2.5.  Anaphora and Deixis

8425	   This specification can be used on an application-specific basis to
8426	   represent utterances that contain unresolved anaphoric and deictic
8427	   references.  Anaphoric references, which include pronouns and
8428	   definite noun phrases that refer to something that was mentioned in
8429	   the preceding linguistic context, and deictic references, which refer
8430	   to something that is present in the non-linguistic context, present
8431	   similar problems in that there may not be sufficient unambiguous
8432	   linguistic context to determine what their exact role in the
8433	   interpretation should be.  In order to represent unresolved anaphora
8434	   and deixis using this specification, one strategy would be for the
8435	   developer to define a more surface-oriented representation that
8436	   leaves the specific details of the interpretation of the reference
8437	   open.  (This assumes that a later component is responsible for
8438	   actually resolving the reference).

8440	   Example: (ignoring the issue of representing the input from the
8441	             pointing gesture.)

8443	   System: What do you want to drink?
8444	   User:   I want this (clicks on picture of large root beer.)

8446	   <?xml version="1.0"?>
8447	   <nl:result xmlns:nl="http://www.ietf.org/xml/ns/mrcpv2"
8448	           xmlns="http://www.example.com/example"
8449	           grammar="http://www.example.com/beverages.grxml">
8450	      <nl:interpretation>
8451	         <nl:instance>
8452	          <doer>I</doer>
8453	          <action>want</action>
8454	          <object>this</object>
8455	         </nl:instance>
8456	         <nl:input mode="speech">I want this</nl:input>
8457	      </nl:interpretation>
8458	   </nl:result>

8460	14.2.6.  Distinguishing Individual Items from Sets with One Member

8462	   For programming convenience, it is useful to be able to distinguish
8463	   between individual items and sets containing one item in the XML
8464	   representation of semantic results.  For example, a pizza order might
8465	   consist of exactly one pizza, but a pizza might contain zero or more
8466	   toppings.  Since there is no standard way of marking this distinction
8467	   directly in XML, in the current framework, the developer is free to
8468	   adopt any conventions that would convey this information in the XML
8469	   markup.  One strategy would be for the developer to wrap the set of
8470	   items in a grouping element, as in the following example.

8472	   <order>
8473	      <pizza>
8474	         <topping-group>
8475	            <topping>mushrooms</topping>
8476	         </topping-group>
8477	      </pizza>
8478	      <drink>coke</drink>
8479	   </order>

8481	   In this example, the programmer can assume that there is supposed to
8482	   be exactly one pizza and one drink in the order, but the fact that
8483	   there is only one topping is an accident of this particular pizza
8484	   order.
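
The benefit of the grouping convention shows up in client code: the
toppings are always read as a list, whatever their count, while pizza
and drink are read as single items.  This is a sketch only; the
function name is hypothetical and the element names are those of the
example above.

```python
import xml.etree.ElementTree as ET

ORDER = """<order>
   <pizza>
      <topping-group>
         <topping>mushrooms</topping>
      </topping-group>
   </pizza>
   <drink>coke</drink>
</order>"""


def summarize_order(order_xml):
    """Read singleton items directly and grouped items as a list."""
    order = ET.fromstring(order_xml)
    pizza = order.find("pizza")
    # The grouping element marks a set, so always collect every member.
    toppings = [t.text for t in pizza.findall("topping-group/topping")]
    return {"toppings": toppings, "drink": order.findtext("drink")}


print(summarize_order(ORDER))
```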

8486	   Note that the client controls both the grammar and the semantics to
8487	   be returned upon grammar matches, so the user of MRCPv2 is fully
8488	   empowered to cause results to be returned in NLSML in such a way
8489	   that the interpretation is clear to that user.

8491	14.2.7.  Extensibility

8493	   Extensibility in NLSML is provided via result content flexibility, as
8494	   described in the discussions of meta utterances and anaphora.  NLSML
8495	   can easily be used in sophisticated systems to convey application-
8496	   specific information that more basic systems would not make use of,
8497	   for example, defining speech acts.

8499	15.  ABNF Normative Definition

8501	   LWS    =    [*WSP CRLF] 1*WSP ; linear whitespace

8503	   SWS    =    [LWS] ; sep whitespace

8505	   UTF8-NONASCII    =    %xC0-DF 1UTF8-CONT
8506	                    /    %xE0-EF 2UTF8-CONT
8507	                    /    %xF0-F7 3UTF8-CONT
8508	                    /    %xF8-FB 4UTF8-CONT
8509	                    /    %xFC-FD 5UTF8-CONT

8511	   UTF8-CONT        =    %x80-BF
8512	   UTFCHAR          =    %x21-7E
8513	                    /    UTF8-NONASCII
8514	   param            =    *pchar

8516	   quoted-string    =    SWS DQUOTE *(qdtext / quoted-pair )
8517	                         DQUOTE

8519	   qdtext           =    LWS / %x21 / %x23-5B / %x5D-7E
8520	                    /    UTF8-NONASCII

8522	   quoted-pair      =    "\" (%x00-09 / %x0B-0C / %x0E-7F)

8524	   token            =    1*(alphanum / "-" / "." / "!" / "%" / "*"
8525	                         / "_" / "+" / "`" / "'" / "~" )

8527	   reserved         =    ";" / "/" / "?" / ":" / "@" / "&" / "="
8528	                         / "+" / "$" / ","

8530	   mark             =    "-" / "_" / "." / "!" / "~" / "*" / "'"
8531	                    /    "(" / ")"

8533	   unreserved       =    alphanum / mark
8534	   pchar            =    unreserved / escaped
8535	                    /    ":" / "@" / "&" / "=" / "+" / "$" / ","

8537	   alphanum         =    ALPHA / DIGIT

8539	   BOOLEAN          =    "true" / "false"

8541	   FLOAT            =    *DIGIT ["." *DIGIT]

8543	   escaped          =    "%" HEXDIG HEXDIG

8545	   fragment         =    *uric

8547	   uri              =    [ absoluteURI / relativeURI ]
8548	                         [ "#" fragment ]

8550	   absoluteURI      =    scheme ":" ( hier-part / opaque-part )

8552	   relativeURI      =    ( net-path / abs-path / rel-path )
8553	                         [ "?" query ]

8555	   hier-part        =    ( net-path / abs-path ) [ "?" query ]

8557	   net-path         =    "//" authority [ abs-path ]

8559	   abs-path         =    "/" path-segments

8561	   rel-path         =    rel-segment [ abs-path ]

8563	   rel-segment      =    1*( unreserved / escaped / ";" / "@"
8564	                    /    "&" / "=" / "+" / "$" / "," )

8566	   opaque-part      =    uric-no-slash *uric

8568	   uric             =    reserved / unreserved / escaped

8570	   uric-no-slash    =    unreserved / escaped / ";" / "?" / ":"
8571	                         / "@" / "&" / "=" / "+" / "$" / ","

8573	   path-segments    =    segment *( "/" segment )

8575	   segment          =    *pchar *( ";" param )

8577	   scheme           =    ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )

8579	   authority        =    srvr / reg-name

8581	   srvr             =    [ [ userinfo "@" ] hostport ]
8582	   reg-name         =    1*( unreserved / escaped / "$" / ","
8583	                    /     ";" / ":" / "@" / "&" / "=" / "+" )

8585	   query            =    *uric

8587	   userinfo         =    ( user ) [ ":" password ] "@"

8589	   user             =    1*( unreserved / escaped
8590	                    /    user-unreserved )

8592	   user-unreserved  =    "&" / "=" / "+" / "$" / "," / ";"
8593	                    /    "?" / "/"

8595	   password         =    *( unreserved / escaped
8596	                    /    "&" / "=" / "+" / "$" / "," )

8598	   hostport         =    host [ ":" port ]

8600	   host             =    hostname / IPv4address / IPv6reference

8602	   hostname         =    *( domainlabel "." ) toplabel [ "." ]

8604	   domainlabel      =    alphanum / alphanum *( alphanum / "-" )
8605	                         alphanum

8607	   toplabel         =    ALPHA / ALPHA *( alphanum / "-" )
8608	                         alphanum

8610	   IPv4address      =    1*3DIGIT "." 1*3DIGIT "." 1*3DIGIT "."
8611	                         1*3DIGIT

8613	   IPv6reference    =    "[" IPv6address "]"

8615	   IPv6address      =    hexpart [ ":" IPv4address ]

8617	   hexpart          =    hexseq / hexseq "::" [ hexseq ] / "::"
8618	                         [ hexseq ]

8620	   hexseq           =    hex4 *( ":" hex4)

8622	   hex4             =    1*4HEXDIG

8624	   port             =    1*19DIGIT

8626	   cmid-attribute   =    "a=cmid:" identification-tag

8628	   identification-tag =    token
8629	   generic-message  =    start-line message-header CRLF
8630	                         [ message-body ]

8632	   message-body     =    *OCTET

8634	   start-line       =    request-line / status-line / event-line

8636	   request-line     =    mrcp-version SP message-length SP method-name
8637	                         SP request-id CRLF

8639	   status-line      =    mrcp-version SP message-length SP request-id
8640	                         SP status-code SP request-state CRLF

8642	   event-line       =    mrcp-version SP message-length SP event-name
8643	                         SP request-id SP request-state CRLF
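
The three start-line forms above can be told apart by token count and
by whether the third token is numeric.  The sketch below is
non-normative: it approximates method-name and event-name with a
generic upper-case pattern instead of enumerating the defined names,
and the function name is hypothetical.

```python
import re

VERSION = r"MRCP/\d{1,2}\.\d{1,2}"
LENGTH = r"\d{1,19}"
REQUEST_ID = r"\d{1,10}"
STATE = r"COMPLETE|IN-PROGRESS|PENDING"
NAME = r"[A-Z][A-Z-]*"  # assumption: stands in for method-name / event-name

PATTERNS = [
    ("response", re.compile(
        rf"(?P<version>{VERSION}) (?P<length>{LENGTH})"
        rf" (?P<request_id>{REQUEST_ID}) (?P<status_code>\d{{3}})"
        rf" (?P<state>{STATE})\r\n")),
    ("event", re.compile(
        rf"(?P<version>{VERSION}) (?P<length>{LENGTH}) (?P<name>{NAME})"
        rf" (?P<request_id>{REQUEST_ID}) (?P<state>{STATE})\r\n")),
    ("request", re.compile(
        rf"(?P<version>{VERSION}) (?P<length>{LENGTH}) (?P<name>{NAME})"
        rf" (?P<request_id>{REQUEST_ID})\r\n")),
]


def parse_start_line(line):
    """Classify a start-line and return its fields as a dict."""
    for kind, pattern in PATTERNS:
        match = pattern.fullmatch(line)
        if match:
            return {"kind": kind, **match.groupdict()}
    raise ValueError(f"not an MRCPv2 start-line: {line!r}")


for sample in ("MRCP/2.0 386 SPEAK 543257\r\n",
               "MRCP/2.0 49 543257 200 IN-PROGRESS\r\n",
               "MRCP/2.0 46 SPEECH-MARKER 543257 IN-PROGRESS\r\n"):
    print(parse_start_line(sample))
```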

8645	   method-name      =    generic-method
8646	                    /    synthesizer-method
8647	                    /    recognizer-method
8648	                    /    recorder-method
8649	                    /    verifier-method

8651	   generic-method   =    "SET-PARAMS"
8652	                    /    "GET-PARAMS"

8654	   request-state    =    "COMPLETE"
8655	                    /    "IN-PROGRESS"
8656	                    /    "PENDING"

8658	   event-name       =    synthesizer-event
8659	                    /    recognizer-event
8660	                    /    recorder-event
8661	                    /    verifier-event

8663	   message-header   =    1*(generic-header / resource-header)

8665	   resource-header  =    synthesizer-header
8666	                    /    recognizer-header
8667	                    /    recorder-header
8668	                    /    verifier-header

8670	   generic-header   =    channel-identifier
8671	                    /    accept
8672	                    /    active-request-id-list
8673	                    /    proxy-sync-id
8674	                    /    accept-charset
8675	                    /    content-type
8676	                    /    content-id
8677	                    /    content-base
8678	                    /    content-encoding
8679	                    /    content-location
8680	                    /    content-length
8681	                    /    fetch-timeout
8682	                    /    cache-control
8683	                    /    logging-tag
8684	                    /    set-cookie
8685	                    /    set-cookie2
8686	                    /    vendor-specific

8688	   ; -- content-id is as defined in RFC2392, RFC2046 and RFC5322
8689	   ; -- accept and accept-charset are as defined in RFC2616

8691	   mrcp-version     =    "MRCP" "/" 1*2DIGIT "." 1*2DIGIT

8693	   message-length   =    1*19DIGIT

8695	   request-id       =    1*10DIGIT

8697	   status-code      =    3DIGIT

8699	   channel-identifier =  "Channel-Identifier" ":"
8700	                         channel-id CRLF

8702	   channel-id       =    1*alphanum "@" 1*alphanum

8704	   active-request-id-list = "Active-Request-Id-List" ":"
8705	                            request-id *("," request-id) CRLF

8707	   proxy-sync-id    =    "Proxy-Sync-Id" ":" 1*VCHAR CRLF

8709	   content-length   =    "Content-Length" ":" 1*19DIGIT CRLF

8711	   content-base     =    "Content-Base" ":" absoluteURI CRLF

8713	   content-type     =    "Content-Type" ":" media-type-value CRLF

8715	   media-type-value =    type "/" subtype *( ";" parameter )

8717	   type             =    token

8719	   subtype          =    token

8721	   parameter        =    attribute "=" value

8723	   attribute        =    token
8724	   value            =    token / quoted-string

8726	   content-encoding =    "Content-Encoding" ":"
8727	                         *WSP content-coding
8728	                         *(*WSP "," *WSP content-coding *WSP )
8729	                         CRLF

8731	   content-coding   =    token

8733	   content-location =    "Content-Location" ":"
8734	                         ( absoluteURI / relativeURI )  CRLF

8736	   cache-control    =    "Cache-Control" ":"
8737	                         [*WSP cache-directive
8738	                         *( *WSP "," *WSP cache-directive *WSP )]
8739	                         CRLF

8741	   fetch-timeout    =    "Fetch-Timeout" ":" 1*19DIGIT CRLF

8743	   cache-directive  =    "max-age" "=" delta-seconds
8744	                    /    "max-stale" ["=" delta-seconds ]
8745	                    /    "min-fresh" "=" delta-seconds

8747	   logging-tag      =    "Logging-Tag" ":" 1*UTFCHAR CRLF

8749	   vendor-specific  =    "Vendor-Specific-Parameters" ":"
8750	                         [vendor-specific-av-pair
8751	                         *(";" vendor-specific-av-pair)] CRLF

8753	   vendor-specific-av-pair = vendor-av-pair-name "="
8754	                             value

8756	   vendor-av-pair-name     = 1*UTFCHAR

8758	   set-cookie       =    "Set-Cookie:" cookies CRLF

8760	   cookies          =    cookie *("," *LWS cookie)

8762	   cookie           =    attribute "=" value *(";" cookie-av)

8764	   cookie-av        =    "Comment" "=" value
8765	                    /    "Domain" "=" value
8766	                    /    "Max-Age" "=" value
8767	                    /    "Path" "=" value
8768	                    /    "Secure"
8769	                    /    "Version" "=" 1*19DIGIT
8770	                    /    "Age" "=" delta-seconds

8772	   set-cookie2      =    "Set-Cookie2:" cookies2 CRLF

8774	   cookies2         =    cookie2 *("," *LWS cookie2)

8776	   cookie2          =    attribute "=" value *(";" cookie-av2)

8778	   cookie-av2       =    "Comment" "=" value
8779	                    /    "CommentURL" "=" DQUOTE uri DQUOTE
8780	                    /    "Discard"
8781	                    /    "Domain" "=" value
8782	                    /    "Max-Age" "=" value
8783	                    /    "Path" "=" value
8784	                    /    "Port" [ "=" DQUOTE portlist DQUOTE ]
8785	                    /    "Secure"
8786	                    /    "Version" "=" 1*19DIGIT
8787	                    /    "Age" "=" delta-seconds

8789	   portlist         =    portnum *("," *LWS portnum)

8791	   portnum          =    1*19DIGIT

8793	   ; Synthesizer ABNF

8795	   synthesizer-method    =    "SPEAK"
8796	                         /    "STOP"
8797	                         /    "PAUSE"
8798	                         /    "RESUME"
8799	                         /    "BARGE-IN-OCCURRED"
8800	                         /    "CONTROL"
8801	                         /    "DEFINE-LEXICON"

8803	   synthesizer-event     =    "SPEECH-MARKER"
8804	                         /    "SPEAK-COMPLETE"

8806	   synthesizer-header    =    jump-size
8807	                         /    kill-on-barge-in
8808	                         /    speaker-profile
8809	                         /    completion-cause
8810	                         /    completion-reason
8811	                         /    voice-parameter
8812	                         /    prosody-parameter
8813	                         /    speech-marker
8814	                         /    speech-language
8815	                         /    fetch-hint
8816	                         /    audio-fetch-hint
8817	                         /    failed-uri
8818	                         /    failed-uri-cause
8819	                         /    speak-restart
8820	                         /    speak-length
8821	                         /    load-lexicon
8822	                         /    lexicon-search-order

8824	   jump-size             =    "Jump-Size" ":" speech-length-value CRLF

8826	   speech-length-value   =    numeric-speech-length
8827	                         /    text-speech-length

8829	   text-speech-length    =    1*UTFCHAR SP "Tag"

8831	   numeric-speech-length =    ("+" / "-") positive-speech-length

8833	   positive-speech-length =   1*19DIGIT SP numeric-speech-unit

8835	   numeric-speech-unit   =    "Second"
8836	                         /    "Word"
8837	                         /    "Sentence"
8838	                         /    "Paragraph"

8840	   delta-seconds         =    1*19DIGIT

8842	   kill-on-barge-in      =    "Kill-On-Barge-In" ":" BOOLEAN
8843	                              CRLF

8845	   speaker-profile       =    "Speaker-Profile" ":" absoluteURI
8846	                              CRLF

8848	   completion-cause      =    "Completion-Cause" ":" 3DIGIT SP
8849	                              1*VCHAR CRLF

8851	   completion-reason     =    "Completion-Reason" ":"
8852	                              quoted-string CRLF

8854	   voice-parameter       =    voice-gender
8855	                         /    voice-age
8856	                         /    voice-variant
8857	                         /    voice-name

8859	   voice-gender          =    "Voice-Gender:" voice-gender-value CRLF

8861	   voice-gender-value    =    "male"
8862	                         /    "female"
8863	                         /    "neutral"

8865	   voice-age             =    "Voice-Age:" 1*3DIGIT CRLF
8866	   voice-variant         =    "Voice-Variant:" 1*19DIGIT CRLF

8868	   voice-name            =    "Voice-Name:"
8869	                              1*UTFCHAR *(1*WSP 1*UTFCHAR) CRLF

8871	   prosody-parameter     =    "Prosody-" prosody-param-name ":"
8872	                              [prosody-param-value] CRLF

8874	   prosody-param-name    =    1*VCHAR

8876	   prosody-param-value   =    1*VCHAR

8878	   timestamp             =    "timestamp" "=" time-stamp-value

8880	   time-stamp-value      =    1*20DIGIT

8882	   speech-marker         =    "Speech-Marker" ":"
8883	                              timestamp
8884	                              [";" 1*(UTFCHAR / %x20)] CRLF
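
As an illustration of the speech-marker rule, the sketch below parses
a Speech-Marker header line into its timestamp and optional mark name.
It is non-normative: the function name is hypothetical, and it assumes
the line has already been decoded to text, so UTF8-NONASCII is
approximated as any code point above U+007F.

```python
import re

SPEECH_MARKER = re.compile(
    r"Speech-Marker:timestamp=(\d{1,20})"
    r"(?:;([\x20-\x7E\u0080-\U0010FFFF]+))?\r\n"
)


def parse_speech_marker(header_line):
    """Return (timestamp, mark_name); mark_name is None when absent."""
    match = SPEECH_MARKER.fullmatch(header_line)
    if match is None:
        raise ValueError(f"not a Speech-Marker header: {header_line!r}")
    return int(match.group(1)), match.group(2)


print(parse_speech_marker("Speech-Marker:timestamp=857206027059;Stephanie\r\n"))
print(parse_speech_marker("Speech-Marker:timestamp=857205015059\r\n"))
```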

8886	   speech-language       =    "Speech-Language" ":" 1*VCHAR CRLF

8888	   fetch-hint            =    "Fetch-Hint" ":" 1*ALPHA CRLF

8890	   audio-fetch-hint      =    "Audio-Fetch-Hint" ":" 1*ALPHA CRLF

8892	   failed-uri            =    "Failed-URI" ":" absoluteURI CRLF

8894	   failed-uri-cause      =    "Failed-URI-Cause" ":" 1*UTFCHAR CRLF

8896	   speak-restart         =    "Speak-Restart" ":" BOOLEAN CRLF

8898	   speak-length          =    "Speak-Length" ":" positive-length-value
8899	                              CRLF

8901	   positive-length-value   =  positive-speech-length
8902	                           /  text-speech-length

8904	   load-lexicon          =    "Load-Lexicon" ":" BOOLEAN CRLF

8906	   lexicon-search-order  =    "Lexicon-Search-Order" ":"
8907	             "<" absoluteURI ">" *(" " "<" absoluteURI ">") CRLF

8909	   ; Recognizer ABNF

8911	   recognizer-method     =    recog-only-method
8912	                         /    enrollment-method

8914	   recog-only-method     =    "DEFINE-GRAMMAR"
8915	                         /    "RECOGNIZE"
8916	                         /    "INTERPRET"
8917	                         /    "GET-RESULT"
8918	                         /    "START-INPUT-TIMERS"
8919	                         /    "STOP"

8921	   enrollment-method     =    "START-PHRASE-ENROLLMENT"
8922	                         /    "ENROLLMENT-ROLLBACK"
8923	                         /    "END-PHRASE-ENROLLMENT"
8924	                         /    "MODIFY-PHRASE"
8925	                         /    "DELETE-PHRASE"

8927	   recognizer-event      =    "START-OF-INPUT"
8928	                         /    "RECOGNITION-COMPLETE"
8929	                         /    "INTERPRETATION-COMPLETE"

8931	   recognizer-header     =    recog-only-header
8932	                         /    enrollment-header

8934	   recog-only-header     =    confidence-threshold
8935	                         /    sensitivity-level
8936	                         /    speed-vs-accuracy
8937	                         /    n-best-list-length
8938	                         /    input-type
8939	                         /    no-input-timeout
8940	                         /    recognition-timeout
8941	                         /    waveform-uri
8942	                         /    input-waveform-uri
8943	                         /    completion-cause
8944	                         /    completion-reason
8945	                         /    recognizer-context-block
8946	                         /    start-input-timers
8947	                         /    speech-complete-timeout
8948	                         /    speech-incomplete-timeout
8949	                         /    dtmf-interdigit-timeout
8950	                         /    dtmf-term-timeout
8951	                         /    dtmf-term-char
8952	                         /    failed-uri
8953	                         /    failed-uri-cause
8954	                         /    save-waveform
8955	                         /    media-type
8956	                         /    new-audio-channel
8957	                         /    speech-language
8958	                         /    ver-buffer-utterance
8959	                         /    recognition-mode
8960	                         /    cancel-if-queue
8961	                         /    hotword-max-duration
8962	                         /    hotword-min-duration
8963	                         /    interpret-text
8964	                         /    dtmf-buffer-time
8965	                         /    clear-dtmf-buffer
8966	                         /    early-no-match

8968	   enrollment-header     =    num-min-consistent-pronunciations
8969	                         /    consistency-threshold
8970	                         /    clash-threshold
8971	                         /    personal-grammar-uri
8972	                         /    enroll-utterance
8973	                         /    phrase-id
8974	                         /    phrase-nl
8975	                         /    weight
8976	                         /    save-best-waveform
8977	                         /    new-phrase-id
8978	                         /    confusable-phrases-uri
8979	                         /    abort-phrase-enrollment

8981	   confidence-threshold  =    "Confidence-Threshold" ":"
8982	                              FLOAT CRLF

8984	   sensitivity-level     =    "Sensitivity-Level" ":" FLOAT
8985	                              CRLF

8987	   speed-vs-accuracy     =    "Speed-Vs-Accuracy" ":" FLOAT
8988	                              CRLF

8990	   n-best-list-length    =    "N-Best-List-Length" ":" 1*19DIGIT
8991	                              CRLF

8993	   input-type            =  "Input-Type" ":"  ( "speech" / "dtmf" ) CRLF

8995	   no-input-timeout      =    "No-Input-Timeout" ":" 1*19DIGIT
8996	                              CRLF

8998	   recognition-timeout   =    "Recognition-Timeout" ":" 1*19DIGIT
8999	                              CRLF

9001	   waveform-uri          =    "Waveform-URI" ":" ["<" absoluteURI ">"
9002	                              ";" "size" "=" 1*19DIGIT
9003	                              ";" "duration" "=" 1*19DIGIT] CRLF

9005	   recognizer-context-block = "Recognizer-Context-Block" ":"
9006	                              [1*VCHAR] CRLF

9008	   start-input-timers    =    "Start-Input-Timers" ":"
9009	                              BOOLEAN CRLF

9011	   speech-complete-timeout =  "Speech-Complete-Timeout" ":"
9012	                              1*19DIGIT CRLF

9014	   speech-incomplete-timeout = "Speech-Incomplete-Timeout" ":"
9015	                               1*19DIGIT CRLF

9017	   dtmf-interdigit-timeout = "DTMF-Interdigit-Timeout" ":"
9018	                             1*19DIGIT CRLF

9020	   dtmf-term-timeout     =    "DTMF-Term-Timeout" ":" 1*19DIGIT
9021	                              CRLF

9023	   dtmf-term-char        =    "DTMF-Term-Char" ":" VCHAR CRLF

9025	   save-waveform         =    "Save-Waveform" ":" BOOLEAN CRLF

9027	   new-audio-channel     =    "New-Audio-Channel" ":"
9028	                              BOOLEAN CRLF

9030	   recognition-mode      =    "Recognition-Mode" ":" 1*ALPHA CRLF

9032	   cancel-if-queue       =    "Cancel-If-Queue" ":" BOOLEAN CRLF

9034	   hotword-max-duration  =    "Hotword-Max-Duration" ":"
9035	                              1*19DIGIT CRLF

9037	   hotword-min-duration  =    "Hotword-Min-Duration" ":"
9038	                              1*19DIGIT CRLF

9040	   interpret-text        =    "Interpret-Text" ":" 1*VCHAR CRLF

9042	   dtmf-buffer-time      =    "DTMF-Buffer-Time" ":" 1*19DIGIT CRLF

9044	   clear-dtmf-buffer     =    "Clear-DTMF-Buffer" ":" BOOLEAN CRLF

9046	   early-no-match        =    "Early-No-Match" ":" BOOLEAN CRLF

9048	   num-min-consistent-pronunciations    =
9049	       "Num-Min-Consistent-Pronunciations" ":" 1*19DIGIT CRLF

9051	   consistency-threshold =    "Consistency-Threshold" ":" FLOAT
9052	                              CRLF

9054	   clash-threshold       =    "Clash-Threshold" ":" FLOAT CRLF

9055	   personal-grammar-uri  =    "Personal-Grammar-URI" ":" uri CRLF

9057	   enroll-utterance      =    "Enroll-Utterance" ":" BOOLEAN CRLF

9059	   phrase-id             =    "Phrase-ID" ":" 1*VCHAR CRLF

9061	   phrase-nl             =    "Phrase-NL" ":" 1*UTFCHAR CRLF

9063	   weight                =    "Weight" ":" weight-value CRLF

9065	   weight-value          =    FLOAT

9067	   save-best-waveform    =    "Save-Best-Waveform" ":"
9068	                              BOOLEAN CRLF

9070	   new-phrase-id         =    "New-Phrase-ID" ":" 1*VCHAR CRLF

9072	   confusable-phrases-uri =   "Confusable-Phrases-URI" ":"
9073	                              uri CRLF

9075	   abort-phrase-enrollment =  "Abort-Phrase-Enrollment" ":"
9076	                              BOOLEAN CRLF

9078	   ; Verifier ABNF

9080	   verifier-method       =    "START-SESSION"
9081	                         /    "END-SESSION"
9082	                         /    "QUERY-VOICEPRINT"
9083	                         /    "DELETE-VOICEPRINT"
9084	                         /    "VERIFY"
9085	                         /    "VERIFY-FROM-BUFFER"
9086	                         /    "VERIFY-ROLLBACK"
9087	                         /    "STOP"
9088	                         /    "START-INPUT-TIMERS"
9089	                         /    "GET-INTERMEDIATE-RESULT"

9091	   verifier-event        =    "VERIFICATION-COMPLETE"
9092	                         /    "START-OF-INPUT"

9094	   verifier-header       =    repository-uri
9095	                         /    voiceprint-identifier
9096	                         /    verification-mode
9097	                         /    adapt-model
9098	                         /    abort-model
9099	                         /    min-verification-score
9100	                         /    num-min-verification-phrases
9101	                         /    num-max-verification-phrases
9102	                         /    no-input-timeout
9103	                         /    save-waveform
9104	                         /    media-type
9105	                         /    waveform-uri
9106	                         /    voiceprint-exists
9107	                         /    ver-buffer-utterance
9108	                         /    input-waveform-uri
9109	                         /    completion-cause
9110	                         /    completion-reason
9111	                         /    speech-complete-timeout
9112	                         /    new-audio-channel
9113	                         /    abort-verification
9114	                         /    start-input-timers
9115	                         /    input-type

9117	   repository-uri        =    "Repository-URI" ":" uri CRLF

9119	   voiceprint-identifier =    "Voiceprint-Identifier" ":"
9120	                              1*VCHAR "." 3VCHAR
9121	                              [";" 1*VCHAR "." 3VCHAR] CRLF

9123	   verification-mode     =    "Verification-Mode" ":"
9124	                              verification-mode-string CRLF

9126	   verification-mode-string = "train" / "verify"

9128	   adapt-model           =    "Adapt-Model" ":" BOOLEAN CRLF

9130	   abort-model           =    "Abort-Model" ":" BOOLEAN CRLF

9132	   min-verification-score  =  "Min-Verification-Score" ":"
9133	                              [ %x2D ] FLOAT CRLF

9135	   num-min-verification-phrases = "Num-Min-Verification-Phrases"
9136	                                  ":" 1*19DIGIT CRLF

9138	   num-max-verification-phrases = "Num-Max-Verification-Phrases"
9139	                                  ":" 1*19DIGIT CRLF

9141	   voiceprint-exists     =    "Voiceprint-Exists" ":"
9142	                              BOOLEAN CRLF

9144	   ver-buffer-utterance  =    "Ver-Buffer-Utterance" ":"
9145	                              BOOLEAN CRLF

9147	   input-waveform-uri    =    "Input-Waveform-URI" ":" uri CRLF

9149	   abort-verification    =    "Abort-Verification" ":"
9150	                              BOOLEAN CRLF

9152	   ; Recorder ABNF

9154	   recorder-method       =    "RECORD"
9155	                         /    "STOP"

9157	   recorder-event        =    "START-OF-INPUT"
9158	                         /    "RECORD-COMPLETE"

9160	   recorder-header       =    sensitivity-level
9161	                         /    no-input-timeout
9162	                         /    completion-cause
9163	                         /    completion-reason
9164	                         /    failed-uri
9165	                         /    failed-uri-cause
9166	                         /    record-uri
9167	                         /    media-type
9168	                         /    max-time
9169	                         /    trim-length
9170	                         /    final-silence
9171	                         /    capture-on-speech
9172	                         /    new-audio-channel
9173	                         /    start-input-timers
9174	                         /    input-type

9176	   record-uri            =    "Record-URI" ":" [ "<" uri ">"
9177	                              ";" "size" "=" 1*19DIGIT
9178	                              ";" "duration" "=" 1*19DIGIT] CRLF

9180	   media-type            =    "Media-Type" ":" media-type-value CRLF

9182	   max-time              =    "Max-Time" ":" 1*19DIGIT CRLF

9184	   trim-length           =    "Trim-Length" ":" 1*19DIGIT CRLF

9186	   final-silence         =    "Final-Silence" ":" 1*19DIGIT CRLF

9188	   capture-on-speech     =    "Capture-On-Speech" ":"
9189	                              BOOLEAN CRLF
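   As a non-normative illustration of how the header productions above
   can be checked, the following Python sketch validates a few header
   lines against regular expressions transcribed by hand from the ABNF.
   The definitions of FLOAT and BOOLEAN are not part of this listing
   excerpt, so the forms used here (`*DIGIT ["." *DIGIT]` and
   `"true" / "false"`) are assumptions, and the URI inside Record-URI is
   matched loosely rather than against the full `uri` production. The
   names `HEADER_PATTERNS` and `check_header` are hypothetical.

```python
import re

# Value patterns transcribed by hand from the ABNF above.
# FLOAT and BOOLEAN are defined elsewhere in the specification; the
# forms used here are assumptions made for this sketch.
DIGITS_1_19 = r"[0-9]{1,19}"            # 1*19DIGIT
FLOAT = r"[0-9]*(?:\.[0-9]*)?"          # assumed: *DIGIT ["." *DIGIT]
BOOLEAN = r"(?:true|false)"             # assumed: "true" / "false"
# Loose stand-in for the 'uri' production: no spaces or angle brackets.
SIZED_URI = rf"(?:<[^<>\s]+>;size={DIGITS_1_19};duration={DIGITS_1_19})?"

HEADER_PATTERNS = {
    "confidence-threshold": FLOAT,
    "n-best-list-length": DIGITS_1_19,
    "save-waveform": BOOLEAN,
    "min-verification-score": rf"-?{FLOAT}",  # [ %x2D ] FLOAT
    "record-uri": SIZED_URI,                  # the whole value is optional
}


def check_header(line: str) -> bool:
    """Return True if 'line' (given without CRLF) matches a known pattern."""
    name, sep, value = line.partition(":")
    if not sep:
        return False
    # ABNF quoted strings are case-insensitive, so look names up in lowercase.
    pattern = HEADER_PATTERNS.get(name.lower())
    if pattern is None:
        return False
    return re.fullmatch(pattern, value, flags=re.IGNORECASE) is not None
```

   With these patterns, `check_header("Save-Waveform:true")` is accepted
   and `check_header("Save-Waveform: true")` is rejected, because the
   productions as written leave no room for whitespace after the colon.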

9191	16.  XML Schemas

9193	16.1.  NLSML Schema Definition

9195	 <?xml version="1.0" encoding="UTF-8"?>
9196	 <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
9197	             targetNamespace="http://www.ietf.org/xml/ns/mrcpv2"
9198	             xmlns="http://www.ietf.org/xml/ns/mrcpv2"
9199	             elementFormDefault="qualified"
9200	             attributeFormDefault="unqualified" >
9201	   <xs:annotation>
9202	     <xs:documentation> Natural Language Semantic Markup Schema
9203	     </xs:documentation>
9204	   </xs:annotation>
9205	   <xs:include schemaLocation="enrollment-schema.rng"/>
9206	   <xs:include schemaLocation="verification-schema.rng"/>
9207	   <xs:element name="result">
9208	     <xs:complexType>
9209	       <xs:sequence>
9210	         <xs:element name="interpretation" maxOccurs="unbounded">
9211	           <xs:complexType>
9212	             <xs:sequence>
9213	               <xs:element name="instance" minOccurs="0">
9214	                 <xs:complexType mixed="true">
9215	                   <xs:sequence minOccurs="0">
9216	                     <xs:any namespace="##other" processContents="lax"/>
9217	                   </xs:sequence>
9218	                 </xs:complexType>
9219	               </xs:element>
9220	               <xs:element name="input">
9221	                 <xs:complexType mixed="true">
9222	                   <xs:choice>
9223	                     <xs:element name="noinput" minOccurs="0"/>
9224	                     <xs:element name="nomatch" minOccurs="0"/>
9225	                     <xs:element name="input" minOccurs="0"/>
9226	                   </xs:choice>
9227	                   <xs:attribute name="mode"
9228	                                 type="xs:string"
9229	                                 default="speech"/>
9230	                   <xs:attribute name="confidence"
9231	                                 type="confidenceinfo"
9232	                                 default="1.0"/>
9233	                   <xs:attribute name="timestamp-start"
9234	                                 type="xs:string"/>
9235	                   <xs:attribute name="timestamp-end"
9236	                                 type="xs:string"/>
9237	                 </xs:complexType>
9238	               </xs:element>
9239	             </xs:sequence>
9240	             <xs:attribute name="confidence" type="confidenceinfo"
9241	                           default="1.0"/>
9242	             <xs:attribute name="grammar" type="xs:anyURI"
9243	                           use="optional"/>
9244	           </xs:complexType>
9245	         </xs:element>
9246	         <xs:element name="enrollment-result"
9247	                     type="enrollment-contents"/>
9248	         <xs:element name="verification-result"
9249	                     type="verification-contents"/>
9250	       </xs:sequence>
9251	       <xs:attribute name="grammar" type="xs:anyURI"
9252	                     use="optional"/>
9253	     </xs:complexType>
9254	   </xs:element>

9256	   <xs:simpleType name="confidenceinfo">
9257	     <xs:restriction base="xs:float">
9258	        <xs:minInclusive value="0.0"/>
9259	        <xs:maxInclusive value="1.0"/>
9260	     </xs:restriction>
9261	   </xs:simpleType>
9262	 </xs:schema>
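   To show how a client might consume a document shaped by the schema
   above, here is a minimal Python sketch that reads the interpretations
   out of an NLSML result. The sample document, its grammar URI, and the
   function name `read_interpretations` are invented for illustration.
   The sketch does not validate against the schema, and the sample omits
   the enrollment-result and verification-result elements that the
   sequence above also lists.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://www.ietf.org/xml/ns/mrcpv2"}

# Hypothetical NLSML document; the grammar URI and utterance are invented.
SAMPLE_RESULT = """\
<result xmlns="http://www.ietf.org/xml/ns/mrcpv2"
        grammar="http://grammars.example.com/food.grxml">
  <interpretation confidence="0.75">
    <instance>pizza</instance>
    <input mode="speech">I would like a pizza</input>
  </interpretation>
</result>"""


def read_interpretations(xml_text):
    """Return one dict per <interpretation> element in an NLSML result."""
    root = ET.fromstring(xml_text)
    found = []
    for interp in root.findall("m:interpretation", NS):
        # The schema defaults 'confidence' to 1.0 and bounds it to [0.0, 1.0].
        confidence = float(interp.get("confidence", "1.0"))
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence out of range: {confidence}")
        instance = interp.find("m:instance", NS)  # optional (minOccurs="0")
        spoken = interp.find("m:input", NS)       # required by the schema
        if spoken is None:
            raise ValueError("interpretation has no <input> element")
        found.append({
            "confidence": confidence,
            "instance": None if instance is None
                        else "".join(instance.itertext()).strip(),
            "input": "".join(spoken.itertext()).strip(),
            "mode": spoken.get("mode", "speech"),  # schema default
        })
    return found
```

   Because the schema sets attributeFormDefault to "unqualified", the
   attributes are read without a namespace, while child elements are
   looked up in the MRCPv2 namespace.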

9264	16.2.  Enrollment Results Schema Definition

9266	   <!-- MRCP Enrollment Schema
9267	   (See http://www.oasis-open.org/committees/relax-ng/spec.html)
9268	   -->

9270	   <grammar datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
9271	            ns="http://www.ietf.org/xml/ns/mrcpv2"
9272	            xmlns="http://relaxng.org/ns/structure/1.0">

9274	     <start>
9275	       <element name="enrollment-result">
9276	         <ref name="enrollment-content"/>
9277	       </element>
9278	     </start>

9280	     <define name="enrollment-content">
9281	       <interleave>
9282	         <element name="num-clashes">
9283	           <data type="nonNegativeInteger"/>
9285	         </element>
9286	         <element name="num-good-repetitions">
9287	           <data type="nonNegativeInteger"/>
9288	         </element>
9289	         <element name="num-repetitions-still-needed">
9290	           <data type="nonNegativeInteger"/>
9291	         </element>
9292	         <element name="consistency-status">
9293	           <choice>
9294	             <value>consistent</value>
9295	             <value>inconsistent</value>
9296	             <value>undecided</value>
9297	           </choice>
9298	         </element>
9299	         <optional>
9300	           <element name="clash-phrase-ids">
9301	             <oneOrMore>
9302	               <element name="item">
9303	                 <data type="token"/>
9304	               </element>
9305	             </oneOrMore>
9306	           </element>
9307	         </optional>
9308	         <optional>
9309	           <element name="transcriptions">
9310	             <oneOrMore>
9311	               <element name="item">
9312	                 <text/>
9313	               </element>
9314	             </oneOrMore>
9315	           </element>
9316	         </optional>
9317	         <optional>
9318	           <element name="confusable-phrases">
9319	             <oneOrMore>
9320	               <element name="item">
9321	                 <text/>
9322	               </element>
9323	             </oneOrMore>
9324	           </element>
9325	         </optional>
9326	       </interleave>
9327	     </define>

9329	   </grammar>
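   The following Python sketch, offered as an illustration only, reads
   an enrollment result and checks the four required elements named in
   the grammar above. The sample document, its phrase identifiers, and
   the function name `read_enrollment` are hypothetical, and the checks
   are a hand-written approximation rather than RELAX NG validation.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://www.ietf.org/xml/ns/mrcpv2"}
COUNT_ELEMENTS = ("num-clashes", "num-good-repetitions",
                  "num-repetitions-still-needed")
STATUS_VALUES = {"consistent", "inconsistent", "undecided"}

# Hypothetical enrollment result; the phrase identifiers are invented.
SAMPLE_ENROLLMENT = """\
<enrollment-result xmlns="http://www.ietf.org/xml/ns/mrcpv2">
  <num-clashes>2</num-clashes>
  <num-good-repetitions>1</num-good-repetitions>
  <num-repetitions-still-needed>1</num-repetitions-still-needed>
  <consistency-status>consistent</consistency-status>
  <clash-phrase-ids>
    <item>phrase-a</item>
    <item>phrase-b</item>
  </clash-phrase-ids>
</enrollment-result>"""


def read_enrollment(xml_text):
    """Return the counts, status, and clash IDs of an enrollment result."""
    root = ET.fromstring(xml_text)
    summary = {}
    for name in COUNT_ELEMENTS:
        text = root.findtext(f"m:{name}", default=None, namespaces=NS)
        if text is None:
            raise ValueError(f"missing required element <{name}>")
        value = int(text)  # raises ValueError on non-integer content
        if value < 0:
            raise ValueError(f"<{name}> must be a nonNegativeInteger")
        summary[name] = value
    status = root.findtext("m:consistency-status", default="", namespaces=NS)
    status = status.strip()
    if status not in STATUS_VALUES:
        raise ValueError(f"unexpected consistency-status: {status!r}")
    summary["consistency-status"] = status
    # clash-phrase-ids is optional; an absent element yields an empty list.
    summary["clash-phrase-ids"] = [
        (item.text or "").strip()
        for item in root.findall("m:clash-phrase-ids/m:item", NS)
    ]
    return summary
```

   Since the grammar wraps the children in an interleave, the sketch
   looks elements up by name and does not depend on their order.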

9331	16.3.  Verification Results Schema Definition

9332	   <!-- MRCP Verification Results Schema
9333	   (See http://www.oasis-open.org/committees/relax-ng/spec.html)
9334	   -->

9336	   <grammar datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
9337	            ns="http://www.ietf.org/xml/ns/mrcpv2"
9338	            xmlns="http://relaxng.org/ns/structure/1.0">

9340	     <start>
9341	       <element name="verification-result">
9342	         <ref name="verification-contents"/>
9343	       </element>
9344	     </start>

9346	     <define name="verification-contents">
9347	       <element name="voiceprint">
9348	         <ref name="firstVoiceprintContent"/>
9349	       </element>
9350	       <zeroOrMore>
9351	         <element name="voiceprint">
9352	           <ref name="restVoiceprintContent"/>
9353	         </element>
9354	       </zeroOrMore>
9355	     </define>

9357	     <define name="firstVoiceprintContent">
9358	       <attribute name="id">
9359	         <data type="string"/>
9360	       </attribute>
9361	       <interleave>
9362	         <optional>
9363	           <element name="adapted">
9364	             <data type="boolean"/>
9365	           </element>
9366	         </optional>
9367	         <optional>
9368	           <element name="needmoredata">
9369	             <ref name="needmoredataContent"/>
9370	           </element>
9371	         </optional>
9372	         <optional>
9373	           <element name="incremental">
9374	             <ref name="firstCommonContent"/>
9375	           </element>
9376	         </optional>
9377	         <element name="cumulative">
9378	           <ref name="firstCommonContent"/>
9379	         </element>
9381	       </interleave>
9382	     </define>

9384	     <define name="restVoiceprintContent">
9385	       <attribute name="id">
9386	         <data type="string"/>
9387	       </attribute>
9388	       <element name="cumulative">
9389	         <ref name="restCommonContent"/>
9390	       </element>
9391	     </define>

9393	     <define name="firstCommonContent">
9394	       <interleave>
9395	         <element name="decision">
9396	           <ref name="decisionContent"/>
9397	         </element>
9398	         <optional>
9399	           <element name="utterance-length">
9400	             <ref name="utterance-lengthContent"/>
9401	           </element>
9402	         </optional>
9403	         <optional>
9404	           <element name="device">
9405	             <ref name="deviceContent"/>
9406	           </element>
9407	         </optional>
9408	         <optional>
9409	           <element name="gender">
9410	             <ref name="genderContent"/>
9411	           </element>
9412	         </optional>
9413	         <zeroOrMore>
9414	           <element name="verification-score">
9415	             <ref name="verification-scoreContent"/>
9416	           </element>
9417	         </zeroOrMore>
9418	       </interleave>
9419	     </define>

9421	     <define name="restCommonContent">
9422	       <interleave>
9423	         <optional>
9424	           <element name="decision">
9425	             <ref name="decisionContent"/>
9426	           </element>
9427	         </optional>
9428	         <optional>
9429	           <element name="device">
9430	             <ref name="deviceContent"/>
9431	           </element>
9432	         </optional>
9433	         <optional>
9434	           <element name="gender">
9435	             <ref name="genderContent"/>
9436	           </element>
9437	         </optional>
9438	         <zeroOrMore>
9439	           <element name="verification-score">
9440	             <ref name="verification-scoreContent"/>
9441	           </element>
9442	         </zeroOrMore>
9443	       </interleave>
9444	     </define>

9446	     <define name="decisionContent">
9447	       <choice>
9448	         <value>accepted</value>
9449	         <value>rejected</value>
9450	         <value>undecided</value>
9451	       </choice>
9452	     </define>

9454	     <define name="needmoredataContent">
9455	       <data type="boolean"/>
9456	     </define>

9458	     <define name="utterance-lengthContent">
9459	       <data type="nonNegativeInteger"/>
9460	     </define>

9462	     <define name="deviceContent">
9463	       <choice>
9464	         <value>cellular-phone</value>
9465	         <value>electret-phone</value>
9466	         <value>carbon-button-phone</value>
9467	         <value>unknown</value>
9468	       </choice>
9469	     </define>

9471	     <define name="genderContent">
9472	       <choice>
9473	         <value>male</value>
9474	         <value>female</value>
9475	         <value>unknown</value>
9476	       </choice>
9478	     </define>

9480	     <define name="verification-scoreContent">
9481	       <data type="float">
9482	         <param name="minInclusive">-1</param>
9483	         <param name="maxInclusive">1</param>
9484	       </data>
9485	     </define>

9487	   </grammar>
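   As an illustrative sketch of the grammar above, the Python code below
   extracts the cumulative decision and scores for each voiceprint and
   applies the value constraints stated in the grammar: a decision is
   required only for the first voiceprint, and every verification-score
   must lie between -1 and 1. The sample document, the voiceprint IDs,
   and the function name `cumulative_results` are hypothetical, and the
   checks approximate the grammar rather than validate against it.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://www.ietf.org/xml/ns/mrcpv2"}
DECISIONS = {"accepted", "rejected", "undecided"}

# Hypothetical verification result; the voiceprint IDs are invented.
SAMPLE_VERIFICATION = """\
<verification-result xmlns="http://www.ietf.org/xml/ns/mrcpv2">
  <voiceprint id="speaker-one">
    <adapted>true</adapted>
    <cumulative>
      <decision>accepted</decision>
      <utterance-length>10000</utterance-length>
      <verification-score>0.85</verification-score>
    </cumulative>
  </voiceprint>
  <voiceprint id="speaker-two">
    <cumulative>
      <verification-score>0.15</verification-score>
    </cumulative>
  </voiceprint>
</verification-result>"""


def cumulative_results(xml_text):
    """Return (id, decision, scores) for each <voiceprint> element."""
    root = ET.fromstring(xml_text)
    results = []
    for index, vp in enumerate(root.findall("m:voiceprint", NS)):
        cumulative = vp.find("m:cumulative", NS)
        if cumulative is None:
            raise ValueError("voiceprint has no <cumulative> element")
        decision = cumulative.findtext("m:decision", default=None,
                                       namespaces=NS)
        if decision is not None:
            decision = decision.strip()
            if decision not in DECISIONS:
                raise ValueError(f"unexpected decision: {decision!r}")
        elif index == 0:
            # Only the first voiceprint must carry a decision.
            raise ValueError("first voiceprint lacks a <decision>")
        scores = [float(score.text)
                  for score in cumulative.findall("m:verification-score", NS)]
        for score in scores:
            if not -1.0 <= score <= 1.0:
                raise ValueError(f"score out of range: {score}")
        results.append((vp.get("id"), decision, scores))
    return results
```

   Applied to the sample, the function reports an "accepted" decision
   for the first voiceprint and no decision for the second, which the
   grammar permits.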

9489	17.  References

9491	17.1.  Normative References

9493	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
9494	              Jacobson, "RTP: A Transport Protocol for Real-Time
9495	              Applications", STD 64, RFC 3550, July 2003.

9497	   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
9498	              A., Peterson, J., Sparks, R., Handley, M., and E.
9499	              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
9500	              June 2002.

9502	   [RFC2326]  Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time
9503	              Streaming Protocol (RTSP)", RFC 2326, April 1998.

9505	   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
9506	              Description Protocol", RFC 4566, July 2006.

9508	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
9509	              Requirement Levels", BCP 14, RFC 2119, March 1997.

9511	   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
9512	              Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
9513	              Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

9515	   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
9516	              with Session Description Protocol (SDP)", RFC 3264,
9517	              June 2002.

9519	   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
9520	              10646", STD 63, RFC 3629, November 2003.

9522	   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
9523	              Specifications: ABNF", STD 68, RFC 5234, January 2008.

9525	   [RFC4145]  Yon, D. and G. Camarillo, "TCP-Based Media Transport in
9526	              the Session Description Protocol (SDP)", RFC 4145,
9527	              September 2005.

9529	   [RFC4572]  Lennox, J., "Connection-Oriented Media Transport over the
9530	              Transport Layer Security (TLS) Protocol in the Session
9531	              Description Protocol (SDP)", RFC 4572, July 2006.

9533	   [RFC3388]  Camarillo, G., Eriksson, G., Holler, J., and H.
9534	              Schulzrinne, "Grouping of Media Lines in the Session
9535	              Description Protocol (SDP)", RFC 3388, December 2002.

9537	   [RFC5322]  Resnick, P., Ed., "Internet Message Format", RFC 5322,
9538	              October 2008.

9540	   [RFC2392]  Levinson, E., "Content-ID and Message-ID Uniform Resource
9541	              Locators", RFC 2392, August 1998.

9543	   [RFC2109]  Kristol, D. and L. Montulli, "HTTP State Management
9544	              Mechanism", RFC 2109, February 1997.

9546	   [RFC2965]  Kristol, D. and L. Montulli, "HTTP State Management
9547	              Mechanism", RFC 2965, October 2000.

9549	   [RFC4646]  Phillips, A. and M. Davis, "Tags for Identifying
9550	              Languages", BCP 47, RFC 4646, September 2006.

9552	   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
9553	              IANA Considerations Section in RFCs", BCP 26, RFC 5226,
9554	              May 2008.

9556	   [RFC1035]  Mockapetris, P., "Domain names - implementation and
9557	              specification", STD 13, RFC 1035, November 1987.

9559	   [RFC4288]  Freed, N. and J. Klensin, "Media Type Specifications and
9560	              Registration Procedures", BCP 13, RFC 4288, December 2005.

9562	   [RFC3688]  Mealling, M., "The IETF XML Registry", BCP 81, RFC 3688,
9563	              January 2004.

9565	   [RFC4395]  Hansen, T., Hardie, T., and L. Masinter, "Guidelines and
9566	              Registration Procedures for New URI Schemes", BCP 35,
9567	              RFC 4395, February 2006.

9569	   [RFC4568]  Andreasen, F., Baugher, M., and D. Wing, "Session
9570	              Description Protocol (SDP) Security Descriptions for Media
9571	              Streams", RFC 4568, July 2006.

9573	   [W3C.REC-speech-synthesis-20040907]
9574	              Walker, M., Burnett, D., and A. Hunt, "Speech Synthesis
9575	              Markup Language (SSML) Version 1.0", World Wide Web
9576	              Consortium Recommendation REC-speech-synthesis-20040907,
9577	              September 2004,
9578	              <http://www.w3.org/TR/2004/REC-speech-synthesis-20040907>.

9580	   [RFC2483]  Mealling, M. and R. Daniel, "URI Resolution Services
9581	              Necessary for URN Resolution", RFC 2483, January 1999.

9583	   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
9584	              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
9585	              RFC 3711, March 2004.

9587	   [W3C.REC-speech-grammar-20040316]
9588	              McGlashan, S. and A. Hunt, "Speech Recognition Grammar
9589	              Specification Version 1.0", World Wide Web Consortium
9590	              Recommendation REC-speech-grammar-20040316, March 2004,
9591	              <http://www.w3.org/TR/2004/REC-speech-grammar-20040316>.

9593	   [W3C.REC-semantic-interpretation-20070405]
9594	              Van Tichelen, L. and D. Burke, "Semantic Interpretation
9595	              for Speech Recognition (SISR) Version 1.0", World Wide
9596	              Web Consortium Recommendation
9597	              REC-semantic-interpretation-20070405, April 2007,
9598	              <http://www.w3.org/TR/2007/REC-semantic-interpretation-20070405>.

9600	   [W3C.REC-xml-names11-20040204]
9601	              Hollander, D., Layman, A., Tobin, R., and T. Bray,
9602	              "Namespaces in XML 1.1", World Wide Web Consortium
9603	              Recommendation REC-xml-names11-20040204, February 2004,
9604	              <http://www.w3.org/TR/2004/REC-xml-names11-20040204>.

9606	17.2.  Informative References

9608	   [RFC4313]  Oran, D., "Requirements for Distributed Control of
9609	              Automatic Speech Recognition (ASR), Speaker
9610	              Identification/Speaker Verification (SI/SV), and Text-to-
9611	              Speech (TTS) Resources", RFC 4313, December 2005.

9613	   [Q.23]     International Telecommunication Union, "Technical
9614	              Features of Push-Button Telephone Sets", ITU-T Q.23, 1993.

9616	   [RFC4733]  Schulzrinne, H. and T. Taylor, "RTP Payload for DTMF
9617	              Digits, Telephony Tones, and Telephony Signals", RFC 4733,
9618	              December 2006.

9620	   [W3C.REC-voicexml20-20040316]
9621	              Burnett, D., McGlashan, S., Danielsen, P., Porter, B.,
9622	              Lucas, B., Ferrans, J., Hunt, A., Carter, J., Tryphonas,
9623	              S., and K. Rehor, "Voice Extensible Markup Language
9624	              (VoiceXML) Version 2.0", World Wide Web Consortium
9625	              Recommendation REC-voicexml20-20040316, March 2004,
9626	              <http://www.w3.org/TR/2004/REC-voicexml20-20040316>.

9628	   [RFC4463]  Shanmugham, S., Monaco, P., and B. Eberman, "A Media
9629	              Resource Control Protocol (MRCP) Developed by Cisco,
9630	              Nuance, and Speechworks", RFC 4463, April 2006.

9632	   [refs.javaSpeechGrammarFormat]
9633	              Sun Microsystems, "Java Speech Grammar Format Version
9634	              1.0", October 1998.

9636	   [W3C.REC-emma-20090210]
9637	              Johnston, M., Baggia, P., Burnett, D., Carter, J., Dahl,
9638	              D., McCobb, G., and D. Raggett, "EMMA: Extensible
9639	              MultiModal Annotation markup language", World Wide Web
9640	              Consortium Recommendation REC-emma-20090210,
9641	              February 2009,
9642	              <http://www.w3.org/TR/2009/REC-emma-20090210>.

9644	Appendix A.  Contributors

9646	   Pierre Forgues
9647	   Nuance Communications Ltd.
9648	   1500 University Street
9649	   Suite 935
9650	   Montreal, Quebec
9651	   Canada H3A 3S7

9653	   Email: forgues@nuance.com

9655	   Charles Galles
9656	   Intervoice, Inc.
9657	   17811 Waterview Parkway
9658	   Dallas, Texas 75252

9660	   Email: charles.galles@intervoice.com

9662	   Klaus Reifenrath
9663	   ScanSoft, Inc.
9664	   Guldensporenpark 32
9665	   Building D
9666	   9820 Merelbeke
9667	   Belgium

9669	   Email: klaus.reifenrath@scansoft.com

9671	Appendix B.  Acknowledgements

9673	   Andre Gillet (Nuance Communications)
9674	   Andrew Hunt (ScanSoft)
9675	   Andrew Wahbe (Genesys)
9676	   Aaron Kneiss (ScanSoft)
9677	   Brian Eberman (ScanSoft)
9678	   Corey Stohs (Cisco Systems Inc)
9679	   Dave Burke (VoxPilot)
9680	   Jeff Kusnitz (IBM Corp)
9681	   Ganesh N Ramaswamy (IBM Corp)
9682	   Klaus Reifenrath (ScanSoft)
9683	   Kristian Finlator (ScanSoft)
9684	   Magnus Westerlund (Ericsson)
9685	   Martin Dragomirecky (Cisco Systems Inc)
9686	   Paolo Baggia (Loquendo)
9687	   Peter Monaco (Nuance Communications)
9688	   Pierre Forgues (Nuance Communications)
9689	   Ran Zilca (IBM Corp)
9690	   Suresh Kaliannan (Cisco Systems Inc.)
9691	   Skip Cave (Intervoice Inc)
9692	   Thomas Gal (LumenVox)

9694	   The chairs of the speechsc working group are Eric Burger (BEA
9695	   Systems, Inc.) and Dave Oran (Cisco Systems, Inc.).

9697	Authors' Addresses

9699	   Saravanan Shanmugham
9700	   Cisco Systems, Inc.
9701	   170 W. Tasman Dr.
9702	   San Jose, CA  95134
9703	   USA

9705	   Email: sarvi@cisco.com

9707	   Daniel C. Burnett
9708	   Voxeo
9709	   189 South Orange Avenue #2050
9710	   Orlando, FL  32801
9711	   USA

9713	   Email: dburnett@voxeo.com