idnits 2.17.1 

draft-ietf-speechsc-mrcpv2-28.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 5 instances of lines with non-RFC2606-compliant FQDNs in the
     document.

  -- The document has examples using IPv4 documentation addresses according
     to RFC6890, but does not use any IPv6 documentation addresses.  Maybe
     there should be IPv6 examples, too?


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 619 has weird spacing: '...ple.net   or...'

  -- The document seems to contain a disclaimer for pre-RFC5378 work, and may
     have content which was first submitted before 10 November 2008.  The
     disclaimer is necessary when there are original authors that you have
     been unable to contact, or if some do not wish to grant the BCP78 rights
     to the IETF Trust.  If you are able to get all authors (current and
     original) to grant those rights, you can and should remove the
     disclaimer; otherwise, the disclaimer is needed and you can ignore this
     comment. (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (August 28, 2012) is 4252 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFCXXXX' is mentioned on line 8040, but not defined

  == Missing Reference: 'LWS' is mentioned on line 8914, but not defined

  ** Obsolete normative reference: RFC 2326 (Obsoleted by RFC 7826)

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)

  ** Obsolete normative reference: RFC  793 (Obsoleted by RFC 9293)

  ** Obsolete normative reference: RFC 2616 (Obsoleted by RFC 7230, RFC 7231,
     RFC 7232, RFC 7233, RFC 7234, RFC 7235)

  ** Obsolete normative reference: RFC 4572 (Obsoleted by RFC 8122)

  ** Obsolete normative reference: RFC 5226 (Obsoleted by RFC 8126)

  ** Obsolete normative reference: RFC 5246 (Obsoleted by RFC 8446)

  ** Obsolete normative reference: RFC 4288 (Obsoleted by RFC 6838)

  ** Downref: Normative reference to an Experimental RFC: RFC 2483

  ** Obsolete normative reference: RFC 3023 (Obsoleted by RFC 7303)

  -- Obsolete informational reference (is this intentional?): RFC 4960
     (Obsoleted by RFC 9260)

  -- Obsolete informational reference (is this intentional?): RFC 4395
     (Obsoleted by RFC 7595)

  -- Obsolete informational reference (is this intentional?): RFC 2818
     (Obsoleted by RFC 9110)


     Summary: 10 errors (**), 0 flaws (~~), 5 warnings (==), 6 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	SPEECHSC                                                      D. Burnett
3	Internet-Draft                                                     Voxeo
4	Intended status: Standards Track                           S. Shanmugham
5	Expires: March 1, 2013                               Cisco Systems, Inc.
6	                                                         August 28, 2012

8	           Media Resource Control Protocol Version 2 (MRCPv2)
9	                     draft-ietf-speechsc-mrcpv2-28

11	Abstract

13	   The MRCPv2 protocol allows client hosts to control media service
14	   resources such as speech synthesizers, recognizers, verifiers and
15	   identifiers residing in servers on the network.  MRCPv2 is not a
16	   "stand-alone" protocol - it relies on other protocols, such as
17	   Session Initiation Protocol (SIP), to rendezvous MRCPv2 clients and
18	   servers and manage sessions between them, and the Session Description
19	   Protocol (SDP) to describe, discover and exchange capabilities.  It
20	   also depends on SIP and SDP to establish the media sessions and
21	   associated parameters between the media source or sink and the media
22	   server.  Once this is done, the MRCPv2 protocol exchange operates
23	   over the control session established above, allowing the client to
24	   control the media processing resources on the speech resource server.

26	Status of this Memo

28	   This Internet-Draft is submitted in full conformance with the
29	   provisions of BCP 78 and BCP 79.

31	   Internet-Drafts are working documents of the Internet Engineering
32	   Task Force (IETF).  Note that other groups may also distribute
33	   working documents as Internet-Drafts.  The list of current Internet-
34	   Drafts is at http://datatracker.ietf.org/drafts/current/.

36	   Internet-Drafts are draft documents valid for a maximum of six months
37	   and may be updated, replaced, or obsoleted by other documents at any
38	   time.  It is inappropriate to use Internet-Drafts as reference
39	   material or to cite them other than as "work in progress."

41	   This Internet-Draft will expire on March 1, 2013.

43	Copyright Notice

45	   Copyright (c) 2012 IETF Trust and the persons identified as the
46	   document authors.  All rights reserved.

48	   This document is subject to BCP 78 and the IETF Trust's Legal
49	   Provisions Relating to IETF Documents
50	   (http://trustee.ietf.org/license-info) in effect on the date of
51	   publication of this document.  Please review these documents
52	   carefully, as they describe your rights and restrictions with respect
53	   to this document.  Code Components extracted from this document must
54	   include Simplified BSD License text as described in Section 4.e of
55	   the Trust Legal Provisions and are provided without warranty as
56	   described in the Simplified BSD License.

58	   This document may contain material from IETF Documents or IETF
59	   Contributions published or made publicly available before November
60	   10, 2008.  The person(s) controlling the copyright in some of this
61	   material may not have granted the IETF Trust the right to allow
62	   modifications of such material outside the IETF Standards Process.
63	   Without obtaining an adequate license from the person(s) controlling
64	   the copyright in such materials, this document may not be modified
65	   outside the IETF Standards Process, and derivative works of it may
66	   not be created outside the IETF Standards Process, except to format
67	   it for publication as an RFC or to translate it into languages other
68	   than English.

70	Table of Contents

72	   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   9
73	   2.  Document Conventions  . . . . . . . . . . . . . . . . . . . .  10
74	     2.1.   Definitions  . . . . . . . . . . . . . . . . . . . . . .  10
75	     2.2.   State-Machine Diagrams . . . . . . . . . . . . . . . . .  11
76	     2.3.   URI Schemes  . . . . . . . . . . . . . . . . . . . . . .  11
77	   3.  Architecture  . . . . . . . . . . . . . . . . . . . . . . . .  12
78	     3.1.   MRCPv2 Media Resource Types  . . . . . . . . . . . . . .  13
79	     3.2.   Server and Resource Addressing . . . . . . . . . . . . .  14
80	   4.  MRCPv2 Protocol Basics  . . . . . . . . . . . . . . . . . . .  15
81	     4.1.   Connecting to the Server . . . . . . . . . . . . . . . .  15
82	     4.2.   Managing Resource Control Channels . . . . . . . . . . .  15
83	     4.3.   SIP session example  . . . . . . . . . . . . . . . . . .  18
84	     4.4.   Media Streams and RTP Ports  . . . . . . . . . . . . . .  23
85	     4.5.   MRCPv2 Message Transport . . . . . . . . . . . . . . . .  24
86	     4.6.   MRCPv2 Session Termination . . . . . . . . . . . . . . .  25
87	   5.  MRCPv2 Specification  . . . . . . . . . . . . . . . . . . . .  25
88	     5.1.   Common Protocol Elements . . . . . . . . . . . . . . . .  25
89	     5.2.   Request  . . . . . . . . . . . . . . . . . . . . . . . .  28
90	     5.3.   Response . . . . . . . . . . . . . . . . . . . . . . . .  29
91	     5.4.   Status Codes . . . . . . . . . . . . . . . . . . . . . .  30
92	     5.5.   Events . . . . . . . . . . . . . . . . . . . . . . . . .  31
93	   6.  MRCPv2 Generic Methods, Headers, and Result Structure . . . .  32
94	     6.1.   Generic Methods  . . . . . . . . . . . . . . . . . . . .  33
95	       6.1.1.   SET-PARAMS . . . . . . . . . . . . . . . . . . . . .  33
96	       6.1.2.   GET-PARAMS . . . . . . . . . . . . . . . . . . . . .  34
97	     6.2.   Generic Message Headers  . . . . . . . . . . . . . . . .  35
98	       6.2.1.   Channel-Identifier . . . . . . . . . . . . . . . . .  36
99	       6.2.2.   Accept . . . . . . . . . . . . . . . . . . . . . . .  36
100	       6.2.3.   Active-Request-Id-List . . . . . . . . . . . . . . .  37
101	       6.2.4.   Proxy-Sync-Id  . . . . . . . . . . . . . . . . . . .  37
102	       6.2.5.   Accept-Charset . . . . . . . . . . . . . . . . . . .  38
103	       6.2.6.   Content-Type . . . . . . . . . . . . . . . . . . . .  38
104	       6.2.7.   Content-ID . . . . . . . . . . . . . . . . . . . . .  38
105	       6.2.8.   Content-Base . . . . . . . . . . . . . . . . . . . .  39
106	       6.2.9.   Content-Encoding . . . . . . . . . . . . . . . . . .  39
107	       6.2.10.  Content-Location . . . . . . . . . . . . . . . . . .  39
108	       6.2.11.  Content-Length . . . . . . . . . . . . . . . . . . .  40
109	       6.2.12.  Fetch Timeout  . . . . . . . . . . . . . . . . . . .  40
110	       6.2.13.  Cache-Control  . . . . . . . . . . . . . . . . . . .  41
111	       6.2.14.  Logging-Tag  . . . . . . . . . . . . . . . . . . . .  42
112	       6.2.15.  Set-Cookie . . . . . . . . . . . . . . . . . . . . .  42
113	       6.2.16.  Vendor Specific Parameters . . . . . . . . . . . . .  45
114	     6.3.   Generic Result Structure . . . . . . . . . . . . . . . .  46
115	       6.3.1.   Natural Language Semantics Markup Language . . . . .  47
116	   7.  Resource Discovery  . . . . . . . . . . . . . . . . . . . . .  47
117	   8.  Speech Synthesizer Resource . . . . . . . . . . . . . . . . .  49
118	     8.1.   Synthesizer State Machine  . . . . . . . . . . . . . . .  49
119	     8.2.   Synthesizer Methods  . . . . . . . . . . . . . . . . . .  50
120	     8.3.   Synthesizer Events . . . . . . . . . . . . . . . . . . .  50
121	     8.4.   Synthesizer Header Fields  . . . . . . . . . . . . . . .  51
122	       8.4.1.   Jump-Size  . . . . . . . . . . . . . . . . . . . . .  51
123	       8.4.2.   Kill-On-Barge-In . . . . . . . . . . . . . . . . . .  52
124	       8.4.3.   Speaker Profile  . . . . . . . . . . . . . . . . . .  53
125	       8.4.4.   Completion Cause . . . . . . . . . . . . . . . . . .  53
126	       8.4.5.   Completion Reason  . . . . . . . . . . . . . . . . .  54
127	       8.4.6.   Voice-Parameter  . . . . . . . . . . . . . . . . . .  54
128	       8.4.7.   Prosody-Parameters . . . . . . . . . . . . . . . . .  55
129	       8.4.8.   Speech Marker  . . . . . . . . . . . . . . . . . . .  55
130	       8.4.9.   Speech Language  . . . . . . . . . . . . . . . . . .  56
131	       8.4.10.  Fetch Hint . . . . . . . . . . . . . . . . . . . . .  56
132	       8.4.11.  Audio Fetch Hint . . . . . . . . . . . . . . . . . .  57
133	       8.4.12.  Failed URI . . . . . . . . . . . . . . . . . . . . .  57
134	       8.4.13.  Failed URI Cause . . . . . . . . . . . . . . . . . .  57
135	       8.4.14.  Speak Restart  . . . . . . . . . . . . . . . . . . .  57
136	       8.4.15.  Speak Length . . . . . . . . . . . . . . . . . . . .  58
137	       8.4.16.  Load-Lexicon . . . . . . . . . . . . . . . . . . . .  58
138	       8.4.17.  Lexicon-Search-Order . . . . . . . . . . . . . . . .  58
139	     8.5.   Synthesizer Message Body . . . . . . . . . . . . . . . .  59
140	       8.5.1.   Synthesizer Speech Data  . . . . . . . . . . . . . .  59
141	       8.5.2.   Lexicon Data . . . . . . . . . . . . . . . . . . . .  61
142	     8.6.   SPEAK Method . . . . . . . . . . . . . . . . . . . . . .  62
143	     8.7.   STOP . . . . . . . . . . . . . . . . . . . . . . . . . .  64
144	     8.8.   BARGE-IN-OCCURRED  . . . . . . . . . . . . . . . . . . .  65
145	     8.9.   PAUSE  . . . . . . . . . . . . . . . . . . . . . . . . .  67
146	     8.10.  RESUME . . . . . . . . . . . . . . . . . . . . . . . . .  68
147	     8.11.  CONTROL  . . . . . . . . . . . . . . . . . . . . . . . .  70
148	     8.12.  SPEAK-COMPLETE . . . . . . . . . . . . . . . . . . . . .  72
149	     8.13.  SPEECH-MARKER  . . . . . . . . . . . . . . . . . . . . .  73
150	     8.14.  DEFINE-LEXICON . . . . . . . . . . . . . . . . . . . . .  75
151	   9.  Speech Recognizer Resource  . . . . . . . . . . . . . . . . .  75
152	     9.1.   Recognizer State Machine . . . . . . . . . . . . . . . .  77
153	     9.2.   Recognizer Methods . . . . . . . . . . . . . . . . . . .  77
154	     9.3.   Recognizer Events  . . . . . . . . . . . . . . . . . . .  78
155	     9.4.   Recognizer Header Fields . . . . . . . . . . . . . . . .  78
156	       9.4.1.   Confidence Threshold . . . . . . . . . . . . . . . .  80
157	       9.4.2.   Sensitivity Level  . . . . . . . . . . . . . . . . .  80
158	       9.4.3.   Speed Vs Accuracy  . . . . . . . . . . . . . . . . .  81
159	       9.4.4.   N Best List Length . . . . . . . . . . . . . . . . .  81
160	       9.4.5.   Input Type . . . . . . . . . . . . . . . . . . . . .  81
161	       9.4.6.   No Input Timeout . . . . . . . . . . . . . . . . . .  82
162	       9.4.7.   Recognition Timeout  . . . . . . . . . . . . . . . .  82
163	       9.4.8.   Waveform URI . . . . . . . . . . . . . . . . . . . .  82
164	       9.4.9.   Media Type . . . . . . . . . . . . . . . . . . . . .  83
165	       9.4.10.  Input-Waveform-URI . . . . . . . . . . . . . . . . .  83
166	       9.4.11.  Completion Cause . . . . . . . . . . . . . . . . . .  83
167	       9.4.12.  Completion Reason  . . . . . . . . . . . . . . . . .  85
168	       9.4.13.  Recognizer Context Block . . . . . . . . . . . . . .  85
169	       9.4.14.  Start Input Timers . . . . . . . . . . . . . . . . .  86
170	       9.4.15.  Speech Complete Timeout  . . . . . . . . . . . . . .  86
171	       9.4.16.  Speech Incomplete Timeout  . . . . . . . . . . . . .  87
172	       9.4.17.  DTMF Interdigit Timeout  . . . . . . . . . . . . . .  87
173	       9.4.18.  DTMF Term Timeout  . . . . . . . . . . . . . . . . .  88
174	       9.4.19.  DTMF-Term-Char . . . . . . . . . . . . . . . . . . .  88
175	       9.4.20.  Failed URI . . . . . . . . . . . . . . . . . . . . .  88
176	       9.4.21.  Failed URI Cause . . . . . . . . . . . . . . . . . .  88
177	       9.4.22.  Save Waveform  . . . . . . . . . . . . . . . . . . .  89
178	       9.4.23.  New Audio Channel  . . . . . . . . . . . . . . . . .  89
179	       9.4.24.  Speech-Language  . . . . . . . . . . . . . . . . . .  89
180	       9.4.25.  Ver-Buffer-Utterance . . . . . . . . . . . . . . . .  90
181	       9.4.26.  Recognition-Mode . . . . . . . . . . . . . . . . . .  90
182	       9.4.27.  Cancel-If-Queue  . . . . . . . . . . . . . . . . . .  90
183	       9.4.28.  Hotword-Max-Duration . . . . . . . . . . . . . . . .  91
184	       9.4.29.  Hotword-Min-Duration . . . . . . . . . . . . . . . .  91
185	       9.4.30.  Interpret-Text . . . . . . . . . . . . . . . . . . .  91
186	       9.4.31.  DTMF-Buffer-Time . . . . . . . . . . . . . . . . . .  92
187	       9.4.32.  Clear-DTMF-Buffer  . . . . . . . . . . . . . . . . .  92
188	       9.4.33.  Early-No-Match . . . . . . . . . . . . . . . . . . .  92
189	       9.4.34.  Num-Min-Consistent-Pronunciations  . . . . . . . . .  92
190	       9.4.35.  Consistency-Threshold  . . . . . . . . . . . . . . .  93
191	       9.4.36.  Clash-Threshold  . . . . . . . . . . . . . . . . . .  93
192	       9.4.37.  Personal-Grammar-URI . . . . . . . . . . . . . . . .  93
193	       9.4.38.  Enroll-Utterance . . . . . . . . . . . . . . . . . .  94
194	       9.4.39.  Phrase-Id  . . . . . . . . . . . . . . . . . . . . .  94
195	       9.4.40.  Phrase-NL  . . . . . . . . . . . . . . . . . . . . .  94
196	       9.4.41.  Weight . . . . . . . . . . . . . . . . . . . . . . .  94
197	       9.4.42.  Save-Best-Waveform . . . . . . . . . . . . . . . . .  95
198	       9.4.43.  New-Phrase-Id  . . . . . . . . . . . . . . . . . . .  95
199	       9.4.44.  Confusable-Phrases-URI . . . . . . . . . . . . . . .  95
200	       9.4.45.  Abort-Phrase-Enrollment  . . . . . . . . . . . . . .  95
201	     9.5.   Recognizer Message Body  . . . . . . . . . . . . . . . .  96
202	       9.5.1.   Recognizer Grammar Data  . . . . . . . . . . . . . .  96
203	       9.5.2.   Recognizer Result Data . . . . . . . . . . . . . . . 100
204	       9.5.3.   Enrollment Result Data . . . . . . . . . . . . . . . 101
205	       9.5.4.   Recognizer Context Block . . . . . . . . . . . . . . 101
206	     9.6.   Recognizer Results . . . . . . . . . . . . . . . . . . . 101
207	       9.6.1.   Markup Functions . . . . . . . . . . . . . . . . . . 102
208	       9.6.2.   Overview of Recognizer Result Elements and their
209	                Relationships  . . . . . . . . . . . . . . . . . . . 103
210	       9.6.3.   Elements and Attributes  . . . . . . . . . . . . . . 103
211	     9.7.   Enrollment Results . . . . . . . . . . . . . . . . . . . 108
212	       9.7.1.   NUM-CLASHES Element  . . . . . . . . . . . . . . . . 108
213	       9.7.2.   NUM-GOOD-REPETITIONS Element . . . . . . . . . . . . 109
214	       9.7.3.   NUM-REPETITIONS-STILL-NEEDED Element . . . . . . . . 109
215	       9.7.4.   CONSISTENCY-STATUS Element . . . . . . . . . . . . . 109
216	       9.7.5.   CLASH-PHRASE-IDS Element . . . . . . . . . . . . . . 109
217	       9.7.6.   TRANSCRIPTIONS Element . . . . . . . . . . . . . . . 109
218	       9.7.7.   CONFUSABLE-PHRASES Element . . . . . . . . . . . . . 109
219	     9.8.   DEFINE-GRAMMAR . . . . . . . . . . . . . . . . . . . . . 109
220	     9.9.   RECOGNIZE  . . . . . . . . . . . . . . . . . . . . . . . 113
221	     9.10.  STOP . . . . . . . . . . . . . . . . . . . . . . . . . . 119
222	     9.11.  GET-RESULT . . . . . . . . . . . . . . . . . . . . . . . 120
223	     9.12.  START-OF-INPUT . . . . . . . . . . . . . . . . . . . . . 121
224	     9.13.  START-INPUT-TIMERS . . . . . . . . . . . . . . . . . . . 122
225	     9.14.  RECOGNITION-COMPLETE . . . . . . . . . . . . . . . . . . 122
226	     9.15.  START-PHRASE-ENROLLMENT  . . . . . . . . . . . . . . . . 124
227	     9.16.  ENROLLMENT-ROLLBACK  . . . . . . . . . . . . . . . . . . 125
228	     9.17.  END-PHRASE-ENROLLMENT  . . . . . . . . . . . . . . . . . 126
229	     9.18.  MODIFY-PHRASE  . . . . . . . . . . . . . . . . . . . . . 126
230	     9.19.  DELETE-PHRASE  . . . . . . . . . . . . . . . . . . . . . 127
231	     9.20.  INTERPRET  . . . . . . . . . . . . . . . . . . . . . . . 127
232	     9.21.  INTERPRETATION-COMPLETE  . . . . . . . . . . . . . . . . 128
233	     9.22.  DTMF Detection . . . . . . . . . . . . . . . . . . . . . 130
234	   10. Recorder Resource . . . . . . . . . . . . . . . . . . . . . . 130
235	     10.1.  Recorder State Machine . . . . . . . . . . . . . . . . . 131
236	     10.2.  Recorder Methods . . . . . . . . . . . . . . . . . . . . 131
237	     10.3.  Recorder Events  . . . . . . . . . . . . . . . . . . . . 131
238	     10.4.  Recorder Header Fields . . . . . . . . . . . . . . . . . 131
239	       10.4.1.  Sensitivity Level  . . . . . . . . . . . . . . . . . 132
240	       10.4.2.  No Input Timeout . . . . . . . . . . . . . . . . . . 132
241	       10.4.3.  Completion Cause . . . . . . . . . . . . . . . . . . 132
242	       10.4.4.  Completion Reason  . . . . . . . . . . . . . . . . . 133
243	       10.4.5.  Failed URI . . . . . . . . . . . . . . . . . . . . . 133
244	       10.4.6.  Failed URI Cause . . . . . . . . . . . . . . . . . . 134
245	       10.4.7.  Record URI . . . . . . . . . . . . . . . . . . . . . 134
246	       10.4.8.  Media Type . . . . . . . . . . . . . . . . . . . . . 134
247	       10.4.9.  Max Time . . . . . . . . . . . . . . . . . . . . . . 135
248	       10.4.10. Trim-Length  . . . . . . . . . . . . . . . . . . . . 135
249	       10.4.11. Final Silence  . . . . . . . . . . . . . . . . . . . 135
250	       10.4.12. Capture On Speech  . . . . . . . . . . . . . . . . . 135
251	       10.4.13. Ver-Buffer-Utterance . . . . . . . . . . . . . . . . 136
252	       10.4.14. Start Input Timers . . . . . . . . . . . . . . . . . 136
253	       10.4.15. New Audio Channel  . . . . . . . . . . . . . . . . . 136
254	     10.5.  Recorder Message Body  . . . . . . . . . . . . . . . . . 136
255	     10.6.  RECORD . . . . . . . . . . . . . . . . . . . . . . . . . 137
256	     10.7.  STOP . . . . . . . . . . . . . . . . . . . . . . . . . . 138
257	     10.8.  RECORD-COMPLETE  . . . . . . . . . . . . . . . . . . . . 139
258	     10.9.  START-INPUT-TIMERS . . . . . . . . . . . . . . . . . . . 140
259	     10.10. START-OF-INPUT . . . . . . . . . . . . . . . . . . . . . 140
260	   11. Speaker Verification and Identification . . . . . . . . . . . 141
261	     11.1.  Speaker Verification State Machine . . . . . . . . . . . 142
262	     11.2.  Speaker Verification Methods . . . . . . . . . . . . . . 144
263	     11.3.  Verification Events  . . . . . . . . . . . . . . . . . . 145
264	     11.4.  Verification Header Fields . . . . . . . . . . . . . . . 145
265	       11.4.1.  Repository-URI . . . . . . . . . . . . . . . . . . . 146
266	       11.4.2.  Voiceprint-Identifier  . . . . . . . . . . . . . . . 146
267	       11.4.3.  Verification-Mode  . . . . . . . . . . . . . . . . . 147
268	       11.4.4.  Adapt-Model  . . . . . . . . . . . . . . . . . . . . 148
269	       11.4.5.  Abort-Model  . . . . . . . . . . . . . . . . . . . . 148
270	       11.4.6.  Min-Verification-Score . . . . . . . . . . . . . . . 148
271	       11.4.7.  Num-Min-Verification-Phrases . . . . . . . . . . . . 148
272	       11.4.8.  Num-Max-Verification-Phrases . . . . . . . . . . . . 149
273	       11.4.9.  No-Input-Timeout . . . . . . . . . . . . . . . . . . 149
274	       11.4.10. Save-Waveform  . . . . . . . . . . . . . . . . . . . 149
275	       11.4.11. Media Type . . . . . . . . . . . . . . . . . . . . . 150
276	       11.4.12. Waveform-URI . . . . . . . . . . . . . . . . . . . . 150
277	       11.4.13. Voiceprint-Exists  . . . . . . . . . . . . . . . . . 150
278	       11.4.14. Ver-Buffer-Utterance . . . . . . . . . . . . . . . . 151
279	       11.4.15. Input-Waveform-Uri . . . . . . . . . . . . . . . . . 151
280	       11.4.16. Completion-Cause . . . . . . . . . . . . . . . . . . 151
281	       11.4.17. Completion Reason  . . . . . . . . . . . . . . . . . 153
282	       11.4.18. Speech Complete Timeout  . . . . . . . . . . . . . . 153
283	       11.4.19. New Audio Channel  . . . . . . . . . . . . . . . . . 153
284	       11.4.20. Abort-Verification . . . . . . . . . . . . . . . . . 153
285	       11.4.21. Start Input Timers . . . . . . . . . . . . . . . . . 153
286	     11.5.  Verification Message Body  . . . . . . . . . . . . . . . 154
287	       11.5.1.  Verification Result Data . . . . . . . . . . . . . . 154
288	       11.5.2.  Verification Result Elements . . . . . . . . . . . . 154
289	     11.6.  START-SESSION  . . . . . . . . . . . . . . . . . . . . . 158
290	     11.7.  END-SESSION  . . . . . . . . . . . . . . . . . . . . . . 159
291	     11.8.  QUERY-VOICEPRINT . . . . . . . . . . . . . . . . . . . . 160
292	     11.9.  DELETE-VOICEPRINT  . . . . . . . . . . . . . . . . . . . 161
293	     11.10. VERIFY . . . . . . . . . . . . . . . . . . . . . . . . . 162
294	     11.11. VERIFY-FROM-BUFFER . . . . . . . . . . . . . . . . . . . 162
295	     11.12. VERIFY-ROLLBACK  . . . . . . . . . . . . . . . . . . . . 165
296	     11.13. STOP . . . . . . . . . . . . . . . . . . . . . . . . . . 165
297	     11.14. START-INPUT-TIMERS . . . . . . . . . . . . . . . . . . . 166
298	     11.15. VERIFICATION-COMPLETE  . . . . . . . . . . . . . . . . . 167
299	     11.16. START-OF-INPUT . . . . . . . . . . . . . . . . . . . . . 167
300	     11.17. CLEAR-BUFFER . . . . . . . . . . . . . . . . . . . . . . 168
301	     11.18. GET-INTERMEDIATE-RESULT  . . . . . . . . . . . . . . . . 168
302	   12. Security Considerations . . . . . . . . . . . . . . . . . . . 169
303	     12.1.  Rendezvous and Session Establishment . . . . . . . . . . 170
304	     12.2.  Control channel protection . . . . . . . . . . . . . . . 170
305	     12.3.  Media session protection . . . . . . . . . . . . . . . . 170
306	     12.4.  Indirect Content Access  . . . . . . . . . . . . . . . . 171
307	     12.5.  Protection of stored media . . . . . . . . . . . . . . . 172
308	     12.6.  DTMF and recognition buffers . . . . . . . . . . . . . . 172
309	     12.7.  Client-set server parameters . . . . . . . . . . . . . . 172
310	     12.8.  DELETE-VOICEPRINT and authorization  . . . . . . . . . . 172
311	   13. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 173
312	     13.1.  New registries . . . . . . . . . . . . . . . . . . . . . 173
313	       13.1.1.  MRCPv2 resource types  . . . . . . . . . . . . . . . 173
314	       13.1.2.  MRCPv2 methods and events  . . . . . . . . . . . . . 173
315	       13.1.3.  MRCPv2 header fields . . . . . . . . . . . . . . . . 175
316	       13.1.4.  MRCPv2 status codes  . . . . . . . . . . . . . . . . 177
317	       13.1.5.  Grammar Reference List Parameters  . . . . . . . . . 177
318	       13.1.6.  MRCPv2 vendor-specific parameters  . . . . . . . . . 178
319	     13.2.  NLSML-related registrations  . . . . . . . . . . . . . . 178
320	       13.2.1.  application/nlsml+xml Media Type registration  . . . 178
321	     13.3.  NLSML XML Schema registration  . . . . . . . . . . . . . 179
322	     13.4.  MRCPv2 XML Namespace registration  . . . . . . . . . . . 179
323	     13.5.  text Media Type Registrations  . . . . . . . . . . . . . 179
324	       13.5.1.  text/grammar-ref-list  . . . . . . . . . . . . . . . 180
325	     13.6.  session URI scheme registration  . . . . . . . . . . . . 180
326	     13.7.  SDP parameter registrations  . . . . . . . . . . . . . . 181
327	       13.7.1.  sub-registry "proto" . . . . . . . . . . . . . . . . 182
328	       13.7.2.  sub-registry "att-field (media-level)" . . . . . . . 182
329	   14. Examples  . . . . . . . . . . . . . . . . . . . . . . . . . . 183
330	     14.1.  Message Flow . . . . . . . . . . . . . . . . . . . . . . 183
331	     14.2.  Recognition Result Examples  . . . . . . . . . . . . . . 193
332	       14.2.1.  Simple ASR Ambiguity . . . . . . . . . . . . . . . . 193
333	       14.2.2.  Mixed Initiative . . . . . . . . . . . . . . . . . . 194
334	       14.2.3.  DTMF Input . . . . . . . . . . . . . . . . . . . . . 195
335	       14.2.4.  Interpreting Meta-Dialog and Meta-Task Utterances  . 195
336	       14.2.5.  Anaphora and Deixis  . . . . . . . . . . . . . . . . 196
337	       14.2.6.  Distinguishing Individual Items from Sets with
338	                One Member . . . . . . . . . . . . . . . . . . . . . 197
339	       14.2.7.  Extensibility  . . . . . . . . . . . . . . . . . . . 198
340	   15. ABNF Normative Definition . . . . . . . . . . . . . . . . . . 198
341	   16. XML Schemas . . . . . . . . . . . . . . . . . . . . . . . . . 213
342	     16.1.  NLSML Schema Definition  . . . . . . . . . . . . . . . . 213
343	     16.2.  Enrollment Results Schema Definition . . . . . . . . . . 214
344	     16.3.  Verification Results Schema Definition . . . . . . . . . 216
345	   17. References  . . . . . . . . . . . . . . . . . . . . . . . . . 219
346	     17.1.  Normative References . . . . . . . . . . . . . . . . . . 219
347	     17.2.  Informative References . . . . . . . . . . . . . . . . . 222
348	   Appendix A.  Contributors . . . . . . . . . . . . . . . . . . . . 224
349	   Appendix B.  Acknowledgements . . . . . . . . . . . . . . . . . . 225
350	   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . . 225

352	1.  Introduction

354	   The MRCPv2 protocol is designed to allow a client device to control
355	   media processing resources on the network.  Some of these media
356	   processing resources include speech recognition engines, speech
357	   synthesis engines, speaker verification and speaker identification
358	   engines.  MRCPv2 enables the implementation of distributed
359	   Interactive Voice Response platforms using VoiceXML
360	   [W3C.REC-voicexml20-20040316] browsers or other client applications
361	   while maintaining separate back-end speech processing capabilities on
362	   specialized speech processing servers.  MRCPv2 is based on the
363	   earlier Media Resource Control Protocol (MRCP) [RFC4463] developed
364	   jointly by Cisco Systems, Inc., Nuance Communications, and
365	   Speechworks Inc. Although some of the method names are similar, the
366	   way in which these methods are communicated is different.  There are
367	   also more resources and more methods for each resource.  The first
368	   version of MRCP was essentially taken only as input to the
369	   development of this protocol.  There is no expectation that an MRCPv2
370	   client will work with an MRCPv1 server or vice versa.  There is no
371	   migration plan or gateway definition between the two protocols.

373	   The protocol requirements of SPEECHSC [RFC4313] include that the
374	   solution be capable of reaching a media processing server and setting
375	   up communication channels to the media resources, and sending and
376	   receiving control messages and media streams to/from the server.  The
377	   Session Initiation Protocol (SIP) [RFC3261] meets these requirements.

379	   The proprietary version of MRCP ran over the Real Time Streaming
380	   Protocol (RTSP) [RFC2326].  At the time work on MRCPv2 was begun, the
381	   consensus was that this use of RTSP would break the RTSP protocol or
382	   cause backward-compatibility problems, something forbidden by Section
383	   3.2 of the above-mentioned requirements document [RFC4313].  This is
384	   why MRCPv2 does not run over RTSP.

386	   MRCPv2 leverages these SIP capabilities by building upon SIP and the
387	   Session Description Protocol (SDP) [RFC4566].  MRCPv2 uses SIP to
388	   set up and tear down media and control sessions with the server.  In
389	   addition, the client can use a SIP re-INVITE method (an INVITE
390	   sent within an existing SIP dialog) to change the characteristics of
391	   these media and control sessions while maintaining the SIP dialog
392	   between the client and server.  SDP is used to describe the
393	   parameters of the media sessions associated with that dialog.  It is
394	   mandatory to support SIP as the session establishment protocol to
395	   ensure interoperability.  Other protocols can be used for session
396	   establishment by prior agreement.  This document only describes the
397	   use of SIP and SDP.

399	   MRCPv2 uses SIP and SDP to create the speech client/server dialog and
400	   set up the media channels to the server.  It also uses SIP and SDP to
401	   establish MRCPv2 control sessions between the client and the server
402	   for each media processing resource required for that dialog.  The
403	   MRCPv2 protocol exchange between the client and the media resource is
404	   carried on that control session.  MRCPv2 protocol exchanges do not
405	   change the state of the SIP dialog, the media sessions, or other
406	   parameters of the dialog initiated via SIP.  They control and affect
407	   the state of the media processing resource associated with the MRCPv2
408	   session(s).

410	   MRCPv2 defines the messages to control the different media processing
411	   resources and the state machines required to guide their operation.
412	   It also describes how these messages are carried over a transport
413	   layer protocol such as the Transmission Control Protocol (TCP)
414	   [RFC0793] or the Transport Layer Security (TLS) Protocol [RFC5246]
415	   (Note: the Stream Control Transmission Protocol (SCTP) [RFC4960] is a
416	   viable transport for MRCPv2 as well, but the mapping onto SCTP is not
417	   described in this specification).

419	2.  Document Conventions

421	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
422	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
423	   document are to be interpreted as described in RFC 2119 [RFC2119].

425	   Since many of the definitions and syntax are identical to those for
426	   the Hypertext Transfer Protocol -- HTTP/1.1 [RFC2616], this
427	   specification refers to the section where they are defined rather
428	   than copying them.  For brevity, [HX.Y] is to be taken to refer to
429	   Section X.Y of RFC 2616.

431	   All the mechanisms specified in this document are described in both
432	   prose and an augmented Backus-Naur form (ABNF [RFC5234]).

434	   The complete message format in ABNF form is provided in Section 15
435	   and is the normative format definition.  Note that productions may be
436	   duplicated within the main body of the document for reading
437	   convenience.  If a production in the body of the text conflicts with
438	   one in the normative definition, the latter rules.

440	2.1.  Definitions

442	   Media Resource
443	                  An entity on the speech processing server that can be
444	                  controlled through the MRCPv2 protocol.

446	   MRCP Server
447	                  Aggregate of one or more "Media Resource" entities on
448	                  a Server, exposed through the MRCPv2 protocol
449	                  ("Server" for short).
450	   MRCP Client
451	                  An entity controlling one or more Media Resources
452	                  through the MRCPv2 protocol ("Client" for short).
453	   DTMF
454	                  Dual Tone Multi-Frequency; a method of transmitting
455	                  key presses in-band, either as actual tones (Q.23
456	                  [Q.23]) or as named tone events (RFC 4733 [RFC4733]).
457	   Endpointing
458	                  The process of automatically detecting the beginning
459	                  and end of speech in an audio stream.  This is
460	                  critical both for speech recognition and for automated
461	                  recording as one would find in voice mail systems.
462	   Hotword Mode
463	                  A mode of speech recognition where a stream of
464	                  utterances is evaluated for match against a small set
465	                  of command words.  This is generally employed to
466	                  either trigger some action, or to control the
467	                  subsequent grammar to be used for further recognition.

469	2.2.  State-Machine Diagrams

471	   The state-machine diagrams in this document do not show every
472	   possible method call.  Rather, they reflect the state of the resource
473	   based on the methods that have moved to IN-PROGRESS or COMPLETE
474	   states (see Section 5.3).  Note that since PENDING requests
475	   essentially have not affected the resource yet and are in queue to be
476	   processed, they are not reflected in the state-machine diagrams.

478	2.3.  URI Schemes

480	   This document defines many protocol headers that contain URIs
481	   (Uniform Resource Identifier (URI) [RFC3986]) or lists of URIs for
482	   referencing media.  The entire document, including the Security
483	   Considerations section (Section 12), assumes that HTTP or HTTP over
484	   TLS (HTTPS) [RFC2818] will be used as the URI addressing scheme
485	   unless otherwise stated.  However, implementations MAY support other
486	   schemes (such as "file") provided they have addressed any security
487	   considerations described in this document and any others particular
488	   to the specific scheme.  For example, implementations where the
489	   client and server both reside on the same physical hardware and the
490	   file system is secured by traditional user-level file access controls
491	   could be reasonable candidates for supporting the "file" scheme.
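
   As an illustration of the scheme policy above, the following Python
   sketch accepts HTTP and HTTPS URIs by default and any further scheme
   only when the deployment enables it explicitly.  The function name
   and the "extra_schemes" parameter are assumptions of this example
   and are not defined by this document.

```python
from urllib.parse import urlsplit

# Schemes this document assumes unless otherwise stated.
DEFAULT_SCHEMES = frozenset({"http", "https"})


def uri_scheme_allowed(uri, extra_schemes=()):
    """Return True if the URI uses a default or locally enabled scheme.

    `extra_schemes` is a hypothetical deployment setting, e.g. ("file",)
    when client and server share one host with file access controls.
    """
    scheme = urlsplit(uri).scheme.lower()
    allowed = DEFAULT_SCHEMES | {s.lower() for s in extra_schemes}
    return scheme in allowed
```

   A deployment that enables the "file" scheme this way is still
   expected to address the security considerations for that scheme.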

493	3.  Architecture

495	   A system using MRCPv2 consists of a client that requires the
496	   generation and/or consumption of media streams and a media resource
497	   server that has the resources or "engines" to process these streams
498	   as input or generate these streams as output.  The client uses SIP
499	   and SDP to establish an MRCPv2 control channel with the server to use
500	   its media processing resources.  MRCPv2 servers are addressed using
501	   SIP URIs.

503	   The Session Initiation Protocol (SIP) uses SDP with the offer/answer
504	   model described in RFC 3264 [RFC3264] to set up the MRCPv2 control
505	   channels and describe their characteristics.  A separate MRCPv2
506	   session is needed to control each of the media processing resources
507	   associated with the SIP dialog between the client and server.  Within
508	   a SIP dialog, the individual resource control channels for the
509	   different resources are added or removed through SDP offer/answer
510	   carried in a SIP re-INVITE transaction.

512	   The server, through the SDP exchange, provides the client with a
513	   difficult-to-guess, unambiguous channel identifier and a TCP port
514	   number (see Section 4.2).  The client MAY then open a new TCP
515	   connection with the server on this port number.  Multiple MRCPv2
516	   channels can share a TCP connection between the client and the
517	   server.  All MRCPv2 messages exchanged between the client and the
518	   server carry the specified channel identifier that the server MUST
519	   ensure is unambiguous among all MRCPv2 control channels that are
520	   active on that server.  The client uses this channel identifier to
521	   indicate the media processing resource associated with that channel.
522	   For information on message framing, see Section 5.

524	   The Session Initiation Protocol (SIP) also establishes the media
525	   sessions between the client (or other source/sink of media) and the
526	   MRCPv2 server using SDP m-lines.  One or more media processing
527	   resources may share a media session under a SIP session, or each
528	   media processing resource may have its own media session.

530	   The following diagram shows the general architecture of a system that
531	   uses MRCPv2.  To simplify the diagram, only a few resources are shown.

533	     MRCPv2 client                   MRCPv2 Media Resource Server
534	|--------------------|            |------------------------------------|
535	||------------------||            ||----------------------------------||
536	|| Application Layer||            ||Synthesis|Recognition|Verification||
537	||------------------||            || Engine  |  Engine   |   Engine   ||
538	||Media Resource API||            ||    ||   |    ||     |    ||      ||
539	||------------------||            ||Synthesis|Recognizer |  Verifier  ||
540	|| SIP  |  MRCPv2   ||            ||Resource | Resource  |  Resource  ||
541	||Stack |           ||            ||     Media Resource Management    ||
542	||      |           ||            ||----------------------------------||
543	||------------------||            ||   SIP  |        MRCPv2           ||
544	||   TCP/IP Stack   ||---MRCPv2---||  Stack |                         ||
545	||                  ||            ||----------------------------------||
546	||------------------||----SIP-----||           TCP/IP Stack           ||
547	|--------------------|            ||                                  ||
548	         |                        ||----------------------------------||
549	        SIP                       |------------------------------------|
550	         |                          /
551	|-------------------|             RTP
552	|                   |             /
553	| Media Source/Sink |------------/
554	|                   |
555	|-------------------|

557	                      Figure 1: Architectural Diagram

559	3.1.  MRCPv2 Media Resource Types

561	   An MRCPv2 server may offer one or more of the following media
562	   processing resources to its clients.
563	   Basic Synthesizer
564	                  A speech synthesizer resource with very limited
565	                  capabilities, that can generate its media stream
566	                  exclusively from concatenated audio clips.  The speech
567	                  data is described using a limited subset of the Speech
568	                  Synthesis Markup Language (SSML)
569	                  [W3C.REC-speech-synthesis-20040907] elements.  A basic
570	                  synthesizer MUST support the SSML tags <speak>,
571	                  <audio>, <say-as> and <mark>.
572	   Speech Synthesizer
573	                  A full capability speech synthesis resource capable of
574	                  rendering speech from text.  Such a synthesizer MUST
575	                  have full SSML [W3C.REC-speech-synthesis-20040907]
576	                  support.

578	   Recorder
579	                  A resource capable of recording audio and providing a
580	                  URI pointer to the recording.  A recorder MUST provide
581	                  endpointing capabilities for suppressing silence at
582	                  the beginning and end of a recording, and MAY also
583	                  suppress silence in the middle of a recording.  If
584	                  such suppression is done, the recorder MUST maintain
585	                  timing metadata to indicate the actual time stamps of
586	                  the recorded media.
587	   DTMF Recognizer
588	                  A recognition resource capable of extracting and
589	                  interpreting Dual-Tone Multi-Frequency (DTMF) [Q.23]
590	                  digits in a media stream and matching them against a
591	                  supplied digit grammar.  It could also do a semantic
592	                  interpretation based on semantic tags in the grammar.
593	   Speech Recognizer
594	                  A full speech recognition resource that is capable of
595	                  receiving a media stream containing audio and
596	                  interpreting it into recognition results.  It also has a
597	                  natural language semantic interpreter to post-process
598	                  the recognized data according to the semantic data in
599	                  the grammar and provide semantic results along with
600	                  the recognized input.  The recognizer MAY also support
601	                  enrolled grammars, where the client can enroll and
602	                  create new personal grammars for use in future
603	                  recognition operations.
604	   Speaker Verifier
605	                  A resource capable of verifying the authenticity of a
606	                  claimed identity by matching a media stream containing
607	                  spoken input to a pre-existing voiceprint.  This may
608	                  also involve matching the caller's voice against more
609	                  than one voiceprint, also called multi-verification or
610	                  speaker identification.

612	3.2.  Server and Resource Addressing

614	   The MRCPv2 server is a generic SIP server, and is thus addressed by a
615	   SIP URI (RFC 3261 [RFC3261]).

617	   For example:

619	        sip:mrcpv2@example.net   or
620	        sips:mrcpv2@example.net

622	4.  MRCPv2 Protocol Basics

624	   MRCPv2 requires a connection-oriented transport layer protocol such
625	   as TCP to guarantee reliable sequencing and delivery of MRCPv2
626	   control messages between the client and the server.  In order to meet
627	   the requirements for security enumerated in SpeechSC Requirements
628	   [RFC4313], clients and servers MUST implement TLS as well.  One or
629	   more connections between the client and the server can be shared
630	   among different MRCPv2 channels to the server.  The individual
631	   messages carry the channel identifier to differentiate messages on
632	   different channels.  MRCPv2 protocol encoding is text based with
633	   mechanisms to carry embedded binary data.  This allows arbitrary data
634	   like recognition grammars, recognition results, synthesizer speech
635	   markup, etc., to be carried in MRCPv2 messages.  For information on
636	   message framing, see Section 5.

638	4.1.  Connecting to the Server

640	   MRCPv2 employs SIP, in conjunction with SDP, as the session
641	   establishment and management protocol.  The client reaches an MRCPv2
642	   server using conventional INVITE and other SIP requests for
643	   establishing, maintaining, and terminating SIP dialogs.  The SDP
644	   offer/answer exchange model over SIP is used to establish a resource
645	   control channel for each resource.  The SDP offer/answer exchange is
646	   also used to establish media sessions between the server and the
647	   source or sink of audio.

649	4.2.  Managing Resource Control Channels

651	   The client needs a separate MRCPv2 resource control channel to
652	   control each media processing resource under the SIP dialog.  A
653	   unique channel identifier string identifies these resource control
654	   channels.  The channel identifier is a difficult-to-guess,
655	   unambiguous string followed by an "@", then by a string token
656	   specifying the type of resource.  The server generates the channel
657	   identifier and MUST make sure it does not clash with the identifier
658	   of any other MRCP channel currently allocated by that server.  MRCPv2
659	   defines the following IANA-registered types of media processing
660	   resources.  Additional resource types, their associated methods/
661	   events and state machines may be added as described below in
662	   Section 13.

664	          +---------------+----------------------+--------------+
665	          | Resource Type | Resource Description | Described in |
666	          +---------------+----------------------+--------------+
667	          | speechrecog   | Speech Recognizer    | Section 9    |
668	          | dtmfrecog     | DTMF Recognizer      | Section 9    |
669	          | speechsynth   | Speech Synthesizer   | Section 8    |
670	          | basicsynth    | Basic Synthesizer    | Section 8    |
671	          | speakverify   | Speaker Verification | Section 11   |
672	          | recorder      | Speech Recorder      | Section 10   |
673	          +---------------+----------------------+--------------+

675	                          Table 1: Resource Types
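
   As an illustration of the channel identifier format and the resource
   types in Table 1, the following Python sketch generates and splits
   identifiers.  The function names, the "in_use" set, and the choice
   of 8 random bytes rendered as hexadecimal are assumptions of this
   example; this document only requires that the string be difficult
   to guess and unambiguous on the server.

```python
import secrets

# Resource type tokens from Table 1.
RESOURCE_TYPES = frozenset({
    "speechrecog", "dtmfrecog", "speechsynth",
    "basicsynth", "speakverify", "recorder",
})


def new_channel_id(resource_type, in_use):
    """Create an identifier that does not clash with any in `in_use`."""
    if resource_type not in RESOURCE_TYPES:
        raise ValueError(f"unknown resource type: {resource_type}")
    while True:
        # Assumption: 64 random bits are hard enough to guess here.
        candidate = f"{secrets.token_hex(8).upper()}@{resource_type}"
        if candidate not in in_use:
            in_use.add(candidate)
            return candidate


def split_channel_id(channel_id):
    """Return (unique_string, resource_type) or raise ValueError."""
    unique, sep, resource_type = channel_id.rpartition("@")
    if not sep or not unique or resource_type not in RESOURCE_TYPES:
        raise ValueError(f"malformed channel identifier: {channel_id}")
    return unique, resource_type
```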

677	   The SIP INVITE or re-INVITE transaction and the SDP offer/answer
678	   exchange it carries contain m-lines describing the resource control
679	   channel to be allocated.  There MUST be one SDP m-line for each
680	   MRCPv2 resource to be used in the session.  This m-line MUST have a
681	   media type field of "application" and a transport type field of
682	   either "TCP/MRCPv2" or "TCP/TLS/MRCPv2".  The port number field of
683	   the m-line MUST contain the "discard" port of the transport protocol
684	   (port 9 for TCP) in the SDP offer from the client and MUST contain
685	   the TCP listen port on the server in the SDP answer.  The client may
686	   then either set up a TCP or TLS connection to that server port or
687	   share an already established connection to that port.  Since MRCPv2
688	   allows multiple sessions to share the same TCP connection, multiple
689	   m-lines in a single SDP document MAY share the same port field value;
690	   MRCPv2 servers MUST NOT assume any relationship between resources
691	   using the same port other than the sharing of the communication
692	   channel.

694	   MRCPv2 resources do not use the port or format field of the m-line to
695	   distinguish themselves from other resources using the same channel.
696	   The client MUST specify the resource type identifier in the resource
697	   attribute associated with the control m-line of the SDP offer.  The
698	   server MUST respond with the full Channel-Identifier (which includes
699	   the resource type identifier and a difficult-to-guess, unambiguous
700	   string) in the "channel" attribute associated with the control m-line
701	   of the SDP answer.  To remain backwards compatible with conventional
702	   SDP usage, the format field of the m-line MUST have the arbitrarily-
703	   selected value of "1".
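
   The m-line and attribute rules above can be sketched as a small
   validator.  This is a minimal Python example, not a complete SDP
   parser: the function name and the representation of a media section
   as a list of text lines are assumptions of this example.

```python
CONTROL_TRANSPORTS = frozenset({"TCP/MRCPv2", "TCP/TLS/MRCPv2"})


def parse_control_section(lines, is_offer):
    """Validate one control media section; return (port, attributes).

    `lines` holds the m-line followed by its a= lines.
    """
    if not lines or not lines[0].startswith("m="):
        raise ValueError("section must start with an m-line")
    fields = lines[0][len("m="):].split()
    if len(fields) != 4:
        raise ValueError("expected media, port, transport, and format")
    media, port_text, transport, fmt = fields
    if media != "application" or transport not in CONTROL_TRANSPORTS:
        raise ValueError("not an MRCPv2 control m-line")
    if fmt != "1":
        raise ValueError('format field must be "1"')
    port = int(port_text)
    # Port 0 is also accepted in an offer: it de-allocates a channel.
    if is_offer and port not in (0, 9):
        raise ValueError("offer must use the discard port (9)")
    attributes = {}
    for line in lines[1:]:
        if line.startswith("a="):
            name, _, value = line[len("a="):].partition(":")
            attributes.setdefault(name, []).append(value)
    required = "resource" if is_offer else "channel"
    if required not in attributes:
        raise ValueError(f"missing a={required} attribute")
    return port, attributes
```

   Applied to the control section of the SDP answer in Section 4.3,
   the function returns the server's listen port and the value of the
   "channel" attribute.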

705	   When the client wants to add a media processing resource to the
706	   session, it issues a new SDP offer, according to the procedures of
707	   RFC 3264 [RFC3264], in a SIP re-INVITE request.  The SDP offer/answer
708	   exchange carried by this SIP transaction contains one or more
709	   additional control m-lines for the new resources to be allocated to
710	   the session.  The server, on seeing the new m-line, allocates the
711	   resources (if they are available) and responds with a corresponding
712	   control m-line in the SDP answer carried in the SIP response.  If the
713	   new resources are not available, the re-INVITE receives an error
714	   message, and existing media processing going on before the re-INVITE
715	   will continue as it was before.  It is not possible to allocate more
716	   than one resource of each type.  If a client requests more than one
717	   resource of any type, the server MUST behave as if the resources of
718	   that type beyond the first are not available.
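
   The one-resource-per-type rule can be sketched as follows.  This
   Python example assumes "requested" lists only the resource types of
   the newly added control m-lines and that "is_available" is a
   server-specific callback; both names are hypothetical.  How a
   refusal is signaled to the client is not modeled.

```python
def allocate_resources(requested, already_allocated, is_available):
    """Split requested resource types into (granted, refused).

    At most one resource of each type is granted per session; any
    further request for the same type is treated as unavailable.
    """
    granted, refused = [], []
    taken = set(already_allocated)
    for resource_type in requested:
        if resource_type in taken or not is_available(resource_type):
            refused.append(resource_type)
        else:
            taken.add(resource_type)
            granted.append(resource_type)
    return granted, refused
```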

720	   MRCPv2 clients and servers using TCP as a transport protocol MUST use
721	   the procedures specified in RFC 4145 [RFC4145] for setting up the TCP
722	   connection, with the considerations described here.  Similarly,
723	   MRCPv2 clients and servers using TCP/TLS as a transport protocol MUST
724	   use the procedures specified in RFC 4572 [RFC4572] for setting up the
725	   TLS connection, with the considerations described here.  The
726	   a=setup attribute, as described in RFC 4145 [RFC4145], MUST be
727	   "active" for the offer from the client and MUST be "passive" for the
728	   answer from the MRCPv2 server.  The a=connection attribute MUST have
729	   a value of "new" on the very first control m-line offer from the
730	   client to an MRCPv2 server.  Subsequent control m-line offers from
731	   the client to the MRCP server MAY contain "new" or "existing",
732	   depending on whether the client wants to set up a new connection or
733	   share an existing connection, respectively.  If the client specifies
734	   a value of "new", the server MUST respond with a value of "new".  If
735	   the client specifies a value of "existing", the server MUST respond
736	   with either "existing", if it prefers to share an existing
737	   connection, or "new" if it does not.  In the latter case, the client
738	   MUST initiate a new transport connection.
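
   The "setup" and "connection" rules in this paragraph reduce to a
   small decision function on the server side.  The following Python
   sketch is one way to express it; the function names and the
   "server_prefers_sharing" flag are assumptions of this example.

```python
def answer_connection_attributes(offer_setup, offer_connection,
                                 server_prefers_sharing):
    """Return the a=setup and a=connection values for the SDP answer."""
    if offer_setup != "active":
        raise ValueError('the client offer must carry a=setup:active')
    if offer_connection not in ("new", "existing"):
        raise ValueError("a=connection must be new or existing")
    if offer_connection == "new":
        # A "new" offer is always answered with "new".
        connection = "new"
    else:
        # An "existing" offer may be answered either way.
        connection = "existing" if server_prefers_sharing else "new"
    return {"setup": "passive", "connection": connection}


def client_must_open_connection(answer_connection):
    """True when the client has to initiate a new transport connection."""
    return answer_connection == "new"
```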

740	   When the client wants to de-allocate the resource from this session,
741	   it issues a new SDP offer, according to RFC 3264 [RFC3264], where the
742	   control m-line port MUST be set to 0.  This SDP offer is sent in a
743	   SIP re-INVITE request.  This de-allocates the associated MRCPv2
744	   identifier and resource.  The server MUST NOT close the TCP or TLS
745	   connection if it is currently being shared among multiple MRCP
746	   channels.  When all MRCP channels that may be sharing the connection
747	   are released and/or the associated SIP dialog is terminated, the
748	   client or server terminates the connection.
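
   One way to honor the rule that a shared connection stays open while
   any channel still uses it is to track the channels per connection.
   The Python class below is a hypothetical sketch: it only records
   state, and a real implementation would close the underlying TCP or
   TLS socket where the "open" flag is cleared.

```python
class SharedConnection:
    """Tracks which MRCPv2 channels share one transport connection."""

    def __init__(self):
        self.channels = set()
        self.open = True

    def attach(self, channel_id):
        """Record that a channel is carried on this connection."""
        self.channels.add(channel_id)

    def release(self, channel_id):
        """Drop one channel; mark the connection closed if none remain.

        Returns True while the connection must stay open.
        """
        self.channels.discard(channel_id)
        if not self.channels:
            self.open = False
        return self.open
```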

750	   When the client wants to tear down the whole session and all its
751	   resources, it MUST issue a SIP BYE request to close the SIP session.
752	   This will de-allocate all the control channels and resources
753	   allocated under the session.

755	   All servers MUST support TLS.  Servers MAY use TCP without TLS in
756	   controlled environments (e.g., not in the public Internet) where both
757	   nodes are inside a protected perimeter, for example, preventing
758	   access to the MRCP server from remote nodes outside the controlled
759	   perimeter.  It is up to the client, through the SDP offer, to choose
760	   which transport it wants to use for an MRCPv2 session.  Aside from
761	   the exceptions given above, when using TCP the m-lines MUST conform
762	   to RFC 4145 [RFC4145], which describes the usage of SDP for
763	   connection-oriented transport.  When using TLS the SDP m-line for the
764	   control stream MUST conform to comedia over TLS [RFC4572], which
765	   specifies the usage of SDP for establishing a secure connection-
766	   oriented transport over TLS.

768	4.3.  SIP session example

770	   This first example shows the power of using SIP to route to the
771	   appropriate resource.  In the example, note the use of a request to a
772	   domain's speech server service in the INVITE to
773	   mresources@example.com.  The SIP routing machinery in the domain
774	   locates the actual server, mresources@server.example.com, which gets
775	   returned in the 200 OK.  Note that "cmid" is defined in Section 4.4.

777	   This example exchange adds a resource control channel for a
778	   synthesizer.  Since a synthesizer also generates an audio stream,
779	   this interaction also creates a receive-only Real-time Transport
780	   Protocol (RTP) [RFC3550] media session for the server to send audio
781	   to.  The SIP dialog with the media source/sink is independent of MRCP
782	   and is not shown.

784	   C->S:  INVITE sip:mresources@example.com SIP/2.0
785	          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
786	           branch=z9hG4bK74bf1
787	          Max-Forwards:6
788	          To:MediaServer <sip:mresources@example.com>
789	          From:sarvi <sip:sarvi@example.com>;tag=1928301774
790	          Call-ID:a84b4c76e66710
791	          CSeq:314161 INVITE
792	          Contact:<sip:sarvi@client.example.com>
793	          Content-Type:application/sdp
794	          Content-Length:...

796	          v=0
797	          o=sarvi 2890844526 2890844526 IN IP4 192.0.2.12
798	          s=-
799	          c=IN IP4 192.0.2.12
800	          t=0 0
801	          m=application 9 TCP/MRCPv2 1
802	          a=setup:active
803	          a=connection:new
804	          a=resource:speechsynth
805	          a=cmid:1
806	          m=audio 49170 RTP/AVP 0
807	          a=rtpmap:0 pcmu/8000
808	          a=recvonly
809	          a=mid:1

811	   S->C:  SIP/2.0 200 OK
812	          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
813	           branch=z9hG4bK74bf1;received=192.0.32.10
814	          To:MediaServer <sip:mresources@example.com>;tag=62784
815	          From:sarvi <sip:sarvi@example.com>;tag=1928301774
816	          Call-ID:a84b4c76e66710
817	          CSeq:314161 INVITE
818	          Contact:<sip:mresources@server.example.com>
819	          Content-Type:application/sdp
820	          Content-Length:...

822	          v=0
823	          o=- 2890842808 2890842808 IN IP4 192.0.2.11
824	          s=-
825	          c=IN IP4 192.0.2.11
826	          t=0 0
827	          m=application 32416 TCP/MRCPv2 1
828	          a=setup:passive
829	          a=connection:new
830	          a=channel:32AECB234338@speechsynth
831	          a=cmid:1
832	          m=audio 48260 RTP/AVP 0
833	          a=rtpmap:0 pcmu/8000
834	          a=sendonly
835	          a=mid:1

837	   C->S:  ACK sip:mresources@server.example.com SIP/2.0
838	          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
839	           branch=z9hG4bK74bf2
840	          Max-Forwards:6
841	          To:MediaServer <sip:mresources@example.com>;tag=62784
842	          From:sarvi <sip:sarvi@example.com>;tag=1928301774
843	          Call-ID:a84b4c76e66710
844	          CSeq:314161 ACK
845	          Content-Length:0

847	                 Example: Add Synthesizer Control Channel

849	   This example exchange continues from the previous figure and
850	   allocates an additional resource control channel for a recognizer.
851	   Since a recognizer would need to receive an audio stream for
852	   recognition, this interaction also updates the audio stream to
853	   sendrecv, making it a 2-way RTP media session.

855	   C->S:  INVITE sip:mresources@server.example.com SIP/2.0
856	          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
857	           branch=z9hG4bK74bf3
858	          Max-Forwards:6
859	          To:MediaServer <sip:mresources@example.com>;tag=62784
860	          From:sarvi <sip:sarvi@example.com>;tag=1928301774
861	          Call-ID:a84b4c76e66710
862	          CSeq:314162 INVITE
863	          Contact:<sip:sarvi@client.example.com>
864	          Content-Type:application/sdp
865	          Content-Length:...

867	          v=0
868	          o=sarvi 2890844526 2890844527 IN IP4 192.0.2.12
869	          s=-
870	          c=IN IP4 192.0.2.12
871	          t=0 0
872	          m=application 9 TCP/MRCPv2 1
873	          a=setup:active
874	          a=connection:existing
875	          a=resource:speechsynth
876	          a=cmid:1
877	          m=audio 49170 RTP/AVP 0 96
878	          a=rtpmap:0 pcmu/8000
879	          a=rtpmap:96 telephone-event/8000
880	          a=fmtp:96 0-15
881	          a=sendrecv
882	          a=mid:1
883	          m=application 9 TCP/MRCPv2 1
884	          a=setup:active
885	          a=connection:existing
886	          a=resource:speechrecog
887	          a=cmid:1

889	   S->C:  SIP/2.0 200 OK
890	          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
891	           branch=z9hG4bK74bf3;received=192.0.32.10
892	          To:MediaServer <sip:mresources@example.com>;tag=62784
893	          From:sarvi <sip:sarvi@example.com>;tag=1928301774
894	          Call-ID:a84b4c76e66710
895	          CSeq:314162 INVITE
896	          Contact:<sip:mresources@server.example.com>
897	          Content-Type:application/sdp
898	          Content-Length:...

900	          v=0
901	          o=- 2890842808 2890842809 IN IP4 192.0.2.11
902	          s=-
903	          c=IN IP4 192.0.2.11
904	          t=0 0
905	          m=application 32416 TCP/MRCPv2 1
906	          a=setup:passive
907	          a=connection:existing
908	          a=channel:32AECB234338@speechsynth
909	          a=cmid:1
910	          m=audio 48260 RTP/AVP 0 96
911	          a=rtpmap:0 pcmu/8000
912	          a=rtpmap:96 telephone-event/8000
913	          a=fmtp:96 0-15
914	          a=sendrecv
915	          a=mid:1
916	          m=application 32416 TCP/MRCPv2 1
917	          a=setup:passive
918	          a=connection:existing
919	          a=channel:32AECB234338@speechrecog
920	          a=cmid:1

922	   C->S:  ACK sip:mresources@server.example.com SIP/2.0
923	          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
924	           branch=z9hG4bK74bf4
925	          Max-Forwards:6
926	          To:MediaServer <sip:mresources@example.com>;tag=62784
927	          From:sarvi <sip:sarvi@example.com>;tag=1928301774
928	          Call-ID:a84b4c76e66710
929	          CSeq:314162 ACK
930	          Content-Length:0

932	                          Add Recognizer example

934	   This example exchange continues from the previous figure and de-
935	   allocates the recognizer channel.  Since a recognizer no longer needs
936	   to receive an audio stream, this interaction also updates the RTP
937	   media session to recvonly.

939	   C->S:  INVITE sip:mresources@server.example.com SIP/2.0
940	          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
941	           branch=z9hG4bK74bf5
942	          Max-Forwards:6
943	          To:MediaServer <sip:mresources@example.com>;tag=62784
944	          From:sarvi <sip:sarvi@example.com>;tag=1928301774
945	          Call-ID:a84b4c76e66710
946	          CSeq:314163 INVITE
947	          Contact:<sip:sarvi@client.example.com>
948	          Content-Type:application/sdp
949	          Content-Length:...

951	          v=0
952	          o=sarvi 2890844526 2890844528 IN IP4 192.0.2.12
953	          s=-
954	          c=IN IP4 192.0.2.12
955	          t=0 0
956	          m=application 9 TCP/MRCPv2 1
957	          a=resource:speechsynth
958	          a=cmid:1
959	          m=audio 49170 RTP/AVP 0
960	          a=rtpmap:0 pcmu/8000
961	          a=recvonly
962	          a=mid:1
963	          m=application 0 TCP/MRCPv2 1
964	          a=resource:speechrecog
965	          a=cmid:1

967	   S->C:  SIP/2.0 200 OK
968	          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
969	           branch=z9hG4bK74bf5;received=192.0.32.10
970	          To:MediaServer <sip:mresources@example.com>;tag=62784
971	          From:sarvi <sip:sarvi@example.com>;tag=1928301774
972	          Call-ID:a84b4c76e66710
973	          CSeq:314163 INVITE
974	          Contact:<sip:mresources@server.example.com>
975	          Content-Type:application/sdp
976	          Content-Length:...

978	          v=0
979	          o=- 2890842808 2890842810 IN IP4 192.0.2.11
980	          s=-
981	          c=IN IP4 192.0.2.11
982	          t=0 0
983	          m=application 32416 TCP/MRCPv2 1
984	          a=channel:32AECB234338@speechsynth
985	          a=cmid:1
986	          m=audio 48260 RTP/AVP 0
987	          a=rtpmap:0 pcmu/8000
988	          a=sendonly
989	          a=mid:1
990	          m=application 0 TCP/MRCPv2 1
991	          a=channel:32AECB234338@speechrecog
992	          a=cmid:1

994	   C->S:  ACK sip:mresources@server.example.com SIP/2.0
995	          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
996	           branch=z9hG4bK74bf6
997	          Max-Forwards:6
998	          To:MediaServer <sip:mresources@example.com>;tag=62784
999	          From:sarvi <sip:sarvi@example.com>;tag=1928301774
1000	          Call-ID:a84b4c76e66710
1001	          CSeq:314163 ACK
1002	          Content-Length:0

1004	                       Deallocate Recognizer example

1006	4.4.  Media Streams and RTP Ports

1008	   Since MRCPv2 resources either generate or consume media streams, the
1009	   client or the server needs to associate media sessions with their
1010	   corresponding resource or resources.  More than one resource could be
1011	   associated with a single media session or each resource could be
1012	   assigned a separate media session.  Also note that more than one
1013	   media session can be associated with a single resource if need be,
1014	   but this scenario is not useful for the current set of resources.
1015	   For example, a synthesizer and a recognizer could be associated with
1016	   the same media session (m=audio line), if it is opened in "sendrecv"
1017	   mode.  Alternatively, the recognizer could have its own "sendonly"
1018	   audio session and the synthesizer could have its own "recvonly" audio
1019	   session.

1021	   The association between control channels and their corresponding
1022	   media sessions is established using a new "resource channel media
1023	   identifier" media-level attribute ("cmid").  Valid values of this
1024	   attribute are the values of the "mid" attribute defined in RFC 5888
1025	   [RFC5888].  If there is more than one audio m-line, then each audio
1026	   m-line MUST have a "mid" attribute.  Each control m-line MAY have one
1027	   or more "cmid" attributes that match the resource control channel to
1028	   the "mid" attributes of the audio m-lines it is associated with.
1029	   Note that if a control m-line does not have a "cmid" attribute it
1030	   will not be associated with any media.  The operations on such a
1031	   resource will hence be limited.  For example, on a recognizer
1032	   resource, the RECOGNIZE method requires associated media to
1033	   process, while the INTERPRET method does not.  The formatting of the
1034	   "cmid" attribute is described by the following ABNF:

1036	   cmid-attribute = "a=cmid:" identification-tag
1037	   identification-tag = token

1039	   To allow this flexible mapping of media sessions to MRCPv2 control
1040	   channels, a single audio m-line can be associated with multiple
1041	   resources or each resource can have its own audio m-line.  For
1042	   example, if the client wants to allocate a recognizer and a
1043	   synthesizer and associate them with a single 2-way audio stream, the
1044	   SDP offer would contain two control m-lines and a single audio m-line
1045	   with an attribute of "sendrecv".  Each of the control m-lines would
1046	   have a "cmid" attribute whose value matches the "mid" of the audio
1047	   m-line.  If, on the other hand, the client wants to allocate a
1048	   recognizer and a synthesizer each with its own separate audio stream,
1049	   the SDP offer would carry two control m-lines (one for the recognizer
1050	   and another for the synthesizer) and two audio m-lines (one with the
1051	   attribute "sendonly" and another with attribute "recvonly").  The
1052	   "cmid" attribute of the recognizer control m-line would match the
1053	   "mid" value of the "sendonly" audio m-line and the "cmid" attribute
1054	   of the synthesizer control m-line would match the "mid" attribute of
1055	   the "recvonly" m-line.
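
   The mapping described above can be sketched in Python.  This is a
   minimal illustration, not a complete SDP parser: the offer below is
   hypothetical (port numbers and payload type are placeholders), and it
   assumes the "a=resource" attribute and "TCP/MRCPv2" control m-line
   form used elsewhere in this document.  The helper name
   map_control_to_audio is an assumption made for this example.

```python
# Hypothetical offer: two control m-lines sharing one "sendrecv" audio m-line.
SAMPLE_OFFER = """\
m=application 9 TCP/MRCPv2 1
a=resource:speechsynth
a=cmid:1
m=application 9 TCP/MRCPv2 1
a=resource:speechrecog
a=cmid:1
m=audio 49170 RTP/AVP 0
a=sendrecv
a=mid:1
"""


def map_control_to_audio(sdp):
    """Return [(resource, [mid, ...]), ...] for every control m-line."""
    sections = []  # one entry per m-line, in order of appearance
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            sections.append({"media": line[2:].split()[0], "attrs": []})
        elif line.startswith("a=") and sections:
            sections[-1]["attrs"].append(line[2:])

    def values(section, name):
        # Collect the values of every "name:value" attribute in a section.
        prefix = name + ":"
        return [a[len(prefix):] for a in section["attrs"]
                if a.startswith(prefix)]

    audio_mids = {mid for s in sections if s["media"] == "audio"
                  for mid in values(s, "mid")}

    mapping = []
    for s in sections:
        if s["media"] != "application":
            continue
        resource = values(s, "resource")[0]
        cmids = values(s, "cmid")
        dangling = [c for c in cmids if c not in audio_mids]
        if dangling:
            raise ValueError(f"cmid without matching mid: {dangling}")
        # An empty list means the control channel has no media attached.
        mapping.append((resource, cmids))
    return mapping
```

   For the sample offer, both control channels resolve to the audio
   m-line whose "mid" is "1".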

1057	   When a server receives media (e.g., audio) on a media session that is
1058	   associated with more than one media processing resource, it is the
1059	   responsibility of the server to receive and fork the media to the
1060	   resources that need to consume it.  If multiple resources in an
1061	   MRCPv2 session are generating audio (or other media) to be sent on a
1062	   single associated media session, it is the responsibility of the
1063	   server to either multiplex the multiple streams onto the single RTP
1064	   session or use an embedded RTP mixer (see RFC 3550 [RFC3550]) to
1065	   combine the multiple streams into one.  In the former case, the media
1066	   stream will contain RTP packets generated by different sources, and
1067	   hence the packets will have different Synchronization Source
1068	   identifiers (SSRCs).  In the latter case, the RTP packets will
1069	   contain multiple Contributing Source Identifiers (CSRCs)
1070	   corresponding to the original streams before being combined by the
1071	   mixer.  If an MRCPv2 server implementation neither multiplexes nor
1072	   mixes, it MUST disallow the client from associating multiple such
1073	   resources to a single audio stream by rejecting the SDP offer with a
1074	   SIP 488 "Not Acceptable Here" error.  Note that there is a large
1075	   installed base that will return a SIP 501 "Not Implemented" error in
1076	   this case.  To facilitate interoperability with this installed base,
1077	   new implementations SHOULD treat a 501 in this context as a 488 when
1078	   it is received from an element known to be a legacy implementation.

1080	4.5.  MRCPv2 Message Transport

1082	   The MRCPv2 messages defined in this document are transported over a
1083	   TCP or TLS connection between the client and the server.  The method
1084	   for setting up this transport connection and the resource control
1085	   channel is discussed in Section 4.1 and Section 4.2.  Multiple
1086	   resource control channels between a client and a server that belong
1087	   to different SIP dialogs can share one or more TLS or TCP connections
1088	   between them; the server and client MUST support this mode of
1089	   operation.  Clients and servers MUST use the MRCPv2 channel
1090	   identifier, carried in the Channel-Identifier header field in
1091	   individual MRCPv2 messages, to differentiate MRCPv2 messages from
1092	   different resource channels (see Section 6.2.1 for details).  All
1093	   MRCPv2 servers MUST support TLS.  Servers MAY use TCP without TLS in
1094	   controlled environments (e.g., not in the public internet) where both
1095	   nodes are inside a protected perimeter, for example, preventing
1096	   access to the MRCP server from remote nodes outside the controlled
1097	   perimeter.  It is up to the client to choose which mode of transport
1098	   it wants to use for an MRCPv2 session.
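
   As an illustration of how several resource channels can share one
   connection, the following Python sketch routes already-parsed
   messages by their Channel-Identifier value.  The class name
   ChannelDemux and the shape of the "headers" mapping (lower-cased
   header field names to values) are assumptions made for this example,
   not part of the protocol.

```python
class ChannelDemux:
    """Route messages that share one connection to per-channel handlers."""

    def __init__(self):
        self._handlers = {}

    def register(self, channel_id, handler):
        # channel_id is the full value, e.g. "32AECB23433801@speechrecog".
        self._handlers[channel_id] = handler

    def dispatch(self, headers, message):
        # `headers` maps lower-cased header field names to their values.
        channel_id = headers.get("channel-identifier")
        if channel_id is None:
            raise ValueError("message lacks a Channel-Identifier header")
        try:
            handler = self._handlers[channel_id]
        except KeyError:
            raise LookupError(f"unknown channel {channel_id!r}") from None
        return handler(message)
```

   A client or server would register one handler per control channel
   and call dispatch() for every message read from the shared TCP or
   TLS connection.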

1100	   Most examples from here on show only the MRCPv2 messages and do not
1101	   show the SIP messages that may have been used to establish the MRCPv2
1102	   control channel.

1104	4.6.  MRCPv2 Session Termination

1106	   If an MRCP client notices that the underlying connection has been
1107	   closed for one of its MRCP channels, and it has not previously
1108	   initiated a re-INVITE to close that channel, it MUST send a BYE to
1109	   close down the SIP dialog and all other MRCP channels.  If an MRCP
1110	   server notices that the underlying connection has been closed for one
1111	   of its MRCP channels, and it has not previously received and accepted
1112	   a re-INVITE closing that channel, then it MUST send a BYE to close
1113	   down the SIP dialog and all other MRCP channels.

1115	5.  MRCPv2 Specification

1117	   Except as otherwise indicated, MRCPv2 messages are Unicode encoded in
1118	   UTF-8 (RFC3629 [RFC3629]) to allow many different languages to be
1119	   represented.  DEFINE-GRAMMAR (Section 9.8), for example, is one such
1120	   exception, since its body can contain arbitrary XML in arbitrary
1121	   (but specified via XML) encodings.  MRCPv2 also allows message bodies
1122	   to be represented in other character sets, for example ISO 8859-1
1123	   [ISO.8859-1.1987], because in some locales other character sets are
1124	   already in widespread use.  The MRCPv2 protocol headers (the first
1125	   line of an MRCP message) and header field names use only the US-ASCII
1126	   subset of UTF-8.

1128	   Lines are terminated by CRLF (carriage return, then line feed).
1129	   Also, some parameters in the message may contain binary data or a
1130	   record spanning multiple lines.  Such fields have a length value
1131	   associated with the parameter, which indicates the number of octets
1132	   immediately following the parameter.

1134	5.1.  Common Protocol Elements

1136	   The MRCPv2 message set consists of requests from the client to the
1137	   server, responses from the server to the client and asynchronous
1138	   events from the server to the client.  All these messages consist of
1139	   a start-line, one or more header fields, an empty line (i.e., a line
1140	   with nothing preceding the CRLF) indicating the end of the header
1141	   fields, and an optional message body.

1143	generic-message  =    start-line
1144	                      message-header
1145	                      CRLF
1146	                      [ message-body ]

1148	message-body     =    *OCTET

1150	start-line       =    request-line / response-line / event-line

1152	message-header   =  1*(generic-header / resource-header / generic-field)

1154	resource-header  =    synthesizer-header
1155	                 /    recognizer-header
1156	                 /    recorder-header
1157	                 /    verifier-header

1159	   The message-body contains resource-specific and message-specific
1160	   data.  The actual Media Types used to carry the data are specified
1161	   later in the sections defining the individual messages.  Generic
1162	   header fields are described in Section 6.2.

1164	   If a message contains a message body, the message MUST contain
1165	   content-headers indicating the Media Type and encoding of the data in
1166	   the message body.

1168	   Request, response and event messages (described in following
1169	   sections) include the version of MRCP that the message conforms to.
1170	   Version compatibility rules follow [H3.1] regarding version ordering,
1171	   compliance requirements, and upgrading of version numbers.  The
1172	   version information is indicated by "MRCP" (as opposed to "HTTP" in
1173	   [H3.1]) or "MRCP/2.0" (as opposed to "HTTP/1.1" in [H3.1]).  To be
1174	   compliant with this specification, clients and servers sending MRCPv2
1175	   messages MUST indicate an mrcp-version of "MRCP/2.0".  ABNF
1176	   productions using mrcp-version can be found in Section 5.2,
1177	   Section 5.3, and Section 5.5.

1179	   mrcp-version   =    "MRCP" "/" 1*2DIGIT "." 1*2DIGIT

1181	   The message-length field specifies the length of the message in
1182	   octets, including the start-line, and MUST be the 2nd token from the
1183	   beginning of the message.  This is to make the framing and parsing of
1184	   the message simpler to do.  This field specifies the length of the
1185	   message including data that may be encoded into the body of the
1186	   message.  Note that this value MAY be given as a fixed-length integer
1187	   that is zero-padded in front in order to eliminate or reduce
1188	   inefficiency in cases where the message-length value would change as
1189	   a result of the length of the message-length token itself.  This
1190	   value, as with all lengths in MRCP, is to be interpreted as a base-10
1191	   number.  In particular, leading zeros do not indicate that the value
1192	   is to be interpreted as a base-8 number.

1194	   message-length =    1*19DIGIT
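
   Because the message-length token is part of the message it measures,
   a sender has to account for the token's own width.  The Python
   sketch below shows two ways to do this under the rules above:
   iterating until the digit count settles, or using a fixed-width,
   zero-padded token.  The function name frame_message and its
   parameters are assumptions made for this example; every line,
   including the empty line after the header fields, is counted with
   its CRLF.

```python
def frame_message(start_tokens, headers, body=b"", width=None):
    """Build a message whose second token is its own length in octets.

    start_tokens: start-line tokens WITHOUT message-length, for example
                  ["MRCP/2.0", "543266", "200", "IN-PROGRESS"].
    headers:      list of (name, value) pairs.
    width:        if given, zero-pad the length token to this many digits
                  (the ABNF allows at most 19).
    """
    header_block = "".join(f"{name}:{value}\r\n" for name, value in headers)
    header_block = header_block.encode("utf-8")

    def render(length_token):
        tokens = [start_tokens[0], length_token, *start_tokens[1:]]
        start_line = (" ".join(tokens) + "\r\n").encode("utf-8")
        return start_line + header_block + b"\r\n" + body

    if width is not None:
        # Fixed-width token: the total is known after a single pass.
        total = len(render("0" * width))
        token = str(total).zfill(width)
        if len(token) != width:
            raise ValueError("message too long for the chosen width")
        return render(token)

    # Variable-width token: repeat until the digit count stops changing.
    length = len(render("0"))
    while True:
        message = render(str(length))
        if len(message) == length:
            return message
        length = len(message)
```

   A receiver reads the second token as a base-10 integer, so a padded
   value such as "000089" and the unpadded "89" denote the same length.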

1196	   The following sample MRCP exchange demonstrates proper message-length
1197	   values.  The values for message-length have been removed from all
1198	   other examples in the specification and replaced by '...' to reduce
1199	   confusion in the case of minor message-length computation errors in
1200	   those examples.

1202	   C->S:   MRCP/2.0 877 INTERPRET 543266
1203	           Channel-Identifier:32AECB23433801@speechrecog
1204	           Interpret-Text:may I speak to Andre Roy
1205	           Content-Type:application/srgs+xml
1206	           Content-ID:<request1@form-level.store>
1207	           Content-Length:661

1209	           <?xml version="1.0"?>
1210	           <!-- the default grammar language is US English -->
1211	           <grammar xmlns="http://www.w3.org/2001/06/grammar"
1212	                    xml:lang="en-US" version="1.0" root="request">
1213	           <!-- single language attachment to tokens -->
1214	               <rule id="yes">
1215	                   <one-of>
1216	                       <item xml:lang="fr-CA">oui</item>
1217	                       <item xml:lang="en-US">yes</item>
1218	                   </one-of>
1219	               </rule>

1221	           <!-- single language attachment to a rule expansion -->
1222	               <rule id="request">
1223	                   may I speak to
1224	                   <one-of xml:lang="fr-CA">
1225	                       <item>Michel Tremblay</item>
1226	                       <item>Andre Roy</item>
1227	                   </one-of>
1228	               </rule>
1229	           </grammar>

1231	   S->C:   MRCP/2.0 85 543266 200 IN-PROGRESS
1232	           Channel-Identifier:32AECB23433801@speechrecog

1234	   S->C:   MRCP/2.0 630 INTERPRETATION-COMPLETE 543266 COMPLETE
1235	           Channel-Identifier:32AECB23433801@speechrecog
1236	           Completion-Cause:000 success
1237	           Content-Type:application/nlsml+xml
1238	           Content-Length:441

1240	           <?xml version="1.0"?>
1241	           <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
1242	                   xmlns:ex="http://www.example.com/example"
1243	                   grammar="session:request1@form-level.store">
1244	               <interpretation>
1245	                   <instance name="Person">
1246	                       <ex:Person>
1247	                           <ex:Name> Andre Roy </ex:Name>
1248	                       </ex:Person>
1249	                   </instance>
1250	                   <input>   may I speak to Andre Roy </input>
1251	               </interpretation>
1252	           </result>

1254	   All MRCPv2 requests, responses and events MUST carry the Channel-
1255	   Identifier header field so the server or client can differentiate
1256	   messages from different control channels that may share the same
1257	   transport connection.

1259	   In the resource-specific header field descriptions in sections 8-11,
1260	   a header field is disallowed on a method (request, response, or
1261	   event) for that resource unless specifically listed as being allowed.
1262	   Also, the phrasing "This header field MAY occur on method X"
1263	   indicates that the header field is allowed on that method but is not
1264	   required to be used in every instance of that method.

1266	5.2.  Request

1268	   An MRCPv2 request consists of a Request line followed by the message
1269	   header section and an optional message body containing data specific
1270	   to the request message.

1272	   The Request message from a client to the server includes within the
1273	   first line the method to be applied, a method tag for that request
1274	   and the version of the protocol in use.

1276	   request-line   =    mrcp-version SP message-length SP method-name
1277	                       SP request-id CRLF
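
   A receiver could check a request-line against this production with a
   regular expression, as in the Python sketch below.  The pattern for
   method-name (upper-case letters and hyphens) is an assumption that
   matches the method names appearing in this document; a real
   implementation would match against the enumerated methods instead.
   The function name parse_request_line is likewise illustrative.

```python
import re

REQUEST_LINE = re.compile(
    r"MRCP/([0-9]{1,2})\.([0-9]{1,2}) "  # mrcp-version
    r"([0-9]{1,19}) "                     # message-length
    r"([A-Z][A-Z-]*) "                    # method-name (assumed shape)
    r"([0-9]{1,10})\r\n"                  # request-id
)


def parse_request_line(line):
    """Return the request-line fields as a dict, or None if malformed."""
    match = REQUEST_LINE.fullmatch(line)
    if match is None:
        return None
    major, minor, length, method, request_id = match.groups()
    request_id = int(request_id)
    if request_id > 2**32 - 1:
        # Ten digits can exceed what an unsigned 32-bit integer holds.
        return None
    return {
        "version": (int(major), int(minor)),
        "length": int(length),  # base 10, leading zeros allowed
        "method": method,
        "request_id": request_id,
    }
```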

1279	   The mrcp-version field is the MRCP protocol version that is being
1280	   used by the client.

1282	   The message-length field specifies the length of the message,
1283	   including the start-line.

1285	   Details about the mrcp-version and message-length fields are given in
1286	   Section 5.1.

1288	   The method-name field identifies the specific request that the client
1289	   is making to the server.  Each resource supports a subset of the
1290	   MRCPv2 methods.  The subset for each resource is defined in the
1291	   section of the specification for the corresponding resource.

1293	   method-name    =    generic-method
1294	                  /    synthesizer-method
1295	                  /    recognizer-method
1296	                  /    recorder-method
1297	                  /    verifier-method

1299	   The request-id field is a unique identifier representable as an
1300	   unsigned 32-bit integer created by the client and sent to the server.
1301	   Clients MUST utilize monotonically increasing request-ids for
1302	   consecutive requests within an MRCP session.  The request-id space is
1303	   linear (i.e., not mod(32)), so the space does not wrap and validity
1304	   can be checked with a simple unsigned comparison operation.  The
1305	   client may choose any initial value for its first request, but a
1306	   small integer is RECOMMENDED to avoid exhausting the space in long
1307	   sessions.  If the server receives duplicate or out-of-order requests
1308	   the server MUST reject the request with a response code of 410.
1309	   Since request-ids are scoped to the MRCP session, they are unique
1310	   across all TCP connections and all resource channels in the session.

1312	   The server resource MUST use the client-assigned identifier in its
1313	   response to the request.  If the request does not complete
1314	   synchronously, future asynchronous events associated with this
1315	   request MUST carry the client-assigned request-id.

1317	   request-id     =    1*10DIGIT
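
   The request-id rules above reduce to a single unsigned comparison on
   the server side.  The Python sketch below keeps one tracker per MRCP
   session; the class name RequestIdTracker and the choice to raise an
   exception for values outside the 32-bit range are assumptions made
   for this example.

```python
class RequestIdTracker:
    """Per-session check that request-ids strictly increase."""

    MAX_REQUEST_ID = 2**32 - 1

    def __init__(self):
        self._last = None  # no request seen yet in this session

    def status_for(self, request_id):
        """Return 410 for a duplicate or out-of-order id, else None."""
        if not 0 <= request_id <= self.MAX_REQUEST_ID:
            raise ValueError("request-id is not an unsigned 32-bit integer")
        if self._last is not None and request_id <= self._last:
            return 410
        self._last = request_id
        return None
```

   Because the space does not wrap, no modular arithmetic is needed: an
   id is acceptable only if it is greater than the last accepted one.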

1319	5.3.  Response

1321	   After receiving and interpreting the request message for a method,
1322	   the server resource responds with an MRCPv2 response message.  The
1323	   response consists of a response line followed by the message header
1324	   section and an optional message body containing data specific to the
1325	   method.

1327	   response-line  =    mrcp-version SP message-length SP request-id
1328	                                    SP status-code SP request-state CRLF

1330	   The mrcp-version field MUST contain the version of the request if
1331	   supported; otherwise, it MUST contain the highest version of the
1332	   MRCPv2 protocol supported by the server.

1334	   The message-length field specifies the length of the message,
1335	   including the start-line.

1337	   Details about the mrcp-version and message-length fields are given in
1338	   Section 5.1.

1340	   The request-id used in the response MUST match the one sent in the
1341	   corresponding request message.

1343	   The status-code field is a 3-digit code representing the success or
1344	   failure or other status of the request.

1346	   status-code     =    3DIGIT

1348	   The request-state field indicates if the action initiated by the
1349	   Request is PENDING, IN-PROGRESS or COMPLETE.  The COMPLETE status
1350	   means that the Request was processed to completion and that there
1351	   will be no more events or other messages from that resource to the
1352	   client with that request-id.  The PENDING status means that the
1353	   request has been placed on a queue and will be processed in first-in-
1354	   first-out order.  The IN-PROGRESS status means that the request is
1355	   being processed and is not yet complete.  A PENDING or IN-PROGRESS
1356	   status indicates that further Event messages may be delivered with
1357	   that request-id.

1359	   request-state    =  "COMPLETE"
1360	                    /  "IN-PROGRESS"
1361	                    /  "PENDING"

1363	5.4.  Status Codes

1365	   The status codes are classified under the Success (2XX) codes, Client
1366	   Failure (4XX) codes, and Server Failure (5XX) codes.

1368	                               Success Codes

1370	     +------------+--------------------------------------------------+
1371	     | Code       | Meaning                                          |
1372	     +------------+--------------------------------------------------+
1373	     | 200        | Success                                          |
1374	     | 201        | Success with some optional header fields ignored |
1375	     +------------+--------------------------------------------------+

1377	                                Success 2xx

1379	                         Client Failure 4xx Codes

1381	   +------------+------------------------------------------------------+
1382	   | Code       | Meaning                                              |
1383	   +------------+------------------------------------------------------+
1384	   | 401        | Method not allowed                                   |
1385	   | 402        | Method not valid in this state                       |
1386	   | 403        | Unsupported header field                             |
1387	   | 404        | Illegal value for header field.  This is the error   |
1388	   |            | for a syntax violation.                              |
1389	   | 405        | Resource not allocated for this session or does not  |
1390	   |            | exist                                                |
1391	   | 406        | Mandatory Header Field Missing                       |
1392	   | 407        | Method or Operation Failed (e.g., Grammar            |
1393	   |            | compilation failed in the recognizer.  Detailed      |
1394	   |            | cause codes might be available through a resource    |
1395	   |            | specific header.)                                    |
1396	   | 408        | Unrecognized or unsupported message entity           |
1397	   | 409        | Unsupported Header Field Value.  This is a value     |
1398	   |            | that is syntactically legal but exceeds the          |
1399	   |            | implementation's capabilities or expectations.       |
1400	   | 410        | Non-Monotonic or Out of order sequence number in     |
1401	   |            | request.                                             |
1402	   | 411-420    | Reserved for future assignment                       |
1403	   +------------+------------------------------------------------------+

1405	                            Client Failure 4xx

1407	                         Server Failure 5xx Codes

1409	              +------------+--------------------------------+
1410	              | Code       | Meaning                        |
1411	              +------------+--------------------------------+
1412	              | 501        | Server Internal Error          |
1413	              | 502        | Protocol Version not supported |
1414	              | 503        | Reserved for future assignment |
1415	              | 504        | Message too large              |
1416	              +------------+--------------------------------+

1418	                            Server Failure 5xx

1420	5.5.  Events

1422	   The server resource may need to communicate a change in state or the
1423	   occurrence of a certain event to the client.  These messages are used
1424	   when a request does not complete immediately and the response returns
1425	   a status of PENDING or IN-PROGRESS.  The intermediate results and
1426	   events of the request are indicated to the client through the event
1427	   message from the server.  The event message consists of an event
1428	   header line followed by the message header section and an optional
1429	   message body containing data specific to the event message.  The
1430	   header line has the request-id of the corresponding request and
1431	   status value.  The request-state value is COMPLETE if the request is
1432	   done and this was the last event, else it is IN-PROGRESS.

1434	   event-line       =  mrcp-version SP message-length SP event-name
1435	                                    SP request-id SP request-state CRLF
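
   Taken together, the three start-line productions can be told apart
   by token count and by whether the third token is numeric: a
   request-line has four tokens, while response-lines and event-lines
   have five, with a request-id (digits) or an event-name in third
   position respectively.  The Python sketch below illustrates this; the
   function name classify_start_line and the returned field names are
   assumptions made for this example, and it performs no further
   validation of the individual tokens.

```python
def classify_start_line(line):
    """Return (kind, fields) for an MRCPv2 start-line."""
    tokens = line.rstrip("\r\n").split(" ")
    if len(tokens) == 4:
        _version, _length, method, request_id = tokens
        return "request", {"method": method, "request_id": int(request_id)}
    if len(tokens) == 5:
        _version, _length, third, fourth, state = tokens
        if third.isascii() and third.isdigit():
            # response-line: request-id, status-code, request-state
            return "response", {"request_id": int(third),
                                "status": int(fourth),
                                "state": state}
        # event-line: event-name, request-id, request-state
        return "event", {"event": third,
                         "request_id": int(fourth),
                         "state": state}
    raise ValueError("not a valid MRCPv2 start-line")
```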

1437	   The mrcp-version used here is identical to the one used in the
1438	   Request/Response Line and indicates the version of the MRCPv2
1439	   protocol running on the server.

1441	   The message-length field specifies the length of the message,
1442	   including the start-line.

1444	   Details about the mrcp-version and message-length fields are given in
1445	   Section 5.1.

1447	   The event-name identifies the nature of the event generated by the
1448	   media resource.  The set of valid event names depends on the resource
1449	   generating it.  See the corresponding resource-specific section of
1450	   the document.

1452	   event-name       =  synthesizer-event
1453	                    /  recognizer-event
1454	                    /  recorder-event
1455	                    /  verifier-event

1457	   The request-id used in the event MUST match the one sent in the
1458	   request that caused this event.

1460	   The request-state indicates whether the Request/Command causing this
1461	   event is complete or still in progress, and is the same as the one
1462	   mentioned in Section 5.3.  The final event for a request has a
1463	   COMPLETE status indicating the completion of the request.

1465	6.  MRCPv2 Generic Methods, Headers, and Result Structure

1467	   MRCPv2 supports a set of methods and header fields that are common to
1468	   all resources.  These are discussed here; resource-specific methods
1469	   and header fields are discussed in the corresponding resource-
1470	   specific section of the document.

1472	6.1.  Generic Methods

1474	   MRCPv2 supports two generic methods for reading and writing the state
1475	   associated with a resource.

1477	   generic-method      =    "SET-PARAMS"
1478	                       /    "GET-PARAMS"

1480	   These are described in the following sub-sections.

1482	6.1.1.  SET-PARAMS

1484	   The SET-PARAMS method, from the client to the server, tells the
1485	   MRCPv2 resource to define parameters for the session, such as voice
1486	   characteristics and prosody on synthesizers, recognition timers on
1487	   recognizers, etc.  If the server accepts and sets all parameters it
1488	   MUST return a response status-code of 200.  If it chooses to ignore
1489	   some optional header fields that can be safely ignored without
1490	   affecting operation of the server it MUST return 201.

1492	   If one or more of the header fields being sent is incorrect, error
1493	   403, 404, or 409 MUST be returned as follows:
1494	   o  If one or more of the header fields being set has an illegal
1495	      value, the server MUST reject the request with a 404 Illegal Value
1496	      for Header Field.
1497	   o  If one or more of the header fields being set is unsupported for
1498	      the resource, the server MUST reject the request with a 403
1499	      Unsupported Header Field, except as described in the next
1500	      paragraph.
1501	   o  If one or more of the header fields being set has an unsupported
1502	      value, the server MUST reject the request with a 409 Unsupported
1503	      Header Field Value, except as described in the next paragraph.

1505	   If both error 404 and another error have occurred, only error 404
1506	   MUST be returned.  If both errors 403 and 409 have occurred, but not
1507	   error 404, only error 403 MUST be returned.
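
   The precedence among these status codes can be summarized in a few
   lines of Python.  This is a sketch only: the function name
   set_params_status and its boolean inputs are assumptions made for
   this example, and it leaves out the selection of header fields to
   echo back in the response.

```python
def set_params_status(illegal_value=False, unsupported_field=False,
                      unsupported_value=False, ignored_optional=False):
    """Pick the SET-PARAMS response status-code from the findings."""
    if illegal_value:
        return 404  # takes precedence over every other error
    if unsupported_field:
        return 403  # takes precedence over 409
    if unsupported_value:
        return 409
    if ignored_optional:
        return 201  # success, with some optional header fields ignored
    return 200
```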

1509	   If error 403, 404, or 409 is returned, the response MUST include the
1510	   bad or unsupported header fields and their values exactly as they
1511	   were sent from the client.  Session parameters modified using SET-
1512	   PARAMS do not override parameters explicitly specified on individual
1513	   requests or requests that are IN-PROGRESS.

1515	   C->S:  MRCP/2.0 ... SET-PARAMS 543256
1516	          Channel-Identifier:32AECB23433802@speechsynth
1517	          Voice-gender:female
1518	          Voice-variant:3

1520	   S->C:  MRCP/2.0 ... 543256 200 COMPLETE
1521	          Channel-Identifier:32AECB23433802@speechsynth

1523	6.1.2.  GET-PARAMS

1525	   The GET-PARAMS method, from the client to the server, asks the MRCPv2
1526	   resource for its current session parameters, such as voice
1527	   characteristics and prosody on synthesizers, recognition timers on
1528	   recognizers, etc.  For every header field the client sends in the
1529	   request without a value, the server MUST include the header field and
1530	   its corresponding value in the response.  If no parameter header
1531	   fields are specified by the client then the server MUST return all
1532	   the settable parameters and their values in the corresponding header
1533	   section of the response, including vendor-specific parameters.  Such
1534	   wildcard parameter requests can be very processing-intensive, since
1535	   the number of settable parameters can be large depending on the
1536	   implementation.  Hence, it is RECOMMENDED that the client not use the
1537	   wildcard GET-PARAMS operation very often.  Note that GET-PARAMS
1538	   returns header field values that apply to the whole session and not
1539	   values that have a request level scope.  For example, Input-Waveform-
1540	   URI is a request-level header field and thus would not be returned by
1541	   GET-PARAMS.

1543	   If all of the header fields requested are supported, the server MUST
1544	   return a response status-code of 200.  If some of the header fields
1545	   being retrieved are unsupported for the resource, the server MUST
1546	   reject the request with a 403 Unsupported Header Field.  Such a
1547	   response MUST include the unsupported header fields exactly as they
1548	   were sent from the client, without values.

1550	   C->S:   MRCP/2.0 ... GET-PARAMS 543256
1551	           Channel-Identifier:32AECB23433802@speechsynth
1552	           Voice-gender:
1553	           Voice-variant:
1554	           Vendor-Specific-Parameters:com.example.param1;
1555	                         com.example.param2

1557	   S->C:   MRCP/2.0 ... 543256 200 COMPLETE
1558	           Channel-Identifier:32AECB23433802@speechsynth
1559	           Voice-gender:female
1560	           Voice-variant:3
1561	           Vendor-Specific-Parameters:com.example.param1="Company Name";
1562	                         com.example.param2="124324234@example.com"
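
   The server-side handling of GET-PARAMS described in this section
   might look as follows in Python.  The sketch assumes that
   session_params holds every session-level parameter the resource
   supports, keyed by header field name; the function name
   handle_get_params and the (status, header fields) return shape are
   likewise assumptions made for this example.

```python
def handle_get_params(requested, session_params):
    """Return (status-code, header fields) for a GET-PARAMS request.

    requested:      header field names the client sent without values.
    session_params: dict of supported session parameters and values.
    """
    if not requested:
        # Wildcard request: return every settable parameter.
        return 200, dict(session_params)

    # Header field names are case-insensitive.
    by_name = {name.lower(): value for name, value in session_params.items()}
    unsupported = [name for name in requested if name.lower() not in by_name]
    if unsupported:
        # Echo the unsupported fields as sent, without values.
        return 403, {name: "" for name in unsupported}
    return 200, {name: by_name[name.lower()] for name in requested}
```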

1564	6.2.  Generic Message Headers

1566	   All MRCPv2 header fields, which include both the generic-headers
1567	   defined in the following sub-sections and the resource-specific
1568	   header fields defined later, follow the same generic format as that
1569	   given in Section 3.1 of RFC5322 [RFC5322].  Each header field
1570	   consists of a name followed by a colon (":") and the value.  Header
1571	   field names are case-insensitive.  The value MAY be preceded by any
1572	   amount of LWS (linear white space), though a single SP (space) is
1573	   preferred.  Header fields may extend over multiple lines by preceding
1574	   each extra line with at least one SP or HT (horizontal tab).

1576	   generic-field  = field-name ":" [ field-value ]
1577	   field-name     = token
1578	   field-value    = *LWS field-content *( CRLF 1*LWS field-content)
1579	   field-content  = <the OCTETs making up the field-value
1580	                    and consisting of either *TEXT or combinations
1581	                    of token, separators, and quoted-string>

1583	   The field-content does not include any leading or trailing LWS (i.e.,
1584	   linear white space occurring before the first non-whitespace
1585	   character of the field-value or after the last non-whitespace
1586	   character of the field-value).  Such leading or trailing LWS MAY be
1587	   removed without changing the semantics of the field value.  Any LWS
1588	   that occurs between field-content MAY be replaced with a single SP
1589	   before interpreting the field value or forwarding the message
1590	   downstream.
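
   The folding and white-space rules above can be illustrated with a
   short Python sketch.  It unfolds continuation lines into a single SP
   and strips leading and trailing LWS from each value.  The function
   name parse_header_block is an assumption made for this example, and
   the sketch does not validate field-name against the token syntax.

```python
import re


def parse_header_block(block):
    """Return a list of (name, value) pairs in the order received."""
    # A CRLF followed by SP or HT continues the previous header field.
    unfolded = re.sub(r"\r\n[ \t]+", " ", block)
    fields = []
    for line in unfolded.split("\r\n"):
        if not line:
            continue  # the empty line that ends the header section
        name, colon, value = line.partition(":")
        if not colon:
            raise ValueError(f"header field without a colon: {line!r}")
        fields.append((name, value.strip(" \t")))
    return fields
```

   Names are returned as received; since header field names are
   case-insensitive, callers would normally compare them after
   lower-casing.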

1592	   MRCPv2 servers and clients MUST NOT depend on header field order.  It
1593	   is RECOMMENDED to send general-header fields first, followed by
1594	   request-header or response-header fields, and ending with the entity-
1595	   header fields.  However, MRCPv2 servers and clients MUST be prepared
1596	   to process the header fields in any order.  The only exception to
1597	   this rule is when there are multiple header fields with the same name
1598	   in a message.

1600	   Multiple header fields with the same name MAY be present in a message
1601	   if and only if the entire value for that header field is defined as a
1602	   comma-separated list [i.e., #(values)].

1604	   Since vendor-specific parameters may be order-dependent, it MUST be
1605	   possible to combine multiple header fields of the same name into one
1606	   "name:value" pair without changing the semantics of the message, by
1607	   appending each subsequent value to the first, each separated by a
1608	   comma.  The order in which header fields with the same name are
1609	   received is therefore significant to the interpretation of the
1610	   combined header field value, and thus an intermediary MUST NOT change
1611	   the order of these values when a message is forwarded.
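
   Combining repeated header fields while preserving their order might
   be done as in the following Python sketch.  The function name
   combine_fields is an assumption made for this example, and the
   sketch does not check that a repeated header field is actually
   defined as a comma-separated list.

```python
def combine_fields(fields):
    """Merge same-named header fields into one comma-separated value.

    fields: list of (name, value) pairs in the order received.
    Returns a dict keyed by lower-cased name; values keep arrival order.
    """
    combined = {}
    for name, value in fields:
        key = name.lower()  # header field names are case-insensitive
        if key in combined:
            combined[key] = combined[key] + "," + value
        else:
            combined[key] = value
    return combined
```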

1613	   generic-header      =    channel-identifier
1614	                       /    accept
1615	                       /    active-request-id-list
1616	                       /    proxy-sync-id
1617	                       /    accept-charset
1618	                       /    content-type
1619	                       /    content-id
1620	                       /    content-base
1621	                       /    content-encoding
1622	                       /    content-location
1623	                       /    content-length
1624	                       /    fetch-timeout
1625	                       /    cache-control
1626	                       /    logging-tag
1627	                       /    set-cookie
1628	                       /    vendor-specific

1630	6.2.1.  Channel-Identifier

1632	   All MRCPv2 requests, responses and events MUST contain the Channel-
1633	   Identifier header field.  The value is allocated by the server when a
1634	   control channel is added to the session and communicated to the
1635	   client by the "a=channel" attribute in the SDP answer from the
1636	   server.  The header field value consists of 2 parts separated by the
1637	   '@' symbol.  The first part is an unambiguous string identifying the
1638	   MRCPv2 session.  The second part is a string token which specifies
1639	   one of the media processing resource types listed in Section 3.1.
1640	   The unambiguous string (first part) MUST be difficult to guess,
1641	   unique among the resource instances managed by the server, and common
1642	   to all resource channels with that server established through a
1643	   single SIP dialog.

1645	   channel-identifier  = "Channel-Identifier" ":" channel-id CRLF
1646	   channel-id          = 1*alphanum "@" 1*alphanum
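
As an illustration of the channel-id grammar, the following sketch splits a header field value into its two parts.  The function name parse_channel_id is hypothetical, and the sketch assumes that "alphanum" means ASCII letters and digits only.

```python
import re

# Assumption: alphanum = ALPHA / DIGIT (ASCII only).
_CHANNEL_ID_RE = re.compile(r"([A-Za-z0-9]+)@([A-Za-z0-9]+)")


def parse_channel_id(value):
    """Return (session_part, resource_type) for a Channel-Identifier value."""
    match = _CHANNEL_ID_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError("malformed channel-id: %r" % value)
    return match.group(1), match.group(2)
```

A receiver would typically compare the second part against the resource types it supports before dispatching the message; that check is left out of the sketch.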

1648	6.2.2.  Accept

1650	   The Accept header field follows the syntax defined in [H14.1].  The
1651	   semantics are also identical, with the exception that if no Accept
1652	   header field is present, the server MUST assume a default value that
1653	   is specific to the resource type that is being controlled.  This
1654	   default value can be changed for a resource on a session by sending
1655	   this header field in a SET-PARAMS method.  The current default value
1656	   of this header field for a resource in a session can be found through
1657	   a GET-PARAMS method.  This header field MAY occur on any request.

1659	6.2.3.  Active-Request-Id-List

1661	   In a request, this header field indicates the list of request-ids to
1662	   which the request applies.  This is useful when there are multiple
1663	   requests that are PENDING or IN-PROGRESS and the client wants this
1664	   request to apply to one or more of these specifically.

1666	   In a response, this header field returns the list of request-ids that
1667	   the method modified or affected.  There could be one or more requests
1668	   in a request-state of PENDING or IN-PROGRESS.  When a method
1669	   affecting one or more PENDING or IN-PROGRESS requests is sent from
1670	   the client to the server, the response MUST contain the list of
1671	   request-ids that were affected or modified by this command in its
1672	   header section.

1674	   The Active-Request-Id-List is only used in requests and responses,
1675	   not in events.

1677	   For example, if a STOP request with no Active-Request-Id-List is sent
1678	   to a synthesizer resource which has one or more SPEAK requests in the
1679	   PENDING or IN-PROGRESS state, all SPEAK requests MUST be cancelled,
1680	   including the one IN-PROGRESS.  The response to the STOP request
1681	   contains in the Active-Request-Id-List value the request-ids of all
1682	   the SPEAK requests that were terminated.  After sending the STOP
1683	   response, the server MUST NOT send any SPEAK-COMPLETE or RECOGNITION-
1684	   COMPLETE events for the terminated requests.

1686	   active-request-id-list  =  "Active-Request-Id-List" ":"
1687	                              request-id *("," request-id) CRLF
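
The STOP behavior described above can be sketched as a small queue operation.  The names apply_stop, queue, and active_request_ids are hypothetical, request-ids are modeled as integers, and returning an empty string when nothing was affected is an assumption of the sketch rather than a statement about the protocol.

```python
def apply_stop(queue, active_request_ids=None):
    """Cancel queued SPEAK requests and build the response header value.

    `queue` lists the request-ids that are PENDING or IN-PROGRESS, head
    first.  With no Active-Request-Id-List in the STOP request, every
    queued request is cancelled; otherwise only the listed ones are.
    Returns (remaining_queue, active_request_id_list_value).
    """
    if active_request_ids is None:
        targets = list(queue)
    else:
        wanted = set(active_request_ids)
        targets = [rid for rid in queue if rid in wanted]
    remaining = [rid for rid in queue if rid not in targets]
    # The response lists every request-id that was actually terminated.
    header_value = ",".join(str(rid) for rid in targets)
    return remaining, header_value
```

After the response built from this value is sent, no completion events would be generated for the request-ids it lists.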

1689	6.2.4.  Proxy-Sync-Id

1691	   When any server resource generates a barge-in-able event, it also
1692	   generates a unique tag.  The tag is sent as this header field's value
1693	   in an event to the client.  The client then acts as an intermediary
1694	   among the server resources and sends a BARGE-IN-OCCURRED method to
1695	   the synthesizer server resource with the Proxy-Sync-Id it received
1696	   from the server resource.  When the recognizer and synthesizer
1697	   resources are part of the same session, they may choose to work
1698	   together to achieve quicker interaction and response.  Here the
1699	   Proxy-Sync-Id helps the resource receiving the event, intermediated
1700	   by the client, to decide if this event has been processed through a
1701	   direct interaction of the resources.  This header field MAY occur
1702	   only on events and the BARGE-IN-OCCURRED method.  The name of this
1703	   header field contains the word 'proxy' only for historical reasons
1704	   and does not imply that a proxy server is involved.

1706	   proxy-sync-id    =  "Proxy-Sync-Id" ":" 1*VCHAR CRLF

1708	6.2.5.  Accept-Charset

1710	   See [H14.2].  This specifies the acceptable character sets for
1711	   entities returned in the response or events associated with this
1712	   request.  This is useful in specifying the character set to use in
1713	   the Natural Language Semantics Markup Language (NLSML) results of a
1714	   RECOGNITION-COMPLETE event.  This header field is only used on
1715	   requests.

1717	6.2.6.  Content-Type

1719	   See [H14.17].  MRCPv2 supports a restricted set of registered Media
1720	   Types for content, including speech markup, grammar, and recognition
1721	   results.  The content types applicable to each MRCPv2 resource-type
1722	   are specified in the corresponding section of the document and are
1723	   registered in the MIME Media Types registry maintained by IANA.  The
1724	   multipart content type "multipart/mixed" is supported to
1725	   communicate multiple of the above-mentioned contents, in which case
1726	   the body parts MUST NOT contain any MRCPv2 specific header fields.
1727	   This header field MAY occur on all messages.

1729	   content-type     =    "Content-Type" ":" media-type-value CRLF

1731	   media-type-value =    type "/" subtype *( ";" parameter )

1733	   type             =    token

1735	   subtype          =    token

1737	   parameter        =    attribute "=" value

1739	   attribute        =    token

1741	   value            =    token / quoted-string

1743	6.2.7.  Content-ID

1745	   This header field contains an ID or name for the content by which it
1746	   can be referenced.  This header field operates according to the
1747	   specification in RFC 2392 [RFC2392] and is required for content
1748	   disambiguation in multi-part messages.  In MRCPv2 whenever the
1749	   associated content is stored, by either the client or the server, it
1750	   MUST be retrievable using this ID.  Such content can be referenced
1751	   later in a session by addressing it with the "session" URI scheme
1752	   described in Section 13.6.  This header field MAY occur on all
1753	   messages.

1755	6.2.8.  Content-Base

1757	   The Content-Base entity-header MAY be used to specify the base URI
1758	   for resolving relative URIs within the entity.

1760	   content-base      = "Content-Base" ":" absoluteURI CRLF

1762	   Note, however, that the base URI of the contents within the entity-
1763	   body may be redefined within that entity-body.  An example of this
1764	   would be multi-part media, which in turn can have multiple entities
1765	   within it.  This header field MAY occur on all messages.

1767	6.2.9.  Content-Encoding

1769	   The Content-Encoding entity-header is used as a modifier to the
1770	   Content-Type.  When present, its value indicates what additional
1771	   content encoding has been applied to the entity-body, and thus what
1772	   decoding mechanisms must be applied in order to obtain the Media Type
1773	   referenced by the Content-Type header field.  Content-Encoding is
1774	   primarily used to allow a document to be compressed without losing
1775	   the identity of its underlying media type.  Note that the SIP session
1776	   can be used to determine accepted encodings (see Section 7).  This
1777	   header field MAY occur on all messages.

1779	   content-encoding  = "Content-Encoding" ":"
1780	                       *WSP content-coding
1781	                       *(*WSP "," *WSP content-coding *WSP )
1782	                       CRLF

1784	   Content-Encoding is defined in [H3.5].  An example of its use is
1785	   Content-Encoding:gzip

1787	   If multiple encodings have been applied to an entity, the content
1788	   encodings MUST be listed in the order in which they were applied.

1790	6.2.10.  Content-Location

1792	   The Content-Location entity-header MAY be used to supply the resource
1793	   location for the entity enclosed in the message when that entity is
1794	   accessible from a location separate from the requested resource's
1795	   URI.  Refer to [H14.14].

1797	   content-location  =  "Content-Location" ":"
1798	                        ( absoluteURI / relativeURI ) CRLF

1800	   The Content-Location value is a statement of the location of the
1801	   resource corresponding to this particular entity at the time of the
1802	   request.  This header field is provided for optimization purposes
1803	   only.  The receiver of this header field MAY assume that the entity
1804	   being sent is identical to what would have been retrieved or might
1805	   already have been retrieved from the Content-Location URI.

1807	   For example, if the client provided a grammar markup inline, and it
1808	   had previously retrieved it from a certain URI, that URI can be
1809	   provided as part of the entity, using the Content-Location header
1810	   field.  This allows a resource like the recognizer to look into its
1811	   cache to see if this grammar was previously retrieved, compiled and
1812	   cached.  In this case, it might optimize by using the previously
1813	   compiled grammar object.

1815	   If the Content-Location is a relative URI, the relative URI is
1816	   interpreted relative to the Content-Base URI.  This header field MAY
1817	   occur on all messages.

1819	6.2.11.  Content-Length

1821	   This header field contains the length of the content of the message
1822	   body (i.e., after the double CRLF following the last header field).
1823	   Unlike in HTTP, it MUST be included in all messages that carry
1824	   content beyond the header section.  If it is missing, a default value
1825	   of zero is assumed.  Otherwise, it is interpreted according to
1826	   [H14.13].  When a message having no use for a message body contains
1827	   one, i.e., the Content-Length is non-zero, the receiver MUST ignore
1828	   the content of the message body.  This header field MAY occur on all
1829	   messages.

1831	   content-length  =  "Content-Length" ":" 1*19DIGIT CRLF
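
A receiver might determine the body length as sketched below.  The function name declared_body_length and the dictionary representation of the header section are hypothetical; the sketch only illustrates the default-to-zero rule and the 1*19DIGIT syntax.

```python
import re

_CONTENT_LENGTH_RE = re.compile(r"[0-9]{1,19}")  # 1*19DIGIT


def declared_body_length(header_fields):
    """Return the body length declared by a parsed header section.

    `header_fields` maps header field names to their raw string values.
    A missing Content-Length is treated as zero.
    """
    raw = header_fields.get("Content-Length")
    if raw is None:
        return 0
    raw = raw.strip()
    if _CONTENT_LENGTH_RE.fullmatch(raw) is None:
        raise ValueError("malformed Content-Length: %r" % raw)
    return int(raw)
```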

1833	6.2.12.  Fetch-Timeout

1835	   When the recognizer or synthesizer needs to fetch documents or other
1836	   resources, this header field controls the corresponding URI access
1837	   properties.  This defines the timeout for content that the server may
1838	   need to fetch over the network.  The value is interpreted to be in
1839	   milliseconds and ranges from 0 to an implementation-specific maximum
1840	   value.  It is RECOMMENDED that servers be cautious about accepting
1841	   long timeout values.  The default value for this header field is
1842	   implementation-specific.  This header field MAY occur in DEFINE-
1843	   GRAMMAR, RECOGNIZE, SPEAK, SET-PARAMS or GET-PARAMS.

1845	   fetch-timeout       =   "Fetch-Timeout" ":" 1*19DIGIT CRLF

1847	6.2.13.  Cache-Control

1849	   If the server implements content caching, it MUST adhere to the cache
1850	   correctness rules of HTTP 1.1 [RFC2616] when accessing and caching
1851	   stored content.  In particular, the "expires" and "cache-control"
1852	   header fields of the cached URI or document MUST be honored and take
1853	   precedence over the Cache-Control defaults set by this header field.
1854	   The Cache-Control directives are used to define the default caching
1855	   algorithms on the server for the session or request.  The scope of
1856	   the directive is based on the method it is sent on.  If the directive
1857	   is sent on a SET-PARAMS method, it applies for all requests for
1858	   external documents the server makes during that session, unless
1859	   overridden by a Cache-Control header field on an individual request.
1860	   If the directives are sent on any other requests they apply only to
1861	   external document requests the server makes for that request.  An
1862	   empty Cache-Control header field on the GET-PARAMS method is a
1863	   request for the server to return the current Cache-Control directives
1864	   setting on the server.  This header field MAY occur only on requests.

1866	   cache-control    =    "Cache-Control" ":"
1867	                         [*WSP cache-directive
1868	                         *( *WSP "," *WSP cache-directive *WSP )]
1869	                         CRLF

1871	   cache-directive     = "max-age" "=" delta-seconds
1872	                       / "max-stale" [ "=" delta-seconds ]
1873	                       / "min-fresh" "=" delta-seconds

1875	   delta-seconds       = 1*19DIGIT

1877	   Here delta-seconds is a decimal time value specifying the number of
1878	   seconds since the instant the message response or data was received
1879	   by the server.

1881	   The different cache-directive options allow the client to ask the
1882	   server to override the default cache expiration mechanisms:
1883	   max-age        Indicates that the client can tolerate the server
1884	                  using content whose age is no greater than the
1885	                  specified time in seconds.  Unless a "max-stale"
1886	                  directive is also included, the client is not willing
1887	                  to accept a response based on stale data.
1888	   min-fresh      Indicates that the client is willing to accept a
1889	                  server response with cached data whose expiration is
1890	                  no less than its current age plus the specified time
1891	                  in seconds.  If the server's cache time to live
1892	                  exceeds the client-supplied min-fresh value, the
1893	                  server MUST NOT utilize cached content.

1895	   max-stale      Indicates that the client is willing to allow a server
1896	                  to utilize cached data that has exceeded its
1897	                  expiration time.  If "max-stale" is assigned a value,
1898	                  then the client is willing to allow the server to use
1899	                  cached data that has exceeded its expiration time by
1900	                  no more than the specified number of seconds.  If no
1901	                  value is assigned to "max-stale", then the client is
1902	                  willing to allow the server to use stale data of any
1903	                  age.

1905	   If the server cache is requested to use stale response/data without
1906	   validation, it MAY do so only if this does not conflict with any
1907	   "MUST"-level requirements concerning cache validation (e.g., a "must-
1908	   revalidate" Cache-Control directive in the HTTP 1.1 specification
1909	   pertaining to the corresponding URI).

1911	   If both the MRCPv2 Cache-Control directive and the cached entry on
1912	   the server include "max-age" directives, then the lesser of the two
1913	   values is used for determining the freshness of the cached entry for
1914	   that request.
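
The directive syntax and the "lesser of the two values" rule can be sketched as follows.  The names parse_cache_control and effective_max_age are hypothetical, and skipping empty list items is a leniency of the sketch that the grammar itself does not grant.

```python
import re

_DIRECTIVE_RE = re.compile(r"(max-age|max-stale|min-fresh)(?:=([0-9]{1,19}))?")


def parse_cache_control(value):
    """Parse a Cache-Control header field value into {directive: seconds}.

    "max-stale" without a value maps to None (stale data of any age).
    An empty value yields {}, as in a GET-PARAMS query for the settings.
    """
    directives = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # leniency: ignore empty list items
        match = _DIRECTIVE_RE.fullmatch(item)
        if match is None:
            raise ValueError("unknown cache-directive: %r" % item)
        name, seconds = match.group(1), match.group(2)
        if seconds is None and name != "max-stale":
            raise ValueError("%s requires delta-seconds" % name)
        directives[name] = None if seconds is None else int(seconds)
    return directives


def effective_max_age(request_max_age, cached_max_age):
    """Return the lesser max-age when both are present, else whichever exists."""
    candidates = [v for v in (request_max_age, cached_max_age) if v is not None]
    return min(candidates) if candidates else None
```

For example, a request carrying "max-age=86400" against a cached entry with a max-age of 3600 would be evaluated with a freshness limit of 3600 seconds.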

1916	6.2.14.  Logging-Tag

1918	   This header field MAY be sent as part of a SET-PARAMS/GET-PARAMS
1919	   method to set or retrieve the logging tag for logs generated by the
1920	   server.  Once set, the value persists until a new value is set or the
1921	   session ends.  The MRCPv2 server MAY provide a mechanism to subset
1922	   its output logs so that system administrators can examine or extract
1923	   only the log file portion during which the logging tag was set to a
1924	   certain value.

1926	   It is RECOMMENDED that clients include in the logging tag information
1927	   to identify the MRCPv2 client User Agent, so that one can determine
1928	   which MRCPv2 client request generated a given log message at the
1929	   server.  It is also RECOMMENDED that MRCPv2 clients not log
1930	   personally identifiable information such as credit card numbers and
1931	   national identification numbers.

1933	   logging-tag    = "Logging-Tag" ":" 1*UTFCHAR CRLF

1935	6.2.15.  Set-Cookie

1937	   Since the associated HTTP client on an MRCPv2 server fetches
1938	   documents for processing on behalf of the MRCPv2 client, the cookie
1939	   store in the HTTP client of the MRCPv2 server is treated as an
1940	   extension of the cookie store in the HTTP client of the MRCPv2
1941	   client.  This requires that the MRCPv2 client and server be able to
1942	   synchronize their common cookie store as needed.  To enable the
1943	   MRCPv2 client to push its stored cookies to the MRCPv2 server and get
1944	   new cookies from the MRCPv2 server stored back to the MRCPv2 client,
1945	   the Set-Cookie entity-header field MAY be included in MRCPv2 requests
1946	   to update the cookie store on a server and be returned in final
1947	   MRCPv2 responses or events to subsequently update the client's own
1948	   cookie store.  The stored cookies on the server persist for the
1949	   duration of the MRCPv2 session and MUST be destroyed at the end of
1950	   the session.  To ensure support for cookies, MRCPv2 clients and
1951	   servers MUST support the Set-Cookie entity header field.

1953	   Note that it is the MRCPv2 client that determines which, if any,
1954	   cookies are sent to the server.  There is no requirement that all
1955	   cookies be shared.  Rather, it is RECOMMENDED that MRCPv2 clients
1956	   communicate only cookies needed by the MRCPv2 server to process its
1957	   requests.

1959	 set-cookie      =       "Set-Cookie:" cookies CRLF
1960	 cookies         =       cookie *("," *LWS cookie)
1961	 cookie          =       attribute "=" value *(";" cookie-av)
1962	 cookie-av       =       "Comment" "=" value
1963	                 /       "Domain" "=" value
1964	                 /       "Max-Age" "=" value
1965	                 /       "Path" "=" value
1966	                 /       "Secure"
1967	                 /       "Version" "=" 1*19DIGIT
1968	                 /       "Age" "=" delta-seconds

1970	 set-cookie        = "Set-Cookie:" SP set-cookie-string
1971	 set-cookie-string = cookie-pair *( ";" SP cookie-av )
1972	 cookie-pair       = cookie-name "=" cookie-value
1973	 cookie-name       = token
1974	 cookie-value      = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
1975	 cookie-octet      = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
1976	 token             = <token, defined in [RFC2616], Section 2.2>

1978	 cookie-av         = expires-av / max-age-av / domain-av /
1979	                      path-av / secure-av / httponly-av /
1980	                      extension-av / age-av
1981	 expires-av        = "Expires=" sane-cookie-date
1982	 sane-cookie-date  = <rfc1123-date, defined in [RFC2616], Section 3.3.1>
1983	 max-age-av        = "Max-Age=" non-zero-digit *DIGIT
1984	 non-zero-digit    = %x31-39
1985	 domain-av         = "Domain=" domain-value
1986	 domain-value      = <subdomain>
1987	 path-av           = "Path=" path-value
1988	 path-value        = <any CHAR except CTLs or ";">
1989	 secure-av         = "Secure"
1990	 httponly-av       = "HttpOnly"
1991	 extension-av      = <any CHAR except CTLs or ";">
1992	 age-av            = "Age=" delta-seconds

1994	   The Set-Cookie header field is specified in RFC 6265 [RFC6265].  The
1995	   "Age" attribute is introduced in this specification to indicate the
1996	   age of the cookie and is OPTIONAL.  An MRCPv2 client or server MUST
1997	   calculate the age of the cookie according to the age calculation
1998	   rules in the HTTP/1.1 specification [RFC2616] and append the "Age"
1999	   attribute accordingly.  This attribute is provided because time may
2000	   have passed since the client received the cookie from an HTTP server.
2001	   Rather than having the client reduce Max-Age by the actual age, it
2002	   passes Max-Age verbatim and appends the "Age" attribute, thus maintaining
2003	   the cookie as received while still accounting for the fact that time
2004	   has passed.
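
A rough sketch of appending the "Age" attribute is shown below.  The function name with_age_attribute is hypothetical, and the age is simplified to the time the cookie has been held locally; a complete implementation would follow the full HTTP/1.1 age calculation rules rather than this shortcut.

```python
def with_age_attribute(set_cookie_string, received_at, now):
    """Append an "Age" attribute to a cookie, leaving Max-Age untouched.

    `received_at` and `now` are timestamps in seconds.  Simplifying
    assumption: the age equals the time elapsed since the cookie was
    received from the HTTP origin server.
    """
    age = max(0, int(now - received_at))
    return "%s; Age=%d" % (set_cookie_string, age)
```

Keeping Max-Age verbatim means the receiver sees the cookie exactly as the origin server issued it, with the elapsed time carried separately.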

2006	   The MRCPv2 client or server MUST supply defaults for the "Domain" and
2007	   "Path" attributes if omitted by the HTTP origin server as specified
2008	   in RFC 6265.  Note that there is no leading dot present in the
2009	   "Domain" attribute value in this case.  Although an explicitly
2010	   specified "Domain" value received via the HTTP protocol may be
2011	   modified to include a leading dot, an MRCPv2 client or server MUST
2012	   NOT modify the "Domain" value when received via the MRCPv2 protocol.

2014	   An MRCPv2 client or server MAY combine multiple cookie header fields
2015	   of the same type into a single "field-name:field-value" pair as
2016	   described in Section 6.2.

2018	   The Set-Cookie header field MAY be specified in any request that
2019	   subsequently results in the server performing an HTTP access.  When a
2020	   server receives new cookie information from an HTTP origin server,
2021	   and assuming the cookie store is modified according to RFC 6265, the
2022	   server MUST return the new cookie information in the MRCPv2 COMPLETE
2023	   response or event as appropriate to allow the client to update its
2024	   own cookie store.

2026	   The SET-PARAMS request MAY specify the Set-Cookie header field to
2027	   update the cookie store on a server.  The GET-PARAMS request MAY be
2028	   used to return the entire cookie store of "Set-Cookie" type cookies
2029	   to the client.

2031	6.2.16.  Vendor Specific Parameters

2033	   This set of header fields allows the client to set or retrieve
2034	   vendor-specific parameters.

2036	   vendor-specific          =    "Vendor-Specific-Parameters" ":"
2037	                                 [vendor-specific-av-pair
2038	                                 *(";" vendor-specific-av-pair)] CRLF

2040	   vendor-specific-av-pair  = vendor-av-pair-name "="
2041	                              value

2043	   vendor-av-pair-name     = 1*UTFCHAR

2045	   Header fields of this form MAY be sent in any method (request) and
2046	   are used to manage implementation-specific parameters on the server
2047	   side.  The vendor-av-pair-name follows the reverse Internet Domain
2048	   Name convention (see Section 13.1.6 for syntax and registration
2049	   information).  The value of the vendor attribute is specified after
2050	   the "=" symbol and MAY be quoted.  For example:

2052	   com.example.companyA.paramxyz=256
2053	   com.example.companyA.paramabc=High
2054	   com.example.companyB.paramxyz=Low

2056	   When used in GET-PARAMS to get the current value of these parameters
2057	   from the server, this header field value MAY contain a semicolon-
2058	   separated list of implementation-specific attribute names.
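
The following sketch parses a Vendor-Specific-Parameters value into name/value pairs.  The function name parse_vendor_specific is hypothetical.  The sketch honors semicolons inside quoted values but does not handle backslash-escaped quotes, which is a known limitation of this illustration.

```python
def parse_vendor_specific(value):
    """Split a header field value into (name, value) tuples.

    Names given without "=" (as in a GET-PARAMS query) yield a value of
    None.  Surrounding double quotes are removed from quoted values.
    """
    pairs, current, in_quotes = [], [], False
    for ch in value:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == ";" and not in_quotes:
            pairs.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        pairs.append("".join(current))

    result = []
    for pair in pairs:
        name, sep, val = pair.strip().partition("=")
        if not name:
            raise ValueError("empty vendor-av-pair-name in %r" % value)
        if len(val) >= 2 and val[0] == '"' and val[-1] == '"':
            val = val[1:-1]
        result.append((name, val if sep else None))
    return result
```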

2060	6.3.  Generic Result Structure

2062	   Result data from the server for the Recognizer and Verifier resources
2063	   is carried as a typed media entity in the MRCPv2 message body of
2064	   various events.  The Natural Language Semantics Markup Language
2065	   (NLSML), an XML markup based on an early draft from the W3C, is the
2066	   default standard for returning results back to the client.  Hence,
2067	   all servers implementing these resource types MUST support the Media
2068	   Type application/nlsml+xml.  The Extensible MultiModal Annotation
2069	   (EMMA) [W3C.REC-emma-20090210] format can be used to return results
2070	   as well.  This can be done by negotiating the format at session
2071	   establishment time with SDP (a=resultformat:application/emma+xml) or
2072	   with SIP (Allow/Accept).  With SIP, for example, if a client wants
2073	   results in EMMA, an MRCPv2 server can route the request to another
2074	   server that supports EMMA by inspecting the SIP header fields, rather
2075	   than having to introspect into the SDP.

2077	   MRCPv2 uses this representation to convey content among the clients
2078	   and servers that generate and make use of the markup.  MRCPv2 uses
2079	   NLSML specifically to convey recognition, enrollment, and
2080	   verification results between the corresponding resource on the MRCPv2
2081	   server and the MRCPv2 client.  Details of this result format are
2082	   fully described in Section 6.3.1.

2084	   Content-Type:application/nlsml+xml
2085	   Content-Length:...

2087	   <?xml version="1.0"?>
2088	   <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
2089	           xmlns:ex="http://www.example.com/example"
2090	           grammar="http://theYesNoGrammar">
2091	       <interpretation>
2092	           <instance>
2093	                   <ex:response>yes</ex:response>
2094	           </instance>
2095	           <input>ok</input>
2096	       </interpretation>
2097	   </result>

2099	                              Result Example
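
A client could read a result such as the one above with a namespace-aware XML parser.  The sketch below uses Python's standard xml.etree.ElementTree module; the function name summarize_nlsml and the shape of the returned dictionary are hypothetical, and only the <interpretation> child is handled.

```python
import xml.etree.ElementTree as ET

NLSML_NS = "urn:ietf:params:xml:ns:mrcpv2"


def summarize_nlsml(xml_text):
    """Extract the grammar URI and each interpretation's input and instance."""
    root = ET.fromstring(xml_text)
    if root.tag != "{%s}result" % NLSML_NS:
        raise ValueError("root element is not an NLSML <result>")
    ns = {"m": NLSML_NS}
    interpretations = []
    for interp in root.findall("m:interpretation", ns):
        input_el = interp.find("m:input", ns)
        instance_el = interp.find("m:instance", ns)
        instance_text = None
        if instance_el is not None:
            # Instance content is application-defined; collect its text only.
            instance_text = "".join(instance_el.itertext()).strip()
        interpretations.append({
            "input": input_el.text if input_el is not None else None,
            "instance": instance_text,
        })
    return {"grammar": root.get("grammar"), "interpretations": interpretations}
```

Matching on the namespace URI rather than on a prefix keeps the sketch independent of whichever prefix the server chooses to declare.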

2101	6.3.1.  Natural Language Semantics Markup Language

2103	   The Natural Language Semantics Markup Language (NLSML) is an XML data
2104	   structure with elements and attributes designed to carry result
2105	   information from recognizer (including enrollment) and verifier
2106	   resources.  The normative definition of NLSML is the RelaxNG schema
2107	   in Section 16.1.  Note that the elements and attributes of this
2108	   format are defined in the MRCPv2 namespace.  In the result structure,
2109	   they must either be prefixed by a namespace prefix declared within
2110	   the result or must be children of an element identified as belonging
2111	   to the respective namespace.  For details on how to use XML
2112	   Namespaces, see [W3C.REC-xml-names11-20040204].  Section 2 of
2113	   [W3C.REC-xml-names11-20040204] provides details on how to declare
2114	   namespaces and namespace prefixes.

2116	   The root element of NLSML is <result>.  Optional child elements are
2117	   <interpretation>, <enrollment-result>, and <verification-result>, at
2118	   least one of which must be present.  A single <result> MAY contain
2119	   any or all of the optional child elements.  Details of the <result>
2120	   and <interpretation> elements and their subelements and attributes
2121	   can be found in Section 9.6.  Details of the <enrollment-result>
2122	   element and its subelements can be found in Section 9.7.  Details of
2123	   the <verification-result> element and its subelements can be found in
2124	   Section 11.5.2.

2126	7.  Resource Discovery

2128	   Server resources may be discovered and their capabilities learned by
2129	   clients through standard SIP machinery.  The client MAY issue a SIP
2130	   OPTIONS transaction to a server, which has the effect of requesting
2131	   the capabilities of the server.  The server MUST respond to such a
2132	   request with an SDP-encoded description of its capabilities according
2133	   to RFC3264 [RFC3264].  The MRCPv2 capabilities are described by a
2134	   single m-line containing the media type "application" and transport
2135	   type "TCP/TLS/MRCPv2" or "TCP/MRCPv2".  There MUST be one "resource"
2136	   attribute for each media resource that the server supports with the
2137	   resource type identifier as its value.

2139	   The SDP description MUST also contain m-lines describing the audio
2140	   capabilities and the coders the server supports.

2142	   In this example, the client uses the SIP OPTIONS method to query the
2143	   capabilities of the MRCPv2 server.

2145	   C->S:
2146	        OPTIONS sip:mrcp@server.example.com SIP/2.0
2147	        Via:SIP/2.0/TCP client.atlanta.example.com:5060;
2148	         branch=z9hG4bK74bf7
2149	        Max-Forwards:6
2150	        To:<sip:mrcp@example.com>
2151	        From:Sarvi <sip:sarvi@example.com>;tag=1928301774
2152	        Call-ID:a84b4c76e66710
2153	        CSeq:63104 OPTIONS
2154	        Contact:<sip:sarvi@client.example.com>
2155	        Accept:application/sdp
2156	        Content-Length:0

2158	   S->C:
2159	        SIP/2.0 200 OK
2160	        Via:SIP/2.0/TCP client.atlanta.example.com:5060;
2161	         branch=z9hG4bK74bf7;received=192.0.32.10
2162	        To:<sip:mrcp@example.com>;tag=62784
2163	        From:Sarvi <sip:sarvi@example.com>;tag=1928301774
2164	        Call-ID:a84b4c76e66710
2165	        CSeq:63104 OPTIONS
2166	        Contact:<sip:mrcp@server.example.com>
2167	        Allow:INVITE, ACK, CANCEL, OPTIONS, BYE
2168	        Accept:application/sdp
2169	        Accept-Encoding:gzip
2170	        Accept-Language:en
2171	        Supported:foo
2172	        Content-Type:application/sdp
2173	        Content-Length:...

2175	        v=0
2176	        o=sarvi 2890844536 2890842811 IN IP4 192.0.2.12
2177	        s=-
2178	        i=MRCPv2 server capabilities
2179	        c=IN IP4 192.0.2.12/127
2180	        t=0 0
2181	        m=application 0 TCP/TLS/MRCPv2 1
2182	        a=resource:speechsynth
2183	        a=resource:speechrecog
2184	        a=resource:speakverify
2185	        m=audio 0 RTP/AVP 0 3
2186	        a=rtpmap:0 PCMU/8000
2187	        a=rtpmap:3 GSM/8000
2188	         Using SIP OPTIONS for MRCPv2 Server Capability Discovery
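
A client could extract the advertised resource types from such an SDP body as sketched below.  The function name mrcp_resources_from_sdp is hypothetical, and the sketch performs only the line-level matching needed for this purpose rather than full SDP parsing.

```python
def mrcp_resources_from_sdp(sdp_text):
    """Return the "resource" attribute values under the MRCPv2 m-line."""
    resources = []
    in_mrcp_media = False
    for raw_line in sdp_text.splitlines():
        line = raw_line.strip()
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...
            fields = line[2:].split()
            in_mrcp_media = (
                len(fields) >= 3
                and fields[0] == "application"
                and fields[2] in ("TCP/MRCPv2", "TCP/TLS/MRCPv2")
            )
        elif in_mrcp_media and line.startswith("a=resource:"):
            resources.append(line[len("a=resource:"):])
    return resources
```

Applied to the response above, this would yield the three resource types listed under the application m-line and ignore the attributes of the audio m-line.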

2190	8.  Speech Synthesizer Resource

2192	   This resource processes text markup provided by the client and
2193	   generates a stream of synthesized speech in real-time.  Depending
2194	   upon the server implementation and capability of this resource, the
2195	   client can also dictate parameters of the synthesized speech such as
2196	   voice characteristics, speaker speed, etc.

2198	   The synthesizer resource is controlled by MRCPv2 requests from the
2199	   client.  Similarly, the resource can respond to these requests or
2200	   generate asynchronous events to the client to indicate conditions of
2201	   interest to the client during the generation of the synthesized
2202	   speech stream.

2204	   This section applies for the following resource types:
2205	   o  speechsynth
2206	   o  basicsynth

2208	   The capabilities of these resources are defined in Section 3.1.

2210	8.1.  Synthesizer State Machine

2212	   The synthesizer maintains a state machine to process MRCPv2 requests
2213	   from the client.  The state transitions shown below describe the
2214	   states of the synthesizer and reflect the state of the request at the
2215	   head of the synthesizer resource queue.  A SPEAK request in the
2216	   PENDING state can be deleted or stopped by a STOP request without
2217	   affecting the state of the resource.

2219	   Idle                    Speaking                  Paused
2220	   State                   State                     State
2221	     |                        |                          |
2222	     |----------SPEAK-------->|                 |--------|
2223	     |<------STOP-------------|             CONTROL      |
2224	     |<----SPEAK-COMPLETE-----|                 |------->|
2225	     |<----BARGE-IN-OCCURRED--|                          |
2226	     |              |---------|                          |
2227	     |          CONTROL       |-----------PAUSE--------->|
2228	     |              |-------->|<----------RESUME---------|
2229	     |                        |               |----------|
2230	     |----------|             |              PAUSE       |
2231	     |    BARGE-IN-OCCURRED   |               |--------->|
2232	     |<---------|             |----------|               |
2233	     |                        |      SPEECH-MARKER       |
2234	     |                        |<---------|               |
2235	     |----------|             |----------|               |
2236	     |         STOP           |       RESUME             |
2237	     |          |             |<---------|               |
2238	     |<---------|             |                          |
2239	     |<---------------------STOP-------------------------|
2240	     |----------|             |                          |
2241	     |     DEFINE-LEXICON     |                          |
2242	     |          |             |                          |
2243	     |<---------|             |                          |
2244	     |<---------------BARGE-IN-OCCURRED------------------|

2246	                         Synthesizer State Machine
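
The diagram can be read as a transition table.  The sketch below transcribes only the arrows drawn above; the names TRANSITIONS and next_state are hypothetical, and any combination not drawn in the diagram (for example, a SPEAK request arriving while another is being spoken) is rejected here purely as a simplification of the sketch.

```python
IDLE, SPEAKING, PAUSED = "Idle", "Speaking", "Paused"

# (current state, method or event) -> next state, as drawn in the diagram.
TRANSITIONS = {
    (IDLE, "SPEAK"): SPEAKING,
    (IDLE, "STOP"): IDLE,
    (IDLE, "BARGE-IN-OCCURRED"): IDLE,
    (IDLE, "DEFINE-LEXICON"): IDLE,
    (SPEAKING, "STOP"): IDLE,
    (SPEAKING, "SPEAK-COMPLETE"): IDLE,
    (SPEAKING, "BARGE-IN-OCCURRED"): IDLE,
    (SPEAKING, "CONTROL"): SPEAKING,
    (SPEAKING, "SPEECH-MARKER"): SPEAKING,
    (SPEAKING, "RESUME"): SPEAKING,
    (SPEAKING, "PAUSE"): PAUSED,
    (PAUSED, "CONTROL"): PAUSED,
    (PAUSED, "PAUSE"): PAUSED,
    (PAUSED, "RESUME"): SPEAKING,
    (PAUSED, "STOP"): IDLE,
    (PAUSED, "BARGE-IN-OCCURRED"): IDLE,
}


def next_state(state, trigger):
    """Return the state reached from `state` on `trigger`."""
    try:
        return TRANSITIONS[(state, trigger)]
    except KeyError:
        raise ValueError("no transition for %s in state %s" % (trigger, state)) from None
```

A table of this kind makes it easy to check that every method in the grammar has a defined outcome in each state an implementation chooses to support.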

2248	8.2.  Synthesizer Methods

2250	   The synthesizer supports the following methods.

2252	   synthesizer-method   =  "SPEAK"
2253	                        /  "STOP"
2254	                        /  "PAUSE"
2255	                        /  "RESUME"
2256	                        /  "BARGE-IN-OCCURRED"
2257	                        /  "CONTROL"
2258	                        /  "DEFINE-LEXICON"

2260	8.3.  Synthesizer Events

2262	   The synthesizer can generate the following events.

2264	   synthesizer-event    =  "SPEECH-MARKER"
2265	                        /  "SPEAK-COMPLETE"

2267	8.4.  Synthesizer Header Fields

2269	   A synthesizer method can contain header fields containing request
2270	   options and information to augment the Request, Response or Event it
2271	   is associated with.

2273	   synthesizer-header  =  jump-size
2274	                       /  kill-on-barge-in
2275	                       /  speaker-profile
2276	                       /  completion-cause
2277	                       /  completion-reason
2278	                       /  voice-parameter
2279	                       /  prosody-parameter
2280	                       /  speech-marker
2281	                       /  speech-language
2282	                       /  fetch-hint
2283	                       /  audio-fetch-hint
2284	                       /  failed-uri
2285	                       /  failed-uri-cause
2286	                       /  speak-restart
2287	                       /  speak-length
2288	                       /  load-lexicon
2289	                       /  lexicon-search-order

2291	8.4.1.  Jump-Size

2293	   This header field MAY be specified in a CONTROL method and controls
2294	   the amount to jump forward or backward in an active SPEAK request.  A
2295	   + or - indicates a relative value to what is being currently played.
2296	   This header field MAY also be specified in a SPEAK request as a
2297	   desired offset into the synthesized speech.  In this case, the
2298	   synthesizer MUST begin speaking from this amount of time into the
2299	   speech markup.  Note that an offset that extends beyond the end of
2300	   the produced speech will result in audio of length zero.  The
2301	   different speech length units supported are dependent on the
2302	   synthesizer implementation.  If the synthesizer resource does not
2303	   support a unit for the operation, the resource MUST respond with a
2304	   status-code of 409 "Unsupported Header Field Value".

2306	   jump-size             =   "Jump-Size" ":" speech-length-value CRLF

2308	   speech-length-value   =   numeric-speech-length
2309	                         /   text-speech-length

2311	   text-speech-length    =   1*UTFCHAR SP "Tag"

2313	   numeric-speech-length =    ("+" / "-") positive-speech-length

2315	   positive-speech-length =   1*19DIGIT SP numeric-speech-unit

2317	   numeric-speech-unit   =   "Second"
2318	                         /   "Word"
2319	                         /   "Sentence"
2320	                         /   "Paragraph"
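
   As an informal illustration (not part of the protocol definition),
   the Jump-Size grammar above can be checked with a short Python
   sketch.  The function name parse_jump_size and the returned tuple
   shapes are assumptions made for this example, as are the choices to
   treat a tag name as a run of non-whitespace characters and to match
   unit names case-sensitively.

```python
import re

# numeric-speech-length: a mandatory sign, 1 to 19 digits, a space, a unit.
# Case-sensitive unit matching is a simplification of this sketch.
_NUMERIC = re.compile(r"([+-])([0-9]{1,19}) (Second|Word|Sentence|Paragraph)")
# text-speech-length: a tag name, a space, then the literal "Tag".
# Treating the tag name as non-whitespace characters is an assumption.
_TAG = re.compile(r"(\S+) Tag")


def parse_jump_size(value):
    """Parse the value part of a Jump-Size header field.

    Returns ("numeric", signed_amount, unit) or ("tag", tag_name).
    Raises ValueError if the value matches neither form.
    """
    m = _NUMERIC.fullmatch(value)
    if m:
        sign, digits, unit = m.groups()
        amount = int(digits)
        return ("numeric", -amount if sign == "-" else amount, unit)
    m = _TAG.fullmatch(value)
    if m:
        return ("tag", m.group(1))
    raise ValueError("not a valid speech-length-value: %r" % value)
```

   A value without a leading sign, such as "15 Word", is rejected
   because the numeric form requires "+" or "-".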

2322	8.4.2.  Kill-On-Barge-In

2324	   This header field MAY be sent as part of the SPEAK method to enable
2325	   kill-on-barge-in support.  If enabled, the SPEAK method is
2326	   interrupted by DTMF input detected by a signal detector resource or
2327	   by the start of speech sensed or recognized by the speech recognizer
2328	   resource.

2330	   kill-on-barge-in      =   "Kill-On-Barge-In" ":" BOOLEAN CRLF

2332	   The client MUST send a BARGE-IN-OCCURRED method to the synthesizer
2333	   resource when it receives a barge-in-able event from any source.
2334	   This source could be a recognizer resource or signal detector
2335	   resource and MAY be either local or distributed.  If this header
2336	   field is not specified in a SPEAK request or explicitly set by a SET-
2337	   PARAMS, the default value for this header field is "true".
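
   The defaulting rule in the paragraph above might be sketched as
   follows.  This is illustrative only: the function name and the use
   of plain dictionaries keyed by the exact header field name are
   assumptions of the example, not something this document defines.

```python
def effective_kill_on_barge_in(speak_headers, session_params):
    """Return the Kill-On-Barge-In value that applies to one SPEAK.

    Both arguments are hypothetical dicts that map header field names
    to their string values ("true" or "false").
    """
    # A value on the SPEAK request wins over one set by SET-PARAMS.
    for headers in (speak_headers, session_params):
        if "Kill-On-Barge-In" in headers:
            return headers["Kill-On-Barge-In"].strip().lower() == "true"
    # Neither the SPEAK request nor SET-PARAMS supplied a value.
    return True
```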

2339	   If the recognizer or signal detector resource is on the same server
2340	   as the synthesizer and both are part of the same session, the server
2341	   MAY work with both to provide internal notification to the
2342	   synthesizer so that audio may be stopped without having to wait for
2343	   the client's BARGE-IN-OCCURRED request.

2345	   It is generally RECOMMENDED when playing a prompt to the user with
2346	   Kill-On-Barge-In and asking for input, that the client issue the
2347	   RECOGNIZE request ahead of the SPEAK request for optimum performance
2348	   and user experience.  This way, it is guaranteed that the recognizer
2349	   is online before the prompt starts playing and the user's speech will
2350	   not be truncated at the beginning (especially for power users).

2352	8.4.3.  Speaker Profile

2354	   This header field MAY be part of the SET-PARAMS/GET-PARAMS or SPEAK
2355	   request from the client to the server and specifies a URI which
2356	   references the profile of the speaker.  Speaker profiles are
2357	   collections of voice parameters such as gender and accent.

2359	   speaker-profile       =   "Speaker-Profile" ":" uri CRLF

2361	8.4.4.  Completion Cause

2363	   This header field MUST be specified in a SPEAK-COMPLETE event coming
2364	   from the synthesizer resource to the client.  This indicates the
2365	   reason the SPEAK request completed.

2367	   completion-cause      =   "Completion-Cause" ":" 3DIGIT SP
2368	                             1*VCHAR CRLF

2370	   +------------+-----------------------+------------------------------+
2371	   | Cause-Code | Cause-Name            | Description                  |
2372	   +------------+-----------------------+------------------------------+
2373	   | 000        | normal                | SPEAK completed normally.    |
2374	   | 001        | barge-in              | SPEAK request was terminated |
2375	   |            |                       | because of barge-in.         |
2376	   | 002        | parse-failure         | SPEAK request terminated     |
2377	   |            |                       | because of a failure to      |
2378	   |            |                       | parse the speech markup      |
2379	   |            |                       | text.                        |
2380	   | 003        | uri-failure           | SPEAK request terminated     |
2381	   |            |                       | because access to one of the |
2382	   |            |                       | URIs failed.                 |
2383	   | 004        | error                 | SPEAK request terminated     |
2384	   |            |                       | prematurely due to           |
2385	   |            |                       | synthesizer error.           |
2386	   | 005        | language-unsupported  | Language not supported.      |
2387	   | 006        | lexicon-load-failure  | Lexicon loading failed.      |
2388	   | 007        | cancelled             | A prior SPEAK request failed |
2389	   |            |                       | while this one was still in  |
2390	   |            |                       | the queue.                   |
2391	   +------------+-----------------------+------------------------------+

2393	                Synthesizer Resource Completion Cause Codes
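
   For illustration, a client could split a Completion-Cause value and
   compare it with the table above roughly as follows.  The names
   CAUSE_NAMES and parse_completion_cause, and the third "known" flag
   in the result, are assumptions of this sketch.

```python
import re

# Cause codes and names copied from the table above.
CAUSE_NAMES = {
    0: "normal",
    1: "barge-in",
    2: "parse-failure",
    3: "uri-failure",
    4: "error",
    5: "language-unsupported",
    6: "lexicon-load-failure",
    7: "cancelled",
}

# 3DIGIT SP 1*VCHAR, with VCHAR taken as the visible ASCII range.
_CAUSE = re.compile(r"([0-9]{3}) ([\x21-\x7e]+)")


def parse_completion_cause(value):
    """Split a Completion-Cause value into (code, name, known).

    "known" is True only when code and name agree with the table.
    """
    m = _CAUSE.fullmatch(value)
    if m is None:
        raise ValueError("malformed Completion-Cause: %r" % value)
    code, name = int(m.group(1)), m.group(2)
    return code, name, CAUSE_NAMES.get(code) == name
```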

2395	8.4.5.  Completion Reason

2397	   This header field MAY be specified in a SPEAK-COMPLETE event coming
2398	   from the synthesizer resource to the client.  This contains the
2399	   reason text behind the SPEAK request completion.  This header field
2400	   communicates text describing the reason for the failure, such as an
2401	   error in parsing the speech markup text.

2403	   completion-reason   =   "Completion-Reason" ":"
2404	                           quoted-string CRLF

2406	   The completion reason text is provided for client use in logs and for
2407	   debugging and instrumentation purposes.  Clients MUST NOT interpret
2408	   the completion reason text.

2410	8.4.6.  Voice-Parameter

2412	   This set of header fields defines the voice of the speaker.

2414	   voice-parameter    =   voice-gender
2415	                       /   voice-age
2416	                       /   voice-variant
2417	                       /   voice-name

2419	   voice-gender        =   "Voice-Gender:" voice-gender-value CRLF
2420	   voice-gender-value  =   "male"
2421	                       /   "female"
2422	                       /   "neutral"
2423	   voice-age           =   "Voice-Age:" 1*3DIGIT CRLF
2424	   voice-variant       =   "Voice-Variant:" 1*19DIGIT CRLF
2425	   voice-name          =   "Voice-Name:"
2426	                           1*UTFCHAR *(1*WSP 1*UTFCHAR) CRLF

2428	   The Voice- parameters are derived from the similarly-named attributes
2429	   of the voice element specified in W3C's Speech Synthesis Markup
2430	   Language Specification (SSML) [W3C.REC-speech-synthesis-20040907].
2431	   Legal values for these parameters are as defined in that
2432	   specification.

2434	   These header fields MAY be sent in SET-PARAMS/GET-PARAMS requests
2435	   to define/get default values for the entire session or MAY be sent
2436	   in the SPEAK request to define default values for that SPEAK
2437	   request.  Note that SSML content can itself set these values
2438	   internal to the SSML document, of course.

2440	   Voice parameter header fields MAY also be sent in a CONTROL method to
2441	   affect a SPEAK request in progress and change its behavior on the
2442	   fly.  If the synthesizer resource does not support this operation, it
2443	   MUST reject the request with a status-code of 403 "Unsupported Header
2444	   Field".

2446	8.4.7.  Prosody-Parameters

2448	   This set of header fields defines the prosody of the speech.

2450	   prosody-parameter   =   "Prosody-" prosody-param-name ":"
2451	                           prosody-param-value CRLF

2453	   prosody-param-name    =    1*VCHAR

2455	   prosody-param-value   =    1*VCHAR

2457	   prosody-param-name is any one of the attribute names under the
2458	   prosody element specified in W3C's Speech Synthesis Markup Language
2459	   Specification [W3C.REC-speech-synthesis-20040907].  The prosody-
2460	   param-value is any one of the value choices of the corresponding
2461	   prosody element attribute specified in the above section.

2463	   These header fields MAY be sent in SET-PARAMS/GET-PARAMS requests
2464	   to define/get default values for the entire session or MAY be sent
2465	   in the SPEAK request to define default values for that SPEAK
2466	   request.  Furthermore, these attributes can be part of the speech
2467	   text marked up in SSML.

2469	   The prosody parameter header fields in the SET-PARAMS or SPEAK
2470	   request only apply if the speech data is of type text/plain and does
2471	   not use a speech markup format.

2473	   These prosody parameter header fields MAY also be sent in a CONTROL
2474	   method to affect a SPEAK request in progress and change its behavior
2475	   on the fly.  If the synthesizer resource does not support this
2476	   operation, it MUST respond back to the client with a status-code of
2477	   403 "Unsupported Header Field".

2479	8.4.8.  Speech Marker

2481	   This header field contains timestamp information in a "timestamp"
2482	   field.  This is a Network Time Protocol (NTP) [RFC5905] timestamp, a
2483	   64-bit number in decimal form.  It MUST be synced with the Real-time
2484	   Transport Protocol (RTP) [RFC3550] timestamp of the media stream
2485	   through the RTP Control Protocol (RTCP) [RFC3550].

2487	   Markers are bookmarks that are defined within the markup.  Most
2488	   speech markup formats provide mechanisms to embed marker fields
2489	   within speech texts.  The synthesizer generates SPEECH-MARKER events
2490	   when it reaches these marker fields.  This header field MUST be part
2491	   of the SPEECH-MARKER event and contain the marker tag value after the
2492	   timestamp, separated by a semicolon.  In these events the timestamp
2493	   marks the time the text corresponding to the marker was emitted as
2494	   speech by the synthesizer.

2496	   This header field MUST also be returned in responses to STOP,
2497	   CONTROL, and BARGE-IN-OCCURRED methods, in the SPEAK-COMPLETE event,
2498	   and in an IN-PROGRESS SPEAK response.  In these messages, if any
2499	   markers have been encountered for the current SPEAK, the marker tag
2500	   value MUST be the last embedded marker encountered.  If no markers
2501	   have yet been encountered for the current SPEAK, only the timestamp
2502	   is REQUIRED.  Note that in these events the purpose of this header
2503	   field is to provide timestamp information associated with important
2504	   events within the lifecycle of a request (start of SPEAK processing,
2505	   end of SPEAK processing, receipt of CONTROL/STOP/BARGE-IN-OCCURRED).

2507	   timestamp           =   "timestamp" "=" time-stamp-value

2509	   time-stamp-value    =   1*20DIGIT

2511	   speech-marker       =   "Speech-Marker" ":"
2512	                           timestamp
2513	                           [";" 1*(UTFCHAR / %x20)] CRLF
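
   The Speech-Marker grammar above could be parsed along these lines.
   This is a sketch only: parse_speech_marker is a hypothetical helper,
   and the marker tag is accepted here as any non-empty text after the
   semicolon, which is looser than the ABNF.

```python
import re

# "timestamp=" followed by 1 to 20 digits, then an optional ";tag" part.
_MARKER = re.compile(r"timestamp=([0-9]{1,20})(?:;(.+))?")


def parse_speech_marker(value):
    """Return (timestamp, tag) for a Speech-Marker header value.

    tag is None when no marker has been encountered yet, in which
    case only the timestamp is present.
    """
    m = _MARKER.fullmatch(value)
    if m is None:
        raise ValueError("malformed Speech-Marker: %r" % value)
    return int(m.group(1)), m.group(2)
```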

2515	8.4.9.  Speech Language

2517	   This header field specifies the default language of the speech data
2518	   if the language is not specified in the markup.  The value of this
2519	   header field MUST follow RFC 5646 [RFC5646] for its values.  The
2520	   header field MAY occur in SPEAK, SET-PARAMS or GET-PARAMS requests.

2522	   speech-language     =   "Speech-Language" ":" 1*VCHAR CRLF

2524	8.4.10.  Fetch Hint

2526	   When the synthesizer needs to fetch documents or other resources like
2527	   speech markup or audio files, this header field controls the
2528	   corresponding URI access properties.  This provides client policy on
2529	   when the synthesizer should retrieve content from the server.  A
2530	   value of "prefetch" indicates the content MAY be downloaded when the
2531	   request is received, whereas "safe" indicates that content MUST NOT
2532	   be downloaded until actually referenced.  The default value is
2533	   "prefetch".  This header field MAY occur in SPEAK, SET-PARAMS or GET-
2534	   PARAMS requests.

2536	   fetch-hint          =   "Fetch-Hint" ":" ("prefetch" / "safe") CRLF

2538	8.4.11.  Audio Fetch Hint

2540	   When the synthesizer needs to fetch documents or other resources like
2541	   speech audio files, this header field controls the corresponding URI
2542	   access properties.  This provides client policy whether or not the
2543	   synthesizer is permitted to attempt to optimize speech by pre-
2544	   fetching audio.  The value is either "safe" to say that audio is only
2545	   fetched when it is referenced, never before; "prefetch" to permit,
2546	   but not require the implementation to pre-fetch the audio; or
2547	   "stream" to allow it to stream the audio fetches.  The default value
2548	   is "prefetch".  This header field MAY occur in SPEAK, SET-PARAMS or
2549	   GET-PARAMS requests.

2551	   audio-fetch-hint    =   "Audio-Fetch-Hint" ":"
2552	                           ("prefetch" / "safe" / "stream") CRLF

2554	8.4.12.  Failed URI

2556	   When a synthesizer method needs a synthesizer to fetch or access a
2557	   URI and the access fails, the server SHOULD provide the failed URI in
2558	   this header field in the method response, unless there are multiple
2559	   URI failures, in which case the server MUST provide one of the failed
2560	   URIs in this header field in the method response.

2562	   failed-uri          =   "Failed-URI" ":" absoluteURI CRLF

2564	8.4.13.  Failed URI Cause

2566	   When a synthesizer method needs a synthesizer to fetch or access a
2567	   URI and the access fails the server MUST provide the URI-specific or
2568	   protocol-specific response code for the URI in the Failed-URI header
2569	   field in the method response through this header field.  The value
2570	   encoding is UTF-8 (RFC3629 [RFC3629]) to accommodate any access
2571	   protocol, some of which might have a response string instead of a
2572	   numeric response code.

2573	   failed-uri-cause    =   "Failed-URI-Cause" ":" 1*UTFCHAR CRLF

2575	8.4.14.  Speak Restart

2577	   When a client issues a CONTROL request to a currently speaking
2578	   synthesizer resource to jump backward, and the target jump point is
2579	   before the start of the current SPEAK request, the current SPEAK
2580	   request MUST restart from the beginning of its speech data and the
2581	   server's response to the CONTROL request MUST contain this header
2582	   field with a value of "true" indicating a restart.

2584	   speak-restart       =   "Speak-Restart" ":" BOOLEAN CRLF

2586	8.4.15.  Speak Length

2588	   This header field MAY be specified in a CONTROL method to control the
2589	   maximum length of speech to speak, relative to the current speaking
2590	   point in the currently active SPEAK request.  If numeric, the value
2591	   MUST be a positive integer.  If a header field with a Tag unit is
2592	   specified, then the speech output continues until the tag is reached
2593	   or the SPEAK request completes, whichever comes first.  This header
2594	   field MAY be specified in a SPEAK request to indicate the length to
2595	   speak from the speech data and is relative to the point in speech
2596	   that the SPEAK request starts.  The different speech length units
2597	   supported are synthesizer implementation dependent.  If a server does
2598	   not support the specified unit, the server MUST respond with a
2599	   status-code of 409 "Unsupported Header Field Value".

2601	   speak-length          =   "Speak-Length" ":" positive-length-value
2602	                             CRLF

2604	   positive-length-value =   positive-speech-length
2605	                         /   text-speech-length

2607	   text-speech-length    =   1*UTFCHAR SP "Tag"

2609	   positive-speech-length =  1*19DIGIT SP numeric-speech-unit

2611	   numeric-speech-unit   =   "Second"
2612	                         /   "Word"
2613	                         /   "Sentence"
2614	                         /   "Paragraph"

2616	8.4.16.  Load-Lexicon

2618	   This header field is used to indicate whether a lexicon has to be
2619	   loaded or unloaded.  The value "true" means to load the lexicon if
2620	   not already loaded, and the value "false" means to unload the lexicon
2621	   if it is loaded.  The default value for this header field is "true".
2622	   This header field MAY be specified in a DEFINE-LEXICON method.

2624	   load-lexicon       =   "Load-Lexicon" ":" BOOLEAN CRLF

2626	8.4.17.  Lexicon-Search-Order

2628	   This header field is used to specify a list of active pronunciation
2629	   lexicon URIs and the search order among the active lexicons.
2630	   Lexicons specified within the SSML document take precedence over the
2631	   lexicons specified in this header field.  This header field MAY be
2632	   specified in the SPEAK, SET-PARAMS, and GET-PARAMS methods.

2634	   lexicon-search-order =   "Lexicon-Search-Order" ":"
2635	             "<" absoluteURI ">" *(" " "<" absoluteURI ">") CRLF
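
   As a rough illustration, the Lexicon-Search-Order value can be split
   into its ordered list of URIs as shown below.  The function name is
   hypothetical, and the sketch checks only the angle brackets and the
   single-space separator; it does not validate the URIs themselves.

```python
def parse_lexicon_search_order(value):
    """Return the lexicon URIs of a Lexicon-Search-Order value, in order."""
    uris = []
    # Entries are "<uri>" items separated by exactly one space.
    for part in value.split(" "):
        if len(part) < 3 or not (part.startswith("<") and part.endswith(">")):
            raise ValueError("malformed Lexicon-Search-Order: %r" % value)
        uris.append(part[1:-1])
    return uris
```

   The order of the returned list is the search order among the active
   lexicons.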

2637	8.5.  Synthesizer Message Body

2639	   A synthesizer message can contain additional information associated
2640	   with the Request, Response or Event in its message body.

2642	8.5.1.  Synthesizer Speech Data

2644	   Marked-up text for the synthesizer to speak is specified as a typed
2645	   media entity in the message body.  The speech data to be spoken by
2646	   the synthesizer can be specified inline by embedding the data in the
2647	   message body or by reference by providing a URI for accessing the
2648	   data.  In either case, the data and the format used to mark up the
2649	   speech need to be of a content type supported by the server.

2651	   All MRCPv2 servers containing synthesizer resources MUST support both
2652	   plain text speech data and W3C's Speech Synthesis Markup Language
2653	   [W3C.REC-speech-synthesis-20040907] and hence MUST support the Media
2654	   Types text/plain and application/ssml+xml.  Other formats MAY be
2655	   supported.

2657	   If the speech data is to be fetched by URI reference, the Media Type
2658	   text/uri-list (see RFC2483 [RFC2483]) is used to indicate one or
2659	   more URIs that, when dereferenced, will contain the content to be
2660	   spoken.  If a list of speech URIs is specified, the resource MUST
2661	   speak the speech data provided by each URI in the order in which the
2662	   URIs are specified in the content.

2664	   MRCPv2 clients and servers MUST support the multipart/mixed Media
2665	   Type.  This is the appropriate Media Type to use when providing a mix
2666	   of URI and inline speech data.  Embedded within the multi-part
2667	   content block there MAY be content for the text/uri-list,
2668	   application/ssml+xml and/or text/plain media types.  The character
2669	   set and encoding used in the speech data is specified according to
2670	   standard Media Type definitions.  The multi-part content MAY also
2671	   contain actual audio data.  Clients may have recorded audio clips
2672	   stored in memory or on a local device and wish to play them as part
2673	   of the SPEAK request.  The audio portions MAY be sent by the client
2674	   as part of the multi-part content block.  This audio is referenced in
2675	   the speech markup data that is another part in the multi-part content
2676	   block according to the multipart/mixed Media Type specification.

2678	   Content-Type:text/uri-list
2679	   Content-Length:...

2681	   http://www.example.com/ASR-Introduction.ssml
2682	   http://www.example.com/ASR-Document-Part1.ssml
2683	   http://www.example.com/ASR-Document-Part2.ssml
2684	   http://www.example.com/ASR-Conclusion.ssml

2686	                             URI List Example

2688	   Content-Type:application/ssml+xml
2689	   Content-Length:...

2691	   <?xml version="1.0"?>
2692	        <speak version="1.0"
2693	               xmlns="http://www.w3.org/2001/10/synthesis"
2694	               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
2695	               xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
2696	                   http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
2697	               xml:lang="en-US">
2698	          <p>
2699	            <s>You have 4 new messages.</s>
2700	            <s>The first is from Aldine Turnbet
2701	            and arrived at <break/>
2702	            <say-as interpret-as="vxml:time">0345p</say-as>.</s>

2704	            <s>The subject is <prosody
2705	            rate="-20%">ski trip</prosody></s>
2706	         </p>
2707	        </speak>

2709	                               SSML Example

2711	   Content-Type:multipart/mixed; boundary="break"

2713	   --break
2714	   Content-Type:text/uri-list
2715	   Content-Length:...

2717	   http://www.example.com/ASR-Introduction.ssml
2718	   http://www.example.com/ASR-Document-Part1.ssml
2719	   http://www.example.com/ASR-Document-Part2.ssml
2720	   http://www.example.com/ASR-Conclusion.ssml

2722	   --break
2723	   Content-Type:application/ssml+xml
2724	   Content-Length:...

2726	   <?xml version="1.0"?>
2727	       <speak version="1.0"
2728	              xmlns="http://www.w3.org/2001/10/synthesis"
2729	              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
2730	              xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
2731	                   http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
2732	              xml:lang="en-US">
2733	          <p>
2734	            <s>You have 4 new messages.</s>
2735	            <s>The first is from Stephanie Williams
2736	            and arrived at <break/>
2737	            <say-as interpret-as="vxml:time">0342p</say-as>.</s>

2739	            <s>The subject is <prosody
2740	            rate="-20%">ski trip</prosody></s>
2741	          </p>
2742	       </speak>
2743	   --break--

2745	                             Multipart Example

2747	8.5.2.  Lexicon Data

2749	   Synthesizer lexicon data from the client to the server can be
2750	   provided inline or by reference.  Either way they are carried as
2751	   typed media in the message body of the MRCPv2 request message (see
2752	   Section 8.14).

2754	   When a lexicon is specified in-line in the message, the client MUST
2755	   provide a Content-ID for that lexicon as part of the content header
2756	   fields.  The server MUST store the lexicon associated with that
2757	   Content-ID for the duration of the session.  A stored lexicon can be
2758	   overwritten by defining a new lexicon with the same Content-ID.

2760	   Lexicons that have been associated with a Content-ID can be
2761	   referenced through the "session" URI scheme (see Section 13.6).

2763	   If lexicon data is specified by external URI reference, the Media
2764	   Type text/uri-list (see RFC2483 [RFC2483]) is used to list the one
2765	   or more URIs that may be dereferenced to obtain the lexicon data.
2766	   All MRCPv2 servers MUST support the "http" and "https" URI access
2767	   mechanisms, and MAY support other mechanisms.

2769	   If the data in the message body consists of a mix of URI and inline
2770	   lexicon data the multipart/mixed Media Type is used.  The character
2771	   set and encoding used in the lexicon data may be specified according
2772	   to standard Media Type definitions.

2774	8.6.  SPEAK Method

2776	   The SPEAK Request provides the synthesizer resource with the speech
2777	   text and initiates speech synthesis and streaming.  The SPEAK method
2778	   MAY carry voice and prosody header fields that alter the behavior of
2779	   the voice being synthesized, as well as a typed media message body
2780	   containing the actual marked-up text to be spoken.

2782	   The SPEAK method implementation MUST do a fetch of all external URIs
2783	   that are part of that operation.  If caching is implemented, this URI
2784	   fetching MUST conform to the cache control hints and parameter header
2785	   fields associated with the method in deciding whether it is to be
2786	   fetched from cache or from the external server.  If these hints/
2787	   parameters are not specified in the method, the values set for the
2788	   session using SET-PARAMS/GET-PARAMS apply.  If they were not set for
2789	   the session, their default values apply.

2791	   When applying voice parameters there are three levels of precedence.
2792	   Highest precedence goes to those specified within the speech markup
2793	   text, followed by those specified in the header fields of the SPEAK
2794	   request, which apply to that SPEAK request only, followed by the
2795	   session default values, which can be set using the SET-PARAMS request
2796	   and apply to subsequent methods invoked during the session.
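
   The three precedence levels can be pictured as a layered merge.  The
   sketch below is illustrative only; the function name and the use of
   dictionaries are assumptions, and in practice values set in the
   markup apply only to the marked-up portion of the text rather than
   to the whole request.

```python
def effective_voice_params(session_defaults, speak_headers, markup_values):
    """Merge voice parameters from lowest to highest precedence."""
    merged = dict(session_defaults)   # lowest: set with SET-PARAMS
    merged.update(speak_headers)      # middle: header fields of this SPEAK
    merged.update(markup_values)      # highest: values in the speech markup
    return merged
```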

2798	   If the resource was idle at the time the SPEAK request arrived at the
2799	   server and the SPEAK method is being actively processed, the resource
2800	   responds immediately with a success status code and a request-state
2801	   of IN-PROGRESS.

2803	   If the resource is in the speaking or paused state when the SPEAK
2804	   method arrives at the server, i.e. it is in the middle of processing
2805	   a previous SPEAK request, the status returns success with a request-
2806	   state of PENDING.  The server places the SPEAK request in the
2807	   synthesizer resource request queue.  The request queue operates
2808	   strictly FIFO: requests are processed serially in order of receipt.
2809	   If the current SPEAK fails, all SPEAK methods in the pending queue
2810	   are cancelled and each generates a SPEAK-COMPLETE event with a
2811	   Completion-Cause of "cancelled".

2813	   For the synthesizer resource, SPEAK is the only method that can
2814	   return a request-state of IN-PROGRESS or PENDING.  When the text has
2815	   been synthesized and played into the media stream, the resource
2816	   issues a SPEAK-COMPLETE event with the request-id of the SPEAK
2817	   request and a request-state of COMPLETE.
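
   The queueing behavior described above might be modeled as follows.
   This is a minimal sketch, not a server implementation: the class
   SpeakQueue, its method names, and the use of "error" as the cause of
   the failed request are assumptions made for the example.

```python
from collections import deque


class SpeakQueue:
    """Toy model of the SPEAK request queue of one synthesizer resource."""

    def __init__(self):
        self.active = None       # request-id of the IN-PROGRESS SPEAK
        self.pending = deque()   # request-ids of PENDING SPEAKs, FIFO

    def speak(self, request_id):
        """Accept a SPEAK and return the request-state of the response."""
        if self.active is None:
            self.active = request_id
            return "IN-PROGRESS"
        self.pending.append(request_id)
        return "PENDING"

    def finish(self, failed=False):
        """End the active SPEAK.

        Returns the SPEAK-COMPLETE events to send, as a list of
        (request_id, completion_cause_name) tuples.
        """
        if self.active is None:
            raise RuntimeError("no active SPEAK request")
        events = [(self.active, "error" if failed else "normal")]
        if failed:
            # A failure cancels every SPEAK still waiting in the queue.
            events.extend((rid, "cancelled") for rid in self.pending)
            self.pending.clear()
            self.active = None
        else:
            self.active = self.pending.popleft() if self.pending else None
        return events
```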

2819	   C->S: MRCP/2.0 ... SPEAK 543257
2820	         Channel-Identifier:32AECB23433802@speechsynth
2821	         Voice-gender:neutral
2822	         Voice-Age:25
2823	         Prosody-volume:medium
2824	         Content-Type:application/ssml+xml
2825	         Content-Length:...

2827	         <?xml version="1.0"?>
2828	            <speak version="1.0"
2829	                xmlns="http://www.w3.org/2001/10/synthesis"
2830	                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
2831	                xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
2832	                   http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
2833	                xml:lang="en-US">
2834	            <p>
2835	             <s>You have 4 new messages.</s>
2836	             <s>The first is from Stephanie Williams and arrived at
2837	                <break/>
2838	                <say-as interpret-as="vxml:time">0342p</say-as>.
2839	                </s>
2840	             <s>The subject is
2841	                    <prosody rate="-20%">ski trip</prosody>
2842	             </s>
2843	            </p>
2844	           </speak>

2846	   S->C: MRCP/2.0 ... 543257 200 IN-PROGRESS
2847	         Channel-Identifier:32AECB23433802@speechsynth
2848	         Speech-Marker:timestamp=857206027059

2850	   S->C: MRCP/2.0 ... SPEAK-COMPLETE 543257 COMPLETE
2851	         Channel-Identifier:32AECB23433802@speechsynth
2852	         Completion-Cause:000 normal
2853	         Speech-Marker:timestamp=857206027059

2854	                               SPEAK Example

2856	8.7.  STOP

2858	   The STOP method from the client to the server tells the synthesizer
2859	   resource to stop speaking if it is speaking something.

2861	   The STOP request can be sent with an Active-Request-Id-List header
2862	   field to stop the zero or more specific SPEAK requests that may be in
2863	   queue and return a response status-code of 200 (Success).  If no
2864	   Active-Request-Id-List header field is sent in the STOP request the
2865	   server terminates all outstanding SPEAK requests.

2867	   If a STOP request successfully terminated one or more PENDING or IN-
2868	   PROGRESS SPEAK requests, then the response MUST contain an Active-
2869	   Request-Id-List header field enumerating the SPEAK request-ids that
2870	   were terminated.  Otherwise there is no Active-Request-Id-List header
2871	   field in the response.  No SPEAK-COMPLETE events are sent for such
2872	   terminated requests.

2874	   If a SPEAK request that was IN-PROGRESS and speaking was stopped, the
2875	   next pending SPEAK request, if any, becomes IN-PROGRESS at the
2876	   resource and enters the speaking state.

2878	   If a SPEAK request that was IN-PROGRESS and paused was stopped, the
2879	   next pending SPEAK request, if any, becomes IN-PROGRESS and enters
2880	   the paused state.
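
   The STOP rules above can be summarized in a small model.  The sketch
   is illustrative only: the class and attribute names are assumptions,
   and an empty return value stands for a response that carries no
   Active-Request-Id-List header field.

```python
from collections import deque


class SynthesizerModel:
    """Toy model of STOP handling for one synthesizer resource."""

    def __init__(self, active=None, pending=(), paused=False):
        self.active = active            # IN-PROGRESS request-id or None
        self.pending = deque(pending)   # PENDING request-ids, FIFO
        self.paused = paused            # True in the paused state

    def stop(self, request_ids=None):
        """Handle STOP; return the request-ids that were terminated.

        request_ids is the Active-Request-Id-List of the STOP request,
        or None to terminate all outstanding SPEAK requests.
        """
        outstanding = list(self.pending)
        if self.active is not None:
            outstanding.insert(0, self.active)
        targets = set(outstanding if request_ids is None else request_ids)
        terminated = [rid for rid in outstanding if rid in targets]
        self.pending = deque(r for r in self.pending if r not in targets)
        if self.active in targets:
            # The next pending SPEAK takes over and keeps the current
            # speaking or paused state; with nothing left, go idle.
            self.active = self.pending.popleft() if self.pending else None
            if self.active is None:
                self.paused = False
        return terminated
```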

2882	   C->S: MRCP/2.0 ... SPEAK 543258
2883	         Channel-Identifier:32AECB23433802@speechsynth
2884	         Content-Type:application/ssml+xml
2885	         Content-Length:...

2887	         <?xml version="1.0"?>
2888	           <speak version="1.0"
2889	                xmlns="http://www.w3.org/2001/10/synthesis"
2890	                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
2891	                xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
2892	                   http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
2893	                xml:lang="en-US">
2894	            <p>
2895	             <s>You have 4 new messages.</s>
2896	             <s>The first is from Stephanie Williams and arrived at
2897	                <break/>
2898	                <say-as interpret-as="vxml:time">0342p</say-as>.</s>
2899	             <s>The subject is
2900	                 <prosody rate="-20%">ski trip</prosody></s>
2901	            </p>
2902	           </speak>

2904	   S->C: MRCP/2.0 ... 543258 200 IN-PROGRESS
2905	         Channel-Identifier:32AECB23433802@speechsynth
2906	         Speech-Marker:timestamp=857206027059

2908	   C->S: MRCP/2.0 ... STOP 543259
2909	         Channel-Identifier:32AECB23433802@speechsynth

2911	   S->C: MRCP/2.0 ... 543259 200 COMPLETE
2912	         Channel-Identifier:32AECB23433802@speechsynth
2913	         Active-Request-Id-List:543258
2914	         Speech-Marker:timestamp=857206039059

2916	                               STOP Example

2918	8.8.  BARGE-IN-OCCURRED

2920	   The BARGE-IN-OCCURRED method, when used with the synthesizer
2921	   resource, provides a client that has detected a barge-in-able event
2922	   a means to communicate the occurrence of the event to the synthesizer
2923	   resource.

2925	   This method is useful in two scenarios:
2926	   1.  The client has detected DTMF digits in the input media or some
2927	       other barge-in-able event and wants to communicate that to the
2928	       synthesizer resource.

2930	   2.  The recognizer resource and the synthesizer resource are in
2931	       different servers.  In this case the client acts as an
2932	       intermediary for the two servers.  It receives an event from the
2933	       recognition resource and sends a BARGE-IN-OCCURRED request to the
2934	       synthesizer.  In such cases, the BARGE-IN-OCCURRED method would
2935	       also have a Proxy-Sync-Id header field received from the resource
2936	       generating the original event.

2938	   If a SPEAK request is active with kill-on-barge-in enabled (see
2939	   Section 8.4.2), and the BARGE-IN-OCCURRED request is received, the
2940	   synthesizer MUST immediately stop streaming out audio.  It MUST also
2941	   terminate any speech requests queued behind the current active one,
2942	   irrespective of whether they have barge-in enabled or not.  If a
2943	   barge-in-able SPEAK request was playing and it was terminated, the
2944	   response MUST contain an Active-Request-Id-List header field listing
2945	   the request-ids of all SPEAK requests that were terminated.  The
2946	   server generates no SPEAK-COMPLETE events for these requests.

2948	   If there were no SPEAK requests terminated by the synthesizer
2949	   resource as a result of the BARGE-IN-OCCURRED method, the server MUST
2950	   respond to the BARGE-IN-OCCURRED with a status-code of 200 "Success",
2951	   and the response MUST NOT contain an Active-Request-Id-List header
2952	   field.

2954	   If the synthesizer and recognizer resources are part of the same
2955	   MRCPv2 session, they can be optimized for a quicker kill-on-barge-in
2956	   response if the recognizer and synthesizer interact directly.  In
2957	   these cases, the client MUST still react to a START-OF-INPUT event
2958	   from the recognizer by invoking the BARGE-IN-OCCURRED method to the
2959	   synthesizer.  The client MUST invoke the BARGE-IN-OCCURRED if it has
2960	   any outstanding requests to the synthesizer resource in either the
2961	   PENDING or IN-PROGRESS state.

2963	   C->S: MRCP/2.0 ... SPEAK 543258
2964	         Channel-Identifier:32AECB23433802@speechsynth
2965	         Voice-gender:neutral
2966	         Voice-Age:25
2967	         Prosody-volume:medium
2968	         Content-Type:application/ssml+xml
2969	         Content-Length:...

2971	         <?xml version="1.0"?>
2972	           <speak version="1.0"
2973	                xmlns="http://www.w3.org/2001/10/synthesis"
2974	                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
2975	                xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
2976	                   http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
2977	                xml:lang="en-US">
2978	            <p>
2979	             <s>You have 4 new messages.</s>
2980	             <s>The first is from Stephanie Williams and arrived at
2981	                <break/>
2982	                <say-as interpret-as="vxml:time">0342p</say-as>.</s>
2983	             <s>The subject is
2984	                <prosody rate="-20%">ski trip</prosody></s>
2985	            </p>
2986	           </speak>

2988	   S->C: MRCP/2.0 ... 543258 200 IN-PROGRESS
2989	         Channel-Identifier:32AECB23433802@speechsynth
2990	         Speech-Marker:timestamp=857206027059

2992	   C->S: MRCP/2.0 ... BARGE-IN-OCCURRED 543259
2993	         Channel-Identifier:32AECB23433802@speechsynth
2994	         Proxy-Sync-Id:987654321

2996	   S->C: MRCP/2.0 ... 543259 200 COMPLETE
2997	         Channel-Identifier:32AECB23433802@speechsynth
2998	         Active-Request-Id-List:543258
2999	         Speech-Marker:timestamp=857206039059

3001	                         BARGE-IN-OCCURRED Example
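
   The kill-on-barge-in behavior described in this section can be
   sketched as follows.  This is a minimal illustration, not part of
   the protocol: the SpeakRequest and SynthesizerQueue names, the
   dictionary used for the response, and the convention that the first
   queue entry is the active request are all hypothetical.  Rendering
   Active-Request-Id-List as a comma-separated string is likewise an
   assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpeakRequest:
    request_id: int
    kill_on_barge_in: bool = True


@dataclass
class SynthesizerQueue:
    """Hypothetical model: index 0 is the active SPEAK, the rest are queued."""
    requests: List[SpeakRequest] = field(default_factory=list)

    def barge_in_occurred(self) -> dict:
        """Return the fields of the response to BARGE-IN-OCCURRED."""
        response = {"status_code": 200, "request_state": "COMPLETE"}
        active = self.requests[0] if self.requests else None
        if active is not None and active.kill_on_barge_in:
            # Stop the active request and everything queued behind it,
            # whether or not the queued requests are barge-in-able.
            terminated = [r.request_id for r in self.requests]
            self.requests.clear()
            response["Active-Request-Id-List"] = ",".join(
                str(request_id) for request_id in terminated)
        # Otherwise nothing was terminated, so the header is omitted.
        return response
```

   No SPEAK-COMPLETE events are modeled here, which mirrors the rule
   that the server generates none for requests terminated this way.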

3003	8.9.  PAUSE

3005	   The PAUSE method from the client to the server tells the synthesizer
3006	   resource to pause speech output if it is currently speaking.  If a
3007	   PAUSE request is issued on a session when no SPEAK is active, the
3008	   server MUST respond with a status-code of 402 "Method not valid in
3009	   this state".  If a PAUSE request is issued on a session when a SPEAK
3010	   is active and already paused, the server MUST respond with a
3011	   status-code of 200 "Success".  If a SPEAK request was active, the
3012	   server MUST return an Active-Request-Id-List header field whose value
3013	   contains the request-id of the SPEAK request that was paused.

3015	   C->S: MRCP/2.0 ... SPEAK 543258
3016	         Channel-Identifier:32AECB23433802@speechsynth
3017	         Voice-gender:neutral
3018	         Voice-Age:25
3019	         Prosody-volume:medium
3020	         Content-Type:application/ssml+xml
3021	         Content-Length:...

3023	         <?xml version="1.0"?>
3024	           <speak version="1.0"
3025	                xmlns="http://www.w3.org/2001/10/synthesis"
3026	                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
3027	                xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
3028	                   http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
3029	                xml:lang="en-US">
3030	            <p>
3031	             <s>You have 4 new messages.</s>
3032	             <s>The first is from Stephanie Williams and arrived at
3033	                <break/>
3034	                <say-as interpret-as="vxml:time">0342p</say-as>.</s>

3036	             <s>The subject is
3037	                <prosody rate="-20%">ski trip</prosody></s>
3038	            </p>
3039	           </speak>

3041	   S->C: MRCP/2.0 ... 543258 200 IN-PROGRESS
3042	         Channel-Identifier:32AECB23433802@speechsynth
3043	         Speech-Marker:timestamp=857206027059

3045	   C->S: MRCP/2.0 ... PAUSE 543259
3046	         Channel-Identifier:32AECB23433802@speechsynth

3048	   S->C: MRCP/2.0 ... 543259 200 COMPLETE
3049	         Channel-Identifier:32AECB23433802@speechsynth
3050	         Active-Request-Id-List:543258

3052	                               PAUSE Example

3054	8.10.  RESUME

3056	   The RESUME method from the client to the server tells a paused
3057	   synthesizer resource to resume speaking.  If a RESUME request is
3058	   issued on a session with no active SPEAK request, the server MUST
3059	   respond with a status-code of 402 "Method not valid in this state".
3060	   If a RESUME request is issued on a session with an active SPEAK
3061	   request that is speaking (i.e., not paused), the server MUST respond
3062	   with a status-code of 200 "Success".  If a SPEAK request was paused,
3063	   the server MUST return an Active-Request-Id-List header field whose
3064	   value contains the request-id of the SPEAK request that was resumed.

3066	   C->S: MRCP/2.0 ... SPEAK 543258
3067	         Channel-Identifier:32AECB23433802@speechsynth
3068	         Voice-gender:neutral
3069	         Voice-age:25
3070	         Prosody-volume:medium
3071	         Content-Type:application/ssml+xml
3072	         Content-Length:...

3074	         <?xml version="1.0"?>
3075	           <speak version="1.0"
3076	                xmlns="http://www.w3.org/2001/10/synthesis"
3077	                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
3078	                xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
3079	                   http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
3080	                xml:lang="en-US">
3081	            <p>
3082	             <s>You have 4 new messages.</s>
3083	             <s>The first is from Stephanie Williams and arrived at
3084	                <break/>
3085	                <say-as interpret-as="vxml:time">0342p</say-as>.</s>
3086	             <s>The subject is
3087	                <prosody rate="-20%">ski trip</prosody></s>
3088	            </p>
3089	           </speak>

3091	   S->C: MRCP/2.0 ... 543258 200 IN-PROGRESS
3092	         Channel-Identifier:32AECB23433802@speechsynth
3093	         Speech-Marker:timestamp=857206027059

3095	   C->S: MRCP/2.0 ... PAUSE 543259
3096	         Channel-Identifier:32AECB23433802@speechsynth

3098	   S->C: MRCP/2.0 ... 543259 200 COMPLETE
3099	         Channel-Identifier:32AECB23433802@speechsynth
3100	         Active-Request-Id-List:543258

3102	   C->S: MRCP/2.0 ... RESUME 543260
3103	         Channel-Identifier:32AECB23433802@speechsynth

3105	   S->C: MRCP/2.0 ... 543260 200 COMPLETE
3106	         Channel-Identifier:32AECB23433802@speechsynth
3107	         Active-Request-Id-List:543258

3109	                              RESUME Example
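
   The status-code rules for PAUSE and RESUME are symmetric and can be
   sketched together.  The Synthesizer class, its state names, and the
   dictionary responses below are hypothetical.  Returning
   Active-Request-Id-List even when the request was already in the
   target state is an assumption of this sketch; the text only requires
   the 200 "Success" status-code in that case.

```python
from typing import Optional

IDLE, SPEAKING, PAUSED = "idle", "speaking", "paused"


class Synthesizer:
    """Hypothetical synthesizer tracking a single active SPEAK request."""

    def __init__(self) -> None:
        self.state = IDLE
        self.active_request_id: Optional[int] = None

    def speak(self, request_id: int) -> None:
        self.state = SPEAKING
        self.active_request_id = request_id

    def _switch(self, target: str) -> dict:
        if self.state == IDLE:
            # No active SPEAK: 402 "Method not valid in this state".
            return {"status_code": 402}
        # Already being in the target state is still a success.
        self.state = target
        return {"status_code": 200,
                "Active-Request-Id-List": str(self.active_request_id)}

    def pause(self) -> dict:
        return self._switch(PAUSED)

    def resume(self) -> dict:
        return self._switch(SPEAKING)
```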

3111	8.11.  CONTROL

3113	   The CONTROL method from the client to the server tells a synthesizer
3114	   that is speaking to modify what it is speaking on the fly.  This
3115	   method is used to request the synthesizer to jump forward or backward
3116	   in what it is speaking, change speaker rate, speaker parameters, etc.
3117	   It affects only the currently IN-PROGRESS SPEAK request.  Depending
3118	   on its implementation and capabilities, the synthesizer resource
3119	   may or may not support the various modifications indicated by header
3120	   fields in the CONTROL request.

3122	   When a client invokes a CONTROL method to jump forward and the
3123	   operation goes beyond the end of the active SPEAK method's text, the
3124	   CONTROL request still succeeds.  The active SPEAK request completes
3125	   and returns a SPEAK-COMPLETE event following the response to the
3126	   CONTROL method.  If there are more SPEAK requests in the queue, the
3127	   synthesizer resource starts at the beginning of the next SPEAK
3128	   request in the queue.

3130	   When a client invokes a CONTROL method to jump backward and the
3131	   operation jumps to the beginning or beyond the beginning of the
3132	   speech data of the active SPEAK method, the CONTROL request still
3133	   succeeds.  The response to the CONTROL request contains the
3134	   Speak-Restart header field, and the active SPEAK request restarts
3135	   from the beginning of its speech data.

3137	   These two behaviors can be used to rewind or fast-forward across
3138	   multiple speech requests, if the client wants to break up a speech
3139	   markup text into multiple SPEAK requests.

3141	   If a SPEAK request was active when the CONTROL method was received,
3142	   the server MUST return an Active-Request-Id-List header field
3143	   containing the request-id of the SPEAK request that was active.

3145	   C->S: MRCP/2.0 ... SPEAK 543258
3146	         Channel-Identifier:32AECB23433802@speechsynth
3147	         Voice-gender:neutral
3148	         Voice-age:25
3149	         Prosody-volume:medium
3150	         Content-Type:application/ssml+xml
3151	         Content-Length:...

3153	         <?xml version="1.0"?>
3154	           <speak version="1.0"
3155	                xmlns="http://www.w3.org/2001/10/synthesis"
3156	                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
3157	                xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
3158	                   http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
3159	                xml:lang="en-US">
3160	            <p>
3161	             <s>You have 4 new messages.</s>
3162	             <s>The first is from Stephanie Williams
3163	                and arrived at <break/>
3164	                <say-as interpret-as="vxml:time">0342p</say-as>.</s>

3166	             <s>The subject is <prosody
3167	                rate="-20%">ski trip</prosody></s>
3168	            </p>
3169	           </speak>

3171	   S->C: MRCP/2.0 ... 543258 200 IN-PROGRESS
3172	         Channel-Identifier:32AECB23433802@speechsynth
3173	         Speech-Marker:timestamp=857205016059

3175	   C->S: MRCP/2.0 ... CONTROL 543259
3176	         Channel-Identifier:32AECB23433802@speechsynth
3177	         Prosody-rate:fast

3179	   S->C: MRCP/2.0 ... 543259 200 COMPLETE
3180	         Channel-Identifier:32AECB23433802@speechsynth
3181	         Active-Request-Id-List:543258
3182	         Speech-Marker:timestamp=857206027059

3184	   C->S: MRCP/2.0 ... CONTROL 543260
3185	         Channel-Identifier:32AECB23433802@speechsynth
3186	         Jump-Size:-15 Words

3188	   S->C: MRCP/2.0 ... 543260 200 COMPLETE
3189	         Channel-Identifier:32AECB23433802@speechsynth
3190	         Active-Request-Id-List:543258
3191	         Speech-Marker:timestamp=857206039059
3192	                              CONTROL Example
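
   The jump rules of this section (overshooting the end completes the
   active SPEAK; reaching or overshooting the beginning restarts it)
   can be sketched as below.  The Speak class, the control_jump helper,
   the abstract "length" and "position" counters, and the "true" value
   shown for Speak-Restart are all assumptions made for illustration.
   Treating a jump that lands exactly on the end as a completion is
   also a choice of this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Speak:
    request_id: int
    length: int          # total length in jump units (for example, words)
    position: int = 0    # current offset within this request


def control_jump(queue: List[Speak], offset: int) -> dict:
    """Apply a relative jump to the active SPEAK, which is queue[0].

    Returns the CONTROL response fields plus the events that would
    follow the response.
    """
    active = queue[0]
    response = {"status_code": 200,
                "Active-Request-Id-List": str(active.request_id)}
    events = []
    target = active.position + offset
    if offset < 0 and target <= 0:
        # Jumped to or past the beginning: restart and report it.
        active.position = 0
        response["Speak-Restart"] = "true"
    elif target >= active.length:
        # Jumped past the end: the SPEAK completes after the response,
        # and the next queued SPEAK (if any) starts from its beginning.
        queue.pop(0)
        events.append(("SPEAK-COMPLETE", active.request_id))
    else:
        active.position = target
    return {"response": response, "events": events}
```

   Because both edge cases still return a 200 response, a client can
   chain jumps across several queued SPEAK requests without special
   error handling.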

3194	8.12.  SPEAK-COMPLETE

3196	   This is an Event message from the synthesizer resource to the client
3197	   indicating that the corresponding SPEAK request was completed.  The
3198	   request-id field matches the request-id of the SPEAK request that
3199	   initiated the speech that just completed.  The request-state field is
3200	   set to COMPLETE by the server, indicating that this is the last event
3201	   with the corresponding request-id.  The Completion-Cause header field
3202	   specifies the cause code giving the status and reason for request
3203	   completion, such as whether the SPEAK completed normally or because
3204	   of an error, kill-on-barge-in, etc.

3206	   C->S: MRCP/2.0 ... SPEAK 543260
3207	         Channel-Identifier:32AECB23433802@speechsynth
3208	         Voice-gender:neutral
3209	         Voice-age:25
3210	         Prosody-volume:medium
3211	         Content-Type:application/ssml+xml
3212	         Content-Length:...

3214	         <?xml version="1.0"?>
3215	           <speak version="1.0"
3216	                xmlns="http://www.w3.org/2001/10/synthesis"
3217	                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
3218	                xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
3219	                   http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
3220	                xml:lang="en-US">
3221	            <p>
3222	             <s>You have 4 new messages.</s>
3223	             <s>The first is from Stephanie Williams
3224	                and arrived at <break/>
3225	                <say-as interpret-as="vxml:time">0342p</say-as>.</s>
3226	             <s>The subject is
3227	                <prosody rate="-20%">ski trip</prosody></s>
3228	            </p>
3229	           </speak>

3231	   S->C: MRCP/2.0 ... 543260 200 IN-PROGRESS
3232	         Channel-Identifier:32AECB23433802@speechsynth
3233	         Speech-Marker:timestamp=857206027059

3235	   S->C: MRCP/2.0 ... SPEAK-COMPLETE 543260 COMPLETE
3236	         Channel-Identifier:32AECB23433802@speechsynth
3237	         Completion-Cause:000 normal
3238	         Speech-Marker:timestamp=857206039059
3239	                          SPEAK-COMPLETE Example

3241	8.13.  SPEECH-MARKER

3243	   This is an event generated by the synthesizer resource to the client
3244	   when the synthesizer encounters a marker tag in the speech markup it
3245	   is currently processing.  The value of the request-id field MUST
3246	   match that of the corresponding SPEAK request.  The request-state
3247	   field MUST have the value "IN-PROGRESS" as the speech is still not
3248	   complete.  The value of the speech marker tag hit, describing where
3249	   the synthesizer is in the speech markup, MUST be returned in the
3250	   Speech-Marker header field, along with an NTP timestamp indicating
3251	   the instant in the output speech stream that the marker was
3252	   encountered.  The SPEECH-MARKER event MUST also be generated with a
3253	   null marker value and output NTP timestamp when a SPEAK request in
3254	   PENDING state (i.e., in the queue) changes state to IN-PROGRESS and
3255	   starts speaking.  The NTP timestamp MUST be synchronized with the RTP
3256	   timestamp used to generate the speech stream through standard RTCP
3257	   machinery.

3259	   C->S: MRCP/2.0 ... SPEAK 543261
3260	         Channel-Identifier:32AECB23433802@speechsynth
3261	         Voice-gender:neutral
3262	         Voice-age:25
3263	         Prosody-volume:medium
3264	         Content-Type:application/ssml+xml
3265	         Content-Length:...

3267	         <?xml version="1.0"?>
3268	           <speak version="1.0"
3269	                xmlns="http://www.w3.org/2001/10/synthesis"
3270	                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
3271	                xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
3272	                   http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
3273	                xml:lang="en-US">
3274	            <p>
3275	             <s>You have 4 new messages.</s>
3276	             <s>The first is from Stephanie Williams
3277	                and arrived at <break/>
3278	                <say-as interpret-as="vxml:time">0342p</say-as>.</s>
3279	                <mark name="here"/>
3280	             <s>The subject is
3281	                <prosody rate="-20%">ski trip</prosody>
3282	             </s>
3283	             <mark name="ANSWER"/>
3284	            </p>
3285	           </speak>

3287	   S->C: MRCP/2.0 ... 543261 200 IN-PROGRESS
3288	         Channel-Identifier:32AECB23433802@speechsynth
3289	         Speech-Marker:timestamp=857205015059

3291	   S->C: MRCP/2.0 ... SPEECH-MARKER 543261 IN-PROGRESS
3292	         Channel-Identifier:32AECB23433802@speechsynth
3293	         Speech-Marker:timestamp=857206027059;here

3295	   S->C: MRCP/2.0 ... SPEECH-MARKER 543261 IN-PROGRESS
3296	         Channel-Identifier:32AECB23433802@speechsynth
3297	         Speech-Marker:timestamp=857206039059;ANSWER

3299	   S->C: MRCP/2.0 ... SPEAK-COMPLETE 543261 COMPLETE
3300	         Channel-Identifier:32AECB23433802@speechsynth
3301	         Completion-Cause:000 normal
3302	         Speech-Marker:timestamp=857207689259;ANSWER

3304	                           SPEECH-MARKER Example
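
   A client consuming these events needs to split the Speech-Marker
   value into its timestamp and marker name.  The sketch below is
   inferred only from the example values shown in this section
   ("timestamp=<digits>" optionally followed by ";" and a name); the
   parse_speech_marker helper is hypothetical and is not a substitute
   for the header field's formal syntax.

```python
from typing import Optional, Tuple


def parse_speech_marker(value: str) -> Tuple[int, Optional[str]]:
    """Split a Speech-Marker value into (timestamp, marker name or None)."""
    timestamp_part, sep, marker = value.strip().partition(";")
    key, eq, digits = timestamp_part.partition("=")
    if key != "timestamp" or not eq:
        raise ValueError(f"malformed Speech-Marker value: {value!r}")
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"malformed timestamp in: {value!r}")
    # An absent or empty name corresponds to the null marker case.
    return int(digits), (marker if sep and marker else None)
```

   For the events in the example above, the first SPEECH-MARKER value
   yields the name "here" and the IN-PROGRESS response yields no name.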

3306	8.14.  DEFINE-LEXICON

3308	   The DEFINE-LEXICON method, from the client to the server, provides a
3309	   lexicon and tells the server to load or unload the lexicon (see
3310	   Section 8.4.16).  The media type of the lexicon is provided in the
3311	   Content-Type header (see Section 8.5.2).  One such media type is PLS
3312	   [W3C.REC-pronunciation-lexicon-20081014].

3314	   If the server resource is in the speaking or paused state, the server
3315	   MUST respond with a failure status-code of 402 "Method not valid in
3316	   this state".

3318	   If the resource is in the idle state and is able to successfully
3319	   load/unload the lexicon, the server MUST return a 200 "Success"
3320	   status-code and the request-state MUST be COMPLETE.

3322	   If the synthesizer could not define the lexicon for some reason, for
3323	   example because the download failed or the lexicon was in an
3324	   unsupported form, the server MUST respond with a failure status-code
3325	   of 407, and a Completion-Cause header field describing the failure
3326	   reason.
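
   The three outcomes above amount to a small decision table.  In this
   sketch the define_lexicon_response helper, the lowercase state
   names, and the failure_cause argument are hypothetical; the actual
   Completion-Cause values are defined elsewhere and are not guessed
   here.

```python
from typing import Optional


def define_lexicon_response(resource_state: str, defined_ok: bool,
                            failure_cause: Optional[str] = None) -> dict:
    """Pick the response fields for a DEFINE-LEXICON request.

    resource_state is "idle", "speaking", or "paused" (names chosen
    for this sketch); failure_cause is supplied by the caller.
    """
    if resource_state in ("speaking", "paused"):
        # 402 "Method not valid in this state".
        return {"status_code": 402}
    if defined_ok:
        return {"status_code": 200, "request_state": "COMPLETE"}
    # The lexicon could not be defined (download failure, bad format).
    return {"status_code": 407, "Completion-Cause": failure_cause}
```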

3328	9.  Speech Recognizer Resource

3330	   The speech recognizer resource receives an incoming voice stream and
3331	   provides the client with an interpretation of what was spoken in
3332	   textual form.

3334	   The recognizer resource is controlled by MRCPv2 requests from the
3335	   client.  The recognizer resource can both respond to these requests
3336	   and generate asynchronous events to the client to indicate conditions
3337	   of interest during the processing of the method.

3339	   This section applies to the following resource types:
3340	   1.  speechrecog
3341	   2.  dtmfrecog

3343	   The difference between the above two resources is in their level of
3344	   support for recognition grammars.  The "dtmfrecog" resource type is
3345	   capable of recognizing only DTMF digits and hence accepts only DTMF
3346	   grammars.  It only generates barge-in for DTMF inputs and ignores
3347	   speech.  The "speechrecog" resource type can recognize regular speech
3348	   as well as DTMF digits and hence MUST support grammars describing
3349	   either speech or DTMF.  This resource generates barge-in events for
3350	   speech and/or DTMF.  By analyzing the grammars that are activated by
3351	   the RECOGNIZE method, it determines if a barge-in should occur for
3352	   speech and/or DTMF.  When the recognizer decides it needs to generate
3353	   barge-in it also generates a START-OF-INPUT event to the client.  The
3354	   recognition resource MAY support recognition in the normal or hotword
3355	   modes or both (although note that a single speechrecog resource does
3356	   not perform normal and hotword mode recognition simultaneously).  For
3357	   implementations where a single recognition resource does not support
3358	   both modes, or simultaneous normal and hotword recognition is
3359	   desired, the two modes can be invoked through separate resources
3360	   allocated to the same SIP dialog (with different MRCP session
3361	   identifiers) and sharing the RTP audio feed.

3363	   The capabilities of the recognition resource are enumerated below:

3365	   Normal Mode Recognition  Normal mode recognition tries to match all
3366	      of the speech or DTMF against the grammar and returns a no-match
3367	      status if the input fails to match or the method times out.
3368	   Hotword Mode Recognition  In hotword mode, the recognizer looks for
3369	      a match against a specific speech grammar or DTMF sequence and
3370	      ignores speech or DTMF that does not match.  The recognition
3371	      completes only on a successful grammar match, if the client
3372	      cancels the request, or if there is a no-input or recognition
3373	      timeout.
3374	   Voice Enrolled Grammars  A recognition resource MAY support Voice
3375	      Enrolled Grammars.  With this functionality, enrollment is
3376	      performed using a person's voice.  For example, a list of
3377	      contacts can be created and maintained by recording each
3378	      contact's name in the caller's voice.  This technique is
3379	      sometimes also called speaker-dependent recognition.
3380	   Interpretation  A recognition resource MAY be employed strictly for
3381	      its natural language interpretation capabilities by supplying it
3382	      with a text string as input instead of speech.  In this mode the
3383	      resource takes text as input and produces an "interpretation" of
3384	      the input according to the supplied grammar.

3386	   Voice Enrollment has the concept of an enrollment session.  A session
3387	   to add a new phrase to a personal grammar involves the initial
3388	   enrollment followed by a repeat of enough utterances before
3389	   committing the new phrase to the personal grammar.  Each time an
3390	   utterance is recorded, it is compared for similarity with the other
3391	   samples and a clash test is performed against other entries in the
3392	   personal grammar to ensure there are no similar and confusable
3393	   entries.

3395	   Enrollment is done using a recognizer resource.  Controlling which
3396	   utterances are to be considered for enrollment of a new phrase is
3397	   done by setting a header field (see Section 9.4.39) in the RECOGNIZE
3398	   request.

3400	   Interpretation is accomplished through the INTERPRET method
3401	   (Section 9.20) and the Interpret-Text header field (Section 9.4.30).

3403	9.1.  Recognizer State Machine

3405	   The recognizer resource maintains a state machine to process MRCPv2
3406	   requests from the client.

3408	   Idle                   Recognizing               Recognized
3409	   State                  State                     State
3410	    |                       |                          |
3411	    |---------RECOGNIZE---->|---RECOGNITION-COMPLETE-->|
3412	    |<------STOP------------|<-----RECOGNIZE-----------|
3413	    |                       |                          |
3414	    |              |--------|              |-----------|
3415	    |       START-OF-INPUT  |       GET-RESULT         |
3416	    |              |------->|              |---------->|
3417	    |------------|          |                          |
3418	    |      DEFINE-GRAMMAR   |----------|               |
3419	    |<-----------|          | START-INPUT-TIMERS       |
3420	    |                       |<---------|               |
3421	    |------|                |                          |
3422	    |  INTERPRET            |                          |
3423	    |<-----|                |------|                   |
3424	    |                       |   RECOGNIZE              |
3425	    |-------|               |<-----|                   |
3426	    |      STOP                                        |
3427	    |<------|                                          |
3428	    |<-------------------STOP--------------------------|
3429	    |<-------------------DEFINE-GRAMMAR----------------|

3431	                         Recognizer State Machine

3433	   If a recognition resource supports voice enrolled grammars, starting
3434	   an enrollment session does not change the state of the recognizer
3435	   resource.  Once an enrollment session is started, then utterances are
3436	   enrolled by calling the RECOGNIZE method repeatedly.  The state of
3437	   the speech recognizer resource goes from IDLE to RECOGNIZING state
3438	   each time RECOGNIZE is called.
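
   The state machine diagram can be transcribed into a lookup table,
   which is convenient for checking whether a method or event is shown
   for the current state.  The table below copies only the arrows in
   the diagram; the lowercase state names and the next_state helper are
   hypothetical, and enrollment methods are left out because they do
   not change the recognizer state.

```python
# (current state, method or event) -> next state, per the diagram.
TRANSITIONS = {
    ("idle", "RECOGNIZE"): "recognizing",
    ("idle", "DEFINE-GRAMMAR"): "idle",
    ("idle", "INTERPRET"): "idle",
    ("idle", "STOP"): "idle",
    ("recognizing", "START-OF-INPUT"): "recognizing",
    ("recognizing", "START-INPUT-TIMERS"): "recognizing",
    ("recognizing", "RECOGNIZE"): "recognizing",
    ("recognizing", "RECOGNITION-COMPLETE"): "recognized",
    ("recognizing", "STOP"): "idle",
    ("recognized", "RECOGNIZE"): "recognizing",
    ("recognized", "GET-RESULT"): "recognized",
    ("recognized", "STOP"): "idle",
    ("recognized", "DEFINE-GRAMMAR"): "idle",
}


def next_state(state: str, message: str) -> str:
    """Return the state reached from `state` on a method or event."""
    try:
        return TRANSITIONS[(state, message)]
    except KeyError:
        raise ValueError(
            f"{message} is not shown for state {state!r}") from None
```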

3440	9.2.  Recognizer Methods

3442	   The recognizer supports the following methods.

3444	   recognizer-method    =  recog-only-method
3445	                        /  enrollment-method

3447	   recog-only-method    =  "DEFINE-GRAMMAR"
3448	                        /  "RECOGNIZE"
3449	                        /  "INTERPRET"
3450	                        /  "GET-RESULT"
3451	                        /  "START-INPUT-TIMERS"
3452	                        /  "STOP"

3454	   It is OPTIONAL for a recognizer resource to support voice enrolled
3455	   grammars.  If the recognizer resource does support voice enrolled
3456	   grammars it MUST support the following methods.

3458	   enrollment-method    =  "START-PHRASE-ENROLLMENT"
3459	                        /  "ENROLLMENT-ROLLBACK"
3460	                        /  "END-PHRASE-ENROLLMENT"
3461	                        /  "MODIFY-PHRASE"
3462	                        /  "DELETE-PHRASE"

3464	9.3.  Recognizer Events

3466	   The recognizer can generate the following events.

3468	   recognizer-event     =  "START-OF-INPUT"
3469	                        /  "RECOGNITION-COMPLETE"
3470	                        /  "INTERPRETATION-COMPLETE"

3472	9.4.  Recognizer Header Fields

3474	   A recognizer message can contain header fields containing request
3475	   options and information to augment the Method, Response or Event
3476	   message it is associated with.

3478	   recognizer-header    =  recog-only-header
3479	                        /  enrollment-header

3481	   recog-only-header    =  confidence-threshold
3482	                        /  sensitivity-level
3483	                        /  speed-vs-accuracy
3484	                        /  n-best-list-length
3485	                        /  no-input-timeout
3486	                        /  input-type
3487	                        /  recognition-timeout
3488	                        /  waveform-uri
3489	                        /  input-waveform-uri
3490	                        /  completion-cause
3491	                        /  completion-reason
3492	                        /  recognizer-context-block
3493	                        /  start-input-timers
3494	                        /  speech-complete-timeout
3495	                        /  speech-incomplete-timeout
3496	                        /  dtmf-interdigit-timeout
3497	                        /  dtmf-term-timeout
3498	                        /  dtmf-term-char
3499	                        /  failed-uri
3500	                        /  failed-uri-cause
3501	                        /  save-waveform
3502	                        /  media-type
3503	                        /  new-audio-channel
3504	                        /  speech-language
3505	                        /  ver-buffer-utterance
3506	                        /  recognition-mode
3507	                        /  cancel-if-queue
3508	                        /  hotword-max-duration
3509	                        /  hotword-min-duration
3510	                        /  interpret-text
3511	                        /  dtmf-buffer-time
3512	                        /  clear-dtmf-buffer
3513	                        /  early-no-match

3515	   If a recognition resource supports voice enrolled grammars, the
3516	   following header fields are also used.

3518	   enrollment-header    =  num-min-consistent-pronunciations
3519	                        /  consistency-threshold
3520	                        /  clash-threshold
3521	                        /  personal-grammar-uri
3522	                        /  enroll-utterance
3523	                        /  phrase-id
3524	                        /  phrase-nl
3525	                        /  weight
3526	                        /  save-best-waveform
3527	                        /  new-phrase-id
3528	                        /  confusable-phrases-uri
3529	                        /  abort-phrase-enrollment

3531	   For enrollment-specific header fields that can appear as part of SET-
3532	   PARAMS or GET-PARAMS methods, the following general rule applies: the
3533	   START-PHRASE-ENROLLMENT method MUST be invoked before these header
3534	   fields may be set through the SET-PARAMS method or retrieved through
3535	   the GET-PARAMS method.

3537	   Note that the Waveform-URI header field of the Recognizer resource
3538	   can also appear in the response to the END-PHRASE-ENROLLMENT method.

3540	9.4.1.  Confidence Threshold

3542	   When a recognition resource recognizes or matches a spoken phrase
3543	   with some portion of the grammar, it associates a confidence level
3544	   with that match.  The Confidence-Threshold header field tells the
3545	   recognizer resource what confidence level the client considers a
3546	   successful match.  This is a float value between 0.0 and 1.0
3547	   indicating the recognizer's confidence in the recognition.  If the
3548	   recognizer determines that no candidate match has a confidence
3549	   greater than the confidence threshold, then it MUST return no-match
3550	   as the recognition result.  This header field MAY occur in RECOGNIZE,
3551	   SET-PARAMS or GET-PARAMS.  The default value for this header field is
3552	   implementation specific, as is the interpretation of any specific
3553	   value for this header field.  Although values for servers from
3554	   different vendors are not comparable, it is expected that clients
3555	   will tune this value over time for a given server.

3557	   confidence-threshold     =  "Confidence-Threshold" ":" FLOAT CRLF

3559	9.4.2.  Sensitivity Level

3561	   To filter out background noise and not mistake it for speech, the
3562	   recognizer resource supports a variable level of sound sensitivity.
3563	   The Sensitivity-Level header field is a float value between 0.0 and
3564	   1.0 and allows the client to set the sensitivity level for the
3565	   recognizer.  This header field MAY occur in RECOGNIZE, SET-PARAMS or
3566	   GET-PARAMS.  A higher value for this header field means higher
3567	   sensitivity.  The default value for this header field is
3568	   implementation specific, as is the interpretation of any specific
3569	   value for this header field.  Although values for servers from
3570	   different vendors are not comparable, it is expected that clients
3571	   will tune this value over time for a given server.

3573	   sensitivity-level        =  "Sensitivity-Level" ":" FLOAT CRLF

3575	9.4.3.  Speed Vs Accuracy

3577	   Depending on the implementation and capability of the recognizer
3578	   resource, it may be tunable towards speed or accuracy.  Higher
3579	   accuracy may mean more processing and higher CPU utilization, meaning
3580	   fewer active sessions per server and vice versa.  The value is a
3581	   float between 0.0 and 1.0.  A value of 0.0 means fastest recognition.
3582	   A value of 1.0 means best accuracy.  This header field MAY occur in
3583	   RECOGNIZE, SET-PARAMS or GET-PARAMS.  The default value for this
3584	   header field is implementation specific.  Although values for servers
3585	   from different vendors are not comparable, it is expected that
3586	   clients will tune this value over time for a given server.

3588	   speed-vs-accuracy        =  "Speed-Vs-Accuracy" ":" FLOAT CRLF
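
   The Confidence-Threshold, Sensitivity-Level, and Speed-Vs-Accuracy
   header fields all carry a FLOAT between 0.0 and 1.0.  As a minimal,
   non-normative sketch, a client might validate such a value before
   sending it.  The function name and the regular expression below are
   illustrative assumptions; the normative FLOAT rule is defined
   elsewhere in this document.

```python
import re

# Assumed plain-decimal shape for FLOAT (digits with an optional fraction).
# This pattern is an illustration, not the normative FLOAT rule.
_FLOAT_RE = re.compile(r"[0-9]+(\.[0-9]*)?|\.[0-9]+")


def parse_unit_float(value: str) -> float:
    """Parse a header field value expected to lie between 0.0 and 1.0."""
    text = value.strip()
    if not _FLOAT_RE.fullmatch(text):
        raise ValueError(f"not a decimal number: {value!r}")
    number = float(text)
    if not 0.0 <= number <= 1.0:
        raise ValueError(f"outside the range 0.0 to 1.0: {value!r}")
    return number
```

   Because the interpretation of any specific value is implementation
   specific, a check like this can only reject malformed input; it says
   nothing about how a given server will behave at that value.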

3590	9.4.4.  N Best List Length

3592	   When the recognizer matches an incoming stream with the grammar, it
3593	   may come up with more than one alternative match because of
3594	   confidence levels in certain words or conversation paths.  If this
3595	   header field is not specified, by default, the recognition resource
3596	   returns only the best match above the confidence threshold.  The
3597	   client, by setting this header field, can ask the recognition
3598	   resource to send it more than 1 alternative.  All alternatives must
3599	   still be above the Confidence-Threshold.  A value greater than one
3600	   does not guarantee that the recognizer will provide the requested
3601	   number of alternatives.  This header field MAY occur in RECOGNIZE,
3602	   SET-PARAMS or GET-PARAMS.  The minimum value for this header field is
3603	   1.  The default value for this header field is 1.

3605	   n-best-list-length       =  "N-Best-List-Length" ":" 1*19DIGIT CRLF
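
   The interaction between Confidence-Threshold and N-Best-List-Length
   can be sketched as follows.  This is a non-normative illustration:
   the (text, confidence) candidate shape and the function name are
   assumptions made for the example, not part of the protocol.

```python
from typing import List, Tuple

Candidate = Tuple[str, float]  # (recognized text, confidence); hypothetical shape


def select_n_best(candidates: List[Candidate],
                  confidence_threshold: float,
                  n_best_list_length: int = 1) -> List[Candidate]:
    """Return at most n_best_list_length candidates above the threshold.

    An empty result corresponds to a no-match outcome.
    """
    if n_best_list_length < 1:
        raise ValueError("N-Best-List-Length has a minimum value of 1")
    # Only candidates strictly greater than the threshold qualify.
    above = [c for c in candidates if c[1] > confidence_threshold]
    above.sort(key=lambda c: c[1], reverse=True)
    return above[:n_best_list_length]
```

   Note that asking for five alternatives may still yield fewer, since
   every alternative has to clear the confidence threshold first.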

3607	9.4.5.  Input Type

3609	   When the recognizer detects barge-in-able input and generates a
3610	   START-OF-INPUT event, that event MUST carry this header field to
3611	   specify whether the input that caused the barge-in was DTMF or
3612	   speech.

3614	   input-type         =  "Input-Type" ":"  inputs CRLF
3615	   inputs             =  "speech" / "dtmf"

3617	9.4.6.  No Input Timeout

3619	   When recognition is started and there is no speech detected for a
3620	   certain period of time, the recognizer can send a RECOGNITION-
3621	   COMPLETE event to the client with a Completion-Cause of "no-input-
3622	   timeout" and terminate the recognition operation.  The client can use
3623	   the No-Input-Timeout header field to set this timeout.  The value is
3624	   in milliseconds and can range from 0 to an implementation specific
3625	   maximum value.  This header field MAY occur in RECOGNIZE, SET-PARAMS
3626	   or GET-PARAMS.  The default value is implementation specific.

3628	   no-input-timeout         =  "No-Input-Timeout" ":" 1*19DIGIT CRLF

3630	9.4.7.  Recognition Timeout

3632	   When recognition is started and there is no match for a certain
3633	   period of time, the recognizer can send a RECOGNITION-COMPLETE event
3634	   to the client and terminate the recognition operation.  The
3635	   Recognition-Timeout header field allows the client to set this
3636	   timeout value.  The value is in milliseconds.  The value for this
3637	   header field ranges from 0 to an implementation specific maximum
3638	   value.  The default value is 10 seconds.  This header field MAY occur
3639	   in RECOGNIZE, SET-PARAMS or GET-PARAMS.

3641	   recognition-timeout      =  "Recognition-Timeout" ":" 1*19DIGIT CRLF
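
   The timeout header fields in this section carry a value in
   milliseconds expressed as 1 to 19 digits.  A minimal sketch of
   parsing and formatting such values is shown below; the helper names
   are hypothetical, and the seconds-to-milliseconds conversion is only
   a client-side convenience.

```python
import re

_DIGITS_RE = re.compile(r"[0-9]{1,19}")  # mirrors the 1*19DIGIT rule


def parse_milliseconds(value: str) -> int:
    """Parse a 1*19DIGIT header field value as whole milliseconds."""
    text = value.strip()
    if not _DIGITS_RE.fullmatch(text):
        raise ValueError(f"expected 1 to 19 digits: {value!r}")
    return int(text)


def format_timeout_header(name: str, seconds: float) -> str:
    """Build a header line such as 'Recognition-Timeout: 10000'."""
    if seconds < 0:
        raise ValueError("timeouts cannot be negative")
    return f"{name}: {int(round(seconds * 1000))}\r\n"
```

   For example, the 10 second Recognition-Timeout default corresponds
   to a header field value of 10000.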

3643	9.4.8.  Waveform URI

3645	   If the Save-Waveform header field is set to true, the recognizer MUST
3646	   record the incoming audio stream of the recognition into a stored
3647	   form and provide a URI for the client to access it.  This header
3648	   field MUST be present in the RECOGNITION-COMPLETE event if the Save-
3649	   Waveform header field was set to true.  The value of the header field
3650	   MUST be empty if there was some error condition preventing the server
3651	   from recording.  Otherwise, the URI generated by the server MUST be
3652	   unambiguous across the server and all its recognition sessions.  The
3653	   content associated with the URI MUST be available to the client until
3654	   the MRCPv2 session terminates.

3656	   Similarly, if the Save-Best-Waveform header field is set to true, the
3657	   recognizer MUST save the audio stream for the best repetition of the
3658	   phrase that was used during the enrollment session.  The recognizer
3659	   MUST then record the recognized audio and make it available to the
3660	   client by returning a URI in the Waveform-URI header field in the
3661	   response to the END-PHRASE-ENROLLMENT method.  The value of the
3662	   header field MUST be empty if there was some error condition
3663	   preventing the server from recording.  Otherwise, the URI generated
3664	   by the server MUST be unambiguous across the server and all its
3665	   recognition sessions.  The content associated with the URI MUST be
3666	   available to the client until the MRCPv2 session terminates.  See the
3667	   discussion on the sensitivity of saved waveforms in Section 12.

3669	   The server MUST also return the size in octets and the duration in
3670	   milliseconds of the recorded audio waveform as parameters associated
3671	   with the header field.

3673	   waveform-uri             =  "Waveform-URI" ":" ["<" uri ">"
3674	                               ";" "size" "=" 1*19DIGIT
3675	                               ";" "duration" "=" 1*19DIGIT] CRLF
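
   A client receiving this header field has to handle both the empty
   value (recording failed) and the URI with its size and duration
   parameters.  The following non-normative sketch shows one way to do
   so.  The Waveform type and function name are hypothetical, and the
   tolerance for optional whitespace around separators is an assumption
   of this example.

```python
import re
from typing import NamedTuple, Optional


class Waveform(NamedTuple):
    uri: str
    size_octets: int
    duration_ms: int


_WAVEFORM_RE = re.compile(
    r"<(?P<uri>[^>]+)>\s*;\s*size\s*=\s*(?P<size>[0-9]{1,19})"
    r"\s*;\s*duration\s*=\s*(?P<duration>[0-9]{1,19})"
)


def parse_waveform_uri(value: str) -> Optional[Waveform]:
    """Parse a Waveform-URI value; None means the server saved no audio."""
    text = value.strip()
    if not text:
        return None  # empty value: an error prevented recording
    match = _WAVEFORM_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"malformed Waveform-URI value: {value!r}")
    return Waveform(match["uri"], int(match["size"]), int(match["duration"]))
```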

3677	9.4.9.  Media Type

3679	   This header field MAY be specified in the SET-PARAMS, GET-PARAMS or
3680	   the RECOGNIZE methods and tells the server resource the Media Type in
3681	   which to store captured audio or video such as the one captured and
3682	   returned by the Waveform-URI header field.

3684	   media-type               =  "Media-Type" ":" media-type-value
3685	                               CRLF

3687	9.4.10.  Input-Waveform-URI

3689	   This optional header field specifies a URI pointing to audio content
3690	   to be processed by the RECOGNIZE operation.  This enables the client
3691	   to request recognition from a specified buffer or audio file.

3693	   input-waveform-uri       =  "Input-Waveform-URI" ":" uri CRLF

3695	9.4.11.  Completion Cause

3697	   This header field MUST be part of a RECOGNITION-COMPLETE event coming
3698	   from the recognizer resource to the client.  It indicates the reason
3699	   behind the RECOGNIZE method completion.  This header field MUST be
3700	   sent in the DEFINE-GRAMMAR and RECOGNIZE responses, if they return
3701	   with a failure status and a COMPLETE state.  In the ABNF below, the
3702	   'cause-code' contains a numerical value selected from the Cause-Code
3703	   column of the following table.  The 'cause-name' contains the
3704	   corresponding token selected from the Cause-Name column.

3706	   completion-cause         =  "Completion-Cause" ":" cause-code SP
3707	                               cause-name CRLF
3708	   cause-code               =  3DIGIT
3709	   cause-name               =  *VCHAR
3710	   +---------+--------------------------+------------------------------+
3711	   | Cause-C | Cause-Name               | Description                  |
3712	   | ode     |                          |                              |
3713	   +---------+--------------------------+------------------------------+
3714	   | 000     | success                  | RECOGNIZE completed with a   |
3715	   |         |                          | match or DEFINE-GRAMMAR      |
3716	   |         |                          | succeeded in downloading and |
3717	   |         |                          | compiling the grammar        |
3718	   | 001     | no-match                 | RECOGNIZE completed, but no  |
3719	   |         |                          | match was found              |
3720	   | 002     | no-input-timeout         | RECOGNIZE completed without  |
3721	   |         |                          | a match due to a             |
3722	   |         |                          | no-input-timeout             |
3723	   | 003     | hotword-maxtime          | RECOGNIZE in hotword mode    |
3724	   |         |                          | completed without a match    |
3725	   |         |                          | due to a recognition-timeout |
3726	   | 004     | grammar-load-failure     | RECOGNIZE failed due to      |
3727	   |         |                          | grammar load failure.        |
3728	   | 005     | grammar-compilation-fail | RECOGNIZE failed due to      |
3729	   |         | ure                      | grammar compilation failure. |
3730	   | 006     | recognizer-error         | RECOGNIZE request terminated |
3731	   |         |                          | prematurely due to a         |
3732	   |         |                          | recognizer error.            |
3733	   | 007     | speech-too-early         | RECOGNIZE request terminated |
3734	   |         |                          | because speech was too       |
3735	   |         |                          | early.  This happens when    |
3736	   |         |                          | the audio stream is already  |
3737	   |         |                          | "in-speech" when the         |
3738	   |         |                          | RECOGNIZE request was        |
3739	   |         |                          | received.                    |
3740	   | 008     | success-maxtime          | RECOGNIZE request terminated |
3741	   |         |                          | because speech was too long  |
3742	   |         |                          | but whatever was spoken till |
3743	   |         |                          | that point was a full match. |
3744	   | 009     | uri-failure              | Failure accessing a URI.     |
3745	   | 010     | language-unsupported     | Language not supported.      |
3746	   | 011     | cancelled                | A new RECOGNIZE cancelled    |
3747	   |         |                          | this one, or a prior         |
3748	   |         |                          | RECOGNIZE failed while this  |
3749	   |         |                          | one was still in the queue.  |
3750	   | 012     | semantics-failure        | Recognition succeeded but    |
3751	   |         |                          | semantic interpretation of   |
3752	   |         |                          | the recognized input failed. |
3753	   |         |                          | The RECOGNITION-COMPLETE     |
3754	   |         |                          | event MUST contain the       |
3755	   |         |                          | Recognition result with only |
3756	   |         |                          | input text and no            |
3757	   |         |                          | interpretation.              |
3758	   | 013     | partial-match            | Speech Incomplete timeout    |
3759	   |         |                          | expired before there was a   |
3760	   |         |                          | full match.  But whatever    |
3761	   |         |                          | was spoken till that         |
3762	   |         |                          | point was a partial match to |
3763	   |         |                          | one or more grammars.        |
3764	   | 014     | partial-match-maxtime    | The Recognition-Timer        |
3765	   |         |                          | expired before full match    |
3766	   |         |                          | was achieved.  But whatever  |
3767	   |         |                          | was spoken till that point   |
3768	   |         |                          | was a partial match to one   |
3769	   |         |                          | or more grammars.            |
3770	   | 015     | no-match-maxtime         | The Recognition-Timer        |
3771	   |         |                          | expired.  Whatever was       |
3772	   |         |                          | spoken till that point       |
3773	   |         |                          | did not match any of the     |
3774	   |         |                          | grammars.  This cause        |
3775	   |         |                          | could also be returned if    |
3776	   |         |                          | the recognizer does not      |
3777	   |         |                          | support detecting partial    |
3778	   |         |                          | grammar matches.             |
3779	   | 016     | grammar-definition-failu | Any DEFINE-GRAMMAR error     |
3780	   |         | re                       | other than                   |
3781	   |         |                          | grammar-load-failure and     |
3782	   |         |                          | grammar-compilation-failure. |
3783	   +---------+--------------------------+------------------------------+
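
   As a non-normative illustration of the ABNF and table above, a
   client might parse the Completion-Cause value and check that the
   code and name agree.  The names below are copied from the table; the
   function name is hypothetical, and rejecting codes outside the table
   is a design choice of this sketch rather than a protocol
   requirement.

```python
from typing import Tuple

# Cause names in Cause-Code order (000 through 016), copied from the table.
CAUSE_NAMES = [
    "success", "no-match", "no-input-timeout", "hotword-maxtime",
    "grammar-load-failure", "grammar-compilation-failure",
    "recognizer-error", "speech-too-early", "success-maxtime",
    "uri-failure", "language-unsupported", "cancelled",
    "semantics-failure", "partial-match", "partial-match-maxtime",
    "no-match-maxtime", "grammar-definition-failure",
]


def parse_completion_cause(value: str) -> Tuple[int, str]:
    """Split '001 no-match' into (1, 'no-match') and cross-check the pair."""
    code_text, separator, name = value.strip().partition(" ")
    if not (separator and len(code_text) == 3
            and code_text.isascii() and code_text.isdigit()):
        raise ValueError(f"expected '<3 digits> <name>': {value!r}")
    code = int(code_text)
    if code >= len(CAUSE_NAMES) or CAUSE_NAMES[code] != name:
        raise ValueError(f"code and name do not match the table: {value!r}")
    return code, name
```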

3785	9.4.12.  Completion Reason

3787	   This header field MAY be specified in a RECOGNITION-COMPLETE event
3788	   coming from the recognizer resource to the client.  This contains the
3789	   reason text behind the RECOGNIZE request completion.  The server uses
3790	   this header field to communicate text describing the reason for the
3791	   failure, such as the specific error encountered in parsing a grammar
3792	   markup.

3794	   The completion reason text is provided for client use in logs and for
3795	   debugging and instrumentation purposes.  Clients MUST NOT interpret
3796	   the completion reason text.

3798	   completion-reason        =  "Completion-Reason" ":"
3799	                               quoted-string CRLF

3801	9.4.13.  Recognizer Context Block

3803	   This header field MAY be sent as part of the SET-PARAMS or GET-PARAMS
3804	   request.  If the GET-PARAMS method contains this header field with no
3805	   value, then it is a request to the recognizer to return the
3806	   recognizer context block.  The response to such a message MAY contain
3807	   a recognizer context block as a typed media message body.  If the
3808	   server returns a recognizer context block, the response MUST contain
3809	   this header field and its value MUST match the Content-ID of the
3810	   corresponding media block.

3812	   If the SET-PARAMS method contains this header field, it MUST also
3813	   contain a message body containing the recognizer context data and a
3814	   Content-ID matching this header field value.  This Content-ID MUST
3815	   match the Content-ID that came with the context data during the GET-
3816	   PARAMS operation.

3818	   An implementation choosing to use this mechanism to hand off
3819	   recognizer context data between servers MUST distinguish its
3820	   implementation-specific block of data by using an IANA-registered
3821	   content type in the IANA Media Type vendor tree.

3823	   recognizer-context-block  =  "Recognizer-Context-Block" ":"
3824	                                [1*VCHAR] CRLF

3826	9.4.14.  Start Input Timers

3828	   This header field MAY be sent as part of the RECOGNIZE request.  A
3829	   value of false tells the recognizer to start recognition, but not to
3830	   start the no-input timer yet.  The recognizer MUST NOT start the
3831	   timers until the client sends a START-INPUT-TIMERS request to the
3832	   recognizer.  This is useful in the scenario when the recognizer and
3833	   synthesizer engines are not part of the same session.  In such
3834	   configurations, when a kill-on-barge-in prompt is being played (see
3835	   Section 8.4.2), the client wants the RECOGNIZE request to be
3836	   simultaneously active so that it can detect and implement kill-on-
3837	   barge-in.  However, the recognizer SHOULD NOT start the no-input
3838	   timers until the prompt is finished.  The default value is "true".

3840	   start-input-timers  =  "Start-Input-Timers" ":" BOOLEAN CRLF

3842	9.4.15.  Speech Complete Timeout

3844	   This header field specifies the length of silence required following
3845	   user speech before the speech recognizer finalizes a result (either
3846	   accepting it or generating a no-match event).  The speech-complete-
3847	   timeout value applies when the recognizer currently has a complete
3848	   match against an active grammar, and specifies how long the
3849	   recognizer MUST wait for more input before declaring a match.  By
3850	   contrast, the incomplete timeout is used when the speech is an
3851	   incomplete match to an active grammar.  The value is in milliseconds.

3853	  speech-complete-timeout = "Speech-Complete-Timeout" ":" 1*19DIGIT CRLF

3854	   A long Speech-Complete-Timeout value delays the result to the client
3855	   and therefore makes the application's response to a user slow.  A
3856	   short Speech-Complete-Timeout may lead to an utterance being broken
3857	   up inappropriately.  Reasonable speech complete timeout values are
3858	   typically in the range of 0.3 seconds to 1.0 seconds.  The value for
3859	   this header field ranges from 0 to an implementation specific maximum
3860	   value.  The default value for this header field is implementation
3861	   specific.  This header field MAY occur in RECOGNIZE, SET-PARAMS or
3862	   GET-PARAMS.

3864	9.4.16.  Speech Incomplete Timeout

3866	   This header field specifies the required length of silence following
3867	   user speech after which a recognizer finalizes a result.  The
3868	   incomplete timeout applies when the speech prior to the silence is an
3869	   incomplete match of all active grammars.  In this case, once the
3870	   timeout is triggered, the partial result is rejected (with a
3871	   Completion-Cause of "partial-match").  The value is in milliseconds.
3872	   The value for this header field ranges from 0 to an implementation
3873	   specific maximum value.  The default value for this header field is
3874	   implementation specific.

3876	   speech-incomplete-timeout = "Speech-Incomplete-Timeout" ":" 1*19DIGIT
3877	                                CRLF

3879	   The Speech-Incomplete-Timeout also applies when the speech prior to
3880	   the silence is a complete match of an active grammar, but where it is
3881	   possible to speak further and still match the grammar.  By contrast,
3882	   the complete timeout is used when the speech is a complete match to
3883	   an active grammar and no further spoken words can continue to
3884	   represent a match.

3886	   A long Speech-Incomplete-Timeout value delays the result to the
3887	   client and therefore makes the application's response to a user slow.
3888	   A short Speech-Incomplete-Timeout may lead to an utterance being
3889	   broken up inappropriately.

3891	   The Speech-Incomplete-Timeout is usually longer than the Speech-
3892	   Complete-Timeout to allow users to pause mid-utterance (for example,
3893	   to breathe).  This header field MAY occur in RECOGNIZE, SET-PARAMS or
3894	   GET-PARAMS.
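
   The choice between the two silence timeouts described in Sections
   9.4.15 and 9.4.16 can be summarized in a short sketch.  This is a
   non-normative illustration; the function and parameter names are
   hypothetical, and a real recognizer tracks grammar state in its own
   way.

```python
def applicable_silence_timeout(complete_match: bool,
                               can_extend_match: bool,
                               speech_complete_ms: int,
                               speech_incomplete_ms: int) -> int:
    """Return the silence timeout (ms) that applies to the current speech.

    complete_match: the speech so far fully matches an active grammar.
    can_extend_match: further speech could still match the grammar.
    """
    if complete_match and not can_extend_match:
        # Complete match with nothing more to say: Speech-Complete-Timeout.
        return speech_complete_ms
    # Incomplete match, or a complete match that could still be extended:
    # Speech-Incomplete-Timeout.
    return speech_incomplete_ms
```

   With illustrative values of 500 ms and 1500 ms, a caller who has
   said a phrase that can still be extended is given the longer
   1500 ms pause before the result is finalized.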

3896	9.4.17.  DTMF Interdigit Timeout

3898	   This header field specifies the inter-digit timeout value to use when
3899	   recognizing DTMF input.  The value is in milliseconds.  The value for
3900	   this header field ranges from 0 to an implementation specific maximum
3901	   value.  The default value is 5 seconds.  This header field MAY occur
3902	   in RECOGNIZE, SET-PARAMS or GET-PARAMS.

3904	  dtmf-interdigit-timeout = "DTMF-Interdigit-Timeout" ":" 1*19DIGIT CRLF

3906	9.4.18.  DTMF Term Timeout

3908	   This header field specifies the terminating timeout to use when
3909	   recognizing DTMF input.  The DTMF-Term-Timeout applies only when no
3910	   additional input is allowed by the grammar; otherwise, the DTMF-
3911	   Interdigit-Timeout applies.  The value is in milliseconds.  The value
3912	   for this header field ranges from 0 to an implementation specific
3913	   maximum value.  The default value is 10 seconds.  This header field
3914	   MAY occur in RECOGNIZE, SET-PARAMS or GET-PARAMS.

3916	   dtmf-term-timeout        =  "DTMF-Term-Timeout" ":" 1*19DIGIT CRLF
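
   The relationship between DTMF-Interdigit-Timeout and
   DTMF-Term-Timeout can be sketched in the same way.  The defaults
   below are the 5 second and 10 second values stated above, expressed
   in milliseconds; the function name and the boolean parameter are
   assumptions of this non-normative example.

```python
DEFAULT_DTMF_INTERDIGIT_MS = 5000   # default of 5 seconds
DEFAULT_DTMF_TERM_MS = 10000        # default of 10 seconds


def applicable_dtmf_timeout(grammar_allows_more_input: bool,
                            interdigit_ms: int = DEFAULT_DTMF_INTERDIGIT_MS,
                            term_ms: int = DEFAULT_DTMF_TERM_MS) -> int:
    """Return the DTMF timeout (ms) that applies after the last digit."""
    if grammar_allows_more_input:
        # More digits could still match: DTMF-Interdigit-Timeout applies.
        return interdigit_ms
    # No additional input is allowed by the grammar: DTMF-Term-Timeout.
    return term_ms
```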

3918	9.4.19.  DTMF-Term-Char

3920	   This header field specifies the terminating DTMF character for DTMF
3921	   input recognition.  The default value is NULL which is indicated by
3922	   an empty header field value.  This header field MAY occur in
3923	   RECOGNIZE, SET-PARAMS or GET-PARAMS.

3925	   dtmf-term-char           =  "DTMF-Term-Char" ":" VCHAR CRLF

3927	9.4.20.  Failed URI

3929	   When a recognizer needs to fetch or access a URI and the access
3930	   fails, the server SHOULD provide the failed URI in this header field
3931	   in the method response, unless there are multiple URI failures, in
3932	   which case one of the failed URIs MUST be provided in this header
3933	   field in the method response.

3935	   failed-uri               =  "Failed-URI" ":" absoluteURI CRLF

3937	9.4.21.  Failed URI Cause

3939	   When a recognizer method needs a recognizer to fetch or access a URI
3940	   and the access fails, the server MUST provide the URI-specific or
3941	   protocol-specific response code for the URI in the Failed-URI header
3942	   field through this header field in the method response.  The value
3943	   encoding is UTF-8 (RFC3629 [RFC3629]) to accommodate any access
3944	   protocol, some of which might have a response string instead of a
3945	   numeric response code.

3947	   failed-uri-cause         =  "Failed-URI-Cause" ":" 1*UTFCHAR CRLF

3949	9.4.22.  Save Waveform

3951	   This header field allows the client to request the recognizer
3952	   resource to save the audio input to the recognizer.  The recognizer
3953	   resource MUST then attempt to record the recognized audio, without
3954	   endpointing, and make it available to the client in the form of a URI
3955	   returned in the Waveform-URI header field in the RECOGNITION-COMPLETE
3956	   event.  If there was an error in recording the stream or the audio
3957	   content is otherwise not available, the recognizer MUST return an
3958	   empty Waveform-URI header field.  The default value for this field is
3959	   "false".  This header field MAY occur in RECOGNIZE, SET-PARAMS or
3960	   GET-PARAMS.  See the discussion on the sensitivity of saved waveforms
3961	   in Section 12.

3963	   save-waveform            =  "Save-Waveform" ":" BOOLEAN CRLF

3965	9.4.23.  New Audio Channel

3967	   This header field MAY be specified in a RECOGNIZE request and allows
3968	   the client to tell the server that, from this point on, further input
3969	   audio comes from a different audio source, channel or speaker.  If
3970	   the recognition resource had collected any input statistics or
3971	   adaptation state, the recognition resource MUST do what is
3972	   appropriate for the specific recognition technology, which includes
3973	   but is not limited to discarding any collected input statistics or
3974	   adaptation state before starting the RECOGNIZE request.  Note that if
3975	   there are multiple resources that are sharing a media stream and are
3976	   collecting or using this data, and the client issues this header
3977	   field to one of the resources, the reset operation applies to all
3978	   resources that use the shared media stream.  This helps in a number
3979	   of use cases, including where the client wishes to reuse an open
3980	   recognition session with an existing media session for multiple
3981	   telephone calls.

3983	   new-audio-channel        =  "New-Audio-Channel" ":" BOOLEAN
3984	                               CRLF

3986	9.4.24.  Speech-Language

3988	   This header field specifies the language of recognition grammar data
3989	   within a session or request, if it is not specified within the data.
3990	   The value of this header field MUST follow RFC 5646 [RFC5646].  This
3991	   header field MAY occur in DEFINE-GRAMMAR, RECOGNIZE, SET-PARAMS or
3992	   GET-PARAMS requests.

3994	   speech-language          =  "Speech-Language" ":" 1*VCHAR CRLF

3996	9.4.25.  Ver-Buffer-Utterance

3998	   This header field lets the client request the server to buffer the
3999	   utterance associated with this recognition request into a buffer
4000	   available to a co-resident verifier resource.  The buffer is shared
4001	   across resources within a session and is allocated when a verifier
4002	   resource is added to this session.  The client MUST NOT send this
4003	   header field unless a verifier resource is instantiated for the
4004	   session.  The buffer is released when the verifier resource is
4005	   released from the session.

4007	9.4.26.  Recognition-Mode

4009	   This header field specifies what mode the RECOGNIZE method will
4010	   operate in.  The value choices are "normal" or "hotword".  If the
4011	   value is "normal", the RECOGNIZE starts matching speech and DTMF to
4012	   the grammars specified in the RECOGNIZE request.  If any portion of
4013	   the speech does not match the grammar, the RECOGNIZE command
4014	   completes with a no-match status.  Timers may be active to detect
4015	   speech in the audio (see Section 9.4.14), so the RECOGNIZE method may
4016	   complete because of a timeout waiting for speech.  If the value of
4017	   this header field is "hotword", the RECOGNIZE method operates in
4018	   hotword mode, where it only looks for the particular keywords or DTMF
4019	   sequences specified in the grammar and ignores silence or other
4020	   speech in the audio stream.  The default value for this header field
4021	   is "normal".  This header field MAY occur on the RECOGNIZE method.

4023	   recognition-mode         =  "Recognition-Mode" ":"
4024	                               ( "normal" / "hotword" ) CRLF

4026	9.4.27.  Cancel-If-Queue

4028	   This header field specifies what will happen if the client attempts
4029	   to invoke another RECOGNIZE method when this RECOGNIZE request is
4030	   already in progress for the resource.  The value for this header
4031	   field is Boolean.  A value of "true" means the server MUST terminate
4032	   this RECOGNIZE request, with a Completion-Cause of "cancelled", if
4033	   the client issues another RECOGNIZE request for the same resource.  A
4034	   value of "false" for this header field indicates to the server that
4035	   this RECOGNIZE request will continue to completion and if the client
4036	   issues more RECOGNIZE requests to the same resource, they are queued.
4037	   When the currently active RECOGNIZE request is stopped or completes
4038	   with a successful match, the first RECOGNIZE method in the queue
4039	   becomes active.  If the current RECOGNIZE fails, all RECOGNIZE
4040	   methods in the pending queue are cancelled and each generates a
4041	   RECOGNITION-COMPLETE event with a Completion-Cause of "cancelled".
4042	   This header field MUST be present in every RECOGNIZE request.  There
4043	   is no default value.

4045	   cancel-if-queue          =  "Cancel-If-Queue" ":" BOOLEAN CRLF
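
   The queuing behavior described above can be modeled with a small,
   non-normative sketch.  The class and method names are hypothetical.
   The model assumes that the Cancel-If-Queue value of the currently
   active request governs what happens when a new request arrives, and
   that after a cancellation the oldest queued request becomes active;
   the text above does not spell out the second point.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

Request = Tuple[int, bool]  # (request-id, Cancel-If-Queue value)


class RecognizeQueue:
    def __init__(self) -> None:
        self.active: Optional[Request] = None
        self.pending: Deque[Request] = deque()
        # (request-id, Completion-Cause name) for each completed request
        self.events: List[Tuple[int, str]] = []

    def submit(self, request_id: int, cancel_if_queue: bool) -> None:
        request = (request_id, cancel_if_queue)
        if self.active is None:
            self.active = request
            return
        self.pending.append(request)
        if self.active[1]:
            # The active request asked to be cancelled by a newer one.
            self.events.append((self.active[0], "cancelled"))
            self._promote()

    def finish(self, succeeded: bool, cause: str) -> None:
        if self.active is None:
            raise RuntimeError("no RECOGNIZE request is active")
        self.events.append((self.active[0], cause))
        if succeeded:
            self._promote()
        else:
            # A failure cancels every request still waiting in the queue.
            while self.pending:
                self.events.append((self.pending.popleft()[0], "cancelled"))
            self.active = None

    def _promote(self) -> None:
        self.active = self.pending.popleft() if self.pending else None
```

   In this model, finish(True, ...) stands for a request that was
   stopped or completed with a successful match, and finish(False, ...)
   stands for a failed request.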

4047	9.4.28.  Hotword-Max-Duration

4049	   This header field MAY be sent in a hotword mode RECOGNIZE request.
4050	   It specifies the maximum length of an utterance that
4051	   will be considered for Hotword recognition.  This header field, along
4052	   with Hotword-Min-Duration, can be used to tune performance by
4053	   preventing the recognizer from evaluating utterances that are too
4054	   short or too long to be one of the hotwords in the grammar(s).  The
4055	   value is in milliseconds.  The default is implementation dependent.
4056	   If present in a RECOGNIZE request specifying a mode other than
4057	   "hotword", the header field is ignored.

4059	   hotword-max-duration     =  "Hotword-Max-Duration" ":" 1*19DIGIT
4060	                               CRLF

4062	9.4.29.  Hotword-Min-Duration

4064	   This header field MAY be sent in a hotword mode RECOGNIZE request.
4065	   It specifies the minimum length of an utterance that
4066	   will be considered for Hotword recognition.  This header field, along
4067	   with Hotword-Max-Duration, can be used to tune performance by
4068	   preventing the recognizer from evaluating utterances that are too
4069	   short or too long to be one of the hotwords in the grammar(s).  The
4070	   value is in milliseconds.  The default value is implementation
4071	   dependent.  If present in a RECOGNIZE request specifying a mode other
4072	   than "hotword", the header field is ignored.

4074	   hotword-min-duration     =  "Hotword-Min-Duration" ":" 1*19DIGIT CRLF
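
   Taken together, Hotword-Min-Duration and Hotword-Max-Duration define
   a window of utterance lengths, in milliseconds, that the recognizer
   evaluates in hotword mode.  A minimal sketch of that filter follows.
   The function name is hypothetical, and treating both bounds as
   inclusive is an assumption of this example.

```python
from typing import Optional


def within_hotword_duration(utterance_ms: int,
                            min_ms: Optional[int] = None,
                            max_ms: Optional[int] = None) -> bool:
    """Return True if an utterance is worth evaluating as a hotword.

    None means the corresponding header field was not supplied, in which
    case this sketch applies no bound on that side.
    """
    if min_ms is not None and utterance_ms < min_ms:
        return False  # too short to be one of the hotwords
    if max_ms is not None and utterance_ms > max_ms:
        return False  # too long to be one of the hotwords
    return True
```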

4076	9.4.30.  Interpret-Text

4078	   The value of this header field is used to provide a pointer to the
4079	   text for which a natural language interpretation is desired.  The
4080	   value is either a URI or text.  If the value is a URI, it MUST be a
4081	   Content-ID that refers to an entity of type text/plain in the body of
4082	   the message.  Otherwise, the server MUST treat the value as the text
4083	   to be interpreted.  This header field MUST be used when invoking the
4084	   INTERPRET method.

4086	   interpret-text           =  "Interpret-Text" ":" 1*VCHAR CRLF

4088	9.4.31.  DTMF-Buffer-Time

4090	   This header field MAY be specified in a GET-PARAMS or SET-PARAMS
4091	   method and is used to specify the amount of time, in milliseconds, of
4092	   the type-ahead buffer for the recognizer.  This is the buffer that
4093	   collects DTMF digits as they are pressed even when there is no
4094	   RECOGNIZE command active.  When a subsequent RECOGNIZE method is
4095	   received, the recognizer MUST look to this buffer to match the
4096	   RECOGNIZE request.  If the digits in the buffer are not sufficient,
4097	   the recognizer can continue to listen for more digits to match the
4098	   grammar.  The default size of this DTMF buffer is platform specific.

4100	   dtmf-buffer-time  =  "DTMF-Buffer-Time" ":" 1*19DIGIT CRLF

4102	9.4.32.  Clear-DTMF-Buffer

4104	   This header field MAY be specified in a RECOGNIZE method and is used
4105	   to tell the recognizer to clear the DTMF type-ahead buffer before
4106	   starting recognition.  The default value of this header field is
4107	   FALSE, which does not clear the type-ahead buffer before starting the
4108	   RECOGNIZE method.  If this header field is specified to be TRUE, then
4109	   the recognizer clears the DTMF buffer before starting recognition.
4110	   This means digits pressed by the caller before the RECOGNIZE command
4111	   was issued are discarded.

4113	   clear-dtmf-buffer  = "Clear-DTMF-Buffer" ":" BOOLEAN CRLF
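
   The interplay of DTMF-Buffer-Time and Clear-DTMF-Buffer can be
   illustrated with a small, non-normative model.  The class and method
   names are hypothetical.  The sketch assumes that DTMF-Buffer-Time is
   a retention window, so that digits older than the window are
   dropped; a server may interpret the buffer size differently.

```python
from collections import deque
from typing import Deque, Tuple


class DtmfTypeAheadBuffer:
    def __init__(self, buffer_time_ms: int) -> None:
        self.buffer_time_ms = buffer_time_ms
        self._digits: Deque[Tuple[int, str]] = deque()  # (arrival ms, digit)

    def press(self, now_ms: int, digit: str) -> None:
        """Record a digit pressed while no RECOGNIZE is active."""
        self._digits.append((now_ms, digit))

    def start_recognize(self, now_ms: int,
                        clear_dtmf_buffer: bool = False) -> str:
        """Return the buffered digits handed to a new RECOGNIZE request."""
        if clear_dtmf_buffer:
            # Clear-DTMF-Buffer: true discards digits pressed earlier.
            self._digits.clear()
        cutoff = now_ms - self.buffer_time_ms
        while self._digits and self._digits[0][0] < cutoff:
            self._digits.popleft()  # older than the retention window
        digits = "".join(digit for _, digit in self._digits)
        self._digits.clear()  # the digits are consumed by this request
        return digits
```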

4115	9.4.33.  Early-No-Match

4117	   This header field MAY be specified in a RECOGNIZE method and is used
4118	   to tell the recognizer that it MUST NOT wait for the end of speech
4119	   before processing the collected speech to match active grammars.  A
4120	   value of TRUE indicates the recognizer MUST do early matching.  The
4121	   default value for this header field if not specified is FALSE.  If
4122	   the recognizer does not support the processing of the collected audio
4123	   before the end of speech this header field can be safely ignored.

4125	   early-no-match  = "Early-No-Match" ":" BOOLEAN CRLF

4127	9.4.34.  Num-Min-Consistent-Pronunciations

4129	   This header field MAY be specified in a START-PHRASE-ENROLLMENT, SET-
4130	   PARAMS, or GET-PARAMS method and is used to specify the minimum
4131	   number of consistent pronunciations that must be obtained to voice
4132	   enroll a new phrase.  The minimum value is 1.  The default value is
4133	   implementation specific and MAY be greater than 1.

4135	   num-min-consistent-pronunciations  =
4136	                 "Num-Min-Consistent-Pronunciations" ":" 1*19DIGIT CRLF

4138	9.4.35.  Consistency-Threshold

4140	   This header field MAY be sent as part of the START-PHRASE-ENROLLMENT,
4141	   SET-PARAMS, or GET-PARAMS method.  Used during voice enrollment, this
4142	   header field specifies how similar to a previously enrolled
4143	   pronunciation of the same phrase an utterance needs to be in order to
4144	   be considered "consistent."  The higher the threshold, the closer the
4145	   match between an utterance and previous pronunciations must be for
4146	   the pronunciation to be considered consistent.  The range for this
4147	   threshold is a float value between 0.0 and 1.0.  The default value
4148	   for this header field is implementation specific.

4150	   consistency-threshold    =  "Consistency-Threshold" ":" FLOAT CRLF

4152	9.4.36.  Clash-Threshold

4154	   This header field MAY be sent as part of the START-PHRASE-ENROLLMENT,
4155	   SET-PARAMS, or GET-PARAMS method.  Used during voice-enrollment, this
4156	   header field specifies how similar the pronunciations of two
4157	   different phrases can be before they are considered to be clashing.
4158	   For example, pronunciations of phrases such as "John Smith" and "Jon
4159	   Smits" may be so similar that they are difficult to distinguish
4160	   correctly.  A smaller threshold reduces the number of clashes
4161	   detected.  The range for this threshold is a float value between 0.0
4162	   and 1.0.  The default value for this header field is implementation
4163	   specific.  Clash testing can be turned off completely by setting the
4164	   Clash-Threshold header field value to 0.

4166	   clash-threshold          =  "Clash-Threshold" ":" FLOAT CRLF
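
Both Consistency-Threshold and Clash-Threshold carry a float in the
range 0.0 to 1.0.  The Python sketch below shows one way a client
might validate such a value before emitting the header line.  The
function name and the two-decimal formatting are illustrative
assumptions, not part of this specification; the header is written
without a space after the colon only to mirror the examples in this
document.

```python
def format_threshold_header(name: str, value: float) -> str:
    """Return a header line for a threshold restricted to 0.0..1.0."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be between 0.0 and 1.0, got {value!r}")
    # Two decimal places is an arbitrary choice made for this sketch.
    return f"{name}:{value:.2f}\r\n"


# A Clash-Threshold of 0 turns clash testing off, as described above.
header = format_threshold_header("Clash-Threshold", 0.0)
```

Rejecting out-of-range values on the client side avoids a round trip
to the server for a request that cannot succeed.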

4168	9.4.37.  Personal-Grammar-URI

4170	   This header field specifies the speaker-trained grammar to be used or
4171	   referenced during enrollment operations.  Phrases are added to this
4172	   grammar during enrollment.  For example, a contact list for user
4173	   "Jeff" could be stored at the Personal-Grammar-URI
4174	   "http://myserver.example.com/myenrollmentdb/jeff-list".  The
4175	   generated grammar syntax MAY be implementation specific.  There is no
4176	   default value for this header field.  This header field MAY be sent
4177	   as part of the START-PHRASE-ENROLLMENT, SET-PARAMS, or GET-PARAMS
4178	   method.

4180	   personal-grammar-uri     =  "Personal-Grammar-URI" ":" uri CRLF

4182	9.4.38.  Enroll-Utterance

4184	   This header field MAY be specified in the RECOGNIZE method.  If this
4185	   header field is set to "true" and an Enrollment is active, the
4186	   RECOGNIZE command MUST add the collected utterance to the personal
4187	   grammar that is being enrolled.  The way in which this occurs is
4188	   engine-specific and may be an area of future standardization.  The
4189	   default value for this header field is "false".

4191	   enroll-utterance     =  "Enroll-Utterance" ":" BOOLEAN CRLF

4193	9.4.39.  Phrase-Id

4195	   This header field in a request identifies a phrase in an existing
4196	   personal grammar for which enrollment is desired.  It is also
4197	   returned to the client in the RECOGNIZE complete event.  This header
4198	   field MAY occur in START-PHRASE-ENROLLMENT, MODIFY-PHRASE or DELETE-
4199	   PHRASE requests.  There is no default value for this header field.

4201	   phrase-id                =  "Phrase-ID" ":" 1*VCHAR CRLF

4203	9.4.40.  Phrase-NL

4205	   This string specifies the interpreted text to be returned when the
4206	   phrase is recognized.  This header field MAY occur in START-PHRASE-
4207	   ENROLLMENT and MODIFY-PHRASE requests.  There is no default value for
4208	   this header field.

4210	   phrase-nl                =  "Phrase-NL" ":" 1*UTFCHAR CRLF

4212	9.4.41.  Weight

4214	   The value of this header field represents the occurrence likelihood
4215	   of a phrase in an enrolled grammar.  When using grammar enrollment,
4216	   the system is essentially constructing a grammar segment consisting
4217	   of a list of possible match phrases.  This can be thought of as
4218	   similar to the dynamic construction of a <one-of> tag in the W3C
4219	   grammar specification.  Each enrolled-phrase becomes an item in the
4220	   list that can be matched against spoken input similar to the <item>
4221	   within a <one-of> list.  This header field allows you to assign a
4222	   weight to the phrase (i.e., <item> entry) in the <one-of> list that
4223	   is enrolled.  Grammar weights are normalized to a sum of one at
4224	   grammar compilation time, so a weight value of 1 for each phrase in
4225	   an enrolled grammar list indicates all items in that list have the
4226	   same weight.  This header field MAY occur in START-PHRASE-ENROLLMENT
4227	   and MODIFY-PHRASE requests.  The default value for this header field
4228	   is implementation specific.

4230	   weight                   =  "Weight" ":" FLOAT CRLF
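
The statement that weights are normalized to a sum of one can be
illustrated with a short Python sketch.  How a server actually
normalizes weights at grammar compilation time is implementation
specific; the function name and the dictionary keyed by phrase ID are
assumptions made for this example.

```python
from typing import Dict


def normalize_weights(weights: Dict[str, float]) -> Dict[str, float]:
    """Scale phrase weights so that they sum to one."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {phrase_id: w / total for phrase_id, w in weights.items()}


# Two phrases that each carry a Weight of 1 end up equally likely.
equal = normalize_weights({"phrase-a": 1.0, "phrase-b": 1.0})
```

After normalization only the ratio between weights matters, which is
why a weight of 1 on every phrase means the same as any other common
value.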

4232	9.4.42.  Save-Best-Waveform

4234	   This header field allows the client to request the recognizer
4235	   resource to save the audio stream for the best repetition of the
4236	   phrase that was used during the enrollment session.  The recognizer
4237	   MUST attempt to record the recognized audio and make it available to
4238	   the client in the form of a URI returned in the Waveform-URI header
4239	   field in the response to the END-PHRASE-ENROLLMENT method.  If there
4240	   was an error in recording the stream or the audio data is otherwise
4241	   not available, the recognizer MUST return an empty Waveform-URI
4242	   header field.  This header field MAY occur in the START-PHRASE-
4243	   ENROLLMENT, SET-PARAMS, and GET-PARAMS methods.

4245	   save-best-waveform  =  "Save-Best-Waveform" ":" BOOLEAN CRLF

4247	9.4.43.  New-Phrase-Id

4249	   This header field replaces the id used to identify the phrase in a
4250	   personal grammar.  The recognizer returns the new id when using an
4251	   enrollment grammar.  This header field MAY occur in MODIFY-PHRASE
4252	   requests.

4254	   new-phrase-id            =  "New-Phrase-ID" ":" 1*VCHAR CRLF

4256	9.4.44.  Confusable-Phrases-URI

4258	   This header field specifies a grammar that defines invalid phrases
4259	   for enrollment.  For example, typical applications do not allow an
4260	   enrolled phrase that is also a command word.  This header field MAY
4261	   occur in RECOGNIZE requests that are part of an enrollment session.

4263	   confusable-phrases-uri   =  "Confusable-Phrases-URI" ":" uri CRLF

4265	9.4.45.  Abort-Phrase-Enrollment

4267	   This header field MAY be specified in the END-PHRASE-ENROLLMENT
4268	   method to abort the phrase enrollment, rather than committing the
4269	   phrase to the personal grammar.

4271	   abort-phrase-enrollment  =  "Abort-Phrase-Enrollment" ":"
4272	                               BOOLEAN CRLF

4274	9.5.  Recognizer Message Body

4276	   A recognizer message can carry additional data associated with the
4277	   request, response or event.  The client MAY provide the grammar to be
4278	   recognized in DEFINE-GRAMMAR or RECOGNIZE requests.  When one or more
4279	   grammars are specified using the DEFINE-GRAMMAR method, the server
4280	   MUST attempt to fetch, compile and optimize the grammar before
4281	   returning a response to the DEFINE-GRAMMAR method.  A RECOGNIZE
4282	   request MUST completely specify the grammars to be active during the
4283	   recognition operation, except when the RECOGNIZE method is being used
4284	   to enroll a grammar.  During grammar enrollment, such grammars are
4285	   OPTIONAL.  The server resource sends the recognition results in the
4286	   RECOGNITION-COMPLETE event and the GET-RESULT response.  Grammars and
4287	   recognition results are carried in the message body of the
4288	   corresponding MRCPv2 messages.

4290	9.5.1.  Recognizer Grammar Data

4292	   Recognizer grammar data from the client to the server can be provided
4293	   inline or by reference.  Either way, grammar data is carried as typed
4294	   media entities in the message body of the RECOGNIZE or DEFINE-GRAMMAR
4295	   request.  All MRCPv2 servers MUST accept grammars in the XML form
4296	   (Media Type application/srgs+xml) of the W3C's XML-based Speech
4297	   Grammar Markup Format (SRGS) [W3C.REC-speech-grammar-20040316] and
4298	   MAY accept grammars in other formats.  Examples include but are not
4299	   limited to:
4300	   o  the ABNF form (Media Type application/srgs) of SRGS
4301	   o  Sun's Java Speech Grammar Format (JSGF)
4302	      [refs.javaSpeechGrammarFormat]
4303	   Additionally, MRCPv2 servers MAY support the Semantic Interpretation
4304	   for Speech Recognition (SISR)
4305	   [W3C.REC-semantic-interpretation-20070405] specification.

4307	   When a grammar is specified inline in the request, the client MUST
4308	   provide a Content-ID for that grammar as part of the content header
4309	   fields.  If there is no space on the server to store the inline
4310	   grammar, the request MUST return with a Completion-Cause code of 016
4311	   "grammar-definition-failure".  Otherwise, the server MUST associate
4312	   the inline grammar block with that Content-ID and MUST store it on
4313	   the server for the duration of the session.  However, if the
4314	   Content-ID is redefined later in the session through a subsequent
4315	   DEFINE-GRAMMAR, the inline grammar previously associated with the
4316	   Content-ID MUST be freed.  If the Content-ID is redefined through a
4317	   subsequent DEFINE-GRAMMAR with an empty message body (i.e., no
4318	   grammar definition), then in addition to freeing any grammar
4319	   previously associated with the Content-ID, the server MUST clear all
4320	   bindings and associations to the Content-ID.  Unless and until
4321	   subsequently redefined, this URI MUST be interpreted by the server as
4322	   one that has never been set.

4324	   Grammars that have been associated with a Content-ID can be
4325	   referenced through the "session" URI scheme (see Section 13.6).  For
4326	   example:
4327	   session:help@root-level.store
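
As a minimal Python sketch, a client could derive the "session" URI
from the Content-ID it assigned to an inline grammar.  The helper
name is hypothetical, and the sketch assumes the Content-ID is
written in angle brackets as in the examples in this document; the
normative definition of the scheme is in Section 13.6.

```python
def session_uri_from_content_id(content_id: str) -> str:
    """Map a Content-ID header value to a "session" URI."""
    cid = content_id.strip()
    # Drop the surrounding angle brackets if they are present.
    if cid.startswith("<") and cid.endswith(">"):
        cid = cid[1:-1]
    if not cid:
        raise ValueError("empty Content-ID")
    return "session:" + cid


uri = session_uri_from_content_id("<help@root-level.store>")
```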

4329	   Grammar data MAY be specified using external URI references.  To do
4330	   so, the client uses a body of Media Type text/uri-list (see RFC 2483
4331	   [RFC2483] ) to list the one or more URIs that point to the grammar
4332	   data.  The client can use a body of Media Type text/grammar-ref-list
4333	   (see Section 13.5.1) if it wants to assign weights to the list of
4334	   grammar URIs.  All MRCPv2 servers MUST support grammar access using
4335	   the "http" and "https" URI schemes.

4337	   If the grammar data the client wishes to be used on a request
4338	   consists of a mix of URI and inline grammar data, the client uses the
4339	   multipart/mixed Media Type to enclose the text/uri-list, application/
4340	   srgs or application/srgs+xml content entities.  The character set and
4341	   encoding used in the grammar data are specified according to standard
4342	   Media Type definitions.

4344	   When more than one grammar URI or inline grammar block is specified
4345	   in a message body of the RECOGNIZE request, the server interprets
4346	   this as a list of grammar alternatives to match against.

4348	   Content-Type:application/srgs+xml
4349	   Content-ID:<request1@form-level.store>
4350	   Content-Length:...

4352	   <?xml version="1.0"?>

4354	   <!-- the default grammar language is US English -->
4355	   <grammar xmlns="http://www.w3.org/2001/06/grammar"
4356	            xml:lang="en-US" version="1.0" root="request">

4358	   <!-- single language attachment to tokens -->
4359	         <rule id="yes">
4360	               <one-of>
4361	                     <item xml:lang="fr-CA">oui</item>
4362	                     <item xml:lang="en-US">yes</item>
4363	               </one-of>
4364	         </rule>

4366	   <!-- single language attachment to a rule expansion -->
4367	         <rule id="request">
4368	               may I speak to
4369	               <one-of xml:lang="fr-CA">
4370	                     <item>Michel Tremblay</item>
4371	                     <item>Andre Roy</item>
4372	               </one-of>
4373	         </rule>

4375	         <!-- multiple language attachment to a token -->
4376	         <rule id="people1">
4377	               <token lexicon="en-US,fr-CA"> Robert </token>
4378	         </rule>

4380	         <!-- the equivalent single-language attachment expansion -->
4381	         <rule id="people2">
4382	               <one-of>
4383	                     <item xml:lang="en-US">Robert</item>
4384	                     <item xml:lang="fr-CA">Robert</item>
4385	               </one-of>
4386	         </rule>

4388	         </grammar>

4390	                           SRGS Grammar Example

4392	   Content-Type:text/uri-list
4393	   Content-Length:...

4395	   session:help@root-level.store
4396	   http://www.example.com/Directory-Name-List.grxml
4397	   http://www.example.com/Department-List.grxml
4398	   http://www.example.com/TAC-Contact-List.grxml
4399	   session:menu1@menu-level.store

4401	                         Grammar Reference Example
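
The grammar reference body above can be assembled programmatically.
The following Python sketch builds the entity header fields and a
text/uri-list body and computes Content-Length from the encoded body.
The function name is an assumption for illustration, and the sketch
covers only the entity portion, not the MRCPv2 start line or other
header fields.

```python
def build_uri_list_entity(uris):
    """Return the entity header fields and body for a text/uri-list."""
    # Each URI goes on its own line, terminated by CRLF.
    body = "".join(uri + "\r\n" for uri in uris).encode("utf-8")
    header = (
        "Content-Type:text/uri-list\r\n"
        f"Content-Length:{len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return header + body


entity = build_uri_list_entity([
    "session:help@root-level.store",
    "http://www.example.com/Department-List.grxml",
])
```

Computing the length from the encoded bytes rather than from the
string keeps the value correct if a URI contains non-ASCII
characters.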

4403	   Content-Type:multipart/mixed; boundary="break"

4405	   --break
4406	   Content-Type:text/uri-list
4407	   Content-Length:...

4409	   http://www.example.com/Directory-Name-List.grxml
4410	   http://www.example.com/Department-List.grxml
4411	   http://www.example.com/TAC-Contact-List.grxml

4413	   --break
4414	   Content-Type:application/srgs+xml
4415	   Content-ID:<request1@form-level.store>
4416	   Content-Length:...

4418	   <?xml version="1.0"?>

4420	   <!-- the default grammar language is US English -->
4421	   <grammar xmlns="http://www.w3.org/2001/06/grammar"
4422	            xml:lang="en-US" version="1.0">

4424	   <!-- single language attachment to tokens -->
4425	         <rule id="yes">
4426	               <one-of>
4427	                     <item xml:lang="fr-CA">oui</item>
4428	                     <item xml:lang="en-US">yes</item>
4429	               </one-of>
4430	         </rule>

4432	   <!-- single language attachment to a rule expansion -->
4433	         <rule id="request">
4434	               may I speak to
4435	               <one-of xml:lang="fr-CA">
4436	                     <item>Michel Tremblay</item>
4437	                     <item>Andre Roy</item>
4438	               </one-of>

4440	         </rule>

4442	         <!-- multiple language attachment to a token -->
4443	         <rule id="people1">
4444	               <token lexicon="en-US,fr-CA"> Robert </token>
4445	         </rule>

4447	         <!-- the equivalent single-language attachment expansion -->
4448	         <rule id="people2">
4449	               <one-of>
4450	                     <item xml:lang="en-US">Robert</item>
4451	                     <item xml:lang="fr-CA">Robert</item>
4452	               </one-of>
4453	         </rule>

4455	         </grammar>
4456	   --break--

4458	                      Mixed Grammar Reference Example

4460	9.5.2.  Recognizer Result Data

4462	   Recognition results are returned to the client in the message body of
4463	   the RECOGNITION-COMPLETE event or the GET-RESULT response message as
4464	   described in Section 6.3.  Element and attribute descriptions for
4465	   the recognition portion of the NLSML format are provided in
4466	   Section 9.6 with a normative definition of the schema in
4467	   Section 16.1.

4469	   Content-Type:application/nlsml+xml
4470	   Content-Length:...

4472	   <?xml version="1.0"?>
4473	   <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
4474	           xmlns:ex="http://www.example.com/example"
4475	           grammar="http://www.example.com/theYesNoGrammar">
4476	       <interpretation>
4477	           <instance>
4478	                   <ex:response>yes</ex:response>
4479	           </instance>
4480	           <input>ok</input>
4481	       </interpretation>
4482	   </result>

4484	                              Result Example
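
A client might read such a result with a standard XML parser.  The
Python sketch below uses the standard library's ElementTree module to
pull the input text and the children of <instance> out of each
interpretation.  The function name and the returned tuple layout are
assumptions for illustration; a production client would also handle
malformed bodies.

```python
import xml.etree.ElementTree as ET

NLSML_NS = {"nlsml": "urn:ietf:params:xml:ns:mrcpv2"}


def parse_result(xml_text):
    """Return a list of (input_text, instance_slots) tuples."""
    root = ET.fromstring(xml_text.strip())
    parsed = []
    for interp in root.findall("nlsml:interpretation", NLSML_NS):
        input_el = interp.find("nlsml:input", NLSML_NS)
        instance_el = interp.find("nlsml:instance", NLSML_NS)
        input_text = None
        if input_el is not None:
            input_text = (input_el.text or "").strip()
        # Map each child of <instance> from its qualified tag to its text.
        slots = {}
        if instance_el is not None:
            slots = {c.tag: (c.text or "").strip() for c in instance_el}
        parsed.append((input_text, slots))
    return parsed


SAMPLE = """<?xml version="1.0"?>
<result xmlns="urn:ietf:params:xml:ns:mrcpv2"
        xmlns:ex="http://www.example.com/example"
        grammar="http://www.example.com/theYesNoGrammar">
    <interpretation>
        <instance><ex:response>yes</ex:response></instance>
        <input>ok</input>
    </interpretation>
</result>"""

interpretations = parse_result(SAMPLE)
```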

4486	9.5.3.  Enrollment Result Data

4488	   Enrollment results are returned to the client in the message body of
4489	   the RECOGNITION-COMPLETE event as described in Section 6.3.  Element
4490	   and attribute descriptions for the enrollment portion of the NLSML
4491	   format are provided in Section 9.7 with a normative definition of the
4492	   schema in Section 16.2.

4494	9.5.4.  Recognizer Context Block

4496	   When a client changes servers while operating on the behalf of the
4497	   same incoming communication session, this header field allows the
4498	   client to collect a block of opaque data from one server and provide
4499	   it to another server.  This capability is desirable if the client
4500	   needs different language support or because the server issued a
4501	   redirect.  Here the first recognizer resource may have collected
4502	   acoustic and other data during its execution of recognition methods.
4503	   After a server switch, communicating this data may allow the
4504	   recognition resource on the new server to provide better recognition.
4505	   This block of data is implementation-specific and MUST be carried as
4506	   Media Type application/octets in the body of the message.

4508	   This block of data is communicated in the SET-PARAMS and GET-PARAMS
4509	   method/response messages.  In the GET-PARAMS method, if an empty
4510	   Recognizer-Context-Block header field is present, then the recognizer
4511	   SHOULD return its vendor-specific context block, if any, in the
4512	   message body as an entity of Media Type application/octets with a
4513	   specific Content-ID.  The Content-ID value MUST also be specified in
4514	   the Recognizer-Context-Block header field in the GET-PARAMS response.
4515	   The SET-PARAMS request wishing to provide this vendor-specific data
4516	   MUST send it in the message body as a typed entity with the same
4517	   Content-ID that it received from the GET-PARAMS.  The Content-ID MUST
4518	   also be sent in the Recognizer-Context-Block header field of the SET-
4519	   PARAMS message.

4521	   Each speech recognition implementation choosing to use this mechanism
4522	   to hand off recognizer context data among servers MUST distinguish
4523	   its implementation-specific block of data from other implementations
4524	   by choosing a Content-ID that is recognizable among the participating
4525	   servers and unlikely to collide with values chosen by another
4526	   implementation.

4528	9.6.  Recognizer Results

4530	   The recognizer portion of NLSML (see Section 6.3.1) represents
4531	   information automatically extracted from a user's utterances by a
4532	   semantic interpretation component, where "utterance" is to be taken
4533	   in the general sense of a meaningful user input in any modality
4534	   supported by the MRCPv2 implementation.

4536	9.6.1.  Markup Functions

4538	   MRCPv2 recognition resources employ the Natural Language Semantics
4539	   Markup Language (NLSML) to interpret natural language speech input
4540	   and to format the interpretation for consumption by an MRCPv2 client.

4542	   The elements of the markup fall into the following general functional
4543	   categories: Interpretation, Side Information, and Multi-Modal
4544	   Integration.

4546	9.6.1.1.  Interpretation

4548	   Elements and attributes represent the semantics of a user's
4549	   utterance, including the <result>, <interpretation>, and <instance>
4550	   elements.  The <result> element contains the full result of
4551	   processing one utterance.  It MAY contain multiple <interpretation>
4552	   elements if the interpretation of the utterance results in multiple
4553	   alternative meanings due to uncertainty in speech recognition or
4554	   natural language understanding.  There are at least two reasons for
4555	   providing multiple interpretations:
4556	   1.  the client application might have additional information, for
4557	       example, information from a database, that would allow it to
4558	       select a preferred interpretation from among the possible
4559	       interpretations returned from the semantic interpreter.
4560	   2.  a client-based dialog manager (e.g., VoiceXML
4561	       [W3C.REC-voicexml20-20040316]) that was unable to select between
4562	       several competing interpretations could use this information to
4563	       go back to the user and find out what was intended.  For example,
4564	       it could issue a SPEAK request to a synthesizer resource to emit
4565	       "Did you say 'Boston' or 'Austin'?"

4567	9.6.1.2.  Side Information

4569	   These are elements and attributes representing additional information
4570	   about the interpretation, over and above the interpretation itself.
4571	   Side information includes:
4572	   1.  Whether an interpretation was achieved (the <nomatch> element)
4573	       and the system's confidence in an interpretation (the
4574	       "confidence" attribute of <interpretation>).
4575	   2.  Alternative interpretations (<interpretation>)
4576	   3.  Input formats and Automatic Speech Recognition (ASR) information:
4577	       the <input> element, representing the input to the semantic
4578	       interpreter.

4580	9.6.1.3.  Multi-Modal Integration

4582	   When more than one modality is available for input, the
4583	   interpretation of the inputs needs to be coordinated.  The "mode"
4584	   attribute of <input> supports this by indicating whether the
4585	   utterance was input by speech, dtmf, pointing, etc.  The
4586	   "timestamp-start" and "timestamp-end" attributes of <input>
4587	   also provide for temporal coordination by indicating when inputs
4588	   occurred.

4590	9.6.2.  Overview of Recognizer Result Elements and their Relationships

4592	   The recognizer elements in NLSML fall into two categories:
4593	   1.  description of the input that was processed.
4594	   2.  description of the meaning which was extracted from the input.
4595	   Next to each element are its attributes.  In addition, some elements
4596	   can contain multiple instances of other elements.  For example, a
4597	   <result> can contain multiple <interpretation> elements, each of
4598	   which is taken to be an alternative.  Similarly, <input> can contain
4599	   multiple child <input> elements, which are taken to be cumulative.  As
4600	   a simple example of the basic usage of these elements, consider the
4601	   utterance "ok" (interpreted as "yes").  The markup below shows how
4602	   that utterance and its interpretation would be represented in
4603	   NLSML.

4605	   <?xml version="1.0"?>
4606	   <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
4607	           xmlns:ex="http://www.example.com/example"
4608	           grammar="http://www.example.com/theYesNoGrammar">
4609	     <interpretation>
4610	        <instance>
4611	           <ex:response>yes</ex:response>
4612	         </instance>
4613	       <input>ok</input>
4614	     </interpretation>
4615	   </result>

4617	   This example includes only the minimum required information.  There
4618	   is an overall <result> element which includes one interpretation and
4619	   an input element.  The interpretation contains the application-
4620	   specific element "<response>" which is the semantically interpreted
4621	   result.

4623	9.6.3.  Elements and Attributes
4624	9.6.3.1.  RESULT Root Element

4626	   The root element of the markup is <result>.  The <result> element
4627	   includes one or more <interpretation> elements.  Multiple
4628	   interpretations can result from ambiguities in the input or in the
4629	   semantic interpretation.  If the "grammar" attribute does not apply
4630	   to all of the interpretations in the result it can be overridden for
4631	   individual interpretations at the <interpretation> level.

4633	   Attributes:
4634	   1.  grammar: The grammar or recognition rule matched by this result.
4635	       The format of the grammar attribute will match the rule reference
4636	       semantics defined in the grammar specification.  Specifically,
4637	       the rule reference is in the external XML form for grammar rule
4638	       references.  The markup interpreter needs to know the grammar
4639	       rule that is matched by the utterance because multiple rules may
4640	       be simultaneously active.  The value is the grammar URI used by
4641	       the markup interpreter to specify the grammar.  The grammar can
4642	       be overridden by a grammar attribute in the <interpretation>
4643	       element if the input was ambiguous as to which grammar it
4644	       matched.  If all interpretation elements within the result
4645	       element carry their own grammar attributes, the attribute
4646	       can be dropped from the result element.

4648	   <?xml version="1.0"?>
4649	   <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
4650	           grammar="http://www.example.com/grammar">
4651	     <interpretation>
4652	      ....
4653	     </interpretation>
4654	   </result>

4656	9.6.3.2.  INTERPRETATION Element

4658	   An <interpretation> element contains a single semantic
4659	   interpretation.

4661	   Attributes:
4662	   1.  confidence: A float value from 0.0-1.0 indicating the semantic
4663	       analyzer's confidence in this interpretation.  A value of 1.0
4664	       indicates maximum confidence.  The values are implementation-
4665	       dependent, but are intended to align with the value
4666	       interpretation for the confidence MRCPv2 header field defined in
4667	       Section 9.4.1.  This attribute is OPTIONAL.
4668	   2.  grammar: The grammar or recognition rule matched by this
4669	       interpretation (if needed to override the grammar specification
4670	       at the <interpretation> level).  This attribute is only needed
4671	       under <interpretation> if it is necessary to override a grammar
4672	       that was defined at the <result> level.  Note that the grammar
4673	       attribute for the interpretation element is optional if and only
4674	       if the grammar attribute is specified in the result element.

4676	   Interpretations MUST be sorted best-first by some measure of
4677	   "goodness".  The goodness measure is "confidence" if present,
4678	   otherwise, it is some implementation-specific indication of quality.

4680	   The grammar is expected to be specified most frequently at the
4681	   <result> level.  However, it can be overridden at the
4682	   <interpretation> level because it is possible that different
4683	   interpretations may match different grammar rules.

4685	   The <interpretation> element includes an optional <input> element
4686	   which contains the input being analyzed, and an <instance> element
4687	   containing the interpretation of the utterance.

4689	   <interpretation confidence="0.75"
4690	                   grammar="http://www.example.com/grammar">
4691	       ...
4692	   </interpretation>
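
The grammar override rule and the best-first ordering can be checked
on the client side.  The Python sketch below resolves the effective
grammar for each interpretation and verifies that the confidence
values, where present, do not increase.  The function names are
hypothetical, and the ordering check deliberately ignores
interpretations without a confidence attribute, because their
ordering depends on an implementation-specific quality measure.

```python
import xml.etree.ElementTree as ET

NLSML_NS = {"nlsml": "urn:ietf:params:xml:ns:mrcpv2"}


def summarize(xml_text):
    """Return (effective_grammar, confidence) per interpretation."""
    root = ET.fromstring(xml_text.strip())
    default_grammar = root.get("grammar")
    summary = []
    for interp in root.findall("nlsml:interpretation", NLSML_NS):
        # A grammar on <interpretation> overrides the one on <result>.
        grammar = interp.get("grammar", default_grammar)
        conf = interp.get("confidence")
        summary.append((grammar, float(conf) if conf is not None else None))
    return summary


def is_best_first(summary):
    """True if the known confidence values never increase."""
    confs = [c for _, c in summary if c is not None]
    return all(a >= b for a, b in zip(confs, confs[1:]))


SAMPLE = """<result xmlns="urn:ietf:params:xml:ns:mrcpv2"
        grammar="http://www.example.com/grammar">
  <interpretation confidence="0.75"/>
  <interpretation confidence="0.5"
                  grammar="http://www.example.com/other"/>
</result>"""

summary = summarize(SAMPLE)
```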

4694	9.6.3.3.  INSTANCE Element

4696	   The <instance> element contains the interpretation of the utterance.
4697	   When the Semantic Interpretation for Speech Recognition format is
4698	   used, the <instance> element contains the XML serialization of the
4699	   result using the approach defined in that specification.  When there
4700	   is semantic markup in the grammar that does not create semantic
4701	   objects, but instead only does a semantic translation of a portion of
4702	   the input, such as translating "coke" to "coca-cola", the instance
4703	   contains the whole input but with the translation applied.  The NLSML
4704	   looks like the markup in Figure 2 below.  If there are no semantic
4705	   objects created, nor any semantic translation, the instance value is
4706	   the same as the input value.

4708	   Attributes:
4709	   1.  confidence: Each element of the instance MAY have a confidence
4710	       attribute, defined in the NLSML namespace.  The confidence
4711	       attribute contains a float value in the range from 0.0-1.0
4712	       reflecting the system's confidence in the analysis of that slot.
4713	       A value of 1.0 indicates maximum confidence.  The values are
4714	       implementation-dependent, but are intended to align with the
4715	       value interpretation for the confidence MRCPv2 header field
4716	       defined in Section 9.4.1.  This attribute is OPTIONAL.

4718	   <instance>
4719	     <nameAddress>
4720	         <street confidence="0.75">123 Maple Street</street>
4721	         <city>Mill Valley</city>
4722	         <state>CA</state>
4723	         <zip>90952</zip>
4724	     </nameAddress>
4725	   </instance>
4726	   <input>
4727	     My address is 123 Maple Street,
4728	     Mill Valley, California, 90952
4729	   </input>

4731	   <instance>
4732	       I would like to buy a coca-cola
4733	   </instance>
4734	   <input>
4735	     I would like to buy a coke
4736	   </input>

4738	                          Figure 2: NLSML Example

4740	9.6.3.4.  INPUT Element

4742	   The <input> element is the text representation of a user's input.  It
4743	   includes an optional "confidence" attribute which indicates the
4744	   recognizer's confidence in the recognition result (as opposed to the
4745	   confidence in the interpretation, which is indicated by the
4746	   "confidence" attribute of <interpretation>).  Optional "timestamp-
4747	   start" and "timestamp-end" attributes indicate the start and end
4748	   times of a spoken utterance, in ISO 8601 format [ISO.8601.1988].

4750	   Attributes:
4751	   1.  timestamp-start: The time at which the input began. (optional)
4752	   2.  timestamp-end: The time at which the input ended. (optional)
4753	   3.  mode: The modality of the input, for example, speech, dtmf, etc.
4754	       (optional)
4755	   4.  confidence: the confidence of the recognizer in the correctness
4756	       of the input in the range 0.0 to 1.0 (optional)
4757	   Note that it may not make sense for temporally overlapping inputs to
4758	   have the same mode; however, this constraint is not expected to be
4759	   enforced by implementations.

4761	   When there is no time zone designator, ISO 8601 time representations
4762	   default to local time.

4764	   There are three possible formats for the <input> element.

4766	   1.  The <input> element can contain simple text:
4767	   <input>onions</input>
4768	       A future possibility is for <input> to contain not only text but
4769	       additional markup that represents prosodic information that was
4770	       contained in the original utterance and extracted by the speech
4771	       recognizer.  This depends on the availability of ASRs that are
4772	       capable of producing prosodic information.  MRCPv2 clients MUST
4773	       be prepared to receive such markup and MAY make use of it.
4774	   2.  An <input> tag can also contain additional <input> tags.  Having
4775	       additional input elements allows the representation to support
4776	       future multi-modal inputs as well as finer-grained speech
4777	       information, such as timestamps for individual words and word-
4778	       level confidences.
4779	   <input>
4780	        <input mode="speech" confidence="0.5"
4781	            timestamp-start="2000-04-03T00:00:00"
4782	            timestamp-end="2000-04-03T00:00:00.2">fried</input>
4783	        <input mode="speech" confidence="1.0"
4784	            timestamp-start="2000-04-03T00:00:00.25"
4785	            timestamp-end="2000-04-03T00:00:00.6">onions</input>
4786	   </input>
4787	   3.  Finally, the <input> element can contain <nomatch> and <noinput>
4788	       elements, which describe situations in which the speech
4789	       recognizer received input that it was unable to process, or did
4790	       not receive any input at all, respectively.

4792	9.6.3.5.  NOMATCH Element

4794	   The <nomatch> element under <input> is used to indicate that the
4795	   semantic interpreter was unable to successfully match any input with
4796	   confidence above the threshold.  It can optionally contain the text
4797	   of the best of the (rejected) matches.

4799	   <interpretation>
4800	      <instance/>
4801	         <input confidence="0.1">
4802	            <nomatch/>
4803	         </input>
4804	   </interpretation>
4805	   <interpretation>
4806	      <instance/>
4807	      <input mode="speech" confidence="0.1">
4808	        <nomatch>I want to go to New York</nomatch>
4809	      </input>
4810	   </interpretation>

4812	9.6.3.6.  NOINPUT Element

4814	   <noinput> indicates that there was no input - a timeout occurred in
4815	   the speech recognizer due to silence.
4816	   <interpretation>
4817	      <instance/>
4818	      <input>
4819	         <noinput/>
4820	      </input>
4821	   </interpretation>

4823	   If there are multiple levels of inputs, the most natural place for
4824	   <nomatch> and <noinput> elements to appear is under the highest level
4825	   of <input> for <noinput>, and under the appropriate level of
4826	   <interpretation> for <nomatch>.  So <noinput> means "no input at all"
4827	   and <nomatch> means "no match in speech modality" or "no match in
4828	   dtmf modality".  For example, to represent garbled speech combined
4829	   with dtmf "1 2 3 4", the markup would be:
4830	   <input>
4831	      <input mode="speech"><nomatch/></input>
4832	      <input mode="dtmf">1 2 3 4</input>
4833	   </input>

4835	   Note: while <noinput> could be represented as an attribute of input,
4836	   <nomatch> cannot, since it could potentially include PCDATA content
4837	   with the best match.  For parallelism, <noinput> is also an element.
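
   The distinction between <noinput>, <nomatch>, and ordinary input
   can be sketched in client code as follows.  This is a non-normative
   sketch under stated assumptions: the function name and the returned
   labels are hypothetical, and namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET


def classify_input(input_elem):
    """Classify an <input> element (labels are illustrative only).

    Returns ("noinput", None), ("nomatch", best_text_or_None),
    ("nested", [child results]) or ("match", text).
    """
    if input_elem.find("noinput") is not None:
        return ("noinput", None)
    nomatch = input_elem.find("nomatch")
    if nomatch is not None:
        # <nomatch> may optionally carry the text of the best rejected match.
        text = (nomatch.text or "").strip()
        return ("nomatch", text or None)
    nested = input_elem.findall("input")
    if nested:
        return ("nested", [classify_input(child) for child in nested])
    return ("match", (input_elem.text or "").strip())


if __name__ == "__main__":
    mixed = ET.fromstring(
        '<input><input mode="speech"><nomatch/></input>'
        '<input mode="dtmf">1 2 3 4</input></input>'
    )
    print(classify_input(mixed))
```

   Applied to the garbled-speech-plus-DTMF example, the sketch reports
   a no-match for the speech modality and the digits for the DTMF
   modality.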

4839	9.7.  Enrollment Results

4841	   All enrollment elements are contained within a single <enrollment-
4842	   result> element under <result>.  The elements are described below and
4843	   have the schema defined in Section 16.2.  The following elements are
4844	   defined:

4846	   1.  num-clashes
4847	   2.  num-good-repetitions
4848	   3.  num-repetitions-still-needed
4849	   4.  consistency-status
4850	   5.  clash-phrase-ids
4851	   6.  transcriptions
4852	   7.  confusable-phrases

4854	9.7.1.  NUM-CLASHES Element

4856	   The <num-clashes> element contains the number of clashes that this
4857	   pronunciation has with other pronunciations in an active enrollment
4858	   session.  The associated Clash-Threshold header field determines the
4859	   sensitivity of the clash measurement.  Note that clash testing can be
4860	   turned off completely by setting the Clash-Threshold header field
4861	   value to 0.

4863	9.7.2.  NUM-GOOD-REPETITIONS Element

4865	   The <num-good-repetitions> element contains the number of consistent
4866	   pronunciations obtained so far in an active enrollment session.

4868	9.7.3.  NUM-REPETITIONS-STILL-NEEDED Element

4870	   The <num-repetitions-still-needed> element contains the number of
4871	   consistent pronunciations that must still be obtained before the new
4872	   phrase can be added to the enrollment grammar.  The number of
4873	   consistent pronunciations required is specified by the client in the
4874	   request header field Num-Min-Consistent-Pronunciations.  The returned
4875	   value must be 0 before the client can successfully commit a phrase to
4876	   the grammar by ending the enrollment session.

4878	9.7.4.  CONSISTENCY-STATUS Element

4880	   The <consistency-status> element is used to indicate how consistent
4881	   the repetitions are when learning a new phrase.  It can have the
4882	   values of consistent, inconsistent, and undecided.

4884	9.7.5.  CLASH-PHRASE-IDS Element

4886	   The <clash-phrase-ids> element contains the phrase ids of clashing
4887	   pronunciation(s), if any.  This element is absent if there are no
4888	   clashes.

4890	9.7.6.  TRANSCRIPTIONS Element

4892	   The <transcriptions> element contains the transcriptions returned in
4893	   the last repetition of the phrase being enrolled.

4895	9.7.7.  CONFUSABLE-PHRASES Element

4897	   The <confusable-phrases> element contains a list of phrases from a
4898	   command grammar that are confusable with the phrase being added to
4899	   the personal grammar.  This element MAY be absent if there are no
4900	   confusable phrases.
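
   A client might summarize the scalar enrollment elements as shown
   below.  This is a non-normative sketch: the sample document and the
   function names are hypothetical, namespaces are omitted, and the
   list-valued elements (whose structure is given by the schema in
   Section 16.2) are deliberately not parsed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the values are made up for illustration.
SAMPLE = """
<enrollment-result>
  <num-clashes>0</num-clashes>
  <num-good-repetitions>1</num-good-repetitions>
  <num-repetitions-still-needed>1</num-repetitions-still-needed>
  <consistency-status>undecided</consistency-status>
</enrollment-result>
"""

COUNT_FIELDS = (
    "num-clashes",
    "num-good-repetitions",
    "num-repetitions-still-needed",
)
VALID_STATUS = {"consistent", "inconsistent", "undecided"}


def parse_enrollment(elem):
    """Collect the counters and consistency status that are present."""
    summary = {}
    for name in COUNT_FIELDS:
        child = elem.find(name)
        if child is not None and child.text:
            summary[name] = int(child.text.strip())
    status = elem.find("consistency-status")
    if status is not None and status.text:
        value = status.text.strip()
        if value not in VALID_STATUS:
            raise ValueError("unexpected consistency-status: " + value)
        summary["consistency-status"] = value
    return summary


def can_commit(summary):
    """A phrase can be committed only when no repetitions are still needed."""
    return summary.get("num-repetitions-still-needed") == 0


if __name__ == "__main__":
    result = parse_enrollment(ET.fromstring(SAMPLE))
    print(result, can_commit(result))
```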

4902	9.8.  DEFINE-GRAMMAR

4904	   The DEFINE-GRAMMAR method, from the client to the server, provides
4905	   one or more grammars and requests the server to access, fetch, and
4906	   compile the grammars as needed.  The DEFINE-GRAMMAR method
4907	   implementation MUST do a fetch of all external URIs that are part of
4908	   that operation.  If caching is implemented, this URI fetching MUST
4909	   conform to the cache control hints and parameter header fields
4910	   associated with the method in deciding whether it should be fetched
4911	   from cache or from the external server.  If these hints/parameters
4912	   are not specified in the method, the values set for the session using
4913	   SET-PARAMS/GET-PARAMS apply.  If they were not set for the session,
4914	   their default values apply.
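
   The lookup order described above (method header fields first, then
   session values, then defaults) can be sketched as follows.  This is
   a non-normative illustration: the function name, the dictionaries,
   and the sample values are hypothetical.

```python
def resolve_parameter(name, request_headers, session_params, defaults):
    """Return the effective value of a fetch or cache parameter.

    Precedence: header fields on the method itself, then values set for
    the session, then the defaults.
    """
    for source in (request_headers, session_params, defaults):
        if name in source:
            return source[name]
    raise KeyError(name)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the specification.
    defaults = {"Fetch-Timeout": "10000"}
    session = {"Fetch-Timeout": "5000"}
    print(resolve_parameter("Fetch-Timeout", {}, session, defaults))
```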

4916	   If the server resource is in the recognition state, the server MUST
4917	   respond to the DEFINE-GRAMMAR request with a failure status.

4919	   If the resource is in the idle state and is able to successfully
4920	   process the supplied grammars, the server MUST return a success code
4921	   status and the request-state MUST be COMPLETE.

4923	   If the recognizer resource could not define the grammar for some
4924	   reason, for example if the download failed, the grammar failed to
4925	   compile, or the grammar was in an unsupported form, the MRCPv2
4926	   response for the DEFINE-GRAMMAR method MUST contain a failure status-
4927	   code of 407, and contain a Completion-Cause header field describing
4928	   the failure reason.

4930	   C->S:MRCP/2.0 ... DEFINE-GRAMMAR 543257
4931	   Channel-Identifier:32AECB23433801@speechrecog
4932	   Content-Type:application/srgs+xml
4933	   Content-ID:<request1@form-level.store>
4934	   Content-Length:...

4936	   <?xml version="1.0"?>

4938	   <!-- the default grammar language is US English -->
4939	   <grammar xmlns="http://www.w3.org/2001/06/grammar"
4940	            xml:lang="en-US" version="1.0">

4942	   <!-- single language attachment to tokens -->
4943	   <rule id="yes">
4944	               <one-of>
4945	                     <item xml:lang="fr-CA">oui</item>
4946	                     <item xml:lang="en-US">yes</item>
4947	               </one-of>
4948	         </rule>

4950	   <!-- single language attachment to a rule expansion -->
4951	         <rule id="request">
4952	               may I speak to
4953	               <one-of xml:lang="fr-CA">
4954	                     <item>Michel Tremblay</item>
4955	                     <item>Andre Roy</item>

4957	               </one-of>
4958	         </rule>

4960	         </grammar>

4962	   S->C:MRCP/2.0 ... 543257 200 COMPLETE
4963	   Channel-Identifier:32AECB23433801@speechrecog
4964	   Completion-Cause:000 success

4966	   C->S:MRCP/2.0 ... DEFINE-GRAMMAR 543258
4967	   Channel-Identifier:32AECB23433801@speechrecog
4968	   Content-Type:application/srgs+xml
4969	   Content-ID:<helpgrammar@root-level.store>
4970	   Content-Length:...

4972	   <?xml version="1.0"?>

4974	   <!-- the default grammar language is US English -->
4975	   <grammar xmlns="http://www.w3.org/2001/06/grammar"
4976	            xml:lang="en-US" version="1.0">

4978	         <rule id="request">
4979	               I need help
4980	         </rule>
4981	   </grammar>

4982	   S->C:MRCP/2.0 ... 543258 200 COMPLETE
4983	   Channel-Identifier:32AECB23433801@speechrecog
4984	   Completion-Cause:000 success

4986	   C->S:MRCP/2.0 ... DEFINE-GRAMMAR 543259
4987	   Channel-Identifier:32AECB23433801@speechrecog
4988	   Content-Type:application/srgs+xml
4989	   Content-ID:<request2@field-level.store>
4990	   Content-Length:...

4992	   <?xml version="1.0" encoding="UTF-8"?>

4994	   <!DOCTYPE grammar PUBLIC "-//W3C//DTD GRAMMAR 1.0//EN"
4995	                     "http://www.w3.org/TR/speech-grammar/grammar.dtd">

4997	   <grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en"
4998	   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
4999	          xsi:schemaLocation="http://www.w3.org/2001/06/grammar
5000	              http://www.w3.org/TR/speech-grammar/grammar.xsd"
5001	              version="1.0" mode="voice" root="basicCmd">

5003	   <meta name="author" content="Stephanie Williams"/>
5004	   <rule id="basicCmd" scope="public">
5005	     <example> please move the window </example>
5006	     <example> open a file </example>

5008	     <ruleref
5009	       uri="http://grammar.example.com/politeness.grxml#startPolite"/>

5011	     <ruleref uri="#command"/>
5012	     <ruleref
5013	       uri="http://grammar.example.com/politeness.grxml#endPolite"/>
5014	   </rule>

5016	   <rule id="command">
5017	     <ruleref uri="#action"/> <ruleref uri="#object"/>
5018	   </rule>

5020	   <rule id="action">
5021	      <one-of>
5022	         <item weight="10"> open   <tag>open</tag>   </item>
5023	         <item weight="2">  close  <tag>close</tag>  </item>
5024	         <item weight="1">  delete <tag>delete</tag> </item>
5025	         <item weight="1">  move   <tag>move</tag>   </item>
5026	      </one-of>
5027	   </rule>

5029	   <rule id="object">
5030	     <item repeat="0-1">
5031	       <one-of>
5032	         <item> the </item>
5033	         <item> a </item>
5034	       </one-of>
5035	     </item>

5037	     <one-of>
5038	         <item> window </item>
5039	         <item> file </item>
5040	         <item> menu </item>
5041	     </one-of>
5042	   </rule>

5044	   </grammar>

5046	   S->C:MRCP/2.0 ... 543259 200 COMPLETE
5047	   Channel-Identifier:32AECB23433801@speechrecog
5048	   Completion-Cause:000 success

5050	   C->S:MRCP/2.0 ... RECOGNIZE 543260
5051	   Channel-Identifier:32AECB23433801@speechrecog
5052	   N-Best-List-Length:2
5053	   Content-Type:text/uri-list
5054	   Content-Length:...

5056	   session:request1@form-level.store
5057	   session:request2@field-level.store
5058	   session:helpgrammar@root-level.store

5060	   S->C:MRCP/2.0 ... 543260 200 IN-PROGRESS
5061	   Channel-Identifier:32AECB23433801@speechrecog

5063	   S->C:MRCP/2.0 ... START-OF-INPUT 543260 IN-PROGRESS
5064	   Channel-Identifier:32AECB23433801@speechrecog

5066	   S->C:MRCP/2.0 ... RECOGNITION-COMPLETE 543260 COMPLETE
5067	   Channel-Identifier:32AECB23433801@speechrecog
5068	   Completion-Cause:000 success
5069	   Waveform-URI:<http://web.media.com/session123/audio.wav>;
5070	                size=124535;duration=2340
5071	   Content-Type:application/nlsml+xml
5072	   Content-Length:...

5074	   <?xml version="1.0"?>
5075	   <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
5076	           xmlns:ex="http://www.example.com/example"
5077	           grammar="session:request1@form-level.store">
5078	           <interpretation>
5079	               <instance name="Person">
5080	               <ex:Person>
5081	                   <ex:Name> Andre Roy </ex:Name>
5082	               </ex:Person>
5083	            </instance>
5084	            <input>   may I speak to Andre Roy </input>
5085	       </interpretation>
5086	   </result>

5088	                          Define Grammar Example

5090	9.9.  RECOGNIZE

5092	   The RECOGNIZE method from the client to the server requests the
5093	   recognizer to start recognition and provides it with one or more
5094	   grammar references for grammars to match against the input media.
5095	   The RECOGNIZE method can carry header fields to control the
5096	   sensitivity, confidence level and the level of detail in results
5097	   provided by the recognizer.  These header field values override the
5098	   current values set by a previous SET-PARAMS method.

5100	   The RECOGNIZE method can request the recognizer resource to operate
5101	   in normal or hotword mode as specified by the Recognition-Mode header
5102	   field.  The default value is "normal".  If the resource could not
5103	   start a recognition, the server MUST respond with a failure status-
5104	   code of 407 and a Completion-Cause header field in the response
5105	   describing the cause of failure.

5107	   The RECOGNIZE request uses the message body to specify the grammars
5108	   applicable to the request.  The active grammar(s) for the request can
5109	   be specified in one of 3 ways.  If the client needs to explicitly
5110	   control grammar weights for the recognition operation, it MUST employ
5111	   method 3 below.  The order of these grammars specifies the precedence
5112	   of the grammars which is used when more than one grammar in the list
5113	   matches the speech; in this case, the grammar with the higher
5114	   precedence is returned as a match.  This precedence capability is
5115	   useful in applications like VoiceXML browsers to order grammars
5116	   specified at the dialog, document and root level of a VoiceXML
5117	   application.
5118	   1.  The grammar MAY be placed directly in the message body as typed
5119	       content.  If more than one grammar is included in the body, the
5120	       order of inclusion controls the corresponding precedence for the
5121	       grammars during recognition, with earlier grammars in the body
5122	       having a higher precedence than later ones.
5123	   2.  The body MAY contain a list of grammar URIs specified in content
5124	       of Media Type text/uri-list [RFC2483].  The order of the
5125	       URIs determines the corresponding precedence for the grammars
5126	       during recognition, with highest-precedence first and decreasing
5127	       for each URI thereafter.
5128	   3.  The body MAY contain a list of grammar URIs specified in content
5129	       of Media Type text/grammar-ref-list.  This type defines a list of
5130	       grammar URIs and allows each grammar URI to be assigned a weight
5131	       in the list.  This weight has the same meaning as the weights
5132	       described in section 2.4.1 of the Speech Grammar Markup Format
5133	       (SRGS) [W3C.REC-speech-grammar-20040316].
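
   The precedence rule for ordered grammar lists can be sketched as
   follows.  This is a non-normative illustration: the function names
   are hypothetical, and the body is serialized with one URI per line
   terminated by CRLF, following the text/uri-list format.

```python
def build_uri_list(grammar_uris):
    """Serialize grammar URIs as a text/uri-list body.

    The first URI has the highest precedence, the last the lowest.
    """
    return "".join(uri + "\r\n" for uri in grammar_uris)


def highest_precedence_match(grammar_uris, matched_uris):
    """Pick the matching grammar that appears earliest in the list."""
    for uri in grammar_uris:
        if uri in matched_uris:
            return uri
    return None


GRAMMARS = [
    "session:request1@form-level.store",
    "session:request2@field-level.store",
    "session:helpgrammar@root-level.store",
]

if __name__ == "__main__":
    print(build_uri_list(GRAMMARS), end="")
    # Both the field-level and root-level grammars match the speech.
    print(highest_precedence_match(GRAMMARS, {GRAMMARS[1], GRAMMARS[2]}))
```

   When both the field-level and root-level grammars match, the sketch
   selects the field-level grammar because it is listed earlier.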
5134	   In addition to performing recognition on the input, the recognizer
5135	   MUST also enroll the collected utterance in a personal grammar if the
5136	   Enroll-Utterance header field is set to true and an Enrollment is
5137	   active (via an earlier execution of the START-PHRASE-ENROLLMENT
5138	   method).  If so, and if the RECOGNIZE request contains a Content-ID
5139	   header field, then the resulting grammar (which includes the personal
5140	   grammar as a sub-grammar) can be referenced through the "session" URI
5141	   scheme (see Section 13.6).

5143	   If the resource was able to successfully start the recognition, the
5144	   server MUST return a success status-code and a request-state of IN-
5145	   PROGRESS.  This means that the recognizer is active and that the
5146	   client MUST be prepared to receive further events with this
5147	   request-id.

5149	   If the resource was able to queue the request, the server MUST return
5150	   a success code and request-state of PENDING.  This means that the
5151	   recognizer is currently active with another request and that this
5152	   request has been queued for processing.

5154	   If the resource could not start a recognition, the server MUST
5155	   respond with a failure status-code of 407 and a Completion-Cause
5156	   header field in the response describing the cause of failure.

5158	   For the recognizer resource, RECOGNIZE and INTERPRET are the only
5159	   requests that return a request-state of IN-PROGRESS, meaning that
5160	   recognition is in progress.  When the recognition completes by
5161	   matching one of the grammar alternatives or by a time-out without a
5162	   match or for some other reason, the recognizer resource MUST send the
5163	   client a RECOGNITION-COMPLETE event (or INTERPRETATION-COMPLETE, if
5164	   INTERPRET was the request) with the result of the recognition and a
5165	   request-state of COMPLETE.

5167	   Large grammars can take a long time for the server to compile.  For
5168	   grammars which are used repeatedly, the client can improve server
5169	   performance by issuing a DEFINE-GRAMMAR request with the grammar
5170	   ahead of time.  In such a case the client can issue the RECOGNIZE
5171	   request and reference the grammar through the "session:" URI scheme
5172	   (see Section 13.6).  This also applies in general if the client wants
5173	   to repeat recognition with a previous inline grammar.

5175	   The RECOGNIZE method implementation MUST do a fetch of all external
5176	   URIs that are part of that operation.  If caching is implemented,
5177	   this URI fetching MUST conform to the cache control hints and
5178	   parameter header fields associated with the method in deciding
5179	   whether it should be fetched from cache or from the external server.
5180	   If these hints/parameters are not specified in the method, the values
5181	   set for the session using SET-PARAMS/GET-PARAMS apply.  If they were
5182	   not set for the session, their default values apply.

5184	   Note that since the audio and the messages are carried over separate
5185	   communication paths there may be a race condition between the start
5186	   of the flow of audio and the receipt of the RECOGNIZE method.  For
5187	   example, if an audio flow is started by the client at the same time
5188	   as the RECOGNIZE method is sent, either the audio or the RECOGNIZE
5189	   can arrive at the recognizer first.  As another example, the client
5190	   may choose to continuously send audio to the server and signal the
5191	   server to recognize using the RECOGNIZE method.  Mechanisms to
5192	   resolve this condition are outside the scope of this specification.
5193	   The recognizer can expect the media to start flowing when it receives
5194	   the RECOGNIZE request, but MUST NOT buffer anything it receives
5195	   beforehand in order to preserve the semantics that application
5196	   authors expect with respect to the input timers.

5198	   When a RECOGNIZE method has been received the recognition is
5199	   initiated on the stream.  The No-Input-Timer MUST be started at this
5200	   time if the Start-Input-Timers header field is specified as "true".
5201	   If this header field is set to "false", the No-Input-Timer MUST be
5202	   started when it receives the START-INPUT-TIMERS method from the
5203	   client.  The Recognition-Timer MUST be started when the recognition
5204	   resource detects speech or a DTMF digit in the media stream.

5206	   Non-Hotword mode recognition:

5208	   When the recognition resource detects speech or a DTMF digit in the
5209	   media stream it MUST send the START-OF-INPUT event.  When enough
5210	   speech has been collected for the server to process, the recognizer
5211	   can try to match the collected speech with the active grammars.  If
5212	   the speech collected at this point fully matches with any of the
5213	   active grammars, the Speech-Complete-Timer is started.  If it matches
5214	   partially with one or more of the active grammars, with more speech
5215	   needed before a full match is achieved, then the Speech-Incomplete-
5216	   Timer is started.

5218	   1.  When the No-Input-Timer expires, the recognizer MUST complete
5219	   with a Completion-Cause code of "no-input-timeout".

5221	   2.  The recognizer MUST support detecting a no-match condition upon
5222	   detecting end of speech.  The recognizer MAY support detecting a no-
5223	   match condition before waiting for end-of-speech.  If this is
5224	   supported, this capability is enabled by setting the Early-No-Match
5225	   header field to "true".  Upon detecting a no-match condition, the
5226	   recognizer MUST complete with a Completion-Cause code of "no-match".

5228	   3.  When the Speech-Incomplete-Timer expires the recognizer SHOULD
5229	   complete with a Completion-Cause code of "partial-match", unless the
5230	   recognizer cannot differentiate a partial-match in which case it MUST
5231	   return a Completion-Cause code of "no-match".  The recognizer MAY
5232	   return results for the partially matched grammar.

5234	   4.  When the Speech-Complete-Timer expires the recognizer MUST
5235	   complete with a Completion-Cause code of "success".

5237	   5.  When the Recognition-Timer expires one of the following MUST
5238	   happen:

5240	   5.1 If there was a partial-match the recognizer SHOULD complete with
5241	   a Completion-Cause code of "partial-match-maxtime", unless the
5242	   recognizer cannot differentiate a partial-match in which case it MUST
5243	   complete with a Completion-Cause code of "no-match-maxtime".  The
5244	   recognizer MAY return results for the partially matched grammar.

5246	   5.2 If there was a full-match the recognizer MUST complete with a
5247	   Completion-Cause code of "success-maxtime".

5249	   5.3 If there was no match the recognizer MUST complete with a
5250	   Completion-Cause code of "no-match-maxtime".
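
   The timer rules for non-hotword mode can be condensed into a small
   decision function.  This is a non-normative sketch: the function
   name, the match-state labels, and the can_tell_partial flag are
   hypothetical, and the SHOULD-level behaviors are treated as if they
   were always followed.

```python
def completion_cause(expired_timer, match_state="none", can_tell_partial=True):
    """Map an expired timer to a Completion-Cause name (non-hotword mode).

    expired_timer:    name of the timer that expired
    match_state:      "full", "partial" or "none" (illustrative labels)
    can_tell_partial: False if the recognizer cannot differentiate a
                      partial match from no match
    """
    if expired_timer == "No-Input-Timer":
        return "no-input-timeout"
    if expired_timer == "Speech-Complete-Timer":
        return "success"
    if expired_timer == "Speech-Incomplete-Timer":
        return "partial-match" if can_tell_partial else "no-match"
    if expired_timer == "Recognition-Timer":
        if match_state == "full":
            return "success-maxtime"
        if match_state == "partial" and can_tell_partial:
            return "partial-match-maxtime"
        return "no-match-maxtime"
    raise ValueError("unknown timer: " + expired_timer)


if __name__ == "__main__":
    print(completion_cause("Recognition-Timer", "partial"))
```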

5252	   Hotword mode recognition:

5254	   Note that for Hotword mode recognition the START-OF-INPUT event is
5255	   not generated when speech or a DTMF digit is detected.

5257	   1.  When the No-Input-Timer expires, the recognizer MUST complete
5258	   with a Completion-Cause code of "no-input-timeout".

5260	   2.  If at any point a match occurs, the RECOGNIZE MUST complete with
5261	   a Completion-Cause code of "success".

5263	   3.  When the Recognition-Timer expires and there is not a match, the
5264	   RECOGNIZE MUST complete with a Completion-Cause code of "hotword-
5265	   maxtime".

5267	   4.  When the Recognition-Timer expires and there is a match, the
5268	   RECOGNIZE MUST complete with a Completion-Cause code of "success-
5269	   maxtime".

5271	   5.  When the Recognition-Timer is running but the detected speech/
5272	   DTMF has not resulted in a match, the Recognition-Timer MUST be
5273	   stopped and reset.  It MUST then be restarted when speech/DTMF is
5274	   again detected.
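
   The hotword rules can be sketched in the same style.  This is a
   non-normative illustration: the function name and the event labels
   are hypothetical, and the stop-and-reset behavior of the
   Recognition-Timer (rule 5) is not modeled.

```python
def hotword_completion_cause(event, matched=False):
    """Map a hotword-mode event to a Completion-Cause name.

    event:   "no-input-timer-expired", "match" or
             "recognition-timer-expired" (illustrative labels)
    matched: whether a match exists when the Recognition-Timer expires
    """
    if event == "no-input-timer-expired":
        return "no-input-timeout"
    if event == "match":
        return "success"
    if event == "recognition-timer-expired":
        return "success-maxtime" if matched else "hotword-maxtime"
    raise ValueError("unknown event: " + event)


if __name__ == "__main__":
    print(hotword_completion_cause("recognition-timer-expired"))
```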

5276	   C->S:MRCP/2.0 ... RECOGNIZE 543257
5277	   Channel-Identifier:32AECB23433801@speechrecog
5278	   Confidence-Threshold:0.9
5279	   Content-Type:application/srgs+xml
5280	   Content-ID:<request1@form-level.store>
5281	   Content-Length:...

5283	   <?xml version="1.0"?>

5285	   <!-- the default grammar language is US English -->
5286	   <grammar xmlns="http://www.w3.org/2001/06/grammar"
5287	            xml:lang="en-US" version="1.0" root="request">

5289	   <!-- single language attachment to tokens -->
5290	       <rule id="yes">
5291	               <one-of>
5292	                     <item xml:lang="fr-CA">oui</item>
5293	                     <item xml:lang="en-US">yes</item>

5295	               </one-of>
5296	         </rule>

5298	   <!-- single language attachment to a rule expansion -->
5299	         <rule id="request">
5300	               may I speak to
5301	               <one-of xml:lang="fr-CA">
5302	                     <item>Michel Tremblay</item>
5303	                     <item>Andre Roy</item>
5304	               </one-of>
5305	         </rule>

5307	     </grammar>

5309	   S->C:MRCP/2.0 ... 543257 200 IN-PROGRESS
5310	   Channel-Identifier:32AECB23433801@speechrecog

5312	   S->C:MRCP/2.0 ... START-OF-INPUT 543257 IN-PROGRESS
5313	   Channel-Identifier:32AECB23433801@speechrecog

5315	   S->C:MRCP/2.0 ... RECOGNITION-COMPLETE 543257 COMPLETE
5316	   Channel-Identifier:32AECB23433801@speechrecog
5317	   Completion-Cause:000 success
5318	   Waveform-URI:<http://web.media.com/session123/audio.wav>;
5319	                 size=424252;duration=2543
5320	   Content-Type:application/nlsml+xml
5321	   Content-Length:...

5323	   <?xml version="1.0"?>
5324	   <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
5325	           xmlns:ex="http://www.example.com/example"
5326	           grammar="session:request1@form-level.store">
5327	       <interpretation>
5328	           <instance name="Person">
5329	               <ex:Person>
5330	                   <ex:Name> Andre Roy </ex:Name>
5331	               </ex:Person>
5332	           </instance>
5333	               <input>   may I speak to Andre Roy </input>
5334	       </interpretation>
5335	   </result>

5337	                             RECOGNIZE Example

5339	   C->S:   MRCP/2.0 ... RECOGNIZE 543257
5340	           Channel-Identifier:32AECB23433801@speechrecog
5341	           Confidence-Threshold:0.9
5342	           Fetch-Timeout:20
5343	           Content-Type:application/srgs+xml
5344	           Content-Length:...

5346	           <?xml version="1.0"?>
5347	           <grammar xmlns="http://www.w3.org/2001/06/grammar"
	                    version="1.0" mode="voice" root="rule_list">
5348	            <rule id="rule_list" scope="public">
5349	                <one-of>
5350	                    <item weight="10">
5351	                        <ruleref uri=
5352	               "http://grammar.example.com/world-cities.grxml#canada"/>
5353	                    </item>
5354	                    <item weight="1.5">
5355	                        <ruleref uri=
5356	               "http://grammar.example.com/world-cities.grxml#america"/>
5357	                    </item>
5358	                    <item weight="0.5">
5359	                        <ruleref uri=
5360	               "http://grammar.example.com/world-cities.grxml#india"/>
5361	                    </item>
5362	                </one-of>
5363	            </rule>
5364	           </grammar>

5365	                         Second RECOGNIZE Example

5367	9.10.  STOP

5369	   The STOP method from the client to the server tells the resource to
5370	   stop recognition if a request is active.  If a RECOGNIZE request is
5371	   active and the STOP request successfully terminated it, then the
5372	   response header section contains an Active-Request-Id-List header
5373	   field containing the request-id of the RECOGNIZE request that was
5374	   terminated.  In this case, no RECOGNITION-COMPLETE event is sent for
5375	   the terminated request.  If there was no recognition active, then the
5376	   response MUST NOT contain an Active-Request-Id-List header field.
5377	   Either way the response MUST contain a status-code of 200 (Success).

5379	   C->S:   MRCP/2.0 ... RECOGNIZE 543257
5380	           Channel-Identifier:32AECB23433801@speechrecog
5381	           Confidence-Threshold:0.9
5382	           Content-Type:application/srgs+xml
5383	           Content-ID:<request1@form-level.store>
5384	           Content-Length:...

5386	           <?xml version="1.0"?>

5388	           <!-- the default grammar language is US English -->
5389	           <grammar xmlns="http://www.w3.org/2001/06/grammar"
5390	                    xml:lang="en-US" version="1.0" root="request">

5392	           <!-- single language attachment to tokens -->
5393	               <rule id="yes">
5394	                   <one-of>
5395	                         <item xml:lang="fr-CA">oui</item>
5396	                         <item xml:lang="en-US">yes</item>
5397	                   </one-of>
5398	               </rule>

5400	           <!-- single language attachment to a rule expansion -->
5401	               <rule id="request">
5402	               may I speak to
5403	                   <one-of xml:lang="fr-CA">
5404	                         <item>Michel Tremblay</item>
5405	                         <item>Andre Roy</item>
5406	                   </one-of>
5407	               </rule>
5408	           </grammar>

5410	   S->C:   MRCP/2.0 ... 543257 200 IN-PROGRESS
5411	           Channel-Identifier:32AECB23433801@speechrecog

5413	   C->S:   MRCP/2.0 ... STOP 543258
5414	           Channel-Identifier:32AECB23433801@speechrecog

5416	   S->C:   MRCP/2.0 ... 543258 200 COMPLETE
5417	           Channel-Identifier:32AECB23433801@speechrecog
5418	           Active-Request-Id-List:543257
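
   A client can use the STOP response to update its own bookkeeping,
   since no RECOGNITION-COMPLETE event follows for a terminated
   request.  The sketch below is non-normative: the function names and
   the header dictionary are hypothetical, and it assumes that multiple
   request-ids in Active-Request-Id-List are separated by commas.

```python
def parse_active_request_ids(headers):
    """Return the request-ids listed in Active-Request-Id-List, if any."""
    value = headers.get("Active-Request-Id-List", "")
    # Assumption: request-ids are comma-separated.
    return [int(token) for token in value.split(",") if token.strip()]


def apply_stop_response(pending, headers):
    """Drop terminated requests from the set of requests awaiting events.

    Returns (still_pending, terminated).
    """
    terminated = parse_active_request_ids(headers)
    return pending - set(terminated), terminated


if __name__ == "__main__":
    headers = {
        "Channel-Identifier": "32AECB23433801@speechrecog",
        "Active-Request-Id-List": "543257",
    }
    print(apply_stop_response({543257}, headers))
```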

5420	9.11.  GET-RESULT

5422	   The GET-RESULT method from the client to the server MAY be issued
5423	   when the recognizer resource is in the recognized state.  This
5424	   request allows the client to retrieve results for a completed
5425	   recognition.  This is useful if the client decides it wants more
5426	   alternatives or more information.  When the server receives this
5427	   request it re-computes and returns the results according to the
5428	   recognition constraints provided in the GET-RESULT request.

5430	   The GET-RESULT request can specify constraints such as a different
5431	   confidence-threshold, or n-best-list-length.  This capability is
5432	   OPTIONAL for MRCPv2 servers.  If the automatic speech recognition
5433	   engine in the server does not support it, the server MUST return a
5434	   status of unsupported feature.

5436	   C->S:   MRCP/2.0 ... GET-RESULT 543257
5437	           Channel-Identifier:32AECB23433801@speechrecog
5438	           Confidence-Threshold:0.9

5440	   S->C:   MRCP/2.0 ... 543257 200 COMPLETE
5441	           Channel-Identifier:32AECB23433801@speechrecog
5442	           Content-Type:application/nlsml+xml
5443	           Content-Length:...

5445	           <?xml version="1.0"?>
5446	           <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
5447	                   xmlns:ex="http://www.example.com/example"
5448	                   grammar="session:request1@form-level.store">
5449	               <interpretation>
5450	                   <instance name="Person">
5451	                       <ex:Person>
5452	                           <ex:Name> Andre Roy </ex:Name>
5453	                       </ex:Person>
5454	                   </instance>
5455	                   <input>   may I speak to Andre Roy </input>
5456	               </interpretation>
5457	           </result>

5459	9.12.  START-OF-INPUT

5461	   This is an event from the server to the client indicating that the
5462	   recognition resource has detected speech or a DTMF digit in the media
5463	   stream.  This event is useful in implementing kill-on-barge-in
5464	   scenarios when a synthesizer resource is in a different session from
5465	   the recognizer resource and hence is not aware of an incoming audio
5466	   source (see Section 8.4.2).  In these cases, it is up to the client
5467	   to act as an intermediary and respond to this event by issuing a
5468	   BARGE-IN-OCCURRED method to the synthesizer resource.  The recognizer
5469	   resource also MUST send a Proxy-Sync-Id header field with a unique
5470	   value for this event.

5472	   This event MUST be generated by the server irrespective of whether
5473	   the synthesizer and recognizer are on the same server or not.

5475	9.13.  START-INPUT-TIMERS

5477	   This request is sent from the client to the recognition resource when
5478	   it knows that a kill-on-barge-in prompt has finished playing (see
5479	   Section 8.4.2).  This is useful in the scenario when the recognition
5480	   and synthesizer engines are not in the same session.  When a kill-on-
5481	   barge-in prompt is being played, the client may want a RECOGNIZE
5482	   request to be simultaneously active so that it can detect and
5483	   implement kill-on-barge-in.  But at the same time the client doesn't
5484	   want the recognizer to start the no-input timers until the prompt is
5485	   finished.  The Start-Input-Timers header field in the RECOGNIZE
5486	   request allows the client to say whether the timers should be started
5487	   immediately or not.  If not, the recognizer resource MUST NOT start
5488	   the timers until the client sends a START-INPUT-TIMERS method to the
5489	   recognizer.
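
   The deferred start of the no-input timer can be sketched as a small
   state holder.  This is a non-normative illustration: the class and
   method names are hypothetical, and the sketch assumes that an absent
   Start-Input-Timers header field is treated as "true".

```python
class NoInputTimerGate:
    """Tracks whether the no-input timer has been started."""

    def __init__(self):
        self.running = False

    def on_recognize(self, headers):
        # Assumption: a missing Start-Input-Timers header means "true".
        value = headers.get("Start-Input-Timers", "true")
        if value.strip().lower() == "true":
            self.running = True

    def on_start_input_timers(self):
        # The client signals that the kill-on-barge-in prompt has finished.
        self.running = True


if __name__ == "__main__":
    gate = NoInputTimerGate()
    gate.on_recognize({"Start-Input-Timers": "false"})
    print(gate.running)   # timer held back while the prompt plays
    gate.on_start_input_timers()
    print(gate.running)   # timer started after START-INPUT-TIMERS
```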

5491	9.14.  RECOGNITION-COMPLETE

5493	   This is an event from the recognizer resource to the client
5494	   indicating that the recognition completed.  The recognition result is
5495	   sent in the body of the MRCPv2 message.  The request-state field MUST
5496	   be COMPLETE indicating that this is the last event with that
5497	   request-id, and that the request with that request-id is now
5498	   complete.  The server MUST maintain the recognizer context containing
5499	   the results and the audio waveform input of that recognition until
5500	   the next RECOGNIZE request is issued for that resource or the session
5501	   terminates.  If the server returns a URI to the audio waveform it
5502	   MUST do so in a Waveform-URI header field in the RECOGNITION-COMPLETE
5503	   event.  The client can use this URI to retrieve or play back the
5504	   audio.
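
   A client that wants to fetch the audio first has to split the
   Waveform-URI header field value into the URI and its parameters.
   The sketch below is non-normative: the function name is
   hypothetical, and the value format (a URI in angle brackets followed
   by ";name=value" parameters) is inferred from the examples in this
   section.

```python
import re


def parse_waveform_uri(value):
    """Split a Waveform-URI value into (uri, params)."""
    match = re.match(r"\s*<([^>]*)>(.*)", value, re.S)
    if match is None:
        raise ValueError("malformed Waveform-URI value")
    uri, rest = match.groups()
    params = {}
    for part in rest.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, val = part.partition("=")
        val = val.strip()
        # Keep numeric parameters as integers, anything else as text.
        params[key.strip()] = int(val) if val.isdigit() else val
    return uri, params


if __name__ == "__main__":
    header_value = (
        "<http://web.media.com/session123/audio.wav>;\n"
        "             size=342456;duration=25435"
    )
    print(parse_waveform_uri(header_value))
```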

5506	   Note that if an enrollment session was active, the
5507	   RECOGNITION-COMPLETE event can contain either recognition or
5508	   enrollment results depending on what was spoken.  The following
5509	   example shows a complete exchange with a recognition result.

5511	   C->S:   MRCP/2.0 ... RECOGNIZE 543257
5512	           Channel-Identifier:32AECB23433801@speechrecog
5513	           Confidence-Threshold:0.9
5514	           Content-Type:application/srgs+xml
5515	           Content-ID:<request1@form-level.store>
5516	           Content-Length:...

5518	           <?xml version="1.0"?>

5520	           <!-- the default grammar language is US English -->
5521	           <grammar xmlns="http://www.w3.org/2001/06/grammar"
5522	                    xml:lang="en-US" version="1.0" root="request">

5524	           <!-- single language attachment to tokens -->
5525	               <rule id="yes">
5526	                      <one-of>
5527	                          <item xml:lang="fr-CA">oui</item>
5528	                          <item xml:lang="en-US">yes</item>
5529	                      </one-of>
5530	                 </rule>

5532	           <!-- single language attachment to a rule expansion -->
5533	                 <rule id="request">
5534	                     may I speak to
5535	                      <one-of xml:lang="fr-CA">
5536	                             <item>Michel Tremblay</item>
5537	                             <item>Andre Roy</item>
5538	                      </one-of>
5539	                 </rule>
5540	           </grammar>

5542	   S->C:   MRCP/2.0 ... 543257 200 IN-PROGRESS
5543	           Channel-Identifier:32AECB23433801@speechrecog

5545	   S->C:   MRCP/2.0 ... START-OF-INPUT 543257 IN-PROGRESS
5546	           Channel-Identifier:32AECB23433801@speechrecog

5548	   S->C:   MRCP/2.0 ... RECOGNITION-COMPLETE 543257 COMPLETE
5549	           Channel-Identifier:32AECB23433801@speechrecog
5550	           Completion-Cause:000 success
5551	           Waveform-URI:<http://web.media.com/session123/audio.wav>;
5552	                        size=342456;duration=25435
5553	           Content-Type:application/nlsml+xml
5554	           Content-Length:...

5556	           <?xml version="1.0"?>
5557	           <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
5558	                   xmlns:ex="http://www.example.com/example"
5559	                   grammar="session:request1@form-level.store">
5560	               <interpretation>
5561	                   <instance name="Person">
5562	                       <ex:Person>
5563	                           <ex:Name> Andre Roy </ex:Name>
5564	                       </ex:Person>
5565	                   </instance>
5566	                   <input>   may I speak to Andre Roy </input>
5567	               </interpretation>
5568	           </result>

5570	   If the result were instead an enrollment result, the final message
5571	   from the server above could have instead been:

5573	   S->C:   MRCP/2.0 ... RECOGNITION-COMPLETE 543257 COMPLETE
5574	           Channel-Identifier:32AECB23433801@speechrecog
5575	           Completion-Cause:000 success
5576	           Content-Type:application/nlsml+xml
5577	           Content-Length:...

5579	           <?xml version="1.0"?>
5580	           <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
5581	                   grammar="Personal-Grammar-URI">
5582	               <enrollment-result>
5583	                   <num-clashes> 2 </num-clashes>
5584	                   <num-good-repetitions> 1 </num-good-repetitions>
5585	                   <num-repetitions-still-needed>
5586	                      1
5587	                   </num-repetitions-still-needed>
5588	                   <consistency-status> consistent </consistency-status>
5589	                   <clash-phrase-ids>
5590	                       <item> Jeff </item> <item> Andre </item>
5591	                   </clash-phrase-ids>
5592	                   <transcriptions>
5593	                        <item> m ay b r ow k er </item>
5594	                        <item> m ax r aa k ah </item>
5595	                   </transcriptions>
5596	                   <confusable-phrases>
5597	                        <item>
5598	                             <phrase> call </phrase>
5599	                             <confusion-level> 10 </confusion-level>
5600	                        </item>
5601	                   </confusable-phrases>
5602	               </enrollment-result>
5603	           </result>
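
   Because the same event can carry either kind of result, a client
   has to inspect the NLSML body to tell them apart.  The following is
   a minimal Python sketch of one way to do that, assuming the body
   looks like the two examples above.  The function name
   classify_result and the shape of the returned dictionary are
   illustrative assumptions, not part of this specification.

```python
import xml.etree.ElementTree as ET

# Namespace used by the <result> element in the examples above.
MRCPV2_NS = "urn:ietf:params:xml:ns:mrcpv2"


def classify_result(nlsml_body: str) -> dict:
    """Report whether an NLSML body holds an enrollment or a recognition result.

    Hypothetical helper: the returned dictionary layout is an assumption
    made for this sketch only.
    """
    # Strip surrounding whitespace so an XML declaration, if present,
    # is at the very start of the document.
    root = ET.fromstring(nlsml_body.strip())

    enrollment = root.find(f"{{{MRCPV2_NS}}}enrollment-result")
    if enrollment is not None:
        needed = enrollment.findtext(
            f"{{{MRCPV2_NS}}}num-repetitions-still-needed"
        )
        return {
            "kind": "enrollment",
            "repetitions_still_needed":
                int(needed.strip()) if needed is not None else None,
        }

    # Otherwise treat it as a recognition result and collect the
    # text of every <input> element.
    inputs = [
        el.text.strip()
        for el in root.iter(f"{{{MRCPV2_NS}}}input")
        if el.text
    ]
    return {"kind": "recognition", "inputs": inputs}
```

   A client in an enrollment session could, for instance, use the
   "repetitions_still_needed" value to decide whether to issue another
   RECOGNIZE request.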

5605	9.15.  START-PHRASE-ENROLLMENT

5607	   The START-PHRASE-ENROLLMENT method from the client to the server
5608	   starts a new phrase enrollment session during which the client can
5609	   call RECOGNIZE multiple times to enroll a new utterance in a grammar.
5610	   An enrollment session consists of a set of calls to RECOGNIZE in
5611	   which the caller speaks a phrase several times so the system can
5612	   "learn" it.  The phrase is then added to a personal grammar (speaker-
5613	   trained grammar), so that the system can recognize it later.

5615	   Only one phrase enrollment session can be active at a time for a
5616	   resource.  The Personal-Grammar-URI identifies the grammar that is
5617	   used during enrollment to store the personal list of phrases.  Once
5618	   RECOGNIZE is called, the result is returned in a RECOGNITION-COMPLETE
5619	   event and will contain either an enrollment result OR a recognition
5620	   result for a regular recognition.

5622	   Calling END-PHRASE-ENROLLMENT ends the ongoing phrase enrollment
5623	   session, which is typically done after a sequence of successful calls
5624	   to RECOGNIZE.  This method can be called to commit the new phrase to
5625	   the personal grammar or to abort the phrase enrollment session.

5627	   The grammar to contain the new enrolled phrase, specified by
5628	   Personal-Grammar-URI, is created if it does not exist.  Also, the
5629	   personal grammar MUST ONLY contain phrases added via a phrase
5630	   enrollment session.

5632	   The Phrase-ID passed to this method is used to identify this phrase
5633	   in the grammar and will be returned as the speech input when doing a
5634	   RECOGNIZE on the grammar.  The Phrase-NL similarly is returned in a
5635	   RECOGNITION-COMPLETE event in the same manner as other Natural
5636	   Language (NL) in a grammar.  The tag-format of this NL is
5637	   implementation specific.

5639	   If the client has specified Save-Best-Waveform as true, then the
5640	   response after ending the phrase enrollment session MUST contain the
5641	   location/URI of a recording of the best repetition of the learned
5642	   phrase.

5644	   C->S:   MRCP/2.0 ... START-PHRASE-ENROLLMENT 543258
5645	           Channel-Identifier:32AECB23433801@speechrecog
5646	           Num-Min-Consistent-Pronunciations:2
5647	           Consistency-Threshold:30
5648	           Clash-Threshold:12
5649	           Personal-Grammar-URI:<personal grammar uri>
5650	           Phrase-Id:<phrase id>
5651	           Phrase-NL:<NL phrase>
5652	           Weight:1
5653	           Save-Best-Waveform:true

5655	   S->C:   MRCP/2.0 ... 543258 200 COMPLETE
5656	           Channel-Identifier:32AECB23433801@speechrecog

5658	9.16.  ENROLLMENT-ROLLBACK

5660	   The ENROLLMENT-ROLLBACK method discards the last live utterance from
5661	   the RECOGNIZE operation.  The client can invoke this method when the
5662	   caller provides undesirable input such as non-speech noises, side-
5663	   speech, commands, utterance from the RECOGNIZE grammar, etc.  Note
5664	   that this method does not provide a stack of rollback states.
5665	   Executing ENROLLMENT-ROLLBACK twice in succession without an
5666	   intervening recognition operation has no effect the second time.

5668	   C->S:   MRCP/2.0 ... ENROLLMENT-ROLLBACK 543261
5669	           Channel-Identifier:32AECB23433801@speechrecog

5671	   S->C:   MRCP/2.0 ... 543261 200 COMPLETE
5672	           Channel-Identifier:32AECB23433801@speechrecog

5674	9.17.  END-PHRASE-ENROLLMENT

5676	   The client MAY call the END-PHRASE-ENROLLMENT method ONLY during an
5677	   active phrase enrollment session.  It MUST NOT be called during an
5678	   ongoing RECOGNIZE operation.  To commit the new phrase in the
5679	   grammar, the client MAY call this method once successive calls to
5680	   RECOGNIZE have succeeded and Num-Repetitions-Still-Needed has been
5681	   returned as 0 in the RECOGNITION-COMPLETE event.  Alternatively, the
5682	   client MAY abort the phrase enrollment session by calling this method
5683	   with the Abort-Phrase-Enrollment header field.

5685	   If the client has specified Save-Best-Waveform as true in the START-
5686	   PHRASE-ENROLLMENT request, then the response MUST contain a Waveform-
5687	   URI header whose value is the location/URI of a recording of the best
5688	   repetition of the learned phrase.

5690	  C->S:   MRCP/2.0 ... END-PHRASE-ENROLLMENT 543262
5691	          Channel-Identifier:32AECB23433801@speechrecog

5693	  S->C:   MRCP/2.0 ... 543262 200 COMPLETE
5694	          Channel-Identifier:32AECB23433801@speechrecog
5695	          Waveform-URI:<http://mediaserver.com/recordings/file1324.wav>;
5696	                       size=242453;duration=25432

5698	9.18.  MODIFY-PHRASE

5700	   The MODIFY-PHRASE method sent from the client to the server is used
5701	   to change the phrase ID, NL phrase and/or weight for a given phrase
5702	   in a personal grammar.

5704	   If no fields are supplied then calling this method has no effect.

5706	   C->S:   MRCP/2.0 ... MODIFY-PHRASE 543265
5707	           Channel-Identifier:32AECB23433801@speechrecog
5708	           Personal-Grammar-URI:<personal grammar uri>
5709	           Phrase-Id:<phrase id>
5710	           New-Phrase-Id:<new phrase id>
5711	           Phrase-NL:<NL phrase>
5712	           Weight:1

5714	   S->C:   MRCP/2.0 ... 543265 200 COMPLETE
5715	           Channel-Identifier:32AECB23433801@speechrecog

5717	9.19.  DELETE-PHRASE

5719	   The DELETE-PHRASE method sent from the client to the server is used
5720	   to delete a phrase in a personal grammar added through voice
5721	   enrollment or text enrollment.  If the specified phrase does not
5722	   exist, this method has no effect.

5724	   C->S:   MRCP/2.0 ... DELETE-PHRASE 543266
5725	           Channel-Identifier:32AECB23433801@speechrecog
5726	           Personal-Grammar-URI:<personal grammar uri>
5727	           Phrase-Id:<phrase id>

5729	   S->C:   MRCP/2.0 ... 543266 200 COMPLETE
5730	           Channel-Identifier:32AECB23433801@speechrecog

5732	9.20.  INTERPRET

5734	   The INTERPRET method from the client to the server takes as input an
5735	   Interpret-Text header field containing the text for which the
5736	   semantic interpretation is desired, and returns, via the
5737	   INTERPRETATION-COMPLETE event, an interpretation result which is very
5738	   similar to the one returned from a RECOGNIZE method invocation.  Only
5739	   portions of the result relevant to acoustic matching are excluded
5740	   from the result.  The Interpret-Text header field MUST be included in
5741	   the INTERPRET request.

5743	   Recognizer grammar data is treated in the same way as it is when
5744	   issuing a RECOGNIZE method call.

5746	   If a RECOGNIZE, RECORD or another INTERPRET operation is already in
5747	   progress for the resource, the server MUST reject the request with a
5748	   response having a status-code of 402 "Method not valid in this
5749	   state", and a COMPLETE request state.

5751	   C->S:   MRCP/2.0 ... INTERPRET 543266
5752	           Channel-Identifier:32AECB23433801@speechrecog
5753	           Interpret-Text:may I speak to Andre Roy
5754	           Content-Type:application/srgs+xml
5755	           Content-ID:<request1@form-level.store>
5756	           Content-Length:...

5758	           <?xml version="1.0"?>
5759	           <!-- the default grammar language is US English -->
5760	           <grammar xmlns="http://www.w3.org/2001/06/grammar"
5761	                    xml:lang="en-US" version="1.0" root="request">
5762	           <!-- single language attachment to tokens -->
5763	               <rule id="yes">
5764	                   <one-of>
5765	                       <item xml:lang="fr-CA">oui</item>
5766	                       <item xml:lang="en-US">yes</item>
5767	                   </one-of>
5768	               </rule>

5770	           <!-- single language attachment to a rule expansion -->
5771	               <rule id="request">
5772	                   may I speak to
5773	                   <one-of xml:lang="fr-CA">
5774	                       <item>Michel Tremblay</item>
5775	                       <item>Andre Roy</item>
5776	                   </one-of>
5777	               </rule>
5778	           </grammar>

5780	   S->C:   MRCP/2.0 ... 543266 200 IN-PROGRESS
5781	           Channel-Identifier:32AECB23433801@speechrecog

5783	   S->C:   MRCP/2.0 ... INTERPRETATION-COMPLETE 543266 COMPLETE
5784	           Channel-Identifier:32AECB23433801@speechrecog
5785	           Completion-Cause:000 success
5786	           Content-Type:application/nlsml+xml
5787	           Content-Length:...

5789	           <?xml version="1.0"?>
5790	           <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
5791	                   xmlns:ex="http://www.example.com/example"
5792	                   grammar="session:request1@form-level.store">
5793	               <interpretation>
5794	                   <instance name="Person">
5795	                       <ex:Person>
5796	                           <ex:Name> Andre Roy </ex:Name>
5797	                       </ex:Person>
5798	                   </instance>
5799	                   <input>   may I speak to Andre Roy </input>
5800	               </interpretation>
5801	           </result>

5803	9.21.  INTERPRETATION-COMPLETE

5805	   This event from the recognition resource to the client indicates that
5806	   the INTERPRET operation is complete.  The interpretation result is
5807	   sent in the body of the MRCP message.  The request state MUST be set
5808	   to COMPLETE.

5810	   The Completion-Cause header field MUST be included in this event and
5811	   MUST be set to an appropriate value from the list of cause codes.

5813	   C->S:   MRCP/2.0 ... INTERPRET 543266
5814	           Channel-Identifier:32AECB23433801@speechrecog
5815	           Interpret-Text:may I speak to Andre Roy
5816	           Content-Type:application/srgs+xml
5817	           Content-ID:<request1@form-level.store>
5818	           Content-Length:...

5820	           <?xml version="1.0"?>
5821	           <!-- the default grammar language is US English -->
5822	           <grammar xmlns="http://www.w3.org/2001/06/grammar"
5823	                    xml:lang="en-US" version="1.0" root="request">
5824	           <!-- single language attachment to tokens -->
5825	               <rule id="yes">
5826	                   <one-of>
5827	                       <item xml:lang="fr-CA">oui</item>
5828	                       <item xml:lang="en-US">yes</item>
5829	                   </one-of>
5830	               </rule>

5832	           <!-- single language attachment to a rule expansion -->
5833	               <rule id="request">
5834	                   may I speak to
5835	                   <one-of xml:lang="fr-CA">
5836	                       <item>Michel Tremblay</item>
5837	                       <item>Andre Roy</item>
5838	                   </one-of>
5839	               </rule>
5840	           </grammar>

5842	   S->C:   MRCP/2.0 ... 543266 200 IN-PROGRESS
5843	           Channel-Identifier:32AECB23433801@speechrecog

5845	   S->C:   MRCP/2.0 ... INTERPRETATION-COMPLETE 543266 COMPLETE
5846	           Channel-Identifier:32AECB23433801@speechrecog
5847	           Completion-Cause:000 success
5848	           Content-Type:application/nlsml+xml
5849	           Content-Length:...

5851	           <?xml version="1.0"?>
5852	           <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
5853	                   xmlns:ex="http://www.example.com/example"
5854	                   grammar="session:request1@form-level.store">
5855	               <interpretation>
5856	                   <instance name="Person">
5857	                       <ex:Person>
5858	                           <ex:Name> Andre Roy </ex:Name>

5860	                       </ex:Person>
5861	                   </instance>
5862	                   <input>   may I speak to Andre Roy </input>
5863	               </interpretation>
5864	           </result>

5866	9.22.  DTMF Detection

5868	   Digits received as DTMF tones are delivered to the recognition
5869	   resource in the MRCPv2 server in the RTP stream according to RFC4733
5870	   [RFC4733].  The automatic speech recognizer (ASR) MUST support
5871	   RFC4733 to recognize digits and it MAY support recognizing DTMF tones
5872	   [Q.23] in the audio.

5874	10.  Recorder Resource

5876	   This resource captures received audio and video and stores it as
5877	   content pointed to by a URI.  The main usages of recorders are
5878	   1.  to capture speech audio that may be submitted for recognition at
5879	       a later time, and
5880	   2.  to record voice or video mails.
5881	   Both these applications require functionality above and beyond that
5882	   specified by protocols such as RTSP [RFC2326].  This includes audio
5883	   endpointing (i.e., detecting speech or silence).  Support for video
5884	   is OPTIONAL and is mainly intended for capturing video mails that
5885	   may require the speech or audio processing mentioned above.

5887	   A recorder MUST provide endpointing capabilities for suppressing
5888	   silence at the beginning and end of a recording, and MAY also
5889	   suppress silence in the middle of a recording.  If such suppression
5890	   is done, the recorder MUST maintain timing metadata to indicate the
5891	   actual time stamps of the recorded media.

5893	   See the discussion on the sensitivity of saved waveforms in
5894	   Section 12.

5896	10.1.  Recorder State Machine

5898	   Idle                   Recording
5899	   State                  State
5900	    |                       |
5901	    |---------RECORD------->|
5902	    |                       |
5903	    |<------STOP------------|
5904	    |                       |
5905	    |<--RECORD-COMPLETE-----|
5906	    |                       |
5907	    |              |--------|
5908	    |       START-OF-INPUT  |
5909	    |              |------->|
5910	    |                       |
5911	    |              |--------|
5912	    |    START-INPUT-TIMERS |
5913	    |              |------->|
5914	    |                       |

5916	                          Recorder State Machine
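
   The transitions in the diagram can be written down as a lookup
   table.  The following Python sketch is one possible encoding; the
   names RecorderState, TRANSITIONS, and next_state are illustrative
   assumptions.  Any (state, message) pair that the diagram does not
   show is rejected with an exception here, which is a choice made for
   the sketch rather than behavior defined by the diagram.

```python
from enum import Enum


class RecorderState(Enum):
    IDLE = "Idle"
    RECORDING = "Recording"


# (current state, method or event) -> next state, as drawn in the diagram.
TRANSITIONS = {
    (RecorderState.IDLE, "RECORD"): RecorderState.RECORDING,
    (RecorderState.RECORDING, "STOP"): RecorderState.IDLE,
    (RecorderState.RECORDING, "RECORD-COMPLETE"): RecorderState.IDLE,
    # Self-loops: these messages leave the resource in the Recording state.
    (RecorderState.RECORDING, "START-OF-INPUT"): RecorderState.RECORDING,
    (RecorderState.RECORDING, "START-INPUT-TIMERS"): RecorderState.RECORDING,
}


def next_state(state: RecorderState, message: str) -> RecorderState:
    """Return the state reached from `state` on `message`.

    Raises ValueError for pairs that are not drawn in the diagram
    (an assumption of this sketch).
    """
    try:
        return TRANSITIONS[(state, message)]
    except KeyError:
        raise ValueError(
            f"{message} is not shown for the {state.value} state"
        ) from None
```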

5918	10.2.  Recorder Methods

5920	   The recorder resource supports the following methods.

5922	   recorder-method      =  "RECORD"
5923	                        /  "STOP"
5924	                        /  "START-INPUT-TIMERS"

5926	10.3.  Recorder Events

5928	   The recorder resource can generate the following events.

5930	   recorder-event       =  "START-OF-INPUT"
5931	                        /  "RECORD-COMPLETE"

5933	10.4.  Recorder Header Fields

5935	   Method invocations for the recorder resource can contain resource-
5936	   specific header fields containing request options and information to
5937	   augment the Method, Response or Event message it is associated with.

5939	   recorder-header      =  sensitivity-level
5940	                        /  no-input-timeout
5941	                        /  completion-cause
5942	                        /  completion-reason
5943	                        /  failed-uri
5944	                        /  failed-uri-cause
5945	                        /  record-uri
5946	                        /  media-type
5947	                        /  max-time
5948	                        /  trim-length
5949	                        /  final-silence
5950	                        /  capture-on-speech
5951	                        /  ver-buffer-utterance
5952	                        /  start-input-timers
5953	                        /  new-audio-channel

5955	10.4.1.  Sensitivity Level

5957	   To filter out background noise and not mistake it for speech, the
5958	   recorder can support a variable level of sound sensitivity.  The
5959	   Sensitivity-Level header field is a float value between 0.0 and 1.0
5960	   and allows the client to set the sensitivity level for the recorder.
5961	   This header field MAY occur in RECORD, SET-PARAMS or GET-PARAMS.  A
5962	   higher value for this header field means higher sensitivity.  The
5963	   default value for this header field is implementation specific.

5965	   sensitivity-level    =     "Sensitivity-Level" ":" FLOAT CRLF

5967	10.4.2.  No Input Timeout

5969	   When recording is started and there is no speech detected for a
5970	   certain period of time, the recorder can send a RECORD-COMPLETE event
5971	   to the client and terminate the record operation.  The No-Input-
5972	   Timeout header field can set this timeout value.  The value is in
5973	   milliseconds.  This header field MAY occur in RECORD, SET-PARAMS or
5974	   GET-PARAMS.  The value for this header field ranges from 0 to an
5975	   implementation specific maximum value.  The default value for this
5976	   header field is implementation specific.

5978	   no-input-timeout    =     "No-Input-Timeout" ":" 1*19DIGIT CRLF

5980	10.4.3.  Completion Cause

5982	   This header field MUST be part of a RECORD-COMPLETE event from the
5983	   recorder resource to the client.  This indicates the reason behind
5984	   the RECORD method completion.  This header field MUST be sent in the
5985	   RECORD responses if they return with a failure status and a COMPLETE
5986	   state.  In the ABNF below, the 'cause-code' contains a numerical
5987	   value selected from the Cause-Code column of the following table.
5988	   The 'cause-name' contains the corresponding token selected from the
5989	   Cause-Name column.

5991	   completion-cause         =  "Completion-Cause" ":" cause-code SP
5992	                               cause-name CRLF
5993	   cause-code               =  3DIGIT
5994	   cause-name               =  *VCHAR

5996	   +------------+-----------------------+------------------------------+
5997	   | Cause-Code | Cause-Name            | Description                  |
5998	   +------------+-----------------------+------------------------------+
5999	   | 000        | success-silence       | RECORD completed with a      |
6000	   |            |                       | silence at the end           |
6001	   | 001        | success-maxtime       | RECORD completed after       |
6002	   |            |                       | reaching maximum recording   |
6003	   |            |                       | time specified in record     |
6004	   |            |                       | method.                      |
6005	   | 002        | noinput-timeout       | RECORD failed due to no      |
6006	   |            |                       | input                        |
6007	   | 003        | uri-failure           | Failure accessing the record |
6008	   |            |                       | URI.                         |
6009	   | 004        | error                 | RECORD request terminated    |
6010	   |            |                       | prematurely due to a         |
6011	   |            |                       | recorder error.              |
6012	   +------------+-----------------------+------------------------------+
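
   As an illustration of the ABNF and table above, the following
   Python sketch splits a Completion-Cause header value into its code
   and name and checks the pair against the recorder table.  The names
   RECORDER_CAUSES and parse_completion_cause, and the decision to
   return a "matches table" flag instead of rejecting unknown pairs,
   are assumptions made for the example.

```python
import re

# Cause-Code -> Cause-Name, copied from the recorder table above.
RECORDER_CAUSES = {
    "000": "success-silence",
    "001": "success-maxtime",
    "002": "noinput-timeout",
    "003": "uri-failure",
    "004": "error",
}

# cause-code = 3DIGIT, then SP, then cause-name = *VCHAR (0x21-0x7E).
_CAUSE_RE = re.compile(r"(\d{3}) ([\x21-\x7E]*)")


def parse_completion_cause(value: str):
    """Return (cause_code, cause_name, matches_table) for a header value."""
    # Drop the trailing CRLF and any whitespace that followed the colon.
    cleaned = value.strip("\r\n").lstrip()
    match = _CAUSE_RE.fullmatch(cleaned)
    if match is None:
        raise ValueError(f"malformed Completion-Cause value: {value!r}")
    code, name = match.groups()
    return code, name, RECORDER_CAUSES.get(code) == name
```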

6014	10.4.4.  Completion Reason

6016	   This header field MAY be present in a RECORD-COMPLETE event coming
6017	   from the recorder resource to the client.  It contains the reason
6018	   text behind the RECORD request completion.  This header field
6019	   communicates text describing the reason for the failure.

6021	   The completion reason text is provided for client use in logs and for
6022	   debugging and instrumentation purposes.  Clients MUST NOT interpret
6023	   the completion reason text.

6025	   completion-reason        =  "Completion-Reason" ":"
6026	                               quoted-string CRLF

6028	10.4.5.  Failed URI

6030	   When a recorder method needs to post the audio to a URI and access to
6031	   the URI fails, the server MUST provide the failed URI in this header
6032	   field in the method response.

6034	   failed-uri               =  "Failed-URI" ":" absoluteURI CRLF

6036	10.4.6.  Failed URI Cause

6038	   When a recorder method needs to post the audio to a URI and access to
6039	   the URI fails, the server MAY provide the URI specific or protocol
6040	   specific response code through this header field in the method
6041	   response.  The value encoding is UTF-8 (RFC3629 [RFC3629]) to
6042	   accommodate any access protocol, some of which might have a response
6043	   string instead of a numeric response code.

6045	   failed-uri-cause         =  "Failed-URI-Cause" ":" 1*UTFCHAR
6046	                               CRLF

6048	10.4.7.  Record URI

6050	   When a recorder method contains this header field the server MUST
6051	   capture the audio and store it.  If the header field is present but
6052	   specified with no value, the server MUST store the content locally
6053	   and generate a URI that points to it.  This URI is then returned in
6054	   either the STOP response or the RECORD-COMPLETE event.  If the header
6055	   field in the RECORD method specifies a URI, the server MUST attempt
6056	   to capture and store the audio at that location.  If this header
6057	   field is not specified in the RECORD request, the server MUST capture
6058	   the audio, MUST encode it, and MUST send it in the STOP response or
6059	   the RECORD-COMPLETE event as a message body.  In this case, the
6060	   response carrying the audio content MUST include a Content ID (cid)
6061	   [RFC2392] value in this header pointing to the Content-ID in the
6062	   message body.

6064	   The server MUST also return the size in octets and the duration in
6065	   milliseconds of the recorded audio waveform as parameters associated
6066	   with the header field.

6068	   Implementations MUST support 'http' [RFC2616], 'https' [RFC2818],
6069	   'file' [RFC3986], and 'cid' [RFC2392] schemes in the URI.  Note that
6070	   implementations already exist that support other schemes.

6072	   record-uri               =  "Record-URI" ":" ["<" uri ">"
6073	                               ";" "size" "=" 1*19DIGIT
6074	                               ";" "duration" "=" 1*19DIGIT] CRLF
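
   The following Python sketch shows one way a receiver might parse a
   Record-URI header value.  It accepts the empty value described
   above, and it also accepts a bare "<uri>" without the size and
   duration parameters, which is more lenient than the ABNF but is
   assumed here so that a request carrying only a target location can
   be parsed.  The names RecordUri and parse_record_uri are
   illustrative assumptions.

```python
import re
from typing import NamedTuple, Optional


class RecordUri(NamedTuple):
    uri: Optional[str]       # None when the header value is empty
    size: Optional[int]      # octets
    duration: Optional[int]  # milliseconds


_RECORD_URI_RE = re.compile(
    r"<(?P<uri>[^<>]*)>"
    r"(?:\s*;\s*size=(?P<size>\d{1,19})"
    r"\s*;\s*duration=(?P<duration>\d{1,19}))?"
)


def parse_record_uri(value: str) -> RecordUri:
    """Parse the value part of a Record-URI header field."""
    # Collapse whitespace so a value folded across lines still matches.
    compact = " ".join(value.split())
    if not compact:
        # Header present with no value: the server picks the location.
        return RecordUri(None, None, None)
    match = _RECORD_URI_RE.fullmatch(compact)
    if match is None:
        raise ValueError(f"malformed Record-URI value: {value!r}")
    size = match.group("size")
    duration = match.group("duration")
    return RecordUri(
        match.group("uri"),
        int(size) if size is not None else None,
        int(duration) if duration is not None else None,
    )
```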

6076	10.4.8.  Media Type

6078	   A RECORD method MUST contain this header field, which specifies to
6079	   the server the Media Type of the captured audio or video.

6081	   media-type               =  "Media-Type" ":" media-type-value
6082	                               CRLF

6084	10.4.9.  Max Time

6086	   This header field specifies the maximum length of the recording in
6087	   milliseconds, measured from the time the actual capture and store
6088	   begins, which is not necessarily the time the RECORD method is
6089	   received.  It specifies the duration before silence suppression, if
6090	   any, has been applied by the recorder resource.
6091	   After this time, the recording stops and the server MUST return a
6092	   RECORD-COMPLETE event to the client having a request-state of
6093	   "COMPLETE".  This header field MAY occur in RECORD, SET-PARAMS or
6094	   GET-PARAMS.  The value for this header field ranges from 0 to an
6095	   implementation specific maximum value.  A value of zero means
6096	   infinity and hence the recording continues until one or more of the
6097	   other stop conditions are met.  The default value for this header
6098	   field is 0.

6100	   max-time                 =  "Max-Time" ":" 1*19DIGIT CRLF

6102	10.4.10.  Trim-Length

6104	   This header field MAY be sent on a STOP method and specifies the
6105	   length of audio to be trimmed from the end of the recording after the
6106	   stop.  The length is interpreted to be in milliseconds.  The default
6107	   value for this header field is 0.

6109	   trim-length                 =  "Trim-Length" ":" 1*19DIGIT CRLF

6111	10.4.11.  Final Silence

6113	   When the recorder is started and the actual capture begins, this
6114	   header field specifies the length of silence in the audio that is to
6115	   be interpreted as the end of the recording.  This header field MAY
6116	   occur in RECORD, SET-PARAMS or GET-PARAMS.  The value for this header
6117	   field ranges from 0 to an implementation specific maximum value and
6118	   is interpreted to be in milliseconds.  A value of zero means infinity
6119	   and hence the recording will continue until one of the other stop
6120	   conditions is met.  The default value for this header field is
6121	   implementation specific.

6123	   final-silence            =  "Final-Silence" ":" 1*19DIGIT CRLF

6125	10.4.12.  Capture On Speech

6127	   If false, the recorder MUST start capturing immediately when started.
6128	   If true, the recorder MUST wait for the endpointing functionality to
6129	   detect speech before it starts capturing.  This header field MAY
6130	   occur in RECORD, SET-PARAMS or GET-PARAMS.  The value for this
6131	   header field is a Boolean.  The default value for this header field
6132	   is false.

6134	   capture-on-speech        =  "Capture-On-Speech" ":" BOOLEAN CRLF

6136	10.4.13.  Ver-Buffer-Utterance

6138	   This header field is the same as the one described for the verifier
6139	   resource (see Section 11.4.14).  This tells the server to buffer the
6140	   utterance associated with this recording request into the
6141	   verification buffer.  Sending this header field is permitted only if
6142	   the verification buffer is for the session.  This buffer is shared
6143	   across resources within a session.  It gets instantiated when a
6144	   verifier resource is added to this session and is released when the
6145	   verifier resource is released from the session.

6147	10.4.14.  Start Input Timers

6149	   This header field MAY be sent as part of the RECORD request.  A value
6150	   of false tells the recorder resource to start the operation, but not
6151	   to start the no-input timer until the client sends a START-INPUT-
6152	   TIMERS request to the recorder resource.  This is useful in the
6153	   scenario when the recorder and synthesizer resources are not part of
6154	   the same session.  When a kill-on-barge-in prompt is being played,
6155	   the client may want the RECORD request to be simultaneously active so
6156	   that it can detect and implement kill-on-barge-in (see
6157	   Section 8.4.2).  But at the same time the client doesn't want the
6158	   recorder resource to start the no-input timers until the prompt is
6159	   finished.  The default value is "true".

6161	   start-input-timers       =  "Start-Input-Timers" ":"
6162	                               BOOLEAN CRLF

6164	10.4.15.  New Audio Channel

6166	   This header field is the same as the one described for the Recognizer
6167	   resource (see Section 9.4.23).

6169	10.5.  Recorder Message Body

6171	   If the RECORD request did not have a Record-URI header field, the
6172	   STOP response or the RECORD-COMPLETE event MUST contain a message
6173	   body carrying the captured audio.  In this case, the message carrying
6174	   the audio content has a Record-URI header field with a Content ID
6175	   value pointing to the message body entity that contains the recorded
6176	   audio.  See Section 10.4.7 for details.

6178	10.6.  RECORD

6180	   The RECORD request places the recorder resource in the Recording
6181	   state.  Depending on the header fields specified in the RECORD
6182	   method, the resource may start recording the audio immediately or
6183	   wait for the endpointing functionality to detect speech in the
6184	   audio.  The audio is then made available to the client either in the
6185	   message body or as specified by Record-URI.

6187	   The server MUST support the "https" URI scheme and MAY support other
6188	   schemes.  Note that due to the sensitive nature of voice recordings,
6189	   any protocols used for dereferencing SHOULD employ integrity and
6190	   confidentiality, unless other means, such as use of a controlled
6191	   environment (see Section 4.2), are employed.

6193	   If a RECORD operation is already in progress, invoking this method
6194	   causes the server to issue a response having a status code of 402,
6195	   "Method not valid in this state", and a COMPLETE request state.

6197	   If the Record-URI is not valid, a status code of 404, "Illegal Value
6198	   for Header Field", is returned in the response.  If it is impossible
6199	   for the server to create the requested stored content, a status code
6200	   of 407, "Method or Operation Failed", is returned.

6202	   If the type specified in the Media-Type header field is not
6203	   supported, the server MUST respond with a status code of 409,
6204	   "Unsupported Header Field Value", with the Media-Type header field in
6205	   its response.
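
   The status-code rules above can be sketched as a single check.  This
   is a minimal, non-normative Python illustration: the function name
   record_status, its arguments, the SUPPORTED_MEDIA_TYPES set, and the
   order in which the checks run are assumptions and are not defined by
   this specification.  The Record-URI is passed without its enclosing
   angle brackets, and "not valid" is approximated as "has no URI
   scheme".

```python
from urllib.parse import urlparse

# Assumed server capability; a real server would advertise its own set.
SUPPORTED_MEDIA_TYPES = {"audio/wav"}


def record_status(recording_active, record_uri, media_type, can_store=True):
    """Return a status code for a RECORD request (hypothetical helper)."""
    if recording_active:
        return 402  # Method not valid in this state
    if record_uri and not urlparse(record_uri).scheme:
        return 404  # Illegal Value for Header Field
    if media_type is not None and media_type not in SUPPORTED_MEDIA_TYPES:
        return 409  # Unsupported Header Field Value
    if not can_store:
        return 407  # Method or Operation Failed
    return 200      # Success; the request state would be IN-PROGRESS


uri = "file://mediaserver/recordings/myfile.wav"
print(record_status(False, uri, "audio/wav"))
```

   A real server would apply stricter URI validation and would also
   echo the Media-Type header field in a 409 response, as required
   above.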

6207	   When the recording operation is initiated, the response indicates an
6208	   IN-PROGRESS request state.  The server MAY generate a subsequent
6209	   START-OF-INPUT event when speech is detected.  Upon completion of the
6210	   recording operation, the server generates a RECORD-COMPLETE event.

6212	   C->S:  MRCP/2.0 ... RECORD 543257
6213	          Channel-Identifier:32AECB23433802@recorder
6214	          Record-URI:<file://mediaserver/recordings/myfile.wav>
6215	          Media-Type:audio/wav
6216	          Capture-On-Speech:true
6217	          Final-Silence:300
6218	          Max-Time:6000

6220	   S->C:  MRCP/2.0 ... 543257 200 IN-PROGRESS
6221	          Channel-Identifier:32AECB23433802@recorder

6223	   S->C:  MRCP/2.0 ... START-OF-INPUT 543257 IN-PROGRESS
6224	          Channel-Identifier:32AECB23433802@recorder

6226	   S->C:  MRCP/2.0 ... RECORD-COMPLETE 543257 COMPLETE
6227	          Channel-Identifier:32AECB23433802@recorder
6228	          Completion-Cause:000 success-silence
6229	          Record-URI:<file://mediaserver/recordings/myfile.wav>;
6230	                     size=242552;duration=25645

6232	                              RECORD Example

6234	10.7.  STOP

6236	   The STOP method moves the recorder from the recording state back to
6237	   the idle state.  If a RECORD request is active and the STOP request
6238	   successfully terminated it, then the STOP response MUST contain an
6239	   Active-Request-Id-List header field containing the RECORD request-id
6240	   that was terminated.  In this case, no RECORD-COMPLETE event is sent
6241	   for the terminated request.  If there was no recording active, then
6242	   the response MUST NOT contain an Active-Request-Id-List header field.
6243	   If the recording was a success, the STOP response MUST contain a
6244	   Record-URI header field pointing to the recorded audio content or to
6245	   a typed entity in the body of the STOP response containing the
6246	   recorded audio.  The STOP method MAY have a Trim-Length header field,
6247	   in which case the specified length of audio is trimmed from the end
6248	   of the recording after the stop.  In any case, the response MUST
6249	   contain a status-code of 200 (Success).

6251	   C->S:  MRCP/2.0 ... RECORD 543257
6252	          Channel-Identifier:32AECB23433802@recorder
6253	          Record-URI:<file://mediaserver/recordings/myfile.wav>
6254	          Capture-On-Speech:true
6255	          Final-Silence:300
6256	          Max-Time:6000

6258	   S->C:  MRCP/2.0 ... 543257 200 IN-PROGRESS
6259	          Channel-Identifier:32AECB23433802@recorder

6261	   S->C:  MRCP/2.0 ... START-OF-INPUT 543257 IN-PROGRESS
6262	          Channel-Identifier:32AECB23433802@recorder

6264	   C->S:  MRCP/2.0 ... STOP 543257
6265	          Channel-Identifier:32AECB23433802@recorder
6266	          Trim-Length:200

6268	   S->C:  MRCP/2.0 ... 543257 200 COMPLETE
6269	          Channel-Identifier:32AECB23433802@recorder
6270	          Record-URI:<file://mediaserver/recordings/myfile.wav>;
6271	                     size=324253;duration=24561
6272	          Active-Request-Id-List:543257

6274	                               STOP Example

6276	10.8.  RECORD-COMPLETE

6278	   If the recording completes due to no-input, silence after speech, or
6279	   max-time, the server MUST generate the RECORD-COMPLETE event to the
6280	   client with a request-state of "COMPLETE".  If the recording was a
6281	   success, the RECORD-COMPLETE event contains a Record-URI header field
6282	   pointing to the recorded audio file on the server or to a typed
6283	   entity in the message body containing the recorded audio.

6285	   C->S:  MRCP/2.0 ... RECORD 543257
6286	          Channel-Identifier:32AECB23433802@recorder
6287	          Record-URI:<file://mediaserver/recordings/myfile.wav>
6288	          Capture-On-Speech:true
6289	          Final-Silence:300
6290	          Max-Time:6000

6292	   S->C:  MRCP/2.0 ... 543257 200 IN-PROGRESS
6293	          Channel-Identifier:32AECB23433802@recorder

6295	   S->C:  MRCP/2.0 ... START-OF-INPUT 543257 IN-PROGRESS
6296	          Channel-Identifier:32AECB23433802@recorder

6298	   S->C:  MRCP/2.0 ... RECORD-COMPLETE 543257 COMPLETE
6299	          Channel-Identifier:32AECB23433802@recorder
6300	          Completion-Cause:000 success-silence
6301	          Record-URI:<file://mediaserver/recordings/myfile.wav>;
6302	                     size=325325;duration=24652

6304	                          RECORD-COMPLETE Example

6306	10.9.  START-INPUT-TIMERS

6308	   This request is sent from the client to the recorder resource when it
6309	   discovers that a kill-on-barge-in prompt has finished playing (see
6310	   Section 8.4.2).  This is useful in the scenario when the recorder and
6311	   synthesizer resources are not in the same MRCPv2 session.  When a
6312	   kill-on-barge-in prompt is being played, the client wants the RECORD
6313	   request to be simultaneously active so that it can detect and
6314	   implement kill-on-barge-in.  But at the same time the client doesn't
6315	   want the recorder resource to start the no-input timers until the
6316	   prompt is finished.  The Start-Input-Timers header field in the
6317	   RECORD request allows the client to specify whether the timers
6318	   should be started.  In the above case, the recorder resource does not
6319	   start the timers until the client sends a START-INPUT-TIMERS method
6320	   to the recorder.

6322	10.10.  START-OF-INPUT

6324	   The START-OF-INPUT event is returned from the server to the client
6325	   once the server has detected speech.  This event is always returned
6326	   by the recorder resource when speech has been detected.  The
6327	   recorder resource MUST also send a Proxy-Sync-Id header field with a
6328	   unique value for this event.

6330	   S->C:  MRCP/2.0 ... START-OF-INPUT 543259 IN-PROGRESS
6331	          Channel-Identifier:32AECB23433801@recorder
6332	          Proxy-Sync-Id:987654321

6334	11.  Speaker Verification and Identification

6336	   This section describes the methods, responses and events employed by
6337	   MRCPv2 for speaker verification and identification.

6339	   Speaker verification is a voice authentication methodology that can
6340	   be used to identify the speaker in order to grant the user access to
6341	   sensitive information and transactions.  Because speech is a
6342	   biometric, a number of essential security considerations related to
6343	   biometric authentication technologies apply to its implementation and
6344	   usage.  Implementers should carefully read Section 12 in this
6345	   document and the corresponding section of Speechsc Requirements
6346	   [RFC4313].  Implementers and deployers of this technology are
6347	   strongly encouraged to check the state of the art for any new risks
6348	   and solutions that might have been developed.

6350	   In speaker verification, a recorded utterance is compared to a
6351	   previously stored voiceprint, which is in turn associated with a
6352	   claimed identity for that user.  Verification typically consists of
6353	   two phases: a designation phase to establish the claimed identity of
6354	   the caller and an execution phase in which a voiceprint is either
6355	   created (training) or used to authenticate the claimed identity
6356	   (verification).

6358	   Speaker identification is the process of associating an unknown
6359	   speaker with a member in a population.  It does not employ a claim of
6360	   identity.  When an individual claims to belong to a group (e.g., one
6361	   of the owners of a joint bank account), a group authentication is
6362	   performed.  This is generally implemented as a kind of verification
6363	   involving comparison with more than one voice model.  It is sometimes
6364	   called 'multi-verification'.  If the individual speaker can be
6365	   identified from the group, this may be useful for applications where
6366	   multiple users share the same access privileges to some data or
6367	   application.  Speaker identification and group authentication are
6368	   also done in two phases, a designation phase and an execution phase.
6369	   Note that from a functionality standpoint identification can be
6370	   thought of as a special case of group authentication (if the
6371	   individual is identified) where the group is the entire population,
6372	   although the implementation of speaker identification may be
6373	   different from the way group authentication is performed.  To
6374	   accommodate single-voiceprint verification, verification against
6375	   multiple voiceprints, group authentication, and identification, this
6376	   specification provides a single set of methods that can take a list
6377	   of identifiers, called "voiceprint identifiers", and return a list of
6378	   identifiers, with a score for each representing how well the input
6379	   speech matched each identifier.  The input and output lists of
6380	   identifiers do not have to match, allowing a vendor-specific group
6381	   identifier to be used as input to indicate that identification is to
6382	   be performed.  In this specification, the terms "Identification" and
6383	   "Multi-verification" are used to indicate that the input represents a
6384	   group (potentially the entire population) and that results for
6385	   multiple voiceprints may be returned.

6387	   It is possible for a verifier resource to share the same session with
6388	   a recognizer resource or to operate independently.  In order to share
6389	   the same session, the verifier and recognizer resources MUST be
6390	   allocated from within the same SIP dialog.  Otherwise, an independent
6391	   verifier resource, running on the same physical server or a separate
6392	   one, will be set up.  Note that in addition to allowing both
6393	   resources to be allocated in the same INVITE, it is possible to
6394	   allocate one initially and the other later via a re-INVITE.

6396	   Some of the speaker verification methods, described below, apply only
6397	   to a specific mode of operation.

6399	   The verifier resource has a verification buffer associated with it
6400	   (see Section 11.4.14).  This allows the storage of speech utterances
6401	   for the purposes of verification, identification or training from the
6402	   buffered speech.  This buffer is owned by the verifier resource, but
6403	   other input resources such as the recognition resource or recorder
6404	   resource may write to it.  This allows the speech received as part of
6405	   a recognition or recording operation to be later used for
6406	   verification, identification or training.  Access to the buffer is
6407	   limited to one operation at a time.  Hence, during a read, write, or
6408	   delete operation, such as a RECOGNIZE with Ver-Buffer-Utterance
6409	   turned on, another operation involving the buffer fails with a
6410	   status-code of 402.  The verification buffer can be cleared by a
6411	   CLEAR-BUFFER request from the client and is freed when the verifier
6412	   resource is deallocated or the session with the server terminates.
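
   The one-operation-at-a-time rule for the verification buffer can be
   illustrated with a small, non-normative Python sketch.  The class
   VerificationBuffer and its methods begin, end, and clear are
   hypothetical names; this specification does not define how a server
   implements the buffer.

```python
class VerificationBuffer:
    """Toy model of a buffer that admits one operation at a time."""

    def __init__(self):
        self._busy = False
        self._utterances = []

    def begin(self):
        """Start a read, write, or delete; 402 if one is in progress."""
        if self._busy:
            return 402  # buffer already in use by another operation
        self._busy = True
        return 200

    def end(self, utterance=None):
        """Finish the current operation, optionally storing audio."""
        if utterance is not None:
            self._utterances.append(utterance)
        self._busy = False

    def clear(self):
        """Model CLEAR-BUFFER; it also needs exclusive access."""
        if self._busy:
            return 402
        self._utterances.clear()
        return 200


buf = VerificationBuffer()
print(buf.begin(), buf.begin())
```

   In this model, a second operation started before the first calls
   end() is refused with 402, which mirrors the behavior described
   above.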

6414	   The verification buffer is different from collecting waveforms and
6415	   processing them using either the real time audio stream or stored
6416	   audio, because this buffering mechanism does not simply accumulate
6417	   speech to a buffer.  The verification buffer MAY contain additional
6418	   information gathered by the recognition resource that serves to
6419	   improve verification performance.

6421	11.1.  Speaker Verification State Machine

6423	   Speaker verification may operate in a training or a verification
6424	   session.  Starting one of these sessions does not change the state of
6425	   the verifier resource, i.e., it remains idle.  Once a verification or
6426	   training session is started, then utterances are trained or verified
6427	   by calling the VERIFY or VERIFY-FROM-BUFFER method.  The state of the
6428	   verifier resource goes from IDLE to VERIFYING state each time VERIFY
6429	   or VERIFY-FROM-BUFFER is called.

6431	     Idle              Session Opened       Verifying/Training
6432	     State             State                State
6433	      |                   |                         |
6434	      |--START-SESSION--->|                         |
6435	      |                   |                         |
6436	      |                   |----------|              |
6437	      |                   |     START-SESSION       |
6438	      |                   |<---------|              |
6439	      |                   |                         |
6440	      |<--END-SESSION-----|                         |
6441	      |                   |                         |
6442	      |                   |---------VERIFY--------->|
6443	      |                   |                         |
6444	      |                   |---VERIFY-FROM-BUFFER--->|
6445	      |                   |                         |
6446	      |                   |----------|              |
6447	      |                   |  VERIFY-ROLLBACK        |
6448	      |                   |<---------|              |
6449	      |                   |                         |
6450	      |                   |                |--------|
6451	      |                   | GET-INTERMEDIATE-RESULT |
6452	      |                   |                |------->|
6453	      |                   |                         |
6454	      |                   |                |--------|
6455	      |                   |     START-INPUT-TIMERS  |
6456	      |                   |                |------->|
6457	      |                   |                         |
6458	      |                   |                |--------|
6459	      |                   |         START-OF-INPUT  |
6460	      |                   |                |------->|
6461	      |                   |                         |
6462	      |                   |<-VERIFICATION-COMPLETE--|
6463	      |                   |                         |
6464	      |                   |<--------STOP------------|
6465	      |                   |                         |
6466	      |                   |----------|              |
6467	      |                   |         STOP            |
6468	      |                   |<---------|              |
6469	      |                   |                         |
6470	      |----------|        |                         |
6471	      |         STOP      |                         |
6472	      |<---------|        |                         |
6473	      |                   |----------|              |
6474	      |                   |    CLEAR-BUFFER         |
6475	      |                   |<---------|              |
6476	      |                   |                         |
6477	      |----------|        |                         |
6478	      |   CLEAR-BUFFER    |                         |
6479	      |<---------|        |                         |
6480	      |                   |                         |
6481	      |                   |----------|              |
6482	      |                   |   QUERY-VOICEPRINT      |
6483	      |                   |<---------|              |
6484	      |                   |                         |
6485	      |----------|        |                         |
6486	      | QUERY-VOICEPRINT  |                         |
6487	      |<---------|        |                         |
6488	      |                   |                         |
6489	      |                   |----------|              |
6490	      |                   |  DELETE-VOICEPRINT      |
6491	      |                   |<---------|              |
6492	      |                   |                         |
6493	      |----------|        |                         |
6494	      | DELETE-VOICEPRINT |                         |
6495	      |<---------|        |                         |

6497	                      Verifier Resource State Machine
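
   The edges of the state machine above can be transcribed into a
   lookup table.  The following Python sketch is non-normative: the
   constant names, the TRANSITIONS table, and the next_state function
   are hypothetical, and the table follows the diagram, in which VERIFY
   and VERIFY-FROM-BUFFER leave the Session Opened state.

```python
IDLE, OPENED, VERIFYING = "idle", "session-opened", "verifying"

# (current state, method or event) -> next state, read off the diagram.
TRANSITIONS = {
    (IDLE, "START-SESSION"): OPENED,
    (OPENED, "START-SESSION"): OPENED,
    (OPENED, "END-SESSION"): IDLE,
    (OPENED, "VERIFY"): VERIFYING,
    (OPENED, "VERIFY-FROM-BUFFER"): VERIFYING,
    (OPENED, "VERIFY-ROLLBACK"): OPENED,
    (VERIFYING, "GET-INTERMEDIATE-RESULT"): VERIFYING,
    (VERIFYING, "START-INPUT-TIMERS"): VERIFYING,
    (VERIFYING, "START-OF-INPUT"): VERIFYING,
    (VERIFYING, "VERIFICATION-COMPLETE"): OPENED,
    (VERIFYING, "STOP"): OPENED,
}

# These methods loop back to the same state from Idle or Session Opened.
for _name in ("STOP", "CLEAR-BUFFER", "QUERY-VOICEPRINT",
              "DELETE-VOICEPRINT"):
    TRANSITIONS[(IDLE, _name)] = IDLE
    TRANSITIONS[(OPENED, _name)] = OPENED


def next_state(state, name):
    """Return the next state, or None if the diagram has no such edge."""
    return TRANSITIONS.get((state, name))


print(next_state(IDLE, "START-SESSION"))
```

   A caller could treat a None result as a request that is not valid
   in the current state; how a server reports that condition is
   outside the scope of this sketch.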

6499	11.2.  Speaker Verification Methods

6501	   The verifier resource supports the following methods.

6503	   verifier-method          =  "START-SESSION"
6504	                            / "END-SESSION"
6505	                            / "QUERY-VOICEPRINT"
6506	                            / "DELETE-VOICEPRINT"
6507	                            / "VERIFY"
6508	                            / "VERIFY-FROM-BUFFER"
6509	                            / "VERIFY-ROLLBACK"
6510	                            / "STOP"
6511	                            / "CLEAR-BUFFER"
6512	                            / "START-INPUT-TIMERS"
6513	                            / "GET-INTERMEDIATE-RESULT"

6515	   These methods allow the client to control the mode and target of
6516	   verification or identification operations within the context of a
6517	   session.  All the verification input operations that occur within a
6518	   session can be used to create, update, or validate against the
6519	   voiceprint specified during the session.  At the beginning of each
6520	   session the verifier resource is reset to the state it had prior to
6521	   any previous verification session.

6523	   Verification/identification operations can be executed against live
6524	   or buffered audio.  The verifier resource provides methods for
6525	   collecting and evaluating live audio data, and methods for
6526	   controlling the verifier resource and adjusting its configured
6527	   behavior.

6529	   There are no dedicated methods for collecting buffered audio data.
6530	   This is accomplished by calling VERIFY, RECOGNIZE or RECORD as
6531	   appropriate for the resource, with the header field Ver-Buffer-
6532	   Utterance.  Then, when the following method is called, verification
6533	   is performed using the set of buffered audio.
6534	   1.  VERIFY-FROM-BUFFER

6536	   The following methods are used for verification of live audio
6537	   utterances:
6538	   1.  VERIFY
6539	   2.  START-INPUT-TIMERS

6541	   The following methods are used for configuring the verifier resource
6542	   and for establishing resource states:
6543	   1.  START-SESSION
6544	   2.  END-SESSION
6545	   3.  QUERY-VOICEPRINT
6546	   4.  DELETE-VOICEPRINT
6547	   5.  VERIFY-ROLLBACK
6548	   6.  STOP
6549	   7.  CLEAR-BUFFER

6551	   The following method allows polling a verification in progress
6552	   for intermediate results.
6553	   1.  GET-INTERMEDIATE-RESULT

6555	11.3.  Verification Events

6557	   The verifier resource generates the following events.

6559	   verifier-event       =  "VERIFICATION-COMPLETE"
6560	                        /  "START-OF-INPUT"

6562	11.4.  Verification Header Fields

6564	   A verifier resource message can contain header fields containing
6565	   request options and information to augment the Request, Response or
6566	   Event message it is associated with.

6568	   verification-header      =  repository-uri
6569	                            /  voiceprint-identifier
6570	                            /  verification-mode
6571	                            /  adapt-model
6572	                            /  abort-model
6573	                            /  min-verification-score
6574	                            /  num-min-verification-phrases
6575	                            /  num-max-verification-phrases
6576	                            /  no-input-timeout
6577	                            /  save-waveform
6578	                            /  media-type
6579	                            /  waveform-uri
6580	                            /  voiceprint-exists
6581	                            /  ver-buffer-utterance
6582	                            /  input-waveform-uri
6583	                            /  completion-cause
6584	                            /  completion-reason
6585	                            /  speech-complete-timeout
6586	                            /  new-audio-channel
6587	                            /  abort-verification
6588	                            /  start-input-timers

6590	11.4.1.  Repository-URI

6592	   This header field specifies the voiceprint repository to be used or
6593	   referenced during speaker verification or identification operations.
6594	   This header field is required in the START-SESSION, QUERY-VOICEPRINT
6595	   and DELETE-VOICEPRINT methods.

6597	   repository-uri           =  "Repository-URI" ":" uri CRLF

6599	11.4.2.  Voiceprint-Identifier

6601	   This header field specifies the claimed identity for verification
6602	   applications.  The claimed identity MAY be used to specify an
6603	   existing voiceprint or to establish a new voiceprint.  This header
6604	   field MUST be present in the QUERY-VOICEPRINT and DELETE-VOICEPRINT
6605	   methods.  The Voiceprint-Identifier MUST be present in the START-
6606	   SESSION method for verification operations.  For Identification or
6607	   Multi-Verification operations this header field MAY contain a list of
6608	   voiceprint identifiers separated by semi-colons.  For identification
6609	   operations the client MAY also specify a voiceprint group identifier
6610	   instead of a list of voiceprint identifiers.

6612	   voiceprint-identifier        =  "Voiceprint-Identifier" ":"
6613	                                   vid *[";" vid] CRLF
6614	   vid                          =  1*VCHAR ["." 1*VCHAR]
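
   As an illustration, a client or server could split the header field
   value into individual identifiers as follows.  This Python sketch is
   non-normative; the function name and the sample identifiers are
   hypothetical.  It assumes that ";" is used only as the list
   separator, although VCHAR in the grammar above also admits ";"
   inside an identifier.

```python
def parse_voiceprint_identifier(value):
    """Split a Voiceprint-Identifier value into a list of identifiers."""
    vids = [vid.strip() for vid in value.split(";")]
    if any(not vid for vid in vids):
        raise ValueError("empty voiceprint identifier in %r" % value)
    return vids


# Hypothetical identifiers, for illustration only.
print(parse_voiceprint_identifier("johnsmith.voiceprint;marysmith.voiceprint"))
```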

6616	11.4.3.  Verification-Mode

6618	   This header field specifies the mode of the verifier resource and is
6619	   set by the START-SESSION method.  Acceptable values indicate whether
6620	   the verification session will train a voiceprint ("train") or verify/
6621	   identify using an existing voiceprint ("verify").

6623	   Training and verification sessions both require the voiceprint
6624	   Repository-URI to be specified in the START-SESSION.  In many usage
6625	   scenarios, however, the system does not know the speaker's claimed
6626	   identity until a recognition operation has, for example, recognized
6627	   an account number to which the user desires access.  In order to
6628	   allow the first few utterances of a dialog to be both recognized and
6629	   verified, the verifier resource on the MRCPv2 server retains a
6630	   buffer.  In this buffer, the MRCPv2 server accumulates recognized
6631	   utterances.  The client can later execute a verification method and
6632	   apply the buffered utterances to the current verification session.

6634	   Some voice user interfaces may require additional user input that
6635	   should not be subject to verification.  For example, the user's input
6636	   may have been recognized with low confidence and thus require a
6637	   confirmation cycle.  In such cases, the client SHOULD NOT execute the
6638	   VERIFY or VERIFY-FROM-BUFFER methods to collect and analyze the
6639	   caller's input.  A separate recognizer resource can analyze the
6640	   caller's response without any participation by the verifier resource.

6642	   Once the following conditions have been met:
6643	   1.  Voiceprint identity has been successfully established through the
6644	       voiceprint identifier header fields of the START-SESSION method,
6645	       and
6646	   2.  the verification mode has been set to one of "train" or "verify",
6647	   the verifier resource can begin providing verification information
6648	   during verification operations.  If the verifier resource does not
6649	   reach one of the two major states ("train" or "verify"), it MUST
6650	   report an error condition in the MRCPv2 status code to indicate why
6651	   the verifier resource is not ready for the corresponding usage.

6653	   The value of verification-mode is persistent within a verification
6654	   session.  If the client attempts to change the mode during a
6655	   verification session, the verifier resource reports an error and the
6656	   mode retains its current value.

6658	   verification-mode            =  "Verification-Mode" ":"
6659	                                   verification-mode-string

6661	   verification-mode-string     =  "train"
6662	                                /  "verify"

6664	11.4.4.  Adapt-Model

6666	   This header field indicates the desired behavior of the verifier
6667	   resource after a successful verification operation.  If the value of
6668	   this header field is "true", the server SHOULD use audio collected
6669	   during the verification session to update the voiceprint to account
6670	   for ongoing changes in a speaker's incoming speech characteristics,
6671	   unless local policy prohibits updating the voiceprint.  If the value
6672	   is "false" (the default), the server MUST NOT update the voiceprint.
6673	   This header field MAY occur in the START-SESSION method.

6675	   adapt-model              = "Adapt-Model" ":" BOOLEAN CRLF

6677	11.4.5.  Abort-Model

6679	   The Abort-Model header field indicates the desired behavior of the
6680	   verifier resource upon session termination.  If the value of this
6681	   header field is "true", the server MUST discard any pending changes
6682	   to a voiceprint due to verification training or verification
6683	   adaptation.  If the value is "false" (the default), the server MUST
6684	   commit any pending changes for a training session or a successful
6685	   verification session to the voiceprint repository.  A value of "true"
6686	   for Abort-Model overrides a value of "true" for the Adapt-Model
6687	   header field.  This header field MAY occur in the END-SESSION method.

6689	   abort-model             = "Abort-Model" ":" BOOLEAN CRLF
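
   The interaction between Adapt-Model and Abort-Model at session
   termination can be summarized in a short, non-normative Python
   sketch.  The function should_commit and its arguments are
   hypothetical; "verified" stands for a verification session that
   completed successfully.

```python
def should_commit(mode, adapt_model=False, abort_model=False,
                  verified=False):
    """Decide whether pending voiceprint changes are committed."""
    if abort_model:
        # Abort-Model "true" discards pending changes and overrides
        # Adapt-Model.
        return False
    if mode == "train":
        return True
    # In "verify" mode there is something to commit only when
    # adaptation was requested and the verification succeeded.
    return mode == "verify" and adapt_model and verified


print(should_commit("verify", adapt_model=True, verified=True))
```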

6691	11.4.6.  Min-Verification-Score

6693	   The Min-Verification-Score header field, when used with a verifier
6694	   resource through a SET-PARAMS, GET-PARAMS or START-SESSION method,
6695	   determines the minimum verification score for which a verification
6696	   decision of "accepted" may be declared by the server.  This is a
6697	   float value between -1.0 and 1.0.  The default value for this header
6698	   field is implementation specific.

6700	   min-verification-score  = "Min-Verification-Score" ":"
6701	                             [ %x2D ] FLOAT CRLF
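
   A receiver might check the value range as shown in the following
   non-normative Python sketch.  The function name is hypothetical.
   Python's float() accepts more spellings (for example, exponents)
   than the FLOAT rule, so a strict implementation would validate the
   syntax separately.

```python
def parse_min_verification_score(value):
    """Parse a Min-Verification-Score value and enforce its range."""
    score = float(value)  # raises ValueError on non-numeric input
    if not -1.0 <= score <= 1.0:
        raise ValueError("score out of range: %r" % value)
    return score


print(parse_min_verification_score("-0.25"))
```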

6703	11.4.7.  Num-Min-Verification-Phrases

6705	   The Num-Min-Verification-Phrases header field is used to specify the
6706	   minimum number of valid utterances before a positive decision is
6707	   given for verification.  The value for this header field is an
6708	   integer and the default value is 1.  The verifier resource MUST NOT
6709	   declare a verification 'accepted' unless Num-Min-Verification-Phrases
6710	   valid utterances have been received.  The minimum value is 1.  This
6711	   header field MAY occur in START-SESSION, SET-PARAMS or GET-PARAMS.

6713	   num-min-verification-phrases =  "Num-Min-Verification-Phrases" ":"
6714	                                   1*19DIGIT CRLF

6716	11.4.8.  Num-Max-Verification-Phrases

6718	   The Num-Max-Verification-Phrases header field is used to specify the
6719	   number of valid utterances required before a decision is forced for
6720	   verification.  The verifier resource MUST NOT return a decision of
6721	   'undecided' once Num-Max-Verification-Phrases have been collected and
6722	   used to determine a verification score.  The value for this header
6723	   field is an integer and the minimum value is 1.  The default value is
6724	   implementation-specific.  This header field MAY occur in START-
6725	   SESSION, SET-PARAMS or GET-PARAMS.

6727	   num-max-verification-phrases =  "Num-Max-Verification-Phrases" ":"
6728	                                    1*19DIGIT CRLF
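
   The two phrase-count limits constrain the decision a verifier may
   report.  The Python sketch below is non-normative and makes several
   assumptions: the function name and arguments are hypothetical,
   "raw_decision" stands for whatever the engine would report on its
   own, the decision values are taken to be "accepted", "rejected",
   and "undecided", and the rule used to force a decision (compare the
   score with the minimum score) is one possible policy, not one
   mandated by this specification.

```python
def constrain_decision(raw_decision, valid_utterances, num_min, num_max,
                       score, min_score):
    """Apply the Num-Min/Num-Max phrase limits to an engine decision."""
    decision = raw_decision
    # No "accepted" before enough valid utterances and a sufficient
    # score have been obtained.
    if decision == "accepted" and (valid_utterances < num_min
                                   or score < min_score):
        decision = "undecided"
    # No "undecided" once the maximum number of phrases has been used.
    if decision == "undecided" and valid_utterances >= num_max:
        enough = valid_utterances >= num_min and score >= min_score
        decision = "accepted" if enough else "rejected"
    return decision


print(constrain_decision("accepted", 1, 2, 5, 0.9, 0.5))
```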

6730	11.4.9.  No-Input-Timeout

6732	   The No-Input-Timeout header field sets the length of time from the
6733	   start of the verification timers (see START-INPUT-TIMERS) until the
6734	   declaration of a no-input event in the VERIFICATION-COMPLETE server
6735	   event message.  The value is in milliseconds.  This header field MAY
6736	   occur in VERIFY, SET-PARAMS or GET-PARAMS.  The value for this header
6737	   field ranges from 0 to an implementation specific maximum value.  The
6738	   default value for this header field is implementation specific.

6740	   no-input-timeout         = "No-Input-Timeout" ":" 1*19DIGIT CRLF

6742	11.4.10.  Save-Waveform

6744	   This header field allows the client to request that the verifier
6745	   resource save the audio stream that was used for verification/
6746	   identification.  The verifier resource MUST attempt to record the
6747	   audio and make it available to the client in the form of a URI
6748	   returned in the Waveform-URI header field in the VERIFICATION-
6749	   COMPLETE event.  If there was an error in recording the stream or the
6750	   audio content is otherwise not available, the verifier resource MUST
6751	   return an empty Waveform-URI header field.  The default value for
6752	   this header field is "false".  This header field MAY appear in the
6753	   VERIFY method.  Note that this header field does not appear in the
6754	   VERIFY-FROM-BUFFER method since it only controls whether or not to
6755	   save the waveform for live verification / identification operations.

6757	   save-waveform            =  "Save-Waveform" ":" BOOLEAN CRLF

6759	11.4.11.  Media-Type

6761	   This header field MAY be specified in the SET-PARAMS, GET-PARAMS or
6762	   the VERIFY methods and tells the server resource the media type of
6763	   the captured audio or video, such as the content returned via
6764	   the Waveform-URI header field.

6766	   media-type               =  "Media-Type" ":" media-type-value
6767	                               CRLF

6769	11.4.12.  Waveform-URI

6771	   If the Save-Waveform header field is set to true, the verifier
6772	   resource MUST attempt to record the incoming audio stream of the
6773	   verification into a file and provide a URI for the client to access
6774	   it.  This header field MUST be present in the VERIFICATION-COMPLETE
6775	   event if the Save-Waveform header field was set to true by the
6776	   client.  The value of the header field MUST be empty if there was
6777	   some error condition preventing the server from recording.
6778	   Otherwise, the URI generated by the server MUST be globally unique
6779	   across the server and all its verification sessions.  The content
6780	   MUST be available via the URI until the verification session ends.
6781	   Since the Save-Waveform header field applies only to live
6782	   verification / identification operations, the server can return the
6783	   Waveform-URI only in the VERIFICATION-COMPLETE event for live
6784	   verification / identification operations.

6786	   The server MUST also return the size in octets and the duration in
6787	   milliseconds of the recorded audio waveform as parameters associated
6788	   with the header field.

6790	   waveform-uri             =  "Waveform-URI" ":" ["<" uri ">"
6791	                               ";" "size" "=" 1*19DIGIT
6792	                               ";" "duration" "=" 1*19DIGIT] CRLF
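
   A client could extract the URI, size, and duration from the header
   field value as sketched below.  This Python code is non-normative
   and the function name is hypothetical.  It removes all whitespace
   first, on the assumption that the value may be folded across lines
   as in the Record-URI examples of Section 10, and it returns None for
   the empty value that signals a recording error.  The sample value
   reuses the Record-URI example, which has the same parameter layout.

```python
import re

# "<" uri ">" ";size=" digits ";duration=" digits, per the rule above.
_WAVEFORM_URI = re.compile(
    r"^<(?P<uri>[^>]+)>;size=(?P<size>\d{1,19})"
    r";duration=(?P<duration>\d{1,19})$"
)


def parse_waveform_uri(value):
    """Return (uri, size_in_octets, duration_in_ms), or None if empty."""
    compact = "".join(value.split())  # drop folding and other whitespace
    if not compact:
        return None
    match = _WAVEFORM_URI.match(compact)
    if match is None:
        raise ValueError("malformed Waveform-URI value: %r" % value)
    return (match.group("uri"), int(match.group("size")),
            int(match.group("duration")))


print(parse_waveform_uri(
    "<file://mediaserver/recordings/myfile.wav>;\n"
    "           size=242552;duration=25645"))
```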

6794	11.4.13.  Voiceprint-Exists

6796	   This header field MUST be returned in QUERY-VOICEPRINT and DELETE-
6797	   VOICEPRINT responses.  This is the status of the voiceprint specified
6798	   in the QUERY-VOICEPRINT method.  For the DELETE-VOICEPRINT method
6799	   this header field indicates the status of the voiceprint at the
6800	   moment the method execution started.

6802	   voiceprint-exists    =  "Voiceprint-Exists" ":" BOOLEAN CRLF

6804	11.4.14.  Ver-Buffer-Utterance

6806	   This header field is used to indicate that this utterance could be
6807	   later considered for Speaker Verification.  This way, a client can
6808	   request the server to buffer utterances while doing regular
6809	   recognition or verification activities and speaker verification can
6810	   later be requested on the buffered utterances.  This header field is
6811	   optional in the RECOGNIZE, VERIFY and RECORD methods.  The default
6812	   value for this header field is "false".

6814	   ver-buffer-utterance     = "Ver-Buffer-Utterance" ":" BOOLEAN
6815	                              CRLF

6817	11.4.15.  Input-Waveform-URI

6819	   This header field specifies stored audio content that the client
6820	   requests the server to fetch and process according to the current
6821	   verification mode, either to train the voiceprint or verify a claimed
6822	   identity.  This header field enables the client to implement the
6823	   buffering use case where the recognizer and verifier resources are in
6824	   different sessions and the verification buffer technique cannot be
6825	   used.  It MAY be specified on the VERIFY request.

6827	   input-waveform-uri           =  "Input-Waveform-URI" ":" uri CRLF

6829	11.4.16.  Completion-Cause

6831	   This header field MUST be part of a VERIFICATION-COMPLETE event from
6832	   the verifier resource to the client.  This indicates the cause of
6833	   VERIFY or VERIFY-FROM-BUFFER method completion.  This header field
6834	   MUST be sent in the VERIFY, VERIFY-FROM-BUFFER, and QUERY-VOICEPRINT
6835	   responses, if they return with a failure status and a COMPLETE state.
6836	   In the ABNF below, the 'cause-code' contains a numerical value
6837	   selected from the Cause-Code column of the following table.  The
6838	   'cause-name' contains the corresponding token selected from the
6839	   Cause-Name column.

6841	   completion-cause         =  "Completion-Cause" ":" cause-code SP
6842	                               cause-name CRLF
6843	   cause-code               =  3DIGIT
6844	   cause-name               =  *VCHAR
6845	   +------------+--------------------------+---------------------------+
6846	   | Cause-Code | Cause-Name               | Description               |
6847	   +------------+--------------------------+---------------------------+
6848	   | 000        | success                  | VERIFY or                 |
6849	   |            |                          | VERIFY-FROM-BUFFER        |
6850	   |            |                          | request completed         |
6851	   |            |                          | successfully.  The verify |
6852	   |            |                          | decision can be           |
6853	   |            |                          | "accepted", "rejected",   |
6854	   |            |                          | or "undecided".           |
6855	   | 001        | error                    | VERIFY or                 |
6856	   |            |                          | VERIFY-FROM-BUFFER        |
6857	   |            |                          | request terminated        |
6858	   |            |                          | prematurely due to a      |
6859	   |            |                          | verifier resource or      |
6860	   |            |                          | system error.             |
6861	   | 002        | no-input-timeout         | VERIFY request completed  |
6862	   |            |                          | with no result due to a   |
6863	   |            |                          | no-input-timeout.         |
6864	   | 003        | too-much-speech-timeout  | VERIFY request completed  |
6865	   |            |                          | with no result due to too |
6866	   |            |                          | much speech.              |
6867	   | 004        | speech-too-early         | VERIFY request completed  |
6868	   |            |                          | with no result because    |
6869	   |            |                          | the user spoke too soon.  |
6870	   | 005        | buffer-empty             | VERIFY-FROM-BUFFER        |
6871	   |            |                          | request completed with no |
6872	   |            |                          | result due to empty       |
6873	   |            |                          | buffer.                   |
6874	   | 006        | out-of-sequence          | Verification operation    |
6875	   |            |                          | failed due to             |
6876	   |            |                          | out-of-sequence method    |
6877	   |            |                          | invocations.  For example |
6878	   |            |                          | calling VERIFY before     |
6879	   |            |                          | QUERY-VOICEPRINT.         |
6880	   | 007        | repository-uri-failure   | Failure accessing         |
6881	   |            |                          | Repository URI.           |
6882	   | 008        | repository-uri-missing   | Repository-uri is not     |
6883	   |            |                          | specified.                |
6884	   | 009        | voiceprint-id-missing    | Voiceprint-identification |
6885	   |            |                          | is not specified.         |
6886	   | 010        | voiceprint-id-not-exist  | Voiceprint-identification |
6887	   |            |                          | does not exist in the     |
6888	   |            |                          | voiceprint repository.    |
6889	   | 011        | speech-not-usable        | VERIFY request completed  |
6890	   |            |                          | with no result because    |
6891	   |            |                          | the speech was not usable |
6892	   |            |                          | (too noisy, too short,    |
6893	   |            |                          | etc.)                     |
6894	   +------------+--------------------------+---------------------------+
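
   The table above can be carried on the client as a simple lookup.  The
   following is a sketch under stated assumptions: the names
   VERIFIER_CAUSES and parse_completion_cause are invented for this
   example, and rejecting a cause-name that does not match its
   cause-code is a choice of the sketch, not a requirement of the text.

```python
# Cause codes and names copied from the Completion-Cause table.
VERIFIER_CAUSES = {
    "000": "success",
    "001": "error",
    "002": "no-input-timeout",
    "003": "too-much-speech-timeout",
    "004": "speech-too-early",
    "005": "buffer-empty",
    "006": "out-of-sequence",
    "007": "repository-uri-failure",
    "008": "repository-uri-missing",
    "009": "voiceprint-id-missing",
    "010": "voiceprint-id-not-exist",
    "011": "speech-not-usable",
}


def parse_completion_cause(value):
    """Split 'cause-code SP cause-name' and check it against the table."""
    code, _, name = value.strip().partition(" ")
    expected = VERIFIER_CAUSES.get(code)
    if expected is None:
        raise ValueError("unknown cause-code: %r" % code)
    if name != expected:
        # Strictness here is an assumption of this sketch.
        raise ValueError("cause-name %r does not match code %s" % (name, code))
    return code, name
```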

6896	11.4.17.  Completion-Reason

6898	   This header field MAY be specified in a VERIFICATION-COMPLETE event
6899	   coming from the verifier resource to the client.  It contains text
6900	   describing the reason the VERIFY request completed, such as the
6901	   reason for a failure.

6903	   The completion reason text is provided for client use in logs and for
6904	   debugging and instrumentation purposes.  Clients MUST NOT interpret
6905	   the completion reason text.

6907	   completion-reason        =  "Completion-Reason" ":"
6908	                               quoted-string CRLF

6910	11.4.18.  Speech-Complete-Timeout

6912	   This header field is the same as the one described for the Recognizer
6913	   resource.  See Section 9.4.15.  This header field MAY occur in
6914	   VERIFY, SET-PARAMS, or GET-PARAMS.

6916	11.4.19.  New-Audio-Channel

6918	   This header field is the same as the one described for the Recognizer
6919	   resource.  See Section 9.4.23.  This header field MAY be specified in
6920	   a VERIFY request.

6922	11.4.20.  Abort-Verification

6924	   This header field MUST be sent in a STOP request to indicate whether
6925	   or not to abort a VERIFY method in progress.  A value of "true"
6926	   requests the server to discard the results.  A value of "false"
6927	   requests the server to return in the STOP response the verification
6928	   results obtained up to the point it received the STOP request.

6930	   abort-verification       =  "Abort-Verification" ":" BOOLEAN CRLF

6932	11.4.21.  Start-Input-Timers

6934	   This header field MAY be sent as part of a VERIFY request.  A value
6935	   of false tells the verifier resource to start the VERIFY operation,
6936	   but not to start the no-input timer yet.  The verifier resource MUST
6937	   NOT start the timers until the client sends a START-INPUT-TIMERS
6938	   request to the resource.  This is useful in the scenario when the
6939	   verifier and synthesizer resources are not part of the same session.
6940	   In this scenario, when a kill-on-barge-in prompt is being played, the
6941	   client may want the VERIFY request to be simultaneously active so
6942	   that it can detect and implement kill-on-barge-in (see
6943	   Section 8.4.2).  But at the same time the client doesn't want the
6944	   verifier resource to start the no-input timers until the prompt is
6945	   finished.  The default value is "true".

6947	   start-input-timers       =  "Start-Input-Timers" ":"
6948	                               BOOLEAN CRLF

6950	11.5.  Verification Message Body

6952	   A verification response or event message can carry additional data as
6953	   described in the following subsection.

6955	11.5.1.  Verification Result Data

6957	   Verification results are returned to the client in the message body
6958	   of the VERIFICATION-COMPLETE event or the GET-INTERMEDIATE-RESULT
6959	   response message as described in Section 6.3.  Element and attribute
6960	   descriptions for the verification portion of the NLSML format are
6961	   provided in Section 11.5.2 with a normative definition of the schema
6962	   in Section 16.3.

6964	11.5.2.  Verification Result Elements

6966	   All verification elements are contained within a single
6967	   <verification-result> element under <result>.  The elements are
6968	   described below and have the schema defined in Section 16.3.  The
6969	   following elements are defined:

6971	   1.   Voiceprint
6972	   2.   Cumulative
6973	   3.   Incremental
6974	   4.   Decision
6975	   5.   Utterance-Length
6976	   6.   Device
6977	   7.   Gender
6978	   8.   Adapted
6979	   9.   Verification-Score
6980	   10.  Vendor-Specific-Results

6982	11.5.2.1.  Voiceprint

6984	   This element in the verification results provides information on how
6985	   the speech data matched a single voiceprint.  The result data
6986	   returned MAY have more than one such entity in the case of
6987	   Identification or Multi-Verification.  Each <voiceprint> element and
6988	   the XML data within the element describe verification result
6989	   information for how well the speech data matched that particular
6990	   voiceprint.  The <voiceprint> elements are ordered according to
6991	   their cumulative verification match scores, with the highest score
6992	   first.

6994	11.5.2.2.  Cumulative

6996	   Within each <voiceprint> element there MUST be a <cumulative> element
6997	   with the cumulative scores of how well multiple utterances matched
6998	   the voiceprint.

7000	11.5.2.3.  Incremental

7002	   The first <voiceprint> element MAY contain an <incremental> element
7003	   with the incremental scores of how well the last utterance matched
7004	   the voiceprint.

7006	11.5.2.4.  Decision

7008	   This element is found within the <incremental> or <cumulative>
7009	   element within the verification results.  Its value indicates the
7010	   verification decision.  It can have the values of "accepted",
7011	   "rejected" or "undecided".

7013	11.5.2.5.  Utterance-Length

7015	   This element MAY occur within either the <incremental> or
7016	   <cumulative> elements within the first <voiceprint> element.  Its
7017	   value indicates the length in milliseconds of the last utterance or
7018	   of the cumulative set of utterances, respectively.

7020	11.5.2.6.  Device

7022	   This element is found within the <incremental> or <cumulative>
7023	   element within the verification results.  Its value indicates the
7024	   apparent type of device used by the caller as determined by the
7025	   verifier resource.  It can have the values of "cellular-phone",
7026	   "electret-phone", "carbon-button-phone", or "unknown".

7028	11.5.2.7.  Gender

7030	   This element is found within the <incremental> or <cumulative>
7031	   element within the verification results.  Its value indicates the
7032	   apparent gender of the speaker as determined by the verifier
7033	   resource.  It can have the values of "male", "female", or "unknown".

7035	11.5.2.8.  Adapted

7037	   This element is found within the first <voiceprint> element within
7038	   the verification results.  When verification is trying to confirm the
7039	   voiceprint, this indicates if the voiceprint has been adapted as a
7040	   consequence of analyzing the source utterances.  It is not returned
7041	   during verification training.  The value can be "true" or "false".

7043	11.5.2.9.  Verification-Score

7045	   This element is found within the <incremental> or <cumulative>
7046	   element within the verification results.  Its value indicates the
7047	   score of the last utterance as determined by verification.

7049	   During verification, the higher the score the more likely it is that
7050	   the speaker is the same one as the one who spoke the voiceprint
7051	   utterances.  During training, the higher the score the more likely
7052	   the speaker is to have spoken all of the analyzed utterances.  The
7053	   value is a floating-point number between -1.0 and 1.0.  If there are
7054	   no such utterances, the score is -1.  Note that the verification
7055	   score is not a probability value.

7057	11.5.2.10.  Vendor-Specific-Results

7059	   MRCPv2 servers MAY send verification results that contain
7060	   implementation specific data which augment the information provided
7061	   by the MRCPv2-defined elements.  Such data might be useful to clients
7062	   who have private knowledge of how to interpret these schema
7063	   extensions.  Implementation specific additions to the verification
7064	   results schema MUST belong to the vendor's own namespace.  In the
7065	   result structure, either they MUST be indicated by a namespace prefix
7066	   declared within the result, or they MUST be children of an element
7067	   identified as belonging to the respective namespace.

7069	   The following example shows the results of three voiceprints.  Note
7070	   that the first one has crossed the verification score threshold, and
7071	   the speaker has been accepted.  The voiceprint was also adapted with
7072	   the most recent utterance.

7074	   <?xml version="1.0"?>
7075	   <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
7076	           grammar="What-Grammar-URI">
7077	     <verification-result>
7078	       <voiceprint id="johnsmith">
7079	         <adapted> true </adapted>
7080	         <incremental>
7081	           <utterance-length> 500 </utterance-length>
7082	           <device> cellular-phone </device>
7083	           <gender> male </gender>
7084	           <decision> accepted </decision>
7085	           <verification-score> 0.98514 </verification-score>
7086	         </incremental>
7087	         <cumulative>
7088	           <utterance-length> 10000 </utterance-length>
7089	           <device> cellular-phone </device>
7090	           <gender> male </gender>
7091	           <decision> accepted </decision>
7092	           <verification-score> 0.96725 </verification-score>
7093	         </cumulative>
7094	       </voiceprint>
7095	       <voiceprint id="marysmith">
7096	         <cumulative>
7097	           <verification-score> 0.93410 </verification-score>
7098	         </cumulative>
7099	       </voiceprint>
7100	       <voiceprint id="juniorsmith">
7101	         <cumulative>
7102	           <verification-score> 0.74209 </verification-score>
7103	         </cumulative>
7104	       </voiceprint>
7105	     </verification-result>
7106	   </result>

7108	                      Verification Results Example 1

7110	   In this next example, the verifier has enough information to decide
7111	   to reject the speaker.

7113	   <?xml version="1.0"?>
7114	   <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
7115	           xmlns:xmpl="http://www.example.org/2003/12/mrcpv2"
7116	           grammar="What-Grammar-URI">
7117	     <verification-result>
7118	       <voiceprint id="johnsmith">
7119	         <incremental>
7120	           <utterance-length> 500 </utterance-length>
7121	           <device> cellular-phone </device>
7122	           <gender> male </gender>
7123	           <verification-score> 0.88514 </verification-score>
7124	           <xmpl:raspiness> high </xmpl:raspiness>
7125	           <xmpl:emotion> sadness </xmpl:emotion>
7126	         </incremental>
7127	         <cumulative>
7128	           <utterance-length> 10000 </utterance-length>
7129	           <device> cellular-phone </device>
7130	           <gender> male </gender>
7131	           <decision> rejected </decision>
7132	           <verification-score> 0.9345 </verification-score>
7133	         </cumulative>
7134	       </voiceprint>
7135	     </verification-result>
7136	   </result>

7138	                      Verification Results Example 2
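
   A client could pull the per-voiceprint cumulative decision and score
   out of a result body with a few lines of standard XML handling.  The
   sketch below is illustrative only: the function name
   summarize_verification is an assumption, it reads just the
   <cumulative> element, and it ignores vendor-specific elements.

```python
import xml.etree.ElementTree as ET

# Default namespace used by the <result> element in the examples above.
MRCP_NS = {"m": "urn:ietf:params:xml:ns:mrcpv2"}


def summarize_verification(xml_text):
    """Return (voiceprint id, cumulative decision, cumulative score) tuples.

    Entries keep document order, which the text above says is highest
    cumulative score first.  Missing values are reported as None.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for voiceprint in root.findall("m:verification-result/m:voiceprint", MRCP_NS):
        cumulative = voiceprint.find("m:cumulative", MRCP_NS)
        if cumulative is None:
            continue  # malformed per the text above; skipped in this sketch
        decision = cumulative.findtext("m:decision", default=None, namespaces=MRCP_NS)
        score = cumulative.findtext(
            "m:verification-score", default=None, namespaces=MRCP_NS
        )
        rows.append(
            (
                voiceprint.get("id"),
                decision.strip() if decision is not None else None,
                float(score) if score is not None else None,
            )
        )
    return rows
```

   Element text in the examples is padded with spaces, which is why the
   decision is stripped before use; float() tolerates the padding on its
   own.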

7140	11.6.  START-SESSION

7142	   The START-SESSION method starts a Speaker Verification or
7143	   Identification session.  Execution of this method places the verifier
7144	   resource into its initial state.  If this method is called during an
7145	   ongoing verification session, the previous session is implicitly
7146	   aborted.  If this method is invoked when VERIFY or VERIFY-FROM-BUFFER
7147	   is active, the method fails and the server returns a status code of
7148	   402.

7150	   Upon completion of the START-SESSION method, the verifier resource
7151	   MUST have terminated any ongoing verification session, and cleared
7152	   any voiceprint designation.

7154	   A verification session is associated with the voiceprint repository
7155	   to be used during the session.  This is specified through the
7156	   Repository-URI header field (see Section 11.4.1).

7158	   The START-SESSION method also establishes, through the Voiceprint-
7159	   Identifier header field, which voiceprints are to be matched or
7160	   trained during the verification session.  If this is an
7161	   Identification session or if the client wants to do Multi-
7162	   Verification, the Voiceprint-Identifier header field contains a list
7163	   of semicolon separated voiceprint identifiers.

7165	   The Adapt-Model header field MAY also be present in the START-SESSION
7166	   request to indicate whether or not to adapt a voiceprint based on
7167	   data collected during the session (if the voiceprint verification
7168	   phase succeeds).  By default, the voiceprint model MUST NOT be
7169	   adapted with data from a verification session.

7171	   The START-SESSION method also determines whether the session trains
7172	   or verifies a voiceprint.  Hence, the Verification-Mode header field
7173	   MUST be sent in every START-SESSION request.  The value of the
7174	   Verification-Mode header field MUST be either "train" or
7175	   "verify".

7177	   Before a verification/identification session is started, the client
7178	   may only request that VERIFY-ROLLBACK and generic SET-PARAMS and GET-
7179	   PARAMS operations be performed on the verifier resource.  The server
7180	   MUST return status-code 402 "Method not valid in this state" for all
7181	   other verification operations.

7183	   A verifier resource MUST NOT have more than a single session active
7184	   at one time.

7186	   C->S:  MRCP/2.0 ... START-SESSION 314161
7187	          Channel-Identifier:32AECB23433801@speakverify
7188	          Repository-URI:http://www.example.com/voiceprintdbase/
7189	          Verification-Mode:verify
7190	          Voiceprint-Identifier:johnsmith.voiceprint
7191	          Adapt-Model:true

7193	   S->C:  MRCP/2.0 ... 314161 200 COMPLETE
7194	          Channel-Identifier:32AECB23433801@speakverify
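
   The state rules described in this section can be sketched as a small
   server-side gate.  This is a simplified illustration, not a complete
   verifier: the class name VerifierState and its attributes are
   assumptions, only the 200 and 402 outcomes named in the text are
   modeled, and the status code for a missing or invalid
   Verification-Mode is deliberately left out (the sketch raises
   ValueError instead).

```python
# Methods the text allows before any session has been started.
ALLOWED_WITHOUT_SESSION = {
    "START-SESSION",
    "VERIFY-ROLLBACK",
    "SET-PARAMS",
    "GET-PARAMS",
}


class VerifierState:
    """Tracks whether a session exists and whether a VERIFY is running."""

    def __init__(self):
        self.session = None         # headers of the single active session
        self.verify_active = False  # True while VERIFY/VERIFY-FROM-BUFFER runs

    def handle(self, method, headers=None):
        """Return the status code this sketch would answer with."""
        headers = headers or {}
        if method == "START-SESSION":
            if self.verify_active:
                return 402  # not valid while a verify request is active
            if headers.get("Verification-Mode") not in ("train", "verify"):
                raise ValueError("Verification-Mode must be 'train' or 'verify'")
            # Replacing the stored session models the implicit abort of any
            # previous session and the cleared voiceprint designation.
            self.session = dict(headers)
            return 200
        if self.session is None and method not in ALLOWED_WITHOUT_SESSION:
            return 402  # "Method not valid in this state"
        return 200  # actual processing of the method is omitted here
```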

7196	11.7.  END-SESSION

7198	   The END-SESSION method terminates an ongoing verification session and
7199	   releases the verification voiceprint resources.  The session may
7200	   terminate in one of three ways:
7201	   1.  abort - the voiceprint adaptation or creation may be aborted so
7202	       that the voiceprint remains unchanged (or is not created).
7203	   2.  commit - when terminating a voiceprint training session, the new
7204	       voiceprint is committed to the repository.
7205	   3.  adapt - an existing voiceprint is modified using a successful
7206	       verification.

7208	   The Abort-Model header field MAY be included in the END-SESSION
7209	   request to control whether or not to abort any pending changes to
7210	   the voiceprint.  The default behavior is to commit (not abort) any
7211	   pending changes to the designated voiceprint.

7213	   The END-SESSION method may be safely executed multiple times without
7214	   first executing the START-SESSION method.  Any additional executions
7215	   of this method without an intervening use of the START-SESSION method
7216	   have no effect on the verifier resource.

7218	   The following example assumes there is either a training session or a
7219	   verification session in progress.

7221	   C->S:  MRCP/2.0 ... END-SESSION 314174
7222	          Channel-Identifier:32AECB23433801@speakverify
7223	          Abort-Model:true

7225	   S->C:  MRCP/2.0 ... 314174 200 COMPLETE
7226	          Channel-Identifier:32AECB23433801@speakverify

7228	11.8.  QUERY-VOICEPRINT

7230	   The QUERY-VOICEPRINT method is used to get status information on a
7231	   particular voiceprint and can be used by the client to ascertain if a
7232	   voiceprint or repository exists and if it contains trained
7233	   voiceprints.

7235	   The response to the QUERY-VOICEPRINT request contains an indication
7236	   of the status of the designated voiceprint in the Voiceprint-Exists
7237	   header field, allowing the client to determine whether to use the
7238	   current voiceprint for verification, train a new voiceprint, or
7239	   choose a different voiceprint.

7241	   A voiceprint is completely specified by providing a repository
7242	   location and a voiceprint identifier.  The particular voiceprint or
7243	   identity within the repository is specified by a string identifier
7244	   that is unique within the repository.  The Voiceprint-Identifier
7245	   header field carries this unique voiceprint identifier within a given
7246	   repository.

7248	   The following example assumes a verification session is in progress
7249	   and the voiceprint exists in the voiceprint repository.

7251	   C->S:  MRCP/2.0 ... QUERY-VOICEPRINT 314168
7252	          Channel-Identifier:32AECB23433801@speakverify
7253	          Repository-URI:http://www.example.com/voiceprints/
7254	          Voiceprint-Identifier:johnsmith.voiceprint

7256	   S->C:  MRCP/2.0 ... 314168 200 COMPLETE
7257	          Channel-Identifier:32AECB23433801@speakverify
7258	          Repository-URI:http://www.example.com/voiceprints/
7259	          Voiceprint-Identifier:johnsmith.voiceprint
7260	          Voiceprint-Exists:true

7262	   The following example assumes that the URI provided in the
7263	   Repository-URI header field is a bad URI.

7265	   C->S:  MRCP/2.0 ... QUERY-VOICEPRINT 314168
7266	          Channel-Identifier:32AECB23433801@speakverify
7267	          Repository-URI:http://www.example.com/bad-uri/
7268	          Voiceprint-Identifier:johnsmith.voiceprint

7270	   S->C:  MRCP/2.0 ... 314168 405 COMPLETE
7271	          Channel-Identifier:32AECB23433801@speakverify
7272	          Repository-URI:http://www.example.com/bad-uri/
7273	          Voiceprint-Identifier:johnsmith.voiceprint
7274	          Completion-Cause:007 repository-uri-failure

7276	11.9.  DELETE-VOICEPRINT

7278	   The DELETE-VOICEPRINT method removes a voiceprint from a repository.
7279	   This method MUST carry the Repository-URI and Voiceprint-Identifier
7280	   header fields.

7282	   An MRCPv2 server MUST reject a DELETE-VOICEPRINT request with a 401
7283	   status code unless the MRCPv2 client has been authenticated and
7284	   authorized.  Note that MRCPv2 does not have a standard mechanism for
7285	   this.  See Section 12.8.

7287	   If the corresponding voiceprint does not exist, the DELETE-VOICEPRINT
7288	   method MUST return a 200 status code.

7290	   The following example demonstrates a DELETE-VOICEPRINT operation to
7291	   remove a specific voiceprint.

7293	   C->S:  MRCP/2.0 ... DELETE-VOICEPRINT 314168
7294	          Channel-Identifier:32AECB23433801@speakverify
7295	          Repository-URI:http://www.example.com/voiceprints/
7296	          Voiceprint-Identifier:johnsmith.voiceprint

7298	   S->C:  MRCP/2.0 ... 314168 200 COMPLETE
7299	          Channel-Identifier:32AECB23433801@speakverify

7301	11.10.  VERIFY

7303	   The VERIFY method is used to request that the verifier resource
7304	   either train/adapt the voiceprint or verify/identify a claimed
7305	   identity.  If the voiceprint is new or was deleted by a previous
7306	   DELETE-VOICEPRINT method, the VERIFY method trains the voiceprint.
7307	   If the voiceprint already exists, it is adapted and not retrained by
7308	   the VERIFY command.

7310	   C->S:  MRCP/2.0 ... VERIFY 543260
7311	          Channel-Identifier:32AECB23433801@speakverify

7313	   S->C:  MRCP/2.0 ... 543260 200 IN-PROGRESS
7314	          Channel-Identifier:32AECB23433801@speakverify

7316	   When the VERIFY request completes, the MRCPv2 server MUST send a
7317	   VERIFICATION-COMPLETE event to the client.

7319	11.11.  VERIFY-FROM-BUFFER

7321	   The VERIFY-FROM-BUFFER method directs the verifier resource to verify
7322	   buffered audio against a voiceprint.  Only one VERIFY or VERIFY-FROM-
7323	   BUFFER method may be active for a verifier resource at a time.

7325	   The buffered audio is not consumed by this method and thus VERIFY-
7326	   FROM-BUFFER may be invoked multiple times by the client to attempt
7327	   verification against different voiceprints.

7329	   For the VERIFY-FROM-BUFFER method, the server MAY return an
7330	   IN-PROGRESS response before the VERIFICATION-COMPLETE event.

7332	   When the VERIFY-FROM-BUFFER method is invoked and the verification
7333	   buffer is in use by another resource sharing it, the server MUST
7334	   return an IN-PROGRESS response and wait until the buffer is available
7335	   to it.  The verification buffer is owned by the verifier resource but
7336	   is shared with write access from other input resources on the same
7337	   session.  Hence, it is considered to be in use if there is a read or
7338	   write operation such as a RECORD or RECOGNIZE with the Ver-Buffer-
7339	   Utterance header field set to "true" on a resource that shares this
7340	   buffer.  Note that if a RECORD or RECOGNIZE method returns with a
7341	   failure cause code, the VERIFY-FROM-BUFFER request waiting to process
7342	   that buffer MUST also fail with a Completion-Cause of 005 (buffer-
7343	   empty).

7345	   The following example illustrates the usage of some buffering
7346	   methods.  In this scenario, the client first performs a live
7347	   verification, but the utterance is rejected.  Meanwhile, the
7348	   utterance is also saved to the audio buffer.  Then, another
7349	   voiceprint is used to do verification against the audio buffer, and
7350	   the utterance is accepted.  For the example, we assume both
7351	   Num-Min-Verification-Phrases and Num-Max-Verification-Phrases are 1.

7353	   C->S:  MRCP/2.0 ... START-SESSION 314161
7354	          Channel-Identifier:32AECB23433801@speakverify
7355	          Verification-Mode:verify
7356	          Adapt-Model:true
7357	          Repository-URI:http://www.example.com/voiceprints
7358	          Voiceprint-Identifier:johnsmith.voiceprint

7360	   S->C:  MRCP/2.0 ... 314161 200 COMPLETE
7361	          Channel-Identifier:32AECB23433801@speakverify

7363	   C->S:  MRCP/2.0 ... VERIFY 314162
7364	          Channel-Identifier:32AECB23433801@speakverify
7365	          Ver-Buffer-Utterance:true

7367	   S->C:  MRCP/2.0 ... 314162 200 IN-PROGRESS
7368	          Channel-Identifier:32AECB23433801@speakverify

7370	   S->C:  MRCP/2.0 ... VERIFICATION-COMPLETE 314162 COMPLETE
7371	          Channel-Identifier:32AECB23433801@speakverify
7372	          Completion-Cause:000 success
7373	          Content-Type:application/nlsml+xml
7374	          Content-Length:...

7376	          <?xml version="1.0"?>
7377	          <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
7378	                  grammar="What-Grammar-URI">
7379	            <verification-result>
7380	              <voiceprint id="johnsmith">
7381	                <incremental>
7382	                  <utterance-length> 500 </utterance-length>
7383	                  <device> cellular-phone </device>
7384	                  <gender> female </gender>
7385	                  <decision> rejected </decision>
7386	                  <verification-score> 0.05465 </verification-score>
7387	                </incremental>
7388	                <cumulative>
7389	                  <utterance-length> 500 </utterance-length>
7390	                  <device> cellular-phone </device>
7391	                  <gender> female </gender>
7392	                  <decision> rejected </decision>
7393	                  <verification-score> 0.05465 </verification-score>
7394	                </cumulative>

7396	              </voiceprint>
7397	            </verification-result>
7398	          </result>

7400	   C->S:  MRCP/2.0 ... QUERY-VOICEPRINT 314163
7401	          Channel-Identifier:32AECB23433801@speakverify
7402	          Repository-URI:http://www.example.com/voiceprints/
7403	          Voiceprint-Identifier:johnsmith.voiceprint

7405	   S->C:  MRCP/2.0 ... 314163 200 COMPLETE
7406	          Channel-Identifier:32AECB23433801@speakverify
7407	          Repository-URI:http://www.example.com/voiceprints/
7408	          Voiceprint-Identifier:johnsmith.voiceprint
7409	          Voiceprint-Exists:true

7411	   C->S:  MRCP/2.0 ... START-SESSION 314164
7412	          Channel-Identifier:32AECB23433801@speakverify
7413	          Verification-Mode:verify
7414	          Adapt-Model:true
7415	          Repository-URI:http://www.example.com/voiceprints
7416	          Voiceprint-Identifier:marysmith.voiceprint

7418	   S->C:  MRCP/2.0 ... 314164 200 COMPLETE
7419	          Channel-Identifier:32AECB23433801@speakverify

7421	   C->S:  MRCP/2.0 ... VERIFY-FROM-BUFFER 314165
7422	          Channel-Identifier:32AECB23433801@speakverify

7424	   S->C:  MRCP/2.0 ... 314165 200 IN-PROGRESS
7425	          Channel-Identifier:32AECB23433801@speakverify

7427	   S->C:  MRCP/2.0 ... VERIFICATION-COMPLETE 314165 COMPLETE
7428	          Channel-Identifier:32AECB23433801@speakverify
7429	          Completion-Cause:000 success
7430	          Content-Type:application/nlsml+xml
7431	          Content-Length:...

7433	          <?xml version="1.0"?>
7434	          <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
7435	                  grammar="What-Grammar-URI">
7436	            <verification-result>
7437	              <voiceprint id="marysmith">
7438	                <incremental>
7439	                  <utterance-length> 1000 </utterance-length>
7440	                  <device> cellular-phone </device>
7441	                  <gender> female </gender>
7442	                  <decision> accepted </decision>
7443	                  <verification-score> 0.98 </verification-score>

7445	                </incremental>
7446	                <cumulative>
7447	                  <utterance-length> 1000 </utterance-length>
7448	                  <device> cellular-phone </device>
7449	                  <gender> female </gender>
7450	                  <decision> accepted </decision>
7451	                  <verification-score> 0.98 </verification-score>
7452	                </cumulative>
7453	              </voiceprint>
7454	            </verification-result>
7455	          </result>

7457	   C->S:  MRCP/2.0 ... END-SESSION 314166
7458	          Channel-Identifier:32AECB23433801@speakverify

7460	   S->C:  MRCP/2.0 ... 314166 200 COMPLETE
7461	          Channel-Identifier:32AECB23433801@speakverify

7463	                        VERIFY-FROM-BUFFER Example

7465	11.12.  VERIFY-ROLLBACK

7467	   The VERIFY-ROLLBACK method discards the last buffered utterance or
7468	   discards the last live utterances (when the mode is "train" or
7469	   "verify").  The client will likely want to invoke this method when
7470	   the user provides undesirable input such as non-speech noises, side-
7471	   speech, out-of-grammar utterances, commands, etc.  Note that this
7472	   method does not provide a stack of rollback states.  Executing
7473	   VERIFY-ROLLBACK twice in succession without an intervening
7474	   recognition operation has no effect on the second attempt.

7476	   C->S:  MRCP/2.0 ... VERIFY-ROLLBACK 314165
7477	          Channel-Identifier:32AECB23433801@speakverify

7479	   S->C:  MRCP/2.0 ... 314165 200 COMPLETE
7480	          Channel-Identifier:32AECB23433801@speakverify

7482	                          VERIFY-ROLLBACK Example
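
   The single-level rollback behavior described in this section can be
   modeled roughly as follows.  This is an illustrative sketch only:
   the class, its attributes, and the utterance labels are hypothetical
   and are not defined by MRCPv2.

```python
class RollbackModel:
    """Toy model of single-level VERIFY-ROLLBACK semantics (hypothetical)."""

    def __init__(self):
        self.utterances = []       # utterances currently counted
        self.can_rollback = False  # True only right after a new utterance

    def add_utterance(self, utterance):
        # A new operation contributes an utterance and re-arms the
        # single rollback slot.
        self.utterances.append(utterance)
        self.can_rollback = True

    def verify_rollback(self):
        # Discard only the most recent utterance.  There is no stack of
        # rollback states, so a second call in a row changes nothing.
        if self.can_rollback and self.utterances:
            self.utterances.pop()
        self.can_rollback = False


model = RollbackModel()
model.add_utterance("utt-1")
model.add_utterance("utt-2")
model.verify_rollback()  # discards "utt-2"
model.verify_rollback()  # no intervening operation: no effect
```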

7484	11.13.  STOP

7486	   The STOP method from the client to the server tells the verifier
7487	   resource to stop the VERIFY or VERIFY-FROM-BUFFER request if one is
7488	   active.  If such a request is active and the STOP request
7489	   successfully terminated it, then the response header section contains
7490	   an Active-Request-Id-List header field containing the request-id of
7491	   the VERIFY or VERIFY-FROM-BUFFER request that was terminated.  In
7492	   this case, no VERIFICATION-COMPLETE event is sent for the terminated
7493	   request.  If there was no verify request active, then the response
7494	   MUST NOT contain an Active-Request-Id-List header field.  Either way
7495	   the response MUST contain a status-code of 200 (Success).

7497	   The STOP method can carry an Abort-Verification header field which
7498	   specifies if the verification result until that point should be
7499	   discarded or returned.  If this header field is not present or if the
7500	   value is "true", the verification result is discarded and the STOP
7501	   response does not contain any result data.  If the header field is
7502	   present and its value is "false", the STOP response MUST contain a
7503	   Completion-Cause header field and carry the Verification result data
7504	   in its body.
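
   The response rules in the two preceding paragraphs can be sketched
   as below.  This is a non-normative illustration: the function name,
   the dictionary layout, and the placeholder Completion-Cause value
   are assumptions, and the sketch assumes that result data is only
   returned when a request was actually terminated.

```python
def build_stop_response(request_id, active_verify_id=None,
                        abort_verification=None, result_body=None,
                        completion_cause="000 success"):
    """Build a STOP response as a plain dict (hypothetical layout)."""
    # Either way, the status-code is 200.
    response = {"request_id": request_id, "status_code": 200,
                "headers": {}, "body": None}
    if active_verify_id is None:
        # No verify request was active: the response carries no
        # Active-Request-Id-List header field.
        return response
    # A VERIFY or VERIFY-FROM-BUFFER request was terminated.
    response["headers"]["Active-Request-Id-List"] = str(active_verify_id)
    # Abort-Verification absent or "true": the result is discarded.
    # Abort-Verification "false": return Completion-Cause and the result.
    if abort_verification == "false":
        response["headers"]["Completion-Cause"] = completion_cause
        response["body"] = result_body
    return response
```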

7506	   An aborted VERIFY request does an automatic roll-back and hence does
7507	   not affect the cumulative score.  A VERIFY request that was stopped
7508	   with no Abort-Verification header field or with the Abort-
7509	   Verification header field set to "false" does affect cumulative
7510	   scores and would need to be explicitly rolled-back if the client does
7511	   not want the verification result considered in the cumulative scores.

7513	   The following example assumes a voiceprint identity has already been
7514	   established.

7516	   C->S:  MRCP/2.0 ... VERIFY 314177
7517	          Channel-Identifier:32AECB23433801@speakverify

7519	   S->C:  MRCP/2.0 ... 314177 200 IN-PROGRESS
7520	          Channel-Identifier:32AECB23433801@speakverify

7522	   C->S:  MRCP/2.0 ... STOP 314178
7523	          Channel-Identifier:32AECB23433801@speakverify

7525	   S->C:  MRCP/2.0 ... 314178 200 COMPLETE
7526	          Channel-Identifier:32AECB23433801@speakverify
7527	          Active-Request-Id-List:314177

7529	                         STOP verification Example

7531	11.14.  START-INPUT-TIMERS

7533	   This request is sent from the client to the verifier resource to
7534	   start the no-input timer, usually once the client has ascertained
7535	   that any audio prompts to the user have played to completion.

7537	   C->S:  MRCP/2.0 ... START-INPUT-TIMERS 543260
7538	          Channel-Identifier:32AECB23433801@speakverify

7540	   S->C:  MRCP/2.0 ... 543260 200 COMPLETE
7541	          Channel-Identifier:32AECB23433801@speakverify

7543	11.15.  VERIFICATION-COMPLETE

7545	   The VERIFICATION-COMPLETE event follows a call to VERIFY or VERIFY-
7546	   FROM-BUFFER and is used to communicate the verification results to
7547	   the client.  The event message body contains only verification
7548	   results.

7550	   S->C:  MRCP/2.0 ... VERIFICATION-COMPLETE 543259 COMPLETE
7551	          Completion-Cause:000 success
7552	          Content-Type:application/nlsml+xml
7553	          Content-Length:...

7555	          <?xml version="1.0"?>
7556	          <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
7557	                  grammar="What-Grammar-URI">
7558	            <verification-result>
7559	              <voiceprint id="johnsmith">
7560	                <incremental>
7561	                  <utterance-length> 500 </utterance-length>
7562	                  <device> cellular-phone </device>
7563	                  <gender> male </gender>
7564	                  <decision> accepted </decision>
7565	                  <verification-score> 0.85 </verification-score>
7566	                </incremental>
7567	                <cumulative>
7568	                  <utterance-length> 1500 </utterance-length>
7569	                  <device> cellular-phone </device>
7570	                  <gender> male </gender>
7571	                  <decision> accepted </decision>
7572	                  <verification-score> 0.75 </verification-score>
7573	                </cumulative>
7574	              </voiceprint>
7575	            </verification-result>
7576	          </result>
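
   A client might extract the incremental and cumulative results from
   such a body along the following lines.  This is a minimal sketch
   using the Python standard library; the function name and the shape
   of the returned dictionary are assumptions, and the sample document
   is an abbreviated copy of the example above.

```python
import xml.etree.ElementTree as ET

NS = {"m": "urn:ietf:params:xml:ns:mrcpv2"}


def parse_verification_result(xml_text):
    """Return {voiceprint id: {"incremental": {...}, "cumulative": {...}}}."""
    root = ET.fromstring(xml_text)
    results = {}
    for voiceprint in root.findall("m:verification-result/m:voiceprint", NS):
        entry = {}
        for kind in ("incremental", "cumulative"):
            node = voiceprint.find("m:" + kind, NS)
            if node is None:
                continue
            # Drop the namespace from each child tag and trim the
            # whitespace that surrounds the values in the examples.
            entry[kind] = {
                child.tag.split("}", 1)[-1]: (child.text or "").strip()
                for child in node
            }
        results[voiceprint.get("id")] = entry
    return results


SAMPLE = """<?xml version="1.0"?>
<result xmlns="urn:ietf:params:xml:ns:mrcpv2" grammar="What-Grammar-URI">
  <verification-result>
    <voiceprint id="johnsmith">
      <incremental>
        <decision> accepted </decision>
        <verification-score> 0.85 </verification-score>
      </incremental>
      <cumulative>
        <decision> accepted </decision>
        <verification-score> 0.75 </verification-score>
      </cumulative>
    </voiceprint>
  </verification-result>
</result>
"""

parsed = parse_verification_result(SAMPLE)
```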

7578	11.16.  START-OF-INPUT

7580	   The START-OF-INPUT event is returned from the server to the client
7581	   once the server has detected speech.  This event is always returned
7582	   by the verifier resource when speech has been detected, irrespective
7583	   of whether the recognizer and verifier resources share the same
7584	   session or not.

7586	   S->C:  MRCP/2.0 ... START-OF-INPUT 543259 IN-PROGRESS
7587	          Channel-Identifier:32AECB23433801@speakverify

7589	11.17.  CLEAR-BUFFER

7591	   The CLEAR-BUFFER method can be used to clear the verification buffer.
7592	   This buffer is used to buffer speech during recognition, record or
7593	   verification operations that may later be used by VERIFY-FROM-BUFFER.
7594	   As noted before, the buffer associated with the verifier resource is
7595	   shared by other input resources like recognizers and recorders.
7596	   Hence, a CLEAR-BUFFER request fails if the verification buffer is in
7597	   use.  This can happen when any one of the input resources that shares
7598	   this buffer has an active read or write operation such as RECORD,
7599	   RECOGNIZE or VERIFY with the Ver-Buffer-Utterance header field set to
7600	   "true".

7602	   C->S:  MRCP/2.0 ... CLEAR-BUFFER 543260
7603	          Channel-Identifier:32AECB23433801@speakverify

7605	   S->C:  MRCP/2.0 ... 543260 200 COMPLETE
7606	          Channel-Identifier:32AECB23433801@speakverify
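
   The "in use" condition can be illustrated with a toy model of the
   shared buffer.  The class and method names below are hypothetical,
   and how a server reports the failure on the wire is not shown.

```python
class VerificationBuffer:
    """Toy model of the buffer shared by input resources (hypothetical)."""

    def __init__(self):
        self.audio = bytearray()
        self.active_operations = set()  # e.g. {"RECORD", "RECOGNIZE"}

    def begin(self, operation):
        # An input resource starts reading from or writing to the buffer.
        self.active_operations.add(operation)

    def end(self, operation):
        self.active_operations.discard(operation)

    def write(self, chunk):
        self.audio.extend(chunk)

    def clear(self):
        # CLEAR-BUFFER fails while any sharing resource has an active
        # read or write operation on the buffer.
        if self.active_operations:
            return False
        self.audio.clear()
        return True
```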

7608	11.18.  GET-INTERMEDIATE-RESULT

7610	   A client can use the GET-INTERMEDIATE-RESULT method to poll for
7611	   intermediate results of a verification request that is in progress.
7612	   Invoking this method does not change the state of the resource.  The
7613	   verifier resource collects the accumulated verification results and
7614	   returns the information in the method response.  The message body in
7615	   the response to a GET-INTERMEDIATE-RESULT request contains only
7616	   verification results.  The method response MUST NOT contain a
7617	   Completion-Cause header field as the request is not yet complete.  If
7618	   the resource does not have a verification in progress the response
7619	   has a 402 failure status-code and no result in the body.

7621	   C->S:  MRCP/2.0 ... GET-INTERMEDIATE-RESULT 543260
7622	          Channel-Identifier:32AECB23433801@speakverify

7624	   S->C:  MRCP/2.0 ... 543260 200 COMPLETE
7625	          Channel-Identifier:32AECB23433801@speakverify
7626	          Content-Type:application/nlsml+xml
7627	          Content-Length:...

7629	          <?xml version="1.0"?>
7630	          <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
7631	                  grammar="What-Grammar-URI">
7632	            <verification-result>
7633	              <voiceprint id="marysmith">
7634	                <incremental>
7635	                  <utterance-length> 50 </utterance-length>
7636	                  <device> cellular-phone </device>
7637	                  <gender> female </gender>
7638	                  <decision> undecided </decision>
7639	                  <verification-score> 0.85 </verification-score>
7640	                </incremental>
7641	                <cumulative>
7642	                  <utterance-length> 150 </utterance-length>
7643	                  <device> cellular-phone </device>
7644	                  <gender> female </gender>
7645	                  <decision> undecided </decision>
7646	                  <verification-score> 0.65 </verification-score>
7647	                </cumulative>
7648	              </voiceprint>
7649	            </verification-result>
7650	          </result>
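
   The two outcomes described for this method can be sketched as
   follows.  The function name and dictionary layout are assumptions
   made for illustration only.

```python
def intermediate_result_response(request_id, in_progress, result_body=None):
    """Build a GET-INTERMEDIATE-RESULT response dict (hypothetical layout)."""
    if not in_progress:
        # No verification in progress: 402 failure and no result body.
        return {"request_id": request_id, "status_code": 402,
                "headers": {}, "body": None}
    # Verification still running: return the accumulated results, with
    # no Completion-Cause header field because the request is not yet
    # complete.
    return {"request_id": request_id, "status_code": 200,
            "headers": {"Content-Type": "application/nlsml+xml"},
            "body": result_body}
```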

7652	12.  Security Considerations

7654	   MRCPv2 is designed to comply with the security-related requirements
7655	   documented in SpeechSC Requirements [RFC4313].  Implementers and
7656	   users of MRCPv2 are strongly encouraged to read the Security
7657	   Considerations section of [RFC4313], because that document contains
7658	   discussion of a number of important security issues associated with
7659	   the utilization of speech as biometric authentication technology, and
7660	   on the threats against systems which store recorded speech, contain
7661	   large corpora of voiceprints, and send and receive sensitive
7662	   information based on voice input to a recognizer or speech output
7663	   from a synthesizer.  Specific security measures employed by MRCPv2
7664	   are summarized in the following subsections.  See the corresponding
7665	   sections of this specification for how the security-related machinery
7666	   is invoked by individual protocol operations.

7668	12.1.  Rendezvous and Session Establishment

7670	   MRCPv2 control sessions are established as media sessions described
7671	   by SDP within the context of a SIP dialog.  In order to ensure secure
7672	   rendezvous between MRCPv2 clients and servers, the following are
7673	   required:

7675	   1.  The SIP implementation in MRCPv2 clients and servers MUST support
7676	       SIP digest authentication [RFC3261] and SHOULD employ it.
7677	   2.  The SIP implementation in MRCPv2 clients and servers MUST support
7678	       'sips' URIs and SHOULD employ 'sips' URIs, including that clients
7679	       and servers SHOULD set up TLS [RFC5246] connections.
7680	   3.  If media stream cryptographic keying is done through SDP (e.g.,
7681	       using [RFC4568]), the MRCPv2 clients and servers MUST employ
7682	       'sips' URIs.
7683	   4.  When TLS is used for SIP, the client MUST verify the identity of
7684	       the server to which it connects, following the rules and
7685	       guidelines defined in [RFC5922].

7687	12.2.  Control channel protection

7689	   Sensitive data is carried over the MRCPv2 control channel.  This
7690	   includes things like the output of speech recognition operations,
7691	   speaker verification results, input to text-to-speech conversion,
7692	   personally-identifying grammars, etc.  For this reason MRCPv2 servers
7693	   must be properly authenticated and the control channel must permit
7694	   the use of both confidentiality and integrity for the data.  To
7695	   ensure control channel protection, MRCPv2 clients and servers MUST
7696	   support TLS and SHOULD utilize it by default unless alternative
7697	   control channel protection is used.  When TLS is used, the client
7698	   MUST verify the identity of the server to which it connects,
7699	   following the rules and guidelines defined in [RFC4572].  If there
7700	   are multiple TLS-protected channels between the client and the
7701	   server, the server MUST NOT send a response to the client over a
7702	   channel for which the TLS identities of the server or client differ
7703	   from the channel over which the server received the corresponding
7704	   request.  Alternative control channel protection MAY be used if
7705	   desired (e.g.  Security Architecture for the Internet Protocol
7706	   (IPsec) [RFC4301]).
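
   As an illustration of the identity check, [RFC4572] binds a TLS
   certificate to the session by carrying a hash of the certificate in
   an SDP "fingerprint" attribute.  The sketch below compares a
   DER-encoded certificate against such an attribute value.  It is a
   simplified, non-normative example: the function name is
   hypothetical, only a few hash functions are mapped, and obtaining
   the peer certificate from the TLS stack is not shown.

```python
import hashlib
import hmac

# SDP hash-function tokens mapped to hashlib algorithm names.
_HASHES = {"sha-1": "sha1", "sha-256": "sha256",
           "sha-384": "sha384", "sha-512": "sha512"}


def fingerprint_matches(cert_der, sdp_fingerprint):
    """Check cert_der against a value such as 'sha-256 AB:CD:...'."""
    func, _, hex_pairs = sdp_fingerprint.strip().partition(" ")
    algorithm = _HASHES.get(func.lower())
    if algorithm is None:
        return False  # unknown hash function: treat as a mismatch
    try:
        expected = bytes.fromhex(hex_pairs.replace(":", ""))
    except ValueError:
        return False  # malformed fingerprint value
    actual = hashlib.new(algorithm, cert_der).digest()
    # Constant-time comparison of the two digests.
    return hmac.compare_digest(actual, expected)
```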

7708	12.3.  Media session protection

7710	   Sensitive data is also carried on media sessions terminating on
7711	   MRCPv2 servers (the other end of a media channel may or may not be on
7712	   the MRCPv2 client).  This data includes the user's spoken utterances
7713	   and the output of text-to-speech operations.  MRCPv2 servers MUST
7714	   support a security mechanism for protection of audio media sessions.
7715	   MRCPv2 clients that originate or consume audio similarly MUST support
7716	   a security mechanism for protection of the audio.  One such mechanism
7717	   is the Secure Real-time Transport Protocol (SRTP) [RFC3711].

7719	12.4.  Indirect Content Access

7721	   MRCPv2 employs content indirection extensively.  Content may be
7722	   fetched and/or stored based on URI-addressing on systems other than
7723	   the MRCPv2 client or server.  Not all of the stored content is
7724	   necessarily sensitive (e.g.  XML schemas), but the majority generally
7725	   needs protection, and some indirect content, such as voice recordings
7726	   and voiceprints, are extremely sensitive and must always be
7727	   protected.  MRCPv2 clients and servers MUST implement HTTPS for
7728	   indirect content access, and SHOULD employ secure access for all
7729	   sensitive indirect content.  Other secure URI schemes such as Secure
7730	   FTP (FTPS) [RFC4217] MAY also be used.  See Section 6.2.15 for the
7731	   header fields used to transfer cookie information between the MRCPv2
7732	   client and server if needed for authentication.

7734	   Access to URIs provided by servers introduces risks that need to be
7735	   considered.  Although RFC 6454 [RFC6454] discusses and focuses on a
7736	   same-origin policy, which MRCPv2 does not restrict URIs to, it still
7737	   provides an excellent description of the pitfalls of blindly
7738	   following server-provided URIs in section 3 of the RFC.  Servers also
7739	   need to be aware that clients could provide URIs to sites designed to
7740	   tie up the server in long or otherwise problematic document fetches.
7741	   MRCPv2 servers, and the services they access, MUST always be prepared
7742	   for the possibility of such a Denial of Service attack.

7744	   MRCPv2 makes no inherent assumptions about the lifetime and access
7745	   controls associated with a URI.  For example, if neither
7746	   authentication nor scheme-specific access controls are used, a leak
7747	   of the URI is equivalent to a leak of the content.  Moreover, MRCPv2
7748	   makes no specific demands on the lifetime of a URI.  If a server
7749	   offers a URI and the client takes a long time to access that
7750	   URI, the server may have removed the resource in the interim time
7751	   period.  MRCPv2 deals with this case by using the URI access scheme's
7752	   resource not found error, such as 404 for HTTPS.  How long a server
7753	   should keep a dynamic resource available is highly application and
7754	   context dependent.  However, the server SHOULD keep the resource
7755	   available for a reasonable amount of time to make it likely the
7756	   client will have the resource available when the client needs the
7757	   resource.  Conversely, to mitigate state exhaustion attacks, MRCPv2
7758	   servers are not obligated to keep resources and resource state in
7759	   perpetuity.  The server SHOULD delete dynamically-generated resources
7760	   associated with an MRCPv2 session when the session ends.

7762	   One method to avoid resource leakage is for the server to use
7763	   difficult-to-guess, one-time resource URIs.  In this instance, there
7764	   can be only a single access to the underlying resource using the
7765	   given URI.  A downside to this approach is that if an attacker uses
7766	   the URI before the client does, the client is denied the
7767	   resource.  Other methods would be to adopt a mechanism similar to the
7768	   URLAUTH IMAP extension [RFC4467], where the server sets cryptographic
7769	   checks on URI usage, as well as capabilities for expiration,
7770	   revocation, and so on.  Specifying such a mechanism is beyond the
7771	   scope of this document.
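
   The one-time URI approach might look like the following sketch.
   The class name, the base URL, and the in-memory store are
   assumptions made for illustration; a real server would also need
   expiry and authenticated transport.

```python
import secrets


class OneTimeResourceStore:
    """Hands out difficult-to-guess URIs that work exactly once."""

    def __init__(self, base_url):
        self.base_url = base_url.rstrip("/")
        self._resources = {}

    def publish(self, content):
        # 32 random bytes (256 bits) encoded as a URL-safe token.
        token = secrets.token_urlsafe(32)
        self._resources[token] = content
        return f"{self.base_url}/{token}"

    def fetch(self, uri):
        token = uri.rsplit("/", 1)[-1]
        # pop() makes the URI single-use: a later fetch returns None,
        # which a server would report as "resource not found".
        return self._resources.pop(token, None)


store = OneTimeResourceStore("https://media.example.com/tmp")
uri = store.publish(b"recorded audio")
first = store.fetch(uri)
second = store.fetch(uri)
```

   Note that this reproduces the downside mentioned above: whoever
   presents the URI first receives the resource.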

7773	12.5.  Protection of stored media

7775	   MRCPv2 applications often require the use of stored media.  Voice
7776	   recordings are both stored (e.g. for diagnosis and system tuning),
7777	   and fetched (for replaying utterances into multiple MRCPv2
7778	   resources).  Voiceprints are fundamental to the speaker
7779	   identification and verification functions.  This data can be
7780	   extremely sensitive and can present substantial privacy and
7781	   impersonation risks if stolen.  Systems employing MRCPv2 SHOULD be
7782	   deployed in ways that minimize these risks.  The SpeechSC
7783	   Requirements [RFC4313] contains a more extensive discussion of these
7784	   risks and ways they may be mitigated.

7786	12.6.  DTMF and recognition buffers

7788	   DTMF buffers and recognition buffers may grow large enough to exceed
7789	   the capabilities of a server, and the server MUST be prepared to
7790	   gracefully handle resource consumption.  A server MAY respond with
7791	   the appropriate recognition incomplete if the server is in danger of
7792	   running out of resources.

7794	12.7.  Client-set server parameters

7796	   In MRCPv2 there are some tasks, such as URI resource fetches, that
7797	   the server does on behalf of the client.  To control this behavior,
7798	   MRCPv2 has a number of server parameters that a client can configure.
7799	   With one such parameter, Fetch-Timeout (Section 6.2.12), a malicious
7800	   client could set a very large value and then request the server to
7801	   fetch a non-existent document.  It is RECOMMENDED that servers be
7802	   cautious about accepting long timeout values or abnormally large
7803	   values for other client-set parameters.
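
   One way for a server to apply such caution is to cap client-supplied
   values against a local policy, as sketched below.  The limit, the
   default, and the function name are assumptions chosen for
   illustration and are not values defined by MRCPv2.

```python
MAX_FETCH_TIMEOUT_MS = 30_000      # assumed server policy limit
DEFAULT_FETCH_TIMEOUT_MS = 10_000  # assumed server default


def effective_fetch_timeout(requested_ms=None,
                            default_ms=DEFAULT_FETCH_TIMEOUT_MS,
                            limit_ms=MAX_FETCH_TIMEOUT_MS):
    """Return the timeout the server will actually use, in milliseconds."""
    if requested_ms is None or requested_ms < 0:
        # Missing or nonsensical value: fall back to the server default.
        return min(default_ms, limit_ms)
    # Never honor more than the server's own limit.
    return min(requested_ms, limit_ms)
```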

7805	12.8.  DELETE-VOICEPRINT and authorization

7807	   Since this specification does not mandate a specific mechanism for
7808	   authentication and authorization when requesting DELETE-VOICEPRINT
7809	   (Section 11.9), there is a risk that an MRCPv2 server may not do such
7810	   a check for authentication and authorization.  In practice, each
7811	   provider of voice biometric solutions does insist on its own
7812	   authentication and authorization mechanism, outside of this
7813	   specification, so this is not likely to be a major problem.  If in
7814	   the future voice biometric providers standardize on such a mechanism,
7815	   then a future version of MRCP can mandate it.

7817	13.  IANA Considerations

7819	13.1.  New registries

7821	   This section describes the name spaces (registries) for MRCPv2 that
7822	   IANA is requested to create and maintain.  Assignment/registration
7823	   policies are described in RFC 5226 [RFC5226].

7825	13.1.1.  MRCPv2 resource types

7827	   IANA SHALL create a new name space of "MRCPv2 resource types".  All
7828	   maintenance within and additions to the contents of this name space
7829	   MUST be according to the "Standards Action" registration policy.  The
7830	   initial contents of the registry, defined in Section 4.2, are given
7831	   below:
7832	   Resource type  Resource description  Reference
7833	   -------------  --------------------  ---------
7834	   speechrecog    Speech Recognizer     [RFCXXXX]
7835	   dtmfrecog      DTMF Recognizer       [RFCXXXX]
7836	   speechsynth    Speech Synthesizer    [RFCXXXX]
7837	   basicsynth     Basic Synthesizer     [RFCXXXX]
7838	   speakverify    Speaker Verifier      [RFCXXXX]
7839	   recorder       Speech Recorder       [RFCXXXX]

7841	13.1.2.  MRCPv2 methods and events

7843	   IANA SHALL create a new name space of "MRCPv2 methods and events".
7844	   All maintenance within and additions to the contents of this name
7845	   space MUST be according to the "Standards Action" registration
7846	   policy.  The initial contents of the registry, defined by the
7847	   "method-name" and "event-name" BNF in Section 15 and explained in
7848	   Section 5.2 and Section 5.5, are given below.

7850	   Name                     Resource type  Method/Event  Reference
7851	   ----                     -------------  ------------  ---------
7852	   SET-PARAMS               Generic        Method        [RFCXXXX]
7853	   GET-PARAMS               Generic        Method        [RFCXXXX]
7854	   SPEAK                    Synthesizer    Method        [RFCXXXX]
7855	   STOP                     Synthesizer    Method        [RFCXXXX]
7856	   PAUSE                    Synthesizer    Method        [RFCXXXX]
7857	   RESUME                   Synthesizer    Method        [RFCXXXX]
7858	   BARGE-IN-OCCURRED        Synthesizer    Method        [RFCXXXX]
7859	   CONTROL                  Synthesizer    Method        [RFCXXXX]
7860	   DEFINE-LEXICON           Synthesizer    Method        [RFCXXXX]
7861	   DEFINE-GRAMMAR           Recognizer     Method        [RFCXXXX]
7862	   RECOGNIZE                Recognizer     Method        [RFCXXXX]
7863	   INTERPRET                Recognizer     Method        [RFCXXXX]
7864	   GET-RESULT               Recognizer     Method        [RFCXXXX]
7865	   START-INPUT-TIMERS       Recognizer     Method        [RFCXXXX]
7866	   STOP                     Recognizer     Method        [RFCXXXX]
7867	   START-PHRASE-ENROLLMENT  Recognizer     Method        [RFCXXXX]
7868	   ENROLLMENT-ROLLBACK      Recognizer     Method        [RFCXXXX]
7869	   END-PHRASE-ENROLLMENT    Recognizer     Method        [RFCXXXX]
7870	   MODIFY-PHRASE            Recognizer     Method        [RFCXXXX]
7871	   DELETE-PHRASE            Recognizer     Method        [RFCXXXX]
7872	   RECORD                   Recorder       Method        [RFCXXXX]
7873	   STOP                     Recorder       Method        [RFCXXXX]
7874	   START-INPUT-TIMERS       Recorder       Method        [RFCXXXX]
7875	   START-SESSION            Verifier       Method        [RFCXXXX]
7876	   END-SESSION              Verifier       Method        [RFCXXXX]
7877	   QUERY-VOICEPRINT         Verifier       Method        [RFCXXXX]
7878	   DELETE-VOICEPRINT        Verifier       Method        [RFCXXXX]
7879	   VERIFY                   Verifier       Method        [RFCXXXX]
7880	   VERIFY-FROM-BUFFER       Verifier       Method        [RFCXXXX]
7881	   VERIFY-ROLLBACK          Verifier       Method        [RFCXXXX]
7882	   STOP                     Verifier       Method        [RFCXXXX]
7883	   START-INPUT-TIMERS       Verifier       Method        [RFCXXXX]
7884	   GET-INTERMEDIATE-RESULT  Verifier       Method        [RFCXXXX]
7885	   SPEECH-MARKER            Synthesizer    Event         [RFCXXXX]
7886	   SPEAK-COMPLETE           Synthesizer    Event         [RFCXXXX]
7887	   START-OF-INPUT           Recognizer     Event         [RFCXXXX]
7888	   RECOGNITION-COMPLETE     Recognizer     Event         [RFCXXXX]
7889	   INTERPRETATION-COMPLETE  Recognizer     Event         [RFCXXXX]
7890	   START-OF-INPUT           Recorder       Event         [RFCXXXX]
7891	   RECORD-COMPLETE          Recorder       Event         [RFCXXXX]
7892	   VERIFICATION-COMPLETE    Verifier       Event         [RFCXXXX]
7893	   START-OF-INPUT           Verifier       Event         [RFCXXXX]

7895	13.1.3.  MRCPv2 header fields

7897	   IANA SHALL create a new name space of "MRCPv2 header fields".  All
7898	   maintenance within and additions to the contents of this name space
7899	   MUST be according to the "Standards Action" registration policy.  The
7900	   initial contents of the registry, defined by the "message-header" BNF
7901	   in Section 15 and explained in Section 5.1, are given below.  Note
7902	   that the values permitted for the "Vendor-Specific-Parameters"
7903	   parameter are managed according to a different policy.  See
7904	   Section 13.1.6.
7905	   Name                               Resource type    Reference
7906	   ----                               -------------    ---------
7907	   Channel-Identifier                 Generic          [RFCXXXX]
7908	   Accept                             Generic          [RFC2616]
7909	   Active-Request-Id-List             Generic          [RFCXXXX]
7910	   Proxy-Sync-Id                      Generic          [RFCXXXX]
7911	   Accept-Charset                     Generic          [RFC2616]
7912	   Content-Type                       Generic          [RFCXXXX]
7913	   Content-ID               Generic  [RFC2392, RFC2046, and RFC5322]
7914	   Content-Base                       Generic          [RFCXXXX]
7915	   Content-Encoding                   Generic          [RFCXXXX]
7916	   Content-Location                   Generic          [RFCXXXX]
7917	   Content-Length                     Generic          [RFCXXXX]
7918	   Fetch-Timeout                      Generic          [RFCXXXX]
7919	   Cache-Control                      Generic          [RFCXXXX]
7920	   Logging-Tag                        Generic          [RFCXXXX]
7921	   Set-Cookie                         Generic          [RFCXXXX]
7922	   Vendor-Specific                    Generic          [RFCXXXX]
7923	   Jump-Size                          Synthesizer      [RFCXXXX]
7924	   Kill-On-Barge-In                   Synthesizer      [RFCXXXX]
7925	   Speaker-Profile                    Synthesizer      [RFCXXXX]
7926	   Completion-Cause                   Synthesizer      [RFCXXXX]
7927	   Completion-Reason                  Synthesizer      [RFCXXXX]
7928	   Voice-Parameter                    Synthesizer      [RFCXXXX]
7929	   Prosody-Parameter                  Synthesizer      [RFCXXXX]
7930	   Speech-Marker                      Synthesizer      [RFCXXXX]
7931	   Speech-Language                    Synthesizer      [RFCXXXX]
7932	   Fetch-Hint                         Synthesizer      [RFCXXXX]
7933	   Audio-Fetch-Hint                   Synthesizer      [RFCXXXX]
7934	   Failed-URI                         Synthesizer      [RFCXXXX]
7935	   Failed-URI-Cause                   Synthesizer      [RFCXXXX]
7936	   Speak-Restart                      Synthesizer      [RFCXXXX]
7937	   Speak-Length                       Synthesizer      [RFCXXXX]
7938	   Load-Lexicon                       Synthesizer      [RFCXXXX]
7939	   Lexicon-Search-Order               Synthesizer      [RFCXXXX]
7940	   Confidence-Threshold               Recognizer       [RFCXXXX]
7941	   Sensitivity-Level                  Recognizer       [RFCXXXX]
7942	   Speed-Vs-Accuracy                  Recognizer       [RFCXXXX]
7943	   N-Best-List-Length                 Recognizer       [RFCXXXX]
7944	   Input-Type                         Recognizer       [RFCXXXX]
7945	   No-Input-Timeout                   Recognizer       [RFCXXXX]
7946	   Recognition-Timeout                Recognizer       [RFCXXXX]
7947	   Waveform-URI                       Recognizer       [RFCXXXX]
7948	   Input-Waveform-URI                 Recognizer       [RFCXXXX]
7949	   Completion-Cause                   Recognizer       [RFCXXXX]
7950	   Completion-Reason                  Recognizer       [RFCXXXX]
7951	   Recognizer-Context-Block           Recognizer       [RFCXXXX]
7952	   Start-Input-Timers                 Recognizer       [RFCXXXX]
7953	   Speech-Complete-Timeout            Recognizer       [RFCXXXX]
7954	   Speech-Incomplete-Timeout          Recognizer       [RFCXXXX]
7955	   Dtmf-Interdigit-Timeout            Recognizer       [RFCXXXX]
7956	   Dtmf-Term-Timeout                  Recognizer       [RFCXXXX]
7957	   Dtmf-Term-Char                     Recognizer       [RFCXXXX]
7958	   Failed-URI                         Recognizer       [RFCXXXX]
7959	   Failed-URI-Cause                   Recognizer       [RFCXXXX]
7960	   Save-Waveform                      Recognizer       [RFCXXXX]
7961	   Media-Type                         Recognizer       [RFCXXXX]
7962	   New-Audio-Channel                  Recognizer       [RFCXXXX]
7963	   Speech-Language                    Recognizer       [RFCXXXX]
7964	   Ver-Buffer-Utterance               Recognizer       [RFCXXXX]
7965	   Recognition-Mode                   Recognizer       [RFCXXXX]
7966	   Cancel-If-Queue                    Recognizer       [RFCXXXX]
7967	   Hotword-Max-Duration               Recognizer       [RFCXXXX]
7968	   Hotword-Min-Duration               Recognizer       [RFCXXXX]
7969	   Interpret-Text                     Recognizer       [RFCXXXX]
7970	   Dtmf-Buffer-Time                   Recognizer       [RFCXXXX]
7971	   Clear-Dtmf-Buffer                  Recognizer       [RFCXXXX]
7972	   Early-No-Match                     Recognizer       [RFCXXXX]
7973	   Num-Min-Consistent-Pronunciations  Recognizer       [RFCXXXX]
7974	   Consistency-Threshold              Recognizer       [RFCXXXX]
7975	   Clash-Threshold                    Recognizer       [RFCXXXX]
7976	   Personal-Grammar-URI               Recognizer       [RFCXXXX]
7977	   Enroll-Utterance                   Recognizer       [RFCXXXX]
7978	   Phrase-ID                          Recognizer       [RFCXXXX]
7979	   Phrase-NL                          Recognizer       [RFCXXXX]
7980	   Weight                             Recognizer       [RFCXXXX]
7981	   Save-Best-Waveform                 Recognizer       [RFCXXXX]
7982	   New-Phrase-ID                      Recognizer       [RFCXXXX]
7983	   Confusable-Phrases-URI             Recognizer       [RFCXXXX]
7984	   Abort-Phrase-Enrollment            Recognizer       [RFCXXXX]
7985	   Sensitivity-Level                  Recorder         [RFCXXXX]
7986	   No-Input-Timeout                   Recorder         [RFCXXXX]
7987	   Completion-Cause                   Recorder         [RFCXXXX]
7988	   Completion-Reason                  Recorder         [RFCXXXX]
7989	   Failed-URI                         Recorder         [RFCXXXX]
7990	   Failed-URI-Cause                   Recorder         [RFCXXXX]
7991	   Record-URI                         Recorder         [RFCXXXX]
7992	   Media-Type                         Recorder         [RFCXXXX]
7993	   Max-Time                           Recorder         [RFCXXXX]
7994	   Trim-Length                        Recorder         [RFCXXXX]
7995	   Final-Silence                      Recorder         [RFCXXXX]
7996	   Capture-On-Speech                  Recorder         [RFCXXXX]
7997	   Ver-Buffer-Utterance               Recorder         [RFCXXXX]
7998	   Start-Input-Timers                 Recorder         [RFCXXXX]
7999	   New-Audio-Channel                  Recorder         [RFCXXXX]
8000	   Repository-URI                     Verifier         [RFCXXXX]
8001	   Voiceprint-Identifier              Verifier         [RFCXXXX]
8002	   Verification-Mode                  Verifier         [RFCXXXX]
8003	   Adapt-Model                        Verifier         [RFCXXXX]
8004	   Abort-Model                        Verifier         [RFCXXXX]
8005	   Min-Verification-Score             Verifier         [RFCXXXX]
8006	   Num-Min-Verification-Phrases       Verifier         [RFCXXXX]
8007	   Num-Max-Verification-Phrases       Verifier         [RFCXXXX]
8008	   No-Input-Timeout                   Verifier         [RFCXXXX]
8009	   Save-Waveform                      Verifier         [RFCXXXX]
8010	   Media-Type                         Verifier         [RFCXXXX]
8011	   Waveform-URI                       Verifier         [RFCXXXX]
8012	   Voiceprint-Exists                  Verifier         [RFCXXXX]
8013	   Ver-Buffer-Utterance               Verifier         [RFCXXXX]
8014	   Input-Waveform-URI                 Verifier         [RFCXXXX]
8015	   Completion-Cause                   Verifier         [RFCXXXX]
8016	   Completion-Reason                  Verifier         [RFCXXXX]
8017	   Speech-Complete-Timeout            Verifier         [RFCXXXX]
8018	   New-Audio-Channel                  Verifier         [RFCXXXX]
8019	   Abort-Verification                 Verifier         [RFCXXXX]
8020	   Start-Input-Timers                 Verifier         [RFCXXXX]
8021	   Input-Type                         Verifier         [RFCXXXX]

8023	13.1.4.  MRCPv2 status codes

8025	   IANA SHALL create a new name space of "MRCPv2 status codes" with the
8026	   initial values defined in Section 5.4.  All maintenance within
8027	   and additions to the contents of this name space MUST be according to
8028	   the "Specification Required with Expert Review" registration policy.

8030	13.1.5.  Grammar Reference List Parameters

8032	   IANA SHALL create a new name space of "Grammar Reference List
8033	   Parameters".  All maintenance within and additions to the contents of
8034	   this name space MUST be according to the "Specification Required with
8035	   Expert Review" registration policy.  There is only one initial
8036	   parameter as shown below.

8038	   Name                       Reference
8039	   ----                       -------------
8040	   weight                     [RFCXXXX]

8042	13.1.6.  MRCPv2 vendor-specific parameters

8044	   IANA SHALL create a new name space of "MRCPv2 vendor-specific
8045	   parameters".  All maintenance within and additions to the contents of
8046	   this name space MUST be according to the "Hierarchical Allocation"
8047	   registration policy as follows.  Each name (corresponding to the
8048	   "vendor-av-pair-name" ABNF production) MUST satisfy the syntax
8049	   requirements of Internet Domain Names as described in section 2.3.1
8050	   of RFC 1035 [RFC1035] (and as updated or obsoleted by successive
8051	   RFCs), with one exception: the order of the domain names is reversed.
8052	   For example, a vendor-specific parameter "foo" by example.com would
8053	   have the form "com.example.foo".  The first, or top-level domain, is
8054	   restricted to exactly the set of Top-Level Internet Domains defined
8055	   by IANA and will be updated by IANA when and only when that set
8056	   changes.  The second-level and all subdomains within the parameter
8057	   name MUST be allocated according to the "First Come First Served"
8058	   policy.  It is RECOMMENDED that assignment requests adhere to the
8059	   existing allocations of Internet domain names to organizations,
8060	   institutions, corporations, etc.
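
   As an illustration of the reversed-domain naming rule above, the
   following Python sketch builds a vendor-specific parameter name from
   a domain name and a parameter name.  The function name and its
   minimal validation are assumptions made for this example; they are
   not defined by this specification.

```python
def vendor_parameter_name(domain: str, parameter: str) -> str:
    """Return a reversed-domain vendor parameter name.

    Hypothetical helper: "example.com" + "foo" -> "com.example.foo".
    Only the label order is handled here; the full syntax checks of
    RFC 1035, Section 2.3.1 are deliberately left out of this sketch.
    """
    labels = [label for label in domain.strip(".").split(".") if label]
    if len(labels) < 2:
        raise ValueError("expected at least a second-level domain")
    if not parameter:
        raise ValueError("parameter name must not be empty")
    # Reverse the label order so the top-level domain comes first.
    return ".".join(reversed(labels)) + "." + parameter


if __name__ == "__main__":
    print(vendor_parameter_name("example.com", "foo"))
```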

8062	   The registry contains a list of vendor-registered parameters, where
8063	   each defined parameter is associated with a contact person and
8064	   includes an optional reference to the definition of the parameter,
8065	   preferably an RFC.  The registry is initially empty.

8067	13.2.  NLSML-related registrations

8069	13.2.1.  application/nlsml+xml Media Type registration

8071	   IANA is requested to register the following Media Type according to
8072	   the process defined in RFC 4288 [RFC4288].
8073	   To:  ietf-types@iana.org
8074	   Subject:  Registration of media type application/nlsml+xml
8075	   MIME media type name:  application
8076	   MIME subtype name:  nlsml+xml
8077	   Required parameters:  none
8078	   Optional parameters:
8079	      charset:  All of the considerations described in RFC 3023
8080	         [RFC3023] also apply to the application/nlsml+xml media type.
8081	   Encoding considerations:  All of the considerations described in RFC
8082	      3023 also apply to the application/nlsml+xml media type.

8084	   Security considerations:  As with HTML, NLSML documents contain links
8085	      to other data stores (grammars, verifier resources, etc.).  Unlike
8086	      HTML, however, the data stores are not treated as media to be
8087	      rendered.  Nevertheless, linked files may themselves have security
8088	      considerations, which would be those of the individual registered
8089	      types.  Additionally, this media type has all of the security
8090	      considerations described in RFC 3023.
8091	   Interoperability considerations:  Although an NLSML document is
8092	      itself a complete XML document, for a fuller interpretation of the
8093	      content a receiver of an NLSML document may wish to access
8094	      resources linked to by the document.  The inability of an NLSML
8095	      processor to access or process such linked resources could result
8096	      in different behavior by the ultimate consumer of the data.
8097	   Published specification:  RFCXXXX
8098	   Applications which use this media type:  MRCPv2 clients and servers
8099	   Additional information:  none
8100	   Magic number(s):  There is no single initial octet sequence that is
8101	      always present for NLSML files.
8102	   Person & email address to contact for further information:  Sarvi
8103	      Shanmugham, sarvi@cisco.com
8104	   Intended usage:  This media type is expected to be used only in
8105	      conjunction with MRCPv2.

8107	13.3.  NLSML XML Schema registration

8109	   IANA is requested to register and maintain the following XML Schema.
8110	   Information provided follows the template in RFC 3688 [RFC3688].
8111	   XML element type:  schema
8112	   URI:  urn:ietf:params:xml:schema:nlsml
8113	   Registrant Contact:  IESG
8114	   XML:  See Section 16.1.

8116	13.4.  MRCPv2 XML Namespace registration

8118	   IANA is requested to register and maintain the following XML Name
8119	   space.  Information provided follows the template in RFC 3688
8120	   [RFC3688].
8121	   XML element type:  ns
8122	   URI:  urn:ietf:params:xml:ns:mrcpv2
8123	   Registrant Contact:  IESG
8124	   XML:  RFCXXXX

8126	13.5.  text Media Type Registrations

8128	   IANA is requested to register the following text Media Types
8129	   according to the process defined in RFC 4288 [RFC4288].

8131	13.5.1.  text/grammar-ref-list

8133	   To:  ietf-types@iana.org
8134	   Subject:  Registration of media type text/grammar-ref-list
8135	   MIME media type name:  text
8136	   MIME subtype name:  grammar-ref-list
8137	   Required parameters:  none
8138	   Optional parameters:  none
8139	   Encoding considerations:  Depending on the transfer protocol, a
8140	      transfer encoding may be necessary to deal with very long lines.
8141	   Security considerations:  This media type contains URIs which may
8142	      represent references to external resources.  As these resources
8143	      are assumed to be speech recognition grammars, similar
8144	      considerations as for the media types "application/srgs" and
8145	      "application/srgs+xml" apply.
8146	   Interoperability considerations:  '>' must be percent encoded in URIs
8147	      according to RFC 3986 [RFC3986].
8148	   Published specification:  The RECOGNIZE method of the MRCP protocol
8149	      performs a recognition operation that matches input against a set
8150	      of grammars.  When matching against more than one grammar, it is
8151	      sometimes necessary to use different weights for the individual
8152	      grammars.  These weights are not a property of the grammar
8153	      resource itself but qualify the reference to that grammar for the
8154	      particular recognition operation initiated by the RECOGNIZE
8155	      method.  The format of the proposed text/grammar-ref-list media
8156	      type is as follows: body = *reference where reference = "<" uri
8157	      ">" [parameters] CRLF parameters = ";" parameter *(";" parameter)
8158	      and parameter = attribute "=" value.  This specification currently
8159	      only defines a 'weight' parameter, but new parameters MAY be added
8160	      through the "Grammar Reference List Parameters" IANA registry
8161	      established through this specification.  Example:
8162	      <http://example.com/grammars/field1.gram>
8163	      <http://example.com/grammars/field2.gram>;weight="0.85"
8164	      <session:field3@form-level.store>;weight="0.9"
8165	      <http://example.com/grammars/universals.gram>;weight="0.75"
8166	   Applications which use this media type:  MRCPv2 clients and servers
8167	   Additional information:  none
8168	   Magic number(s):  none
8169	   Person & email address to contact for further information:  Sarvi
8170	      Shanmugham, sarvi@cisco.com
8171	   Intended usage:  This media type is expected to be used only in
8172	      conjunction with MRCPv2.
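
   The text/grammar-ref-list body format described in the registration
   above can be read with a few lines of code.  The Python sketch below
   is an illustration only: the function name is hypothetical, and
   stripping the double quotes around parameter values is an assumption
   based on the example entries rather than on a stated rule.

```python
from __future__ import annotations


def parse_grammar_ref_list(body: str) -> list[tuple[str, dict[str, str]]]:
    """Parse a text/grammar-ref-list body into (uri, parameters) pairs."""
    references = []
    for raw_line in body.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if not line.startswith("<") or ">" not in line:
            raise ValueError(f"malformed reference: {line!r}")
        # '>' is percent-encoded inside URIs, so the first '>' ends the URI.
        uri, _, rest = line[1:].partition(">")
        parameters: dict[str, str] = {}
        for item in rest.split(";"):
            item = item.strip()
            if not item:
                continue
            name, separator, value = item.partition("=")
            if not separator:
                raise ValueError(f"malformed parameter: {item!r}")
            # Assumption: surrounding double quotes are not part of the value.
            parameters[name.strip()] = value.strip().strip('"')
        references.append((uri, parameters))
    return references


if __name__ == "__main__":
    sample = (
        "<http://example.com/grammars/field1.gram>\r\n"
        '<http://example.com/grammars/field2.gram>;weight="0.85"\r\n'
    )
    for uri, parameters in parse_grammar_ref_list(sample):
        print(uri, parameters)
```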

8174	13.6.  session URI scheme registration

8176	   IANA is requested to register the following new URI scheme.  The
8177	   information below follows the template given in RFC 4395 [RFC4395].

8179	   URI scheme name:  "session"
8180	   Status:  "Permanent"
8181	   URI scheme syntax:  The syntax of this scheme is identical to that
8182	      defined for the "cid" scheme in section 2 of RFC 2392 [RFC2392].
8183	   URI scheme semantics:  The URI is intended to identify a data
8184	      resource previously given to the network computing resource.  The
8185	      purpose of this scheme is to permit access to the specific
8186	      resource for the lifetime of the session with the entity storing
8187	      the resource.  The media type of the resource can vary.  There is
8188	      no explicit mechanism for communication of the media type.  This
8189	      scheme is currently widely used internally by existing
8190	      implementations, and the registration is intended to provide
8191	      information in the rare (and unfortunate) case that the scheme is
8192	      used elsewhere.  The scheme SHOULD NOT be used for open internet
8193	      protocols.
8194	   Encoding considerations:  There are no other encoding considerations
8195	      for the 'session' URIs not described in RFC 3986 [RFC3986].
8196	   Applications/protocols that use this URI scheme name:  This scheme
8197	      name is used by MRCPv2 clients and servers.
8198	   Interoperability considerations:  Note that none of the resources are
8199	      accessible after the MRCPv2 session ends, hence the name of the
8200	      scheme.  For clients who establish one MRCPv2 session only for the
8201	      entire speech application being implemented this is sufficient,
8202	      but clients who create, terminate, and recreate MRCP sessions for
8203	      performance or scalability reasons will lose access to resources
8204	      established in the earlier session(s).
8205	   Security considerations:  Generic security considerations for URIs
8206	      described in RFC 3986 [RFC3986] apply to this scheme as well.  The
8207	      URIs defined here provide an identification mechanism only.  Given
8208	      that the communication channel between client and server is
8209	      secure, that the server correctly accesses the resource associated
8210	      with the URI, and that the server ensures session-only lifetime
8211	      and access for each URI, the only additional security issues are
8212	      those of the types of media referred to by the URI.
8213	   Contact:  Sarvi Shanmugham, sarvi@cisco.com
8214	   Author/Change controller:  IESG, iesg@ietf.org
8215	   References:  This specification, particularly Sections 6.2.7, 8.5.2,
8216	      9.5.1, and 9.9.

8218	13.7.  SDP parameter registrations

8220	   IANA is requested to register the following SDP parameter values.
8221	   The information for each follows the template given in RFC 4566
8222	   [RFC4566], Appendix B.

8224	13.7.1.  sub-registry "proto"

8226	   "TCP/MRCPv2" value of the "proto" parameter
8227	   Contact name, email address and telephone number:  Sarvi Shanmugham,
8228	      sarvi@cisco.com, +1.408.902.3875
8229	   Name being registered (as it will appear in SDP):  TCP/MRCPv2
8230	   Long-form name in English:  MRCPv2 over TCP
8231	   Type of name:  proto
8232	   Explanation of name:  This name represents the MRCPv2 protocol
8233	      carried over TCP.
8234	   Reference to specification of name:  RFCXXXX

8236	   "TCP/TLS/MRCPv2" value of the "proto" parameter
8237	   Contact name, email address and telephone number:  Sarvi Shanmugham,
8238	      sarvi@cisco.com, +1.408.902.3875
8239	   Name being registered (as it will appear in SDP):  TCP/TLS/MRCPv2
8240	   Long-form name in English:  MRCPv2 over TLS over TCP
8241	   Type of name:  proto
8242	   Explanation of name:  This name represents the MRCPv2 protocol
8243	      carried over TLS over TCP.
8244	   Reference to specification of name:  RFCXXXX

8246	13.7.2.  sub-registry "att-field (media-level)"

8248	   "resource" value of the "att-field" parameter
8249	   Contact name, email address and telephone number:  Sarvi Shanmugham,
8250	      sarvi@cisco.com, +1.408.902.3875
8251	   Attribute name (as it will appear in SDP):  resource
8252	   Long-form attribute name in English:  MRCPv2 resource type
8253	   Type of attribute:  media-level
8254	   Subject to charset attribute?  no
8255	   Explanation of attribute:  See Section 4.2 of RFCXXXX for description
8256	      and examples.
8257	   Specification of appropriate attribute values:  See
8258	      Section 13.1.1 of RFCXXXX.

8260	   "channel" value of the "att-field" parameter
8261	   Contact name, email address and telephone number:  Sarvi Shanmugham,
8262	      sarvi@cisco.com, +1.408.902.3875
8263	   Attribute name (as it will appear in SDP):  channel
8264	   Long-form attribute name in English:  MRCPv2 resource channel
8265	      identifier
8266	   Type of attribute:  media-level
8267	   Subject to charset attribute?  no
8268	   Explanation of attribute:  See Section 4.2 of RFCXXXX for description
8269	      and examples.
8270	   Specification of appropriate attribute values:  See Section 4.2 and
8271	      the "channel-id" ABNF production rules of RFCXXXX.

8273	   "cmid" value of the "att-field" parameter
8274	   Contact name, email address and telephone number:  Sarvi Shanmugham,
8275	      sarvi@cisco.com, +1.408.902.3875
8276	   Attribute name (as it will appear in SDP):  cmid
8277	   Long-form attribute name in English:  MRCPv2 resource channel media
8278	      identifier
8279	   Type of attribute:  media-level
8280	   Subject to charset attribute?  no
8281	   Explanation of attribute:  See Section 4.4 of RFCXXXX for description
8282	      and examples.
8283	   Specification of appropriate attribute values:  See Section 4.4 and
8284	      the "cmid-attribute" ABNF production rules of RFCXXXX.

8286	14.  Examples

8288	14.1.  Message Flow

8290	   The following is an example of a typical MRCPv2 session of speech
8291	   synthesis and recognition between a client and a server.  Although
8292	   the SDP "s" attribute in these examples has a text description value
8293	   to assist in understanding the examples, please keep in mind that RFC
8294	   3264 [RFC3264] recommends that messages actually put on the wire use
8295	   a space or a dash.

8297	   The figure below illustrates opening a session to the MRCPv2 server.
8298	   This exchange does not allocate a resource or set up media.  It simply
8299	   establishes a SIP session with the MRCPv2 server.

8301	   C->S:
8302	          INVITE sip:mresources@example.com SIP/2.0
8303	          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
8304	           branch=z9hG4bK74bg1
8305	          Max-Forwards:6
8306	          To:MediaServer <sip:mresources@example.com>
8307	          From:sarvi <sip:sarvi@example.com>;tag=1928301774
8308	          Call-ID:a84b4c76e66710
8309	          CSeq:323123 INVITE
8310	          Contact:<sip:sarvi@client.example.com>
8311	          Content-Type:application/sdp
8312	          Content-Length:...

8314	          v=0
8315	          o=sarvi 2614933546 2614933546 IN IP4 192.0.2.12
8316	          s=Set up MRCPv2 control and audio
8317	          i=Initial contact
8318	          c=IN IP4 192.0.2.12

8320	   S->C:
8321	          SIP/2.0 200 OK
8322	          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
8323	           branch=z9hG4bK74bg1;received=192.0.32.10
8324	          To:MediaServer <sip:mresources@example.com>;tag=62784
8325	          From:sarvi <sip:sarvi@example.com>;tag=1928301774
8326	          Call-ID:a84b4c76e66710
8327	          CSeq:323123 INVITE
8328	          Contact:<sip:mresources@server.example.com>
8329	          Content-Type:application/sdp
8330	          Content-Length:...

8332	          v=0
8333	          o=- 3000000001 3000000001 IN IP4 192.0.2.11
8334	          s=Set up MRCPv2 control and audio
8335	          i=Initial contact
8336	          c=IN IP4 192.0.2.11

8338	   C->S:
8339	          ACK sip:mresources@server.example.com SIP/2.0
8340	          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
8341	           branch=z9hG4bK74bg2
8342	          Max-Forwards:6
8343	          To:MediaServer <sip:mresources@example.com>;tag=62784
8344	          From:Sarvi <sip:sarvi@example.com>;tag=1928301774
8345	          Call-ID:a84b4c76e66710
8346	          CSeq:323123 ACK
8347	          Content-Length:0

8349	   The client requests the server to create a synthesizer resource
8350	   control channel to do speech synthesis.  This also adds a media
8351	   stream to send the generated speech.  Note that in this example, the
8352	   client requests a new MRCPv2 TCP stream between the client and the
8353	   server.  In the following requests, the client will ask to use the
8354	   existing connection.

8356	   C->S:
8357	          INVITE sip:mresources@server.example.com SIP/2.0
8358	          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
8359	           branch=z9hG4bK74bg3
8360	          Max-Forwards:6
8361	          To:MediaServer <sip:mresources@example.com>;tag=62784
8362	          From:sarvi <sip:sarvi@example.com>;tag=1928301774
8363	          Call-ID:a84b4c76e66710
8364	          CSeq:323124 INVITE
8365	          Contact:<sip:sarvi@client.example.com>
8366	          Content-Type:application/sdp
8367	          Content-Length:...

8369	          v=0
8370	          o=sarvi 2614933546 2614933547 IN IP4 192.0.2.12
8371	          s=Set up MRCPv2 control and audio
8372	          i=Add TCP channel, synthesizer and one-way audio
8373	          c=IN IP4 192.0.2.12
8374	          t=0 0
8375	          m=application 9  TCP/MRCPv2 1
8376	          a=setup:active
8377	          a=connection:new
8378	          a=resource:speechsynth
8379	          a=cmid:1
8380	          m=audio 49170 RTP/AVP 0 96
8381	          a=rtpmap:0 pcmu/8000
8382	          a=rtpmap:96 telephone-event/8000
8383	          a=fmtp:96 0-15
8384	          a=recvonly
8385	          a=mid:1

8387	   S->C:
8388	          SIP/2.0 200 OK
8389	          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
8390	           branch=z9hG4bK74bg3;received=192.0.32.10
8391	          To:MediaServer <sip:mresources@example.com>;tag=62784
8392	          From:sarvi <sip:sarvi@example.com>;tag=1928301774
8393	          Call-ID:a84b4c76e66710
8394	          CSeq:323124 INVITE
8395	          Contact:<sip:mresources@server.example.com>
8396	          Content-Type:application/sdp
8397	          Content-Length:...

8399	          v=0
8400	          o=- 3000000001 3000000002 IN IP4 192.0.2.11
8401	          s=Set up MRCPv2 control and audio
8402	          i=Add TCP channel, synthesizer and one-way audio
8403	          c=IN IP4 192.0.2.11
8404	          t=0 0
8405	          m=application 32416  TCP/MRCPv2 1
8406	          a=setup:passive
8407	          a=connection:new
8408	          a=channel:32AECB23433801@speechsynth
8409	          a=cmid:1
8410	          m=audio 48260 RTP/AVP 0
8411	          a=rtpmap:0 pcmu/8000
8412	          a=sendonly
8413	          a=mid:1

8415	   C->S:
8416	          ACK sip:mresources@server.example.com SIP/2.0
8417	          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
8418	           branch=z9hG4bK74bg4
8419	          Max-Forwards:6
8420	          To:MediaServer <sip:mresources@example.com>;tag=62784
8421	          From:Sarvi <sip:sarvi@example.com>;tag=1928301774
8422	          Call-ID:a84b4c76e66710
8423	          CSeq:323124 ACK
8424	          Content-Length:0
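
   A client needs the port, channel identifier, and "cmid" value from
   the server's SDP answer before it can open the MRCPv2 control
   connection.  The Python sketch below shows one way to pull those
   values out of an answer like the one above.  It is a simplified
   illustration, not a complete SDP parser; the function name and the
   returned dictionary layout are assumptions made for this example.

```python
from __future__ import annotations

MRCP_PROTOS = ("TCP/MRCPv2", "TCP/TLS/MRCPv2")
MRCP_ATTRIBUTES = ("channel", "cmid", "resource", "setup", "connection")


def mrcp_control_channels(sdp: str) -> list[dict[str, str]]:
    """Collect MRCPv2 control-channel details from an SDP body."""
    channels: list[dict[str, str]] = []
    current = None
    for raw_line in sdp.splitlines():
        line = raw_line.strip()
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...
            fields = line[2:].split()
            if (len(fields) >= 3 and fields[0] == "application"
                    and fields[2] in MRCP_PROTOS):
                current = {"port": fields[1], "proto": fields[2]}
                channels.append(current)
            else:
                current = None  # e.g. an audio stream; skip its attributes
        elif line.startswith("a=") and current is not None:
            name, _, value = line[2:].partition(":")
            if name in MRCP_ATTRIBUTES:
                current[name] = value
    return channels


if __name__ == "__main__":
    answer = "\r\n".join([
        "v=0",
        "m=application 32416  TCP/MRCPv2 1",
        "a=setup:passive",
        "a=connection:new",
        "a=channel:32AECB23433801@speechsynth",
        "a=cmid:1",
        "m=audio 48260 RTP/AVP 0",
        "a=sendonly",
        "a=mid:1",
    ])
    for channel in mrcp_control_channels(answer):
        print(channel)
```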

8426	   This exchange allocates an additional resource control channel for a
8427	   recognizer.  Since a recognizer would need to receive an audio stream
8428	   for recognition, this interaction also updates the audio stream to
8429	   sendrecv, making it a 2-way audio stream.

8431	   C->S:
8432	          INVITE sip:mresources@server.example.com SIP/2.0
8433	          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
8434	           branch=z9hG4bK74bg5
8435	          Max-Forwards:6
8436	          To:MediaServer <sip:mresources@example.com>;tag=62784
8437	          From:sarvi <sip:sarvi@example.com>;tag=1928301774
8438	          Call-ID:a84b4c76e66710
8439	          CSeq:323125 INVITE
8440	          Contact:<sip:sarvi@client.example.com>
8441	          Content-Type:application/sdp
8442	          Content-Length:...

8444	          v=0
8445	          o=sarvi 2614933546 2614933548 IN IP4 192.0.2.12
8446	          s=Set up MRCPv2 control and audio
8447	          i=Add recognizer and duplex the audio
8448	          c=IN IP4 192.0.2.12
8449	          t=0 0
8450	          m=application 9  TCP/MRCPv2 1
8451	          a=setup:active
8452	          a=connection:existing
8453	          a=resource:speechsynth
8454	          a=cmid:1
8455	          m=audio 49170 RTP/AVP 0 96
8456	          a=rtpmap:0 pcmu/8000
8457	          a=rtpmap:96 telephone-event/8000
8458	          a=fmtp:96 0-15
8459	          a=recvonly
8460	          a=mid:1
8461	          m=application 9  TCP/MRCPv2 1
8462	          a=setup:active
8463	          a=connection:existing
8464	          a=resource:speechrecog
8465	          a=cmid:2
8466	          m=audio 49180 RTP/AVP 0 96
8467	          a=rtpmap:0 pcmu/8000
8468	          a=rtpmap:96 telephone-event/8000
8469	          a=fmtp:96 0-15
8470	          a=sendonly
8471	          a=mid:2

8473	   S->C:
8474	          SIP/2.0 200 OK
8475	          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
8476	           branch=z9hG4bK74bg5;received=192.0.32.10
8477	          To:MediaServer <sip:mresources@example.com>;tag=62784
8478	          From:sarvi <sip:sarvi@example.com>;tag=1928301774
8479	          Call-ID:a84b4c76e66710
8480	          CSeq:323125 INVITE
8481	          Contact:<sip:mresources@server.example.com>
8482	          Content-Type:application/sdp
8483	          Content-Length:...

8485	          v=0
8486	          o=- 3000000001 3000000003 IN IP4 192.0.2.11
8487	          s=Set up MRCPv2 control and audio
8488	          i=Add recognizer and duplex the audio
8489	          c=IN IP4 192.0.2.11
8490	          t=0 0
8491	          m=application 32416  TCP/MRCPv2 1
8492	          a=channel:32AECB23433801@speechsynth
8493	          a=cmid:1
8494	          m=audio 48260 RTP/AVP 0
8495	          a=rtpmap:0 pcmu/8000
8496	          a=sendonly
8497	          a=mid:1
8498	          m=application 32416  TCP/MRCPv2 1
8499	          a=channel:32AECB23433801@speechrecog
8500	          a=cmid:2
8501	          m=audio 48260 RTP/AVP 0
8502	          a=rtpmap:0 pcmu/8000
8503	          a=rtpmap:96 telephone-event/8000
8504	          a=fmtp:96 0-15
8505	          a=recvonly
8506	          a=mid:2

8508	   C->S:
8509	          ACK sip:mresources@server.example.com SIP/2.0
8510	          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
8511	           branch=z9hG4bK74bg6
8512	          Max-Forwards:6
8513	          To:MediaServer <sip:mresources@example.com>;tag=62784
8514	          From:Sarvi <sip:sarvi@example.com>;tag=1928301774
8515	          Call-ID:a84b4c76e66710
8516	          CSeq:323125 ACK
8517	          Content-Length:0

8519	   An MRCPv2 SPEAK request initiates speech.

8521	   C->S:
8522	          MRCP/2.0 ... SPEAK 543257
8523	          Channel-Identifier:32AECB23433801@speechsynth
8524	          Kill-On-Barge-In:false
8525	          Voice-gender:neutral
8526	          Voice-age:25
8527	          Prosody-volume:medium
8528	          Content-Type:application/ssml+xml
8529	          Content-Length:...

8531	          <?xml version="1.0"?>
8532	          <speak version="1.0"
8533	                 xmlns="http://www.w3.org/2001/10/synthesis"
8534	                 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
8535	                 xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
8536	                 http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
8537	                 xml:lang="en-US">
8538	            <p>
8539	              <s>You have 4 new messages.</s>
8540	              <s>The first is from Stephanie Williams
8541	                <mark name="Stephanie"/>
8542	                and arrived at <break/>
8543	                <say-as interpret-as="vxml:time">0345p</say-as>.</s>
8544	              <s>The subject is <prosody
8545	                 rate="-20%">ski trip</prosody></s>
8546	            </p>
8547	          </speak>

8549	   S->C:
8550	          MRCP/2.0 ... 543257 200 IN-PROGRESS
8551	          Channel-Identifier:32AECB23433801@speechsynth
8552	          Speech-Marker:timestamp=857205015059
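
   The "..." in the start-lines of these examples stands for the
   message-length field, which counts every octet of the message,
   including the start-line that carries it.  Because the field counts
   its own digits, a sender has to settle on a value that is consistent
   with itself.  The Python sketch below shows one way to do that; the
   function name and the decision to add Content-Length automatically
   are assumptions made for this illustration.

```python
from __future__ import annotations


def build_request(method: str, request_id: int,
                  headers: dict[str, str], body: bytes = b"") -> bytes:
    """Serialize an MRCPv2 request with a self-consistent message-length."""
    header_text = "".join(f"{name}:{value}\r\n" for name, value in headers.items())
    if body:
        # Assumption of this sketch: the caller does not pass Content-Length.
        header_text += f"Content-Length:{len(body)}\r\n"
    header_block = (header_text + "\r\n").encode("utf-8")

    length = 0
    while True:
        start_line = f"MRCP/2.0 {length} {method} {request_id}\r\n".encode("ascii")
        total = len(start_line) + len(header_block) + len(body)
        if total == length:
            return start_line + header_block + body
        # The digit count of the length field changed the total; try again.
        length = total


if __name__ == "__main__":
    message = build_request(
        "SPEAK", 543257,
        {"Channel-Identifier": "32AECB23433801@speechsynth",
         "Content-Type": "application/ssml+xml"},
        b"<speak/>",
    )
    print(message.decode("utf-8"))
```

   The loop terminates because the total can only grow when the number
   of digits in the length field grows, which happens at most a few
   times for any realistic message size.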

8554	   The synthesizer hits the special marker in the message to be spoken
8555	   and faithfully informs the client of the event.

8557	   S->C:  MRCP/2.0 ... SPEECH-MARKER 543257 IN-PROGRESS
8558	          Channel-Identifier:32AECB23433801@speechsynth
8559	          Speech-Marker:timestamp=857206027059;Stephanie

8561	   The synthesizer finishes with the SPEAK request.

8563	   S->C:  MRCP/2.0 ... SPEAK-COMPLETE 543257 COMPLETE
8564	          Channel-Identifier:32AECB23433801@speechsynth
8565	          Speech-Marker:timestamp=857207685213;Stephanie

8567	   The recognizer is issued a request to listen for the customer
8568	   choices.

8570	   C->S:  MRCP/2.0 ... RECOGNIZE 543258
8571	          Channel-Identifier:32AECB23433801@speechrecog
8572	          Content-Type:application/srgs+xml
8573	          Content-Length:...

8575	          <?xml version="1.0"?>
8576	          <!-- the default grammar language is US English -->
8577	          <grammar xmlns="http://www.w3.org/2001/06/grammar"
8578	                   xml:lang="en-US" version="1.0" root="request">
8579	          <!-- single language attachment to a rule expansion -->
8580	            <rule id="request">
8581	              Can I speak to
8582	              <one-of xml:lang="fr-CA">
8583	                <item>Michel Tremblay</item>
8584	                <item>Andre Roy</item>
8585	              </one-of>
8586	            </rule>
8587	          </grammar>

8589	   S->C:  MRCP/2.0 ... 543258 200 IN-PROGRESS
8590	          Channel-Identifier:32AECB23433801@speechrecog

8592	   The client issues the next MRCPv2 SPEAK method.

8594	   C->S:  MRCP/2.0 ... SPEAK 543259
8595	          Channel-Identifier:32AECB23433801@speechsynth
8596	          Kill-On-Barge-In:true
8597	          Content-Type:application/ssml+xml
8598	          Content-Length:...

8600	          <?xml version="1.0"?>
8601	          <speak version="1.0"
8602	                 xmlns="http://www.w3.org/2001/10/synthesis"
8603	                 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
8604	                 xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
8605	                 http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
8606	                 xml:lang="en-US">
8607	            <p>
8608	              <s>Welcome to ABC corporation.</s>
8609	              <s>Who would you like to talk to?</s>
8610	            </p>
8611	          </speak>

8613	   S->C:  MRCP/2.0 ... 543259 200 IN-PROGRESS
8614	          Channel-Identifier:32AECB23433801@speechsynth
8615	          Speech-Marker:timestamp=857207696314

8617	   This next part of the ongoing example demonstrates how kill-on-
8618	   barge-in support works.  Since the last SPEAK request had Kill-On-
8619	   Barge-In set to "true", when the recognizer (the server) generated
8620	   the START-OF-INPUT event while the SPEAK was active, the client
8621	   immediately issued a BARGE-IN-OCCURRED method to the synthesizer
8622	   resource.  The speech synthesizer then terminated playback and
8623	   notified the client.  The completion-cause code indicated that
8624	   this was a kill-on-barge-in interruption rather than a normal
8625	   completion.

8627	   Note that since the recognition and synthesizer resources are in the
8628	   same session on the same server, to obtain a faster response the
8629	   server might have internally relayed the start-of-input condition to
8630	   the synthesizer directly, before receiving the expected BARGE-IN-
8631	   OCCURRED method.  However, any such communication is outside the
8632	   scope of the MRCPv2 protocol.

8634	   S->C:  MRCP/2.0 ... START-OF-INPUT 543258 IN-PROGRESS
8635	          Channel-Identifier:32AECB23433801@speechrecog
8636	          Proxy-Sync-Id:987654321

8638	   C->S:  MRCP/2.0 ... BARGE-IN-OCCURRED 543259
8639	          Channel-Identifier:32AECB23433801@speechsynth
8640	          Proxy-Sync-Id:987654321

8642	   S->C:  MRCP/2.0 ... 543259 200 COMPLETE
8643	          Channel-Identifier:32AECB23433801@speechsynth
8644	          Active-Request-Id-List:543258
8645	          Speech-Marker:timestamp=857206096314

8647	   S->C:  MRCP/2.0 ... SPEAK-COMPLETE 543259 COMPLETE
8648	          Channel-Identifier:32AECB23433801@speechsynth
8649	          Completion-Cause:001 barge-in
8650	          Speech-Marker:timestamp=857207685213
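
   The client-side decision in the barge-in exchange above can be
   sketched in Python as follows.  This is an illustration under stated
   assumptions: the function name, the dictionary used to track the
   active SPEAK request, and the caller-supplied request-id are all
   hypothetical, and the message-length is left as "..." exactly as in
   the examples.

```python
from __future__ import annotations


def barge_in_request(event_headers: dict[str, str],
                     active_speak: dict | None,
                     request_id: int) -> str | None:
    """Return a BARGE-IN-OCCURRED request for a START-OF-INPUT event.

    Returns None when no SPEAK is active or when the active SPEAK was
    sent with Kill-On-Barge-In set to false.
    """
    if active_speak is None or not active_speak.get("kill_on_barge_in", False):
        return None
    lines = [
        # "..." is a placeholder for the message-length field.
        f"MRCP/2.0 ... BARGE-IN-OCCURRED {request_id}",
        f"Channel-Identifier:{active_speak['channel']}",
    ]
    # Echo the recognizer's Proxy-Sync-Id so the server can correlate
    # this request with the START-OF-INPUT event.
    sync_id = event_headers.get("Proxy-Sync-Id")
    if sync_id is not None:
        lines.append(f"Proxy-Sync-Id:{sync_id}")
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    speak = {"channel": "32AECB23433801@speechsynth", "kill_on_barge_in": True}
    print(barge_in_request({"Proxy-Sync-Id": "987654321"}, speak, 543260))
```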

8652	   The recognition resource matched the spoken stream to a grammar and
8653	   generated results.  The result of the recognition is returned by the
8654	   server as part of the RECOGNITION-COMPLETE event.

8656	   S->C:  MRCP/2.0 ... RECOGNITION-COMPLETE 543258 COMPLETE
8657	          Channel-Identifier:32AECB23433801@speechrecog
8658	          Completion-Cause:000 success
8659	          Waveform-URI:<http://web.media.com/session123/audio.wav>;
8660	                       size=423523;duration=25432
8661	          Content-Type:application/nlsml+xml
8662	          Content-Length:...

8664	          <?xml version="1.0"?>
8665	          <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
8666	                  xmlns:ex="http://www.example.com/example"
8667	                  grammar="session:request1@form-level.store">
8668	              <interpretation>
8669	                  <instance name="Person">
8670	                      <ex:Person>
8671	                          <ex:Name> Andre Roy </ex:Name>
8672	                      </ex:Person>
8673	                  </instance>
8674	                  <input>   may I speak to Andre Roy </input>
8675	              </interpretation>
8676	          </result>

8678	   Since the client was now finished with the session, including all
8679	   resources, it issued a SIP BYE request to close the SIP session.
8680	   This caused all control channels and resources allocated under the
8681	   session to be de-allocated.

8683	   C->S:  BYE sip:mresources@server.example.com SIP/2.0
8684	          Via:SIP/2.0/TCP client.atlanta.example.com:5060;
8685	           branch=z9hG4bK74bg7
8686	          Max-Forwards:6
8687	          From:Sarvi <sip:sarvi@example.com>;tag=1928301774
8688	          To:MediaServer <sip:mresources@example.com>;tag=62784
8689	          Call-ID:a84b4c76e66710
8690	          CSeq:323126 BYE
8691	          Content-Length:0

8693	14.2.  Recognition Result Examples

8695	14.2.1.  Simple ASR Ambiguity

8697	   System: To which city will you be traveling?
8698	   User:   I want to go to Pittsburgh.

8700	   <?xml version="1.0"?>
8701	   <result xmlns="urn:ietf:params:xml:ns:mrcpv2"
8702	           xmlns:ex="http://www.example.com/example"
8703	           grammar="http://www.example.com/flight">
8704	     <interpretation confidence="0.6">
8705	        <instance>
8706	           <ex:airline>
8707	              <ex:to_city>Pittsburgh</ex:to_city>
8708	           </ex:airline>
8709	        </instance>
8710	        <input mode="speech">
8711	           I want to go to Pittsburgh
8712	        </input>
8713	     </interpretation>
8714	     <interpretation confidence="0.4">
8715	        <instance>
8716	           <ex:airline>
8717	              <ex:to_city>Stockholm</ex:to_city>
8718	           </ex:airline>
8719	        </instance>
8720	        <input>I want to go to Stockholm</input>
8721	     </interpretation>
8722	   </result>
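
   A client receiving a result with several interpretations typically
   ranks them by confidence.  The following Python fragment is a
   minimal sketch of that step, using a condensed, well-formed copy of
   the result for this dialog.  The function name ranked_cities and the
   choice to treat a missing confidence attribute as 0.0 are
   assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Prefixes used for lookups; the prefix names are local to this sketch.
NS = {"nl": "urn:ietf:params:xml:ns:mrcpv2",
      "ex": "http://www.example.com/example"}

NLSML = """<?xml version="1.0"?>
<result xmlns="urn:ietf:params:xml:ns:mrcpv2"
        xmlns:ex="http://www.example.com/example"
        grammar="http://www.example.com/flight">
  <interpretation confidence="0.6">
    <instance>
      <ex:airline><ex:to_city>Pittsburgh</ex:to_city></ex:airline>
    </instance>
    <input mode="speech">I want to go to Pittsburgh</input>
  </interpretation>
  <interpretation confidence="0.4">
    <instance>
      <ex:airline><ex:to_city>Stockholm</ex:to_city></ex:airline>
    </instance>
    <input>I want to go to Stockholm</input>
  </interpretation>
</result>"""


def ranked_cities(nlsml_text):
    """Return (confidence, city) pairs, highest confidence first."""
    root = ET.fromstring(nlsml_text)
    ranked = []
    for interp in root.findall("nl:interpretation", NS):
        # Assumption: an interpretation without a confidence sorts last.
        confidence = float(interp.get("confidence", "0.0"))
        city = interp.findtext("nl:instance/ex:airline/ex:to_city",
                               default="", namespaces=NS)
        ranked.append((confidence, city.strip()))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked


best_confidence, best_city = ranked_cities(NLSML)[0]
```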

8724	14.2.2.  Mixed Initiative

8726	   System: What would you like?
8727	   User:   I would like 2 pizzas, one with pepperoni and cheese,
8728	           one with sausage and a bottle of coke, to go.

8730	   This example includes an order object which in turn contains objects
8731	   named "food_item", "drink_item" and "delivery_method".  The
8732	   representation assumes there are no ambiguities in the speech or
8733	   natural language processing.  Note that this representation also
8734	   assumes some level of intra-sentential anaphora resolution, i.e.,
8735	   resolving both occurrences of "one" as "pizza".

8737	   <?xml version="1.0"?>
8738	   <nl:result xmlns:nl="urn:ietf:params:xml:ns:mrcpv2"
8739	              xmlns="http://www.example.com/example"
8740	              grammar="http://www.example.com/foodorder">
8741	     <nl:interpretation confidence="1.0" >
8742	        <nl:instance>
8743	         <order>
8744	           <food_item confidence="1.0">
8745	             <pizza>
8746	               <ingredients confidence="1.0">
8747	                 pepperoni
8748	               </ingredients>
8749	               <ingredients confidence="1.0">
8750	                 cheese
8751	               </ingredients>
8752	             </pizza>
8753	             <pizza>
8754	               <ingredients>sausage</ingredients>
8755	             </pizza>
8756	           </food_item>
8757	           <drink_item confidence="1.0">
8758	             <size>2-liter</size>
8759	           </drink_item>
8760	           <delivery_method>to go</delivery_method>
8761	         </order>
8762	       </nl:instance>
8763	       <nl:input mode="speech">I would like 2 pizzas,
8764	            one with pepperoni and cheese, one with sausage
8765	            and a bottle of coke, to go.
8766	       </nl:input>
8767	     </nl:interpretation>
8768	   </nl:result>
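
   In this result the NLSML elements carry the "nl" prefix while the
   application elements use the default namespace, so a consumer has to
   match on namespace names rather than on prefixes.  The following
   Python fragment is a minimal sketch of reading the order under that
   assumption.  The function name summarize_order and the prefixes "m"
   and "app" are hypothetical, and the input element is omitted from
   the embedded copy for brevity.

```python
import xml.etree.ElementTree as ET

# Lookup prefixes are local to this sketch; only the namespace names
# have to match those used in the result document.
NS = {"m": "urn:ietf:params:xml:ns:mrcpv2",
      "app": "http://www.example.com/example"}

ORDER_RESULT = """<?xml version="1.0"?>
<nl:result xmlns:nl="urn:ietf:params:xml:ns:mrcpv2"
           xmlns="http://www.example.com/example"
           grammar="http://www.example.com/foodorder">
  <nl:interpretation confidence="1.0">
    <nl:instance>
      <order>
        <food_item confidence="1.0">
          <pizza>
            <ingredients confidence="1.0">pepperoni</ingredients>
            <ingredients confidence="1.0">cheese</ingredients>
          </pizza>
          <pizza>
            <ingredients>sausage</ingredients>
          </pizza>
        </food_item>
        <drink_item confidence="1.0"><size>2-liter</size></drink_item>
        <delivery_method>to go</delivery_method>
      </order>
    </nl:instance>
  </nl:interpretation>
</nl:result>"""


def summarize_order(nlsml_text):
    """Collect pizzas, drink size, and delivery method from the order."""
    root = ET.fromstring(nlsml_text)
    order = root.find("m:interpretation/m:instance/app:order", NS)
    pizzas = [
        [item.text.strip() for item in pizza.findall("app:ingredients", NS)]
        for pizza in order.findall("app:food_item/app:pizza", NS)
    ]
    return {
        "pizzas": pizzas,
        "drink_size": order.findtext("app:drink_item/app:size",
                                     namespaces=NS),
        "delivery": order.findtext("app:delivery_method", namespaces=NS),
    }


summary = summarize_order(ORDER_RESULT)
```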

8770	14.2.3.  DTMF Input

8772	   A combination of DTMF input and speech is represented using nested
8773	   input elements.  For example:
8774	   User: My pin is (dtmf 1 2 3 4)

8776	   <input>
8777	     <input mode="speech" confidence="1.0"
8778	        timestamp-start="2000-04-03T00:00:00"
8779	        timestamp-end="2000-04-03T00:00:01.5">My pin is
8780	     </input>
8781	     <input mode="dtmf" confidence="1.0"
8782	        timestamp-start="2000-04-03T00:00:01.5"
8783	        timestamp-end="2000-04-03T00:00:02.0">1 2 3 4
8784	     </input>
8785	   </input>

8787	   Note that grammars that recognize mixtures of speech and DTMF are not
8788	   currently possible in SRGS; however, this representation might be
8789	   needed for other applications of NLSML, and this mixture capability
8790	   might be introduced in future versions of SRGS.
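
   A consumer of nested input elements usually needs the pieces
   separated by mode.  The following Python fragment is a minimal
   sketch of that step.  The function name split_by_mode is
   hypothetical, and the timing attributes are omitted from the
   embedded copy for brevity.

```python
import xml.etree.ElementTree as ET

MIXED_INPUT = """<input>
  <input mode="speech" confidence="1.0">My pin is
  </input>
  <input mode="dtmf" confidence="1.0">1 2 3 4
  </input>
</input>"""


def split_by_mode(input_xml):
    """Return (mode, text) pairs for each nested input, in order."""
    root = ET.fromstring(input_xml)
    parts = []
    for child in root.findall("input"):
        # Collapse the line breaks and indentation kept by the parser.
        text = " ".join((child.text or "").split())
        parts.append((child.get("mode"), text))
    return parts


parts = split_by_mode(MIXED_INPUT)
dtmf_digits = "".join(text.replace(" ", "")
                      for mode, text in parts if mode == "dtmf")
```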

8792	14.2.4.  Interpreting Meta-Dialog and Meta-Task Utterances

8794	   Natural language communication makes use of meta-dialog and meta-task
8795	   utterances.  This specification is flexible enough so that meta
8796	   utterances can be represented on an application-specific basis
8797	   without requiring other standard markup.

8799	   Here are two examples of how meta-task and meta-dialog utterances
8800	   might be represented.

8802	System: What toppings do you want on your pizza?
8803	User:   What toppings do you have?

8805	<interpretation grammar="http://www.example.com/toppings">
8806	   <instance>
8807	      <question>
8808	         <questioned_item>toppings</questioned_item>
8809	         <questioned_property>
8810	          availability
8811	         </questioned_property>
8812	      </question>
8813	   </instance>
8814	   <input mode="speech">
8815	     what toppings do you have?
8816	   </input>
8817	</interpretation>

8819	User:   slow down.

8821	<interpretation grammar="http://www.example.com/generalCommandsGrammar">
8822	   <instance>
8823	    <command>
8824	       <action>reduce speech rate</action>
8825	       <doer>system</doer>
8826	    </command>
8827	   </instance>
8828	  <input mode="speech">slow down</input>
8829	</interpretation>

8831	14.2.5.  Anaphora and Deixis

8833	   This specification can be used on an application-specific basis to
8834	   represent utterances that contain unresolved anaphoric and deictic
8835	   references.  Anaphoric references, which include pronouns and
8836	   definite noun phrases that refer to something that was mentioned in
8837	   the preceding linguistic context, and deictic references, which refer
8838	   to something that is present in the non-linguistic context, present
8839	   similar problems in that there may not be sufficient unambiguous
8840	   linguistic context to determine what their exact role in the
8841	   interpretation should be.  In order to represent unresolved anaphora
8842	   and deixis using this specification, one strategy would be for the
8843	   developer to define a more surface-oriented representation that
8844	   leaves the specific details of the interpretation of the reference
8845	   open.  (This assumes that a later component is responsible for
8846	   actually resolving the reference.)

8848	   Example: (ignoring the issue of representing the input from the
8849	             pointing gesture.)

8851	   System: What do you want to drink?
8852	   User:   I want this (clicks on picture of large root beer.)

8854	   <?xml version="1.0"?>
8855	   <nl:result xmlns:nl="urn:ietf:params:xml:ns:mrcpv2"
8856	           xmlns="http://www.example.com/example"
8857	           grammar="http://www.example.com/beverages.grxml">
8858	      <nl:interpretation>
8859	         <nl:instance>
8860	          <doer>I</doer>
8861	          <action>want</action>
8862	          <object>this</object>
8863	         </nl:instance>
8864	         <nl:input mode="speech">I want this</nl:input>
8865	      </nl:interpretation>
8866	   </nl:result>

8868	14.2.6.  Distinguishing Individual Items from Sets with One Member

8870	   For programming convenience, it is useful to be able to distinguish
8871	   between individual items and sets containing one item in the XML
8872	   representation of semantic results.  For example, a pizza order might
8873	   consist of exactly one pizza, but a pizza might contain zero or more
8874	   toppings.  Since there is no standard way of marking this distinction
8875	   directly in XML, in the current framework, the developer is free to
8876	   adopt any conventions that would convey this information in the XML
8877	   markup.  One strategy would be for the developer to wrap the set of
8878	   items in a grouping element, as in the following example.

8880	   <order>
8881	      <pizza>
8882	         <topping-group>
8883	            <topping>mushrooms</topping>
8884	         </topping-group>
8885	      </pizza>
8886	      <drink>coke</drink>
8887	   </order>

8889	   In this example, the programmer can assume that there is supposed to
8890	   be exactly one pizza and one drink in the order, but the fact that
8891	   there is only one topping is an accident of this particular pizza
8892	   order.

8894	   Note that the client controls both the grammar and the semantics to
8895	   be returned upon grammar matches, so the user of the MRCPv2 protocol
8896	   is fully empowered to cause results to be returned in NLSML in such a
8897	   way that the interpretation is clear to that user.
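
   The grouping convention pays off in client code: items that are
   singular by design are read as single values, while the grouped
   items are always read as a list, even when the list has one member.
   The following Python fragment is a minimal sketch of that reading
   discipline.  The function name read_order is hypothetical, and the
   element names are those of the example above.

```python
import xml.etree.ElementTree as ET

ORDER = """<order>
   <pizza>
      <topping-group>
         <topping>mushrooms</topping>
      </topping-group>
   </pizza>
   <drink>coke</drink>
</order>"""


def read_order(order_xml):
    """Read singular items as values and grouped items as lists."""
    order = ET.fromstring(order_xml)
    # Singular by convention: exactly one drink per order.
    drink = order.findtext("drink")
    # Set-valued by convention: iterate the grouping element so that a
    # one-member set still produces a list.
    toppings = [topping.text.strip()
                for topping in order.findall("pizza/topping-group/topping")]
    return {"drink": drink, "toppings": toppings}


order = read_order(ORDER)
```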

8899	14.2.7.  Extensibility

8901	   Extensibility in NLSML is provided via result content flexibility, as
8902	   described in the sections on meta utterances and anaphora.  NLSML
8903	   can be used in sophisticated systems to convey application-specific
8904	   information that more basic systems would not use, for example, to
8905	   define speech acts.

8907	15.  ABNF Normative Definition

8909	   The following productions make use of the core rules defined in
8910	   Appendix B.1 of RFC 5234 [RFC5234].

8912	LWS    =    [*WSP CRLF] 1*WSP ; linear whitespace

8914	SWS    =    [LWS] ; sep whitespace

8916	UTF8-NONASCII    =    %xC0-DF 1UTF8-CONT
8917	                 /    %xE0-EF 2UTF8-CONT
8918	                 /    %xF0-F7 3UTF8-CONT
8919	                 /    %xF8-FB 4UTF8-CONT
8920	                 /    %xFC-FD 5UTF8-CONT

8922	UTF8-CONT        =    %x80-BF
8923	UTFCHAR          =    %x21-7E
8924	                 /    UTF8-NONASCII
8925	param            =    *pchar

8927	quoted-string    =    SWS DQUOTE *(qdtext / quoted-pair )
8928	                      DQUOTE

8930	qdtext           =    LWS / %x21 / %x23-5B / %x5D-7E
8931	                 /    UTF8-NONASCII

8933	quoted-pair      =    "\" (%x00-09 / %x0B-0C / %x0E-7F)

8935	token            =    1*(alphanum / "-" / "." / "!" / "%" / "*"
8936	                      / "_" / "+" / "`" / "'" / "~" )

8938	reserved         =    ";" / "/" / "?" / ":" / "@" / "&" / "="
8939	                      / "+" / "$" / ","

8941	mark             =    "-" / "_" / "." / "!" / "~" / "*" / "'"
8942	                 /    "(" / ")"

8944	unreserved       =    alphanum / mark

8946	pchar            =    unreserved / escaped
8947	                 /    ":" / "@" / "&" / "=" / "+" / "$" / ","

8949	alphanum         =    ALPHA / DIGIT

8951	BOOLEAN          =    "true" / "false"

8953	FLOAT            =    *DIGIT ["." *DIGIT]

8955	escaped          =    "%" HEXDIG HEXDIG

8957	fragment         =    *uric

8959	uri              =    [ absoluteURI / relativeURI ]
8960	                      [ "#" fragment ]

8962	absoluteURI      =    scheme ":" ( hier-part / opaque-part )

8964	relativeURI      =    ( net-path / abs-path / rel-path )
8965	                      [ "?" query ]

8967	hier-part        =    ( net-path / abs-path ) [ "?" query ]

8969	net-path         =    "//" authority [ abs-path ]

8971	abs-path         =    "/" path-segments

8973	rel-path         =    rel-segment [ abs-path ]

8975	rel-segment      =    1*( unreserved / escaped / ";" / "@"
8976	                 /    "&" / "=" / "+" / "$" / "," )

8978	opaque-part      =    uric-no-slash *uric

8980	uric             =    reserved / unreserved / escaped

8982	uric-no-slash    =    unreserved / escaped / ";" / "?" / ":"
8983	                      / "@" / "&" / "=" / "+" / "$" / ","

8985	path-segments    =    segment *( "/" segment )

8987	segment          =    *pchar *( ";" param )

8989	scheme           =    ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )

8991	authority        =    srvr / reg-name
8992	srvr             =    [ [ userinfo "@" ] hostport ]

8994	reg-name         =    1*( unreserved / escaped / "$" / ","
8995	                 /     ";" / ":" / "@" / "&" / "=" / "+" )

8997	query            =    *uric

8999	userinfo         =    ( user ) [ ":" password ]

9001	user             =    1*( unreserved / escaped
9002	                 /    user-unreserved )

9004	user-unreserved  =    "&" / "=" / "+" / "$" / "," / ";"
9005	                 /    "?" / "/"

9007	password         =    *( unreserved / escaped
9008	                 /    "&" / "=" / "+" / "$" / "," )

9010	hostport         =    host [ ":" port ]

9012	host             =    hostname / IPv4address / IPv6reference

9014	hostname         =    *( domainlabel "." ) toplabel [ "." ]

9016	domainlabel      =    alphanum / alphanum *( alphanum / "-" )
9017	                      alphanum

9019	toplabel         =    ALPHA / ALPHA *( alphanum / "-" )
9020	                      alphanum

9022	IPv4address      =    1*3DIGIT "." 1*3DIGIT "." 1*3DIGIT "."
9023	                      1*3DIGIT

9025	IPv6reference    =    "[" IPv6address "]"

9027	IPv6address      =    hexpart [ ":" IPv4address ]

9029	hexpart          =    hexseq / hexseq "::" [ hexseq ] / "::"
9030	                      [ hexseq ]

9032	hexseq           =    hex4 *( ":" hex4)

9034	hex4             =    1*4HEXDIG

9036	port             =    1*19DIGIT

9038	; generic-message is the top-level rule
9039	generic-message  =    start-line message-header CRLF
9040	                      [ message-body ]

9042	message-body     =    *OCTET

9044	start-line       =    request-line / response-line / event-line

9046	request-line     =    mrcp-version SP message-length SP method-name
9047	                      SP request-id CRLF

9049	response-line    =    mrcp-version SP message-length SP request-id
9050	                      SP status-code SP request-state CRLF

9052	event-line       =    mrcp-version SP message-length SP event-name
9053	                      SP request-id SP request-state CRLF

9055	method-name      =    generic-method
9056	                 /    synthesizer-method
9057	                 /    recognizer-method
9058	                 /    recorder-method
9059	                 /    verifier-method

9061	generic-method   =    "SET-PARAMS"
9062	                 /    "GET-PARAMS"

9064	request-state    =    "COMPLETE"
9065	                 /    "IN-PROGRESS"
9066	                 /    "PENDING"

9068	event-name       =    synthesizer-event
9069	                 /    recognizer-event
9070	                 /    recorder-event
9071	                 /    verifier-event

9073	message-header   =  1*(generic-header / resource-header / generic-field)

9075	generic-field    =    field-name ":" [ field-value ]
9076	field-name       =    token
9077	field-value      =    *LWS field-content *( CRLF 1*LWS field-content)
9078	field-content    =    <the OCTETs making up the field-value
9079	                      and consisting of either *TEXT or combinations
9080	                      of token, separators, and quoted-string>

9082	resource-header  =    synthesizer-header
9083	                 /    recognizer-header
9084	                 /    recorder-header
9085	                 /    verifier-header

9087	generic-header   =    channel-identifier
9088	                 /    accept
9089	                 /    active-request-id-list
9090	                 /    proxy-sync-id
9091	                 /    accept-charset
9092	                 /    content-type
9093	                 /    content-id
9094	                 /    content-base
9095	                 /    content-encoding
9096	                 /    content-location
9097	                 /    content-length
9098	                 /    fetch-timeout
9099	                 /    cache-control
9100	                 /    logging-tag
9101	                 /    set-cookie
9102	                 /    vendor-specific

9104	; -- content-id is as defined in RFC2392, RFC2046 and RFC5322
9105	; -- accept and accept-charset are as defined in RFC2616

9107	mrcp-version     =    "MRCP" "/" 1*2DIGIT "." 1*2DIGIT

9109	message-length   =    1*19DIGIT

9111	request-id       =    1*10DIGIT

9113	status-code      =    3DIGIT

9115	channel-identifier =  "Channel-Identifier" ":"
9116	                      channel-id CRLF

9118	channel-id       =    1*alphanum "@" 1*alphanum

9120	active-request-id-list = "Active-Request-Id-List" ":"
9121	                         request-id *("," request-id) CRLF

9123	proxy-sync-id    =    "Proxy-Sync-Id" ":" 1*VCHAR CRLF

9125	content-base     =    "Content-Base" ":" absoluteURI CRLF

9127	content-length   =    "Content-Length" ":" 1*19DIGIT CRLF

9129	content-type     =    "Content-Type" ":" media-type-value CRLF

9131	media-type-value =    type "/" subtype *( ";" parameter )

9133	type             =    token
9134	subtype          =    token

9136	parameter        =    attribute "=" value

9138	attribute        =    token

9140	value            =    token / quoted-string

9142	content-encoding =    "Content-Encoding" ":"
9143	                      *WSP content-coding
9144	                      *(*WSP "," *WSP content-coding *WSP )
9145	                      CRLF

9147	content-coding   =    token

9149	content-location =    "Content-Location" ":"
9150	                      ( absoluteURI / relativeURI )  CRLF

9152	cache-control    =    "Cache-Control" ":"
9153	                      [*WSP cache-directive
9154	                      *( *WSP "," *WSP cache-directive *WSP )]
9155	                      CRLF

9157	fetch-timeout    =    "Fetch-Timeout" ":" 1*19DIGIT CRLF

9159	cache-directive  =    "max-age" "=" delta-seconds
9160	                 /    "max-stale" ["=" delta-seconds ]
9161	                 /    "min-fresh" "=" delta-seconds

9163	delta-seconds         =    1*19DIGIT

9165	logging-tag      =    "Logging-Tag" ":" 1*UTFCHAR CRLF

9167	vendor-specific  =    "Vendor-Specific-Parameters" ":"
9168	                      [vendor-specific-av-pair
9169	                      *(";" vendor-specific-av-pair)] CRLF

9171	vendor-specific-av-pair = vendor-av-pair-name "="
9172	                          value

9174	vendor-av-pair-name     = 1*UTFCHAR

9176	set-cookie        = "Set-Cookie:" SP set-cookie-string
9177	set-cookie-string = cookie-pair *( ";" SP cookie-av )
9178	cookie-pair       = cookie-name "=" cookie-value
9179	cookie-name       = token
9180	cookie-value      = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
9181	cookie-octet      = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
9182	token             = <token, defined in [RFC2616], Section 2.2>

9184	cookie-av         = expires-av / max-age-av / domain-av /
9185	                     path-av / secure-av / httponly-av /
9186	                     extension-av / age-av
9187	expires-av        = "Expires=" sane-cookie-date
9188	sane-cookie-date  = <rfc1123-date, defined in [RFC2616], Section 3.3.1>
9189	max-age-av        = "Max-Age=" non-zero-digit *DIGIT
9190	non-zero-digit    = %x31-39
9191	domain-av         = "Domain=" domain-value
9192	domain-value      = <subdomain>
9193	path-av           = "Path=" path-value
9194	path-value        = <any CHAR except CTLs or ";">
9195	secure-av         = "Secure"
9196	httponly-av       = "HttpOnly"
9197	extension-av      = <any CHAR except CTLs or ";">
9198	age-av            = "Age=" delta-seconds

9200	; Synthesizer ABNF

9202	synthesizer-method    =    "SPEAK"
9203	                      /    "STOP"
9204	                      /    "PAUSE"
9205	                      /    "RESUME"
9206	                      /    "BARGE-IN-OCCURRED"
9207	                      /    "CONTROL"
9208	                      /    "DEFINE-LEXICON"

9210	synthesizer-event     =    "SPEECH-MARKER"
9211	                      /    "SPEAK-COMPLETE"

9213	synthesizer-header    =    jump-size
9214	                      /    kill-on-barge-in
9215	                      /    speaker-profile
9216	                      /    completion-cause
9217	                      /    completion-reason
9218	                      /    voice-parameter
9219	                      /    prosody-parameter
9220	                      /    speech-marker
9221	                      /    speech-language
9222	                      /    fetch-hint
9223	                      /    audio-fetch-hint
9224	                      /    failed-uri
9225	                      /    failed-uri-cause
9226	                      /    speak-restart
9227	                      /    speak-length
9228	                      /    load-lexicon
9229	                      /    lexicon-search-order

9231	jump-size             =    "Jump-Size" ":" speech-length-value CRLF

9233	speech-length-value   =    numeric-speech-length
9234	                      /    text-speech-length

9236	text-speech-length    =    1*UTFCHAR SP "Tag"

9238	numeric-speech-length =    ("+" / "-") positive-speech-length

9240	positive-speech-length =   1*19DIGIT SP numeric-speech-unit

9242	numeric-speech-unit   =    "Second"
9243	                      /    "Word"
9244	                      /    "Sentence"
9245	                      /    "Paragraph"

9247	kill-on-barge-in      =    "Kill-On-Barge-In" ":" BOOLEAN
9248	                           CRLF

9250	speaker-profile       =    "Speaker-Profile" ":" uri CRLF

9252	completion-cause         =  "Completion-Cause" ":" cause-code SP
9253	                            cause-name CRLF
9254	cause-code               =  3DIGIT
9255	cause-name               =  *VCHAR

9257	completion-reason     =    "Completion-Reason" ":"
9258	                           quoted-string CRLF

9260	voice-parameter       =    voice-gender
9261	                      /    voice-age
9262	                      /    voice-variant
9263	                      /    voice-name

9265	voice-gender          =    "Voice-Gender:" voice-gender-value CRLF

9267	voice-gender-value    =    "male"
9268	                      /    "female"
9269	                      /    "neutral"

9271	voice-age             =    "Voice-Age:" 1*3DIGIT CRLF

9273	voice-variant         =    "Voice-Variant:" 1*19DIGIT CRLF

9275	voice-name            =    "Voice-Name:"
9276	                           1*UTFCHAR *(1*WSP 1*UTFCHAR) CRLF

9278	prosody-parameter     =    "Prosody-" prosody-param-name ":"
9279	                           prosody-param-value CRLF

9281	prosody-param-name    =    1*VCHAR

9283	prosody-param-value   =    1*VCHAR

9285	timestamp             =    "timestamp" "=" time-stamp-value

9287	time-stamp-value      =    1*20DIGIT

9289	speech-marker         =    "Speech-Marker" ":"
9290	                           timestamp
9291	                           [";" 1*(UTFCHAR / %x20)] CRLF

9293	speech-language       =    "Speech-Language" ":" 1*VCHAR CRLF

9295	fetch-hint            =    "Fetch-Hint" ":" ("prefetch" / "safe") CRLF

9297	audio-fetch-hint      =    "Audio-Fetch-Hint" ":"
9298	                          ("prefetch" / "safe" / "stream") CRLF

9300	failed-uri            =    "Failed-URI" ":" absoluteURI CRLF

9302	failed-uri-cause      =    "Failed-URI-Cause" ":" 1*UTFCHAR CRLF

9304	speak-restart         =    "Speak-Restart" ":" BOOLEAN CRLF

9306	speak-length          =    "Speak-Length" ":" positive-length-value
9307	                           CRLF

9309	positive-length-value   =  positive-speech-length
9310	                        /  text-speech-length

9312	load-lexicon          =    "Load-Lexicon" ":" BOOLEAN CRLF

9314	lexicon-search-order  =    "Lexicon-Search-Order" ":"
9315	          "<" absoluteURI ">" *(" " "<" absoluteURI ">") CRLF

9317	; Recognizer ABNF

9319	recognizer-method     =    recog-only-method
9320	                      /    enrollment-method

9322	recog-only-method     =    "DEFINE-GRAMMAR"
9323	                      /    "RECOGNIZE"
9324	                      /    "INTERPRET"
9325	                      /    "GET-RESULT"
9326	                      /    "START-INPUT-TIMERS"
9327	                      /    "STOP"

9329	enrollment-method     =    "START-PHRASE-ENROLLMENT"
9330	                      /    "ENROLLMENT-ROLLBACK"
9331	                      /    "END-PHRASE-ENROLLMENT"
9332	                      /    "MODIFY-PHRASE"
9333	                      /    "DELETE-PHRASE"

9335	recognizer-event      =    "START-OF-INPUT"
9336	                      /    "RECOGNITION-COMPLETE"
9337	                      /    "INTERPRETATION-COMPLETE"

9339	recognizer-header     =    recog-only-header
9340	                      /    enrollment-header

9342	recog-only-header     =    confidence-threshold
9343	                      /    sensitivity-level
9344	                      /    speed-vs-accuracy
9345	                      /    n-best-list-length
9346	                      /    input-type
9347	                      /    no-input-timeout
9348	                      /    recognition-timeout
9349	                      /    waveform-uri
9350	                      /    input-waveform-uri
9351	                      /    completion-cause
9352	                      /    completion-reason
9353	                      /    recognizer-context-block
9354	                      /    start-input-timers
9355	                      /    speech-complete-timeout
9356	                      /    speech-incomplete-timeout
9357	                      /    dtmf-interdigit-timeout
9358	                      /    dtmf-term-timeout
9359	                      /    dtmf-term-char
9360	                      /    failed-uri
9361	                      /    failed-uri-cause
9362	                      /    save-waveform
9363	                      /    media-type
9364	                      /    new-audio-channel
9365	                      /    speech-language
9366	                      /    ver-buffer-utterance
9367	                      /    recognition-mode
9368	                      /    cancel-if-queue
9369	                      /    hotword-max-duration
9370	                      /    hotword-min-duration
9371	                      /    interpret-text
9372	                      /    dtmf-buffer-time
9373	                      /    clear-dtmf-buffer
9374	                      /    early-no-match

9376	enrollment-header     =    num-min-consistent-pronunciations
9377	                      /    consistency-threshold
9378	                      /    clash-threshold
9379	                      /    personal-grammar-uri
9380	                      /    enroll-utterance
9381	                      /    phrase-id
9382	                      /    phrase-nl
9383	                      /    weight
9384	                      /    save-best-waveform
9385	                      /    new-phrase-id
9386	                      /    confusable-phrases-uri
9387	                      /    abort-phrase-enrollment

9389	confidence-threshold  =    "Confidence-Threshold" ":"
9390	                           FLOAT CRLF

9392	sensitivity-level     =    "Sensitivity-Level" ":" FLOAT
9393	                           CRLF

9395	speed-vs-accuracy     =    "Speed-Vs-Accuracy" ":" FLOAT
9396	                           CRLF

9398	n-best-list-length    =    "N-Best-List-Length" ":" 1*19DIGIT
9399	                           CRLF

9401	input-type            =    "Input-Type" ":"  inputs CRLF
9402	inputs                =    "speech" / "dtmf"

9404	no-input-timeout      =    "No-Input-Timeout" ":" 1*19DIGIT
9405	                           CRLF

9407	recognition-timeout   =    "Recognition-Timeout" ":" 1*19DIGIT
9408	                           CRLF

9410	waveform-uri          =    "Waveform-URI" ":" ["<" uri ">"
9411	                           ";" "size" "=" 1*19DIGIT
9412	                           ";" "duration" "=" 1*19DIGIT] CRLF

9414	recognizer-context-block = "Recognizer-Context-Block" ":"
9415	                           [1*VCHAR] CRLF

9417	start-input-timers    =    "Start-Input-Timers" ":"
9418	                           BOOLEAN CRLF

9420	speech-complete-timeout =  "Speech-Complete-Timeout" ":"
9421	                           1*19DIGIT CRLF

9423	speech-incomplete-timeout = "Speech-Incomplete-Timeout" ":"
9424	                            1*19DIGIT CRLF

9426	dtmf-interdigit-timeout = "DTMF-Interdigit-Timeout" ":"
9427	                          1*19DIGIT CRLF

9429	dtmf-term-timeout     =    "DTMF-Term-Timeout" ":" 1*19DIGIT
9430	                           CRLF

9432	dtmf-term-char        =    "DTMF-Term-Char" ":" VCHAR CRLF

9434	save-waveform         =    "Save-Waveform" ":" BOOLEAN CRLF

9436	new-audio-channel     =    "New-Audio-Channel" ":"
9437	                           BOOLEAN CRLF

9439	recognition-mode         =  "Recognition-Mode" ":"
9440	                            ("normal" / "hotword") CRLF

9442	cancel-if-queue       =    "Cancel-If-Queue" ":" BOOLEAN CRLF

9444	hotword-max-duration  =    "Hotword-Max-Duration" ":"
9445	                           1*19DIGIT CRLF

9447	hotword-min-duration  =    "Hotword-Min-Duration" ":"
9448	                           1*19DIGIT CRLF

9450	interpret-text           =  "Interpret-Text" ":" 1*VCHAR CRLF

9452	dtmf-buffer-time      =    "DTMF-Buffer-Time" ":" 1*19DIGIT CRLF

9454	clear-dtmf-buffer     =    "Clear-DTMF-Buffer" ":" BOOLEAN CRLF

9456	early-no-match        =    "Early-No-Match" ":" BOOLEAN CRLF

9458	num-min-consistent-pronunciations    =
9459	    "Num-Min-Consistent-Pronunciations" ":" 1*19DIGIT CRLF

9461	consistency-threshold =    "Consistency-Threshold" ":" FLOAT
9462	                           CRLF

9464	clash-threshold       =    "Clash-Threshold" ":" FLOAT CRLF
9465	personal-grammar-uri  =    "Personal-Grammar-URI" ":" uri CRLF

9467	enroll-utterance      =    "Enroll-Utterance" ":" BOOLEAN CRLF

9469	phrase-id             =    "Phrase-ID" ":" 1*VCHAR CRLF

9471	phrase-nl             =    "Phrase-NL" ":" 1*UTFCHAR CRLF

9473	weight                =    "Weight" ":" FLOAT CRLF

9475	save-best-waveform    =    "Save-Best-Waveform" ":"
9476	                           BOOLEAN CRLF

9478	new-phrase-id         =    "New-Phrase-ID" ":" 1*VCHAR CRLF

9480	confusable-phrases-uri =   "Confusable-Phrases-URI" ":"
9481	                           uri CRLF

9483	abort-phrase-enrollment =  "Abort-Phrase-Enrollment" ":"
9484	                           BOOLEAN CRLF

9486	; Recorder ABNF

9488	recorder-method       =    "RECORD"
9489	                      /    "STOP"
9490	                      /    "START-INPUT-TIMERS"

9492	recorder-event        =    "START-OF-INPUT"
9493	                      /    "RECORD-COMPLETE"

9495	recorder-header       =    sensitivity-level
9496	                      /    no-input-timeout
9497	                      /    completion-cause
9498	                      /    completion-reason
9499	                      /    failed-uri
9500	                      /    failed-uri-cause
9501	                      /    record-uri
9502	                      /    media-type
9503	                      /    max-time
9504	                      /    trim-length
9505	                      /    final-silence
9506	                      /    capture-on-speech
9507	                      /    ver-buffer-utterance
9508	                      /    start-input-timers
9509	                      /    new-audio-channel

9511	record-uri            =    "Record-URI" ":" [ "<" uri ">"
9512	                           ";" "size" "=" 1*19DIGIT
9513	                           ";" "duration" "=" 1*19DIGIT] CRLF

9515	media-type            =    "Media-Type" ":" media-type-value CRLF

9517	max-time              =    "Max-Time" ":" 1*19DIGIT CRLF

9519	trim-length           =    "Trim-Length" ":" 1*19DIGIT CRLF

9521	final-silence         =    "Final-Silence" ":" 1*19DIGIT CRLF

9523	capture-on-speech     =    "Capture-On-Speech" ":"
9524	                           BOOLEAN CRLF

9526	; Verifier ABNF

9528	verifier-method       =    "START-SESSION"
9529	                      /    "END-SESSION"
9530	                      /    "QUERY-VOICEPRINT"
9531	                      /    "DELETE-VOICEPRINT"
9532	                      /    "VERIFY"
9533	                      /    "VERIFY-FROM-BUFFER"
9534	                      /    "VERIFY-ROLLBACK"
9535	                      /    "STOP"
9536	                      /    "CLEAR-BUFFER"
9537	                      /    "START-INPUT-TIMERS"
9538	                      /    "GET-INTERMEDIATE-RESULT"

9540	verifier-event        =    "VERIFICATION-COMPLETE"
9541	                      /    "START-OF-INPUT"

9543	verifier-header       =    repository-uri
9544	                      /    voiceprint-identifier
9545	                      /    verification-mode
9546	                      /    adapt-model
9547	                      /    abort-model
9548	                      /    min-verification-score
9549	                      /    num-min-verification-phrases
9550	                      /    num-max-verification-phrases
9551	                      /    no-input-timeout
9552	                      /    save-waveform
9553	                      /    media-type
9554	                      /    waveform-uri
9555	                      /    voiceprint-exists
9556	                      /    ver-buffer-utterance
9557	                      /    input-waveform-uri
9558	                      /    completion-cause
9559	                      /    completion-reason
9560	                      /    speech-complete-timeout
9561	                      /    new-audio-channel
9562	                      /    abort-verification
9563	                      /    start-input-timers
9564	                      /    input-type

9566	repository-uri        =    "Repository-URI" ":" uri CRLF

9568	voiceprint-identifier        =  "Voiceprint-Identifier" ":"
9569	                                vid *[";" vid] CRLF
9570	vid                          =  1*VCHAR ["." 1*VCHAR]

9572	verification-mode     =    "Verification-Mode" ":"
9573	                           verification-mode-string CRLF

9575	verification-mode-string = "train" / "verify"

9577	adapt-model           =    "Adapt-Model" ":" BOOLEAN CRLF

9579	abort-model           =    "Abort-Model" ":" BOOLEAN CRLF

9581	min-verification-score  =  "Min-Verification-Score" ":"
9582	                           [ %x2D ] FLOAT CRLF

9584	num-min-verification-phrases = "Num-Min-Verification-Phrases"
9585	                               ":" 1*19DIGIT CRLF

9587	num-max-verification-phrases = "Num-Max-Verification-Phrases"
9588	                               ":" 1*19DIGIT CRLF

9590	voiceprint-exists     =    "Voiceprint-Exists" ":"
9591	                           BOOLEAN CRLF

9593	ver-buffer-utterance  =    "Ver-Buffer-Utterance" ":"
9594	                           BOOLEAN CRLF

9596	input-waveform-uri    =    "Input-Waveform-URI" ":" uri CRLF

9598	abort-verification    =    "Abort-Verification" ":"
9599	                           BOOLEAN CRLF
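
The verifier header productions above can be exercised with a small parser. The following is a minimal Python sketch, not part of the specification. The function names are hypothetical. Splitting on ";" is one practical reading of the vid rule (VCHAR itself includes ";"), tolerating spaces around the colon is a leniency the ABNF does not state, and the [-1, 1] range check is an assumption borrowed from the verification-score range in the verification results schema.

```python
def split_header(line: str) -> tuple[str, str]:
    """Split a 'Name:value' header line at the first colon."""
    name, sep, value = line.rstrip("\r\n").partition(":")
    if not sep:
        raise ValueError(f"not a header line: {line!r}")
    # Assumption: surrounding whitespace is tolerated and discarded.
    return name.strip(), value.strip()


def parse_voiceprint_ids(value: str) -> list[str]:
    """Return the ';'-separated entries of a Voiceprint-Identifier value."""
    ids = [part.strip() for part in value.split(";")]
    if any(not part for part in ids):
        raise ValueError(f"empty voiceprint identifier in {value!r}")
    return ids


def parse_min_verification_score(value: str) -> float:
    """Convert a Min-Verification-Score value and check the assumed range."""
    # float() accepts more spellings than the FLOAT rule; this is a sketch.
    score = float(value)
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score out of assumed range [-1, 1]: {score}")
    return score


if __name__ == "__main__":
    _, ids = split_header(
        "Voiceprint-Identifier:johnsmith.voiceprint;marysmith.voiceprint\r\n"
    )
    print(parse_voiceprint_ids(ids))
    _, score = split_header("Min-Verification-Score:-0.5\r\n")
    print(parse_min_verification_score(score))
```

The header values in the usage lines are illustrative only.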

9601	   The following productions add a new SDP session-level attribute.  See
9602	   Paragraph 5.

9604	   cmid-attribute   =    "a=cmid:" identification-tag

9606	   identification-tag =    token
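
As an illustration of the cmid-attribute production, the following Python sketch collects the identification tags carried by "a=cmid:" lines in an SDP body. It is a sketch under stated assumptions, not a full SDP parser: the function name is hypothetical, the sample SDP is invented for the example, and the tag is not checked against the token rule.

```python
PREFIX = "a=cmid:"


def cmid_values(sdp: str) -> list[str]:
    """Return the identification tags of all a=cmid lines, in order."""
    tags = []
    for line in sdp.splitlines():
        if line.startswith(PREFIX):
            tag = line[len(PREFIX):].strip()
            if tag:  # skip lines that carry no tag at all
                tags.append(tag)
    return tags


# Illustrative SDP fragment (assumed, not taken from the specification).
SAMPLE_SDP = "\r\n".join([
    "v=0",
    "c=IN IP4 192.0.2.12",
    "m=application 9 TCP/MRCPv2 1",
    "a=resource:speakverify",
    "a=cmid:1",
    "m=audio 49170 RTP/AVP 0",
    "a=mid:1",
])

if __name__ == "__main__":
    print(cmid_values(SAMPLE_SDP))
```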

9608	16.  XML Schemas

9610	16.1.  NLSML Schema Definition

9612	 <?xml version="1.0" encoding="UTF-8"?>
9613	 <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
9614	             targetNamespace="urn:ietf:params:xml:ns:mrcpv2"
9615	             xmlns="urn:ietf:params:xml:ns:mrcpv2"
9616	             elementFormDefault="qualified"
9617	             attributeFormDefault="unqualified" >
9618	   <xs:annotation>
9619	     <xs:documentation> Natural Language Semantic Markup Schema
9620	     </xs:documentation>
9621	   </xs:annotation>
9622	   <xs:include schemaLocation="enrollment-schema.rng"/>
9623	   <xs:include schemaLocation="verification-schema.rng"/>
9624	   <xs:element name="result">
9625	     <xs:complexType>
9626	       <xs:sequence>
9627	         <xs:element name="interpretation" maxOccurs="unbounded">
9628	           <xs:complexType>
9629	             <xs:sequence>
9630	               <xs:element name="instance" minOccurs="0">
9631	                 <xs:complexType mixed="true">
9632	                   <xs:sequence minOccurs="0">
9633	                     <xs:any namespace="##other" processContents="lax"/>
9634	                   </xs:sequence>
9635	                 </xs:complexType>
9636	               </xs:element>
9637	               <xs:element name="input">
9638	                 <xs:complexType mixed="true">
9639	                   <xs:choice>
9640	                     <xs:element name="noinput" minOccurs="0"/>
9641	                     <xs:element name="nomatch" minOccurs="0"/>
9642	                     <xs:element name="input" minOccurs="0"/>
9643	                   </xs:choice>
9644	                   <xs:attribute name="mode"
9645	                                 type="xs:string"
9646	                                 default="speech"/>
9647	                   <xs:attribute name="confidence"
9648	                                 type="confidenceinfo"
9649	                                 default="1.0"/>

9651	                   <xs:attribute name="timestamp-start"
9652	                                 type="xs:string"/>
9653	                   <xs:attribute name="timestamp-end"
9654	                                 type="xs:string"/>
9655	                 </xs:complexType>
9656	               </xs:element>
9657	             </xs:sequence>
9658	             <xs:attribute name="confidence" type="confidenceinfo"
9659	                           default="1.0"/>
9660	             <xs:attribute name="grammar" type="xs:anyURI"
9661	                           use="optional"/>
9662	           </xs:complexType>
9663	         </xs:element>
9664	         <xs:element name="enrollment-result"
9665	                     type="enrollment-contents"/>
9666	         <xs:element name="verification-result"
9667	                     type="verification-contents"/>
9668	       </xs:sequence>
9669	       <xs:attribute name="grammar" type="xs:anyURI"
9670	                     use="optional"/>
9671	     </xs:complexType>
9672	   </xs:element>

9674	   <xs:simpleType name="confidenceinfo">
9675	     <xs:restriction base="xs:float">
9676	        <xs:minInclusive value="0.0"/>
9677	        <xs:maxInclusive value="1.0"/>
9678	     </xs:restriction>
9679	   </xs:simpleType>
9680	 </xs:schema>
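
To show how the attribute defaults and the confidenceinfo range in the schema above affect a consumer, the following Python sketch reads the interpretations from an NLSML result using only the standard library. It does not validate against the schema. The function name and the sample document are assumptions made for the example; the defaults ("1.0" for confidence, "speech" for mode) are the ones declared in the schema.

```python
import xml.etree.ElementTree as ET

NS = {"m": "urn:ietf:params:xml:ns:mrcpv2"}

# Illustrative result document (assumed, not taken from the specification).
SAMPLE = """
<result xmlns="urn:ietf:params:xml:ns:mrcpv2">
  <interpretation confidence="0.85">
    <input mode="speech">may I speak to Andre Roy</input>
  </interpretation>
  <interpretation>
    <input mode="dtmf">1 2 3</input>
  </interpretation>
</result>
"""


def read_interpretations(xml_text: str) -> list[tuple[float, str, str]]:
    """Return (confidence, input mode, input text) per interpretation."""
    root = ET.fromstring(xml_text)
    results = []
    for interp in root.findall("m:interpretation", NS):
        # Apply the schema default when the attribute is absent.
        confidence = float(interp.get("confidence", "1.0"))
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence outside [0.0, 1.0]: {confidence}")
        inp = interp.find("m:input", NS)
        if inp is None:
            raise ValueError("interpretation without an input element")
        mode = inp.get("mode", "speech")
        text = "".join(inp.itertext()).strip()
        results.append((confidence, mode, text))
    return results


if __name__ == "__main__":
    for item in read_interpretations(SAMPLE):
        print(item)
```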

9682	16.2.  Enrollment Results Schema Definition
9683	   <?xml version="1.0" encoding="UTF-8"?>

9685	   <!-- MRCP Enrollment Schema
9686	   (See http://www.oasis-open.org/committees/relax-ng/spec.html)
9687	   -->

9689	   <grammar datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
9690	            ns="urn:ietf:params:xml:ns:mrcpv2"
9691	            xmlns="http://relaxng.org/ns/structure/1.0">

9693	     <start>
9694	       <element name="enrollment-result">
9695	         <ref name="enrollment-content"/>
9696	       </element>
9697	     </start>
9698	     <define name="enrollment-content">
9699	       <interleave>
9700	         <element name="num-clashes">
9701	           <data type="nonNegativeInteger"/>
9702	         </element>
9703	         <element name="num-good-repetitions">
9704	           <data type="nonNegativeInteger"/>
9705	         </element>
9706	         <element name="num-repetitions-still-needed">
9707	           <data type="nonNegativeInteger"/>
9708	         </element>
9709	         <element name="consistency-status">
9710	           <choice>
9711	             <value>consistent</value>
9712	             <value>inconsistent</value>
9713	             <value>undecided</value>
9714	           </choice>
9715	         </element>
9716	         <optional>
9717	           <element name="clash-phrase-ids">
9718	             <oneOrMore>
9719	               <element name="item">
9720	                 <data type="token"/>
9721	               </element>
9722	             </oneOrMore>
9723	           </element>
9724	         </optional>
9725	         <optional>
9726	           <element name="transcriptions">
9727	             <oneOrMore>
9728	               <element name="item">
9729	                 <text/>
9730	               </element>
9731	             </oneOrMore>
9732	           </element>
9733	         </optional>
9734	         <optional>
9735	           <element name="confusable-phrases">
9736	             <oneOrMore>
9737	               <element name="item">
9738	                 <text/>
9739	               </element>
9740	             </oneOrMore>
9741	           </element>
9742	         </optional>
9743	       </interleave>
9744	     </define>

9746	   </grammar>
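
The enrollment grammar above requires three non-negative counters and a consistency-status drawn from a fixed set of values. The following Python sketch checks just those constraints on an enrollment-result element. It is not a RELAX NG validator; the function name and the sample document are assumptions made for the example, and the optional elements are ignored.

```python
import xml.etree.ElementTree as ET

NS_URI = "urn:ietf:params:xml:ns:mrcpv2"
COUNT_FIELDS = (
    "num-clashes",
    "num-good-repetitions",
    "num-repetitions-still-needed",
)
STATUS_VALUES = {"consistent", "inconsistent", "undecided"}

# Illustrative document (assumed, not taken from the specification).
SAMPLE = """
<enrollment-result xmlns="urn:ietf:params:xml:ns:mrcpv2">
  <num-clashes>2</num-clashes>
  <num-good-repetitions>1</num-good-repetitions>
  <num-repetitions-still-needed>1</num-repetitions-still-needed>
  <consistency-status>consistent</consistency-status>
</enrollment-result>
"""


def read_enrollment_result(xml_text: str) -> dict:
    """Return the required enrollment fields, raising ValueError if invalid."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{NS_URI}}}enrollment-result":
        raise ValueError(f"unexpected root element: {root.tag}")
    result = {}
    for name in COUNT_FIELDS:
        text = root.findtext(f"{{{NS_URI}}}{name}")
        if text is None:
            raise ValueError(f"missing required element: {name}")
        value = int(text)  # raises ValueError on non-integer content
        if value < 0:
            raise ValueError(f"{name} must be non-negative, got {value}")
        result[name] = value
    status = (root.findtext(f"{{{NS_URI}}}consistency-status") or "").strip()
    if status not in STATUS_VALUES:
        raise ValueError(f"invalid consistency-status: {status!r}")
    result["consistency-status"] = status
    return result


if __name__ == "__main__":
    print(read_enrollment_result(SAMPLE))
```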

9748	16.3.  Verification Results Schema Definition
9749	   <?xml version="1.0" encoding="UTF-8"?>

9751	   <!--    MRCP Verification Results Schema
9752	           (See http://www.oasis-open.org/committees/relax-ng/spec.html)
9753	      -->

9755	   <grammar datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
9756	            ns="urn:ietf:params:xml:ns:mrcpv2"
9757	            xmlns="http://relaxng.org/ns/structure/1.0">

9759	     <start>
9760	       <element name="verification-result">
9761	         <ref name="verification-contents"/>
9762	       </element>
9763	     </start>

9765	     <define name="verification-contents">
9766	       <element name="voiceprint">
9767	         <ref name="firstVoiceprintContent"/>
9768	       </element>
9769	       <zeroOrMore>
9770	         <element name="voiceprint">
9771	           <ref name="restVoiceprintContent"/>
9772	         </element>
9773	       </zeroOrMore>
9774	     </define>

9776	     <define name="firstVoiceprintContent">
9777	       <attribute name="id">
9778	         <data type="string"/>
9779	       </attribute>
9780	       <interleave>
9781	         <optional>
9782	           <element name="adapted">
9783	             <data type="boolean"/>
9784	           </element>
9785	         </optional>
9786	         <optional>
9787	           <element name="needmoredata">
9788	             <ref name="needmoredataContent"/>
9789	           </element>
9790	         </optional>
9791	         <optional>
9792	           <element name="incremental">
9793	             <ref name="firstCommonContent"/>

9795	           </element>
9796	         </optional>
9797	         <element name="cumulative">
9798	           <ref name="firstCommonContent"/>
9799	         </element>
9800	       </interleave>
9801	     </define>

9803	     <define name="restVoiceprintContent">
9804	       <attribute name="id">
9805	         <data type="string"/>
9806	       </attribute>
9807	       <element name="cumulative">
9808	         <ref name="restCommonContent"/>
9809	       </element>
9810	     </define>

9812	     <define name="firstCommonContent">
9813	       <interleave>
9814	         <element name="decision">
9815	           <ref name="decisionContent"/>
9816	         </element>
9817	         <optional>
9818	           <element name="utterance-length">
9819	             <ref name="utterance-lengthContent"/>
9820	           </element>
9821	         </optional>
9822	         <optional>
9823	           <element name="device">
9824	             <ref name="deviceContent"/>
9825	           </element>
9826	         </optional>
9827	         <optional>
9828	           <element name="gender">
9829	             <ref name="genderContent"/>
9830	           </element>
9831	         </optional>
9832	         <zeroOrMore>
9833	           <element name="verification-score">
9834	             <ref name="verification-scoreContent"/>
9835	           </element>
9836	         </zeroOrMore>
9837	       </interleave>
9838	     </define>

9840	     <define name="restCommonContent">
9841	       <interleave>
9842	         <optional>
9843	           <element name="decision">
9844	             <ref name="decisionContent"/>
9845	           </element>
9846	         </optional>
9847	         <optional>
9848	           <element name="device">
9849	             <ref name="deviceContent"/>
9850	           </element>
9851	         </optional>
9852	         <optional>
9853	           <element name="gender">
9854	             <ref name="genderContent"/>
9855	           </element>
9856	         </optional>
9857	         <zeroOrMore>
9858	           <element name="verification-score">
9859	             <ref name="verification-scoreContent"/>
9860	           </element>
9861	         </zeroOrMore>
9862	       </interleave>
9863	     </define>

9865	     <define name="decisionContent">
9866	       <choice>
9867	         <value>accepted</value>
9868	         <value>rejected</value>
9869	         <value>undecided</value>
9870	       </choice>
9871	     </define>

9873	     <define name="needmoredataContent">
9874	       <data type="boolean"/>
9875	     </define>

9877	     <define name="utterance-lengthContent">
9878	       <data type="nonNegativeInteger"/>
9879	     </define>

9881	     <define name="deviceContent">
9882	       <choice>
9883	         <value>cellular-phone</value>
9884	         <value>electret-phone</value>
9885	         <value>carbon-button-phone</value>
9886	         <value>unknown</value>
9887	       </choice>
9888	     </define>

9890	     <define name="genderContent">
9891	       <choice>
9892	         <value>male</value>
9893	         <value>female</value>
9894	         <value>unknown</value>
9895	       </choice>
9896	     </define>

9898	     <define name="verification-scoreContent">
9899	       <data type="float">
9900	         <param name="minInclusive">-1</param>
9901	         <param name="maxInclusive">1</param>
9902	       </data>
9903	     </define>

9905	   </grammar>

9907	17.  References

9909	17.1.  Normative References

9911	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
9912	              Jacobson, "RTP: A Transport Protocol for Real-Time
9913	              Applications", STD 64, RFC 3550, July 2003.

9915	   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
9916	              A., Peterson, J., Sparks, R., Handley, M., and E.
9917	              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
9918	              June 2002.

9920	   [RFC2326]  Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time
9921	              Streaming Protocol (RTSP)", RFC 2326, April 1998.

9923	   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
9924	              Description Protocol", RFC 4566, July 2006.

9926	   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
9927	              RFC 793, September 1981.

9929	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
9930	              Requirement Levels", BCP 14, RFC 2119, March 1997.

9932	   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
9933	              Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
9934	              Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

9936	   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
9937	              with Session Description Protocol (SDP)", RFC 3264,
9938	              June 2002.

9940	   [RFC3629]  Yergeau, F., "UTF-8, a transformation format of ISO
9941	              10646", STD 63, RFC 3629, November 2003.

9943	   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
9944	              Specifications: ABNF", STD 68, RFC 5234, January 2008.

9946	   [RFC4145]  Yon, D. and G. Camarillo, "TCP-Based Media Transport in
9947	              the Session Description Protocol (SDP)", RFC 4145,
9948	              September 2005.

9950	   [RFC4572]  Lennox, J., "Connection-Oriented Media Transport over the
9951	              Transport Layer Security (TLS) Protocol in the Session
9952	              Description Protocol (SDP)", RFC 4572, July 2006.

9954	   [RFC5888]  Camarillo, G. and H. Schulzrinne, "The Session Description
9955	              Protocol (SDP) Grouping Framework", RFC 5888, June 2010.

9957	   [RFC5322]  Resnick, P., Ed., "Internet Message Format", RFC 5322,
9958	              October 2008.

9960	   [RFC2392]  Levinson, E., "Content-ID and Message-ID Uniform Resource
9961	              Locators", RFC 2392, August 1998.

9963	   [RFC6265]  Barth, A., "HTTP State Management Mechanism", RFC 6265,
9964	              April 2011.

9966	   [RFC5646]  Phillips, A. and M. Davis, "Tags for Identifying
9967	              Languages", BCP 47, RFC 5646, September 2009.

9969	   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
9970	              IANA Considerations Section in RFCs", BCP 26, RFC 5226,
9971	              May 2008.

9973	   [RFC5246]  Dierks, T. and E. Rescorla, "The Transport Layer Security
9974	              (TLS) Protocol Version 1.2", RFC 5246, August 2008.

9976	   [RFC1035]  Mockapetris, P., "Domain names - implementation and
9977	              specification", STD 13, RFC 1035, November 1987.

9979	   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
9980	              Resource Identifier (URI): Generic Syntax", STD 66,
9981	              RFC 3986, January 2005.

9983	   [RFC4288]  Freed, N. and J. Klensin, "Media Type Specifications and
9984	              Registration Procedures", BCP 13, RFC 4288, December 2005.

9986	   [RFC3688]  Mealling, M., "The IETF XML Registry", BCP 81, RFC 3688,
9987	              January 2004.

9989	   [RFC4568]  Andreasen, F., Baugher, M., and D. Wing, "Session
9990	              Description Protocol (SDP) Security Descriptions for Media
9991	              Streams", RFC 4568, July 2006.

9993	   [W3C.REC-speech-synthesis-20040907]
9994	              Walker, M., Burnett, D., and A. Hunt, "Speech Synthesis
9995	              Markup Language (SSML) Version 1.0", World Wide Web
9996	              Consortium Recommendation REC-speech-synthesis-20040907,
9997	              September 2004,
9998	              <http://www.w3.org/TR/2004/REC-speech-synthesis-20040907>.

10000	   [RFC5905]  Mills, D., Martin, J., Burbank, J., and W. Kasch, "Network
10001	              Time Protocol Version 4: Protocol and Algorithms
10002	              Specification", RFC 5905, June 2010.

10004	   [RFC2483]  Mealling, M. and R. Daniel, "URI Resolution Services
10005	              Necessary for URN Resolution", RFC 2483, January 1999.

10007	   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
10008	              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
10009	              RFC 3711, March 2004.

10011	   [RFC5922]  Gurbani, V., Lawrence, S., and A. Jeffrey, "Domain
10012	              Certificates in the Session Initiation Protocol (SIP)",
10013	              RFC 5922, June 2010.

10015	   [W3C.REC-speech-grammar-20040316]
10016	              McGlashan, S. and A. Hunt, "Speech Recognition Grammar
10017	              Specification Version 1.0", World Wide Web Consortium
10018	              Recommendation REC-speech-grammar-20040316, March 2004,
10019	              <http://www.w3.org/TR/2004/REC-speech-grammar-20040316>.

10021	   [W3C.REC-semantic-interpretation-20070405]
10022	              Van Tichelen, L. and D. Burke, "Semantic Interpretation
10023	              for Speech Recognition (SISR) Version 1.0", World Wide
10024	              Web Consortium Recommendation REC-semantic-
10025	              interpretation-20070405, April 2007, <http://www.w3.org/
10026	              TR/2007/REC-semantic-interpretation-20070405>.

10028	   [W3C.REC-xml-names11-20040204]
10029	              Layman, A., Bray, T., Hollander, D., and R. Tobin,
10030	              "Namespaces in XML 1.1", World Wide Web Consortium
10031	              Recommendation REC-xml-names11-20040204, February 2004,
10032	              <http://www.w3.org/TR/2004/REC-xml-names11-20040204>.

10034	   [RFC3023]  Murata, M., St. Laurent, S., and D. Kohn, "XML Media
10035	              Types", RFC 3023, January 2001.

10037	   [ISO.8859-1.1987]
10038	              International Organization for Standardization,
10039	              "Information technology - 8-bit single-byte coded graphic
10040	              character sets - Part 1: Latin alphabet No. 1, JTC1/
10041	              SC2", ISO Standard 8859-1, 1987.

10043	17.2.  Informative References

10045	   [RFC4960]  Stewart, R., "Stream Control Transmission Protocol",
10046	              RFC 4960, September 2007.

10048	   [RFC4313]  Oran, D., "Requirements for Distributed Control of
10049	              Automatic Speech Recognition (ASR), Speaker
10050	              Identification/Speaker Verification (SI/SV), and Text-to-
10051	              Speech (TTS) Resources", RFC 4313, December 2005.

10053	   [Q.23]     International Telecommunication Union, "Technical
10054	              Features of Push-Button Telephone Sets", ITU-T Q.23, 1993.

10056	   [RFC4395]  Hansen, T., Hardie, T., and L. Masinter, "Guidelines and
10057	              Registration Procedures for New URI Schemes", BCP 35,
10058	              RFC 4395, February 2006.

10060	   [RFC4733]  Schulzrinne, H. and T. Taylor, "RTP Payload for DTMF
10061	              Digits, Telephony Tones, and Telephony Signals", RFC 4733,
10062	              December 2006.

10064	   [W3C.REC-voicexml20-20040316]
10065	              Burnett, D., Porter, B., Tryphonas, S., McGlashan, S.,
10066	              Carter, J., Danielsen, P., Ferrans, J., Hunt, A., Rehor,
10067	              K., and B. Lucas, "Voice Extensible Markup Language
10068	              (VoiceXML) Version 2.0", World Wide Web Consortium
10069	              Recommendation REC-voicexml20-20040316, March 2004,
10070	              <http://www.w3.org/TR/2004/REC-voicexml20-20040316>.

10072	   [RFC4463]  Shanmugham, S., Monaco, P., and B. Eberman, "A Media
10073	              Resource Control Protocol (MRCP) Developed by Cisco,
10074	              Nuance, and Speechworks", RFC 4463, April 2006.

10076	   [refs.javaSpeechGrammarFormat]
10077	              Sun Microsystems, "Java Speech Grammar Format Version
10078	              1.0", October 1998.

10080	   [W3C.REC-emma-20090210]
10081	              Johnston, M., Baggia, P., Burnett, D., Carter, J., Dahl,
10082	              D., McCobb, G., and D. Raggett, "EMMA: Extensible
10083	              MultiModal Annotation markup language", World Wide Web
10084	              Consortium Recommendation REC-emma-20090210,
10085	              February 2009,
10086	              <http://www.w3.org/TR/2009/REC-emma-20090210>.

10088	   [RFC4467]  Crispin, M., "Internet Message Access Protocol (IMAP) -
10089	              URLAUTH Extension", RFC 4467, May 2006.

10091	   [W3C.REC-pronunciation-lexicon-20081014]
10092	              Baggia, P., Bagshaw, P., Burnett, D., Carter, J., and F.
10093	              Scahill, "Pronunciation Lexicon Specification (PLS)",
10094	              World Wide Web Consortium Recommendation REC-
10095	              pronunciation-lexicon-20081014, October 2008, <http://
10096	              www.w3.org/TR/2008/REC-pronunciation-lexicon-20081014>.

10098	   [RFC2818]  Rescorla, E., "HTTP Over TLS", RFC 2818, May 2000.

10100	   [ISO.8601.1988]
10101	              International Organization for Standardization, "Data
10102	              elements and interchange formats - Information interchange
10103	              - Representation of dates and times", ISO Standard 8601,
10104	              June 1988.

10106	   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
10107	              Internet Protocol", RFC 4301, December 2005.

10109	   [RFC4217]  Ford-Hutchinson, P., "Securing FTP with TLS", RFC 4217,
10110	              October 2005.

10112	   [RFC6454]  Barth, A., "The Web Origin Concept", RFC 6454,
10113	              December 2011.

10115	Appendix A.  Contributors

10117	   Pierre Forgues
10118	   Nuance Communications Ltd.
10119	   1500 University Street
10120	   Suite 935
10121	   Montreal, Quebec
10122	   Canada H3A 3S7

10124	   Email:  forgues@nuance.com

10126	   Charles Galles
10127	   Intervoice, Inc.
10128	   17811 Waterview Parkway
10129	   Dallas, Texas 75252

10131	   Email:  charles.galles@intervoice.com

10133	   Klaus Reifenrath
10134	   ScanSoft, Inc.
10135	   Guldensporenpark 32
10136	   Building D
10137	   9820 Merelbeke
10138	   Belgium

10140	   Email: klaus.reifenrath@scansoft.com

10142	Appendix B.  Acknowledgements

10144	   Andre Gillet (Nuance Communications)
10145	   Andrew Hunt (ScanSoft)
10146	   Andrew Wahbe (Genesys)
10147	   Aaron Kneiss (ScanSoft)
10148	   Brian Eberman (ScanSoft)
10149	   Corey Stohs (Cisco Systems Inc)
10150	   Dave Burke (VoxPilot)
10151	   Jeff Kusnitz (IBM Corp)
10152	   Ganesh N Ramaswamy (IBM Corp)
10153	   Klaus Reifenrath (ScanSoft)
10154	   Kristian Finlator (ScanSoft)
10155	   Magnus Westerlund (Ericsson)
10156	   Martin Dragomirecky (Cisco Systems Inc)
10157	   Paolo Baggia (Loquendo)
10158	   Peter Monaco (Nuance Communications)
10159	   Pierre Forgues (Nuance Communications)
10160	   Ran Zilca (IBM Corp)
10161	   Suresh Kaliannan (Cisco Systems Inc.)
10162	   Skip Cave (Intervoice Inc)
10163	   Thomas Gal (LumenVox)

10165	   The chairs of the SPEECHSC working group are Eric Burger (Georgetown
10166	   University) and Dave Oran (Cisco Systems, Inc.).

10168	   Many thanks go in particular to Robert Sparks, Alex Agranovsky, and
10169	   Henry Phan, who were there at the end to dot all the i's and cross
10170	   all the t's.

10172	Authors' Addresses

10174	   Daniel C. Burnett
10175	   Voxeo
10176	   189 South Orange Avenue #1000
10177	   Orlando, FL  32801
10178	   USA

10180	   Email: dburnett@voxeo.com
10181	   Saravanan Shanmugham
10182	   Cisco Systems, Inc.
10183	   170 W. Tasman Dr.
10184	   San Jose, CA  95134
10185	   USA

10187	   Email: sarvi@cisco.com