
[speechsc] The NLSML schema and namespaces



I would like to raise a few issues with both the NLSML schema and its
use of namespaces.

First, SRGS and SISR allow you to define a grammar so that multiple
token sequences map to one string literal result. For example, "yes",
"ya", "sure", "yes please", and "ok" could all result in the string
literal result "yes". Thus, if you said "sure", the string literal
interpretation result would be "yes".
 
Unfortunately there doesn't seem to be a way to specify string literals
in NLSML. You would think that the example above could be expressed as
follows:
 
<?xml version="1.0" encoding="UTF-8"?>
<result xmlns="http://www.ietf.org/xml/ns/mrcpv2">
  <interpretation confidence="0.9">
    <instance>yes</instance>
    <input mode="speech">sure</input>
  </interpretation>
</result>

However this isn't allowed by the NLSML schema in the current MRCPv2
draft. This could be allowed by changing the <instance> type to allow
"mixed" contents (see the definition of <input>). Also, we would need to
change the schema to allow <instance> to have no child elements.
Applying these changes we get the following element definition:

<xs:element name="instance" minOccurs="0">
 <xs:complexType mixed="true">
  <xs:sequence minOccurs="0">
   <xs:any/>
  </xs:sequence>
 </xs:complexType>
</xs:element>

Of course, this allows a mix of text and elements (e.g. <instance> yes
<no/> maybe </instance>), which is probably not desirable. XML Schema has
no way to restrict this, but the format we define could forbid it in the
text of the spec. The alternative would be to do what EMMA does with the
<emma:literal/> element. Either way would be fine with me.
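
Since the "text or elements, but not both" rule would live in the spec
text rather than in the schema, a consumer would have to check it
itself. A minimal sketch of such a check in Python follows; the function
name and the four category labels are made up for illustration and are
not part of any draft.

```python
import xml.etree.ElementTree as ET


def instance_content_kind(instance):
    """Classify an <instance> element by what it contains.

    Returns one of "empty", "text", "elements" or "mixed".
    Whitespace-only text is ignored, so pretty-printed element
    content still counts as "elements".
    """
    has_children = len(instance) > 0
    # Character data can sit before the first child (.text) or after
    # any child (.tail), so both places have to be inspected.
    chunks = [instance.text or ""] + [child.tail or "" for child in instance]
    has_text = any(chunk.strip() for chunk in chunks)

    if has_text and has_children:
        return "mixed"
    if has_children:
        return "elements"
    if has_text:
        return "text"
    return "empty"
```

A receiver following the proposed spec text would reject anything
classified as "mixed", for example the <instance> yes <no/> maybe
</instance> case, and accept the other three kinds.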

The second issue is with the <xs:any/> portion of the instance element
definition. As currently defined, a schema validator will try to
validate its contents even if a schema is not available. We should
probably relax this by setting the processContents attribute to "lax".
This will cause the validator to process the contents only if a schema
is available.

Also, this currently allows any elements, including those from the NLSML
namespace, to appear within an <instance/> element. I'm guessing that we
actually want to allow only elements from other namespaces. E.g. you
shouldn't be able to do this:

<result xmlns="http://www.ietf.org/xml/ns/mrcpv2">
 <interpretation>
   <instance>
     <result>
       <interpretation>
         <instance/>
       </interpretation>    
     </result>
   </instance>
 </interpretation>
</result>

However, this is ok:

<result xmlns="http://www.ietf.org/xml/ns/mrcpv2">
 <interpretation>
   <instance>
     <result xmlns="http://example.com/myNamespace">
       <interpretation>
         <instance/>
       </interpretation>    
     </result>
   </instance>
 </interpretation>
</result>

The final element definition for <instance/> would then be:
<xs:element name="instance" minOccurs="0">
 <xs:complexType mixed="true">
  <xs:sequence minOccurs="0">
   <xs:any namespace="##other" processContents="lax"/>
  </xs:sequence>
 </xs:complexType>
</xs:element>
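
To make the effect of namespace="##other" concrete, here is a rough
Python sketch of the same rule applied by hand to the direct children of
each NLSML <instance>. It is not a schema validator; the function names
are invented for illustration, and the sketch assumes XML Schema 1.0
semantics, where ##other also excludes elements that are in no namespace
at all.

```python
import xml.etree.ElementTree as ET

NLSML_NS = "http://www.ietf.org/xml/ns/mrcpv2"


def namespace_of(elem):
    """Return the namespace URI of an element, or None if it has none."""
    tag = elem.tag
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return None


def disallowed_instance_children(result_xml):
    """List the tags of <instance> children that ##other would reject.

    Only direct children of NLSML <instance> elements are checked,
    which is where the <xs:any/> wildcard applies.
    """
    root = ET.fromstring(result_xml)
    rejected = []
    for instance in root.iter("{%s}instance" % NLSML_NS):
        for child in instance:
            ns = namespace_of(child)
            # Rejected: the NLSML namespace itself, or no namespace.
            if ns is None or ns == NLSML_NS:
                rejected.append(child.tag)
    return rejected
```

Run against the two documents above, the first (nested NLSML <result>)
yields one rejected child and the second (nested <result> in
http://example.com/myNamespace) yields none.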

Of course, this also raises the issue that none of the examples in the
spec declare namespaces at all. It would probably be a good idea to do
this properly.
So examples such as this:
<?xml version="1.0"?>
<result grammar="session:request1@form-level.store">
 <interpretation>
  <instance name="Person">
   <Person>
    <Name> Andre Roy </Name>
   </Person>
  </instance>
  <input>   may I speak to Andre Roy </input>
 </interpretation>
</result>

would become this:

<?xml version="1.0"?>
<nl:result xmlns:nl="http://www.ietf.org/xml/ns/mrcpv2"
           xmlns="http://www.example.com/example"
           grammar="session:request1@form-level.store">
    <nl:interpretation>
        <nl:instance>
            <Person>
                <Name> Andre Roy </Name>
            </Person>
        </nl:instance>
        <nl:input>   may I speak to Andre Roy </nl:input>
    </nl:interpretation>
</nl:result>
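
One practical benefit of declaring the namespaces is that a client can
address NLSML elements and application elements unambiguously. The
Python sketch below reads the namespaced example using the standard
library; the prefix map, the EXAMPLE constant and the read_result
function are illustrative names only, and the paths assume exactly the
structure shown above.

```python
import xml.etree.ElementTree as ET

# Prefixes used in the paths below. They are local to this script and
# need not match the prefixes used in the document itself.
NS = {
    "nl": "http://www.ietf.org/xml/ns/mrcpv2",
    "ex": "http://www.example.com/example",
}

EXAMPLE = """<nl:result xmlns:nl="http://www.ietf.org/xml/ns/mrcpv2"
           xmlns="http://www.example.com/example"
           grammar="session:request1@form-level.store">
    <nl:interpretation>
        <nl:instance>
            <Person>
                <Name> Andre Roy </Name>
            </Person>
        </nl:instance>
        <nl:input>   may I speak to Andre Roy </nl:input>
    </nl:interpretation>
</nl:result>"""


def read_result(xml_text):
    """Return (person name, recognized input) with whitespace trimmed."""
    root = ET.fromstring(xml_text)
    name = root.findtext(
        "nl:interpretation/nl:instance/ex:Person/ex:Name", namespaces=NS)
    spoken = root.findtext("nl:interpretation/nl:input", namespaces=NS)
    return (name or "").strip(), (spoken or "").strip()
```

Calling read_result(EXAMPLE) returns the name and the recognized
utterance. The same paths find nothing in the un-namespaced version of
the example, because its elements are in no namespace.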

Finally, it is not clear what the namespace of the NLSML format is
supposed to be. The schema says this:
   <xs:schema     xmlns:xs="http://www.w3.org/2001/XMLSchema"
               targetNamespace="http://www.ietf.org/xml/schema/mrcpv2"
               xmlns="http://www.ietf.org/xml/ns/mrcpv2" ...

I have a feeling that "http://www.ietf.org/xml/schema/mrcpv2" is
supposed to be the location of the schema and
"http://www.ietf.org/xml/ns/mrcpv2" is supposed to be the namespace for
NLSML. If that is the case, then the schema should be written this way:
   <xs:schema     xmlns:xs="http://www.w3.org/2001/XMLSchema"
               targetNamespace="http://www.ietf.org/xml/ns/mrcpv2"
               xmlns="http://www.ietf.org/xml/ns/mrcpv2" ...

The schema location is not referenced in the schema content at all.
Either way, the default namespace and the targetNamespace should match;
otherwise, referencing the "confidenceinfo" simpleType in the definitions
of the confidence attributes does not work properly.
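
The mismatch can be illustrated with a small Python sketch. An
unprefixed type="confidenceinfo" is resolved through the schema
document's default namespace, while the simpleType itself is defined in
the targetNamespace. The schema text below is a hypothetical,
heavily trimmed stand-in for the real NLSML schema, and
dangling_type_refs is an invented helper, not a real validator.

```python
import io
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical stand-in for the NLSML schema; only the parts needed to
# show the namespace mismatch are kept.
SCHEMA_TEMPLATE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="{target}"
           xmlns="http://www.ietf.org/xml/ns/mrcpv2">
  <xs:attribute name="confidence" type="confidenceinfo"/>
  <xs:simpleType name="confidenceinfo">
    <xs:restriction base="xs:float"/>
  </xs:simpleType>
</xs:schema>"""


def dangling_type_refs(schema_text):
    """Return unprefixed type references that match no defined simpleType.

    Simplification: only the default namespace declared on the way in
    is tracked, and prefixed references (e.g. xs:float) are skipped.
    """
    default_ns = None
    target_ns = None
    defined, referenced = set(), set()
    source = io.BytesIO(schema_text.encode("utf-8"))
    for event, data in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = data
            if not prefix:  # an xmlns="..." declaration
                default_ns = uri
        elif data.tag == "{%s}schema" % XS:
            target_ns = data.get("targetNamespace")
        elif data.tag == "{%s}simpleType" % XS and data.get("name"):
            # Named types live in the targetNamespace.
            defined.add((target_ns, data.get("name")))
        elif data.tag == "{%s}attribute" % XS and data.get("type"):
            ref = data.get("type")
            if ":" not in ref:
                # Unprefixed QNames resolve via the default namespace.
                referenced.add((default_ns, ref))
    return sorted(referenced - defined)
```

With the target set to http://www.ietf.org/xml/schema/mrcpv2, as in the
draft, the reference to confidenceinfo is reported as dangling; with the
target set to http://www.ietf.org/xml/ns/mrcpv2 nothing is reported.
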
Thanks,

Andrew Wahbe
