idnits 2.17.1 

draft-davies-idntables-10.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 288: '...   A document MUST contain exactly one...'
     RFC 2119 keyword, line 289: '...   element MUST contain exactly one "d...'
     RFC 2119 keyword, line 305: '...icode-version" element MUST be used by...'
     RFC 2119 keyword, line 313: '...   RECOMMENDED that it be the decimal ...'
     RFC 2119 keyword, line 328: '... of this element MUST be a valid ISO 8...'
     (56 more instances...)


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (June 24, 2015) is 3229 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Unused Reference: 'RFC7303' is defined on line 2045, but no explicit
     reference was found in the text


     Summary: 2 errors (**), 0 flaws (~~), 2 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                          K. Davies
3	Internet-Draft                                                     ICANN
4	Intended status: Informational                                A. Freytag
5	Expires: December 26, 2015                                    ASMUS Inc.
6	                                                           June 24, 2015

8	            Representing Label Generation Rulesets using XML
9	                       draft-davies-idntables-10

11	Abstract

13	   This document describes a method of representing rules for validating
14	   identifier labels and alternate representations of those labels using
15	   Extensible Markup Language (XML).  These policies, known as "Label
16	   Generation Rulesets" (LGRs), are used for the implementation of
17	   Internationalized Domain Names (IDNs), for example.  The rulesets are
18	   used to implement and share that aspect of policy defining which
19	   labels and specific Unicode code points are permitted for
20	   registrations, which alternative code points are considered variants,
21	   and what actions may be performed on labels containing those
22	   variants.

24	Status of This Memo

26	   This Internet-Draft is submitted in full conformance with the
27	   provisions of BCP 78 and BCP 79.

29	   Internet-Drafts are working documents of the Internet Engineering
30	   Task Force (IETF).  Note that other groups may also distribute
31	   working documents as Internet-Drafts.  The list of current Internet-
32	   Drafts is at http://datatracker.ietf.org/drafts/current/.

34	   Internet-Drafts are draft documents valid for a maximum of six months
35	   and may be updated, replaced, or obsoleted by other documents at any
36	   time.  It is inappropriate to use Internet-Drafts as reference
37	   material or to cite them other than as "work in progress."

39	   This Internet-Draft will expire on December 26, 2015.

41	Copyright Notice

43	   Copyright (c) 2015 IETF Trust and the persons identified as the
44	   document authors.  All rights reserved.

46	   This document is subject to BCP 78 and the IETF Trust's Legal
47	   Provisions Relating to IETF Documents
48	   (http://trustee.ietf.org/license-info) in effect on the date of
49	   publication of this document.  Please review these documents
50	   carefully, as they describe your rights and restrictions with respect
51	   to this document.  Code Components extracted from this document must
52	   include Simplified BSD License text as described in Section 4.e of
53	   the Trust Legal Provisions and are provided without warranty as
54	   described in the Simplified BSD License.

56	Table of Contents

58	   1.  Introduction
59	   2.  Design Goals
60	   3.  LGR Format
61	     3.1.  Namespace
62	     3.2.  Basic Structure
63	     3.3.  Metadata
64	       3.3.1.  The version Element
65	       3.3.2.  The date Element
66	       3.3.3.  The language Element
67	       3.3.4.  The scope Element
68	       3.3.5.  The description Element
69	       3.3.6.  The validity-start and validity-end Elements
70	       3.3.7.  The unicode-version Element
71	       3.3.8.  The references Element
72	   4.  Code Points and Variants
73	     4.1.  Sequences
74	     4.2.  Variants
75	       4.2.1.  Basic Variants
76	       4.2.2.  The type attribute
77	       4.2.3.  Null Variants
78	       4.2.4.  Variants with Reflexive Mapping
79	       4.2.5.  Conditional Variants
80	     4.3.  Annotations
81	       4.3.1.  The ref Attribute
82	       4.3.2.  The comment Attribute
83	     4.4.  Code Point Tagging
84	   5.  Whole Label and Context Evaluation
85	     5.1.  Basic Concepts
86	     5.2.  Character Classes
87	       5.2.1.  Declaring and Invoking Named Classes
88	       5.2.2.  Tag-based Classes
89	       5.2.3.  Unicode Property-based Classes
90	       5.2.4.  Explicitly Declared Classes
91	       5.2.5.  Combined Classes
92	     5.3.  Whole Label and Context Rules
93	       5.3.1.  The rule Element
94	       5.3.2.  The Match Operators
95	       5.3.3.  The count Attribute
96	       5.3.4.  The name and by-ref Attributes
97	       5.3.5.  The choice Element
98	       5.3.6.  Literal Code Point Sequences
99	       5.3.7.  The any Element
100	       5.3.8.  The start and end Elements
101	       5.3.9.  Example rule from IDNA2008
102	     5.4.  Parameterized Context or When Rules
103	       5.4.1.  The anchor Element
104	       5.4.2.  The look-behind and look-ahead Elements
105	       5.4.3.  Omitting the anchor Element
106	   6.  The action Element
107	     6.1.  The match and not-match Attributes
108	     6.2.  Actions with Variant Type Triggers
109	       6.2.1.  The all-, any- and only-variants Attributes
110	       6.2.2.  Example from RFC 3743 Tables
111	     6.3.  Recommended Disposition Values
112	     6.4.  Precedence
113	     6.5.  Implied Actions
114	     6.6.  Default Actions
115	   7.  Processing a Label Against an LGR
116	     7.1.  Determining Eligibility for a Label
117	     7.2.  Determining Variants for a Label
118	     7.3.  Determining a Disposition for a Label or Variant Label
119	   8.  Conversion to and from Other Formats
120	   9.  IANA Considerations
121	     9.1.  Media Type
122	     9.2.  URN Registration
123	   10. Security Considerations
124	   11. References
125	   Appendix A.  Example Tables
126	   Appendix B.  How to Translate RFC 3743 based Tables into the XML
127	                Format
128	   Appendix C.  Indic Syllable Structure Example
129	   Appendix D.  RelaxNG Compact Schema
130	   Appendix E.  Acknowledgements
131	   Appendix F.  Editorial Notes
132	     F.1.  Known Issues and Future Work
133	     F.2.  Change History
134	   Authors' Addresses

136	1.  Introduction

138	   This memo describes a method of using Extensible Markup Language
139	   (XML) to describe the algorithm used to determine whether a given
140	   identifier label is permitted, and under which conditions, based on
141	   the code points it contains and their context.  These algorithms are
142	   composed of a list of permissible code points, variant code point
143	   mappings, and a set of rules acting on them.  These algorithms form
144	   part of an administrator's policies, and can be referred to as Label
145	   Generation Rulesets (LGRs), or IDN tables.

147	   There are other kinds of policies relating to labels which are not
148	   normally covered by Label Generation Rulesets and are therefore not
149	   representable by the XML format described here.  These include, but
150	   are not limited to, policies around trademarks or prohibition of
151	   fraudulent or objectionable words.

153	   Administrators of the zones for top-level domain registries have
154	   historically published their LGRs using ASCII text or HTML.  The
155	   formatting of these documents has been loosely based on the format
156	   used for the Language Variant Table described in [RFC3743].
157	   [RFC4290] also provides a "model table format" that describes a
158	   similar set of functionality.  Common to these formats is that the
159	   algorithms used to evaluate the data therein are implicit or
160	   specified elsewhere.

162	   Through the first decade of IDN deployment, experience has shown that
163	   LGRs derived from these formats are difficult to consistently
164	   implement and compare due to their differing formats.  A universal
165	   format, such as one using a structured XML format, will assist by
166	   improving machine-readability, consistency, reusability and
167	   maintainability of LGRs.

169	   When used to represent a simple list of permitted code points, the
170	   format is quite straightforward.  At the cost of some complexity in
171	   the resulting file, it also allows for an implementation of more
172	   sophisticated handling of conditional variants that reflects the
173	   known requirements of current zone administrator policies.

175	   Another feature of this format is that it allows many of the
176	   algorithms to be made explicit and machine implementable.  A
177	   remaining small set of implicit algorithms is described in this
178	   document to allow commonality in implementation.

180	   While the predominant usage of this specification is to represent IDN
181	   label policy, the format is not limited to IDN usage and may also be
182	   used for describing ASCII domain name label rulesets, or other types
183	   of identifier labels beyond those used for domain names.

185	2.  Design Goals

187	   The following goals informed the design of this format:

189	   o  The format needs to be implementable in a reasonably
190	      straightforward manner in software.

192	   o  The format should be able to be automatically checked for
193	      formatting errors, so that common mistakes can be caught.

195	   o  An LGR needs to be able to express the set of valid code points
196	      that are allowed for registration under a specific administrator's
197	      policies.

199	   o  Provide the ability to express computed alternatives to a given
200	      identifier based on mapping relationships between code points,
201	      whether one-to-one or many-to-many.  These computed alternatives
202	      are commonly known as "variants".

204	   o  Variant code points should be able to be tagged with specific
205	      dispositions or categories that can be used to support registry
206	      policy (such as whether to allocate the computed variant, or to
207	      merely block it from usage or registration).

209	   o  Variants and code points must be able to be stipulated based on
210	      contextual information.  For example, specific variants may only
211	      be applicable when they follow another specific code point, or
212	      when the code point is displayed in a specific presentation form.

214	   o  The data contained within an LGR must be able to be interpreted
215	      unambiguously, so that independent implementations that utilize
216	      the contents will arrive at the same results.

218	   o  To the largest extent possible, policy rules should be able to be
219	      specified in the XML format without relying on hidden or built-in
220	      algorithms in implementations.

222	   o  LGRs should be suitable for comparison and re-use, such that one
223	      could easily compare the contents of two or more to see the
224	      differences, to merge them, and so on.

226	   o  As many existing IDN tables as practicable should be able to be
227	      migrated to the LGR format with all applicable interpretation
228	      logic retained.

230	   These requirements are partly derived from reviewing the existing
231	   corpus of published IDN tables, plus the requirements of ICANN's work
232	   to implement an LGR for the DNS Root Zone [LGR-PROCEDURE].  In
233	   particular, Section B of that document identifies five specific
234	   requirements for an LGR methodology.

236	   The syntax and rules in [RFC5892] and [RFC3743] were also reviewed.

238	   It is explicitly not the goal of this format to stipulate what code
239	   points should be listed in an LGR by a zone administrator.  Which
240	   registration policies are used for a particular zone is outside the
241	   scope of this memo.

243	3.  LGR Format

245	   An LGR is expressed as a well-formed XML Document [XML].

247	3.1.  Namespace

249	   The XML Namespace URI is "urn:ietf:params:xml:ns:lgr-1.0".  [Note:
250	   the examples and schemas for any non-final versions of this
251	   specification use a namespace that is not guaranteed.  Early
252	   implementors should consider the need to revise the namespace in
253	   subsequent revisions.]

255	   See Section 9.2 for more information.

257	3.2.  Basic Structure

259	   The basic XML framework of the document is as follows:

261	       <?xml version="1.0"?>
262	       <lgr xmlns="urn:ietf:params:xml:ns:lgr-1.0">
263	           ...
264	       </lgr>

266	   The "lgr" element contains up to three sub-elements.  First is an
267	   optional "meta" element that contains all meta-data associated with
268	   the LGR, such as its authorship, what it is used for, implementation
269	   notes and references.  This is followed by a "data" element that
270	   contains the substantive code point data.  Finally, an optional
271	   "rules" element contains information on contextual and whole-label
272	   evaluation rules, if any, along with any specific "action" elements
273	   providing for the disposition of labels and computed variant labels.

275	       <?xml version="1.0"?>
276	       <lgr xmlns="urn:ietf:params:xml:ns:lgr-1.0">
277	           <meta>
278	               ...
279	           </meta>
280	           <data>
281	               ...
282	           </data>
283	           <rules>
284	               ...
285	           </rules>
286	       </lgr>

288	   A document MUST contain exactly one "lgr" element.  Each "lgr"
289	   element MUST contain exactly one "data" element, optionally preceded
290	   by one "meta" element and optionally followed by one "rules" element.
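
The containment rule above can be checked mechanically.  The following
is a minimal sketch in Python, not part of the format itself.  The
function name check_basic_structure is hypothetical, and the check
covers only the order and count of the direct children of "lgr".

```python
import xml.etree.ElementTree as ET

LGR_NS = "urn:ietf:params:xml:ns:lgr-1.0"


def qualified(name):
    """Return the namespace-qualified tag name used by ElementTree."""
    return f"{{{LGR_NS}}}{name}"


def check_basic_structure(xml_text):
    """Return a list of structural problems; an empty list means none found."""
    root = ET.fromstring(xml_text)
    if root.tag != qualified("lgr"):
        return [f"unexpected root element: {root.tag}"]
    # Direct children in document order (comments are dropped by the parser).
    children = [child.tag for child in root]
    meta, data, rules = qualified("meta"), qualified("data"), qualified("rules")
    # Exactly one "data", optionally preceded by "meta", optionally
    # followed by "rules".
    allowed = ([data], [meta, data], [data, rules], [meta, data, rules])
    if children not in allowed:
        return [f"unexpected child sequence: {children}"]
    return []
```

A document that passes this check may still be invalid in other ways;
the sketch does not look inside "meta", "data", or "rules".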

292	   In the following descriptions, required, non-repeating elements or
293	   attributes are generally not called out explicitly, in contrast to
294	   optional ones or those that may be repeated.  For attributes that
295	   take lists as values the elements are space-delimited.

297	3.3.  Metadata

299	   The optional "meta" element is used to express meta-data associated
300	   with the LGR.  It can be used to identify the author or relevant
301	   contact person, explain the intended usage of the LGR, and provide
302	   implementation notes as well as references.  With the exception of
303	   the "unicode-version" element, the data contained within is not
304	   required by software consuming the LGR to calculate valid labels or
305	   to calculate variants.  The "unicode-version" element MUST be used by
306	   a consumer of the table to identify that it has the correct Unicode
307	   property data to perform operations on the table.

309	3.3.1.  The version Element

311	   The "version" element is optional.  It is used to uniquely identify
312	   each version of the LGR.  No specific format is required, but it is
313	   RECOMMENDED that it be the decimal representation of a single
314	   positive integer, which is incremented with each revision of the
315	   file.

317	   An example of a typical first edition of a document:

319	       <version>1</version>

321	   The "version" element may have an optional "comment" attribute.

323	       <version comment="draft">1</version>

325	3.3.2.  The date Element

327	   The optional "date" element is used to identify the date the LGR was
328	   posted.  The contents of this element MUST be a valid ISO 8601 "full-
329	   date" string as described in [RFC3339].

331	   Example of a date:

333	       <date>2009-11-01</date>
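
As an illustration, a consumer might validate the contents of the
"date" element as sketched below.  This is an assumption about how an
implementation could do it, not a required method; the name
is_full_date is hypothetical.  The sketch accepts only the
YYYY-MM-DD shape and then lets Python's calendar logic reject
impossible dates.

```python
import re
from datetime import date

# "full-date" shape: four-digit year, two-digit month, two-digit day.
FULL_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def is_full_date(text):
    """Return True if text looks like a valid full-date string."""
    match = FULL_DATE.fullmatch(text)
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups())
    try:
        # Rejects values such as month 13 or February 30.
        date(year, month, day)
    except ValueError:
        return False
    return True
```

Note that Python's date type does not support year 0000, so this
sketch rejects it as well.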

335	3.3.3.  The language Element

337	   The optional "language" element signals that the LGR is associated
338	   with a specific language or script.  The value of the "language"
339	   element MUST be a valid language tag as described in [RFC5646].  The
340	   tag may refer to a script plus undefined language if the LGR is not
341	   referring to a specific language.

343	   Example of an English language LGR:

345	      <language>en</language>

347	   If the LGR applies to a specific script, rather than a language, the
348	   "und" language tag should be used followed by the relevant [RFC5646]
349	   script subtag.  For example, for a Cyrillic script LGR:

351	      <language>und-Cyrl</language>

353	   If the LGR covers a specific set of multiple languages or scripts,
354	   the "language" element can be repeated.  However, for cases of a
355	   script-specific LGR exhibiting insignificant admixture of code points
356	   from other scripts, it is RECOMMENDED to use a single "language"
357	   element identifying the predominant script.  In the exceptional case
358	   of a multi-script LGR where no script is predominant, use Zyyy
359	   (Common):

361	      <language>und-Zyyy</language>

363	   Note that for the particular case of Japanese, a script tag
364	   "Jpan" exists that matches the mixture of scripts used in writing
365	   that language.  The preferred "language" element would be:

367	      <language>und-Jpan</language>

369	3.3.4.  The scope Element

371	   This optional element refers to a scope, such as a domain, to which
372	   this policy is applied.  The "type" attribute specifies the type of
373	   scope being defined.  A type of "domain" means that the scope is a
374	   domain that represents the apex of the DNS zone to which the LGR is
375	   applied.  The value must be a valid domain name, and in the case of
376	   the DNS root zone, should be represented as ".".

378	       <scope type="domain">example.com</scope>

380	   There may be multiple "scope" tags used, for example to reflect a
381	   list of domains to which the LGR is applied.  Other types of scope
382	   are application defined, with an explanation in the "description"
383	   element RECOMMENDED.

385	3.3.5.  The description Element

387	   The "description" element is an optional free-form element that
388	   contains any additional relevant description that is useful for the
389	   user in its interpretation.  Typically, this field contains
390	   authorship information, as well as additional context on how the LGR
391	   was formulated and how it applies, such as citations and references
392	   that apply to the LGR as a whole.

394	   This field should not be relied upon for providing instructions on
395	   how to parse or utilize the data contained elsewhere in the
396	   specification.  Authors of tables should expect that software
397	   applications that parse and use LGRs will not use the description
398	   field to condition the application of the LGR's data and rules.

400	   The element has an optional "type" attribute, which refers to the
401	   internet media type of the enclosed data.  Typical types would be
402	   "text/plain" or "text/html".  The attribute SHOULD be a valid MIME
403	   type.  If supplied, it will be assumed that the contents are of that
404	   media type.  If the description lacks a type field, it will be
405	   assumed to be plain text ("text/plain").

407	3.3.6.  The validity-start and validity-end Elements

409	   The "validity-start" and "validity-end" elements are optional
410	   elements that describe the date from which the contents of the LGR
411	   become valid (i.e., are used in registry policy) and the date from
412	   which they cease to be used.

414	   The dates MUST conform to the "full-date" format described in section
415	   5.6 of [RFC3339].

417	       <validity-start>2014-03-12</validity-start>

419	3.3.7.  The unicode-version Element

421	   Whenever an LGR depends on character properties from a given version
422	   of the Unicode standard, the version number used in creating the LGR
423	   MUST be listed in the form x.y.z, where x, y, and z are positive,
424	   decimal integers (see [Unicode-Versions]).  If any software
425	   processing the table does not have access to character property data
426	   of the requisite version, it MUST NOT perform any operations relating
427	   to whole-label evaluation relying on Unicode properties
428	   (Section 5.2.3).

430	   The value of a given Unicode property in [UAX42] may change between
431	   versions, unless such change has been explicitly disallowed in
432	   [Unicode-Stability].  It is RECOMMENDED to only reference properties
433	   defined as stable or immutable.  As an alternative to referencing the
434	   property, the information can be presented explicitly in the LGR.

436	       <unicode-version>6.2.0</unicode-version>

438	   It is not necessary to include a "unicode-version" element for LGRs
439	   that do not make use of Unicode properties; however, it is
440	   RECOMMENDED.
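
The version check described in this section could look roughly like
the sketch below.  The function names are hypothetical.  The sketch
assumes that the property data available to the software is the one
shipped with Python's unicodedata module, and it accepts zero as a
component because the example value "6.2.0" contains one.

```python
import re
import unicodedata

# Three dot-separated decimal integers, e.g. "6.2.0".
VERSION = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+")


def parse_unicode_version(text):
    """Parse an x.y.z version string into a tuple of three integers."""
    if VERSION.fullmatch(text) is None:
        raise ValueError(f"not an x.y.z version: {text!r}")
    return tuple(int(part) for part in text.split("."))


def may_use_unicode_properties(lgr_version,
                               available_version=unicodedata.unidata_version):
    """Return True only if the available property data matches the LGR.

    When this returns False, whole-label evaluation that relies on
    Unicode properties has to be skipped.
    """
    wanted = parse_unicode_version(lgr_version)
    return wanted == parse_unicode_version(available_version)
```

Requiring an exact match is the conservative reading; whether a newer
data version is acceptable for stable properties is a policy decision
that this sketch does not attempt to make.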

442	3.3.8.  The references Element

444	   A Label Generation Ruleset may define a list of references which are
445	   used to associate various individual elements in the LGR to one or
446	   more normative references.  A common use for references is to
447	   annotate that code points belong to an externally defined collection
448	   or standard, or to give normative references for rules.

450	   References are specified in an optional "references" element that
451	   contains any number of "reference" elements, each with a unique "id"
452	   attribute.  It is RECOMMENDED that the "id" attribute be a zero-based
453	   integer.  The value of each "reference" element SHOULD be the
454	   citation of a standard, dictionary or other specification in any
455	   suitable format.  In addition to an "id" attribute, a "reference"
456	   element may have a "comment" attribute for an optional free-form
457	   annotation.

459	       <references>
460	         <reference id="0">The Unicode Standard, Version 7.0</reference>
461	         <reference id="1">Big-5: Computer Chinese Glyph and Character
462	            Code Mapping Table, Technical Report C-26, 1984</reference>
463	         <reference id="2" comment="synchronized with Unicode 6.1">
464	            ISO/IEC
465	            10646:2012 3rd edition</reference>
466	         ...
467	       </references>
468	       ...
469	       <data>
470	         <char cp="0620" ref="0 2" />
471	         ...
472	       </data>

474	   A reference is associated with an element by using an optional "ref"
475	   attribute (see Section 4.3.1).  The use of "ref" attributes is
476	   limited to certain kinds of elements in the "data" or "rules"
477	   sections of the LGR, most notably those defining code points and
478	   rules.  A "ref" attribute may neither occur on elements that are
479	   named references to character classes and rules nor on certain other
480	   element types.  See the description of these elements below.
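
To illustrate how "ref" values relate to "reference" identifiers, the
following sketch reports every "ref" token that does not match a
declared "id".  It is an illustrative check only; the function name
unresolved_refs is hypothetical, and the sketch assumes the
"references" element sits inside "meta".

```python
import xml.etree.ElementTree as ET

NS = {"lgr": "urn:ietf:params:xml:ns:lgr-1.0"}


def unresolved_refs(xml_text):
    """Return (element name, ref token) pairs with no matching reference."""
    root = ET.fromstring(xml_text)
    declared = {
        ref.get("id")
        for ref in root.iterfind("lgr:meta/lgr:references/lgr:reference", NS)
    }
    missing = []
    for element in root.iter():
        # A "ref" attribute holds a space-delimited list of identifiers.
        for token in (element.get("ref") or "").split():
            if token not in declared:
                local_name = element.tag.split("}", 1)[-1]
                missing.append((local_name, token))
    return missing
```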

482	4.  Code Points and Variants

484	   The bulk of a label generation ruleset is a description of which set
485	   of code points are eligible for a given label.  For rulesets that
486	   perform operations that result in potential variants, the code point-
487	   level relationships between variants need to also be described.

489	   The code point data is collected within the "data" element.  Within
490	   this element, a series of "char" and "range" elements describe
491	   eligible code points, or ranges of code points, respectively.

493	   Discrete permissible code points or code point sequences are declared
494	   with a "char" element, e.g.

496	       <char cp="002D"/>

498	   Ranges of permissible code points may be stipulated with a "range"
499	   element, e.g.

501	       <range first-cp="0030" last-cp="0039"/>

503	   The range is inclusive of the first and last code points.  All
504	   attributes defined for a "range" element act as if applied to each
505	   code point within.  A "range" element has no child elements.

507	   It is always possible to substitute a list of individually specified
508	   code points for a range element.  The reverse is not necessarily the
509	   case.  Whenever such a substitution is possible, it makes no
510	   difference in processing the data.  Tools reading or writing the LGR
511	   format are free to aggregate sequences of consecutive code points of
512	   the same properties into range elements.
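
The equivalence between a range and a list of individual code points
can be sketched as follows.  The function name expand_range and the
dictionary representation are assumptions made for illustration; they
are not defined by this format.

```python
def expand_range(first_cp, last_cp, **attributes):
    """Expand an inclusive range into one entry per code point.

    Every attribute given for the range is copied to each entry,
    mirroring the rule that range attributes act as if applied to each
    code point within.
    """
    first, last = int(first_cp, 16), int(last_cp, 16)
    if first > last:
        raise ValueError("first-cp must not exceed last-cp")
    return [
        {"cp": f"{value:04X}", **attributes}
        for value in range(first, last + 1)
    ]
```

For example, expanding first-cp "0030" and last-cp "0039" yields ten
entries, "0030" through "0039".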

514	   Code points must be expressed in uppercase, hexadecimal, and zero
515	   padded to a minimum of 4 digits.  In other words, represented
516	   according to the standard Unicode convention but without the prefix
517	   "U+".  The rationale for not allowing other encoding formats,
518	   including native Unicode encoding in XML, is explored in [UAX42].
519	   The XML conventions used in this format, including the element and
520	   attribute names, mirror that document where practical and reasonable
521	   to do so.  It is RECOMMENDED to list all "char" elements in ascending
522	   order of the "cp" attribute.

524	   All "char" elements in the data section MUST have distinct "cp"
525	   attributes.  The "range" elements MUST NOT specify code point ranges
526	   that overlap either another range or any single code point "char"
527	   elements.
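
The notation and uniqueness constraints above lend themselves to an
automated check.  The following sketch is one possible approach under
stated assumptions: the names parse_cp and find_conflicts are
hypothetical, 4 to 6 uppercase hexadecimal digits are accepted, and
code point sequences are checked only for distinctness.

```python
import re

# Uppercase hexadecimal, zero padded to at least 4 digits, no "U+".
CODE_POINT = re.compile(r"[0-9A-F]{4,6}")


def parse_cp(text):
    """Convert a single code point string to an integer, or raise."""
    if CODE_POINT.fullmatch(text) is None:
        raise ValueError(f"malformed code point: {text!r}")
    value = int(text, 16)
    if value > 0x10FFFF:
        raise ValueError(f"beyond the Unicode code space: {text!r}")
    return value


def find_conflicts(chars, ranges):
    """chars: list of "cp" strings; ranges: list of (first-cp, last-cp)."""
    problems = []
    seen = set()
    for cp in chars:
        if cp in seen:
            problems.append(f"duplicate char: {cp}")
        seen.add(cp)
    spans = sorted((parse_cp(first), parse_cp(last)) for first, last in ranges)
    max_last = -1
    for first, last in spans:
        # Overlaps any earlier range if it starts at or before max_last.
        if first <= max_last:
            problems.append(f"overlapping range: {first:04X}-{last:04X}")
        max_last = max(max_last, last)
    for cp in sorted(seen):
        if " " in cp:
            continue  # a sequence; only distinctness is checked here
        value = parse_cp(cp)
        if any(first <= value <= last for first, last in spans):
            problems.append(f"char inside a range: {cp}")
    return problems
```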

529	4.1.  Sequences

531	   A sequence of two or more code points may be specified in an LGR, for
532	   example, when defining the source for n:m variant mappings.  Another
533	   use of sequences would be in cases when the exact sequence of code
534	   points is required to occur in order for the constituent elements to
535	   be eligible, such as when a specific code point is only eligible when
536	   preceded or followed by another code point.  The following would
537	   define the eligibility of the MIDDLE DOT (U+00B7) only when both
538	   preceded and followed by the LATIN SMALL LETTER L (U+006C):

540	       <char cp="006C 00B7 006C" comment="Catalan middle dot"/>

542	   All sequences defined this way must be distinct, but sub-sequences
543	   may be defined.  Thus, the sequence defined here may coexist with
544	   single code point definitions such as:

546	       <char cp="006C" />
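
To make the effect of sequences concrete, the sketch below tests
whether a label can be split entirely into declared items, where an
item is either a single code point or a sequence.  This is an
illustration only and not the processing procedure of Section 7; the
names to_item and can_be_segmented are hypothetical, and "when"
conditions are ignored.

```python
def to_item(cp_attribute):
    """Turn a "cp" attribute such as "006C 00B7 006C" into a tuple of ints."""
    return tuple(int(part, 16) for part in cp_attribute.split())


def can_be_segmented(label, repertoire):
    """Return True if label splits completely into repertoire items.

    repertoire is a set of tuples as produced by to_item.
    """
    cps = [ord(ch) for ch in label]
    # reachable[i] is True when the first i code points can be covered.
    reachable = [True] + [False] * len(cps)
    for end in range(1, len(cps) + 1):
        for item in repertoire:
            start = end - len(item)
            if start >= 0 and reachable[start] and tuple(cps[start:end]) == item:
                reachable[end] = True
                break
    return reachable[len(cps)]
```

With "006C" and "006C 00B7 006C" both declared, MIDDLE DOT is covered
only when it appears between two LATIN SMALL LETTER L code points,
because it is not declared on its own.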

548	   As an alternative to using sequences to define a required context, a
549	   "char" or "range" element may specify conditional context using an
550	   optional "when" attribute as described below in Section 4.2.5.  The
551	   latter method is more flexible in that such conditional context is
552	   not limited to a specific code point, and it allows both prohibited
553	   and required context to be specified.

555	   As described below, the "char" element, whether or not it is used for
556	   a single code point, or for a sequence, may have optional child
557	   elements defining variants.  Both the "char" and "range" elements can
558	   take a number of optional attributes for conditional inclusion,
559	   commenting, cross referencing and character tagging, as described
560	   below.

562	4.2.  Variants

564	   Most LGRs typically only determine simple code point eligibility, and
565	   for them, the elements described so far would be the only ones
566	   required for their "data" section.  Others additionally specify a
567	   mapping of code points to other code points, known as "variants".
568	   What constitutes a variant code point is a matter of policy, and
569	   varies for each implementation.  The following examples are intended
570	   to demonstrate the syntax; they are not necessarily typical.

572	4.2.1.  Basic Variants

574	   Variant code points are specified using one or more "var" elements as
575	   children of a "char" element.  The target mapping is specified using
576	   the "cp" attribute.  Other, optional attributes for the "var" element
577	   are described below.

579	   For example, to map LATIN SMALL LETTER V (U+0076) as a variant of
580	   LATIN SMALL LETTER U (U+0075):

582	       <char cp="0075">
583	           <var cp="0076"/>
584	       </char>

586	   A sequence of multiple code points can be specified as a variant of a
587	   single code point.  For example, the sequence of LATIN SMALL LETTER O
588	   (U+006F) then LATIN SMALL LETTER E (U+0065) might hypothetically be
589	   specified as a variant for LATIN SMALL LETTER O WITH DIAERESIS
590	   (U+00F6) as follows:

592	       <char cp="00F6">
593	           <var cp="006F 0065"/>
594	       </char>

596	   The source and target of a variant mapping may both be sequences, but
597	   not ranges.

599	   The "var" element specifies variant mappings in only one direction,
600	   even though the variant relation is usually considered symmetric,
601	   that is, if A is a variant of B then B should also be a variant of A.
602	   The format requires that the inverse of the variant be given
603	   explicitly to fully specify symmetric variant relations in the LGR.
604	   This has the beneficial side effect of making the symmetry explicit:

606	       <char cp="006F 0065">
607	           <var cp="00F6"/>
608	       </char>

610	   Variant relations are normally not only symmetric, but also
611	   transitive.  If A is a variant of B and B is a variant of C, then A
612	   is also a variant of C.  As with symmetry, these transitive relations
613	   are spelled out explicitly in the LGR.
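
	   As a purely illustrative sketch (the code points are arbitrary and
	   do not reflect any real policy), three code points that are all
	   variants of each other would require six explicit mappings:

	       <char cp="0061">
	           <var cp="0062"/>
	           <var cp="0063"/>
	       </char>
	       <char cp="0062">
	           <var cp="0061"/>
	           <var cp="0063"/>
	       </char>
	       <char cp="0063">
	           <var cp="0061"/>
	           <var cp="0062"/>
	       </char>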

615	   All variant mappings are unique.  For a given "char" element all
616	   "var" elements MUST have a unique combination of "cp", "when" and
617	   "not-when" attributes.  It is RECOMMENDED to list the "var" elements
618	   in ascending order of their target code point sequence.  (For "when"
619	   and "not-when" attributes, see Section 4.2.5.)

621	4.2.2.  The type attribute

623	   Variants may be tagged with an optional "type" attribute.  The value
624	   of the "type" attribute may be any non-empty value not starting with
625	   an underscore and not containing spaces.  This value is used to
626	   resolve the disposition of any variant labels created using a given
627	   variant.  (See Section 6.2.)

629	   By default, the values of the "type" attribute directly describe the
630	   target policy status (disposition) for a variant label that was
631	   generated using a particular variant, with any variant label being
632	   assigned a disposition corresponding to the most restrictive variant
633	   type.  Several conventional disposition values are predefined below
634	   in Section 6.  Whenever these values can represent the desired
635	   policy, they SHOULD be used.

637	       <char cp="767C">
638	           <var cp="53D1" type="allocate"/>
639	           <var cp="5F42" type="block"/>
640	           <var cp="9AEA" type="block"/>
641	           <var cp="9AEE" type="block"/>
642	       </char>

644	   By default, if a variant label contains any instance of one of the
645	   variants of type "block", the label would be blocked, but if it
646	   contained only instances of variants to be allocated it could be
647	   allocated.  See the discussion about implied actions in Section 6.6.

649	   The XML format for the LGR makes the relation between the values of
650	   the "type" attribute on variants and the resulting disposition of
651	   variant labels fully explicit.  See the discussion in Section 6.2.
652	   Making this relation explicit allows a generalization of the "type"
653	   attribute from directly reflecting dispositions to a more
654	   differentiated intermediate value that is used in the resolution of
655	   label disposition.  Instead of the default action of applying the
656	   most restrictive disposition to the entire label, such a generalized
657	   resolution can be used to achieve additional goals, such as limiting
658	   the set of allocated variant labels or implementing other policies
659	   found in existing LGRs (see, for example, Appendix B).

661	   Because variant mappings MUST be unique, it is not possible to define
662	   the same variant for the same "char" element with different type
663	   attributes (see however Section 4.2.5).

665	4.2.3.  Null Variants

667	   A null variant is a variant string that maps to no code point.  This
668	   is used when a particular code point sequence is considered
669	   discretionary in the context of a whole label.  To specify a null
670	   variant, use an empty "cp" attribute.  For example, to map a string
671	   with a ZERO WIDTH NON-JOINER (U+200C) to the same string without the
672	   ZERO WIDTH NON-JOINER:

674	       <char cp="200C">
675	           <var cp=""/>
676	       </char>

678	   This is useful in expressing the intent that some code points in a
679	   label are to be mapped away when generating a canonical variant of
680	   the label.  However, in tables that are designed to have symmetric
681	   variant mappings, this could lead to combinatorial explosion, if not
682	   handled carefully.

684	   The symmetric form of a null variant is expressed as follows:

686	       <char cp="">
687	           <var cp="200C" type="invalid" />
688	       </char>

690	   A "char" element with an empty "cp" attribute MUST specify at least
691	   one variant mapping.  It is strongly RECOMMENDED to use a type of
692	   "invalid" or equivalent when defining variant mappings from null
693	   sequences, so that variant mappings from null sequences are removed
694	   in variant label generation (see Section 4.2.2).

696	4.2.4.  Variants with Reflexive Mapping

698	   At first sight there seems to be no call for adding variant mappings
699	   for which source and target code points are the same, that is for
700	   which the mapping is reflexive, or, in other words, an identity
701	   mapping.  Yet such reflexive mappings occur frequently in LGRs that
702	   follow [RFC3743].

704	   Adding a "var" element allows both a type and a reference id to be
705	   specified for it.  While the reference id is not used in processing,
706	   the type of the variant can be used to trigger actions.  In permuting
707	   the label to generate all possible variants, the type associated with
708	   a reflexive variant mapping is applied to any of the permuted labels
709	   containing the original code point.

711	   In the following example, the code point U+3473 exists both as a
712	   variant of U+3447 and as a variant of itself (reflexive mapping).

714	   Assuming an original label of "U+3473 U+3447", the permuted variant
715	   "U+3473 U+3473" would consist of the reflexive variant of U+3473
716	   followed by a variant of U+3447.  Accordingly, the types for both of
717	   the variant mappings used to generate that particular permutation
718	   would have the value "preferred" given the following definitions of
719	   variant mappings:

721	       <char cp="3447" ref="0">
722	           <var cp="3473" type="preferred" ref="1 3" />
723	       </char>
724	       <char cp="3473" ref="0">
725	           <var cp="3447" type="block" ref="1 3" />
726	           <var cp="3473" type="preferred" ref="0" />
727	       </char>

729	   Having established the variant types in this way, a set of actions
730	   could be defined that return a disposition of "allocate" or
731	   "activate" for a label consisting exclusively of variants with type
732	   "preferred" for example.  (For details on how to define actions based
733	   on variant types see Section 6.2.1.)

735	   In general, using reflexive variant mappings in this manner makes it
736	   possible to calculate disposition values using a uniform approach for
737	   all labels, whether they consist of mapped variant code points,
738	   original code points, or a mixture of both.  In particular, the
739	   dispositions for two otherwise identical labels may differ based on
740	   which variant mappings were executed in order to generate each of
741	   them.  (For details on how to generate variants and evaluate
742	   dispositions, see Section 7.)

744	   Another useful convention that uses reflexive variants is described
745	   below in Section 6.2.1.

747	4.2.5.  Conditional Variants

749	   Fundamentally, variants are mappings between two sequences of code
750	   points.  However, in some instances for a variant relationship to
751	   exist, some context external to the code point sequence must be
752	   considered.  For example, a positional context may determine whether
753	   two code point sequences are variants of each other.

755	   One example is Arabic, whose code points can have different
756	   forms based on position, with some code points sharing forms, thus
757	   making them variants in the positions corresponding to those forms.
758	   Such positional context cannot be solely derived from the code point
759	   by itself, as the code point would be the same for the various forms.

761	   To specify a conditional variant relationship the optional "when"
762	   attribute is used.  The variant relationship exists when the
763	   condition in the "when" attribute is satisfied.  A "not-when"
764	   attribute may be used for conditions that must not be satisfied.  The
765	   value of each "when" or "not-when" attribute is a parameterized
766	   context rule as described below in Section 5.4.

768	   As described in Section 4.1, a "when" or "not-when" attribute may
769	   also be specified for any "char" element in the data section to
770	   define required or prohibited contextual conditions under which a
771	   code point is valid.

773	   Assuming the "rules" element contains suitably defined rules for
774	   "arabic-isolated" and "arabic-final", the following example shows how
775	   to mark ARABIC LETTER ALEF WITH WAVY HAMZA BELOW (U+0673) as a
776	   variant of ARABIC LETTER ALEF WITH HAMZA BELOW (U+0625), but only
777	   when it appears in its isolated or final forms:

779	       <char cp="0625">
780	           <var cp="0673" when="arabic-isolated"/>
781	           <var cp="0673" when="arabic-final"/>
782	       </char>

784	   Only a single "when" or "not-when" attribute can be applied to any
785	   "var" element; however, multiple "var" elements using the same
786	   mapping, but different "when" or "not-when" attributes may be
787	   specified.  In such a case care must be taken to ensure that for each
788	   context at most one of the context rules for the "when" or "not-when"
789	   attributes is satisfied; otherwise the results are undefined.

791	   Two contexts may be complementary, as in the following example, which
792	   shows ARABIC LETTER TEH MARBUTA (U+0629) as a variant of ARABIC
793	   LETTER HEH (U+0647), but with two different types.

795	       <char cp="0647" >
796	         <var cp="0629" not-when="arabic-final" type="blocked" />
797	         <var cp="0629" when="arabic-final" type="allocatable" />
798	       </char>

800	   The intent is that in final position a label that uses U+0629 instead
801	   of U+0647 should be considered essentially the same label and
802	   therefore allocatable to the same entity, while the same substitution
803	   in non-final context leads to labels that are different, but
804	   considered confusable so that either one, but not both, should be
805	   delegatable.

807	   For symmetry, the reverse mappings must exist, and must agree in
808	   their "when" or "not-when" attributes.  However, symmetry does not
809	   apply to the other attributes.  For example, these are the actual
810	   reverse mappings for the above:

812	       <char cp="0629" >
813	         <var cp="0647" not-when="arabic-final" type="allocatable" />
814	         <var cp="0647" when="arabic-final" type="allocatable" />
815	       </char>

817	   Here, both variants have the same "type" attribute.  While it is
818	   tempting to recognize that in this instance the "when" and "not-when"
819	   attributes are complementary and therefore between them cover every
820	   single possible context, it is STRONGLY RECOMMENDED to use the format
821	   shown in the example that makes the symmetry easily verifiable by
822	   parsers and tools.  (The same applies to entries created for
823	   transitivity.)

825	   Arabic is an example of a script for which such conditional variants
826	   have been established in at least some existing LGRs.  The mechanism
827	   defined here supports other forms of conditional variants that may
828	   be required by other scripts.

830	4.3.  Annotations

832	   Two attributes, the "ref" and "comment" attributes, can be used to
833	   annotate individual elements in the LGR.  They are ignored in
834	   machine-processing of the LGR.  The "ref" attribute is intended for
835	   formal annotations, and the "comment" attribute for free form
836	   annotation.  The latter can be applied more widely.

838	4.3.1.  The ref Attribute

840	   Reference information may optionally be specified by a "ref"
841	   attribute, consisting of a space delimited sequence of reference
842	   identifiers.

844	       <char cp="522A" ref="0">
845	           <var cp="5220" ref="2 3"/>
847	       </char>

849	   This facility is typically used to give source information for code
850	   points or variant relations.  This information is ignored when
851	   machine-processing an LGR.  If applied to a range the "ref" attribute
852	   applies to every code point in the range.  All reference identifiers
853	   MUST be from the set declared in the "references" element (see
854	   Section 3.3.8).  It is an error to repeat a reference identifier in
855	   the same "ref" attribute.  It is RECOMMENDED that identifiers be
856	   listed in ascending order.

858	   In addition to "char", "range" and "var" elements in the data
859	   section, a "ref" attribute may be present for these elements that
860	   appear in the rules section described below: actions, literals
861	   ("char" inside a rule), as well as for definitions of rules and
862	   classes, but not for named references using the "by-ref" attribute
863	   defined below.  For these elements, the "by-ref" and "ref"
864	   attributes are mutually exclusive.  None of the elements in the
865	   metadata take a "ref" attribute; instead use the "description"
866	   element there.

868	4.3.2.  The comment Attribute

870	   Any "char", "range" or "var" element in the data section may
871	   contain an optional "comment" attribute.  The contents of a "comment"
872	   attribute are free-form plain text.  Comments are ignored in machine
873	   processing of the table.  Comment attributes may also be placed on
874	   all elements in the "rules" section of the document, including
875	   actions, match operators such as literals ("char"), and definitions
876	   of classes and rules, but not on child elements of the "class"
877	   element.  Finally, in the metadata, only the "version" and
878	   "reference" elements may have "comment" attributes (to match the
879	   syntax in [RFC3743]).

881	4.4.  Code Point Tagging

883	   Typically, LGRs are used to explicitly designate allowable code
884	   points, where any label that contains a code point not explicitly
885	   listed in the LGR is considered an ineligible label according to the
886	   ruleset.

888	   For more complex registry rules, there may be a need to discern one
889	   or more subsets of code points.  This can be accomplished by applying
890	   an optional "tag" attribute to "char" or "range" elements that are
891	   child elements of the "data" element.  By collecting code points that
892	   share the same tag value, character classes may be defined (see
893	   Section 5.2.2) which can then be used in whole label evaluation rules
894	   (see Section 5.3.2).

896	   Each "tag" attribute may contain multiple values separated by white
897	   space.  A tag value is an identifier, which may also include certain
898	   punctuation marks, such as colon.  Formally, it MUST correspond to
899	   the XML 1.0 Nmtoken (Name token) production.  It is an error to
900	   duplicate a value within the same "tag" attribute.  A "tag" attribute
901	   for a "range" element applies to all code points in the range.
902	   Because code point sequences are not proper members of a set of code
903	   points, a "tag" attribute MUST NOT be present in a "char" element
904	   defining a code point sequence.
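
	   A minimal sketch of these constraints follows.  The tag values are
	   hypothetical, and the "first-cp" and "last-cp" attributes are
	   assumed to be those defined for the "range" element in Section 4.1:

	       <char cp="00E9" tag="letter accented"/>
	       <range first-cp="0030" last-cp="0039" tag="digit"/>
	       <char cp="006C 00B7 006C"/>

	   The tag on the range applies to each of U+0030 through U+0039,
	   while the sequence in the last line carries no "tag" attribute.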

906	5.  Whole Label and Context Evaluation

908	5.1.  Basic Concepts

910	   The code points in a label sometimes need to satisfy context-based
911	   rules, for example for the label to be considered valid, or to
912	   satisfy the context for a variant mapping (see the description of the
913	   "when" attribute in Section 5.4).

915	   A Whole Label Evaluation rule (WLE) is applied to the whole label.
916	   It is used to validate both original labels and variant labels
917	   computed from them using a permutation over all applicable variant
918	   mappings.  A conditional context rule is a specialized form of WLE
919	   specific to the context around a single code point or code point
920	   sequence.  For example, if a rule is referenced in the "when"
921	   attribute of a variant mapping it is used to describe the conditional
922	   context under which the particular variant mapping is defined to
923	   exist.

925	   Each rule is defined in a "rule" element.  A rule may contain the
926	   following as child elements:

928	   o  literal code points or code point sequences

930	   o  character classes, which define sets of code points to be used for
931	      context comparisons

933	   o  context operators, which define when character classes and
934	      literals may appear

936	   o  nested rules, whether defined in place or invoked by reference

938	   Collectively, these are called match operators and are listed in
939	   Section 5.3.2.

941	5.2.  Character Classes

943	   Character classes are sets of characters that often share a
944	   particular property.  While they function like sets in every way,
945	   even supporting the usual set operators, they are called character
946	   classes here in a nod to the use of that term in regular expression
947	   syntax.  (This also avoids confusion with the term "character set" in
948	   the sense of character encoding.)

950	   Character classes (or sets) can be specified in several ways:

952	   o  by defining the set via matching a tag in the code point data.
953	      All characters with the same "tag" attribute are part of the same
954	      class;

956	   o  by referencing one of the Unicode character properties defined in
957	      the Unicode Character Database [UAX42];

959	   o  by explicitly listing all the code points in the class; or

961	   o  by defining the class as a set combination of any number of other
962	      classes.

964	5.2.1.  Declaring and Invoking Named Classes

966	   A character class has an optional "name" attribute, consisting of a
967	   single identifier not containing spaces.  All names for classes must
968	   be unique.  If the "name" attribute is omitted, the class is
969	   anonymous and exists only inside the rule or combined class where it
970	   is defined.  A named character class is defined independently and can
971	   be referenced by name from within any rules or as part of other
972	   character class definitions.

974	       <class name="example" comment="an example class definition">
975	           <char cp="0061" />
976	           <char cp="4E00" />
977	       </class>
978	       ...
979	       <rule>
980	           <class by-ref="example" />
981	       </rule>

983	   An empty "class" element with a "by-ref" attribute is a reference to
984	   an existing named class.  The "by-ref" attribute cannot be used in
985	   the same "class" element with any of these attributes: "name",
986	   "from-tag", "property" or "ref".  The "name" attribute MUST be
987	   present if and only if the class is a direct child element of the
988	   "rules" element.  It is an error to reference a named class for
989	   which the definition has not been seen.

991	5.2.2.  Tag-based Classes

993	   The "char" or "range" elements that are child elements of the "data"
994	   element may contain a "tag" attribute that consists of one or more
995	   space separated tag values, for example:

997	       <char cp="0061" tag="letter lower"/>
998	       <char cp="4E00" tag="letter"/>

1000	   This defines two tags for use with code point U+0061, the tag
1001	   "letter" and the tag "lower".  Use

1003	       <class name="letter" from-tag="letter" />
1004	       <class name="lower" from-tag="lower" />

1006	   to define two named character classes, "letter" and "lower",
1007	   containing all code points with the respective tags, the first with
1008	   0061 and 4E00 as elements and the latter with 0061, but not 4E00 as
1009	   an element.  The "name" attribute may be omitted for an anonymous in-
1010	   place definition of a nested, tag-based class.
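
	   For instance, an anonymous tag-based class might be placed
	   directly inside a rule, as in this sketch, which assumes that the
	   tag "letter" is used in the "data" element as shown above:

	       <rule>
	           <class from-tag="letter" />
	       </rule>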

1012	   Tag values are typically identifiers, with the addition of a few
1013	   punctuation symbols, such as colon.  Formally they MUST correspond to
1014	   the XML 1.0 Nmtoken (Name token) production.  While a "tag" attribute
1015	   may contain a list of tag values, the "from-tag" attribute always
1016	   contains a single tag value.

1018	   If the document contains no "char" or "range" elements with a
1019	   corresponding tag, the character class represents the empty set.
1020	   This is valid, to allow a common "rules" element to be shared across
1021	   files.  However, it is RECOMMENDED that implementations allow for a
1022	   warning to ensure that referring to an undefined tag in this way is
1023	   intentional.

1025	5.2.3.  Unicode Property-based Classes

1027	   A class is defined in terms of Unicode properties by giving the
1028	   Unicode property alias and the property value or property value
1029	   alias, separated by a colon.

1031	       <class name="virama" property="ccc:9" />

1033	   The example above selects all code points for which the Unicode
1034	   canonical combining class (ccc) value is 9.  This value of the ccc is
1035	   assigned to all code points that encode viramas.  The string "ccc" is
1036	   the short-alias for the canonical combining class, as defined in the
1037	   Unicode Character Database [UAX42].

1039	   Unicode properties may, in principle, change between versions of the
1040	   Unicode Standard.  However, the values assigned for a given version
1041	   are fixed.  If Unicode Properties are used, a Unicode version MUST be
1042	   declared in the "unicode-version" element in the header.  (Note: some
1043	   Unicode properties are by definition stable across versions and do
1044	   not change once assigned (see [Unicode-Stability]).)

1046	   It is RECOMMENDED that all implementations processing LGR files
1047	   provide support for the following minimal set of Unicode properties:

1049	   o  General Category (gc)

1051	   o  Script (sc)

1053	   o  Canonical Combining Class (ccc)

1055	   o  Bidi Class (bc)

1057	   o  Arabic Joining Type (jt)

1059	   o  Indic Syllabic Category (InSC)

1061	   o  Deprecated (Dep)

1063	   The short name for each property is given in parentheses.
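
	   As a sketch, classes based on other properties from this list
	   could be declared as follows.  The class names are hypothetical;
	   "Mn" and "Arab" are the Unicode short aliases for the General
	   Category value Nonspacing_Mark and the Script value Arabic:

	       <class name="nonspacing-mark" property="gc:Mn" />
	       <class name="arabic-script" property="sc:Arab" />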

1065	   If a program that is using an LGR to determine the validity of a
1066	   label encounters a property that it does not support, it MUST abort
1067	   with an error.

1069	5.2.4.  Explicitly Declared Classes

1071	   A class of code points may also be declared by listing the code
1072	   points that are members of the class.  This is useful when tagging
1073	   cannot be used because code points are not listed individually as
1074	   part of the eligible set of code points for the given LGR, for
1075	   example because they only occur in code point sequences.

1077	   To define a class in terms of an explicit list of code points use a
1078	   space separated list of hexadecimal code point values:

1080	       <class name="abcd">0061 0062 0063 0064</class>

1082	   This defines a class named "abcd" containing the code points for
1083	   characters "a", "b", "c" and "d".  The ordering of the code points is
1084	   not material, but it is RECOMMENDED to list them in ascending order.

1086	   Code point ranges are represented by a start and end value separated
1087	   by a hyphen.  The following declaration is equivalent to the
1088	   preceding:

1090	       <class name="abcd">0061-0064</class>

1092	   Range and code point declarations can be freely intermixed:

1094	       <class name="abcd">0061 0062-0063 0064</class>

1096	5.2.5.  Combined Classes

1098	   Classes may be combined using operators for set complement, union,
1099	   intersection, difference and symmetric difference (exclusive-or).
1100	   Because classes fundamentally function like sets, the union of
1101	   several character classes is itself a class, for example.

1103	   +-------------------+----------------------------------------------+
1104	   | Logical Operation | Example                                      |
1105	   +-------------------+----------------------------------------------+
1106	   | Complement        | <complement>                                 |
	   |                   |    <class by-ref="xxx"/>                     |
	   |                   | </complement>                                |
1107	   +-------------------+----------------------------------------------+
1108	   | Union             | <union>                                      |
1109	   |                   |    <class by-ref="class-1"/>                 |
1110	   |                   |    <class by-ref="class-2"/>                 |
1111	   |                   |    <class by-ref="class-3"/>                 |
1112	   |                   | </union>                                     |
1113	   +-------------------+----------------------------------------------+
1114	   | Intersection      | <intersection>                               |
1115	   |                   |    <class by-ref="class-1"/>                 |
1116	   |                   |    <class by-ref="class-2"/>                 |
1117	   |                   | </intersection>                              |
1118	   +-------------------+----------------------------------------------+
1119	   | Difference        | <difference>                                 |
1120	   |                   |    <class by-ref="class-1"/>                 |
1121	   |                   |    <class by-ref="class-2"/>                 |
1122	   |                   | </difference>                                |
1123	   +-------------------+----------------------------------------------+
1124	   | Symmetric         | <symmetric-difference>                       |
1125	   | Difference        |    <class by-ref="class-1"/>                 |
1126	   |                   |    <class by-ref="class-2"/>                 |
1127	   |                   | </symmetric-difference>                      |
1128	   +-------------------+----------------------------------------------+

1130	                               Set Operators

1132	   The elements from this table may be arbitrarily nested inside each
1133	   other, subject to the following restriction: a "complement" element
1134	   MUST contain precisely one "class" or one of the operator elements,
1135	   while an "intersection", "symmetric-difference" or "difference"
1136	   element MUST contain precisely two, and a "union" element MUST
1137	   contain two or more of these elements.
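
	   For example, the following sketch nests a "union" and a
	   "complement" inside a "difference", respecting the element counts
	   given above.  The class names are placeholders:

	       <difference>
	           <union>
	               <class by-ref="class-1"/>
	               <class by-ref="class-2"/>
	           </union>
	           <complement>
	               <class by-ref="class-3"/>
	           </complement>
	       </difference>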

1139	   An anonymous combined class can be defined directly inside a rule or
1140	   any of the match operator elements that allow child elements (see
1141	   Section 5.3.2) by using the set combination as the outer element.

1143	       <rule>
1144	           <union>
1145	               <class by-ref="xxx"/>
1146	               <class by-ref="yyy"/>
1147	           </union>
1148	       </rule>

1150	   The example shows the definition of an anonymous combined class that
1151	   represents the union of classes "xxx" and "yyy".  There is no need to
1152	   wrap this union inside another "class" element, and, in fact, set
1153	   combination elements MUST NOT be nested inside a "class" element.

1155	   Lastly, to create a named combined class that can be referenced in
1156	   other classes or in rules as <class by-ref="xxxyyy"/>, add a "name"
1157	   attribute to the set combination element, for example <union
1158	   name="xxxyyy" /> and place it at the top level immediately below the
1159	   "rules" element (see Section 5.2.1).

1161	       <rules>
1162	           <union name="xxxyyy">
1163	               <class by-ref="xxx"/>
1164	               <class by-ref="yyy"/>
1165	           </union>
1166	           . . .
1167	       </rules>

1169	   Because (as for ordinary sets) a combination of classes is itself a
1170	   class, no matter by what combinations of set operators a combined
1171	   class is created, a reference to it always uses the "class" element
1172	   as described in Section 5.2.1.  That is, a named class is always
1173	   referenced via an empty "class" element using the "by-ref" attribute
1174	   containing the name of the class to be referenced.

1176	5.3.  Whole Label and Context Rules

1178	   Each rule consists of a series of matching operators that must be
1179	   satisfied in order to determine whether a label meets a given
1180	   condition.  Rules may reference other rules or character classes
1181	   defined elsewhere in the table.

1183	5.3.1.  The rule Element

1185	   A matching rule is defined by a "rule" element, each child element
1186	   of which is one of the match operators from Section 5.3.2.  In
1187	   evaluating a rule, each child element is matched in order.  Rule
1188	   elements may be nested.

1190	   Rules may optionally be named using a "name" attribute containing a
1191	   single identifier string with no spaces.  A named rule may be
1192	   incorporated into another rule by reference.  If the "name" attribute
1193	   is omitted, the rule is anonymous and may not be incorporated by
1194	   reference into another rule or referenced by an action or "when"
1195	   attribute.

1197	   A simple rule to match a label where all characters are members of
1198	   the class "preferred":

1200	       <rule name="preferred">
1201	           <start />
1202	           <class by-ref="preferred" count="1+"/>
1203	           <end />
1204	       </rule>

1206	   Rules are paired with explicit and implied actions, triggering these
1207	   actions when a rule matches a label.  For example, a simple explicit
1208	   action for the rule shown above would be:

1210	       <action disp="allocate" match="preferred" />

1212	   This has the effect of setting the policy disposition for a label
1213	   made up entirely of "preferred" code points to "allocate".  Explicit
1214	   actions are further discussed in Section 6 and the use of rules in
1215	   conditional contexts for implied actions is discussed in
1216	   Section 4.2.5 and Section 6.5.

1218	5.3.2.  The Match Operators

1220	   The child elements of a rule are a series of match operators, which
1221	   are listed here by type and name and with a basic example or two.

1223	   +------------+-------------+------------------------------------+
1224	   | Type       | Operator    | Examples                           |
1225	   +------------+-------------+------------------------------------+
1226	   | logical    | any         | <any />                            |
1227	   |            +-------------+------------------------------------+
1228	   |            | choice      | <choice>                           |
1229	   |            |             |  <rule by-ref="alternative1"/>     |
1230	   |            |             |  <rule by-ref="alternative2"/>     |
1231	   |            |             | </choice>                          |
1232	   +--------------------------+------------------------------------+
1233	   | positional | start       | <start />                          |
1234	   |            +-------------+------------------------------------+
1235	   |            | end         | <end />                            |
1236	   +--------------------------+------------------------------------+
1237	   | literal    | char        | <char cp="0061 0062 0063" />       |
1238	   +--------------------------+------------------------------------+
1239	   | set        | class       | <class by-ref="class1" />          |
1240	   |            |             | <class>0061 0064-0065</class>      |
1241	   +--------------------------+------------------------------------+
1242	   | group      | rule        | <rule by-ref="rule1" />            |
1243	   |            |             | <rule><any /></rule>               |
1244	   +--------------------------+------------------------------------+
1245	   | contextual | anchor      | <anchor />                         |
1246	   |            +-------------+------------------------------------+
1247	   |            | look-ahead  | <look-ahead><any /></look-ahead>   |
1248	   |            +-------------+------------------------------------+
1249	   |            | look-behind | <look-behind><any /></look-behind> |
1250	   +--------------------------+------------------------------------+

1252	                              Match Operators

1254	   Any element defining an anonymous class can be used as a match
1255	   operator, including any of the set combination operators (see
1256	   Section 5.2.5) as well as references to named classes.

1258	   All match operators shown as empty elements in the Examples column of
1259	   the table above do not support child elements of their own; otherwise
1260	   match operators may be nested.  In particular, anonymous "rule"
1261	   elements can be used for grouping.

1263	5.3.3.  The count Attribute

1265	   The optional "count" attribute specifies the minimum required or
1266	   maximum permitted number of times a match operator is used to match
1267	   input.  If the "count" attribute is

1269	   n    the match operator matches the input exactly n times, where n is
1270	        1 or greater.

1272	   n+   the match operator matches the input at least n times, where n
1273	        is 0 or greater.

1275	   n:m  the match operator matches the input at least n times where n is
1276	        0 or greater, but matches the input up to m times in total,
1277	        where m > n.  If m = n and n > 0, the match operator matches the
1278	        input exactly n times.

1280	   If there is no "count" attribute, the match operator matches the
1281	   input exactly once.

1283	   In matching, greedy evaluation is used in the sense defined for
1284	   regular expressions: beyond the required number of times, the input
1285	   is matched as many times as possible, but not so often as to prevent
1286	   a match of the remainder of the rule.
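
As an informal illustration of the "count" forms and of greedy evaluation, the following Python sketch translates a "count" value into the equivalent greedy regular-expression quantifier.  The function name count_to_quantifier and the sample pattern are assumptions made for this illustration; they are not part of the LGR format.

```python
import re

def count_to_quantifier(count=None):
    """Translate an LGR "count" value into a greedy regex quantifier."""
    if count is None:
        return ""                          # no attribute: match exactly once
    if count.endswith("+"):
        return "{%d,}" % int(count[:-1])   # "n+": at least n times
    if ":" in count:
        low, high = (int(part) for part in count.split(":"))
        if high < low:
            raise ValueError("m must not be smaller than n")
        return "{%d,%d}" % (low, high)     # "n:m": between n and m times
    n = int(count)
    if n < 1:
        raise ValueError("a bare count must be 1 or greater")
    return "{%d}" % n                      # "n": exactly n times

# Hypothetical rule: one or more letters followed by up to two digits.
pattern = re.compile("[a-z]" + count_to_quantifier("1+") +
                     "[0-9]" + count_to_quantifier("0:2"))
```

Because the quantifiers are greedy but still backtrack, a pattern such as "a" with count "1+" followed by a literal "a" matches the input "aaa": the first operator gives back one occurrence so that the remainder of the rule can match.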

1288	   The "count" attribute MUST NOT be applied to match operators of type
1289	   "start", "end", "anchor", "look-ahead" and "look-behind" or to any
1290	   operators, such as "rule" or "choice", that contain them, whether the
1291	   latter are declared in place or used by reference.  The "count"
1292	   attribute may be applied to "class" and "rule" elements only if they
1293	   do not have a "name" attribute, that is, to anonymous rules and
1294	   classes or any invocation of predefined rules or classes by
1295	   reference.

1297	   The optional "count" attribute MAY be applied to match operators of
1298	   type "any", "char" and "class", as well as to match operators
1299	   "choice" and "rule", as long as they contain none of the operators
1300	   "start", "end", "anchor", "look-ahead" and "look-behind" as direct or
1301	   indirect child elements.  The same requirement applies recursively to
1302	   any "rule" element referenced inside a "choice" or "rule" with a
1303	   "count" attribute.  The "count" attribute cannot appear in the same
1304	   element as a "name" attribute.

1306	5.3.4.  The name and by-ref Attributes

1308	   Like classes (see Section 5.2.1), rules declared as immediate child
1309	   elements of the "rules" element MUST be named using a unique "name"
1310	   attribute, and all other instances MUST NOT be named.  Anonymous
1311	   rules and classes or references to named rules and classes can be
1312	   nested inside other match operators by reference.

1314	   To reference a named rule or class inside a rule or match operator,
1315	   use a "rule" or "class" element with an optional "by-ref" attribute
1316	   containing the name of the referenced element.  It is an error to
1317	   reference a rule or class for which the definition has not been seen.
1318	   The "by-ref" attribute cannot appear in the same element as the
1319	   "name" attribute, or in an element that has any child elements.

1321	   Here's an example of a rule requiring that all labels be letters
1322	   (optionally followed by combining marks) and possibly digits.  The
1323	   example shows rules and classes referenced by name.

1325	       <class name="letter" property="gc:L"/>
1326	       <class name="combining-mark" property="gc:M"/>
1327	       <class name="digit" property="gc:Nd"/>
1328	       <rule name="letter-grapheme">
1329	          <class by-ref="letter" count="1+"/>
1330	          <class by-ref="combining-mark" count="0+"/>
1331	       </rule>

1333	5.3.5.  The choice Element

1335	   The "choice" element is used to represent a list of two or more
1336	   alternatives:

1338	       <rule name="ldh">
1339	          <choice count="1+">
1340	              <class by-ref="letters"/>
1341	              <class by-ref="digits"/>
1342	              <char cp="002D" comment="literal HYPHEN"/>
1343	          </choice>
1344	       </rule>

1346	   Each child element of a "choice" represents one alternative.  The
1347	   first matching alternative determines the match for the "choice"
1348	   element.  To express a choice where an alternative itself consists of
1349	   a sequence of elements, the sequence must be wrapped in an anonymous
1350	   rule.

1352	5.3.6.  Literal Code Point Sequences

1354	   A literal code point sequence matches a single code point or a
1355	   sequence.  It is defined by a "char" element, with the code point or
1356	   sequence to be matched given by the "cp" attribute.  When used as a
1357	   literal, a "char" element may contain a "count" in addition to the
1358	   "cp" attribute and optional "comment" or "ref" attributes.  No other
1359	   attributes or child elements are permitted.

1361	5.3.7.  The any Element

1363	   The "any" element matches any single code point.  It may have a
1364	   "count" attribute.  For an example, see Section 5.3.9.

1366	   Unlike a literal, the "any" element may not have a "ref" attribute.

1368	5.3.8.  The start and end Elements

1370	   To match the beginning or end of a label, use the "start" or "end"
1371	   element.  An empty label would match this rule:

1373	       <rule name="empty-label">
1374	           <start/>
1375	           <end/>
1376	       </rule>

1378	   Conceptually, Whole Label Evaluation Rules evaluate the label as a
1379	   whole, but in practice, many rules do not actually need to be
1380	   specified to match the entire label.  For example, to express a
1381	   requirement of not starting a label with a digit, a rule needs to
1382	   describe only the initial part of a label.

1384	   This example uses the previously defined rules, together with the
1385	   "start" and "end" elements, to define a rule that requires that an
1386	   entire label be well-formed.  For this example, that means that it
1387	   must start with a letter and contain no leading digits or combining
1388	   marks, nor combining marks placed on digits.

1390	        <rule name="leading-letter" >
1391	          <start />
1392	          <rule by-ref="letter-grapheme" count="1"/>
1393	          <choice count="0+">
1394	            <rule by-ref="letter-grapheme" count="0+"/>
1395	            <class by-ref="digit" count="0+"/>
1396	          </choice>
1397	          <end />
1398	        </rule>

1400	   Each "start" or "end" element occurs at most once in a rule, except
1401	   if nested inside a "choice" element in such a way that in matching
1402	   each alternative at most one occurrence of each is encountered.
1403	   Otherwise, the result is an error; as is any case where a "start" or
1404	   "end" element is not encountered as first or last element to be
1405	   matched, respectively, in matching a rule.  Start and end elements do
1406	   not have a "count" or any other attribute.  It is an error for any
1407	   match operator enclosing a nested "start" or "end" element to have a
1408	   "count" attribute.

1410	5.3.9.  Example rule from IDNA2008

1412	   This is an example of the whole label evaluation rule from [RFC5892]
1413	   forbidding the mixture of the Arabic-Indic and extended Arabic-Indic
1414	   digits in the same label.  The example also demonstrates several
1415	   instances of the use of anonymous rules for grouping.

1417	       <data>
1418	          <range first-cp="0660" last-cp="0669" not-when="mixed-digits"
1419	                 tag="arabic-indic-digits" />
1420	          <range first-cp="06F0" last-cp="06F9" not-when="mixed-digits"
1421	                 tag="extended-arabic-indic-digits" />
1422	       </data>
1423	       <rules>
1424	          <rule name="mixed-digits">
1425	             <choice>
1426	               <rule>
1427	                   <class from-tag="arabic-indic-digits"/>
1428	                   <any count="0+"/>
1429	                   <class from-tag="extended-arabic-indic-digits"/>
1430	                </rule>
1431	                <rule>
1432	                   <class from-tag="extended-arabic-indic-digits"/>
1433	                   <any count="0+"/>
1434	                   <class from-tag="arabic-indic-digits"/>
1435	                </rule>
1436	             </choice>
1437	          </rule>
1438	       </rules>

1440	   The effect of this example is that a code point from either of the
1441	   two digit ranges is invalid in any label matching the "mixed-digits"
1442	   rule, that is, any time a code point from the other range is also
1443	   present.  Note that this is not the same as invalidating the
1444	   definition of the "range" elements.
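
The net effect of the "mixed-digits" rule can be sketched informally in Python.  Because the rule is not anchored with "start" or "end", it matches whenever a code point from one range appears anywhere before a code point from the other, which is equivalent to both ranges being represented in the label.  The function name and constants below are assumptions made for this illustration.

```python
# Code point ranges taken from the "range" elements in the example.
ARABIC_INDIC = range(0x0660, 0x066A)           # U+0660..U+0669
EXTENDED_ARABIC_INDIC = range(0x06F0, 0x06FA)  # U+06F0..U+06F9

def matches_mixed_digits(label):
    """Return True if the label contains digits from both ranges."""
    has_arabic_indic = any(ord(ch) in ARABIC_INDIC for ch in label)
    has_extended = any(ord(ch) in EXTENDED_ARABIC_INDIC for ch in label)
    return has_arabic_indic and has_extended
```

A label for which this function returns True triggers the "not-when" condition on every digit from either range, so the label is invalid.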

1446	5.4.  Parameterized Context or When Rules

1448	   A special type of rule provides a context for evaluating the validity
1449	   of a code point or variant mapping.  This rule is invoked by the
1450	   "when" attribute described in Section 4.2.5.  An action implied by a
1451	   context rule always has a disposition of "invalid" whenever the rule
1452	   is not matched (see Section 6.5).  Conversely, a "not-when" attribute
1453	   results in a disposition of invalid whenever the rule is matched.

1455	5.4.1.  The anchor Element

1457	   Such parameterized context or "When Rules" may contain a special
1458	   place holder represented by an "anchor" element.  As each When Rule
1459	   is evaluated, the "anchor" element is replaced by a literal
1460	   corresponding to the "cp" attribute of the element containing the
1461	   "when" (or "not-when") attribute.  The match to the "anchor" element
1462	   must be at the same position in the label as the code point or
1463	   variant mapping triggering the When Rule.

1465	   For example, the Greek lower numeral sign is invalid if not
1466	   immediately preceding a character in the Greek script.  This is most
1467	   naturally addressed with a When Rule using look-ahead:

1469	       <char cp="0375" when="preceding-greek"/>
1470	       ...
1471	       <class name="greek-script" property="sc:Grek"/>
1472	       <rule name="preceding-greek">
1473	           <anchor/>
1474	           <look-ahead>
1475	               <class by-ref="greek-script"/>
1476	           </look-ahead>
1477	       </rule>

1479	   In evaluating this rule, the "anchor" element is treated as if it was
1480	   replaced by a literal

1482	       <char cp="0375"/>

1484	   but only the instance of U+0375 at the given position is evaluated.
1485	   If a label had two instances of U+0375 with the first one matching
1486	   the rule and the second not, then evaluating the When Rule MUST
1487	   succeed for the first and fail for the second instance.
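
The position-dependent evaluation can be sketched in Python as follows.  This is only an illustration: the function names are hypothetical, and membership in the Greek script (sc:Grek) is approximated by testing the Unicode character name, since the Python standard library does not expose the Script property.

```python
import unicodedata

LOWER_NUMERAL_SIGN = "\u0375"

def is_greek(ch):
    # Assumption: approximate sc:Grek by the Unicode character name.
    return unicodedata.name(ch, "").startswith("GREEK")

def preceding_greek(label, pos):
    """Evaluate the When Rule for the code point at index pos."""
    if label[pos] != LOWER_NUMERAL_SIGN:       # <anchor/> at this position
        return False
    # <look-ahead>: the next code point must be in the Greek script.
    return pos + 1 < len(label) and is_greek(label[pos + 1])

# Two instances of U+0375: the first precedes GREEK SMALL LETTER ALPHA,
# the second is the last code point of the label.
label = "\u0375\u03b1\u0375"
results = [preceding_greek(label, i)
           for i, ch in enumerate(label) if ch == LOWER_NUMERAL_SIGN]
```

Here the rule is evaluated once per occurrence of U+0375, and each evaluation considers only the context around that occurrence.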

1489	   Unlike other rules, When Rules containing an "anchor" element MUST
1490	   only be invoked via the "when" or "not-when" attributes on code
1491	   points or variants; otherwise their "anchor" elements cannot be
1492	   evaluated.  However, it is possible to invoke rules not containing an
1493	   "anchor" element from a "when" or "not-when" attribute.  (See
1494	   Section 5.4.3)

1496	5.4.2.  The look-behind and look-ahead Elements

1498	   Context rules use the "look-behind" and "look-ahead" elements to
1499	   define context before and after the code point sequence matched by
1500	   the "anchor" element.  If the "anchor" element is omitted, neither
1501	   the "look-behind" nor the "look-ahead" element may be present.

1503	   Here is an example of a rule that defines an "initial" context for an
1504	   Arabic code point:

1506	       <class name="transparent" property="jt:T"/>
1507	       <class name="right-joining" property="jt:R"/>
1508	       <class name="left-joining" property="jt:L"/>
1509	       <class name="dual-joining" property="jt:D"/>
1510	       <class name="non-joining" property="jt:U"/>
1511	       <rule name="Arabic-initial">
1512	         <look-behind>
1513	           <choice>
1514	             <start/>
1515	             <rule>
1516	               <class by-ref="transparent" count="0+"/>
1517	               <class by-ref="non-joining"/>
1518	             </rule>
1519	           </choice>
1520	         </look-behind>
1521	         <anchor/>
1522	         <look-ahead>
1523	           <class by-ref="transparent" count="0+" />
1524	           <choice>
1525	             <class by-ref="right-joining" />
1526	             <class by-ref="dual-joining" />
1527	           </choice>
1528	         </look-ahead>
1529	       </rule>

1531	   A "when rule" contains any combination of "look-behind", "anchor" and
1532	   "look-ahead" elements in that order.  Each of these elements occurs
1533	   at most once, except if nested inside a "choice" element in such a
1534	   way that in matching each alternative at most one occurrence of each
1535	   is encountered.  Otherwise, the result is undefined.  None of these
1536	   elements takes a "count" attribute, nor does any enclosing match
1537	   operator.  Otherwise, the result is undefined.  If a context rule
1538	   contains a "look-ahead" or "look-behind" element, it MUST contain an
1539	   "anchor" element.  If, because of a choice element, a required anchor
1540	   is not actually encountered, the results are undefined.

1542	5.4.3.  Omitting the anchor Element

1544	   If the "anchor" element is omitted, the evaluation of the context
1545	   rule is not tied to the position of the code point or sequence
1546	   associated with the "when" attribute.

1548	   According to [RFC5892], Katakana middle dot is invalid in any label
1549	   not containing at least one Japanese character anywhere in the label.
1550	   Because this requirement is independent of the position of the middle
1551	   dot, the rule does not require an "anchor" element.

1553	       <char cp="30FB" when="japanese-in-label"/>
1554	       <rule name="japanese-in-label">
1555	           <union>
1556	               <class property="sc:Hani"/>
1557	               <class property="sc:Kata"/>
1558	               <class property="sc:Hira"/>
1559	           </union>
1560	       </rule>

1562	   The Katakana middle dot is used only with Han, Katakana or Hiragana.
1563	   The corresponding When Rule requires that at least one code point in
1564	   the label is in one of these scripts, but the position of that code
1565	   point is independent of the location of the middle dot and no anchor
1566	   is therefore required.  (Note that the Katakana middle dot itself is
1567	   of script Common.)

1569	6.  The action Element

1571	   The purpose of a rule is to trigger a specific action.  Often, the
1572	   action simply results in blocking or invalidating a label that does
1573	   not match a rule.  An example of an action invalidating a label
1574	   because it does not match a rule named "leading-letter" is as
1575	   follows:

1577	      <action disp="invalid" not-match="leading-letter"/>

1579	   If an action is to be triggered on matching a rule, a "match"
1580	   attribute is used instead.  Actions are evaluated in the order that
1581	   they appear in the XML file.  Once an action is triggered by a label,
1582	   the disposition defined in the "disp" attribute is assigned to the
1583	   label and no other actions are evaluated for that label.

1585	   The goal of the Label Generation Rules is to identify all labels and
1586	   variant labels and to assign them disposition values.  These
1587	   dispositions are then fed into a further process that ultimately
1588	   implements all aspects of policy.  To allow this specification to be
1589	   used with the widest range of policies, the permissible values for
1590	   the "disp" attribute are neither defined nor restricted.
1591	   Nevertheless a set of commonly used disposition values is
1592	   RECOMMENDED.  (See Section 6.3)

1594	6.1.  The match and not-match Attributes

1596	   A "match" or "not-match" attribute specifies a rule that must be
1597	   matched or not matched as a condition for triggering an action.  Only
1598	   a single rule may be named as the value of a "match" or "not-match"
1599	   attribute.  Because rules may be composed of other rules, this
1600	   restriction to a single attribute value does not impose any
1601	   limitation on the contexts that can trigger an action.

1603	   An action may contain a "match" or a "not-match" attribute, but not
1604	   both.  An action without any attributes is triggered by all labels
1605	   unconditionally.  For a very simple LGR, the following action would
1606	   allocate all labels that match the repertoire:

1608	       <action disp="allocate" />

1610	   Since rules are evaluated for all labels, whether they are the
1611	   original label or computed by permuting the defined and valid variant
1612	   mappings for the label's code points, actions based on matching or
1613	   not matching a rule may be triggered for both original and variant
1614	   labels, but the rules are not affected by the disposition
1615	   attributes of the variant mappings.  Triggering any actions based on
1616	   these dispositions requires the use of additional optional attributes
1617	   for actions, described next.

1619	6.2.  Actions with Variant Type Triggers

1621	6.2.1.  The all-, any- and only-variants Attributes

1623	   An action may contain one of the optional attributes "any-variant",
1624	   "all-variants", or "only-variants" defining triggers based on variant
1625	   types.  The permitted value for these attributes consists of one or
1626	   more variant type values, separated by spaces.  When a variant label
1627	   is generated, these variant type values are compared to the set of
1628	   type values on the variant mappings used to generate the particular
1629	   variant label (see Section 7).

1631	   Any single match may trigger an action that contains an "any-variant"
1632	   attribute, while for an "all-variants" or "only-variants" attribute,
1633	   the variant type for all variant code points must match one or
1634	   several of the type values specified in the attribute to trigger the
1635	   action.  There is no requirement that the entire list of variant
1636	   type values be matched, as long as all variant code points match at
1637	   least one of the values.

1639	   An "only-variants" attribute will trigger the action only if all code
1640	   points of the variant label have variant mappings from the original
1641	   code points.  In other words, the label contains no original code
1642	   points other than those with a reflexive mapping (see Section 4.2.4).

1644	       <char cp="0078" comment="x">
1645	           <var cp="0078" type="allocate" comment="reflexive" />
1646	           <var cp="0079" type="block" />
1647	       </char>
1648	       <char cp="0079" comment="y">
1649	           <var cp="0078" type="allocate" />
1650	       </char>
1651	       . . .
1652	       <action disp="block" any-variant="block" />
1653	       <action disp="allocate" only-variants="allocate" />
1654	       <action disp="some-type" any-variant="allocate" />

1656	   In the example above, the label "xx" would have variant labels "xx",
1657	   "xy", "yx" and "yy".  The first action would result in blocking any
1658	   variant label containing "y", because the variant mapping from "x" to
1659	   "y" is of type "block", triggering the "any-variant" condition.
1660	   Because in this example "x" has a reflexive variant mapping to itself
1661	   of type "allocate", the original label "xx" has a reflexive variant
1662	   "xx" that would trigger the "only-variants" condition on the second
1663	   action.

1665	   A label "yy" would have the variants "xy", "yx" and "xx".  Because
1666	   the variant mapping from "y" to "x" is of type "allocate" and a
1667	   mapping from "y" to "y" is not defined, the labels "xy" and "yx"
1668	   trigger the "any-variant" condition on the third action.  The variant
1669	   "xx", being generated using the mapping from "y" to "x" of type
1670	   "allocate", would trigger the "only-variants" condition on the
1671	   second action.  As there is no reflexive variant "yy", the original
1672	   label "yy" cannot trigger any variant type triggers.  However, it
1673	   could still trigger an action defined as matching or not matching a
1674	   rule.
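
The walk-through above can be reproduced with a small Python sketch.  It is a simplified model, not a complete implementation: it handles only single code points, ignores rules and context conditions, and treats an original code point without a reflexive mapping as contributing no variant type.  All names in the sketch are assumptions made for this illustration.

```python
from itertools import product

# Variant mappings from the example: source -> {target: type}
VARIANTS = {
    "x": {"x": "allocate", "y": "block"},
    "y": {"x": "allocate"},
}

# Actions from the example, in document order: (disp, trigger, types)
ACTIONS = [
    ("block", "any", {"block"}),
    ("allocate", "only", {"allocate"}),
    ("some-type", "any", {"allocate"}),
]

def variant_labels(label):
    """Yield (variant label, list of mapping types used per position)."""
    choices = []
    for cp in label:
        mappings = VARIANTS.get(cp, {})
        options = list(mappings.items())
        if cp not in mappings:
            options.append((cp, None))   # original code point, no mapping
        choices.append(options)
    for combo in product(*choices):
        yield "".join(t for t, _ in combo), [ty for _, ty in combo]

def disposition(types):
    """Return the disposition of the first action triggered, or None."""
    used = [t for t in types if t is not None]
    for disp, trigger, wanted in ACTIONS:
        if trigger == "any" and any(t in wanted for t in used):
            return disp
        if (trigger == "only" and used and len(used) == len(types)
                and all(t in wanted for t in used)):
            return disp
    return None

def evaluate(label):
    return {variant: disposition(types)
            for variant, types in variant_labels(label)}
```

In this model, evaluate("xx") assigns "allocate" to "xx" and "block" to "xy", "yx" and "yy", while evaluate("yy") assigns "allocate" to "xx", "some-type" to "xy" and "yx", and no disposition to the original label "yy".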

1676	   In each action, one variant type trigger may be present by itself or
1677	   in conjunction with an attribute matching or not-matching a rule.  If
1678	   variant triggers and rule-matching triggers are used together, the
1679	   label MUST "match" or respectively "not-match" the specified rule,
1680	   AND satisfy the conditions on the variant type values given by the
1681	   "any-variant", "all-variants", or "only-variants" attribute.

1683	   A useful convention combines the "any-variant" trigger with
1684	   reflexive variant mappings (Section 4.2.4).  This convention is used,
1685	   for example, when multiple LGRs are defined within the same registry
1686	   and for overlapping repertoire.  In some cases, the delegation of a
1687	   label from one LGR must prohibit the delegation of another label in
1688	   some other LGR.  This can be done using a variant of type "blocked"
1689	   as in this example from an Armenian LGR, where the Armenian, Latin
1690	   and Cyrillic letters all look identical:

1692	       <char cp="0570" comment="Armenian small letter HO">
1693	         <var cp="0068" type="blocked" comment="Latin small letter H" />
1694	         <var cp="04BB" type="blocked"
1695	              comment="Cyrillic small letter SSHA" />
1696	       </char>

1698	   The issue is that the target code points for these two variants are
1699	   both outside the Armenian repertoire.  By using a reflexive variant
1700	   with the following convention:

1702	       <char cp="0068" comment="not part of repertoire">
1703	         <var cp="0068" type="out-of-repertoire-var"
1704	              comment="reflexive mapping" />
1705	         <var cp="04BB" type="blocked"  />
1706	         <var cp="0570" type="blocked"  />
1707	       </char>
1708	         ...

1710	   and associating this with an action of the form:

1712	       <action disp="invalid" any-variants="out-of-repertoire-var" />

1714	   it is possible to list the symmetric and transitive variant mappings
1715	   in the LGR even where they involve out-of-repertoire code points.  By
1716	   associating the action shown with the special type for these
1717	   reflexive mappings, any original labels containing one or more of the
1718	   out-of-repertoire code points are filtered out -- just as if these
1719	   code points had not been listed in the LGR in the first place.
1720	   Nevertheless, they do participate in the permutation of variant
1721	   labels for in-repertoire labels (Armenian in the example), and these
1722	   permuted variants can be used to detect collisions with out-of-
1723	   repertoire labels (see Section 7).

1725	6.2.2.  Example from RFC 3743 Tables

1727	   This section gives an example of using variant type triggers,
1728	   combined with variants with reflexive mappings (Section 4.2.4) to
1729	   achieve LGRs that implement tables like those defined according to
1730	   [RFC3743] where the goal is to allow as variants only labels that
1731	   consist entirely of simplified or traditional variants, in addition
1732	   to the original label.

1734	   Assume an LGR where all variants have been given suitable "type"
1735	   attributes of "block", "simplified", "traditional", or "both",
1736	   similar to the ones discussed in Appendix B.  Given such an LGR, the
1737	   following example actions evaluate the disposition for the variant
1738	   label:

1740	       <action disp="block" any-variant="block" />
1741	       <action disp="allocate" only-variants="simplified both" />
1742	       <action disp="allocate" only-variants="traditional both" />
1743	       <action disp="block" all-variants="simplified traditional" />
1744	       <action disp="allocate" />

1746	   The first action matches any variant label for which at least one of
1747	   the code point variants is of type "block".  The second matches any
1748	   variant label for which all of the code point variants are of type
1749	   "simplified" or "both", in other words an all-simplified label.  The
1750	   third matches any label for which all variants are of type
1751	   "traditional" or "both", that is all traditional.  These two actions
1752	   are not triggered by any variant labels containing some original code
1753	   points, unless each of those code points has a variant defined with a
1754	   reflexive mapping (Section 4.2.4).

1756	   The final two actions rely on the fact that actions are evaluated in
1757	   sequence, and that the first action triggered also defines the final
1758	   disposition for a variant label (see Section 6.4).  They further rely
1759	   on the assumption that the only variants with type "both" are also
1760	   reflexive variants.

1762	   Given these assumptions, any remaining simplified or traditional
1763	   variants must then be part of a mixed label, and so are blocked; all
1764	   labels surviving to the last action are original code points only
1765	   (that is the original label).  The example assumes that an original
1766	   label may be a mixed label; if that is not the case, the disposition
1767	   for the last action would be set to "block".

1769	   There are exceptions where the assumption on reflexive mappings made
1770	   above does not hold, so this basic scheme needs some refinements to
1771	   cover all cases.  For a more complete example, see Appendix B.

1773	6.3.  Recommended Disposition Values

1775	   The precise nature of the policy action taken in response to a
1776	   disposition and the name of the corresponding "disp" attributes are
1777	   only partially defined here.  It is strongly RECOMMENDED to use the
1778	   following dispositions only with their conventional sense.

1780	   invalid  The resulting string is not a valid label.  This disposition
1781	        may be assigned implicitly, see Section 6.5.  No variant labels
1782	        should be generated from a variant mapping with this type.

1784	   block  The resulting string is a valid label, but should be blocked
1785	        from registration.  This would typically apply for a derived
1786	        variant that is undesirable because it has no practical use or
1787	        is confusingly similar to some other label.

1789	   allocate  The resulting string should be reserved for use by the same
1790	        operator of the origin string, but not automatically allocated
1791	        for use.

1793	   activate  The resulting string should be activated for use.  (This is
1794	        the typical default action if no dispositions are defined and is
1795	        known as a "preferred" variant in [RFC3743].)

1797	6.4.  Precedence

1799	   Actions are applied in the order of their appearance in the file.
1800	   This defines their relative precedence.  The first action triggered
1801	   by a label defines the disposition for that label.  To define a
1802	   specific order of precedence, list the actions in the desired order.
1803	   The conventional order of precedence for the actions defined in
1804	   Section 6.3 is "invalid", "block", "allocate", then "activate".  This
1805	   default precedence is used for the default actions defined in
1806	   Section 6.6.
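
The first-match behavior can be sketched in Python as follows.  The data structures and the function name evaluate_actions are assumptions made for this illustration, and the predicate registered under "leading-letter" is a simplified stand-in for the rule of that name, checking only the first code point.

```python
def evaluate_actions(label, actions, rules):
    """Return the disposition of the first action triggered by the label."""
    for action in actions:                 # document order defines precedence
        if "match" in action and not rules[action["match"]](label):
            continue
        if "not-match" in action and rules[action["not-match"]](label):
            continue
        return action["disp"]              # no later action is evaluated
    return None

# Hypothetical stand-in for the "leading-letter" rule.
rules = {"leading-letter": lambda label: label[:1].isalpha()}

actions = [
    {"disp": "invalid", "not-match": "leading-letter"},
    {"disp": "allocate"},                  # no trigger: matches every label
]
```

With these definitions, evaluate_actions("1abc", actions, rules) yields "invalid" and evaluate_actions("abc", actions, rules) yields "allocate".  Reversing the order of the two actions would allocate every label, because the unconditional action would always be triggered first.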

1808	6.5.  Implied Actions

1810	   The context rules on code points ("not-when" or "when" rules) carry
1811	   an implied action with a disposition of "invalid" (not eligible).
1812	   These rules are evaluated at the time the code points for a label or
1813	   its variant labels are checked for validity (see Section 7).  In
1814	   other words, they are evaluated before any of the whole-label
1815	   evaluation rules and with higher precedence.  The context rules for
1816	   variant mappings are evaluated when variants are generated and/or
1817	   when variant tables are made symmetric and transitive.  They have an
1818	   implied action with a disposition of "invalid" which means a putative
1819	   variant mapping does not exist whenever the given context matches a
1820	   "not-when" rule or fails to match a "when" rule specified for that
1821	   mapping.  The result of that disposition is that the variant mapping
1822	   is ignored in generating variant labels and the value is therefore
1823	   not accessible to trigger any explicit actions.

1825	   Note that such a non-existent variant mapping is different from a
1826	   blocked variant, which is a variant code point mapping that exists
1827	   but results in a label that may not be allocated.

1829	6.6.  Default Actions

1831	   As described in Section 6, any variant mapping may be given a "type"
1832	   attribute.  An action containing an "any-variant" or "all-variants"
1833	   attribute relates these type values to a resulting disposition for
1834	   the entire variant label.

1836	   If no actions are defined for the standard disposition values of
1837	   "invalid", "block", "allocate" and "activate", then the following
1838	   default actions exist that are shown below in their default order of
1839	   precedence (see Section 6.4).  This default order for evaluating
1840	   dispositions applies only to labels that triggered no explicitly
1841	   defined actions and which are therefore handled by default actions.
1842	   Default actions have a lower order of precedence than explicit
1843	   actions (see Section 7.3).

1845	   The default actions for variant labels are defined as follows:

1847	       <action disp="invalid" any-variant="invalid"/>
1848	       <action disp="block" any-variant="block"/>
1849	       <action disp="allocate" any-variant="allocate"/>
1850	       <action disp="activate" all-variants="activate"/>

1852	   A final default action sets the disposition to "allocate" for any
1853	   label matching the repertoire for which no other action has been
1854	   triggered (catch-all).

1856	       <action disp="allocate" />

1858	7.  Processing a Label Against an LGR

1860	7.1.  Determining Eligibility for a Label

1862	   In order to test a specific label for membership in the LGR, a
1863	   consumer of the LGR must iterate through each code point within a
1864	   given label, and test that each code point is a member of the LGR.
1865	   If any code point is not a member of the LGR, the label shall be
1866	   deemed as invalid.

1868	   An individual code point is deemed a member of the LGR when it is
1869	   listed using a "char" element, or is part of a range defined with a
1870	   "range" element, and all necessary conditions in any "when" or "not-
1871	   when" attributes are correctly satisfied.

1873	   Alternatively, a code point is also deemed a member of the LGR when
1874	   it forms part of a sequence that corresponds to a sequence listed
1875	   using a "char" element for which the "cp" attribute defines a
1876	   sequence, and all necessary conditions in any "when" or "not-when"
1877	   attributes are correctly satisfied.

1879	   A label must also not trigger any action that results in a
1880	   disposition of "invalid"; otherwise, it is deemed not eligible.
1881	   (This step may need to be deferred until variant code point
1882	   dispositions have been determined.)

1884	   For LGRs that contain reflexive variant mappings (defined in
1885	   Section 4.2.4), the final evaluation of eligibility for the label
1886	   must be deferred until variants are generated.  In essence, LGRs that
1887	   use this feature treat the original label as the (identity) variant
1888	   of itself.  For such LGRs, the ordinary iteration over code points
1889	   would generally only exclude a subset of invalid labels, but it could
1890	   be used effectively as a pre-screening.
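
   The membership test described in this section might be sketched as
   follows.  The names is_member and is_eligible, and the representation
   of the repertoire as Python sets and tuples, are hypothetical.  The
   sketch deliberately omits "when" and "not-when" conditions as well
   as the check for actions that yield "invalid".

```python
def is_member(cp, chars, ranges):
    """True if cp is listed singly or falls inside one of the ranges."""
    return cp in chars or any(first <= cp <= last for first, last in ranges)


def is_eligible(label, chars, ranges, sequences=()):
    """True if every code point of label is covered by the repertoire.

    A code point is covered either on its own or as part of a listed
    sequence.  Context rules are not evaluated in this sketch.
    """
    cps = [ord(ch) for ch in label]

    def covered(i):
        if i == len(cps):
            return True
        if is_member(cps[i], chars, ranges) and covered(i + 1):
            return True
        return any(
            len(seq) > 0
            and tuple(cps[i:i + len(seq)]) == tuple(seq)
            and covered(i + len(seq))
            for seq in sequences
        )

    return covered(0)


# Repertoire of the minimal LDH table shown in Appendix A.
LDH_CHARS = {0x002D}
LDH_RANGES = [(0x0030, 0x0039), (0x0061, 0x007A)]
```

   With this repertoire, a label containing U+00B7 is eligible only if
   the sequence "006C 00B7 006C" is also passed in the sequences
   argument.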

1892	7.2.  Determining Variants for a Label

1894	   For a given eligible label, the set of variant labels is deemed to
1895	   consist of each possible permutation of original code points and
1896	   substituted code points or sequences defined in "var" elements,
1897	   whereby all "when" and "not-when" attributes are correctly satisfied
1898	   for each "char" or "var" element in the given permutation and all
1899	   applicable whole label evaluation rules are satisfied as follows:

1901	   1.  Create each possible permutation of a label, by substituting each
1902	       code point or code point sequence in turn by any defined variant
1903	       mapping (including any reflexive mappings)

1905	   2.  Apply variant mappings with "when" or "not-when" attributes only
1906	       if the conditions are satisfied; otherwise they are not defined

1908	   3.  Record each of the "type" values on the variant mappings used in
1909	       creating a given variant label in a disposition set; for any
1910	       unmapped code point record the "type" value of any reflexive
1911	       variant (see Section 4.2.4)

1913	   4.  Determine the disposition for each variant label per Section 7.3

1915	   5.  If the disposition is "invalid", remove the label from the set

1917	   6.  If final evaluation of the disposition for the unpermuted label
1918	       per Section 7.3 results in a disposition of "invalid", remove all
1919	       associated variant labels from the set.
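
   Steps 1 and 3 above can be illustrated with a short sketch.  It is
   restricted to single code point mappings and ignores "when" and
   "not-when" conditions; the function name variant_labels and the
   dictionary layout of the mappings are assumptions made for this
   example only.

```python
from itertools import product


def variant_labels(label, mappings):
    """Yield (variant_label, recorded_types) for each permutation.

    label    -- tuple of code points (integers)
    mappings -- dict: code point -> list of (target code point, type)
    The unpermuted label is included among the results.
    """
    choices = []
    for cp in label:
        options = dict(mappings.get(cp, []))
        # The original code point always remains available.  It carries
        # a type only when a reflexive mapping defines one.
        options.setdefault(cp, None)
        choices.append(list(options.items()))
    for combo in product(*choices):
        variant = tuple(target for target, _ in combo)
        types = [vtype for _, vtype in combo if vtype is not None]
        yield variant, types


# Non-blocked mappings for U+4E7E and U+4E81, taken from Appendix B.
MAPPINGS = {
    0x4E7E: [(0x4E7E, "both"), (0x5E72, "simp")],
    0x4E81: [(0x4E7E, "trad"), (0x5E72, "simp")],
}
```

   The recorded types for each permutation are the input to the
   disposition step in Section 7.3.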

1921	7.3.  Determining a Disposition for a Label or Variant Label

1923	   For a given label (variant or original), its disposition is
1924	   determined by evaluating in order of their appearance all actions for
1925	   which the label or variant label satisfies the conditions.

1927	   1.  For any label, the disposition is given by the value of the
1928	       "disp" attribute for the first action triggered by the label.  An
1929	       action is triggered, if any of the following is true:

1931	       *  the label matches or doesn't match the whole label evaluation
1932	          rule, given in the "match" or "not-match" attribute
1933	          respectively for that action;

1935	       *  any or all of the recorded variant types for a variant label
1936	          match the types specified in an "any-variant", "all-variants",
1937	          or "only-variants" attribute, for that action, and in case of
1938	          "only-variants", the label contains only code points that are
1939	          the target of applied variant mappings;

1941	       *  the label matches or doesn't match the whole label evaluation
1942	          rule, given in the "match" or "not-match" attribute
1943	          respectively for that action and any or all of the recorded
1944	          variant types for a variant label match the types specified in
1945	          an "any-variant", "all-variants", or "only-variants"
1946	          attribute, respectively, for that action, and in case of
1947	          "only-variants" the label contains only code points that are
1948	          the target of applied variant mappings; or

1950	       *  the action does not contain any "match", "not-match", "any-
1951	          variant", "all-variants", or "only-variants" attributes:
1952	          catch-all.

1954	   2.  For any remaining variant label, assign the variant label the
1955	       disposition using the default actions defined in Section 6.6.
1956	       For this step, variant types outside the predefined recommended
1957	       set (see Section 6.3) are ignored.

1959	   3.  For any remaining label, set the disposition to "allocate".
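
   The evaluation order described above might be sketched as follows.
   Actions are modelled as dictionaries keyed by attribute name and
   whole label evaluation rules as Python callables; both choices, like
   the names triggered and disposition, are assumptions of this sketch.
   The sketch also assumes that an action with "all-variants" is not
   triggered when no types were recorded.  The default actions of
   Section 6.6 are not built in; a caller could append them to the
   action list.

```python
def triggered(action, label, types, rules):
    """True if every condition present on the action is satisfied."""
    if "match" in action and not rules[action["match"]](label):
        return False
    if "not-match" in action and rules[action["not-match"]](label):
        return False
    recorded = set(types)
    if "any-variant" in action:
        if not recorded & set(action["any-variant"].split()):
            return False
    if "all-variants" in action:
        if not types or not recorded <= set(action["all-variants"].split()):
            return False
    if "only-variants" in action:
        # Every code point must be the target of an applied mapping.
        if len(types) != len(label):
            return False
        if not recorded <= set(action["only-variants"].split()):
            return False
    return True  # no conditions at all: catch-all


def disposition(label, types, actions, rules=None):
    """Return the "disp" value of the first action triggered."""
    for action in actions:
        if triggered(action, label, types, rules or {}):
            return action["disp"]
    return "allocate"  # step 3: remaining labels
```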

1961	8.  Conversion to and from Other Formats

1963	   Both [RFC3743] and [RFC4290] provide different grammars for IDN
1964	   tables.  These formats are unable to fully cater for the increased
1965	   requirements of contemporary IDN variant policies.

1967	   This specification provides a superset of the functionality of
1968	   these IDN table formats; thus, any table expressed in those formats
1969	   can be expressed in this format.  Automated conversion can be
1970	   conducted between tables conformant with the grammar specified in
1971	   each document.

1973	   For notes on how to translate an RFC 3743-style table, see
1974	   Appendix B.

1976	9.  IANA Considerations

1978	9.1.  Media Type

1980	   IANA is asked to register the media type of "application/lgr+xml" to
1981	   enable transmission of a well-formed LGR in accordance with this
1982	   specification.  This media type SHOULD be used to signal to an LGR-
1983	   aware client that the content is designed to be interpreted as an
1984	   LGR.

1986	   [TODO: Add Media Type registration details per [RFC7303]]

1988	9.2.  URN Registration

1990	   This document uses a URN to describe the XML namespace in accordance
1991	   with [RFC3688].  IANA is asked to register the following URN for this
1992	   purpose.

1994	   URI: urn:ietf:params:xml:ns:lgr-1.0

1996	   Registrant Contact: See the Authors of this document.

1998	   XML: None.

2000	10.  Security Considerations

2002	   If a system queries an identifier list (such as a domain zone)
2003	   that uses the rules in this memo, and relies on those rules being
2004	   applied, that system might fail if the rules are implemented
2005	   incorrectly or are not applied in a predictable fashion.  This
2006	   could cause security problems for the querying system.

2009	11.  References

2011	   [ASIA-TABLE]
2012	              DotAsia Organisation, ".ASIA ZH IDN Language Table".

2014	   [LGR-PROCEDURE]
2015	              Internet Corporation for Assigned Names and Numbers,
2016	              "Procedure to Develop and Maintain the Label Generation
2017	              Rules for the Root Zone in Respect of IDNA Labels".

2019	   [RFC3339]  Klyne, G., Ed. and C. Newman, "Date and Time on the
2020	              Internet: Timestamps", RFC 3339, July 2002.

2022	   [RFC3688]  Mealling, M., "The IETF XML Registry", BCP 81, RFC 3688,
2023	              January 2004.

2025	   [RFC3743]  Konishi, K., Huang, K., Qian, H., and Y. Ko, "Joint
2026	              Engineering Team (JET) Guidelines for Internationalized
2027	              Domain Names (IDN) Registration and Administration for
2028	              Chinese, Japanese, and Korean", RFC 3743, April 2004.

2030	   [RFC4290]  Klensin, J., "Suggested Practices for Registration of
2031	              Internationalized Domain Names (IDN)", RFC 4290, December
2032	              2005.

2034	   [RFC5564]  El-Sherbiny, A., Farah, M., Oueichek, I., and A. Al-Zoman,
2035	              "Linguistic Guidelines for the Use of the Arabic Language
2036	              in Internet Domains", RFC 5564, February 2010.

2038	   [RFC5646]  Phillips, A. and M. Davis, "Tags for Identifying
2039	              Languages", BCP 47, RFC 5646, September 2009.

2041	   [RFC5892]  Faltstrom, P., "The Unicode Code Points and
2042	              Internationalized Domain Names for Applications (IDNA)",
2043	              RFC 5892, August 2010.

2045	   [RFC7303]  Thompson, H. and C. Lilley, "XML Media Types", RFC 7303,
2046	              July 2014.

2048	   [TDIL-HINDI]
2049	              Technology Development for Indian Languages (TDIL)
2050	              Programme, "Devanagari Script Behaviour for Hindi".

2052	   [UAX42]    Unicode Consortium, "Unicode Character Database in XML".

2054	   [Unicode-Stability]
2055	              Unicode Consortium, "Unicode Encoding Stability Policy,
2056	              Property Value Stability".

2058	   [Unicode-Versions]
2059	              Unicode Consortium, "Unicode Version Numbering".

2061	   [WLE-RULES]
2062	              Internet Corporation for Assigned Names and Numbers, "WLE
2063	              Rules".

2065	   [XML]      World Wide Web Consortium, "Extensible Markup Language
2066	              (XML) 1.0".

2068	Appendix A.  Example Tables

2070	   The following presents a minimal LGR table defining the lower case
2071	   LDH (letter-digit-hyphen) repertoire and containing no rules or
2072	   metadata elements.  Many simple LGR tables will look quite similar,
2073	   except that they would contain some metadata.

2075	   <?xml version="1.0" encoding="utf-8"?>
2076	   <lgr xmlns="urn:ietf:params:xml:ns:lgr-1.0">
2077	   <data>
2078	       <char cp="002D" comment="HYPHEN (-)" />
2079	       <range first-cp="0030" last-cp="0039"
2080	         comment="DIGIT ZERO - DIGIT NINE" />
2081	       <range first-cp="0061" last-cp="007A"
2082	         comment="LATIN SMALL LETTER A - LATIN SMALL LETTER Z" />
2083	   </data>
2084	   </lgr>

2086	   The following sample LGR shows a more complete collection of the
2087	   elements and attributes defined in this specification in a somewhat
2088	   typical context.

2090	   <?xml version="1.0" encoding="utf-8"?>

2092	   <!-- This example uses a large subset of the features of this
2093	        specification. It does not include every set operator,
2094	        match operator element, or action trigger attribute, their
2095	        use being largely parallel to the ones demonstrated. -->

2097	   <lgr xmlns="urn:ietf:params:xml:ns:lgr-1.0">
2098	   <!-- meta element with all optional elements -->
2099	     <meta>
2100	       <version comment="initial version">1</version>
2101	       <date>2010-01-01</date>
2102	       <language>sv</language>
2103	       <domain>example</domain>
2104	       <validity-start>2010-01-01</validity-start>
2105	       <validity-end>2013-12-31</validity-end>
2106	       <description type="text/html">
2107	           <![CDATA[
2108	           This language table was developed with the
2109	           <a href="http://swedish.example/">Swedish
2110	           examples institute</a>.
2111	           ]]>
2112	       </description>
2113	       <unicode-version>6.3.0</unicode-version>
2114	       <references>
2115	         <reference id="0" comment="the most recent" >The
2116	               Unicode Standard 6.2</reference>
2117	         <reference id="1" >RFC 5892</reference>
2118	         <reference id="2" >Big-5: Computer Chinese Glyph
2119	            and Character Code Mapping Table, Technical Report
2120	            C-26, 1984</reference>
2121	       </references>
2122	    </meta>
2123	    <!-- the data section describing the repertoire -->
2124	     <data>
2125	       <!-- single code point "char" element -->
2126	       <char cp="002D" ref="1" comment="HYPHEN" />

2128	       <!-- range elements for contiguous code points,  with tags -->
2129	       <range first-cp="0030" last-cp="0039" ref="1" tag="digit" />
2130	       <range first-cp="0061" last-cp="007A" ref ="1" tag="letter" />

2132	       <!-- code point sequence -->
2133	       <char cp="006C 00B7 006C" comment="catalan middle dot" />

2135	       <!-- alternatively use a when rule -->
2136	       <char cp="00B7" when="catalan-middle-dot" />

2138	        <!-- code point with context rule -->
2139	       <char cp="200D" when="joiner" ref="2" />

2141	       <!-- code points with variants -->
2142	       <char cp="4E16" tag="preferred" ref="0">
2143	         <var cp="4E17" type="block" ref="2" />
2144	         <var cp="534B" type="allocate" ref="2" />
2145	       </char>
2146	       <char cp="4E17" ref="0">
2147	         <var cp="4E16" type="allocate" ref="2" />
2148	         <var cp="534B" type="allocate" ref="2" />
2149	       </char>
2150	       <char cp="534B" ref="0">
2151	         <var cp="4E16" type="allocate" ref="2" />
2152	         <var cp="4E17" type="block" ref="2" />
2153	       </char>
2154	     </data>

2156	     <!-- Context and whole label rules -->
2157	     <rules>
2158	       <!-- Require the given code point to be between two 006C -->
2159	       <rule name="catalan-middle-dot" ref="0">
2160	           <look-behind>
2161	               <char cp="006C" />
2162	           </look-behind>
2163	           <anchor />
2164	           <look-ahead>
2165	               <char cp="006C" />
2166	           </look-ahead>
2167	       </rule>
2168	       <!-- example of a context rule based on property -->
2169	       <class name="virama" property="ccc:9" />
2170	       <rule name="joiner"  ref="1" >
2171	           <look-behind>
2172	               <class by-ref="virama" />
2173	           </look-behind>
2174	           <anchor />
2175	       </rule>

2177	       <!-- example of using set operators -->

2179	       <!-- Subtract vowels from letters to get
2180	            consonant, demonstrating the different
2181	            set notations and the difference operator -->
2182	       <difference name="consonants">
2183	            <class comment="all letters">0061-007A</class>
2184	            <class comment="all vowels">
2185	                    0061 0065 0069 006F 0075
2186	            </class>
2187	        </difference>

2189	        <!-- by using the start and end, rule matches whole label -->
2190	        <rule name="three-or-more-consonants">
2191	            <start />
2192	            <!-- reference the class defined by the difference
2193	                 and require three or more matches -->
2194	            <class by-ref="consonants" count="3+" />
2195	            <end />
2196	       </rule>

2198	       <!-- rule for negative matching -->
2199	       <rule name="non-preferred"
2200	             comment="matches any non-preferred code point">
2201	           <complement comment="non-preferred" >
2202	               <class from-tag="preferred" />
2203	           </complement>
2204	       </rule>

2206	      <!-- actions triggered by matching rules and/or
2207	           variant types -->
2208	       <action disp="consonants"
2209	               match="three-or-more-consonants" />
2210	       <action disp="block" any-variant="block" />
2211	       <action disp="activate" all-variants="allocate"
2212	               not-match="non-preferred" />
2213	     </rules>
2214	   </lgr>

2216	Appendix B.  How to Translate RFC 3743 based Tables into the XML Format

2218	   As background, the [RFC3743] rules work as follows:

2220	   1.  The Original (requested) label is checked to make sure that all
2221	       the code points are a subset of the repertoire.

2223	   2.  If it passes the check, the Original label is allocatable.

2225	   3.  Generate the all-simplified and all-traditional variant labels
2226	       (union of all the labels generated using all the simplified
2227	       variants of the code points) for allocation.

2229	   To illustrate by example, here is one of the more complicated sets
2230	   of variants:

2232	       U+4E7E
2233	       U+4E81
2234	       U+5E72
2235	       U+5E79
2236	       U+69A6
2237	       U+6F27

2239	   The following shows the relevant section of the Chinese language
2240	   table published by the .ASIA registry [ASIA-TABLE].  Its entries
2241	   read:

2243	    <codepoint>;<simpl-variant(s)>;<trad-variant(s)>;<other-variant(s)>

2245	   These lines correspond to the set of variants listed above:

2247	   U+4E7E;U+4E7E,U+5E72;U+4E7E;U+4E81,U+5E72,U+6F27,U+5E79,U+69A6
2248	   U+4E81;U+5E72;U+4E7E;U+5E72,U+6F27,U+5E79,U+69A6
2249	   U+5E72;U+5E72;U+5E72,U+4E7E,U+5E79;U+4E7E,U+4E81,U+69A6,U+6F27
2250	   U+5E79;U+5E72;U+5E79;U+69A6,U+4E7E,U+4E81,U+6F27
2251	   U+69A6;U+5E72;U+69A6;U+5E79,U+4E7E,U+4E81,U+6F27
2252	   U+6F27;U+4E7E;U+6F27;U+4E81,U+5E72,U+5E79,U+69A6

2254	   The corresponding data section in XML format would look like this:

2256	       <data>
2257	       <char cp="4E7E">
2258	       <var cp="4E7E" type="both" comment="identity" />
2259	       <var cp="4E81" type="block" />
2260	       <var cp="5E72" type="simp" />
2261	       <var cp="5E79" type="block" />
2262	       <var cp="69A6" type="block" />
2263	       <var cp="6F27" type="block" />
2264	       </char>
2265	       <char cp="4E81">
2266	       <var cp="4E7E" type="trad" />
2267	       <var cp="5E72" type="simp" />
2268	       <var cp="5E79" type="block" />
2269	       <var cp="69A6" type="block" />
2270	       <var cp="6F27" type="block" />
2271	       </char>
2272	       <char cp="5E72">
2273	       <var cp="4E7E" type="trad"/>
2274	       <var cp="4E81" type="block"/>
2275	       <var cp="5E72" type="both" comment="identity"/>
2276	       <var cp="5E79" type="trad"/>
2277	       <var cp="69A6" type="block"/>
2278	       <var cp="6F27" type="block"/>
2279	       </char>
2280	       <char cp="5E79">
2281	       <var cp="4E7E" type="block"/>
2282	       <var cp="4E81" type="block"/>
2283	       <var cp="5E72" type="simp"/>
2284	       <var cp="5E79" type="trad" comment="identity"/>
2285	       <var cp="69A6" type="block"/>
2286	       <var cp="6F27" type="block"/>
2287	       </char>
2288	       <char cp="69A6">
2289	       <var cp="4E7E" type="block"/>
2290	       <var cp="4E81" type="block"/>
2291	       <var cp="5E72" type="simp"/>
2292	       <var cp="5E79" type="block"/>
2293	       <var cp="69A6" type="trad" comment="identity"/>
2294	       <var cp="6F27" type="block"/>
2295	       </char>
2296	       <char cp="6F27">
2297	       <var cp="4E7E" type="simp"/>
2298	       <var cp="4E81" type="block"/>
2299	       <var cp="5E72" type="block"/>
2300	       <var cp="5E79" type="block"/>
2301	       <var cp="69A6" type="block"/>
2302	       <var cp="6F27" type="trad" comment="identity"/>
2303	       </char>
2304	     </data>

2306	   Here the simplified variants have been given a type of "simp", the
2307	   traditional variants one of "trad" and all other ones are given
2308	   "block".

2310	   Because some variant mappings show in more than one column, while the
2311	   XML format allows only a single type value, they have been given the
2312	   type of "both".

2314	   Note that some variant mappings map to themselves (identity), that is
2315	   the mapping is reflexive (see Section 4.2.4).  In creating the
2316	   permutation of all variant labels, these mappings have no effect,
2317	   other than adding a value to the variant type list for the variant
2318	   label containing them.

2320	   In the example so far, all of the entries with type="both" are also
2321	   mappings where source and target are identical.  That is, they are
2322	   reflexive mappings as defined in Section 4.2.4.

2324	   Given a label "U+4E7E U+4E81", the following labels would be ruled
2325	   allocatable under [RFC3743] based on how that standard is commonly
2326	   implemented in domain registries:

2328	       Original label:     U+4E7E U+4E81
2329	       Simplified label 1: U+4E7E U+5E72
2330	       Simplified label 2: U+5E72 U+5E72
2331	       Traditional label:  U+4E7E U+4E7E

2333	   However, if allocatable labels were generated simply by a straight
2334	   permutation of all variants with type other than type="block" and
2335	   without regard to the simplified / traditional variants, we would end
2336	   up with an extra allocatable label of "U+5E72 U+4E7E".  This label
2337	   combines a Simplified Chinese code point with a Traditional
2338	   Chinese code point and therefore shouldn't be allocatable.

2340	   To more fully resolve the dispositions requires several actions to be
2341	   defined as described in Section 6.2.2 which will override the default
2342	   actions from Section 6.6.  After blocking all labels that contain a
2343	   variant with type "block", these actions will allocate labels based
2344	   on the following variant types: "simp", "trad" and "both".  Note that
2345	   these variant types do not directly relate to dispositions for the
2346	   variant label, but that the actions will resolve them to the standard
2347	   dispositions on labels, to wit "block" and "allocate".

2349	   Resolving label dispositions requires five actions to be defined
2350	   (in the rules section of this document); these actions apply in
2351	   order, and the first one triggered defines the disposition for the
2352	   label.  The actions are:

2354	   1.  block all variant labels containing at least one blocked variant.

2356	   2.  allocate all labels that consist entirely of variants that are
2357	       "simp" or "both"

2359	   3.  also allocate all labels that are entirely "trad" or "both"

2361	   4.  block all surviving labels containing any one of the variant
2362	       types "simp" or "trad" or "both" because they are now known to be
2363	       part of an undesirable mixed simplified/traditional label

2365	   5.  allocate any remaining label; the original label would be such a
2366	       label.

2368	   The rules declarations would be represented as:

2370	     <rules>
2371	       <!--Action elements - order defines precedence-->
2372	       <action disp="block"     any-variant="block" />
2373	       <action disp="allocate"  only-variants="simp both" />
2374	       <action disp="allocate"  only-variants="trad both" />
2375	       <action disp="block"     any-variant="simp trad" />
2376	       <action disp="allocate"  comment="catch-all" />
2377	     </rules>
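
   As an informal check, the five actions can be applied to every
   permutation of the label "U+4E7E U+4E81".  The sketch below hard-
   codes the variant mappings of those two code points from the data
   section above.  The function names disposition and allocatable are
   hypothetical, and "only-variants" is read as requiring that every
   code point in the label carries a recorded variant type.

```python
from itertools import product

# Variant mappings for U+4E7E and U+4E81 from the data section above.
MAPPINGS = {
    0x4E7E: {0x4E7E: "both", 0x4E81: "block", 0x5E72: "simp",
             0x5E79: "block", 0x69A6: "block", 0x6F27: "block"},
    0x4E81: {0x4E7E: "trad", 0x5E72: "simp", 0x5E79: "block",
             0x69A6: "block", 0x6F27: "block"},
}


def disposition(types, length):
    """Apply the five actions in order; the first one triggered wins."""
    kinds = set(types)
    only = len(types) == length     # every code point is a variant target
    if "block" in kinds:                        # any-variant="block"
        return "block"
    if only and kinds <= {"simp", "both"}:      # only-variants="simp both"
        return "allocate"
    if only and kinds <= {"trad", "both"}:      # only-variants="trad both"
        return "allocate"
    if kinds & {"simp", "trad"}:                # any-variant="simp trad"
        return "block"
    return "allocate"                           # catch-all


def allocatable(label):
    """Return the sorted allocatable permutations of label."""
    choices = []
    for cp in label:
        options = dict(MAPPINGS.get(cp, {}))
        options.setdefault(cp, None)    # untyped original code point
        choices.append(list(options.items()))
    found = []
    for combo in product(*choices):
        types = [t for _, t in combo if t is not None]
        if disposition(types, len(label)) == "allocate":
            found.append(tuple(cp for cp, _ in combo))
    return sorted(found)
```

   Under these assumptions, allocatable((0x4E7E, 0x4E81)) returns the
   original label, the two simplified labels, and the traditional label
   listed earlier in this appendix, while the mixed label
   "U+5E72 U+4E7E" is blocked by the fourth action.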

2379	   Up to now, variants with type "both" have occurred only in
2380	   association with reflexive variant mappings.  The "action" elements
2381	   defined above rely on the assumption that this is always the case.
2382	   However, consider the following set of variants:

2384	       U+62E0;U+636E;U+636E;U+64DA
2385	       U+636E;U+636E;U+64DA;U+62E0
2386	       U+64DA;U+636E;U+64DA;U+62E0

2388	   The corresponding XML would be:

2390	       <char cp="62E0">
2391	       <var cp="636E" type="both" comment="both, but not reflexive" />
2392	       <var cp="64DA" type="block" />
2393	       </char>
2394	       <char cp="636E">
2395	       <var cp="636E" type="simp" comment="reflexive, but not both" />
2396	       <var cp="64DA" type="trad" />
2397	       <var cp="62E0" type="block" />
2398	       </char>
2399	       <char cp="64DA">
2400	       <var cp="636E" type="simp" />
2401	       <var cp="64DA" type="trad" comment="reflexive" />
2402	       <var cp="62E0" type="block" />
2403	       </char>

2405	   To make such variant sets work requires a way to selectively trigger
2406	   an action based on whether a variant type is associated with an
2407	   identity or reflexive mapping, or is associated with an ordinary
2408	   variant mapping.  This can be done by adding a prefix "r-" to the
2409	   "type" attribute on reflexive variant mappings.  For example the
2410	   "trad" for code point U+64DA in the preceding figure would become
2411	   "r-trad".

2413	   With the dispositions prepared in this way, only a slight
2414	   modification to the actions is needed to yield the correct set of
2415	   allocatable labels:

2417	     <action disp="block" any-variant="block" />
2418	     <action disp="allocate" only-variants="simp r-simp both r-both" />
2419	     <action disp="allocate" only-variants="trad r-trad both r-both" />
2420	     <action disp="block" all-variants="simp trad both" />
2421	     <action disp="allocate" />

2423	   The first three actions get triggered by the same labels as before.

2425	   The fourth action blocks any label that combines an original code
2426	   point with any mix of ordinary variant mappings; however no labels
2427	   that are a combination of only original code points (code points
2428	   having either no variant mappings or a reflexive mapping) would be
2429	   affected.  These are the original labels and they are allocated in
2430	   the last action.

2432	   Using this scheme of assigning types to ordinary and reflexive
2433	   variants, all RFC 3743-style tables can be converted to XML.  By
2434	   defining a set of actions as outlined above, the LGR will yield the
2435	   correct set of allocatable variants: all variants consisting
2436	   completely of variant code points preferred for simplified or
2437	   traditional, respectively, will be allocated, as will be the original
2438	   label.  All other variant labels will be blocked.

2440	Appendix C.  Indic Syllable Structure Example

2442	   In LGRs for Indic scripts it may be desirable to restrict valid
2443	   labels to sequences of valid Indic syllables, or aksharas.  This
2444	   appendix gives a sample set of rules designed to enforce this
2445	   restriction.

2447	   An example of a BNF form for an akshara has been published in
2448	   "Devanagari Script Behaviour for Hindi" [TDIL-HINDI].  The rules for
2449	   other languages and scripts used in India are expected to be
2450	   generally similar.

2452	   For Hindi, the BNF has the form:

2454	       V[m]|{C[N]H}C[N](H|[v][m])

2456	   Where:

2458	   V    (upper case) is any independent vowel

2460	   m    is any vowel modifier (Devanagari Anusvara, Visarga, and
2461	        Candrabindu)

2463	   C    is any consonant (with inherent vowel)

2465	   N    is Nukta

2467	   H    is a Halant (or Virama)

2469	   v    (lower case) is any dependent vowel sign (matra)

2471	   {}   encloses items which may be repeated one or more times

2473	   [ ]  encloses items which may or may not be present

2475	   |    separates items, out of which only one can be present

2477	   By using the Unicode property "InSC" or "Indic_Syllable_Category"
2478	   which corresponds rather directly to the classification of characters
2479	   in the BNF above, we can directly translate the BNF into a set of WLE
2480	   rules matching the definition of an akshara.

2482	    <rules>
2483	       <!--Character Class Definitions go here-->
2484	       <class name="halant" property="InSC:Virama" />
2485	       <union name="vowel-modifier">
2486	         <class property="InSC:Visarga" />
2487	         <class property="InSC:Bindu" comment="includes anusvara" />
2488	       </union>
2489	       <!--Whole label evaluation and Context rules go here-->
2490	       <rule name="consonant-with-optional-nukta">
2491	           <class property="InSC:Consonant" />
2492	           <class property="InSC:Nukta"  count="0:1"/>
2493	       </rule>
2494	       <rule name="independent-vowel-with-optional-modifier">
2495	           <class property="InSC:Vowel_Independent" />
2496	           <class by-ref="vowel-modifier"  count="0:1" />
2497	       </rule>
2498	       <rule name="optional-dependent-vowel-with-opt-modifier" >
2499	         <class property="InSC:Vowel_Dependent" count="0:1" />
2500	         <class by-ref="vowel-modifier" count="0:1"  />
2501	       </rule>
2502	       <rule name="consonant-cluster">
2503	         <rule count="0+">
2504	           <rule by-ref="consonant-with-optional-nukta" />
2505	           <class by-ref="halant" />
2506	         </rule>
2507	         <rule by-ref="consonant-with-optional-nukta" />
2508	         <choice>
2509	           <class by-ref="halant" />
2510	           <rule by-ref="optional-dependent-vowel-with-opt-modifier" />
2511	         </choice>
2512	       </rule>
2513	       <rule name="akshara">
2514	         <choice>
2515	           <rule by-ref="independent-vowel-with-optional-modifier" />
2516	           <rule by-ref="consonant-cluster" />
2517	         </choice>
2518	       </rule>
2519	       <rule name="WLE-akshara-or-other" comment="series of one or
2520	           more aksharas, possibly alternating with other types of
2521	           code points such as digits">
2522	         <start />
2523	         <choice count="1+">
2524	           <class property="InSC:other"  />
2525	           <rule by-ref="akshara"  />
2526	         </choice>
2527	         <end />
2528	       </rule>
2529	       <!--Action elements go here - order defines precedence-->
2530	       <action disp="invalid" not-match="WLE-akshara-or-other" />
2531	     </rules>

2533	   With the rules and classes as defined above, the final action assigns
2534	   a disposition of "invalid" to all labels that are not composed of a
2535	   sequence of well-formed aksharas, optionally interspersed with other
2536	   characters, perhaps digits, for example.
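
   The akshara structure can also be sketched as a regular expression
   over category symbols.  The sketch below is illustrative only: the
   CATEGORY table is a hypothetical, hand-made subset of Devanagari
   code points standing in for the Indic_Syllable_Category property,
   the name is_akshara_sequence is an assumption, and the repeated
   group is read as "zero or more", following the count="0+" used in
   the "consonant-cluster" rule.  Code points outside the table are
   rejected rather than treated as "other".

```python
import re

# Hypothetical hand-made category table for a few Devanagari code
# points; a real implementation would consult Indic_Syllable_Category.
CATEGORY = {
    0x0905: "V", 0x0906: "V", 0x0907: "V",  # independent vowels
    0x0901: "m", 0x0902: "m", 0x0903: "m",  # candrabindu, anusvara, visarga
    0x0915: "C", 0x0926: "C", 0x0928: "C", 0x0939: "C",  # consonants
    0x093C: "N",                            # nukta
    0x094D: "H",                            # halant (virama)
    0x093E: "v", 0x093F: "v", 0x0940: "v",  # dependent vowel signs
}

# V[m] | {C[N]H} C[N] (H | [v][m]), with {} read as "zero or more".
AKSHARA = r"(?:Vm?|(?:CN?H)*CN?(?:H|v?m?))"
LABEL = re.compile(r"(?:%s)+" % AKSHARA)


def is_akshara_sequence(label):
    """True if label consists of one or more well-formed aksharas."""
    symbols = "".join(CATEGORY.get(ord(ch), "?") for ch in label)
    return LABEL.fullmatch(symbols) is not None
```

   In this sketch a label that starts with a dependent vowel sign or a
   halant is rejected, because neither symbol can begin an akshara.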

2538	   The relevant Unicode property could be replicated by tagging
2539	   repertoire values directly in the LGR which would remove the
2540	   dependency on any specific version of the Unicode Standard.

2542	   Generally, dependent vowels may only follow consonant expressions;
2543	   however, for some scripts, like Bengali, the Unicode Standard
2544	   supports sequences of dependent vowels or their application on
2545	   independent vowels.  This makes the definition of akshara less
2546	   restrictive.

2548	   It is possible to reduce the complexity of these rules by defining
2549	   alternate rules which simply define the permissible pair-wise context
2550	   of adjacent code points by character class--such as the rule that a
2551	   Halant can only follow a (nuktated) consonant.  (See the example in
2552	   [WLE-RULES]).

2554	Appendix D.  RelaxNG Compact Schema

2556	   default namespace = "urn:ietf:params:xml:ns:lgr-1.0"

2558	   #
2559	   # SIMPLE TYPES
2560	   #

2562	   # RFC 5646 language tag (e.g. "de", "und-Latn", etc.)
2563	   language-tag = xsd:token

2565	   # The scope to which the LGR applies. For the "domain" scope type it
2566	   # should be a fully qualified domain name.
2567	   scope-value = xsd:token {
2568	       minLength = "1"
2569	   }

2571	   ## a single code point
2572	   code-point = xsd:token {
2573	       pattern = "[0-9A-F]{4,6}"
2574	   }

2576	   ## a space-separated sequence of code points
2577	   code-point-sequence = xsd:token {
2578	       pattern = "[0-9A-F]{4,6}( [0-9A-F]{4,6})+"
2579	   }

2581	   ## single code point, or a sequence of code points
2582	   code-point-literal = code-point | code-point-sequence

2584	   code-point-set-shorthand = xsd:token {
2585	       pattern = "([0-9A-F]{4,6}|[0-9A-F]{4,6}-[0-9A-F]{4,6})"
2586	                 ~ "( ([0-9A-F]{4,6}|[0-9A-F]{4,6}-[0-9A-F]{4,6}))*"
2587	   }

2589	   ## dates are used in information fields in the meta section.
2590	   date = xsd:token {
2591	       pattern = "\d{4}-\d\d-\d\d"
2592	   }

2594	   ## reference to a rule name (used in "when" and "not-when"
2595	   ## attributes, as well as the "by-ref" attribute of the "rule"
2596	   ## element.)
2597	   rule-ref = xsd:IDREF
2598	   ## a space-separated list of tags. Tags should generally follow
2599	   ## xsd:Name syntax. However, we are using the xsd:NMTOKENS here
2600	   ## because there is no native XSD datatype for space-separated
2601	   ## xsd:Name
2602	   tags = xsd:NMTOKENS

2604	   ## The value space of a "from-tag" attribute. Although it is closer
2605	   ## to xsd:IDREF lexically and semantically, tags are not unique in
2606	   ## the document. As such, we are unable to take advantage of
2607	   ## facilities provided by a validator. xsd:NMTOKEN is used instead
2608	   ## of the stricter xsd:Name here so as to be consistent with the
2609	   ## above.
2610	   tag-ref = xsd:NMTOKEN

2612	   ## an identifier type (used by "name" attributes).
2613	   identifier = xsd:ID

2615	   ## used in the class "by-ref" attribute to reference another class
2616	   ## of the same "name" attribute value.
2618	   class-ref = xsd:IDREF

2620	   ## count attribute pattern ("n", "n+" or "n:m")
2621	   count-pattern = xsd:token {
2622	       pattern = "\d+(\+|:\d+)?"
2623	   }

2625	   #
2626	   # STRUCTURES
2627	   #

2629	   ## Representation of a single code point, or a sequence of code
2630	   ## points
2631	   char = element char {
2632	       attribute cp { code-point-literal },
2633	       attribute comment { text }?,
2634	       attribute when { rule-ref }?,
2635	       attribute not-when { rule-ref }?,
2636	       attribute tag { tags }?,
2637	       attribute ref { text }?,
2638	       variant*
2639	   }

2641	   ## Representation of a range of code points
2642	   range = element range {
2643	       attribute first-cp { code-point },
2644	       attribute last-cp { code-point },
2645	       attribute comment { text }?,
2646	       attribute tag { tags }?,
2647	       attribute ref { text }?
2648	   }

2650	   ## Representation of a single code point (no sequences allowed, and
2651	   ## no tag attribute allowed). This is used when defining the set of
2652	   ## characters that constitute a class.
2653	   char-simple = element char {
2654	       attribute cp { code-point }
2655	   }

2657	   ## Representation of a range of code points, for use in defining the
2658	   ## set of characters that constitute a class.
2659	   range-simple = element range {
2660	       attribute first-cp { code-point },
2661	       attribute last-cp { code-point }
2662	   }

2664	   ## Representation of a variant code point or sequence
2665	   variant = element var {
2666	       attribute cp { code-point-literal },
2667	       attribute type { text }?,
2668	       attribute when { rule-ref }?,
2669	       attribute not-when { rule-ref }?,
2670	       attribute comment { text }?,
2672	       attribute ref { text }?
2673	   }

2675	   #
2676	   # Classes
2677	   #

2679	   ## a "class" element that references the name of another "class"
2680	   ## (or set-operator like "union") defined elsewhere.
2681	   ## If used as a matcher (appearing under a "rule" element),
2682	   ## the "count" attribute may be present.
2683	   class-invocation = element class {
2684	       (attribute by-ref { class-ref }
2685	           | attribute from-tag { tag-ref }),
2686	       attribute count { count-pattern }?,
2687	       attribute comment { text }?
2688	   }

2690	   ## defines a new class (set of code points) using Unicode property or
2691	   ## code point literals
2692	   class-declaration = element class {
2693	       # "name" attribute MUST be present if this is a "top-level" class
2694	       # declaration, i.e. appearing directly under the "rules" element.
2695	       # Otherwise, it MUST be absent.
2696	       attribute name { identifier }?,
2697	       # If used as a matcher (appearing in a "rule" element), the
2698	       # "count" attribute may be present. Otherwise, it MUST be absent.
2699	       attribute count { count-pattern }?,
2700	       attribute comment { text }?,
2701	       attribute ref { text }?,
2702	       (
2703	         # define the class by property (e.g. property="sc:Latn"), OR
2704	         attribute property { text }
2705	         # define the class by tagged code points, OR
2706	         | attribute from-tag { tag-ref }
2707	         # list of single code points and ranges, OR
2708	         | (char-simple | range-simple)+
2709	         # text node to allow for shorthand notation e.g.
2710	         # "0061 0062-0063"
2711	         | code-point-set-shorthand
2712	       )
2713	   }

2715	   class-or-set-operator-nested =
2716	     class-invocation | class-declaration | set-operator

2718	   class-or-set-operator-declaration =
2719	     # a "class" element or set operator (effectively defining a class)
2720	     # directly in the "rules" element.
2721	     class-declaration | set-operator

2723	   #
2724	   # Set operators
2725	   #

2727	   complement-operator = element complement {
2728	       attribute name { identifier }?,
2729	       attribute comment { text }?,
2730	       attribute ref { text }?,
2731	       # "count" attribute MUST only be used when this set-operator is
2732	       # used as a matcher (i.e. nested in a <rule> element)
2733	       attribute count { count-pattern }?,
2734	       class-or-set-operator-nested
2735	   }

2737	   union-operator = element union {
2738	       attribute name { identifier }?,
2739	       attribute comment { text }?,
2740	       attribute ref { text }?,
2741	       # "count" attribute MUST only be used when this set-operator is
2742	       # used as a matcher (i.e. nested in a <rule> element)
2743	       attribute count { count-pattern }?,
2744	       class-or-set-operator-nested,
2745	       # needs two or more child elements
2746	       class-or-set-operator-nested+
2747	   }

2749	   intersection-operator = element intersection {
2750	       attribute name { identifier }?,
2751	       attribute comment { text }?,
2752	       attribute ref { text }?,
2753	       # "count" attribute MUST only be used when this set-operator is
2754	       # used as a matcher (i.e. nested in a <rule> element)
2755	       attribute count { count-pattern }?,
2756	       class-or-set-operator-nested,
2757	       class-or-set-operator-nested
2758	   }

2760	   difference-operator = element difference {
2761	       attribute name { identifier }?,
2762	       attribute comment { text }?,
2763	       attribute ref { text }?,
2764	       # "count" attribute MUST only be used when this set-operator is
2765	       # used as a matcher (i.e. nested in a <rule> element)
2766	       attribute count { count-pattern }?,
2767	       class-or-set-operator-nested,
2768	       class-or-set-operator-nested
2769	   }

2771	   symmetric-difference-operator = element symmetric-difference {
2772	       attribute name { identifier }?,
2773	       attribute comment { text }?,
2774	       attribute ref { text }?,
2775	       # "count" attribute MUST only be used when this set-operator is
2776	       # used as a matcher (i.e. nested in a <rule> element)
2777	       attribute count { count-pattern }?,
2778	       class-or-set-operator-nested,
2779	       class-or-set-operator-nested
2780	   }

2782	   ## operators that transform class(es) into a new class.
2783	   set-operator = complement-operator
2784	                  | union-operator
2785	                  | intersection-operator
2786	                  | difference-operator
2787	                  | symmetric-difference-operator

2789	   #
2790	   # Match operators (matchers)
2791	   #

2793	   any-matcher = element any {
2794	       attribute count { count-pattern }?,
2795	       attribute comment { text }?
2796	   }

2798	   choice-matcher = element choice {
2799	       attribute count { count-pattern }?,
2800	       attribute comment { text }?,
2801	       # two or more match operators
2802	       match-operator-choice,
2803	       match-operator-choice+
2804	   }

2806	   char-matcher =
2807	     # for use as a matcher - like "char" but without a "tag" attribute
2808	     element char {
2809	       attribute cp { code-point-literal },
2810	       # If used as a matcher (appearing in a "rule" element), the
2811	       # "count" attribute may be present. Otherwise, it MUST be
2812	       # absent.
2813	       attribute count { count-pattern }?,
2814	       attribute comment { text }?,
2815	       attribute ref { text }?
2816	   }

2818	   start-matcher = element start {
2819	       attribute comment { text }?
2820	   }

2822	   end-matcher = element end {
2823	       attribute comment { text }?
2824	   }

2826	   anchor-matcher = element anchor {
2827	       attribute comment { text }?
2828	   }

2830	   look-ahead-matcher = element look-ahead {
2831	       attribute comment { text }?,
2832	       match-operators-non-pos
2833	   }

2834	   look-behind-matcher = element look-behind {
2835	       attribute comment { text }?,
2836	       match-operators-non-pos
2837	   }

2839	   ## non-positional match operator that can be used as a
2840	   ## direct child element of the choice matcher.
2841	   match-operator-choice = (
2842	     any-matcher | choice-matcher | start-matcher | end-matcher
2843	     | char-matcher | class-or-set-operator-nested | rule-matcher
2844	   )

2846	   ## non-positional match operators do not contain any anchor,
2847	   ## look-behind or look-ahead elements.
2848	   match-operators-non-pos = (
2849	     start-matcher?,
2850	     (any-matcher | choice-matcher | char-matcher
2851	      | class-or-set-operator-nested | rule-matcher)*,
2852	     end-matcher?
2853	   )

2855	   ## positional match operators have an anchor element, which may be
2856	   ## preceded by a look-behind element, or followed by a look-ahead
2857	   ## element, or both.
2858	   match-operators-pos =
2859	     look-behind-matcher?, anchor-matcher, look-ahead-matcher?

2861	   match-operators = match-operators-non-pos | match-operators-pos

2863	   #
2864	   # Rules
2865	   #

2867	   # top-level rule must have "name" attribute
2868	   rule-declaration-top = element rule {
2869	       attribute name { identifier },
2870	       attribute comment { text }?,
2871	       attribute ref { text }?,
2872	       match-operators
2873	   }

2875	   ## rule element used as a matcher (either by-ref or contains other
2876	   ## match operators itself)
2877	   rule-matcher =
2878	     element rule {
2879	       attribute count { count-pattern }?,
2880	       attribute comment { text }?,
2881	       attribute ref { text }?,
2882	       (attribute by-ref { rule-ref } | match-operators)
2883	     }

2885	   #
2886	   # Actions
2887	   #

2889	   action-declaration = element action {
2890	       attribute comment { text }?,
2891	       attribute ref { text }?,
2892	       attribute disp { text },
2893	       ( attribute match { text } | attribute not-match { text } )?,
2894	       ( attribute any-variant { text }
2895	         | attribute all-variants { text }
2896	         | attribute only-variants { text } )?
2897	   }

2899	   # DOCUMENT STRUCTURE

2901	   start = lgr
2902	   lgr = element lgr {
2903	       attribute id { text }?,
2904	       meta-section?,
2905	       data-section,
2906	       rules-section?
2907	   }

2909	   ## Meta section - information recorded with a label
2910	   ## generation ruleset that generally does not affect machine
2911	   ## processing (except for unicode-version). However, if any
2912	   ## "class-declaration" uses the "property" attribute, one or
2913	   ## more unicode-version MUST be present.

2915	   meta-section = element meta {
2916	       element version {
2917	           attribute comment { text }?,
2918	           text
2919	       }?
2920	       & element date {
2921	           xsd:token {
2922	               pattern = "\d{4}-\d{2}-\d{2}"
2923	           }
2924	       }?
2925	       & element language { language-tag }*
2926	       & element scope {
2927	           # type may be "domain" or an application-defined value
2928	           attribute type { xsd:NCName },
2929	           scope-value
2930	       }*
2931	       & element validity-start { text }?
2932	       & element validity-end { text }?
2933	       & element unicode-version {
2934	           xsd:token {
2935	               pattern = "\d+\.\d+\.\d+"
2936	           }
2937	       }?
2938	       & element description {
2939	           attribute type { text }?,
2940	           text
2941	       }?
2942	       & element references {
2943	           element reference {
2944	               attribute id { text },
2945	               attribute comment { text }?,
2946	               text
2947	           }*
2948	       }?
2949	   }

2951	   data-section = element data { (char | range)+ }

2953	   ## Note that action declarations are strictly order dependent.
2954	   ## class-or-set-operator-declaration and rule-declaration-top
2955	   ## are weakly order dependent: they must precede first use of the
2956	   ## identifier via by-ref.
2957	   rules-section = element rules {
2958	     ( class-or-set-operator-declaration
2959	       | rule-declaration-top
2960	       | action-declaration)*
2961	   }
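
   The simple types in the schema above are defined by regular-expression
   patterns.  The following is a minimal, non-normative sketch of how an
   implementation might check those lexical forms outside of a RELAX NG
   validator.  The function names (is_code_point_literal,
   expand_shorthand, parse_count) are hypothetical and do not appear in
   this specification.  The sketch assumes that attribute values have
   already been whitespace-normalized as xsd:token requires, and it
   assumes the reading of the count forms given in the schema comment:
   "n" is exactly n, "n+" is n or more, and "n:m" is between n and m.

```python
import re

# Patterns copied from the schema's simple types. XSD patterns are
# implicitly anchored, so fullmatch() is used throughout.
HEX = r"[0-9A-F]{4,6}"
CODE_POINT = re.compile(HEX)
CODE_POINT_SEQUENCE = re.compile(rf"{HEX}( {HEX})+")
SET_SHORTHAND = re.compile(rf"({HEX}|{HEX}-{HEX})( ({HEX}|{HEX}-{HEX}))*")
COUNT = re.compile(r"\d+(\+|:\d+)?")


def is_code_point_literal(value):
    """True if value is a single code point or a sequence of them."""
    return bool(CODE_POINT.fullmatch(value)
                or CODE_POINT_SEQUENCE.fullmatch(value))


def expand_shorthand(value):
    """Expand class shorthand such as "0061 0062-0063" to a set of ints."""
    if not SET_SHORTHAND.fullmatch(value):
        raise ValueError("not a code-point-set-shorthand: %r" % value)
    result = set()
    for item in value.split(" "):
        first, _, last = item.partition("-")
        low = int(first, 16)
        high = int(last, 16) if last else low
        if high < low:
            # Assumption: a descending range is treated as an error here.
            raise ValueError("descending range: %r" % item)
        result.update(range(low, high + 1))
    return result


def parse_count(value):
    """Return (minimum, maximum) for a count; maximum None means unbounded."""
    if not COUNT.fullmatch(value):
        raise ValueError("not a count-pattern: %r" % value)
    if value.endswith("+"):
        return int(value[:-1]), None
    if ":" in value:
        low, high = value.split(":")
        return int(low), int(high)
    n = int(value)
    return n, n
```

   Note that the patterns accept only uppercase hexadecimal digits, so a
   value such as "006a" is rejected by this check even though it names a
   valid code point.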

2963	Appendix E.  Acknowledgements

2965	   This format builds upon the work on documenting IDN tables by many
2966	   different registry operators.  Notably, a comprehensive language
2967	   table for Chinese, Japanese and Korean was developed by the "Joint
2968	   Engineering Team" [RFC3743] that is the basis of many registry
2969	   policies; and a set of guidelines for Arabic script registrations
2970	   [RFC5564] was published by the Arabic-language community.

2972	   Contributions that have shaped this document have been provided by
2973	   Francisco Arias, Mark Davis, Paul Hoffman, Nicholas Ostler, Thomas
2974	   Roessler, Steve Sheng, Michel Suignard, Andrew Sullivan, Wil Tan and
2975	   John Yunker.

2977	Appendix F.  Editorial Notes

2979	   This appendix to be removed prior to final publication.

2981	F.1.  Known Issues and Future Work

2983	   o  A method of specifying the origin URI for a table, and an
2984	      expiration or refresh policy, as meta-data may be a useful way to
2985	      declare how the table will be updated.

2987	   o  The "domain" element should be specified as absolute, so that the
2988	      Root can be identified as needed for the Root Zone LGR.

2990	   o  The recommended names for disposition ("block" and "allocate")
2991	      deviate from the name in the Root Zone LGR Procedure ("blocked"
2992	      and "allocatable").  The latter were chosen to highlight that the
2993	      machine processing of the LGR table is just the first step; actual
2994	      allocation requires additional actions, hence "allocatable".  This
2995	      should be resolved.

2997	F.2.  Change History

2999	   -00  Initial draft.

3001	   -01  Add an XML Namespace, and fix other XML nits.  Add support for
3002	        sequences of code points.  Improve on consistently using Unicode
3003	        nomenclature.

3005	   -02  Add support for validity periods.

3007	   -03  Incorporate requirements from the Label Generation Ruleset
3008	        Procedure for the DNS Root Zone.  These requirements include a
3009	        detailed grammar for specifying whole-label variants, and the
3010	        ability to explicitly declare the actions associated with a
3011	        specific variant.  The document also consistently applies the
3012	        term "Label Generation Ruleset", rather than "IDN table", to
3013	        reflect the policy term now being used to describe these.

3015	   -04  Support reference information per [RFC3743].  Update description
3016	        in response to feedback.  Extend the context rules to "char"
3017	        elements and allow for inverse matching ("not-when").  Extend
3018	        the description of label processing and implied actions, and
3019	        allow for actions that reference disposition attributes on any
3020	        or all variant mappings used in the generation of a variant
3021	        label.

3023	   -05  Change the name of the "disposition" attribute to "disp".  Add
3024	        comment attribute on version and reference elements.  Allow
3025	        empty "cp" attributes in char elements to support expressing
3026	        symmetric mapping of null variants.  Describe use of variants
3027	        that map identically.  Clarify how actions are triggered, in
3028	        particular based on variant dispositions, as well as description
3029	        of default actions.  Revise description of processing a label
3030	        and its variants.  Move example table at the head of appendices.
3031	        Add "only-variants" attribute.  Change "name" attribute to "by-
3032	        ref" attribute for referencing named classes and rules.  Change
3033	        "not" to "complement".  Remove "match" attribute on rules as
3034	        redundant if "start" and "end" are supported.  Rename "match"
3035	        element to "anchor" as better fitting its function and removing
3036	        confusion with both the "match" attribute on actions as well as
3037	        the generic term Match Operator.  Augmented the examples
3038	        relevant to [RFC3743].

3040	   -06  Extend the discussion of reflexive variants and their use;
3041	        includes update of the appendix on converting tables in the
3042	        style of [RFC3743].  Improve description of tagging and clarify
3043	        that it doesn't apply to sequences.  Specify that root zone uses
3044	        ".".  Add an appendix with an Indic Syllable Structure example.
3045	        Extend count attribute to allow maximal counts.

3047	   -07  Change "byref" to "by-ref".  Add list of recommended properties.
3048	        Change "location" to "positional" for collective name of start/
3049	        end match operators.  Use from-tag instead of by-ref for tag-
3050	        based classes.  Made optional or mutually exclusive nature of
3051	        some attributes more explicit.  Allowing "comment" attributes on
3052	        all child elements of "rules" except "char" and "range" elements
3053	        used as child elements of "class".  Recast the design goals and
3054	        requirements at the start of the document.  Reword aspects of
3055	        the document to make it clear the format's application is not
3056	        limited only to domain names.

3058	   -08  Change "domain" to scope with type="domain".  Reword in several
3059	        places for clarity.  Flesh out note on security.  Change "disp"
3060	        to "type" for variants, to mark that these attributes do not
3061	        necessarily correspond one-to-one to variant label dispositions.
3062	        Add example of variant type triggers.  Remove "long form" of
3063	        class definition.

3065	   -09  Grammatical updates, clarity improvements.  Altered some DNS-
3066	        specific terminology.

3068	   -10  Added convention for out-of-repertoire variants, additional
3069	        examples of "when" rules in the context of symmetry, and isolated
3070	        minor copy editing.  Use a URN as the XML namespace
3071	        (provisional).  Specify a media type for the file.

3073	Authors' Addresses

3075	   Kim Davies
3076	   Internet Corporation for Assigned Names and Numbers
3077	   12025 Waterfront Drive
3078	   Los Angeles, CA  90094
3079	   US

3081	   Phone: +1 310 301 5800
3082	   Email: kim.davies@icann.org
3083	   URI:   http://www.icann.org/

3085	   Asmus Freytag
3086	   ASMUS Inc.

3088	   Email: asmus@unicode.org