idnits 2.17.1

draft-jseng-idn-admin-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 1176 lines

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack an Authors' Addresses Section.

  ** There are 81 instances of lines with control characters in the document.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 1165 has weird spacing: '...t about good ...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Missing reference section? 'ISO7098' on line 1029 looks like a reference

  -- Missing reference section? 'IDN-WG' on line 1034 looks like a reference

  -- Missing reference section? 'STRINGPREP' on line 132 looks like a
     reference

  -- Missing reference section? 'IDNA' on line 1005 looks like a reference

  -- Missing reference section? 'PUNYCODE' on line 1009 looks like a reference

  -- Missing reference section? 'NAMEPREP' on line 1017 looks like a reference

  -- Missing reference section? 'Note1' on line 139 looks like a reference

  -- Missing reference section? 'STD13' on line 1038 looks like a reference

  -- Missing reference section? 'UNICODE' on line 1024 looks like a reference

  -- Missing reference section? 'C2C' on line 1042 looks like a reference

  -- Missing reference section? 'Note2' on line 164 looks like a reference

  -- Missing reference section? 'I18NTERMS' on line 998 looks like a reference

  -- Missing reference section? 'RFC3066' on line 1002 looks like a reference

  -- Missing reference section? 'ABNF' on line 995 looks like a reference

  -- Missing reference section? 'DIGIT' on line 531 looks like a reference

  -- Missing reference section? 'UNIHAN' on line 1021 looks like a reference


     Summary: 5 errors (**), 0 flaws (~~), 3 warnings (==), 19 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	INTERNET DRAFT                                     Editors:  James SENG
2	draft-jseng-idn-admin-01.txt                               John KLENSIN
3	18th Oct 2002                                      Authors:  K. KONISHI
4	Expires 18th April 2003                        K. HUANG, H. QIAN, Y. KO

6	     Internationalized Domain Names Registration and Administration
7	               Guideline for Chinese, Japanese and Korean

9	Status of this Memo

11	     This document is an Internet-Draft and is in full conformance
12	     with all provisions of Section 10 of RFC2026 except that the
13	     right to produce derivative works is not granted.

15	    Internet-Drafts are working documents of the Internet
16	    Engineering Task Force (IETF), its areas, and its working
17	    groups. Note that other groups may also distribute working
18	    documents as Internet-Drafts.

20	    Internet-Drafts are draft documents valid for a maximum of
21	    six months and may be updated, replaced, or obsoleted by other
22	    documents at any time. It is inappropriate to use Internet-
23	    Drafts as reference material or to cite them other than as
24	    "work in progress."

26	    The list of current Internet-Drafts can be accessed at
27	    http://www.ietf.org/ietf/1id-abstracts.txt

29	    The list of Internet-Draft Shadow Directories can be accessed at
30	    http://www.ietf.org/shadow.html.

32	Abstract

34	Achieving internationalized access to domain names raises many complex
35	issues.  These include not only issues associated with basic protocol design
36	(i.e., how the names are represented on the network, compared, and
37	converted to appropriate forms) but also issues and options for
38	deployment, transition, registration and administration.

40	The IETF IDN working group focused on the development of a standards
41	track specification for access to domain names in a broader range of
42	scripts than the original ASCII.  It became clear during its efforts
43	that there was great potential for confusion, and difficulties in
44	deployment and transition, due to characters with similar appearances
45	or interpretations and that those issues could best be addressed
46	administratively, rather than through restrictions embedded in the
47	protocols.

49	This document provides guidelines for zone administrators (including
50	but not limited to registry operators and registrars), and information
51	for all domain name holders, on the administration of those domain
52	names which contain characters drawn from Chinese, Japanese and Korean
53	scripts (CJK).  Other language groups are encouraged to develop their
54	own guidelines as needed, based on these guidelines if that is helpful.

56	Comments on this document can be sent to the authors at
57	idn-admin@jdna.jp.

59	Table of Contents

61	0. Pre-Note for ASCII-version of this document

63	1. Introduction

65	2. Definitions

67	3. Administrative Framework
68	3.1. Principles underlying these Guidelines
69	3.2. Registration of IDL
70	3.2.1. Language character variant table
71	3.2.2. Formal syntax
72	3.2.3. Registration Algorithm
73	3.3. Deletion and Transfer of IDL and IDL Package
74	3.4. Activation and De-activation of IDN variants
75	3.5. Adding/Deleting language(s) association
76	3.6. Versioning of the language character variant tables

78	4. Example of Guideline Adoption

80	i. Notes

82	ii. Acknowledgements

84	iii. Authors

86	iv. Appendix A

88	v. Normative References

90	vi. Non-normative References

92	vii. Other Issues

94	0. Pre-Note for ASCII-version of this document

96	In order to make meanings clear, especially in examples, Han ideographs
97	are used in several places in this document.  Of course, these
98	ideographs do not appear in the ASCII form of this document.  So, for
99	the convenience of readers of the ASCII format and some readers not
100	familiar with recognizing and distinguishing Chinese characters, each
101	use of a particular character will be associated with both its Unicode
102	code point and an "asterisk tag" with its corresponding Chinese
103	Romanization [ISO7098] with the tone mark represented by a number 1 to
104	4.  Those tags have no meaning outside this document; they are intended
105	simply to provide a quick visual and reading reference to facilitate
106	the combinations and transformations of characters in the guideline and
107	table excerpts.  Appendix A provides the Romanization of the
108	ideographs in Japanese (ISO 3602) and Korean (ISO 11941).

110	1. Introduction

112	Defining and specifying protocols for Internationalized Domain Names
113	has been one of the most controversial tasks initiated by the IETF in
114	recent years.  Domain names are the fundamental naming architecture of
115	the Internet; many Internet protocols and applications rely on the
116	stability, continuity, and absence of ambiguity of the DNS.

118	The introduction of internationalized domain names (IDN) amplifies the
119	difficulty of putting names into identifiers and the confusion between
120	scripts and languages.  It impacts many internet protocols and
121	applications and creates more complexity in technical administration
122	and services.

124	While the IETF IDN working group [IDN-WG] focused on the technical
125	problems of IDN, administrative guidelines are also important in order
126	to reduce unnecessary user confusion and domain name disputes among
127	domain name holders.

129	The IDN working group has completed working group last call for the
130	following internet-drafts:

132	1. Preparation of Internationalized Strings [STRINGPREP]
133	2. Internationalizing Host Names In Applications [IDNA]
134	3. Punycode version 0.3.3 [PUNYCODE]
135	4. A Stringprep Profile for Internationalized Domain Names [NAMEPREP]

137	These drafts specify that the intersystem protocols that make up the
138	domain name system infrastructure remain unchanged.  Instead, they
139	introduce internationalization (I18N) [Note1] in client software
140	(particularly via the IDNA protocol) using an ASCII Compatible Encoding
141	(ACE) known as Punycode.
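
As a rough illustration of the ACE step, the sketch below uses Python's
built-in "idna" codec, which implements IDNA as it was later published
(RFC 3490, with the "xn--" prefix).  The draft versions cited above may
differ in detail, so this is an assumption-laden sketch and not the
behaviour these drafts specify.  The label is the Chinese example
U+98DE U+673A used elsewhere in this section.

```python
# A sketch only: Python's "idna" codec applies Nameprep and Punycode to
# each label and prepends the ACE prefix.
label = "\u98de\u673a"          # U+98DE U+673A *fei1 ji1*

ace = label.encode("idna")      # Unicode label -> ASCII Compatible Encoding
restored = ace.decode("idna")   # ACE -> Unicode label

print(ace)                      # an ASCII-only byte string
print(restored == label)
```

The DNS itself only ever carries the ASCII form; the conversion in both
directions happens in the client.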

143	The domain name protocols [STD13] also specify that characters are to
144	be interpreted so that upper and lower case Latin-based characters are
145	considered equivalent.  But with the introduction of Unicode characters
146	beyond US-ASCII, and the possibility to represent a single character in
147	multiple ways in ISO10646/Unicode [UNICODE], a normalization process,
148	known as Nameprep, has been proposed to handle the more complex
149	problems of character-matching for those additional characters.
150	Nameprep is also executed by client software as described in IDNA.

152	While Nameprep normalizes domain names so that the users have an
153	improved chance of getting the right domain name from information
154	provided in other forms, as required for I18N, Nameprep does not handle
155	any localization (L10N).

157	This becomes significant when a domain name holder attempts to use a
158	Unicode string forming a "name", "word", or "phrase" that may have
159	certain meaning in a certain language or when used as a domain name.
160	Such a Unicode string may have different variants in the context of the
161	language or culture.

163	Generally, these localized variants in CJK can be classified into four
164	categories, as described by Halpern et al. [C2C]: [Note2]

166	a. Character (or Code) variants

168	Character (or Code) variants refer to variants that are generated by
169	character-by-character (or code-by-code) substitution.

171	An example in English would be "A" or "a" (U+0041 or U+0061).
172	Two examples in Chinese would be U+98DB *fei1* or U+98DE *fei1*
173	and U+6A5F *ji1* or U+673A *ji1*.

175	Note that this does not mean the choice between U+6A5F and U+673A is
176	always symmetric like the one between "A" and "a" -- it is a choice only
177	for Chinese but not for Japanese.

179	For some characters, the variant is formed simply by dropping them. For
180	example, point and vowel characters in Hebrew (U+05B0 to U+05C4) and
181	Arabic (U+064B to U+0652) are optional; the variants for strings
182	containing them are constructed by simply dropping those points and
183	vowels.

185	Code variants may also occur when different code points are assigned to
186	what visually or abstractly are the "same" character, possibly due
187	to compatibility issues, typeface differences or script range. For
188	example, LATIN CAPITAL LETTER A (U+0041) normally has an appearance
189	identical to GREEK CAPITAL LETTER ALPHA (U+0391). CJK scripts have font
190	variants for compatibility (either U+4E0D or U+F967 may be used) and
191	"zVariant" (e.g. U+5154 and U+514E).

193	The difficulty lies in defining which characters are the "same" and
194	which are not.
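
To make the distinction concrete, the sketch below checks which of the
code variant pairs mentioned above are folded together by Unicode
normalization form KC, the normalization that Nameprep applies.  It
uses Python's "unicodedata" module with its default (current) Unicode
data, which is an assumption: Nameprep is defined against Unicode 3.2.
The helper name "nfkc" is illustrative only.

```python
import unicodedata

def nfkc(code_point):
    """Return the NFKC form of a single code point, as a string."""
    return unicodedata.normalize("NFKC", chr(code_point))

# Compatibility ideograph: normalization folds it onto the unified one.
print(nfkc(0xF967) == chr(0x4E0D))

# Look-alike letters from different scripts: normalization keeps them apart.
print(nfkc(0x0391) == chr(0x0041))

# zVariants: normalization keeps them apart as well.
print(nfkc(0x5154) == chr(0x514E))
```

Only the first pair is unified by normalization; the other two have to
be handled, if at all, by the administrative means this document
describes.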

196	b. Orthographic variants

198	Orthographic variants refer to variants that are generated by word-by-
199	word substitution.

201	An example in English would be "color" and "colour".

203	It is possible for some of these orthographic variants to be generated
204	by character variants. For example "airplane" in Chinese may be either
205	U+98DB U+6A5F *fei1 ji1* or U+98DE U+673A *fei1 ji1*.

207	Other orthographic variants may not be generated by character variants.
208	For example, in Chinese, both U+767C *fa1* and U+9AEE *fa4*
209	are related to U+53D1 *fa1 or fa4* depending on the word. For hair,
210	U+5934 U+53D1 *tou2 fa4*, the variant should be U+982D U+9AEE
211	*tou2 fa4* but not U+982D U+767C *tou2 fa1*.

213	c. Lexemic variants

215	Lexemic variants refer to variants that can be generated when language
216	is considered, by word-by-word substitution.

218	An example in English would be cab, taxi, or taxicab.

220	An example in Chinese would be U+8CC7 U+8A0A *zi1 xun4* or
221	U+4FE1 U+606F *xin4 xi1*.

223	Note that there is no relationship between U+8CC7 and U+4FE1 or U+8A0A
224	and U+606F, i.e., the sequence U+8CC7 U+606F *zi1 xi1* does not
225	exist in Chinese.

227	d. Contextual variants

229	Contextual variants refer to variants that are generated by word-by-
230	word substitutions with context considered.

232	In English, the word "plane" has different meanings and could be
233	replaced with different equivalent words (synonyms) such as
234	"airplane" or "plane" (as in a flat-surface or device for smoothing
235	wood) depending on context.  And, of course, "plain", which is
236	pronounced the same way, and indistinguishable in speech-to-text
237	contexts such as computer input systems for the visually impaired, is a
238	different word entirely.

240	Similarly, the word U+6587 U+4EF6 *wen2 jian4* could be either
241	document U+6587 U+4EF6 *wen2 jian4* or data file U+6A94 U+6848
242	*dang3 an4* depending on context.

244	Although domain names were designed to be identifiers without any
245	language context, users have not been prevented from using strings in
246	domain names and interpreting them as "words" or "names". It is likely
247	that users will do this with IDN as well. Therefore, given the added
248	complications of using a much broader range of characters, precautions
249	will be required when deploying IDN to minimize confusion and fraud.

251	The intention of these guidelines is to provide advice about the
252	deployment of IDNs, with language consideration, but focusing only on
253	the category of character variants to increase the possibility of
254	successful resolution and reduced confusion while accepting inherent
255	DNS limitations.

257	2. Definitions

259	Unless otherwise stated, the definitions of the terms used in this
260	document are consistent with "Terminology Used in Internationalization
261	in the IETF" [I18NTERMS].

263	"FQDN" refers to a fully-qualified domain name and "domain name label"
264	refers to a label of a FQDN.

266	RFC3066 [RFC3066] defines a system for coding and representing
267	languages.

269	ISO/IEC 10646 is a universal multiple-octet coded character set that is
270	a product of ISO/IEC JTC1/SC2/WG2, Work Item JTC1.02.18 (ISO/IEC 10646).
271	It is a multi-part standard: Part 1, published as ISO/IEC 10646-
272	1:2000(E) covering the Architecture and Basic Multilingual Plane; Part
273	2, published as ISO/IEC 10646-2:2001(E) covers the supplementary
274	(additional) planes.

276	The Unicode Consortium publishes "The Unicode Standard -- Version 3.0",
277	ISBN 0-201-61633-5. In March 2002, Unicode Consortium published Unicode
278	Standard Annex #28.  That annex defines Version 3.2 of The Unicode
279	Standard, which is fully synchronized with ISO/IEC 10646-1:2000 (with
280	Amendment 1).

282	The term "Unicode character" is used here to refer to characters chosen
283	from The Unicode Standard Version 3.2 (and hence from ISO/IEC 10646).
284	In this document, the characters are identified by their positions (or
285	"code points"). The notation U+12AB, for example, indicates the
286	character at the position 12AB (hexadecimal) in the Unicode 3.2 table.

288	Similarly, "Unicode string" refers to a string of Unicode characters.
289	The Unicode string is identified by the sequence of the Unicode
290	characters regardless of the encoding scheme.

292	The term "IDN" is often used to refer to many different things: (a) an
293	abbreviation for "Internationalized Domain Name" (b) a fully-qualified
294	domain name that contains at least one label that contains characters
295	not appearing in ASCII (c) a label of a domain name that contains at
296	least one character beyond ASCII (d) a Unicode string to be processed
297	by Nameprep (e) an IDN Package (in this document context) (f) a
298	Nameprep processed string (g) a Nameprep and Punycode processed string
299	(h) the IETF IDN Working Group (i) ICANN IDN Committee (j) other IDN
300	activities in other companies/organizations etc.

302	Because of the potential confusion, this document shall use the term
303	"IDN" as an abbreviation for "Internationalized Domain Name" only.

305	Also, because this document provides a guideline to be applied on a
306	per-zone basis, one label at a time, the term "Internationalized Domain
307	Name Label" or "IDL" will be used instead.

309	In this document, the term "registration" refers to the process by
310	which a potential domain name holder requests that a label be placed in
311	the DNS, either as an individual name within a domain or as a sub-
312	domain delegation from another domain name holder. A successful
313	registration would then lead to the label or delegation records being
314	placed in the relevant zone file.  The guidelines presented here are
315	recommended for all zones, at any hierarchy level, in which CJK
316	characters are to appear, not just domains at the first or second level.

318	CJK characters are characters commonly used in the Chinese, Japanese or
319	Korean languages, including but not limited to ASCII (U+0020 to U+007F),
320	Han Ideograph (U+3400 to U+9FAF and U+20000 to U+2A6DF), Bopomofo
321	(U+3100 to U+312F and U+31A0 to U+31BF), Kana (U+3040 to U+30FF), Jamo
322	(U+1100 to U+11FF and U+3130 to U+318F), Hangul (U+AC00 to U+D7AF and
323	U+3130 to U+318F) and their respective compatibility forms.
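
The ranges listed above can be turned into a simple membership check.
The following is a minimal sketch: the range values are copied from the
definition, while the names "CJK_RANGES", "cjk_blocks" and
"is_listed_cjk" are illustrative.  Because the definition says
"including but not limited to", and the compatibility forms are not
enumerated, a real zone would rely on its language character variant
table rather than on a list like this.

```python
# (name, first code point, last code point), as given in the definition.
CJK_RANGES = [
    ("ASCII", 0x0020, 0x007F),
    ("Han Ideograph", 0x3400, 0x9FAF),
    ("Han Ideograph", 0x20000, 0x2A6DF),
    ("Bopomofo", 0x3100, 0x312F),
    ("Bopomofo", 0x31A0, 0x31BF),
    ("Kana", 0x3040, 0x30FF),
    ("Jamo", 0x1100, 0x11FF),
    ("Jamo", 0x3130, 0x318F),
    ("Hangul", 0xAC00, 0xD7AF),
    ("Hangul", 0x3130, 0x318F),
]

def cjk_blocks(code_point):
    """Return the sorted names of every listed range containing code_point."""
    return sorted({name for name, first, last in CJK_RANGES
                   if first <= code_point <= last})

def is_listed_cjk(label):
    """True if every character of label falls in at least one listed range."""
    return all(cjk_blocks(ord(ch)) for ch in label)
```

For example, cjk_blocks(0x98DB) reports the Han Ideograph range only,
while a code point in U+3130 to U+318F is reported under both Jamo and
Hangul, since the definition lists that range twice.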

325	3. Administrative Framework

327	Zone administrators are responsible for the administration of the
328	domain name labels under their control. A zone administrator might be
329	responsible for a large zone such as a Top Level Domain (TLD), generic
330	or country code, or a smaller one such as a typical second or third
331	level domain.  A large zone would often be more complex than a smaller
332	one (sometimes it is just larger).  However, normally, actual technical
333	administrative tasks -- such as addition, deletion, delegation and
334	transfer of zones between domain name holders -- are similar for all
335	zones.

337	At the same time, different zones may have different policies and
338	processes.  For example, a pay-per-domain policy and registry/registrar
339	model for .COM may not be applicable to such domains as .SG or .IBM.COM.
340	The latter, for example, has very restricted policies about who is
341	permitted to have a domain name label under IBM.COM, the types of
342	strings that are permitted, and different procedures for obtaining those
343	strings.

345	This document only provides guidelines for how CJK characters should be
346	handled within a zone, how language issues should be considered and
347	incorporated, and how domain name labels containing CJK characters
348	should be administered (including registration, deletion and transfer
349	of labels). It does not provide any guidance for handling of non-CJK
350	characters or languages in zones.

352	Other IDN policies, such as the creation of new TLDs or the cost structure
353	for registrations, are outside the scope of this document.  Such
354	discussions should be conducted in forums outside the IETF as well.

356	Technical implementation issues are not discussed here either.  For
357	example, the decision as to whether particular guidelines should be
358	implemented as registry or registrar actions is left to zone
359	administrators, possibly differing from zone to zone.

361	3.1. Principles underlying these Guidelines

363	In many places, this document assumes "First-Come-First-Serve"
364	(FCFS) as a conflict policy in the event of a dispute although FCFS is
365	not listed as one of the principles. If other policies dominate
366	priorities and "rights", one can use these guidelines by replacing uses
367	of FCFS in this document by appropriate other policy rules specific to
368	the zone.  In other cases, some of these guidelines may not be
369	applicable, although some alternatives for determining rights to labels
370	-- such as use of UDRP or mutual exclusion -- might have little impact
371	on other aspects of these guidelines.

373	(a) Each IDL to be registered should be associated with one or more
374	languages.

376	Although some Unicode strings may be pure identifiers made up of an
377	assortment of characters from many languages and scripts, IDLs are
378	likely to be names or phrases that have certain meaning in some
379	language.  While a zone administration might or might not require
380	"meaning" as a registration criterion, the possibility of meaning
381	provides a useful tool when trying to avoid user confusion.

383	Zone administrators should administratively associate one or more
384	languages with each IDL.  These associations should either be
385	predetermined by the zone administrator and applied to the entire zone or
386	chosen by the registrants on a per-IDL basis.  The latter may be
387	necessary for some zones, but will make administration more difficult
388	and will increase the likelihood of conflicts in variant forms.

390	A given zone might have multiple languages associated with it, or have
391	no language specified at all, but doing so may provide additional
392	opportunities for user confusion, and is therefore not recommended.

394	The zone administrator must also verify the validity of the IDL
395	requested by using information associated with the chosen language and
396	possibly other rules as appropriate.

398	(b) When an IDL is registered, all of the character variants for the
399	associated language(s) should be reserved for the registrant.  Each
400	language associated with the IDL will lead to different character
401	variants.

403	IDL reservations of the type described here normally do not appear in
404	the distributed DNS zone file.  In other words, these reserved IDLs do
405	not resolve. Domain name holders could request these reserved IDLs to
406	be placed in the zone file and made active and resolvable as, e.g.,
407	aliases or synonyms.

409	Since different languages may imply different sets of variants, the
410	IDLs reserved for one IDL may overlap those reserved for another.  In
411	this case, the reserved IDLs should be bound to one registration or the
412	other, or excluded from both, according to the applicable registration
413	or dispute resolution policy for the zone.

415	(c) For a given base language, the IDL may have one or more recommended
416	variants that should be suggested to the domain name holder for active
417	registration as synonyms.

419	Some language rules may prefer certain variants over others. To
420	increase the likelihood of correct and predictable resolution of the
421	IDL by end-users, the recommended variants should be active.

423	(d) The IDL and its reserved variants with the language(s) association
424	must be atomic.

426	The IDL and its reserved variants for the associated language(s) are to
427	be considered as a single unit -- an "IDL Package". For a given IDL,
428	that IDL package is defined by these guidelines and created upon
429	registration.

431	The IDL Package is atomic: Transfer and deletion of IDL are performed
432	on the IDL Package as a whole. IDLs, either active or reserved, within
433	the IDL Package must not be transferred or deleted individually.  I.e.,
434	any re-registration, transfers, or other actions that impact the IDL
435	should also impact the reserved variants.  Separate registration or
436	other actions for the variants are not possible if these guidelines are
437	to accomplish their purpose.

439	Conflict policy of the zone may result in violation of the IDL Package
440	atomicity. In such a case, the conflict policy would take precedence.
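
One way to picture principle (d) is as a data structure in which every
label of a package points at the same package object, so that transfer
and deletion cannot touch one label without the others.  The sketch
below is illustrative only: the class and field names ("IDLPackage",
"Zone", "owner_of" and so on) are assumptions, and zone conflict policy
is left out entirely.

```python
from dataclasses import dataclass, field

@dataclass
class IDLPackage:
    idl: str                 # the label as registered
    holder: str              # current domain name holder
    languages: frozenset     # associated language tags
    active: set = field(default_factory=set)     # labels in the zone file
    reserved: set = field(default_factory=set)   # held back, not resolving

    def labels(self):
        """Every label covered by this package."""
        return {self.idl} | self.active | self.reserved


class Zone:
    def __init__(self):
        self.owner_of = {}   # label -> IDLPackage

    def register(self, package):
        for label in package.labels():
            self.owner_of[label] = package

    def transfer(self, label, new_holder):
        # Any label of the package stands for the package as a whole.
        self.owner_of[label].holder = new_holder

    def delete(self, label):
        package = self.owner_of[label]
        for member in package.labels():
            del self.owner_of[member]
```

Transferring or deleting through a reserved variant therefore has the
same effect as doing so through the registered IDL.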

442	3.2. Registration of IDL

444	Conforming to the principles described in 3.1, the registration of an
445	IDL would require at least two components, i.e., the character variant
446	tables for the language and the registration algorithm.

448	3.2.1. Language character variant table

450	Any lines starting with, or portions of lines after, the hash
451	symbol("#") are treated as comments. Comments have no significance in
452	the processing of the tables, nor are there any syntax requirements
453	between the hash symbol and the end of the line. Blank lines in the
454	tables are ignored completely.

456	Every language should have a character variant table provided by a
457	relevant group (or organization or other body) and based on established
458	standards. The group that defines a particular character variant table
459	should document references to the appropriate standards at the beginning
460	of the table, tagged with the word "Reference" followed by an integer (the
461	reference number) followed by the description of the reference.  For
462	example,

464	Reference 1 CP936 (commonly known as GBK)
465	Reference 2 zVariant, zTradVariant, zSimpVariant in Unihan.txt
466	Reference 3 List of Simplified character Table (Simplified column)
467	Reference 4 zSimpVariant in Unihan.txt
468	Reference 5 variant that exists in GB2312, common simplified hanzi

470	Each language character variant table must have a version number. This
471	is tagged with the word "Version" followed by an integer then followed
472	by the date in the format YYYYMMDD, where YYYY is the 4 digit Year, MM
473	is the 2 digit Month and DD is the 2 digit Day of the publication date
474	of the table.  For example,

476	Version 1 20020701 	# July 2002 Version 1
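
The header lines described so far are simple enough to parse by
splitting on whitespace once the comment has been removed.  The sketch
below assumes exactly that; the function names are illustrative, and
error handling is reduced to raising ValueError.

```python
from datetime import datetime

def strip_comment(line):
    """Drop everything from the first '#' to the end of the line."""
    return line.split("#", 1)[0].strip()

def parse_reference(line):
    """Parse 'Reference <number> <description>' into (number, description)."""
    keyword, number, description = strip_comment(line).split(None, 2)
    if keyword != "Reference":
        raise ValueError("not a Reference line: " + line)
    return int(number), description

def parse_version(line):
    """Parse 'Version <number> <YYYYMMDD>' into (number, date)."""
    keyword, number, date_text = strip_comment(line).split()
    if keyword != "Version":
        raise ValueError("not a Version line: " + line)
    return int(number), datetime.strptime(date_text, "%Y%m%d").date()
```

Applied to the example lines above, parse_version yields version 1
dated 1 July 2002, and parse_reference yields the reference number
together with its free-text description.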

478	The table has three fields, separated by semicolons.  The fields are:
479	"valid code point"; "recommended variant(s)"; and "character
480	variant(s)".

482	Only code points listed in the "valid code point" field are allowed to
483	be registered as part of an IDL associated with that language.

485	There can be one or more "recommended variant(s)" (i.e., entries in the
486	"recommended variant(s)" column). If the "recommended variant(s)"
487	column is empty, then there is no corresponding variant.

489	The "character variant(s)" column contains all variants of the code
490	point, including but not limited to the code point itself and the
491	"recommended variant(s)".

493	If the variant is composed of a sequence of code points, then the sequence
494	of code points is listed separated by a space in the "recommended
495	variant(s)" or "character variant(s)".

497	If there are multiple variants, each variant must be separated by a
498	comma in the "recommended variant(s)" or "character variant(s)".

500	Any code point listed in the "recommended variant(s)" column must be
501	allowed, by the rules for the relevant language, to be registered.
502	However, this is not a requirement for the entries in the "character
503	variant(s)" column; it is possible that some of those entries may not
504	be allowed to be registered.

506	Every code point in the table should have a corresponding reference
507	number (associated with the references) specified to justify the entry.
508	The reference number is placed in parentheses after the code point. If
509	there is more than one reference, then the numbers are placed within a
510	single set of parentheses and separated by commas.
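
The field rules above can be sketched as a small parser.  This is a
hedged illustration, not a reference implementation: it assumes code
points are written as 4 to 6 hexadecimal digits, that a reference list
in parentheses may follow any code point, and that commas inside
parentheses belong to the reference list rather than separating
variants.  The names "parse_variants" and "parse_entry" are
illustrative.

```python
import re

# A code point, optionally followed by "(ref, ref, ...)".
TOKEN = re.compile(r"([0-9A-Fa-f]{4,6})(?:\(([0-9, ]*)\))?")
# A comma that is not inside a parenthesised reference list.
COMMA_OUTSIDE_PARENS = re.compile(r",(?![^(]*\))")

def parse_variants(column):
    """Parse one column into a list of variants, each returned as a string."""
    variants = []
    for item in COMMA_OUTSIDE_PARENS.split(column):
        code_points = [int(cp, 16) for cp, _refs in TOKEN.findall(item)]
        if code_points:
            variants.append("".join(chr(cp) for cp in code_points))
    return variants

def parse_entry(line):
    """Return (valid code point, recommended variants, character variants)."""
    body = line.split("#", 1)[0].strip()
    valid, recommended, variants = (f.strip() for f in body.split(";"))
    return (parse_variants(valid)[0],
            parse_variants(recommended),
            parse_variants(variants))
```

For instance, a made-up entry such as "98DB(1);98DE(1,2);98DB(1),98DE(1,2)"
(the reference numbers are purely illustrative) parses to the valid
code point U+98DB, one recommended variant U+98DE, and the two
character variants U+98DB and U+98DE.  An empty middle column yields an
empty list of recommended variants.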

512	3.2.2. Formal syntax

514	This section uses the IETF "ABNF" metalanguage [ABNF].

516	LanguageCharacterVariantTable = 1*ReferenceLine VersionLine 1*EntryLine
517	ReferenceLine = "Reference" SP RefNo SP RefDescription [ Comment ] CRLF
518	RefNo = 1*DIGIT
519	RefDescription = *[VCHAR]
520	VersionLine = "Version" SP VersionNo SP VersionDate [ Comment ] CRLF
521	VersionNo = 1*DIGIT
522	VersionDate = YYYYMMDD
523	EntryLine = ( VariantEntry / Comment ) CRLF
524	VariantEntry = ValidCodePoint [ "(" RefList ")" ] ";" RecommendedVariant
525	";" CharacterVariant [ Comment ]
526	ValidCodePoint = CodePoint
527	RefList = RefNo  0*( "," RefNo )
528	RecommendedVariant = CodePointSet 0*( "," CodePointSet )
529	CharacterVariant = CodePointSet 0*( "," CodePointSet )
530	CodePointSet = CodePoint 0* ( SP CodePoint )
531	CodePoint = 4HEXDIG [HEXDIG] [HEXDIG]
532	Comment = "#" *VCHAR

534	YYYYMMDD is an integer representing a date where YYYY is the 4 digit
535	year, MM is the 2 digit month and DD is the 2 digit day.

537	3.2.3. Registration Algorithm

539	(An explanation of these steps follows them)

541	1.	IN <= IDL to be registered and
542	   	{L} <= Set of languages associated with IN
543	2.     	{V} <= Set of version numbers of the language character
544	               variant tables derived from {L}
545	3.    	NP(IN) <= Nameprep processed IN  and
546	      	check availability of NP(IN).
547	      	If not available, route to conflict policy.
548	4. 	For each AL in {L}
549	4.1.	  Check validity of NP(IN) in AL. If failed, stop processing.
550	4.2. 	  PV(IN,AL) <= Set of available Nameprep processed recommended
551	                       variants of NP(IN) in AL
552	4.3.	  RV(IN,AL) <= Set of available Nameprep processed character
553	                       variants of NP(IN) in AL
554	4.4.	End of Loop
555	5.	{PV} <= Set of all PV(IN,AL) with optional processing.
556	6.	{ZV} <= {PV} set-union NP(IN)
557	7. 	{RV} <= Set of all RV(IN,AL) set-minus {ZV}
558	8.	Create IDL Package for IN using IN, {L}, {V}, {ZV} and {RV}
559	9.	Put {ZV} into zone file

561	Explanation

563	Step 1 takes the IDL to be registered and the associated language(s) as
564	input to the process.

566	Step 2 extracts the set of version numbers of the character variant
567	tables for the associated language(s).

569	Step 3 applies Nameprep to the IDL.  If the Nameprep processed IDL is
570	already registered or reserved, then the conflict policy is applied
571	here. For example, if FCFS is used, the registration process would stop
572	here.

574	Step 4 goes through all languages associated with the proposed IDL,
575	checks for validity in each language, and generates the recommended
576	variants and the reserved variants.

578	In step 4.1, IDL validation is done by checking that every code point
579	in the Nameprep processed IDL is a code point allowed by the "valid
580	code point" column of the character variant table for the language. If
581	one or more code points are invalid, the registration process must stop
582	here.
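
A minimal Python sketch of the check in step 4.1 follows.  It assumes
the label has already been Nameprep processed and that the "valid code
point" column has been loaded into a set of integers; the function
name invalid_code_points is hypothetical.  The set below is the valid
code point column of the example table for ko in Section 4, and the
label is the one used in Example 3.

```python
def invalid_code_points(label: str, valid_code_points: set[int]) -> list[str]:
    """Return the code points of label that the table does not allow."""
    return [f"U+{ord(ch):04X}" for ch in label
            if ord(ch) not in valid_code_points]


# "Valid code point" column of the example ko table in Section 4.
KO_VALID = {0x5718, 0x60F3, 0x654E, 0x6DF8, 0x771E, 0x806F, 0x96C6}

label = "\u6e05\u771f\u6559"   # IDL of Example 3
bad = invalid_code_points(label, KO_VALID)
if bad:
    # Step 4.1: one or more invalid code points stop the registration.
    print("registration stops, invalid in ko:", bad)
```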

584	Step 4.2 generates the list of recommended variants of the IDL by
585	taking every combination of the variants listed in the "recommended
586	variant(s)" column for each code point in the Nameprep processed IDL.
587	Generated variants must be processed with Nameprep.  If any of the
588	recommended variants of the IDL is registered or reserved, then the
589	conflict policy is applied, although this does not prevent the IDL
590	from being registered. For example, if FCFS is used, then the
591	conflicting variant(s) will be removed from the list.

593	Step 4.3 generates the list of reserved variants by taking every
594	combination of the variants listed in the "character variant(s)" column
595	for each code point in the Nameprep processed IDL. Generated variants
596	must be processed with Nameprep.  If any of the variants is registered
597	or reserved, then the conflict policy is applied here, although this
598	does not prevent the IDL from being registered.  For example, if FCFS
599	is used, then the conflicting variants will be removed from the list.

601	The "combination" in Step 4.2 and Step 4.3 could be achieved by a
602	recursive function similar to the following pseudo code:

604	Function Combination(Str)
605	  F <= first code point of Str
606	  SStr <= Substring of Str, without the first code point
607	  NSC <= {}

609	  If SStr is empty Then
610	    For each V in (Variants of code point F)
611	      NSC <= NSC set-union (the string with the code point V)
612	    End of Loop
613	  Else
614	    SubCom <= Combination(SStr)
615	    For each V in (Variants of code point F)
616	      For each SC in SubCom
617	        NSC <= NSC set-union (the string with the
618	               first code point V followed by the string SC)
619	      End of Loop
620	    End of Loop
621	  Endif

623	  Return NSC
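
The same result can be sketched without recursion in Python, using a
Cartesian product.  This is an illustration under one stated
assumption: the candidates for each position are the code point itself
plus its listed variants, which is what the {RV} lists of the examples
in Section 4 imply.  The function name and the dictionary layout are
hypothetical.

```python
from itertools import product


def combination(label: tuple[str, ...],
                variants: dict[str, set[str]]) -> set[tuple[str, ...]]:
    """All strings formed by choosing one candidate per code point."""
    # Assumption: every code point is also a candidate for itself.
    choices = [{cp} | variants.get(cp, set()) for cp in label]
    return set(product(*choices))


# Character variants of the zh-cn example table for three code points.
zh_cn_variants = {"6E05": {"6DF8"}, "771F": {"771E"}, "6559": {"654E"}}
strings = combination(("6E05", "771F", "6559"), zh_cn_variants)
print(len(strings))   # 2 * 2 * 2 = 8, including the original string
```

The number of generated strings is the product of the candidate counts
per position, so long labels with many variants grow quickly.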

625	Step 5 generates the list of all recommended variants for all
626	languages. Optionally, the algorithm may reduce the list of recommended
627	variants by prompting the user to select among them.

629	Step 6 generates the list of variants to be activated, which includes
630	the Nameprep processed IDL, and Step 7 generates the list of reserved
631	variants.

633	Then, in Step 8, an "IDL Package" is created for the IDL, containing
634	the original IDL, the associated language(s), the list of activated
635	IDLs and the list of reserved variants.  The version numbers of the
636	language character variants tables are also stored in the IDL Package.

638	Lastly, in Step 9, the activated IDLs are converted using ToASCII
639	[IDNA] with UseSTD13ASCIIRules on and then put into the zone file. If
640	the IDL is a subdomain name, it will be delegated. The activated IDLs
641	may be delegated to a different domain name server so long as it is
642	owned by the same domain name holder.
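
As a rough illustration of the conversion step, the sketch below uses
the "idna" codec from the Python standard library.  That codec is
understood to implement the IDNA 2003 ToASCII operation, including
Nameprep, but not to enforce the STD 13 ASCII rules check, so that
check is left out here and would have to be added separately.  The
function name to_zone_labels is hypothetical.

```python
def to_zone_labels(activated: set[str]) -> list[str]:
    """Convert each activated IDL to its ASCII form for the zone file."""
    # str.encode("idna") returns bytes; decode to get a text label.
    return sorted(label.encode("idna").decode("ascii")
                  for label in activated)


zone_labels = to_zone_labels({"\u6e05\u771f\u6559"})
for name in zone_labels:
    print(name)
```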

644	3.3. Deletion and Transfer of IDL and IDL Package

646	In normal domain administration, every domain name label is independent
647	of all other domain name labels.  Registration, deletion and transfer
648	of domain name labels is done on a per domain name label basis.
649	Depending on the zone's administrative policies, aliases (e.g., "CNAME"
650	entries) may be bound to particular labels with rules about whether one
651	can be changed without the other.  Current policies in gTLDs generally
652	prohibit registration of such aliases, in part to avoid needing to form
653	and enforce policies about these change (or binding) rules.

655	However, with internationalization, each IDL is bound to a list of
656	variant IDLs (with the list depending on the associated language),
657	bound together in an IDL Package.

659	Because all variants of the IDL should belong to a single domain name
660	holder, the IDL Package should be treated as a single entity.
661	An individual IDL, whether active or reserved, within the IDL Package
662	must not be deleted or transferred independently of the other IDLs.
663	Specifically, if an IDL is to be deleted or transferred, that action
664	must be taken only as part of an action that affects the entire IDL
665	Package.

667	If the local conflict policy requires an IDL to be transferred or
668	deleted independently of the IDL Package, the conflict policy takes
669	precedence. In such an event, the conflict policy should be associated
670	with a transfer or delete procedure that takes the IDL Package into
671	consideration.

673	When an IDL Package is deleted, all of its active and reserved variants
674	become available again.  IDL Package deletion does not change any
675	other IDL Packages, including IDL Packages that have variants that
676	conflict with the variants in the deleted IDL Package. This is
677	consistent with the atomicity and predictability of the IDL Package.

679	3.4. Activation and De-activation of IDL variants

681	As there are active IDLs and inactive IDLs within an IDL Package,
682	processes are required to activate or de-activate IDL variants in an
683	IDL Package.

685	The activation algorithm is described below:

687	1. IN <= IDL to be activated & PA <= IDL Package
688	2. NP(IN) <= Nameprep processed IN
689	3. If NP(IN) not in {RV} then stop
690	4. {RV} <= {RV} set-minus NP(IN) and {ZV} <= {ZV} set-union NP(IN)
691	5. Put {ZV} into the zone file

693	Similarly, the deactivation algorithm is described below:

694	1. IN <= IDL to be deactivated & PA <= IDL Package
695	2. NP(IN) <= Nameprep processed IN
696	3. If NP(IN) not in {ZV} then stop
697	4. {RV} <= {RV} set-union NP(IN) and {ZV} <= {ZV} set-minus NP(IN)
698	5. Put {ZV} into the zone file
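
The two algorithms above can be sketched in Python as follows.  This
is an illustration, not a required data model: the IDLPackage class
and its field names are hypothetical, and writing {ZV} to the zone
file (step 5) is left to the caller.  Nameprep is taken from the
standard library module encodings.idna, which implements the IDNA
2003 profile.

```python
from dataclasses import dataclass, field
from encodings.idna import nameprep


@dataclass
class IDLPackage:
    idl: str                               # originally registered IDL
    languages: frozenset                   # associated languages {L}
    zv: set = field(default_factory=set)   # activated IDLs {ZV}
    rv: set = field(default_factory=set)   # reserved variants {RV}


def activate(pa: IDLPackage, idl: str) -> bool:
    np_in = nameprep(idl)          # step 2
    if np_in not in pa.rv:         # step 3: only reserved variants qualify
        return False
    pa.rv.remove(np_in)            # step 4: move from {RV} to {ZV}
    pa.zv.add(np_in)
    return True                    # step 5: caller republishes pa.zv


def deactivate(pa: IDLPackage, idl: str) -> bool:
    np_in = nameprep(idl)          # step 2
    if np_in not in pa.zv:         # step 3: only active IDLs qualify
        return False
    pa.zv.remove(np_in)            # step 4: move from {ZV} to {RV}
    pa.rv.add(np_in)
    return True                    # step 5: caller republishes pa.zv
```

Both operations only move a name between the two sets, so the union
of {ZV} and {RV} stays fixed for the life of the IDL Package.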

700	3.5. Adding/Deleting language(s) association

702	The list of variants is generated from the IDL and tables for the
703	associated languages.  If the language associations are changed, then
704	the lists of variants have to be updated.  On the other hand, the IDL
705	Package is atomic and the list of variants must not be changed after
706	creation.

708	Therefore, this document recommends deleting the IDL Package followed
709	by a registration with the new set of languages rather than attempting
710	to add or delete language(s) association within the IDL Package.  Zone
711	administrators may find it desirable to devise procedures to prevent
712	other parties from capturing the labels in the IDL Package during these
713	operations.

715	3.6. Versioning of the language character variant tables

717	Language character variants tables are subject to change over time,
718	and the changes may or may not be backward compatible.  It is possible
719	that different versions of the language character variants tables may
720	produce different sets of recommended variants and reserved variants.

722	New IDL Packages should use the latest version of the language
723	character variants tables.

725	Existing IDL Packages created using a previous version of the language
726	character variants tables are not affected when a new version of the
727	character variants table is released.

729	4. Example of Guideline Adoption

731	To provide a meaningful example, some language character variant tables
732	have to be defined.  Assume, then, that the following four language
733	character variants tables are defined (note that these tables are not a
734	representation of the actual tables and they do not contain sufficient
735	entries to be used in any actual implementation):

737	a) language character variants tables for zh-cn and zh-sg

739	Reference 1 CP936 (commonly known as GBK)
740	Reference 2 zVariant, zTradVariant, zSimpVariant in Unihan.txt
741	Reference 3 List of Simplified character Table (Simplified column)
742	Reference 4 zSimpVariant in Unihan.txt
743	Reference 5 variant that exists in GB2312, common simplified hanzi

745	Version 1 20020701 # July 2002

747	56E2(1);56E2(5);5718(2)			# sphere, ball, circle; mass, lump
748	5718(1);56E2(4);56E2(2),56E3(2)		# sphere, ball, circle; mass, lump
749	60F3(1);60F3(5);			# think, speculate, plan, consider
750	654E(1);6559(5);6559(2)			# teach
751	6559(1);6559(5);654E(2)			# teach, class
752	6DF8(1);6E05(5);6E05(2)			# clear
753	6E05(1);6E05(5);6DF8(2)			# clear, pure, clean; peaceful
754	771E(1);771F(5);771F(2)			# real, actual, true, genuine
755	771F(1);771F(5);771E(2)			# real, actual, true, genuine
756	8054(1);8054(3);806F(2)			# connect, join; associate, ally
757	806F(1);8054(3);8054(2),8068(2)		# connect, join; associate, ally
758	96C6(1);96C6(5);			# assemble, collect together

760	b) language variants table for zh-tw

762	Reference 1 CP950 (commonly known as BIG5)
763	Reference 2 zVariant, zTradVariant, zSimpVariant in Unihan.txt
764	Reference 3 List of Simplified Character Table (Traditional column)
765	Reference 4 zTradVariant in Unihan.txt

767	Version 1 20020701 # July 2002

769	5718(1);5718(4);56E2(2),56E3(2)		# sphere, ball, circle; mass, lump
770	60F3(1);60F3(1);			# think, speculate, plan, consider
771	6559(1);6559(1);654E(2)			# teach, class
772	6E05(1);6E05(1);6DF8(2)			# clear, pure, clean; peaceful
773	771F(1);771F(1);771E(2)			# real, actual, true, genuine
774	806F(1);806F(3);8054(2),8068(2)		# connect, join; associate, ally
775	96C6(1);96C6(1);			# assemble, collect together

777	c) language variants table for ja

779	Reference 1 CP932 (commonly known as Shift-JIS)
780	Reference 2 zVariant in Unihan.txt
781	Reference 3 variant that exists in JIS X0208, commonly used Kanji

783	Version 1 20020701 # July 2002

785	5718(1);5718(3);56E3(2)			# sphere, ball, circle; mass, lump
786	60F3(1);60F3(3);			# think, speculate, plan, consider
787	654E(1);6559(3);6559(2)			# teach
788	6559(1);6559(3);654E(2)			# teach, class
789	6DF8(1);6E05(3);6E05(2)			# clear
790	6E05(1);6E05(3);6DF8(2)			# clear, pure, clean; peaceful
791	771E(1);771E(1);771F(2)			# real, actual, true, genuine
792	771F(1);771F(1);771E(2)			# real, actual, true, genuine
793	806F(1);806F(1);8068(2)			# connect, join; associate, ally
794	96C6(1);96C6(3);			# assemble, collect together

796	d) language variants table for ko

798	Reference 1 CP949 (commonly known as EUC-KR)
799	Reference 2 zVariant in Unihan.txt

801	Version 1 20020701 # July 2002

803	5718(1);56E2(1);56E3(2)			# sphere, ball, circle; mass, lump
804	60F3(1);60F3(1);			# think, speculate, plan, consider
805	654E(1);6559(1);6559(2)			# teach
806	6DF8(1);6E05(1);6E05(2)			# clear
807	771E(1);771F(1);771F(2)			# real, actual, true, genuine
808	806F(1);8054(1);8068(2)			# connect, join; associate, ally
809	96C6(1);96C6(1);			# assemble, collect together

811	Example 1: IDL = (U+6E05 U+771F U+6559) *qing2 zhen1 jiao4*
812	           {L} = {zh-cn, zh-sg, zh-tw}

814	NP(IN) = (U+6E05 U+771F U+6559)
815	PV(IN,zh-cn) = (U+6E05 U+771F U+6559)
816	PV(IN,zh-sg) = (U+6E05 U+771F U+6559)
817	PV(IN,zh-tw) = (U+6E05 U+771F U+6559)
818	{ZV} = {(U+6E05 U+771F U+6559)}
819	{RV} = {(U+6E05 U+771E U+6559),
820	        (U+6E05 U+771E U+654E),
821	        (U+6E05 U+771F U+654E),
822	        (U+6DF8 U+771E U+6559),
823	        (U+6DF8 U+771E U+654E),
824	        (U+6DF8 U+771F U+6559),
825	        (U+6DF8 U+771F U+654E)}
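
Steps 4 to 7 of the registration algorithm can be checked against
Example 1 above and Example 4 below with a short Python sketch.  It is
an illustration under stated assumptions: Nameprep, the availability
checks and the conflict policy are omitted, each code point is treated
as a candidate for itself when reserved variants are combined, and the
dictionaries hold only the table rows these two examples need.  The
function name register is hypothetical.

```python
from itertools import product

# valid code point -> (recommended variants, character variants),
# copied from the rows of tables (a) and (b) used by Examples 1 and 4.
ZH_CN = {
    "6E05": (["6E05"], ["6DF8"]),
    "771F": (["771F"], ["771E"]),
    "6559": (["6559"], ["654E"]),
    "806F": (["8054"], ["8054", "8068"]),
    "60F3": (["60F3"], []),
    "96C6": (["96C6"], []),
    "5718": (["56E2"], ["56E2", "56E3"]),
}
ZH_TW = {
    "6E05": (["6E05"], ["6DF8"]),
    "771F": (["771F"], ["771E"]),
    "6559": (["6559"], ["654E"]),
    "806F": (["806F"], ["8054", "8068"]),
    "60F3": (["60F3"], []),
    "96C6": (["96C6"], []),
    "5718": (["5718"], ["56E2", "56E3"]),
}
TABLES = {"zh-cn": ZH_CN, "zh-sg": ZH_CN, "zh-tw": ZH_TW}


def register(label, languages):
    """Return ({ZV}, {RV}) for a label given as a tuple of code points."""
    zv = {tuple(label)}                        # step 6: NP(IN) is active
    rv = set()
    for lang in languages:                     # step 4
        table = TABLES[lang]
        bad = [cp for cp in label if cp not in table]
        if bad:                                # step 4.1
            raise ValueError(f"{bad} invalid in {lang}")
        # step 4.2: combine the recommended variants
        zv |= set(product(*[table[cp][0] for cp in label]))
        # step 4.3: combine the character variants (plus the code point)
        rv |= set(product(*[[cp] + table[cp][1] for cp in label]))
    return zv, rv - zv                         # step 7


langs = ["zh-cn", "zh-sg", "zh-tw"]
zv1, rv1 = register(("6E05", "771F", "6559"), langs)
zv4, rv4 = register(("806F", "60F3", "96C6", "5718"), langs)
print(len(zv1), len(rv1))   # 1 7
print(len(zv4), len(rv4))   # 2 7
```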

827	Example 2: IDL = (U+6E05 U+771F U+6559) *qing2 zhen1 jiao4*
828	           {L} = {ja}

830	NP(IN) = (U+6E05 U+771F U+6559)
831	PV(IN,ja) = (U+6E05 U+771F U+6559)
832	{ZV} = {(U+6E05 U+771F U+6559)}
833	{RV} = {(U+6E05 U+771E U+6559),
834	        (U+6E05 U+771E U+654E),
835	        (U+6E05 U+771F U+654E),
836	        (U+6DF8 U+771E U+6559),
837	        (U+6DF8 U+771E U+654E),
838	        (U+6DF8 U+771F U+6559),
839	        (U+6DF8 U+771F U+654E)}

841	Example 3: IDL = (U+6E05 U+771F U+6559) *qing2 zhen1 jiao4*
842	           {L} = {zh-cn, zh-sg, zh-tw, ja, ko}

844	NP(IN) = (U+6E05 U+771F U+6559) *qing2 zhen1 jiao4*
845	Invalid registration because U+6E05 is invalid in L = ko

847	Example 4: IDL = (U+806F U+60F3 U+96C6 U+5718)
848	                 *lian2 xiang3 ji2 tuan2*
849	           {L} = {zh-cn, zh-sg, zh-tw}

851	NP(IN) = (U+806F U+60F3 U+96C6 U+5718)
852	PV(IN,zh-cn) = (U+8054 U+60F3 U+96C6 U+56E2)
853	PV(IN,zh-sg) = (U+8054 U+60F3 U+96C6 U+56E2)
854	PV(IN,zh-tw) = (U+806F U+60F3 U+96C6 U+5718)
855	{ZV} = {(U+8054 U+60F3 U+96C6 U+56E2),
856	        (U+806F U+60F3 U+96C6 U+5718)}
857	{RV} = {(U+8054 U+60F3 U+96C6 U+56E3),
858	        (U+8054 U+60F3 U+96C6 U+5718),
859	        (U+806F U+60F3 U+96C6 U+56E2),
860	        (U+806F U+60F3 U+96C6 U+56E3),
861	        (U+8068 U+60F3 U+96C6 U+56E2),
862	        (U+8068 U+60F3 U+96C6 U+56E3),
863	        (U+8068 U+60F3 U+96C6 U+5718)}

865	Example 5: IDL = (U+8054 U+60F3 U+96C6 U+56E2)
866	                 *lian2 xiang3 ji2 tuan2*
867	           {L} = {zh-cn, zh-sg}

869	NP(IN) = (U+8054 U+60F3 U+96C6 U+56E2)
870	PV(IN,zh-cn) = (U+8054 U+60F3 U+96C6 U+56E2)
871	PV(IN,zh-sg) = (U+8054 U+60F3 U+96C6 U+56E2)
872	{ZV} = {(U+8054 U+60F3 U+96C6 U+56E2)}
873	{RV} = {(U+8054 U+60F3 U+96C6 U+56E3),
874	        (U+8054 U+60F3 U+96C6 U+5718),
875	        (U+806F U+60F3 U+96C6 U+56E2),
876	        (U+806F U+60F3 U+96C6 U+56E3),
877	        (U+806F U+60F3 U+96C6 U+5718),
878	        (U+8068 U+60F3 U+96C6 U+56E2),
879	        (U+8068 U+60F3 U+96C6 U+56E3),
880	        (U+8068 U+60F3 U+96C6 U+5718)}

882	Example 6: IDL = (U+8054 U+60F3 U+96C6 U+56E2)
883	                 *lian2 xiang3 ji2 tuan2*
884	           {L} = {zh-cn, zh-sg, zh-tw}

886	NP(IN) = (U+8054 U+60F3 U+96C6 U+56E2)
887	Invalid registration because U+8054 is invalid in L = zh-tw

889	Example 7: IDL = (U+806F U+60F3 U+96C6 U+5718)
890	                 *lian2 xiang3 ji2 tuan2*
891	           {L} = {ja, ko}

893	NP(IN) = (U+806F U+60F3 U+96C6 U+5718)
894	PV(IN,ja) = (U+806F U+60F3 U+96C6 U+5718)
895	PV(IN,ko) = (U+806F U+60F3 U+96C6 U+5718)
896	{ZV} = {(U+806F U+60F3 U+96C6 U+5718)}
897	{RV} = {(U+806F U+60F3 U+96C6 U+56E3),
898	        (U+8068 U+60F3 U+96C6 U+5718),
899	        (U+8068 U+60F3 U+96C6 U+56E3)}

901	i. Notes

903	1. The terms "i18n" and "l10n", sometimes used in upper-case form (i.e.,
904	"I18N" and "L10N"), have become popular in international standards
905	usage as abbreviations for "internationalization" and "localization",
906	respectively.  The abbreviations were derived by using the first and
907	last letters of the words, with the number of characters that appear
908	between them.  I.e., in "internationalization", there are 18 characters
909	between the initial "i" and the terminal "n".

911	2. Every human language is unique and therefore, every linguistic and
912	localization issue is also unique. It is difficult or impossible to
913	make comparisons across multiple languages or to classify them into
914	categories.  And any cross-language analogies are, by their very nature,
915	imperfect at best.

917	For example, to classify Traditional Chinese/Simplified Chinese as
918	upper/lower case makes as much sense as to classify TC/SC as "spelling
919	variant" like "color" and "colour". Both comparisons are potentially
920	useful but neither is completely correct.

922	3. The variants in CJK are very complex and require many different
923	layers of solution. This guideline is one of the solution components,
924	but is not sufficient, by itself, to solve the whole problem.

926	ii. Acknowledgements

928	The authors gratefully acknowledge the contributions of:

930	V.CHEN, N.HSU, H.HOTTA, S.TASHIRO, Y.YONEYA and other Joint Engineering
931	Team members at the JET meeting in Bangkok.

933	Yves Arrouye, an observer at the JET meeting, for his contribution on
934	the IDL Package.

936	Soobok LEE
937	L.M TSENG
938	Patrik FALTSTROM
939	Paul HOFFMAN
940	Erin CHEN
941	LEE Xiaodong
942	Harald ALVESTRAND

944	iii. Author(s)

946	James SENG
947	PSB Certification
948	3 Science Park Drive
949	#03-12 PSB Annex
950	Singapore 118233
951	Phone: +65 6885-1657
952	Email: jseng@pobox.org.sg

954	Kazunori KONISHI
955	JPNIC
956	Kokusai-Kougyou-Kanda Bldg 6F
957	2-3-4 Uchi-Kanda, Chiyoda-ku
958	Tokyo 101-0047
959	JAPAN
960	Phone: +81 49-278-7313
961	Email: konishi@jp.apan.net

963	Kenny HUANG
964	TWNIC
965	3F, 16, Kang Hwa Street, Taipei
966	Taiwan
967	TEL : 886-2-2658-6510
968	Email: huangk@alum.sinica.edu

970	QIAN Hualin
971	CNNIC
972	No.6 Branch-box of No.349 Mailbox, Beijing 100080
973	Peoples Republic of China
974	Email: Hlqian@cnnic.net.cn

976	KO YangWoo
977	PeaceNet
978	Yangchun P.O. Box 81 Seoul 158-600
979	Korea
980	Email: newcat@peacenet.or.kr

982	John C KLENSIN
983	1770 Massachusetts Ave, No. 322
984	Cambridge, MA 02140
985	USA
986	Email: Klensin+ietf@jck.com

988	iv.  Appendix A

990	[How to read the Han Ideograph provided in this document. --  Will
991	complete this section in next revision]

993	v. Normative References

995	[ABNF]      Augmented BNF for Syntax Specifications: ABNF, RFC 2234, D.
996	            Crocker and P. Overell, Eds., November 1997.

998	[I18NTERMS] Terminology Used in Internationalization in the IETF,
999	            draft-hoffman-i18n-terms-07.txt, September 2002,
1000	            Paul Hoffman, work in progress

1002	[RFC3066]   Tags for the Identification of Languages, RFC3066,
1003	            Jan 2001, H. Alvestrand

1005	[IDNA]      Internationalizing Domain Names in Applications,
1006	            draft-ietf-idn-idna, Feb 2002, Patrik Faltstrom,
1007	            Paul Hoffman, Adam M. Costello, work in progress

1009	[PUNYCODE]  Punycode: An encoding of Unicode for use with IDNA,
1010	            draft-ietf-idn-punycode, Feb 2002, Adam M. Costello,
1011	            work in progress

1013	[STRINGPREP] Preparation of Internationalized Strings,
1014	            draft-hoffman-stringprep, Feb 2002, Paul Hoffman,
1015	            Marc Blanchet, work in progress

1017	[NAMEPREP]  Nameprep: A Stringprep Profile for Internationalized
1018	            Domain Names, draft-ietf-idn-nameprep, Feb 2002,
1019	            Paul Hoffman, Marc Blanchet, work in progress

1021	[UNIHAN]    Unicode Han Database, Unicode Consortium
1022	            ftp://ftp.unicode.org/Public/UNIDATA/Unihan.txt

1024	[UNICODE]   The Unicode Consortium, "The Unicode Standard -- Version
1025	            3.0", ISBN 0-201-61633-5. Unicode Standard Annex #28,
1026	            (http://www.unicode.org/unicode/reports/tr28/) defines
1027	            Version 3.2 of The Unicode Standard.

1029	[ISO7098]   ISO 7098:1991 Information and documentation -- Romanization
1030	            of Chinese, ISO/TC46/SC2.

1032	vi. Non-normative References

1034	[IDN-WG]    IETF Internationalized Domain Names Working Group,
1035	            idn@ops.ietf.org, James Seng, Marc Blanchet.
1036	            http://www.i-d-n.net/

1038	[STD13]     Paul Mockapetris, "Domain names - concepts and facilities"
1039	            (RFC 1034) and "Domain names - implementation and
1040	            specification" (RFC 1035), STD 13, November 1987.

1042	[C2C]       Pitfalls and Complexities of Chinese to Chinese Conversion,
1043	            http://www.cjk.org/cjk/c2c/c2c.pdf, Jack Halpern, Jouni
1044	            Kerman

1046	vii. Other Issues

1048	It is possible that many variants generated may have no meaning in the
1049	associated language or languages.  The intention is not to generate
1050	meaningful "words" but to generate similar variants to be reserved.

1052	The language Character Variants tables are critical to the success of
1053	the guideline.  A badly designed table may either generate too many
1054	meaningless variants or may not generate enough meaningful variants.
1055	The principles to be used to generate the tables are not within the
1056	scope of this document, nor are the tables themselves.

1058	This document recommends against registration of IDL in a particular
1059	language until the language character variants table for that language
1060	is available.

1062	Outstanding Issues

1064	(1)	Erin suggested (if I (JcK) correctly understood her) that, if
1065	multiple languages are associated with a given name, the recommended
1066	variant list for a given code point be treated as the intersection of
1067	the variant lists for each of the languages, not the union.  As I
1068	understand the current algorithm, it effectively takes the union.
1069	Taking the intersection has the technical advantage that it would
1070	significantly reduce the number of variant strings that must be
1071	reserved.  It also has the policy advantage of discouraging people
1072	from registering with multiple languages if they don't need to -
1073	otherwise, we will have everyone trying to register in all of the
1074	possibly-relevant languages, which would make this effort a good deal
1075	less effective than it might be.

1077	Taking the intersection is also consistent with a rule that appears to
1078	exist now.  As shown in Example 3, if an attempt is made to register a
1079	name and associate it with multiple languages, it must be valid in all
1080	of those languages or the registration attempt will fail.  So we
1081	intersect the validity criteria on a language basis, and should
1082	probably intersect the variants.

1084	But that is an algorithm change, since we have to extract the variant
1085	lists for each code point for each language, take the intersection,
1086	and then process against that, rather than against each language in
1087	turn.

1089	[JS - I disagree with taking the intersection of the sets. No doubt
1090	taking the intersection would reduce the abuse of specifying multiple
1091	languages to increase the set of reserved variants, but our goal is
1092	precisely to reserve as many variants as possible for the domain name
1093	holder, not vice versa.

1095	Suppose we have a string ABC with variants ABD ACD ABF in Chinese, ABE
1096	ACD in Japanese and CBD ACD in Korean.

1098	Assuming a registrant registers ABC in CJK, right now he will get the
1099	reserved set of {ABC, ACD, ABF, ABE, CBD}.

1101	On the other hand, if we do intersection, this set will be reduced to
1102	{ACD}, leaving other variants like ABF, ABE and CBD open for potential
1103	conflict. And the only way he can protect against this confusion is to
1104	register ABF, ABE and CBD individually, something we are trying to
1105	prevent.]
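
The two set operations under discussion can be compared directly on
the per-language variant lists of the ABC illustration.  The Python
sketch below is only an illustration; the language keys are
placeholders, and the lists are exactly those named in the
illustration.

```python
# Variant lists of the string ABC, per language, as in the illustration.
variants = {
    "Chinese": {"ABD", "ACD", "ABF"},
    "Japanese": {"ABE", "ACD"},
    "Korean": {"CBD", "ACD"},
}

union = set().union(*variants.values())
intersection = set.intersection(*variants.values())

print(sorted(union))          # every variant listed by any language
print(sorted(intersection))   # only variants listed by all languages
```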

1107	[Further explanation by Erin:

1109	I'm sorry, maybe my previous suggestion was not clear enough.

1111	I mean that if multiple languages are associated with a given name, the
1112	range of valid code points should be the intersection of all the
1113	associated languages.

1115	But, if multiple languages are associated with a given name, the
1116	recommended variants should be taken as the union and put into the zone
1117	file.  In the same way, the character variants should also be taken as
1118	the union for each of the languages.]

1120	(2)	A note went by indicating that the plan was to drop the Han
1121	characters from the IETF-submission version of this document.  We can
1122	post I-Ds in PDF and publish RFCs in PDF and/or Postscript, as long as
1123	we provide ASCII.   I find having the Han characters very useful, and
1124	trust that those of you who can read them find them even more so.  So
1125	I would suggest that we hand off the pair of an ASCII document (with
1126	the Han characters removed) and a PDF document (that looks like the
1127	Word text we have been looking at) to the I-D editor.  I've got full
1128	Acrobat here and can presumably produce the thing if needed.

1130	(3)	We still need to sort out the issue of whether reserving a
1131	variant that may (in a current or future table) conflict with another
1132	character, with the possibility of activating it, is an invitation to
1133	cybersquatting and other abuses.  That isn't clear, let me try an
1134	illustration: suppose we have a character X, with variants A, B, and C,
1135	and a character Y, with variants D and C.  Now, if Y is registered
1136	first, then its package includes {Y*, D, C}, using the symbol "*" to
1137	denote an active name.  When X is registered, its package consists of
1138	{X, A, B}.  X's owner can't reserve or activate C, since it was
1139	reserved to Y.  But much of the reason for doing all of this work was
1140	the concern that C can be confused with either Y or X.  So doesn't
1141	this create an opportunity for Y to threaten, or extort money from, X
1142	by threatening to activate C?

1144	[JS -- The conflict of X & Y over C in this case could be resolved by
1145	the existing conflict policy. The revised guideline now makes it
1146	possible to modify the IDL Package in the event of a dispute]

1148	That problem gets worse, I think, if Erin's suggestion in (1) is not
1149	adopted.  And I continue to believe that the only solution that will
1150	work is to prevent anyone from activating C.  Or, more generally, at
1151	any given time, there will be a set of language variant tables that
1152	will be considered valid by the administrator of a particular zone.
1153	The zone administrator would take the union of all of those tables,
1154	using the 'valid code point' as the key as usual, and then permanently
1155	reserve any character that appeared more than once in a variant column.
1156	Small matter of programming.
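
One possible reading of that procedure is sketched below in Python: a
character is contested when it is listed as a variant under more than
one valid code point across the tables the zone accepts.  The function
name contested_variants and the table layout are hypothetical, and the
data are the X and Y characters of the illustration above.

```python
from collections import defaultdict


def contested_variants(tables):
    """Variants that are listed under more than one valid code point.

    tables: iterable of dicts mapping valid code point -> set of variants.
    """
    owners = defaultdict(set)
    for table in tables:
        for valid_cp, variants in table.items():
            for v in variants:
                if v != valid_cp:          # a code point is not its own rival
                    owners[v].add(valid_cp)
    return {v for v, keys in owners.items() if len(keys) > 1}


# X has variants A, B and C; Y has variants D and C.
table = {"X": {"A", "B", "C"}, "Y": {"D", "C"}}
print(contested_variants([table]))   # {'C'}
```

Under this reading, C would be permanently reserved and neither X's
nor Y's holder could activate it.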

1158	(4)	In page 9, on the paragraph starting with "The character
1159	variant(s) column contains ..."

1162	This seems to be saying that the code points listed in the third
1163	column will always be a proper superset of the union of the first and
1164	second columns.  If that is correct, it violates a fundamental
1165	principle that I was taught about good programming and systems design
1166	-- minimization of duplication of information, since such duplicates
1167	are error-prone.  And, if I have not interpreted the intent correctly,
1168	the text needs to be fixed.  Somehow.

1170	[JS -- correct, it is duplicated. The duplication is bad from
1171	system design view but it makes it 'complete' and easy to explain.]