idnits 2.17.1

draft-hoffman-idn-reg-02.txt:

Checking boilerplate required by RFC 5378 and the IETF Trust (see
https://trustee.ietf.org/license-info):
----------------------------------------------------------------------------

** Looks like you're using RFC 2026 boilerplate. This must be updated to
   follow RFC 3978/3979, as updated by RFC 4748.

Checking nits according to
https://www.ietf.org/id-info/1id-guidelines.txt:
----------------------------------------------------------------------------

** Missing expiration date. The document expiration date should appear
   on the first and last page.

** The document is more than 15 pages and seems to lack a Table of
   Contents.

== The page length should not exceed 58 lines per page, but there was 1
   longer page, the longest (page 1) being 819 lines

Checking nits according to https://www.ietf.org/id-info/checklist :
----------------------------------------------------------------------------

** There are 2 instances of too long lines in the document, the longest
   one being 4 characters in excess of 72.

== There are 4 instances of lines with non-RFC6890-compliant IPv4
   addresses in the document. If these are example addresses, they
   should be changed.

Miscellaneous warnings:
----------------------------------------------------------------------------

-- The document seems to lack a disclaimer for pre-RFC5378 work, but may
   have content which was first submitted before 10 November 2008. If
   you have contacted all the original authors and they are all willing
   to grant the BCP78 rights to the IETF Trust, then this is fine, and
   you can ignore this comment. If not, you may need to add the
   pre-RFC5378 disclaimer. (See the Legal Provisions document at
   https://trustee.ietf.org/license-info for more information.)

-- The document date (October 16, 2003) is 7497 days in the past. Is
   this intentional?

-- Found something which looks like a code comment -- if you have code
   sections in the document, please surround them with '<CODE BEGINS>'
   and '<CODE ENDS>' lines.
Checking references for intended status: Experimental
----------------------------------------------------------------------------

** Obsolete normative reference: RFC 3490 (ref. 'IDNA') (Obsoleted by
   RFC 5890, RFC 5891)

** Obsolete normative reference: RFC 3491 (ref. 'NAMEPREP') (Obsoleted
   by RFC 5891)

Summary: 6 errors (**), 0 flaws (~~), 2 warnings (==), 3 comments (--).

Run idnits with the --verbose option for more detailed information about
the items above.

--------------------------------------------------------------------------------

Internet Draft                                              Paul Hoffman
draft-hoffman-idn-reg-02.txt                                  IMC & VPNC
October 16, 2003
Expires in six months
Intended status: Experimental

A Method for Registering Internationalized Domain Names

Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

This document describes some suggested practices for registering
internationalized domain names (IDNs) in a zone. Before accepting
registrations of domain names into a zone, the zone's registry should
decide which codepoints in the Unicode character set the zone will
accept.
The registry should also decide whether particular characters in
a registered domain name should cause registration of multiple
equivalent domain names; these domain names might be added to the zone
or blocked from registration. This document also describes how to handle
character variants in registering IDNs, and how to publish tables that
list the character variants.

1. Introduction

IDNA [IDNA] specifies an encoding of characters from the Unicode
character set [UNICODE] which is backwards-compatible with the current
definition of hostnames. This implies that domain names encoded
according to IDNA will be able to be transported between peers using any
existing protocol, including DNS.

IDNA, through its requirement of Nameprep [NAMEPREP], uses tables that
are based only on the characters themselves; no attention is paid to the
intended language (if any) for the domain name. However, for many domain
names, the intended language of one or more parts of the domain name
actually does matter to the registry for the names and to users.

If there are no constraints on registration in a zone, people can
register characters that increase the risk of misunderstandings,
cybersquatting, and other forms of confusion. A similar situation exists
independently of IDNA, as exemplified by domain names such as
example.com and examp1e.com (note that the latter domain contains the
digit "1" instead of the letter "l").

For some human languages, there are characters and/or strings that have
equivalent or near-equivalent usages. If someone is allowed to register
a name with such a character or string, the registry might want to
automatically associate all the names that have the same meaning with
the registered name.
The registry can also decide whether the names that come
from one registration should go into the zone, be blocked from
registration by other people, or be handled by a combination of these
two actions.

This document describes three things:

- suggested practices for describing character variants

- a method for using a zone's character variants to determine which
  names should be associated with a registration

- a format for publishing a zone's table of character variants

[IDN-CJK] offers a somewhat different approach to the problem of
registration policy. That document uses a different registration
philosophy than what is described here, and is focused on a small number
of Asian languages.

1.1 Main concepts

[[ Need to outline what is presented in the proposal. Include:

- registration bundles keyed on the base registration

- bundles can overlap

- some names are in the zone, others only block

- does not prohibit human processing, but does not encourage it

- tables based on languages
]]

1.2 Shortcomings

This document does not deal with how to handle whois data for associated
registrations, and does not deal with registrar-registry protocols.
These topics are likely to be of great importance to registries and
registrants, and should be dealt with in other documents.

This document deals directly only with variants of single characters,
not variants of strings (although variants themselves can be strings).
Thus, the methods described here are not sufficient to help all
languages. Registries which cover languages where it would make
linguistic sense to create variants from strings should define their own
rules for doing so.

The procedures described here do not take into account mapping that is
dependent on the position of characters in a domain name.
Many languages
(such as Hebrew and Greek) have rules that would cause different
variants to be used based on whether a character appears at the
beginning or end of a word, or whether a character appears next to a
specific character. Adding rules for these kinds of mappings is
possible, but difficult. Not only would the table format need to be
expanded to deal with positional variants, the order in which the
characters are tested for whether they create variants would also have
to be specified.

1.3 Terminology

Characters in this document are given as their Unicode codepoints in
U+xxxx format or with their official names.

The following terms are used in this document.

A "string" is a sequence of one or more characters.

This document discusses characters that have equivalent or
near-equivalent characters or strings. The "base character" is the
character that has one or more equivalents. The "variant(s)" are the
character(s) and/or string(s) that are equivalent to the base character.
Note that these might not be true equivalent characters: a base
character might have a mapping to a particular variant character, but
that variant character does not have to have a mapping to the base
character.

The "base registration" is the single name that the registrant requested
from the registry.

A "registration bundle" is the set of all labels that come from
expanding the base characters for a single name into their variants.

A "registry" is the administrative authority for a DNS zone. That is,
the registry is the body that makes and enforces policies that are used
in a particular zone in the DNS.

2. Starting to add IDNs to a zone

There are four primary considerations when adding IDNs to a zone:

- Which characters should be allowed to be registered

- Whether any of the characters that are allowed to be registered have
  variants that should affect the registration process

- How registration bundles are created and maintained

- If there are registration bundles, how they will affect the zone
  itself and future registrations

2.1 Choosing characters that may be registered

A zone has to decide which characters are allowed to be registered.
Before IDNA was standardized, the only characters allowed were the ASCII
letters, digits, and the hyphen character. With IDNA, that list is much
larger.

The first decision for a zone is whether or not it wants to allow
IDNA-based labels. If not, it can simply prohibit any label that
begins with the IDNA ACE prefix "xn--". Zones with this policy can
safely ignore the rest of this document.

If a zone decides to allow IDNA-based labels, it needs to decide which
characters are allowed to be registered. It further needs to decide
which characters are allowed to be in the zone, and which characters can
be registered but not appear in the zone.

Some options for what zones will want to include are:

- the ASCII characters plus just enough characters to represent just one
  language

- just enough characters to represent a small number of languages

- enough characters to represent many languages

- any character allowed by IDNA

The decision on what to include may be influenced by administrative
issues for the zone, such as languages that are normally associated with
the zone, or agreements that the zone has made with governmental bodies
or other organizations. For example, ICANN has a set of rules on how
some top-level domains must act with respect to IDNs [ICANN-IDN].
A zone
does not need to declare which languages it does or does not allow in
the names in its zone, but making such a declaration makes it clearer to
registrants what characters the zone does and does not allow.

It is strongly recommended that a registry act conservatively when
starting to accept IDNA-based domain names, even if the registry does
not use the ideas described in this document. Registries should start
with the smallest number of characters possible to represent the
needs of the zone's registrants. If a registry follows the advice in
this document, more characters can be added to the zone later, but once
characters are in labels that are in a zone, they cannot be removed
without causing a lot of administrative problems. The most notable
problem with later disallowing characters in names is that a registry
could be forced to remove actively-used names from its zone, thereby
causing instability for users of the zone and angering the names'
owners.

2.2 Choosing variants

The area of character variants is rife with problems. There is no
universal agreement about which base characters have variants, or if
they do, what those variants are. For example, in some regions of the
world and in some languages, LATIN SMALL LETTER O WITH DIAERESIS and
LATIN SMALL LETTER O WITH STROKE are variants of each other, while in
other regions, most people would think that LATIN SMALL LETTER O WITH
DIAERESIS has no variants. In some cases, the list of variants is
difficult to enumerate. For example, it has taken years for the Chinese
language community to create variant tables for use in IDNA, and the
tables are not widely accepted at the time of this writing.

Thus, the first thing a registry should ask is whether or not any of the
characters that it wants to use have variants. If not, the registry's
work is much simpler.
This is not to say that a registry should ignore
variants if they exist: adding variants after a registry has started to
take registrations is nearly as difficult administratively as removing
characters from the list of acceptable characters. That is, if a
registry later decides that two characters are variants of each other,
and there are actively-used names in the zone that differ only in the
new variants, the registry might have to transfer ownership of one of
the names to a different owner.

The list of character variants used in a zone should be stable. Although
it is possible to add variants for characters later, doing so can cause
confusion among registrants.

Of course, zone managers should inform all current registrants when the
registration policy for the zone changes. This includes when IDN
characters are allowed in the zone for the first time, when characters
are added later, and when character variant tables change.

In many languages there are two variants for a character, but one
variant is strongly preferred. A registry might only allow the base
registration in the preferred form, or it might allow any form for the
base registration. If the variant tables are created carefully, the
resulting bundles will be the same, but some registries will give
special status to the base registration, such as its appearance in whois
databases.

2.3 Creation and maintenance of registration bundles

Another ramification of having variants is that they will cause zones to
have bundles of names. Describing registration bundles to typical
registrants will be a very difficult task. (Many current registries have
a hard time explaining to registrants what they can or cannot do with
their single registrations.) It is likely that registrants can better
understand this by having the bundle be identified by the base
registration.

A registration bundle must be maintained as a single unit. This is not
to say that each name in a bundle is treated the same, but that the
administration of each name should be done in the context of the entire
bundle. Different names in a bundle should not have different
administrators.

Adding additional IDN characters to a zone where some or all labels in a
registration bundle are resolved in the zone can affect existing
registrants. A registrant who had a single name could become the owner
of a group of names, and would be expected to manage that group of names
according to the zone's policies. Because managing a group of names is
inherently more difficult than managing a single name, zone
administrators need to avoid creating new rules that would force current
registrants to change the way they manage their zones.

2.3.1 Overlapping registration bundles

Depending on how the registry creates its tables, it is possible for
registration bundles to overlap, meaning that two different people own
rights to a name. This can cause significant problems for the registry
in explaining to users what their rights are for names that contain
variants of the name they registered.

Clearly, a registrant cannot register a name that already exists as the
base registration for another bundle. However, a registrant can register
a name which has a variant that exists in the bundle of an existing
registration. That is, bundles can have names that are the same, but the
zone can never have two different entries for the same name.

The basic registration rule for most zones is "first come, first served
unless the registration is misleading". That rule should probably be
extended to the names in a registration bundle. That is, the first
registrant whose bundle contains a name gets to have rights over that
name. Because of this, the registry must associate a registration date
with each bundle.
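
As an illustration only, the first-come, first-served rule for
overlapping bundles could be sketched as follows. The function name, the
data layout, and the labels "ab", "ac", and "ax" are hypothetical and do
not come from this document; the sketch assumes a table in which two
different base characters share a variant.

```python
from datetime import date

def controlling_registration(name, bundles):
    """Return the base registration of the earliest bundle containing name.

    bundles maps a base registration to (creation date, set of labels).
    Returns None if no bundle contains the name.
    """
    holders = [(created, base)
               for base, (created, labels) in bundles.items()
               if name in labels]
    return min(holders)[1] if holders else None

# Hypothetical overlap: both bundles contain the label "ax".
bundles = {
    "ab": (date(2003, 1, 10), {"ab", "ax"}),
    "ac": (date(2003, 6, 1), {"ac", "ax"}),
}
first = controlling_registration("ax", bundles)

# If the earlier registration expires, control passes to the later bundle.
del bundles["ab"]
second = controlling_registration("ax", bundles)
```

Storing the creation date with each bundle is what makes the lookup
deterministic; without it, the registry would have no way to order the
two claims.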

When a second bundle is created that contains a variant name that
already exists in an earlier bundle, the registry can inform the new
registrant that not all of the names in its bundle are usable in the
same fashion. Further, if the registrant of the first bundle allows its
registration to expire while the second bundle still exists, the owner
of the second bundle gains control over the overlapping names which
before were controlled by the owner of the first bundle.

2.4 Choosing how to use variants

A registry has three options for how to handle the case where the
registration bundle has more than one label. The policy options are:

1) Resolve all labels in the zone, making the zone information identical
to that of the registered label.

2) Block all labels other than the registered label so they cannot be
registered in the future.

3) Resolve some labels and block some other labels.

In all cases, at least the registered label should appear in the zone.
It would be almost impossible to describe to name owners why the name
that they asked for is not in the zone, but some other name that they
now control is.

2.4.1 Advantages and disadvantages of the options

Option 1 will cause end users to be able to find names with variants
more easily, but will result in larger zone files. For some language
tables, the zone file could become so large that it could negatively
affect the ability of the registry to perform name resolution. If the
base registration contains several characters that have equivalents, the
owner could end up having to take care of a large number of zones. For
instance, if DIGIT ONE is a variant of LATIN SMALL LETTER L, the owner
of the domain name all-lollypops.example.com will have to manage 32
zones.
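
The figure of 32 follows from the five occurrences of "l" in
"all-lollypops", each of which can independently be "l" or "1", giving
2^5 labels. A minimal sketch of that count is shown here; the function
name and the dictionary layout are assumptions made for illustration,
and the result is an upper bound because some expanded labels might
later be rejected by ToASCII.

```python
from math import prod

# Hypothetical one-way variant table: base character -> list of variants.
VARIANTS = {"l": ["1"]}

def bundle_size(label, variants):
    """Upper bound on the number of labels produced by expanding label."""
    # Each character contributes itself plus each of its variants.
    return prod(1 + len(variants.get(ch, [])) for ch in label)

size = bundle_size("all-lollypops", VARIANTS)
print(size)
```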

Option 2 does not increase the size of the zone file, but it may cause
end users to not be able to find names with variants that they would
expect. If the base registration contains characters that have
equivalents, Internet users who don't know which base characters were
used in the registration will not know what character to type in to get
a DNS response. For instance, if DIGIT ONE is a variant of LATIN SMALL
LETTER L, and LATIN SMALL LETTER L is a variant of DIGIT ONE, the user
who sees "pale.example.com" will not know whether to type a "1" or an
"l" after the "pa" in the first label.

Option 3 is likely to cause the most confusion with users because
including some variants will cause a name to be found, but using other
variants will cause the name to be not found. For example, even if
people understood that DIGIT ONE and LATIN SMALL LETTER L were variants,
a typical DNS user wouldn't know which character to type because they
wouldn't know whether this pair were allocating variants or blocking
variants. However, this option can be used to balance the desires of the
name owner (that every possible attempt to enter their name will work)
with the desires of the zone administrator (to make the zone more
manageable and possibly to be compensated for greater amounts of work
needed for a single registration).

2.4.2 Operational characteristics

With any of these three options, the registry must keep a database that
links each label in the registration bundle to the base registration.
This link needs to be maintained so that changes in the non-DNS
registration information (such as the label's owner name and address)
are reflected in every member of the registration bundle as well.

If the registry chose option 1, when the zone information for the base
registration changes, the zone information for all the members of the
registration bundle must change in exactly the same way. The zone
information for every member of the registration bundle must remain
identical as long as any of the members of the registration bundle
remain in the zone. A registry can keep the zone information for the
registration bundle identical using a database, or using DNAME records,
or using a combination of the two.

If the registry chose option 2, when the zone information for the base
registration changes, the blocked information for all the members of the
registration bundle must be identical to that of the base registration,
and must remain identical as long as the base registration remains in
the zone. A registry can keep the zone and blocked name information for
the registration bundle identical using a database.

If the registry chose option 3, it must use an unspecified method to
keep the elements in the registration bundle cohesive. Because of the
administrative difficulty involved, this option must only be used under
carefully-controlled circumstances. Further, the rules for which names
in the bundle appear in the zone and which are blocked must be explained
to name owners. It is particularly important to explain the
ramifications of overlapping registration bundles, if the registry's
variant policies allow their creation.

3. Language-based tables

The registration strategy described in this document uses a table that
lists all characters allowed for input and any variants of those
characters. Note that the table lists all characters allowed, not only
the ones that have variants.

Each table is specific to a single language or a specified group of
languages.
Although a multi-language table can be produced, it may be
simpler to keep each table language-specific and only use the table for
the language of the desired registration. For example, it is probably
easy to create a single table that would handle both Japanese and
French; it would probably be much harder to create a single table that
would handle both Arabic and Persian.

It is widely expected that there will be different tables for a single
language or group of languages created by different people. Many
languages are spoken in many different countries, and each country might
have a different view of which characters should or should not be
considered needed for that language. For example, some people would say
that the Latin characters are needed for various Indic languages, while
others would say that they are not.

A table that covers a wide variety of languages will probably allow a
much wider range of characters to be used in names. At the same time,
that table cannot easily use character variants because variants for one
language will be different from the variants used in a different
language. To handle conflicting variants among languages, the registry
can choose to have no variants for any base characters, or can choose to
have variants for a subset of the languages that are expressible in the
characters allowed.

The set of processing rules for a language has to be carefully crafted
so that all expected variants will be created, and no unexpected
variants are created. The procedure in this document assumes that the
zone uses just one table when registering a particular name, but a set
of tables that are searched in a specified order can be treated like a
larger table with a processing order.

3.1 Intended language for a registration

In order to use a language-based table for processing, the registry has
to know the language of the name being registered. This information
could come by asking the registrant, or by the fact that a registry has
rules that only allow a single language. However, the requirement of
knowing the intended language leads to a very difficult problem: many
valid domain names have no inherent language. Examples of domain names
that do not have a language include:

- trade names and family names

- names that are acronyms, all-numeric or a combination of the two
  ("jln", "3000", "jln3000")

- names that purposely have more than one language in them
  ("neuvo-ramen")

- proper names that are made up ("glowow")

4. Registration procedure

This procedure has three inputs:

- the proposed base registration

- the language for the proposed base registration

- the processing table associated with that language

The output of the process is either failure (the base registration
cannot be registered at all), or a registration bundle that contains one
or more labels (always including the base registration). As described
earlier, the registration bundle should be stored with its date of
creation so that issues with overlapping elements between bundles can
later be resolved on a first-come, first-served basis.

There are two steps to processing the registration:

1) Check whether the proposed base registration exists in any bundle. If
it does, stop immediately with a failure.

2) Process the base registration with the CreateBundle process described
below.

Note that the process must be executed only once. The process must not
be run on any output of the process, only on the proposed base
registration.
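
The two steps above could be sketched as follows. This is an
illustration under stated assumptions, not a definitive implementation:
the function name, the layout of the registry store, and the
create_bundle callable (standing in for the CreateBundle process) are
all hypothetical.

```python
from datetime import date

def register(base, language, tables, registry, create_bundle, today=None):
    """Return the new registration bundle, or None if registration fails.

    registry maps a base registration to {"labels": set, "created": date}.
    create_bundle(base, table) returns a set of labels, or None on failure.
    """
    # Step 1: fail if the name already appears in any stored bundle.
    if any(base in entry["labels"] for entry in registry.values()):
        return None
    # Step 2: run CreateBundle exactly once, on the base registration only.
    labels = create_bundle(base, tables[language])
    if labels is None:
        return None
    # Store the bundle with its creation date for later overlap handling.
    registry[base] = {"labels": labels, "created": today or date.today()}
    return labels

# Usage with a stand-in for CreateBundle that maps "l" to "1".
def stub_create(base, table):
    return {base, base.replace("l", "1")}

registry = {}
tables = {"en": {}}
first = register("pale", "en", tables, registry, stub_create, date(2003, 10, 16))
second = register("pa1e", "en", tables, registry, stub_create)
```

The second call fails in step 1 because "pa1e" is already a member of
the bundle created by the first call.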

4.1 Human intervention in registration

Some registries will want registration to be completely automatic, that
is, with no human intervention. Other registries will want to have human
intervention (or at least checking) of registrations. For example, if a
registry has a rule that registrations cannot contain harmful words,
that registry needs to have a human check each registration. Another
example where human intervention would be needed is a registry that
allows multiple languages in its zone but does not trust the registrants
to say the intended language of a registration.

A table should not have more than one entry for a particular base
character. A table with more than one variant rule for the same base
character requires that some names be evaluated by humans and will open
the registration process to dispute. Such human intervention in the
registration process may be unavoidable for some languages or for some
registries, but it should be avoided if there is a desire for
predictability in the registration process.

The description below does not specify where human intervention would
happen because there are so many possibilities, based on the type of
checking that a registry might want. It makes sense that a check might
be made before step 1 or step 3 to be sure that the base registration
meets any semantic rules for the zone, and that the intended language is
in fact appropriate for the base registration. Some registries might
also want another set of checks after step 5 to be sure that all the
entries in the bundle are semantically appropriate for the zone. For
example, if a zone prohibits mixed-script registrations, that check
should be made both before step 1 (to check the base registration) and
after step 5 (to check whether the variant-creation step created any
mixed-script items in the bundle).

4.2 Description of CreateBundle

The CreateBundle process determines if a registration bundle can be
created and, if so, fills that bundle only with valid labels.

During the processing, a "temporary bundle" contains partial labels,
that is, labels that are being built and are not complete labels. The
partial labels in the temporary bundle consist of strings.

The steps in the CreateBundle process are:

1) Split the base registration into individual characters, called
"candidate characters". Compare every candidate character against the
base characters in the table. If any candidate character does not exist
in the set of base characters, the system must stop and not register any
names (that is, it must not register either the base registration or any
labels that would have come from character variants).

2) Perform the steps in ToASCII for the base registration. If ToASCII
fails for the base registration, the system must stop and not register
any of the labels (that is, it must not register either the base
registration or any created labels, even if those labels would have
passed ToASCII). If ToASCII succeeds, add the result to the registration
bundle.

3) For every candidate character in the base registration, do the
following:

3a) Create the set of characters that consists of the candidate
character and any variants.

3b) For each character in the set from step 3a, duplicate the
temporary bundle that resulted from the previous candidate character,
and add the new character to the end of each partial label.

4) The temporary bundle now contains zero or more labels that consist of
Unicode characters. For every label in the temporary bundle, do the
following:

4a) Process the label with ToASCII to see if ToASCII succeeds. If it
does, put the label into the registration bundle.
Otherwise, do not
process this label from the temporary bundle any further; it will not
go into the registration bundle.

5) The resulting registration bundle contains the base registration and
possibly other labels. Finish.

4.3 Example of expansion of a base registration into a bundle

[[ Need at least one worked-through example of step 3 with interesting
variant situations ]]

5. Table format

The format of the table is meant to be machine-readable but not
human-readable. It is fairly trivial to convert the table into one that
can be read by people.

Each character in the table is given in the "U+" notation for Unicode
characters. The lines of the table are terminated with either a carriage
return character (ASCII 0x0D), a linefeed character (ASCII 0x0A), or a
sequence of carriage return followed by linefeed (ASCII 0x0D 0x0A). The
order of the lines in the table may or may not matter, depending on how
the table is constructed.

Comment lines in the table are preceded with a "#" character (ASCII
0x23).

Each non-comment line in the table starts with the character that is
allowed in the registry, which is also called the "base character". If
the base character has any variants, it is followed by a vertical bar
character ("|", ASCII 0x7C) and the variant string. If the base
character has more than one variant, the variants are separated by a
colon (":", ASCII 0x3A). Strings are given with a hyphen ("-", ASCII
0x2D) between each character. Comments begin with a "#" (ASCII 0x23),
and may be preceded by spaces (" ", ASCII 0x20).

The following is an example of how a table might look. The entries in
this table are purposely silly and should not be used by any registry as
the basis for choosing variants.
For the example, assume that the 595 registry: 596 - allows the FOR ALL character (U+2200) with no variants 597 - allows the COMPLEMENT character (U+2201) which has a single variant 598 of LATIN CAPITAL LETTER C (U+0043) 599 - allows the PROPORTION character (U+2237) which has one variant which 600 is the string COLON (U+003A) COLON (U+003A) 601 - allows the PARTIAL DIFFERENTIAL character (U+2202) which has two 602 variants: LATIN SMALL LETTER D (U+0064) and GREEK SMALL LETTER DELTA 603 (U+03B4) 605 The table would look like: 607 # An example of a table 608 U+2200 609 U+2201|U+0043 610 U+2237|U+003A-U+003A # Note that the variant is a string 611 U+2202|U+0064:U+03B4 613 Implementors of table processors should remember that there are tens of 614 thousands of characters whose codepoints are greater than 0xFFFF. Thus, 615 any program that assumes that each character in the table is represented 616 in exactly six octets ("U", "+", and exactly four octets representing 617 the character value) will fail with tables that use characters whose 618 value is greater than 0xFFFF. 620 6. Examples 622 The following shows examples of the first two of the registry's options 623 as described in Section 2.4 (putting all labels in the zone; putting 624 only the base registration in the zone and blocking the rest). The third 625 option (resolving some labels but blocking others) is an extension of 626 these two and is not shown here. 628 The examples assume that the registry for the zone example.com uses the 629 following very short table, which says that LATIN SMALL LETTER L 630 (U+006C) has a single variant, DIGIT ONE (U+0031). 632 U+006C|U+0031 634 A registrant approaches the zone and requests a registration for the 635 name pale.example.com, for which there are two name servers 636 (x.example.com and y.example.com). After processing the base 637 registration "pale", the registration bundle contains "pale" and "pa1e". 
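The table format from Section 5 and the CreateBundle steps from Section 4.2 can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference implementation. The function names (parse_table, create_bundle, and so on) are hypothetical, and ToASCII is taken from Python's standard-library encodings.idna module, which implements RFC 3490. Two further assumptions are made: the one-line example table is treated as abbreviated, so base-character lines for "p", "a", and "e" are added in order that step 1 can succeed; and the registration bundle stores the ToASCII form of each label.

```python
from encodings import idna  # standard-library IDNA (RFC 3490) helpers


def parse_code_points(text):
    """Turn a string such as 'U+003A-U+003A' into the characters '::'."""
    return "".join(chr(int(cp[2:], 16)) for cp in text.strip().split("-"))


def parse_table(text):
    """Map each base character to its (possibly empty) list of variants."""
    table = {}
    for line in text.splitlines():  # accepts CR, LF, and CRLF terminators
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        base, _, variants = line.partition("|")
        table[parse_code_points(base)] = (
            [parse_code_points(v) for v in variants.split(":")]
            if variants else []
        )
    return table


def to_ascii(label):
    """Return the ToASCII form of a label, or None if ToASCII fails."""
    try:
        return idna.ToASCII(label).decode("ascii")
    except UnicodeError:
        return None


def create_bundle(base_registration, table):
    """Hypothetical CreateBundle; returns [] when nothing may be registered."""
    # Step 1: every candidate character must be a base character.
    if any(ch not in table for ch in base_registration):
        return []
    # Step 2: the base registration itself must pass ToASCII.
    base_ascii = to_ascii(base_registration)
    if base_ascii is None:
        return []
    bundle = [base_ascii]
    # Step 3: extend each partial label with the character and its variants.
    temporary = [""]
    for ch in base_registration:
        choices = [ch] + table[ch]
        temporary = [partial + c for partial in temporary for c in choices]
    # Step 4: keep only the labels that pass ToASCII (skipping duplicates).
    for label in temporary:
        ascii_label = to_ascii(label)
        if ascii_label is not None and ascii_label not in bundle:
            bundle.append(ascii_label)
    # Step 5: the bundle holds the base registration and any other labels.
    return bundle


# Assumption: the short example table, extended with "p", "a", and "e".
EXAMPLE_TABLE = parse_table(
    "# extended example table\n"
    "U+0070\n"
    "U+0061\n"
    "U+006C|U+0031\n"
    "U+0065\n"
)
print(create_bundle("pale", EXAMPLE_TABLE))
```

Because parse_code_points converts whatever hexadecimal digits follow "U+", the sketch does not assume four-digit values and so also accepts characters above 0xFFFF.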
639 6.1 Example 1: allocating multiple labels 641 Assume that the registry for the zone example.com uses option 1 642 (allocating multiple labels) as its registration policy. 644 The registry allocates pale.example.com and pa1e.example.com to the 645 registrant. The registry also creates a link in its registration 646 database from pa1e.example.com to pale.example.com so that any changes 647 to either the non-zone information or the zone information for one name 648 will be reflected in the other name. 650 The registry adds the following four records to the example.com zone: 652 $ORIGIN example.com. 653 pale IN NS x.example.com. 654 pale IN NS y.example.com. 655 pa1e IN NS x.example.com. 656 pa1e IN NS y.example.com. 658 Note that the registry might instead use DNAME records for allocating 659 labels. If the registry uses DNAMEs, the registry would instead add 660 the following three records to the example.com zone: 662 $ORIGIN example.com. 663 pale IN NS x.example.com. 664 pale IN NS y.example.com. 665 pa1e IN DNAME pale.example.com. 667 An end user who requests the name server for pa1e.example.com will get a 668 positive response with the correct information. 670 6.2 Example 2: blocking labels 672 Assume that the registry for the zone example.com uses option 2 673 (blocking labels) as its registration policy. 675 The registry allocates pale.example.com to the registrant and blocks 676 pa1e.example.com from being registered by anybody. The registry also 677 creates a link in its registration database from pa1e.example.com to 678 pale.example.com so that any changes to the non-zone information for 679 pale.example.com will be reflected in the blocked name. 681 The registry adds the following two records to the example.com zone: 683 $ORIGIN example.com. 684 pale IN NS x.example.com. 685 pale IN NS y.example.com. 687 An end user who requests the name server for pa1e.example.com will get a 688 response of "no such name". 690 7. 
Owner implications of multiple labels 692 The creation of a registration bundle for equivalent or near-equivalent 693 labels in a zone at the time of registration leads to many delegations. 694 This leads to records in parallel zones which must be synchronized. That 695 is, the owner of a registration bundle must keep the same information in the 696 zone for each label in the bundle. 698 Using the examples from Section 6, assume that the owner of the labels 699 "pale" and "pa1e" creates a subdomain, "www". If the owner of 700 "example.com" used multiple delegations for the labels, the owner of 701 "pale" and "pa1e" would use two records: 703 $ORIGIN pale.example.com. 704 www IN A 1.2.3.4 706 $ORIGIN pa1e.example.com. 707 www IN A 1.2.3.4 709 An alternative for these two records, which helps the registrant 710 keep their names in sync, would be: 712 $ORIGIN pale.example.com. 713 www IN A 1.2.3.4 715 $ORIGIN pa1e.example.com. 716 www IN CNAME www.pale.example.com. 718 If the owner of "example.com" used a DNAME record to make "pale" and 719 "pa1e" equivalent, the owner of "pale" and "pa1e" could instead use one 720 record: 722 $ORIGIN pale.example.com. 723 www IN A 1.2.3.4 725 8. Security considerations 727 Apart from considerations listed in the IDNA specification, this 728 document explicitly talks about rules that a registry can define as part 729 of the policy which can be applied in a zone. A registry can apply a 730 variants table which solves some problems with homographs already 731 outlined in the security consideration section of IDNA. This might be 732 considered good for security because it will reduce the possible 733 confusion for the user, and lower the risk that the user will "connect" 734 to a service which was not intended. 736 Poorly-designed tables can cause security problems.
For example, if a 737 table creates variants for base characters that are not really variants, 738 and names using those incorrect variants are allocated in the zone, an 739 unsuspecting end user will get unexpected positive answers to DNS 740 queries. For example, if the base character "a" has a variant of "e", 741 and those variants are allocated in the zone, a user who looks up 742 "bed.example.com" will get the information for "bad.example.com", even 743 though the user could easily see that these two labels are different. 745 Users will likely expect that variants will be the same in zones and may 746 make assumptions based on those expectations. However, the variants and 747 how they are used will probably be different between zones, even for two 748 zones that use the same language. This could lead to spoofing users by 749 registering names that leverage the differences in rules between the 750 zones. 752 9. References 754 9.1 Normative References 756 [IDNA] Faltstrom, P., Hoffman, P. and A. Costello, "Internationalizing 757 Domain Names in Applications (IDNA)", RFC 3490, March 2003. 759 [NAMEPREP] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile 760 for Internationalized Domain Names (IDN)", RFC 3491, March 2003. 762 [UNICODE] The Unicode Consortium. The Unicode Standard, Version 3.2.0 763 is defined by The Unicode Standard, Version 3.0 (Reading, MA, 764 Addison-Wesley, 2000. ISBN 0-201-61633-5), as amended by the Unicode 765 Standard Annex #27: Unicode 3.1 (http://www.unicode.org/reports/tr27/) 766 and by the Unicode Standard Annex #28: Unicode 3.2 767 (http://www.unicode.org/reports/tr28/). 769 9.2 Non-normative References 771 [ICANN-IDN] ICANN, "Deployment of Internationalized Domain Names", 772 774 [IDN-CJK] "Internationalized Domain Names Registration and 775 Administration Guideline for Chinese, Japanese, and Korean", 776 draft-jseng-idn-admin, work in progress. 778 10. 
IANA considerations 780 There are no IANA considerations for this document. The tables described 781 in this document can be created by anyone. Tables at IANA are often 782 considered to be authoritative, but languages have no one who is 783 authoritative for them. It is unclear what value, if any, there is for 784 someone to know what table a particular zone says it is using for 785 registration. Further, the tables are expected to be updated at 786 irregular times as new characters are added to the list of acceptable 787 characters. Therefore, it is probably unwise for IANA to keep a registry 788 of these tables. 790 11. Author's address 792 Paul Hoffman 793 Internet Mail Consortium and VPN Consortium 794 127 Segre Place 795 Santa Cruz, CA 95060 USA 796 phoffman@imc.org 798 A. Changes from the -01 document 800 Changed the intended status to Experimental 802 Added comments to the table syntax in section 5