Internet Draft                                            Paul Hoffman
draft-ietf-idn-compare-01.txt                               IMC & VPNC
July 11, 2000
Expires in six months

Comparison of Internationalized Domain Name Proposals

Status of this memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC 2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

The IDN Working Group is working on proposals for internationalized
domain names that might become a standard in the IETF. Before a single
full proposal can be made, competing proposals must be compared on a
wide range of requirements and desired features. This document compares
the many parts of a comprehensive protocol that have been proposed. It
is the companion document to "Requirements of Internationalized Domain
Names" [IDN-REQ], which lays out the requirements for the
internationalized domain name protocol.

1.
Introduction

As the IDN Working Group has discussed the requirements for IDN,
suggestions have been made for various candidate protocols that might
meet the requirements. These proposals have been somewhat helpful in
bringing up real-world needs for the requirements.

It became clear that no single proposal had wide agreement from the
working group. In fact, the authors of various proposals found
themselves taking some features from other proposals as they revised
their drafts. At the same time, working group participants were making
suggestions for incremental changes that might affect more than one
proposal.

Because of this mixing and matching, it was decided that this IDN
comparisons document should compare features that might end up in the
final protocol, not full protocol suggestions themselves. The features
that had been discussed in the working group were divided by function,
and appear in this document in separate sections. For each function,
there are multiple suggestions for protocol elements that might meet the
requirements that are described in [IDN-REQ].

This document is being discussed on the "idn" mailing list. To join the
list, send a message to with the words
"subscribe idn" in the body of the message. Archives of the mailing list
can also be found at ftp://ops.ietf.org/pub/lists/idn*.

1.1 Format of this document

Each section covers one feature that has been discussed as being part of
the final IDN solution. Within each section, alternate proposals are
listed with the major perceived pros and cons of the proposal. Also,
each proposal is given a label to make discussion of this document (and
of the proposals themselves) easier.

References to the numbered requirements in [IDN-REQ] are from version
-02 of that document. These numbers are expected to change as the
requirements document evolves.
In this draft, the requirements are shown as "[#n-02]", where "n" is
the requirement number from draft -02 of [IDN-REQ]. This document only
lists where particular proposals don't meet particular requirements
from [IDN-REQ], not the ones that they fulfill.

Note that this document is supposed to reflect the discussion of all
proposed alternatives, not just the ones that fully match the
requirements in [IDN-REQ]. It will serve as a summary of the discussion
in the IDN WG for readers in the future who may want to know why certain
alternatives were not chosen for the eventual protocol.

The proposal drafts covered in this document are:

[DUERST] Character Normalization in IETF Protocols,
draft-duerst-i18n-norm-03

[IDNE] Internationalized domain names using EDNS (IDNE),
draft-ietf-idn-idne-01

[KWAN] Using the UTF-8 Character Set in the Domain Name System,
draft-skwan-utf8-dns-03

[RACE] RACE: Row-based ASCII Compatible Encoding for IDN,
draft-ietf-idn-race-00

[SENG] UTF-5, a transformation format of Unicode and ISO 10646,
draft-jseng-utf5-01

[UDNS] Using the Universal Character Set in the Domain Name System
(UDNS), draft-ietf-idn-udns-00

2. Architecture

One of the biggest questions raised early in the IDN discussion was what
the format of internationalized name parts would be on the wire, that
is, between the user's computer and the DNS resolvers. It was agreed
that the DNS protocols certainly allow non-ASCII octets in domain name
parts and resource records, but there was also acknowledgement that many
protocols that rely on the DNS could not handle non-ASCII names due to
the design of the protocol. Section 3.1 of this document describes the
proposed encodings for the non-ASCII name parts.

Because of requirement [#2-02], there were proposals for
ASCII-compatible encodings (ACEs) of non-ASCII characters.
Different ACEs were proposed (and are discussed in Section 4 of this
document), but they all have the same goal: to allow non-ASCII
characters to be represented in host names that conform to RFC 1034
[RFC1034].

2.1 arch-1: Just send binary

[KWAN] proposes beginning to send characters outside the range allowed
in RFC 1034.

Pro: Easiest to describe. Only changes host name syntax, not any of the
related DNS protocols.

Con: Doesn't work with many existing protocols that rely on DNS.
Violates requirement [#9-02].

2.2 arch-2: Send binary or ACE

[UDNS] (and, later, [IDNE]) proposes using both binary and ACE formats
on the wire.

Pro: Allows protocols that can handle binary name parts to use them
directly, while allowing protocols that cannot use binary name parts to
also handle names without conversion. Allows domain names in free text
to be displayed in binary even in systems that require ACE-formatted
names on the wire.

Con: Requires all software that uses domain names to handle both
formats. Requires processing time for conversion of ACE formats into the
format most likely used internally to the software.

2.3 arch-3: Just send ACE

[RACE] and [SENG] propose that host naming rules remain the same and
that all internationalized domain names be sent in ACE format.

Pro: No changes at all to current DNS protocols.

Con: Requires all software to recognize ACE domain names and convert
them to human-readable form for display. This is true not only for
domain names used on the wire but also for domain names used in free
text.

3. Names in binary

Both arch-1 and arch-2 include domain name parts that are represented on
the wire in a binary format. This section describes some of the features
of such names.

3.1 bin-1: Format

There are many different charsets and encodings for the scripts of the
world.
The WG has discussed which binary encoding should be used on the
wire.

3.1.1 bin-1.1: UTF-8

The IETF policy on character sets [RFC2277] states that UTF-8 [RFC2279]
is the preferred charset for IETF protocols. UTF-8 encodes all
characters in the ISO 10646 repertoire.

Pro: Well-supported in other IETF protocols. Compact for most scripts.
Wide implementation in programming languages. US-ASCII characters have
the same encoding in UTF-8 as they do in US-ASCII. Because it is based
on ISO 10646, expansion of the repertoire comes from respected
international standards bodies.

Con: Asian scripts require three octets per character.

3.1.2 bin-1.2: Labelled charsets

Mailing list discussion mentioned using multiple charsets for the binary
representation. Each name part would be labelled with the charset used.

Pro: Allows users to specify names in the charsets they are most
familiar with.

Con: All resolvers would have to know all charsets. Thus, the number of
charsets would probably have to be limited and never expand. Mapping of
characters between charsets would have to be exact and not change over
time.

3.2 bin-2: Distinguishing binary from current format

Software built for current domain names might give unexpected results
when dealing with non-ASCII characters in domain names. For example, it
was reported on the mailing list that some software crashes when a
non-ASCII domain name is returned for in-addr.arpa requests. Thus, there
may be a need for IDN to prevent software that is not binary-aware from
receiving domain names with binary parts. This would only apply to an
IDN that used arch-2, not arch-1.

3.2.1 bin-2.1: Don't mark binary

[KWAN] does not specify any way of changing requests to prevent binary
name parts from being transmitted.

Pro: No changes to current DNS requests and responses.

Con: Likely to cause disruption in software that is not binary-aware.
Likely to cause systems to misread names and possibly (and incorrectly)
convert them to ASCII names by stripping off the high bit in octets;
this in turn would lead to security problems due to mistaken identities.
Returning binary host names to DNS queries is known to break some
current software.

3.2.2 bin-2.2: Mark binary with IN bit

[UDNS] describes using a bit from the header of DNS queries to mark the
query as possibly containing a binary name part and indicating that the
response to the query can contain binary name parts.

Pro: This bit is currently unused and must be set to zero, so current
software won't use it accidentally. No changes to any other part of the
query or RRs.

Con: It's the last unused bit in the header and DNS folks have indicated
that they are very hesitant to give it up.

3.2.3 bin-2.3: Mark binary with new QTYPEs

[UDNS] describes using new QTYPEs to mark the query as possibly
containing a binary name part and indicating that the response to the
query can contain binary name parts. QTYPEs are two octets long, and no
QTYPEs to date use more than the lower eight bits, so one of the bits
from the upper octet could be used to indicate binary names.

Pro: These bits are currently unused and must be set to zero, so current
software won't use them accidentally. No changes to any other part of
the query or RRs. Uses a bit that isn't as prized as the IN bit.

Con: Software must pay more attention to the QTYPEs than it might have
previously.

3.2.4 bin-2.4: Mark binary with EDNS

[IDNE] uses EDNS [RFC2671] to mark the query and response as containing
a binary name part.

Pro: There is little use of EDNS at this point, so it is very unlikely
to have bad interactions with old software.
EDNS allows longer name
parts, and allows additional information (such as IDN version number)
in each name part.

Con: There is little use of EDNS and this might make implementation
harder.

4. Names in ASCII-compatible encoding (ACE)

Both arch-2 and arch-3 include domain name parts that are represented on
the wire in an ASCII-compatible encoding (ACE). This section describes
some of the features of such names.

4.1 ace-1: Format

A variety of formats for ACE have been proposed. Each proposal has
different features, such as how many characters can be encoded within
the 63 octet limit for each name part. The length descriptions in this
section assume that there is no distinguishing of ACE from current
names; this is not a likely outcome of the WG work.

The descriptions of lengths are based on script block names from
[BLOCK-NAMES].

4.1.1 ace-1.1: UTF-5

[SENG] describes UTF-5, which is a fairly direct encoding of ISO 10646
characters using a system similar to UTF-8. Characters from Basic Latin
and Latin-1 Supplement take 2 octets; Latin Extended-A through Tibetan
take 3 octets; Myanmar through the end of BMP take 4 octets; non-BMP
characters take 5 octets. This means that names consisting entirely of
characters from Myanmar through the end of BMP are limited to 15
characters.

Pro: Extremely simple.

Con: Poor compression, particularly for Asian scripts.

4.1.2 ace-1.2: RACE

[RACE] describes RACE, which is a two-step algorithm that first
compresses the name part, then converts the compressed string into an
ACE. Name parts in all scripts other than Han, Yi, Hangul syllables,
Ethiopic, and non-BMP take up ceil(1.6*(n+1)) octets; name parts in
those scripts and any name that mixes characters from different rows in
ISO 10646 take up ceil(3.2*(n+1)) octets.
This means that names using
Han, Yi, Hangul syllables, or Ethiopic are limited to 18 characters.
(Note: this document used to be called CIDNUC.)

Pro: Best compression for most scripts, and similar compression for the
scripts where it is not the best.

Con: More complicated than UTF-5. Not well optimized for names that have
mixed scripts, such as non-Latin names that use hyphen or ASCII digits.

4.1.3 ace-1.3: Hex of UTF-8

An early draft described "hex of UTF-8", which is a straightforward
hexadecimal encoding of UTF-8. Characters in Basic Latin (other than
non-US-ASCII and hyphen) take 3 octets; Latin Extended-A through Tibetan
take 5 octets; Myanmar through end of BMP take 7 octets; non-BMP
characters take 9 octets. This means that names consisting entirely of
characters from Myanmar through the end of BMP are limited to 9
characters.

Pros: Very simple to describe.

Cons: Very poor compression for all scripts.

4.1.4 ace-1.5: SACE

A message on the mailing list pointed to code for SACE, an ASCII
encoding that purports to compact to about the same size as UTF-8.

Pros: Similar compression to UTF-8.

Cons: No description of how the algorithm works.

4.2 ace-2: Distinguishing ACE from current names

Software that finds ACE name parts in free text probably should
display the name part using the actual characters, not the ACE
equivalent. Thus, software must be able to identify which ASCII name
parts are ACE and which are non-ACE ASCII parts (such as current names).
This would only apply to an IDN proposal that used arch-2, not arch-3.

4.2.1 ace-2.1: Currently legal names

Name parts that are currently legal in RFC 1034 can be tagged to
indicate the part is encoded with ACE.

4.2.1.1 ace-2.1.1: Add hopefully-unique legal tag

[RACE] proposes adding a hopefully-unique legal tag to the beginning
of the name.
The proposal would also work with such a tag at the end of
the name part, but it is easier for most people to recognize at the
beginning of name parts.

Pros: Easy for software (and humans) to recognize.

Cons: There is no way to prevent people from beginning non-ACE names
with the tag. Unless the tag is very unlikely to appear in any name in
any human language, non-ACE names that begin with the tag will display
oddly or be rejected by some systems.

4.2.1.2 ace-2.1.2: Add a checksum

Off-list discussion has mentioned the possibility of creating a checksum
mechanism where the checksum would be added to the beginning (or end) of
ACE name parts.

4.2.2 ace-2.2: Currently illegal names

Instead of creating names that are currently legal, another proposal is
to create names that use the current ASCII characters but are illegal.

4.2.2.1 ace-2.2.1: Add trailing hyphen

An earlier draft described using a trailing hyphen as a signifier of an
ACE name.

Pros: It is surmised that most current software does not reject names
that are illegal in this fashion. Thus, there would be little disruption
to current systems. This mechanism takes up fewer characters than any
proposed in ace-2.1.

Cons: Some current software will probably break with this mechanism.
It goes against some current protocols that match the rules in RFC 1034.

5. Prohibited characters

There was a short but active discussion on the mailing list about which
characters from the ISO 10646 character set should never appear in host
names. To date, there are no Internet Drafts on the subject. This
section summarizes some of the suggestions.

5.1 prohib-1: Identical and near-identical characters

Some characters are visually identical or incredibly similar to other
characters, thus making it impossible to accurately enter host names
that are seen in print.
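
As an illustration of the problem described in 5.1 (and not part of any
proposal covered here), the following Python sketch uses the standard
unicodedata module to show two ISO 10646 characters that many fonts
render identically. The helper name describe() and the sample name
parts are assumptions made for this example only.

```python
import unicodedata

def describe(text):
    """Return (code point, character name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

latin = "example"        # all Basic Latin letters
mixed = "ex\u0430mple"   # U+0430 CYRILLIC SMALL LETTER A replaces the Latin "a"

# The two name parts usually look the same on screen but compare unequal.
print(latin == mixed)        # False
print(describe(latin)[2])    # ('U+0061', 'LATIN SMALL LETTER A')
print(describe(mixed)[2])    # ('U+0430', 'CYRILLIC SMALL LETTER A')

# Unicode normalization does not merge the two characters either.
print(unicodedata.normalize("NFKC", mixed) == latin)   # False
```

A user who sees either name part in print has no reliable way of
knowing which of the two code points to enter.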

5.2 prohib-2: Separators

Horizontal and vertical spacing characters would make it unclear where a
host name begins and ends. Allowing periods and period-like characters
within a name part would cause similar confusion.

5.3 prohib-3: Non-displaying and non-spacing characters

There are many characters in the ISO 10646 character set that cannot be
seen. These include control characters, non-breaking spaces, formatting
characters, and tagging characters. These characters would certainly
cause confusion if allowed in host names.

5.4 prohib-4: Private use characters

Private use characters from ISO 10646 inherently have no specified
visual form (and in fact can be used for non-displaying characters).
Thus, there could be no visual interoperability for characters in the
private use areas.

5.5 prohib-5: Punctuation

Some punctuation characters are disallowed in URLs because they are used
in URL syntax.

5.6 prohib-6: Symbols

Some mailing list discussion stated that characters that do not normally
appear in human or company names should not be allowed in host names.
This includes symbols and non-name punctuation.

6. Canonicalization

The working group has had a spirited discussion on the need for
canonicalization. [IDN-REQ] describes many requirements for when and what
type of canonicalization might be performed.

6.1 canon-1: Type of canonicalization

The Unicode Consortium's recommendations and definitions of
canonicalization [UTR15] describe many forms of canonicalization that
can be performed on character strings. [DUERST] covers much of the same
ground but makes more focused requirements for canonicalization on the
Internet.

6.1.1 canon-1.1: Normalization Form C

[DUERST] recommends Normalization Form C, as described in [UTR15], for
use on the Internet.
This form is a canonical decomposition, followed by
canonical composition.

6.1.2 canon-1.2: Normalization Form KC

Discussion on the mailing list recommended Normalization Form KC. This
form is a compatibility decomposition, followed by canonical
composition. Compatibility decomposition makes characters that have
compatibility equivalence the same after decomposing.

6.2 canon-2: Other canonicalization

Host names may have special canonicalization needs that can be added to
those given in canon-1.

6.2.1 canon-2.1: Case folding in ASCII

RFC 1034 specifies that there is no difference between host names that
have the same letters but the letters have different case. Thus, the
name part "example" is considered the same as "Example" and "EXamPLe".
Neither uppercase nor lowercase is specified as being canonical.

6.2.2 canon-2.2: Case folding in non-ASCII

Discussion on the mailing list has raised the issue of whether or not
non-ASCII Latin characters should have the same case-folding rules as
ASCII. Such rules would match the expectations of native speakers of
some languages, but would go counter to the expectations of native
speakers of other languages.

6.2.3 canon-2.3: Han folding

Discussion on the mailing list has raised the issue of equivalences in
some languages' use of Han characters. For example, in Chinese, there
are many traditional characters that have equivalent simplified
characters. Similarly, there are some Han ideographs for which there are
multiple representations in ISO 10646. There are no well-established
rules for such folding, and some of the proposed folding would be
locale-specific.

6.3 canon-3: Location of canonicalization

Canonicalization can be performed in any system in the DNS. Because it
is not a trivial operation and can require large tables, the location of
where canonicalization is performed is important.
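
To make the operations in canon-1.1, canon-1.2, and canon-2.1 concrete,
whichever system ends up performing them, the following Python sketch
applies Normalization Forms C and KC with the standard unicodedata
module and then folds case. It is an illustration only: the function
name canonicalize(), the choice of lowercase as the folded form (RFC
1034 names neither case as canonical), and the sample name parts are
all assumptions, and the Unicode tables used are those shipped with the
Python interpreter rather than those referenced by any proposal.

```python
import unicodedata

def canonicalize(name_part, form):
    """Normalize a name part (form is "NFC" or "NFKC"), then lowercase it."""
    return unicodedata.normalize(form, name_part).lower()

decomposed = "re\u0301sume\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "r\u00e9sum\u00e9"    # U+00E9 LATIN SMALL LETTER E WITH ACUTE
ligature = "\ufb01le"               # U+FB01 LATIN SMALL LIGATURE FI, then "le"

# canon-1.1: Form C composes the combining sequences.
print(canonicalize(decomposed, "NFC") == precomposed)   # True

# Form C keeps the compatibility ligature; Form KC (canon-1.2) maps it to "fi".
print(canonicalize(ligature, "NFC") == "file")          # False
print(canonicalize(ligature, "NFKC") == "file")         # True

# canon-2.1: ASCII name parts that differ only in case become equal.
print(canonicalize("EXamPLe", "NFC") == "example")      # True
```

Note that str.lower() also changes non-ASCII letters, so this sketch
silently takes one side of the open question in canon-2.2, and it
performs no Han folding (canon-2.3).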

6.3.1 canon-3.1: Canonicalize only in the application

Early canonicalization is a cleaner architecture design. Spending the
cycles on the end systems puts less burden on resolvers or servers in
the DNS service. When IDN is first adopted, the applications need to be
updated anyway to handle the new format for the names. It is easier for
people to upgrade their applications than their resolvers if they need a
new IDN feature.

6.3.2 canon-3.2: Canonicalize only in the resolver

Updating a single resolver provides new service to a large number of
applications and (possibly) users. It is easier to find canonicalization
bugs in resolvers than in applications because the resolver has
predictable programmatic interfaces. IDN will probably be revised often
as new characters are added to ISO 10646, so updating a smaller number
of resolvers is better than revising more applications. When an end user
has a problem with resolving an IDN name, it is much easier to test if
the problem is in the resolver than in the user's application.

6.3.3 canon-3.3: Canonicalize in the DNS service

Canonicalization should happen as late as possible so that changes in
the canonicalization algorithm don't orphan all applications and
resolvers. Some canonicalization discards information and so should be
delayed as long as possible. Canonicalization is practically free,
computationally (although it involves some large tables). Because adding
IDN to the DNS will happen over time, canonicalizing at the server will
minimize the number of things that need to be changed, and simplify and
centralize the process of change.

7. Transitions

Early in the working group discussion, there was active debate about how
the transition from the current host name rules to IDN would be handled.
Given requirement [#1-02], this transition is quite important to
deciding which proposals might be feasible.

7.1 trans-1: Always do current plus new architecture

In this proposal, IDN will be used at the same time as the current DNS
forever. That is, IDN will be in addition to the current DNS.

7.2 trans-2: Transition period

In this proposal, IDN will be used at the same time as the current DNS
for a specified period of time, after which only IDN will exist. That
is, IDN will replace the current DNS.

8. Root server considerations

DNS root servers receive all requests for top-level domains that are not
in the local DNS cache. They are critical to the Internet. Care must be
taken to ensure that root servers will not be affected by new mechanisms
introduced.

Any IDN proposal that includes a binary encoding will have an impact on
the root servers. The binary requests will affect the root servers
because the current root server software is designed to handle current
host names. Further, the root zone files which contain ccTLDs and gTLDs
would have to support binary domain names and possibly binary host names
for NS records. Because all the root servers are equivalent, they would
have to be synchronized to support the binary domain names at the same
time.

Proposals that only use ACE and use tagging with currently-legal names
would, by definition, not affect the root servers.

9. Security considerations

All security considerations listed in [IDN-REQ] apply to this document.
Further, all security considerations listed in each of the IDN proposals
must be considered when comparing the proposals.

Some proposals described in this document may create new security
considerations. However, these considerations will have to be addressed
in the eventual protocol document. All the proposals described here are
still incomplete and security considerations may be added to them as
they are revised.
All the proposals listed in this document use the ISO
10646 character set, so the proposals inherit any security
characteristics of that character set.

Many protocols and applications rely on domain names to identify the
parties involved in a network transaction. For example, a user who
connects to a web site by entering or selecting a URL expects that their
software will select the web site named in the URL. The uniqueness of
domain names is crucial to ensure identification of Internet entities.

To make round-trip translation between local charsets and ISO 10646
possible, the ISO 10646 specification has assigned multiple code points
to individual glyphs. Moreover, some glyphs might look similar to some
users, but look clearly different to other users. This means that it
would be simple for an attacker to mimic a domain name by using
similar-looking but different glyphs and guessing that some users will
not see the difference in their user interface.

Some IDN protocols may be open to denial of service attacks, such as by
using non-identified characters, exception characters, or
under-specified behavior in using some special characters.

10. IANA considerations

This document does not create any new IANA registries. However, it is
possible that a character property registry may need to be set up when
the IDN protocol is created in order to list prohibited characters
(section 5) and canonicalization mappings (section 6).

11. Acknowledgements

James Seng and Marc Blanchet gave many helpful suggestions on the
pre-release versions of this document.

12. References

[BLOCK-NAMES] Unicode Consortium,
.

[DUERST] Character Normalization in IETF Protocols,
draft-duerst-i18n-norm-03

[IDN-REQ] Requirements of Internationalized Domain Names,
draft-ietf-idn-requirements-02

[IDNE] Internationalized domain names using EDNS (IDNE),
draft-ietf-idn-idne-01

[KWAN] Using the UTF-8 Character Set in the Domain Name System,
draft-skwan-utf8-dns-03

[RACE] RACE: Row-based ASCII Compatible Encoding for IDN,
draft-ietf-idn-race-00

[RFC1034] Domain Names - Concepts and Facilities, RFC 1034

[RFC2277] IETF Policy on Character Sets and Languages, RFC 2277

[RFC2279] UTF-8, a transformation format of ISO 10646, RFC 2279

[RFC2671] Extension Mechanisms for DNS (EDNS0), RFC 2671

[SENG] UTF-5, a transformation format of Unicode and ISO 10646,
draft-jseng-utf5-01

[UDNS] Using the Universal Character Set in the Domain Name System
(UDNS), draft-ietf-idn-udns-00

[UTR15] Unicode Normalization Forms, Unicode Technical Report #15

A. Differences Between -00 and -01 Drafts

Throughout: Changed references from [HOFFMAN] to [RACE].

Throughout: Changed references from [OSCARSSON] to [UDNS].

Throughout: Added [IDNE].

Removed section 1.2.

3.2.3: Updated to mention [UDNS].

3.2.4: Updated with [IDNE], changed "EDNS0" to "EDNS", and reworded.

4.1.2: Added Ethiopic to the list of scripts that require two octets per
character.

4.1.3: Removed reference to [OSCARSSON] because that is no longer in the
[UDNS] draft.

4.2.2.1: Removed reference to [OSCARSSON] because that is no longer in
the [UDNS] draft.

6.1.1: Reworded first sentence.

6.3: Added entire section and subsections.

8: Fixed typo in first sentence.

B. Author Contact

Paul Hoffman
IMC & VPNC
127 Segre Place
Santa Cruz, CA 95060
phoffman@imc.org or paul.hoffman@vpnc.org