Network Working Group                                H. Alvestrand, Ed.
Internet-Draft                                                   Google
Intended status: Standards Track                           C. Karp, Ed.
Expires: August 17, 2008              Swedish Museum of Natural History
                                                           Feb 14, 2008


          An updated IDNA criterion for right-to-left scripts
                     draft-alvestrand-idna-bidi-04

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on August 17, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2008).

Abstract

   The use of right-to-left scripts in internationalized domain names
   has presented several challenges.  This memo discusses some problems
   with these scripts, and some shortcomings in the 2003 IDNA BIDI
   criterion.  Based on this discussion, it proposes a new BIDI
   criterion for IDNA labels.

Table of Contents

   1.  Introduction and problem description
     1.1.  Purpose and applicability
     1.2.  Background and history
     1.3.  Terminology
   2.  Detailed examples
     2.1.  Dhivehi
     2.2.  Yiddish
     2.3.  Strings with numbers
   3.  An expanded justification for the bidi rule
   4.  A replacement for the RFC 3454 criterion
   5.  Other issues in need of resolution
   6.  Compatibility considerations
     6.1.  Backwards compatibility considerations
     6.2.  Forward compatibility considerations
   7.  IANA Considerations
   8.  Security Considerations
   9.  Acknowledgements
   Appendix A.  Change log
     A.1.  Changes from -00 to -01
     A.2.  Changes from -01 to -02
     A.3.  Changes from -02 to -03
     A.4.  Changes from -03 to -04
   10. References
     10.1.  Normative references
     10.2.  Informative references
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction and problem description

1.1.  Purpose and applicability

   This document's purpose is to establish a test that can be applied to
   Internationalized Domain Name (IDN) labels in Unicode form (U-labels)
   containing right-to-left characters.

   When labels pass the test, they can be used with a minimal chance of
   these labels being displayed in a confusing way by a bidirectional
   display algorithm.  In order to achieve this stability, it is also
   necessary that the test be applied to labels occurring before or
   after the label containing right-to-left characters, which prohibits
   some LDH-labels that are permitted in other contexts.

1.2.  Background and history

   The IDNA specification "Stringprep" [RFC3454] makes the following
   statement in its Section 6, on the bidi algorithm:

      3) If a string contains any RandALCat character, a RandALCat
      character MUST be the first character of the string, and a
      RandALCat character MUST be the last character of the string.

   (A RandALCat character is a character with unambiguously
   right-to-left directionality.)
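
   As a rough illustration of the quoted rule, the following Python
   sketch checks rule 3 only (not the other Section 6 rules of RFC
   3454).  It assumes that the Bidi_Class values R and AL reported by
   Python's unicodedata module are an acceptable stand-in for the
   RandALCat tables of RFC 3454, which were derived from an older
   Unicode version; the function name is hypothetical and is not taken
   from any specification.

```python
import unicodedata

# Assumption: Bidi_Class R and AL approximate the RandALCat tables
# of RFC 3454 (the tables themselves are not reproduced here).
RANDALCAT = {"R", "AL"}


def passes_rfc3454_rule3(label: str) -> bool:
    """Return True if `label` satisfies rule 3 as quoted above."""
    classes = [unicodedata.bidirectional(ch) for ch in label]
    if not any(c in RANDALCAT for c in classes):
        # The rule only constrains strings containing a RandALCat
        # character; everything else passes trivially.
        return True
    # Both the first and the last character must be RandALCat.
    return classes[0] in RANDALCAT and classes[-1] in RANDALCAT
```

   Under this sketch, a label consisting of HEBREW LETTER ALEF followed
   by HEBREW POINT QAMATS is rejected, because its last character has
   class NSM rather than R or AL.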

   The reasoning behind this prohibition was to ensure that every
   component of a displayed domain name has an unambiguously preferred
   direction.  However, this makes certain words in languages written
   with right-to-left scripts invalid as IDN labels, and in at least one
   case means that all the words of an entire language are forbidden as
   IDN labels.

   This will be illustrated below with examples taken from the Dhivehi
   and Yiddish languages, as written with the Thaana and Hebrew scripts,
   respectively.

   In investigating this problem, it was realized that RFC 3454 did not
   state exactly what requirement the rule was meant to fulfil, and
   therefore it was impossible to tell whether a simple relaxation of
   the rule would continue to fulfil that requirement.  A further
   investigation led to the conclusion that for one reasonable set of
   requirements, IDNA2003's BIDI restriction did not fulfil the
   requirements.  This document therefore proposes replacing the RFC
   3454 BIDI requirement in its entirety.

   While the document proposes completely new text, most reasonable
   labels that were allowed under the old criterion will also be allowed
   under the new criterion, so the operational impact of the rule change
   is limited.

1.3.  Terminology

   In this memo, we use "network order" to describe the sequence of
   characters as transmitted on the wire or stored in a file; the terms
   "first", "next" and "previous" are used to refer to the relationship
   of characters in network order.

   We use "display order" to talk about the sequence of characters as
   imaged on a display medium; the terms "left" and "right" are used to
   refer to the relationship of characters in display order.

   Most of the time, the examples use the abbreviations for the Unicode
   Bidi classes to denote the directionality of the characters.  Some
   examples instead use the convention that uppercase characters are of
   class R or AL, and lowercase characters are of class L.  Thus, the
   example string ABC.abc would consist of 3 right-to-left characters
   and 3 left-to-right characters.

   The other terminology used to describe IDNA concepts is defined in
   [I-D.klensin-idnabis-issues].

2.  Detailed examples

2.1.  Dhivehi

   Dhivehi, the official language of the Maldives, is written with the
   Thaana script.  This displays some of the characteristics of Arabic
   script, including its directional properties, and the indication of
   vowels by the diacritical marking of consonantal base characters.
   This marking is obligatory, and both double vowels and syllable-final
   consonants are indicated by the marking of special unvoiced
   characters.  Every Dhivehi word therefore ends with a combining mark.

   The word for "computer", which is romanized as "konpeetaru", is
   written with the following sequence of Unicode code points:

      U+0786 THAANA LETTER KAAFU (AL)

      U+07AE THAANA OBOFILI (NSM)

      U+0782 THAANA LETTER NOONU (AL)

      U+07B0 THAANA SUKUN (NSM)

      U+0795 THAANA LETTER PAVIYANI (AL)

      U+07A9 THAANA EEBEEFILI (NSM)

      U+0793 THAANA LETTER TAVIYANI (AL)

      U+07A6 THAANA ABAFILI (NSM)

      U+0783 THAANA LETTER RAA (AL)

      U+07AA THAANA UBUFILI (NSM)

   The directionality class of U+07AA in the Unicode database is NSM
   (non-spacing mark), which is not R or AL; a conformant implementation
   of the IDNA2003 algorithm will say that "this is not in RandALCat",
   and refuse to encode the string.

2.2.  Yiddish

   Yiddish is one of several languages written with the Hebrew script
   (others include Hebrew and Ladino).
   This is basically a consonantal alphabet (also termed an "abjad"),
   but Yiddish is written using an extended form that is fully vocalic.
   The vowels are indicated in several ways, of which one is by
   repurposing letters that are consonants in Hebrew.  Other letters are
   used both as vowels and consonants, with combining marks, called
   "points", used to differentiate between them.  Finally, some base
   characters can indicate several different vowels, which are also
   disambiguated by combining marks.  Pointed characters can appear in
   word-final position and may therefore also be needed at the end of
   labels.  This is not an invariable attribute of a Yiddish string and
   there is thus greater latitude here than there is with Dhivehi.

   The organization now known as the "YIVO Institute for Jewish
   Research" developed orthographic rules for modern Standard Yiddish
   during the 1930s on the basis of work conducted in several venues
   since earlier in that century.  These are given in "The Standardized
   Yiddish Orthography: Rules of Yiddish Spelling, 6th ed., YIVO
   Institute for Jewish Research, New York, 1999, ISBN 0-914512-25-0"
   ("SYO") and are taken as normatively descriptive of modern Standard
   Yiddish in any context where that notion is deemed relevant.  They
   have been applied exclusively in all Yiddish dictionaries published
   since their establishment, and are similarly dominant in academic and
   bibliographic regards.

   It therefore appears appropriate for this repertoire also to be
   supported fully by IDNA.  This presents no difficulty with characters
   in initial and medial positions, but pointed characters are regularly
   used in final position as well.  All of the characters in the SYO
   repertoire appear in both marked and unmarked form with one
   exception: the HEBREW LETTER PE (U+05E4).
   The SYO only permits this with a HEBREW POINT DAGESH (U+05BC),
   providing the Yiddish equivalent to the Latin letter "p", or a HEBREW
   POINT RAFE (U+05BF), equivalent to the Latin letter "f".  There is,
   however, a separate unpointed allograph, the HEBREW LETTER FINAL PE
   (U+05E3), for the latter character when it appears in final position.
   The constraint on the use of the SYO repertoire resulting from the
   proscription of combining marks at the end of RTL strings thus
   reduces to nothing more, or less, than the equivalent of saying that
   a string of Latin characters cannot end with the letter "p".  It must
   also be noted that the HEBREW LETTER PE with HEBREW POINT DAGESH is
   characteristic of almost all traditional Yiddish orthographies that
   predate (or remain in use in parallel to) the SYO, being the first
   pointed character to appear in any of them.

   A more general instantiation of the basic problem can be seen in the
   representation of the YIVO acronym.  This is written with the Hebrew
   letters YOD YOD HIRIQ VAV VAV ALEF QAMATS, where HIRIQ and QAMATS are
   combining points:

      U+05D9 HEBREW LETTER YOD (R)

      U+05B4 HEBREW POINT HIRIQ (NSM)

      U+05D5 HEBREW LETTER VAV (R)

      U+05D0 HEBREW LETTER ALEF (R)

      U+05B8 HEBREW POINT QAMATS (NSM)

   The directionality class of U+05B8 HEBREW POINT QAMATS in the Unicode
   database is NSM, which again causes the IDNA2003 algorithm to reject
   the string.

   It may also be noted that all of the combined characters mentioned
   above exist in precomposed form at separate positions in the Unicode
   chart.  However, by invoking Stringprep, the IDNA2003 algorithm also
   rejects those codepoints, for reasons not discussed here.

2.3.  Strings with numbers

   RFC 3454, in its insistence that both the first and the last
   character of a string be of category R or AL, prohibited strings that
   contained right-to-left characters and ended with numbers.

   Consider the strings ALEF 5 (HEBREW LETTER ALEF + DIGIT FIVE) and 5
   ALEF.  Displayed in a LTR context, the first one will be displayed
   from left to right as 5 ALEF (with the 5 being considered
   right-to-left because of the leading ALEF), while 5 ALEF will be
   displayed in exactly the same order (5 taking the direction from
   context).  Clearly, only one of those should be permitted as a
   registered label.

3.  An expanded justification for the bidi rule

   One issue with RFC 3454 was that it did not give an explicit
   justification for the bidi rule; thus, it was hard to tell if a
   modified rule would continue to fulfil the purpose for which the RFC
   3454 rule was written.

   This document proposes an explicit justification, by stating a set of
   requirements for which it is possible to test whether or not the
   modified rule fulfils the requirement.

   All the text in this document assumes that text containing the labels
   under consideration will be displayed using the Unicode bidirectional
   algorithm [UAX9].

   The justification proposed is this:

   o  No two labels, when presented in display order, should have the
      same sequence of characters without also having the same sequence
      of characters in network order.  (This is the criterion that is
      explicit in RFC 3454.)

   o  In a display of a string of labels, the characters of each label
      should remain grouped between the characters delimiting the
      labels.

   o  These properties should hold true both when the string is embedded
      in a paragraph with LTR direction and when it's embedded in a
      paragraph with RTL direction, as long as explicit directional
      controls are not used within the same paragraph.

   Several stronger statements were considered and rejected, because
   they seem to be impossible to fulfil within the constraints of the
   Unicode bidirectional algorithm.
   These include:

   o  The appearance of a label should be unaffected by its embedding
      context.  This proved impossible even for ASCII labels; the label
      "123-456" will have a different display order in an RTL context
      than in a LTR context.

   o  The sequence of labels should be consistent with network order.
      This proved impossible - a domain name consisting of the labels
      (in network order) L1.R1.R2.L2 will be displayed as L1.R2.R1.L2 in
      an LTR context.

   o  The "remain grouped" property should remain true when directional
      controls (LRE, RLE, RLO, LRO, PDF) are used in the same paragraph
      (outside of the labels).  Because these controls affect
      presentation order in non-obvious ways, by affecting the "sor" and
      "eor" properties of the Unicode BIDI algorithm, the conditions
      above would be very hard to satisfy for a useful set of strings if
      this were required.  As long as these controls have no influence
      over the display of the domain name, no problem will be caused,
      but the exact criterion for "will not influence" is hard to
      codify.

   o  The "no two labels display the same" property should hold true
      between LTR paragraphs and RTL paragraphs.  This was shown to be
      unsound.

   o  No two domain names should be displayed the same, even under
      differing directionality.  This was shown to be unsound, since the
      domain name (network) ABC.abc will have display order CBA.abc in
      an LTR context and abc.CBA in an RTL context, while the domain
      name (network) abc.ABC will display as abc.CBA in an LTR context
      and as CBA.abc in an RTL context.
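
   The display orders quoted in the list above can be reproduced with a
   deliberately small model of the bidirectional algorithm.  The sketch
   below is not an implementation of [UAX9]: it assumes the uppercase
   (R or AL) and lowercase (L) convention from the Terminology section,
   accepts only ASCII letters and FULL STOP, treats the FULL STOP as a
   neutral (reasonable only because no digits are present), and ignores
   explicit directional controls.  The function name and structure are
   illustrative assumptions.

```python
def display_order(name: str, paragraph_rtl: bool) -> str:
    """Toy model: map network order to display order for a dotted name."""
    base_dir = "R" if paragraph_rtl else "L"

    # Classify: uppercase stands in for R/AL, lowercase for L,
    # and '.' is treated as a neutral (None).
    dirs = []
    for ch in name:
        if ch == ".":
            dirs.append(None)
        elif ch.isascii() and ch.isalpha():
            dirs.append("R" if ch.isupper() else "L")
        else:
            raise ValueError(f"unsupported character: {ch!r}")

    # Resolve neutrals: take the surrounding direction if both sides
    # agree, otherwise the paragraph direction (which also stands in
    # for the start and end of the text).
    n = len(dirs)
    resolved = []
    for i, d in enumerate(dirs):
        if d is None:
            before = next((dirs[j] for j in range(i - 1, -1, -1) if dirs[j]),
                          base_dir)
            after = next((dirs[j] for j in range(i + 1, n) if dirs[j]),
                         base_dir)
            d = before if before == after else base_dir
        resolved.append(d)

    # Implicit levels: LTR paragraph L=0, R=1; RTL paragraph R=1, L=2.
    if paragraph_rtl:
        levels = [1 if d == "R" else 2 for d in resolved]
    else:
        levels = [0 if d == "L" else 1 for d in resolved]

    # Reverse each maximal run at or above a level, from the highest
    # level down to level 1.
    chars = list(name)
    for level in range(max(levels, default=0), 0, -1):
        i = 0
        while i < n:
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < n and levels[j] >= level:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            i = j
    return "".join(chars)
```

   Within this model, display_order("ABC.abc", False) and
   display_order("abc.ABC", True) both yield CBA.abc, which is the
   collision described in the last bullet.  Likewise, the four-label
   name ab.CD.EF.gh comes out as ab.FE.DC.gh in an LTR paragraph,
   matching the L1.R2.R1.L2 ordering.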

   For reference, here are the values that the Unicode BIDI property can
   have:

   o  L - Left-to-right - most letters in LTR scripts

   o  R - Right-to-left - most letters in non-Arabic RTL scripts

   o  AL - Arabic letters - most letters in the Arabic script

   o  EN - European Number (0-9)

   o  ES - European Number Separator (+ and -)

   o  ET - European Number Terminator (currency symbols, the hash sign,
      the percent sign and so on)

   o  AN - Arabic Number

   o  CS - Common Number Separator (. , / : et al)

   o  NSM - Nonspacing Mark - most combining accents

   o  BN - Boundary Neutral - control characters

   o  B - Paragraph Separator

   o  S - Segment Separator

   o  WS - Whitespace, including the SPACE character

   o  ON - Other Neutrals, including @, &, parentheses, MIDDLE DOT

   o  LRE, LRO, RLE, RLO, PDF - these are "directional control
      characters", and are not used in IDNA labels.

   The "remain grouped" property can be more formally stated as:

   o  Let "Delimiter chars" be a set of characters with the Unicode BIDI
      properties CS, WS, ON.  (These are commonly used to delimit labels
      - both the FULL STOP and the space are included.)

      *  ET, though it commonly occurs next to domain names in practice,
         is problematic: the context R CS L EN ET (for instance A.a1%)
         makes the label L EN unstable.

      *  ES commonly occurs in labels as HYPHEN-MINUS, but could also be
         used as a delimiter (for instance, the plus sign).  It is left
         out here.

   o  Let "Position" be the position of a character in a string (in
      network order).

   o  Let "Bidi position" be the position computed by the Unicode Bidi
      algorithm.

   In a paragraph with an embedded string formed from the substrings A B
   L C D, where A and D are (possibly zero-length) legal labels, and B
   and C are single "Delimiter chars", the label L is a legal label if,
   for all A, B, C and D, the bidi position of all characters in L is
   within the range of positions for the characters of L in the string,
   for both the LTR and RTL paragraph direction.

   (The "zero-length" case represents the case where a domain name is
   next to something that isn't a domain name, separated by a delimiter
   character.)

   The "No two labels" property can be formally stated as:

   If two labels L and L', embedded as for the test above, displayed in
   a paragraph with the same directionality, are rearranged into the
   same sequence of codepoints, neither L nor L' is a legal label.

4.  A replacement for the RFC 3454 criterion

   A set of rules that satisfies the tests above is as follows.  The
   main bullets give the rule; subordinate bullets (if any) give
   justifications or examples of things that break if this rule is not
   present.  The term "unstable" means that it fails to satisfy the
   "remain grouped" property defined above.

   Exhaustive testing has verified that all strings of up to 6
   characters that satisfy this criterion also satisfy both of the
   requirements above.

   o  Only characters with the BIDI properties L, R, AL, AN, EN, ES, BN,
      ON and NSM are allowed.

      *  B, S and WS are excluded because they are separators or spaces.

      *  LRE, LRO, RLE, RLO, PDF are excluded because they are bidi
         controls.

      *  ET is excluded because the string L ET is unstable.

      *  CS is excluded because the string L CS is unstable.

   o  ES and ON are not allowed in the first position.

      *  ES R and ON R are both unstable.

   o  ES and ON, followed by zero or more NSM, are not allowed in the
      last position.

      *  L ON and L ES are both unstable.

   o  If an L is present, no R, AL or AN may be present, and vice versa.

   o  If an EN is present, no AN may be present, and vice versa.

   o  The first character may not be an NSM.

   o  The first character may not be an EN (European Number) or an AN
      (Arabic Number).

      *  If the character on both sides of a CS is an EN or an AN, the
         labels turn unstable.

      *  Some domain names where some of the labels use leading EN and
         AN may be problem-free, but there's no way of verifying this
         while looking at a single label in isolation.

      *  NOTE: This is a restriction on ASCII labels when used together
         with IDNA labels.  This is a change from the existing rules for
         ASCII labels.

      *  We could achieve stability by barring numbers at the end of
         labels, but this may be more disruptive in practice.

5.  Other issues in need of resolution

   This document concerns itself only with the rules that are needed
   when dealing with domain names with characters that have differing
   Bidi properties, and considers characters only in terms of their Bidi
   properties.  All other issues with these scripts have to be
   considered in other contexts.

   Another set of issues concerns the proper display of IDNs with a
   mixture of LTR and RTL labels, or only RTL labels.

   It is unrealistic to expect that domain names will be written using
   embedded formatting codes between their labels; thus, the display
   order will be determined by the bidirectional algorithm.  Thus, a
   sequence (in network order) of R1.R2.ltr will be displayed in the
   order 2R.1R.ltr in a LTR context, which might surprise someone
   expecting to see labels displayed in hierarchical order.
   Again, this memo does not attempt to suggest a solution to this
   problem.

6.  Compatibility considerations

6.1.  Backwards compatibility considerations

   As with any change to an existing standard, it is important to
   consider what happens with existing implementations when the change
   is introduced.  The following troublesome cases have been noted:

   o  An old program is used to input a newly allowed string.  If the
      old program checks the input against RFC 3454, the string will not
      be allowed, and that domain name will remain inaccessible.

   o  An old program is asked to display a newly allowed string, and
      checks it against RFC 3454 before displaying.  The program will
      perform some kind of fallback, most likely displaying the Punycode
      form of the string.

   o  An old program tries to display a newly allowed string.  If the
      old program has code for displaying the last character of a string
      that is different from the code used to display the characters in
      the middle of the string, display may be inconsistent and cause
      confusion.

   One particular example of the last case is if a program chooses to
   examine the last character (in network order) of a string in order to
   determine its directionality, rather than its first; if it finds an
   NSM character and tries to display the string as if it were a
   left-to-right string, the resulting display may be interesting, but
   not useful.

   The editors believe that these cases will have less harmful impact in
   practice than continuing to deny the use of words from the languages
   for which these strings are necessary as IDN labels.

   This specification forbids using leading European numbers in
   ASCII-only labels; this is in conflict with a large installed base of
   such labels.  The harm resulting from violating this rule is seen
   when a label at the next level down in the hierarchy ends with a
   number (Arabic or European).
   Zone managers, both registries and private zone managers, can check
   for this particular condition before they allow registration of any
   string with right-to-left characters in it; generally it is best to
   not allow registration of any right-to-left strings in a zone where
   the label at the level above begins with a digit.

6.2.  Forward compatibility considerations

   This text is, intentionally, specified strictly in terms of the
   Unicode BIDI properties.  The determination that the condition is
   sufficient to fulfil the criteria depends on the Unicode BIDI
   algorithm; it is unlikely that drastic changes will be made to this
   algorithm.

   However, the determination of validity for any string depends on the
   Unicode BIDI property values, which are not declared immutable by the
   Unicode Consortium.  Furthermore, the behaviour of the algorithm for
   any given character is likely to be linguistically and culturally
   sensitive, so that it's not unlikely that later versions of the
   Unicode standard may change the bidi properties assigned to certain
   Unicode characters.

   This memo does not propose a solution for this problem.

7.  IANA Considerations

   This document makes no request of IANA.

   Note to RFC Editor: this section may be removed on publication as an
   RFC.

8.  Security Considerations

   This modification will allow some strings to be used in Stringprep
   contexts that are not allowed today.  It is possible that differences
   in the interpretation of the specification between old and new
   implementations could pose a security risk, but it is difficult to
   envision any specific instantiation of this.

   Any rational attempt to compute, for instance, a hash over an
   identifier processed by Stringprep would use network order for its
   computation, and thus be unaffected by the changes proposed here.

   While it is not believed to pose a problem, if display routines had
   been written with specific knowledge of the RFC 3454 Stringprep
   prohibitions, it is possible that the potential problems noted under
   "backwards compatibility" could cause new kinds of confusion.

   The rule about leading numbers, which is more restrictive than
   current practice for domain names, has a peculiar interaction with
   the DNAME record; a DNAME record can point to a zone where
   right-to-left labels are registered without the knowledge or consent
   of the zone owner; if the name of the DNAME begins with a number,
   this can cause display of the right-to-left labels in the zone to be
   confusing.  It is recommended that DNAMEs pointing to zones allowing
   right-to-left labels should not start with a digit, but a pointed-to
   zone owner has no way of enforcing this.

9.  Acknowledgements

   While the listed editors held the pen, this document represents the
   joint work and conclusions of an ad hoc design team.  In addition to
   the editors, the team consisted of, in alphabetical order, Tina Dam,
   Patrik Faltstrom, and John Klensin.  Many further specific
   contributions and helpful comments were received from the people
   listed below, and others who have contributed to the development and
   use of the IDNA protocols.

   The team wishes in particular to thank Roozbeh Pournader for calling
   its attention to the issue with the Thaana script, Paul Hoffman for
   pointing out the need to be explicit about backwards compatibility
   considerations, Ken Whistler for suggesting the basis of the
   formalized "remain grouped" requirement, and Erik van der Poel for
   careful review, comments and verification of the rulesets.

Appendix A.  Change log

   This appendix is intended to be removed when this document is
   published as an RFC.

A.1.  Changes from -00 to -01

   Suggested a possible new algorithm.

   Multiple smaller changes.

A.2.  Changes from -01 to -02

   Date of publication updated.

   Change log added.

A.3.  Changes from -02 to -03

   Intro changed to reflect addressing the deeper issues with the Bidi
   algorithm.

   Gave formalized criteria for "valid strings", and documented the new
   set of requirements for strings that satisfy the criteria.

   Removed most of section 5, "Other problems", and noted that this memo
   focuses ONLY on issues that can be evaluated by looking at the bidi
   properties of characters.

A.4.  Changes from -03 to -04

   Added back AN to the list of allowed characters; it had been left out
   by accident in -03.

   Removed some rules that were redundant.

   Added some considerations for backwards compatibility and interaction
   with ASCII labels that start with a number.

   Mentioned the issue with DNAME pointing to a zone containing RTL
   labels in the security considerations section.

   Wording updates in multiple places, including corrections of some
   spelling errors.

   Rewrote the introduction section.

   Split references into "normative" and "informative".

10.  References

10.1.  Normative references

   [I-D.klensin-idnabis-issues]
              Klensin, J., "Internationalizing Domain Names for
              Applications (IDNA): Issues, Explanation, and Rationale",
              draft-klensin-idnabis-issues-07 (work in progress),
              February 2008.

   [UAX9]     Davis, M., "Unicode Standard Annex #9: The Bidirectional
              Algorithm, revision 15", March 2005.

10.2.  Informative references

   [RFC3454]  Hoffman, P. and M. Blanchet, "Preparation of
              Internationalized Strings ("stringprep")", RFC 3454,
              December 2002.

Authors' Addresses

   Harald Tveit Alvestrand (editor)
   Google
   Beddingen 10
   Trondheim, 7014
   Norway

   Email: harald@alvestrand.no


   Cary Karp (editor)
   Swedish Museum of Natural History
   Frescativ. 40
   Stockholm, 10405
   Sweden

   Phone: +46 8 5195 4055
   Email: ck@nrm.museum

Full Copyright Statement

   Copyright (C) The IETF Trust (2008).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).