Network Working Group                                     A. El-Sherbiny
Internet-Draft                                                  M. Farah
Intended status: Informational                                  UN-ESCWA
Expires: August 9, 2009                                      I. Oueichek
                                            Syrian Telecom Establishment
                                                             A. Al-Zoman
                                                          SaudiNIC, CITC
                                                        February 5, 2009


  Linguistic Guidelines for the Use of the Arabic Language in Internet
                                Domains
                draft-farah-adntf-ling-guidelines-04.txt

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.

   This Internet-Draft will expire on August 9, 2009.

Abstract

   This document constitutes technical specifications for the use of
   Arabic in Internet domain names and provides linguistic guidelines
   for Arabic Domain Names.  It addresses Arabic-specific linguistic
   issues pertaining to the use of the Arabic language in domain names.

Table of Contents

   1.  Introduction
   2.  Arabic Language-Specific Issues
     2.1.  Linguistic Issues
       2.1.1.  Diacritics (tashkeel) and Shadda
       2.1.2.  Kasheeda or Tatweel (Horizontal Character Size
               Extension)
       2.1.3.  Character Folding
     2.2.  Supported Character Set
     2.3.  Arabic Linguistic Issues Affected By Technical
           Constraints
       2.3.1.  Numerals
       2.3.2.  The Space Character
   3.  Summary and Conclusion
   4.  Security Considerations
   5.  IANA Considerations
   6.  Acknowledgments
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

   In March 2003, the Internet Engineering Task Force (IETF) issued a
   set of RFCs for Internationalized Domain Names (IDN) [1], [2], [3],
   which were planned to become the de facto standard for all
   languages.  In 2007 and 2008, new versions of the Internet-Drafts
   proposing revisions to the IDNA protocol were released, as follows:

   o  Internationalizing Domain Names for Applications (IDNA): Issues
      and Rationale [5]

   o  Internationalizing Domain Names in Applications (IDNA): Protocol
      [6]

   o  An IDNA problem in right-to-left scripts [7]

   o  The Unicode Codepoints and IDN [8]

   Those documents are known collectively as "IDNA2008".

   This document constitutes a technical specification for the
   implementation of the IDN standards in the case of the Arabic
   language.  It will allow the use of standard language tables to write
   domain names in Arabic characters.  It should therefore be considered
   a logical extension to the IDN standards: it presents guidelines for
   the proper use of Arabic characters with the IDN standards in an
   Arabic language context.

   This document reflects the recommendations of the Arab Working Group
   on Arabic Domain Names (AWG-ADN) established by the League of Arab
   States (LAS), based on standardisation efforts of the United Nations
   Economic and Social Commission for Western Asia (UN-ESCWA) and its
   Internet-Draft, "Guidelines for an Arabic Internet Domain Name" [9].
   It is also in full harmony with recent rigorous discussions that took
   place with the major language communities that also use the Arabic
   script in their languages.

   This document provides guidelines for the ways Arabic characters may
   be used for registering Internet domain names and for how
   language-specific issues should be handled.  A few rules are
   recommended for application at the protocol level.

   The key words "MUST", "REQUIRED", "SHOULD", "RECOMMENDED", and "MAY"
   in this document are to be interpreted as described in RFC 2119 [4].

   Comments on this document are solicited and should be addressed to
   the working group's mailing list at ESCWA-ICTD@un.org and/or the
   author(s).

2.  Arabic Language-Specific Issues

   The main objective of the creation of Arabic Domain Names is to have
   a vehicle to increase Internet use amongst all strata of the
   Arabic-speaking communities.

   Furthermore, a domain name that is not user-friendly would add to the
   ambiguity and the eccentricity of the Internet as perceived by the
   Arabic-speaking communities, thus contributing negatively to the
   spread of the Internet and leading to further isolation of these
   communities at the global level.

   Hence, there have been intensive efforts, especially those
   spearheaded by Dr. Al-Zoman and contributed to by UN-ESCWA and its
   Arabic Domain Names Task Force (ADN-TF), to reach consensus on a
   multitude of linguistic issues with the following goals:

   o  To define the accepted Arabic character set to be used for writing
      domain names in Arabic, which is the subject of this document.

   o  To define the top-level domains of the Arabic domain name tree
      structure (i.e., Arabic gTLDs and ccTLDs).  This goal will be
      handled in a separate document.

   The first meeting of the AWG-ADN, held in Damascus in
   January-February 2005, gave special attention to the following:

   a.  Simplification of the domain names, whenever possible, to
       facilitate the interaction of the Arabic user with the Internet.

   b.  Adoption of solutions that do not lead to confusion either in
       reading or in writing, provided that this does not compromise the
       linguistic correctness of the words used.

   c.  Mixing Arabic and non-Arabic letters in the domain name label is
       not acceptable.

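   Point (c) above lends itself to a mechanical check.  The following
   Python sketch is one possible reading of that point and is not part
   of the AWG-ADN recommendations.  The function name
   mixes_arabic_and_other_letters is hypothetical, and script
   membership is approximated by testing whether a letter's Unicode
   character name begins with "ARABIC", which is an assumption of this
   sketch.  Digits and the hyphen are deliberately ignored, because
   Section 2.2 permits ASCII digits and HYPHEN-MINUS in Arabic labels.

```python
import unicodedata


def mixes_arabic_and_other_letters(label: str) -> bool:
    """Return True if the label holds both Arabic and non-Arabic letters.

    Assumption of this sketch: a letter counts as Arabic when its
    Unicode character name starts with "ARABIC".
    """
    scripts = set()
    for ch in label:
        # Only letters (general category L*) are compared; digits,
        # hyphens and any other characters are left to other checks.
        if unicodedata.category(ch).startswith("L"):
            name = unicodedata.name(ch, "")
            scripts.add("arabic" if name.startswith("ARABIC") else "other")
    return len(scripts) > 1
```

   A registry front end might call this on each label before any other
   processing and reject the label when the function returns True.
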
2.1.  Linguistic Issues

   There are a number of linguistic issues that have been raised with
   respect to the use of the Arabic language in domain names.  This
   section highlights some of them.  It is based on the papers of
   Dr. Al-Zoman [10] [11] and the report of the first meeting of
   AWG-ADN [12].  For details, the reader is encouraged to review the
   references.

2.1.1.  Diacritics (tashkeel) and Shadda

   Tashkeel and Shadda are accent marks placed above or below Arabic
   letters to indicate proper pronunciation.  They thus distinguish
   words that share the same base characters but differ in meaning.

   Neither Tashkeel nor Shadda is permitted in zone files when
   registering domain names in the Arabic language, although they are
   permitted in the current edition of IDNA2008.  If necessary, they can
   be supported or ignored in the user interface by means of local
   mappings, and they are stripped before IDNA processing.

   Their Unicode code points are:

      U+064B ARABIC FATHATAN
      U+064C ARABIC DAMMATAN
      U+064D ARABIC KASRATAN
      U+064E ARABIC FATHA
      U+064F ARABIC DAMMA
      U+0650 ARABIC KASRA
      U+0651 ARABIC SHADDA
      U+0652 ARABIC SUKUN

2.1.2.  Kasheeda or Tatweel (Horizontal Character Size Extension)

   Kasheeda (U+0640 ARABIC TATWEEL) must not be used in Arabic domain
   names.  The Kasheeda is not a letter and does not affect
   pronunciation.  It is used to extend the horizontal length or change
   the shape of the preceding letter for graphical representation
   purposes in Arabic writing.  Accordingly, it has no value for the
   writing of domain names.  The same applies to all languages using the
   Arabic script.  The authors recommend that it be disallowed at the
   protocol level.

2.1.3.  Character Folding

   Character folding is the process where multiple letters (that may
   have some similarity with respect to their shapes) are folded into
   one shape.  Examples of such Arabic characters include:

   o  Folding Teh Marbuta (U+0629) and Heh (U+0647) at the end of a
      word;

   o  Folding different forms of Hamzah (U+0622, U+0623, U+0625,
      U+0627);

   o  Folding Alef Maksura (U+0649) and Yeh (U+064A) at the end of a
      word;

   o  Folding Waw with Hamzah Above (U+0624) and Waw (U+0648).

   With respect to the Arabic language, character folding is not
   acceptable because it changes the meaning of words and violates the
   spelling rules.  Replacing a character valid for use in domain names
   with another valid character of similar shape yields a word with a
   different meaning.  Folding would therefore make a single word
   represent all the words formed by the combinations of the folded
   characters, and those other words would be masked by it [10].

   Misspellings and handwriting errors do occur and lead to different
   characters being mixed up, although this is not the case in published
   and printed materials.  One of the motivations of this effort is to
   preserve the language, particularly with the spread of globalization.
   Character folding works against this motivation, since it would have
   a negative effect on the principles and ethics of the language.
   Technology should work to preserve the language, not to destroy it.
   Thus, character folding should not be allowed.  The case of digits is
   treated in Section 2.3.1.

2.2.  Supported Character Set

   A domain name to be written in Arabic must be composed of a sequence
   of the following Unicode characters, with the FULL STOP (U+002E)
   separating the labels.  The characters are taken from Unicode
   version 5.0.  The tables below are constructed using an
   inclusion-based approach.  Thus, characters that are not part of the
   tables are prohibited.

   +---------+-------------------------------------+
   | Unicode | Character Name                      |
   +---------+-------------------------------------+
   | 0621    | ARABIC LETTER HAMZA                 |
   | 0622    | ARABIC LETTER ALEF WITH MADDA ABOVE |
   | 0623    | ARABIC LETTER ALEF WITH HAMZA ABOVE |
   | 0624    | ARABIC LETTER WAW WITH HAMZA ABOVE  |
   | 0625    | ARABIC LETTER ALEF WITH HAMZA BELOW |
   | 0626    | ARABIC LETTER YEH WITH HAMZA ABOVE  |
   | 0627    | ARABIC LETTER ALEF                  |
   | 0628    | ARABIC LETTER BEH                   |
   | 0629    | ARABIC LETTER TEH MARBUTA           |
   | 062A    | ARABIC LETTER TEH                   |
   | 062B    | ARABIC LETTER THEH                  |
   | 062C    | ARABIC LETTER JEEM                  |
   | 062D    | ARABIC LETTER HAH                   |
   | 062E    | ARABIC LETTER KHAH                  |
   | 062F    | ARABIC LETTER DAL                   |
   | 0630    | ARABIC LETTER THAL                  |
   | 0631    | ARABIC LETTER REH                   |
   | 0632    | ARABIC LETTER ZAIN                  |
   | 0633    | ARABIC LETTER SEEN                  |
   | 0634    | ARABIC LETTER SHEEN                 |
   | 0635    | ARABIC LETTER SAD                   |
   | 0636    | ARABIC LETTER DAD                   |
   | 0637    | ARABIC LETTER TAH                   |
   | 0638    | ARABIC LETTER ZAH                   |
   | 0639    | ARABIC LETTER AIN                   |
   | 063A    | ARABIC LETTER GHAIN                 |
   | 0641    | ARABIC LETTER FEH                   |
   | 0642    | ARABIC LETTER QAF                   |
   | 0643    | ARABIC LETTER KAF                   |
   | 0644    | ARABIC LETTER LAM                   |
   | 0645    | ARABIC LETTER MEEM                  |
   | 0646    | ARABIC LETTER NOON                  |
   | 0647    | ARABIC LETTER HEH                   |
   | 0648    | ARABIC LETTER WAW                   |
   | 0649    | ARABIC LETTER ALEF MAKSURA          |
   | 064A    | ARABIC LETTER YEH                   |
   | 0660    | ARABIC-INDIC DIGIT ZERO             |
   | 0661    | ARABIC-INDIC DIGIT ONE              |
   | 0662    | ARABIC-INDIC DIGIT TWO              |
   | 0663    | ARABIC-INDIC DIGIT THREE            |
   | 0664    | ARABIC-INDIC DIGIT FOUR             |
   | 0665    | ARABIC-INDIC DIGIT FIVE             |
   | 0666    | ARABIC-INDIC DIGIT SIX              |
   | 0667    | ARABIC-INDIC DIGIT SEVEN            |
   | 0668    | ARABIC-INDIC DIGIT EIGHT            |
   | 0669    | ARABIC-INDIC DIGIT NINE             |
   +---------+-------------------------------------+

   Source: Supporting the Arabic Language in Domain Names [10]

       Table 1: CHARACTERS FROM UNICODE ARABIC TABLE (0600-06FF)

   +---------+-----------------+
   | Unicode | Character Name  |
   +---------+-----------------+
   | 0030    | DIGIT ZERO      |
   | 0031    | DIGIT ONE       |
   | 0032    | DIGIT TWO       |
   | 0033    | DIGIT THREE     |
   | 0034    | DIGIT FOUR      |
   | 0035    | DIGIT FIVE      |
   | 0036    | DIGIT SIX       |
   | 0037    | DIGIT SEVEN     |
   | 0038    | DIGIT EIGHT     |
   | 0039    | DIGIT NINE      |
   | 002D    | HYPHEN-MINUS    |
   +---------+-----------------+

   Source: Supporting the Arabic Language in Domain Names [11]

     Table 2: CHARACTERS FROM UNICODE BASIC LATIN TABLE (0000-007F)

2.3.  Arabic Linguistic Issues Affected By Technical Constraints

   In this section, technical aspects of some linguistic issues are
   discussed.

2.3.1.  Numerals

   In the Arab countries, there are two sets of numerical digits used:

   o  Set I: (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), mostly used in the western
      part of the Arab world.

   o  Set II: (U+0660, U+0661, U+0662, U+0663, U+0664, U+0665, U+0666,
      U+0667, U+0668, U+0669), mostly used in the eastern part of the
      Arab world.

   Both sets may be supported in the user interface; however, the rule
   of numeral homogeneity must be observed.  The rule specifies that
   digits from the Arabic-Indic set of numerals (U+0660 to U+0669)
   should not be allowed to mix with ASCII digits (U+0030 to U+0039)
   within the same Arabic domain name label.  Thus, the appearance of a
   digit from one set prevents the use of any digit from the other set.

2.3.2.  The Space Character

   The space character (U+0020) is strictly disallowed in domain names;
   it is not among the characters permitted in Section 2.2.  Instead,
   the hyphen (Al-sharta), i.e., U+002D, is proposed as a separator
   between Arabic words, to avoid the confusion that can arise if the
   words are typed without a separator.

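   The rules of Sections 2.1.1, 2.2, 2.3.1 and 2.3.2 can be combined
   into a small label checker.  The Python sketch below is illustrative
   only and is not a normative part of these guidelines.  The names
   strip_tashkeel, label_problems and name_is_acceptable are
   hypothetical.  The code point ranges are copied from Tables 1 and 2
   and from the list in Section 2.1.1.  Other DNS or IDNA constraints
   (label length, hyphen placement, bidirectional rules) are outside
   the scope of the sketch.

```python
from __future__ import annotations

# Table 1: Arabic letters U+0621-U+063A and U+0641-U+064A (U+0640
# TATWEEL is deliberately absent), plus Arabic-Indic digits.
ARABIC_LETTERS = {chr(c) for c in range(0x0621, 0x063B)} | {
    chr(c) for c in range(0x0641, 0x064B)
}
ARABIC_INDIC_DIGITS = {chr(c) for c in range(0x0660, 0x066A)}
# Table 2: ASCII digits U+0030-U+0039 and HYPHEN-MINUS U+002D.
ASCII_DIGITS = {chr(c) for c in range(0x0030, 0x003A)}
HYPHEN = "\u002D"
ALLOWED = ARABIC_LETTERS | ARABIC_INDIC_DIGITS | ASCII_DIGITS | {HYPHEN}
# Section 2.1.1: Tashkeel and Shadda, U+064B-U+0652.
TASHKEEL = {chr(c) for c in range(0x064B, 0x0653)}


def strip_tashkeel(label: str) -> str:
    """Local user-interface mapping: drop Tashkeel and Shadda marks."""
    return "".join(ch for ch in label if ch not in TASHKEEL)


def label_problems(label: str) -> list[str]:
    """Return the rule violations found in one label (empty if none)."""
    problems = []
    # Inclusion-based approach: anything outside the tables is
    # prohibited, which covers the space character and the Tatweel.
    bad = sorted({f"U+{ord(ch):04X}" for ch in label if ch not in ALLOWED})
    if bad:
        problems.append("characters outside Tables 1 and 2: " + ", ".join(bad))
    # Numeral homogeneity: the two digit sets must not be mixed.
    chars = set(label)
    if chars & ARABIC_INDIC_DIGITS and chars & ASCII_DIGITS:
        problems.append("Arabic-Indic and ASCII digits mixed in one label")
    return problems


def name_is_acceptable(name: str) -> bool:
    """Check every FULL STOP separated label of a domain name."""
    return all(not label_problems(label) for label in name.split("."))
```

   In this sketch a user interface would apply strip_tashkeel to typed
   input first and then pass the result to label_problems.  A label
   containing a space or a Tatweel is reported as containing a
   character outside the tables, and a label mixing the two digit sets
   is reported separately.
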
   It is acceptable to use the hyphen to separate words within the same
   domain name label.

3.  Summary and Conclusion

   The proposed guidelines are in full accordance with the IETF IDN
   standards and take into account Arabic language-specific issues,
   striking a compromise between the grammatical rules of the Arabic
   language and the ease of use of the language on the Internet.

   In summary, the guidelines specify that in Arabic domain names:

   o  Accent marks (Tashkeel and Shadda) are not permitted.
   o  Character folding is not permitted.
   o  If a numeral from the Arabic-Indic or ASCII digit sets appears
      in a label, numeral homogeneity is required.
   o  The hyphen must be used as a word separator instead of the space
      character.

4.  Security Considerations

   No particular security considerations could be identified regarding
   the use of Arabic characters in writing domain names.  In particular,
   any potential visual confusion between different character strings is
   avoided using the guidelines proposed in this document.

5.  IANA Considerations

   This document has no action for IANA.

6.  Acknowledgments

   ESCWA ICT Division provided support and funding for the development
   of this document with the objective of reaching a comprehensive
   standard for Arabic Domain Names.  Thanks are due to SaudiNIC for
   its continuous efforts in supporting the development of Arabic Domain
   Names.

   John Klensin and Harald Alvestrand reviewed the document and provided
   useful editorial and substantive support to enrich it.

7.  References

7.1.  Normative References

   [1]   Faltstrom, P., Hoffman, P., and A. Costello,
         "Internationalizing Domain Names in Applications (IDNA)",
         RFC 3490, March 2003.

   [2]   Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile
         for Internationalized Domain Names (IDN)", RFC 3491,
         March 2003.

   [3]   Costello, A., "Punycode: A Bootstring encoding of Unicode for
         Internationalized Domain Names in Applications (IDNA)",
         RFC 3492, March 2003.

   [4]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
         Levels", BCP 14, RFC 2119, March 1997.

7.2.  Informative References

   [5]   Klensin, J., "Internationalized Domain Names for Applications
         (IDNA): Definitions, Background and Rationale",
         draft-ietf-idnabis-rationale-06 (work in progress),
         September 2008.

   [6]   Klensin, J., "Internationalized Domain Names in Applications
         (IDNA): Protocol", draft-ietf-idnabis-protocol-08 (work in
         progress), September 2008.

   [7]   Alvestrand, H. and C. Karp, "An updated IDNA criterion for
         right-to-left scripts", draft-ietf-idnabis-bidi-03 (work in
         progress), July 2008.

   [8]   Faltstrom, P., "The Unicode Codepoints and IDNA",
         draft-ietf-idnabis-tables-05 (work in progress), July 2008.

   [9]   United Nations Economic and Social Commission for Western Asia
         (UN-ESCWA), "Guidelines for an Arabic Domain Name System
         (ADNS)", Internet-Draft farah-adntf-adns-guidelines-03.txt,
         November 2007.

   [10]  Al-Zoman, A., "Supporting the Arabic Language in Domain Names",
         October 2003.

   [11]  Al-Zoman, A., "Arabic Top-Level Domains", July 2003.

         Paper presented at the EGM on promotion of Digital Arabic
         Content, the United Nations, ESCWA, Beirut.

   [12]  League of Arab States, "Report of the first meeting of AWG-ADN,
         Damascus", February 2005.

         This document is in Arabic.

Authors' Addresses

   Ayman El-Sherbiny
   Information and Communication Technology Division ESCWA
   UN-House
   P.O. Box 11-8575
   Beirut
   Lebanon

   Email: El-sherbiny@un.org


   Mansour Farah
   Information and Communication Technology Division ESCWA
   UN-House
   P.O. Box 11-8575
   Beirut
   Lebanon

   Email: farah14@un.org


   Ibaa Oueichek
   Syrian Telecom Establishment
   Damascus
   Syria

   Email: oueichek@scs-net.org


   Abdulaziz H. Al-Zoman, PhD
   SaudiNIC, General Directorate of Internet Services
   IT Sector, CITC
   King Abdulaziz City for Science and Technology
   PO Box 6086
   Riyadh 11442
   Saudi Arabia

   Email: azoman@citc.gov.sa

Copyright and License Notice

   All IETF Documents and the information contained therein are provided
   on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
   IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
   WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
   WARRANTY THAT THE USE OF THE INFORMATION THEREIN WILL NOT INFRINGE
   ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
   FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF Trust takes no position regarding the validity or scope of
   any Intellectual Property Rights or other rights that might be
   claimed to pertain to the implementation or use of the technology
   described in any IETF Document or the extent to which any license
   under such rights might or might not be available; nor does it
   represent that it has made any independent effort to identify any
   such rights.

   Copies of Intellectual Property disclosures made to the IETF
   Secretariat and any assurances of licenses to be made available, or
   the result of an attempt made to obtain a general license or
   permission for the use of such proprietary rights by implementers or
   users of this specification can be obtained from the IETF on-line IPR
   repository at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   any standard or specification contained in an IETF Document.  Please
   address the information to the IETF at ietf-ipr@ietf.org.