Internet Draft                                               M. Ishisone
draft-ietf-idn-mace-00.txt                                           SRA
Jun 21, 2001                                                   Y. Yoneya
Expires Dec 21, 2001                                               JPNIC

             MACE: Modal ASCII Compatible Encoding for IDN

Status of this Memo

   This document is an Internet-Draft and is subject to all provisions
   of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Abstract

   MACE is a reversible transformation method from a sequence of Unicode
   [UNICODE] characters to a sequence of ASCII letters, digits and
   hyphens (LDH characters). It is designed to be used as an encoding
   for internationalized domain names [IDN].

Contents

   1. Introduction
   2. Terminology
   3. Overview
   4. Base32 Format
   5. Notations
   6. Encoding Description
   7. Encoding Procedure
   8. Decoding Description
   9. Decoding Procedure
   10. ACE Identifier
   11. Examples
   12. Security Considerations
   13. References
   14. Acknowledgements
   15. Authors' Address

Expires December 21, 2001                                       [Page 1]

1. Introduction

   MACE is intended to be used as an ACE in the IDNA architecture
   [IDNA], and encodes a sequence of Unicode (ISO/IEC 10646) characters
   in the range U+0000-U+10FFFF as a sequence of LDH characters.

   MACE is designed to have the following features:

   Completeness: Every Unicode string maps to an LDH character string.

   Uniqueness: Every Unicode string maps to at most one LDH character
      string.

   Reversibility: The original Unicode string can be recovered from the
      LDH character string to which it maps.

   Efficiency: The ratio of encoded size to original size is small.
      If the code points of the Unicode string are clustered, a
      compression algorithm enables a compact encoding. Even if they
      are not, the encoded size is still kept small.

   Simplicity: The encoding and decoding algorithms are fairly simple
      to implement.

2. Terminology

   LDH characters are the letters A-Z and a-z, the digits 0-9, and
   hyphen-minus.

   As in the Unicode Standard [UNICODE], Unicode characters are denoted
   by "U+" followed by four to six hexadecimal digits representing the
   character's UCS-4 code point. A range of Unicode characters is
   denoted by the form "U+xxxx-U+yyyy".

3. Overview

   MACE encodes a sequence of Unicode (ISO/IEC 10646) characters in the
   range U+0000-U+10FFFF as a sequence of LDH characters.

   MACE is a modal encoding. There are two major modes, one of which
   has four submodes. Each character is encoded in a specific
   mode/submode, chosen according to the code point of the character
   and possibly its neighboring characters.
   The modal encoding enables a compact representation of each
   character, and the modes are chosen so that mode changes occur
   rather infrequently as long as the source string is written in a
   single language.

   LDH characters are represented literally, for compactness of the
   encoded result. Other Unicode characters are represented as base32
   format strings. Each non-LDH character in the Basic Multilingual
   Plane (BMP, U+0000-U+FFFF) is encoded as a 3-octet base32 format
   string, while each non-BMP (U+10000-U+10FFFF) character is encoded
   as a 4-octet base32 format string.

   To achieve fairly good compression for non-LDH characters, there is
   also a submode for differential encoding. In this submode, a
   character is encoded as the bitwise XOR of the code points of the
   previous non-LDH character and the current character, written as a
   1- or 2-octet base32 format string.

   So if the code points of the input string are clustered in a small
   region, this compression achieves 1 or 2 octets per character (plus
   some overhead for mode changes). Even if the code points are widely
   scattered and difficult to compress (such as CJK Han characters),
   the encoding costs 3 octets per character for BMP characters or 4
   octets per character for non-BMP characters (plus some overhead for
   mode changes).

4. Base32 Format

   MACE uses base32 format strings to encode non-negative integers.
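
   The following Python sketch illustrates the big-endian conversion
   defined in this section, using the alphabet tabulated below (values
   0-9 as "0"-"9" and 10-31 as "a"-"v"). The function names mirror the
   base32_encode and base32_decode notation of Section 5; rejecting a
   number that does not fit in LEN octets is an assumption of this
   sketch, not a rule stated in this document.

```python
# Alphabet from the table in Section 4: values 0-9 -> "0"-"9", 10-31 -> "a"-"v".
ALPHABET = "0123456789abcdefghijklmnopqrstuv"


def base32_encode(n: int, length: int) -> str:
    """Return exactly `length` base32 octets for n, most-significant first.

    Numbers needing fewer octets are padded on the left with "0".
    """
    if n < 0 or n >= 32 ** length:
        # Assumption: the draft does not say what happens on overflow.
        raise ValueError(f"{n} does not fit in {length} base32 octets")
    octets = []
    for _ in range(length):
        octets.append(ALPHABET[n & 0x1F])  # take the low 5 bits
        n >>= 5
    return "".join(reversed(octets))


def base32_decode(s: str) -> int:
    """Inverse of base32_encode; letters are matched case-insensitively."""
    n = 0
    for ch in s.lower():
        digit = ALPHABET.find(ch)
        if digit < 0:
            raise ValueError(f"{ch!r} is not a base32 octet")
        n = (n << 5) | digit
    return n


print(base32_encode(40, 2))    # 18
print(base32_encode(9876, 3))  # 9kk
print(base32_decode("9KK"))    # 9876
```

   Lower-casing in base32_decode matches the case-insensitive decoding
   required in Section 8.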

   The base32 format used for MACE is:

      "0" =  0 = 0x00 = 00000       "g" = 16 = 0x10 = 10000
      "1" =  1 = 0x01 = 00001       "h" = 17 = 0x11 = 10001
      "2" =  2 = 0x02 = 00010       "i" = 18 = 0x12 = 10010
      "3" =  3 = 0x03 = 00011       "j" = 19 = 0x13 = 10011
      "4" =  4 = 0x04 = 00100       "k" = 20 = 0x14 = 10100
      "5" =  5 = 0x05 = 00101       "l" = 21 = 0x15 = 10101
      "6" =  6 = 0x06 = 00110       "m" = 22 = 0x16 = 10110
      "7" =  7 = 0x07 = 00111       "n" = 23 = 0x17 = 10111
      "8" =  8 = 0x08 = 01000       "o" = 24 = 0x18 = 11000
      "9" =  9 = 0x09 = 01001       "p" = 25 = 0x19 = 11001
      "a" = 10 = 0x0A = 01010       "q" = 26 = 0x1A = 11010
      "b" = 11 = 0x0B = 01011       "r" = 27 = 0x1B = 11011
      "c" = 12 = 0x0C = 01100       "s" = 28 = 0x1C = 11100
      "d" = 13 = 0x0D = 01101       "t" = 29 = 0x1D = 11101
      "e" = 14 = 0x0E = 01110       "u" = 30 = 0x1E = 11110
      "f" = 15 = 0x0F = 01111       "v" = 31 = 0x1F = 11111

   The encoding is big-endian (most-significant bits first). Some
   examples:

      decimal   hexadecimal   binary              base32 string
      ---------------------------------------------------------
           40   0x28          00001 01000         "18"
         9876   0x2694        01001 10100 10100   "9kk"

5. Notations

   The following five functions are used in the descriptions below.

   base32_encode(N, LEN)
      denotes a base32 format string of LEN octets representing the
      number N. If N can be represented in fewer than LEN octets, "0"
      octets are prepended.

   base32_decode(S)
      denotes the number represented by the base32 format string S.

   codepoint(C)
      denotes the UCS-4 code point value of character C.

   character(N)
      denotes the Unicode character whose UCS-4 code point is N.

   xor(N, M)
      denotes the bitwise XOR of integers N and M.

6. Encoding Description

   MACE can encode Unicode/ISO 10646 characters in the range
   U+0000-U+10FFFF.
   If the input string contains other characters, or if it represents a
   non-internationalized host name part (that is, it conforms to
   [STD13]), it MUST NOT be converted.

   MACE has several encoding modes and submodes. There are two major
   modes, `Literal' and `Non-Literal'. Non-Literal mode has four
   submodes, while Literal mode has none. Each character is encoded in
   a specific mode/submode. The encoding process for a character is:

      1. Determine the mode/submode in which to encode the character.
      2. If and only if it is necessary to change the current mode,
         output an ASCII hyphen-minus to change the mode.
      3. If and only if it is necessary to change the current submode,
         output the submode introducer octet (described below) to
         change the submode.
      4. Encode the character in the mode/submode.

   ASCII letters and digits are encoded in Literal mode, while non-LDH
   characters are encoded in Non-Literal mode. The ASCII hyphen-minus
   character (U+002D) can be encoded in either mode, and is always
   encoded as a sequence of two hyphen-minus characters ("--").
   Switching between Literal mode and Non-Literal mode is indicated by
   an ASCII hyphen-minus not followed by another hyphen-minus. The
   initial mode is Non-Literal.

   In Literal mode, characters are encoded as they are. For example,
   the ASCII character "a" is encoded as "a". In Non-Literal mode,
   characters are encoded as base32 format strings.

   Non-Literal mode further comprises four submodes, `BMP-A', `BMP-B',
   `Non-BMP' and `Compress'. Every non-LDH character is encoded in one
   of these submodes. A shift to each submode is indicated by a
   specific octet (called the introducer octet), shown below. These
   introducer octets can be distinguished from base32 strings because
   they never appear in the base32 strings used by MACE.

      submode    introducer octet
      ---------------------------
      BMP-A      "w"
      BMP-B      "x"
      Non-BMP    "y"
      Compress   "z"

   Switching between Literal mode and Non-Literal mode does not affect
   the current submode; on returning from Literal mode, the previous
   submode is restored. This reduces the need for submode changes.
   The initial submode is BMP-A.

   BMP-A and BMP-B submodes are used for encoding characters in the
   Unicode Basic Multilingual Plane (U+0000-U+FFFF), except LDH
   characters. In these submodes, a character is encoded as a base32
   format string of 3 octets. BMP-A is used for characters in the
   ranges U+0000-U+1FFF and U+A000-U+FFFF, covering most Western and
   Middle-Eastern scripts and Hangul. BMP-B is used for characters in
   the range U+2000-U+9FFF, covering the CJK unified ideographs area.
   These characters are first mapped to integers in the range
   0x0000-0x7FFF (15-bit integers), then converted to base32 format
   strings using the following scheme:

      submode  character range  encoding
      -----------------------------------------------------------------
      BMP-A    U+0000-U+1FFF    base32_encode(codepoint(C), 3)
               U+A000-U+FFFF    base32_encode(codepoint(C) - 0x8000, 3)
      BMP-B    U+2000-U+9FFF    base32_encode(codepoint(C) - 0x2000, 3)

   Here are some examples:

      character  submode  integer  base32 string
      ---------------------------------------------
      U+00B0     BMP-A    0x00B0   "05g"
      U+5678     BMP-B    0x3678   "djo"
      U+BCDE     BMP-A    0x3CDE   "f6u"

   Non-BMP submode is used for encoding Unicode characters outside the
   Basic Multilingual Plane (U+10000-U+10FFFF). In this submode, a
   character is encoded as a base32 format string of 4 octets.
   Characters U+10000-U+10FFFF are first mapped to integers in the range
   0x00000-0xFFFFF (20-bit integers), then converted to base32 format
   strings using the following scheme:

      submode  character range  encoding
      ------------------------------------------------------------------
      Non-BMP  U+10000-U+10FFFF base32_encode(codepoint(C) - 0x10000, 4)

   Compress submode is used for the efficient encoding of non-LDH
   characters. It can be used for any non-LDH character C provided a
   certain condition is met. In this submode, a character is encoded
   as the bitwise XOR of the code point of C and that of the last
   non-LDH character before C (called PREV). The XOR value,
   xor(codepoint(PREV), codepoint(C)), must be less than 0x200;
   otherwise Compress submode cannot be used. If the XOR value is less
   than 16, it is encoded as a base32 format string of 1 octet.
   Otherwise 0x200 is added to the XOR value, and the result is encoded
   as a base32 format string of 2 octets. Because the first octet of a
   2-octet string then always represents 16 or more ("g" through "v"),
   a decoder can determine the encoded length from the first octet
   alone.

      submode   character range  encoding                    condition
      ----------------------------------------------------------------
      Compress  U+0000-U+10FFFF  base32_encode(X, 1)         if X < 16
                                 base32_encode(X + 0x200, 2) if X >= 16
      [where X is xor(codepoint(PREV), codepoint(C))]

   There are two possible submodes for encoding a non-LDH character C:
   Compress, and one of the other three (BMP-A, BMP-B, Non-BMP). The
   submode is determined using the following algorithm, which is
   designed to choose the submode that produces the shorter encoding.

      1. Let PREV be the last non-LDH character before C, and let NXT
         be the first non-LDH character after C. If C is the first
         non-LDH character of the input string, let PREV be U+0000.
      2. If xor(codepoint(PREV), codepoint(C)) > 0x1FF, go to 4.
      3. If at least one of the following conditions holds, choose
         `Compress'. Otherwise go to 4.
         a) the current submode is `Compress'
         b) C is a non-BMP character (U+10000-U+10FFFF)
         c) xor(codepoint(PREV), codepoint(C)) is less than 16
         d) NXT exists and xor(codepoint(C), codepoint(NXT)) <= 0x1FF
      4. If C is in the range U+0000-U+1FFF or U+A000-U+FFFF, choose
         `BMP-A'.
      5. If C is in the range U+2000-U+9FFF, choose `BMP-B'.
      6. Otherwise choose `Non-BMP'.

   The initial state is set as follows:

      mode    : Non-Literal
      submode : BMP-A
      PREV    : U+0000

7. Encoding Procedure

   procedure encode(INPUT)
      MODE = `Non-Literal'
      SUBMODE = `BMP-A'
      PREV = U+0000

      while (is_not_empty(INPUT))
         C = read_one_character(INPUT)
         if (C is outside U+0000-U+10FFFF)
            fail (the input MUST NOT be converted)
         else if (C is hyphen-minus)
            # valid in either mode, so no mode change is needed
            output("--")
         else if (C is an ASCII letter or digit)
            if (MODE != `Literal')
               output("-")
               MODE = `Literal'
            endif
            output(C)
         else
            # C is a non-LDH character
            if (MODE != `Non-Literal')
               output("-")
               MODE = `Non-Literal'
            endif

            if (compressible(SUBMODE, C, PREV, INPUT) == TRUE)
               NEW_SUBMODE = `Compress'
               V = xor(codepoint(PREV), codepoint(C))
               if (V >= 16)
                  V = V + 0x200
                  LEN = 2
               else
                  LEN = 1
               endif
            else
               V = codepoint(C)
               if (0x0000 <= V <= 0x1FFF)
                  NEW_SUBMODE = `BMP-A'
                  LEN = 3
               else if (0xA000 <= V <= 0xFFFF)
                  NEW_SUBMODE = `BMP-A'
                  V = V - 0x8000
                  LEN = 3
               else if (0x2000 <= V <= 0x9FFF)
                  NEW_SUBMODE = `BMP-B'
                  V = V - 0x2000
                  LEN = 3
               else
                  NEW_SUBMODE = `Non-BMP'
                  V = V - 0x10000
                  LEN = 4
               endif
            endif
            if (NEW_SUBMODE != SUBMODE)
               output(introducer octet of NEW_SUBMODE)
               SUBMODE = NEW_SUBMODE
            endif
            output(base32_encode(V, LEN))
            PREV = C
         endif
      end
   end

   function compressible(SUBMODE, C, PREV, INPUT)
      if (xor(codepoint(C), codepoint(PREV)) > 0x1FF)
         return (FALSE)
      endif

      # The difference between C and PREV is confined to the lower
      # 9 bits.
      if (SUBMODE == `Compress')
         return (TRUE)
      else if (codepoint(C) >= 0x10000)
         return (TRUE)
      else if (xor(codepoint(C), codepoint(PREV)) < 16)
         return (TRUE)
      else
         NXT = first non-LDH character remaining in INPUT (if any)
         if (NXT exists and
             xor(codepoint(NXT), codepoint(C)) <= 0x1FF)
            return (TRUE)
         endif
      endif
      return (FALSE)
   end

8. Decoding Description

   Like encoding, the MACE decoding process keeps track of the current
   mode/submode to decode each character. The initial state for
   decoding is the same as that for encoding.

      mode    : Non-Literal
      submode : BMP-A
      PREV    : U+0000

   Because ASCII domain names are case-insensitive, the decoding
   process MUST treat uppercase and lowercase letters equally.

   Two consecutive ASCII hyphen-minus characters are always decoded as
   a single ASCII hyphen-minus, regardless of the current mode/submode.
   An ASCII hyphen-minus not followed by another hyphen-minus indicates
   a switch between Literal mode and Non-Literal mode.

   In Literal mode, all ASCII letter and digit characters are decoded
   as they are.

   In Non-Literal mode, every character is either a submode introducer
   or part of a base32 format string. If a character is a submode
   introducer, the current submode is changed to the corresponding
   submode. Otherwise, it is part of a base32 format string.

   To decode a base32 format string in a given submode, first determine
   the length of the string that decodes to a single Unicode character.
   For submodes other than Compress, the number of octets that encode a
   character is fixed (3 for BMP-A and BMP-B, 4 for Non-BMP). For
   Compress submode, the number of octets is variable (1 or 2) and can
   be determined by looking at the first octet: if it represents a
   number less than 16 in base32 (0-9, a-f or A-F), the number of
   octets is one; otherwise it is two.
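
   The decoding rules in this section (hyphen pairs, mode toggling,
   submode introducers, the length rule just described, and the
   per-submode mapping listed below) can be sketched in Python as
   follows. This is an illustrative sketch under stated assumptions,
   not a reference implementation: the name mace_decode is
   hypothetical, the input is assumed to carry no ACE prefix, the error
   handling is the sketch's own choice, and the two mandatory
   post-decoding checks are omitted.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuv"   # base32 octets, values 0-31
INTRODUCERS = {"w": "BMP-A", "x": "BMP-B", "y": "Non-BMP", "z": "Compress"}


def mace_decode(encoded: str) -> str:
    """Decode a MACE string (hypothetical helper; prefix already removed).

    The two mandatory post-decoding checks are NOT performed here.
    """
    if not encoded.isascii():
        raise ValueError("MACE strings contain only ASCII octets")
    s = encoded.lower()          # base32 octets and introducers are case-insensitive
    out = []
    literal = False              # initial mode: Non-Literal
    submode = "BMP-A"            # initial submode
    prev = 0                     # PREV starts as U+0000
    i = 0
    while i < len(s):
        ch = s[i]
        if ch == "-":
            if s.startswith("--", i):   # "--" always decodes to one hyphen-minus
                out.append("-")
                i += 2
            else:                       # a lone "-" toggles Literal/Non-Literal
                literal = not literal
                i += 1
        elif literal:
            if not ch.isalnum():
                raise ValueError(f"non-LDH octet {ch!r} in Literal mode")
            out.append(encoded[i])      # Literal letters keep their original case
            i += 1
        elif ch in INTRODUCERS:
            submode = INTRODUCERS[ch]
            i += 1
        else:
            if ch not in ALPHABET:
                raise ValueError(f"invalid octet {ch!r} at offset {i}")
            if submode == "Compress":   # first octet < 16 means a 1-octet string
                length = 1 if ALPHABET.index(ch) < 16 else 2
            else:
                length = 4 if submode == "Non-BMP" else 3
            chunk = s[i:i + length]
            if len(chunk) < length or any(c not in ALPHABET for c in chunk):
                raise ValueError(f"truncated or invalid base32 string at offset {i}")
            n = int(chunk, 32)          # Python's base-32 digits are 0-9, a-v
            if submode == "BMP-A":
                cp = n if n < 0x2000 else n + 0x8000
            elif submode == "BMP-B":
                cp = n + 0x2000
            elif submode == "Non-BMP":
                cp = n + 0x10000
            else:                       # Compress: undo the XOR with PREV
                cp = prev ^ (n if length == 1 else n - 0x200)
            out.append(chr(cp))
            prev = cp
            i += length
    return "".join(out)


print([hex(ord(c)) for c in mace_decode("x400--zgg-a-ogfng")])
# ['0x3000', '0x2d', '0x3010', '0x61', '0x3100', '0x310f', '0x31ff']
```

   Pairing hyphens greedily ("---" read as "--" then "-") matches the
   encoder, which emits a mode-switch hyphen only immediately before a
   letter, digit or non-LDH character.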

   The following list shows the length of the string S and how to
   obtain the decoded character in each submode.

      submode   length  decoded character              condition
      --------------------------------------------------------------
      BMP-A     3       character(N)                   if N < 0x2000
                        character(N + 0x8000)          if N >= 0x2000
      BMP-B     3       character(N + 0x2000)
      Non-BMP   4       character(N + 0x10000)
      Compress  1       character(xor(P, N))
                2       character(xor(P, N - 0x200))
      [where N is base32_decode(S), P is codepoint(PREV)]

   The MACE decoding process as described so far also accepts some
   invalidly encoded strings. To guarantee a unique mapping, the
   following two checks must be performed.

      1) The decoded string must be checked to see whether it is an
         [STD13] conforming name. If it is, the decoding process MUST
         fail.
      2) The decoded string must be re-encoded and compared to the
         input string. If they are not equal (ignoring case
         differences), the decoding process MUST fail.

9. Decoding Procedure

   procedure decode(INPUT)
      MODE = `Non-Literal'
      SUBMODE = `BMP-A'
      PREV = U+0000

      while (is_not_empty(INPUT))
         C = read_one_character(INPUT)
         if (C is hyphen-minus)
            NXT = read_one_character(INPUT)
            if (NXT is hyphen-minus)
               output("-")
            else
               push NXT back onto INPUT
               if (MODE == `Literal')
                  MODE = `Non-Literal'
               else
                  MODE = `Literal'
               endif
            endif
         else if (MODE == `Literal')
            output(C)
         else if (C is a submode introducer octet)
            SUBMODE = submode introduced by C
         else
            push C back onto INPUT
            if (SUBMODE == `BMP-A')
               S = read_string_of_length(INPUT, 3)
               V = base32_decode(S)
               if (V >= 0x2000)
                  V = V + 0x8000
               endif
            else if (SUBMODE == `BMP-B')
               S = read_string_of_length(INPUT, 3)
               V = base32_decode(S) + 0x2000
            else if (SUBMODE == `Non-BMP')
               S = read_string_of_length(INPUT, 4)
               V = base32_decode(S) + 0x10000
            else if (SUBMODE == `Compress')
               if (first octet of INPUT is 0-9, a-f or A-F)
                  S = read_string_of_length(INPUT, 1)
                  V = base32_decode(S)
               else
                  S = read_string_of_length(INPUT, 2)
                  V = base32_decode(S) - 0x200
               endif
               V = xor(codepoint(PREV), V)
            endif
            output(character(V))
            PREV = character(V)
         endif
      end
   end

   The above decoding procedure also accepts invalidly encoded strings.
   To guarantee a unique mapping, the following two additional checks
   MUST be performed after decoding:

      1) that the decoded string is NOT an [STD13] conforming name;
      2) that re-encoding the decoded string yields the original
         string (ignoring case differences).

10. ACE Identifier

   In order to use MACE as an ACE, there must be a prefix or suffix
   string that is unlikely to be used in normal domain names and thus
   identifies MACE-encoded domain name parts. Since MACE-encoded names
   can begin with hyphen-minus, and names beginning with hyphen-minus
   do not conform to [STD13], a prefix string should be used. So if
   MACE is used for encoding domain name parts, the encoded names
   should be prefixed by the prefix string.
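
   To show where the prefix fits, the following Python sketch
   MACE-encodes a single domain name part using the submode-selection
   algorithm of Section 6 and the procedure of Section 7, then prepends
   a prefix. Assumptions: HYPOTHETICAL_PREFIX is a placeholder, since
   this document does not define the real prefix; the check for
   [STD13]-conforming input is simplified to "every character is LDH";
   and the function names are illustrative only.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuv"
INTRODUCER = {"BMP-A": "w", "BMP-B": "x", "Non-BMP": "y", "Compress": "z"}

# HYPOTHETICAL placeholder: this document does not specify the MACE prefix.
HYPOTHETICAL_PREFIX = "pfx--"


def _base32(n: int, length: int) -> str:
    """base32_encode(N, LEN): big-endian, padded with leading "0" octets."""
    digits = (ALPHABET[(n >> (5 * k)) & 0x1F] for k in reversed(range(length)))
    return "".join(digits)


def _is_ldh(ch: str) -> bool:
    return ch == "-" or (ch.isascii() and ch.isalnum())


def _compressible(submode: str, c: int, prev: int, rest: str) -> bool:
    """compressible() from Section 7; `rest` is the input after C."""
    x = c ^ prev
    if x > 0x1FF:
        return False
    if submode == "Compress" or c >= 0x10000 or x < 16:
        return True
    nxt = next((ord(ch) for ch in rest if not _is_ldh(ch)), None)  # NXT
    return nxt is not None and (c ^ nxt) <= 0x1FF


def mace_encode(label: str) -> str:
    """MACE-encode one domain name part (Sections 6 and 7)."""
    out = []
    literal, submode, prev = False, "BMP-A", 0      # initial state
    for i, ch in enumerate(label):
        cp = ord(ch)
        if ch == "-":
            out.append("--")                    # allowed in either mode
            continue
        if _is_ldh(ch):                         # ASCII letter or digit
            if not literal:
                out.append("-")
                literal = True
            out.append(ch)
            continue
        if literal:                             # non-LDH needs Non-Literal mode
            out.append("-")
            literal = False
        if _compressible(submode, cp, prev, label[i + 1:]):
            new, x = "Compress", cp ^ prev
            v, n = (x, 1) if x < 16 else (x + 0x200, 2)
        elif cp <= 0x1FFF:
            new, v, n = "BMP-A", cp, 3
        elif 0xA000 <= cp <= 0xFFFF:
            new, v, n = "BMP-A", cp - 0x8000, 3
        elif cp <= 0x9FFF:
            new, v, n = "BMP-B", cp - 0x2000, 3
        else:
            new, v, n = "Non-BMP", cp - 0x10000, 4
        if new != submode:                      # emit the submode introducer
            out.append(INTRODUCER[new])
            submode = new
        out.append(_base32(v, n))
        prev = cp
    return "".join(out)


def to_ace(label: str) -> str:
    """Encode a label with MACE and prepend the (hypothetical) prefix."""
    if all(_is_ldh(ch) for ch in label):
        # Simplified stand-in for "conforms to [STD13]": such labels
        # MUST NOT be converted.
        raise ValueError("non-internationalized label")
    return HYPOTHETICAL_PREFIX + mace_encode(label)


print(to_ace("\u3000-\u3010a"))   # pfx--x400--40g-a
```

   The output of mace_encode can be checked against the examples in
   Section 11; a decoder would strip the prefix before decoding.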

   This document does not specify the prefix string for MACE. The
   actual selection should be left to an appropriate authority such as
   IANA [ACEID].

   For testing purposes, there is a registry of test prefix strings for
   ACEs on the IETF IDN working group web site [IDN].

11. Examples

   The following examples are meaningless strings, but they are
   designed to exercise various aspects of the algorithm in order to
   verify the correctness of an implementation.

   (a) U+0200 U+4000 U+002D U+B001 U+40001 U+0061
       MACE: 0g0x800--wc01y6001-a

   (b) U+0061 U+002D U+0300 U+0062 U+0400 U+3000 U+002D U+5000
       MACE: -a---0o0-b-100x400--c00

   (c) U+1FFF U+2000 U+9FFF U+A000 U+FFFF U+10000 U+10FFFF
       MACE: 7vvx000vvvw800vvvy0000vvvv

   (d) U+0200 U+002F U+0030 U+0039 U+003A U+0200 U+0040 U+0041 \
       U+005A U+005B U+0200 U+0060 U+0061 U+007A U+007B
       MACE: 0g001f-09-01q0g0020-AZ-02r0g0030-az-03r

   (e) U+0061 U+0062 U+0063 U+002D U+1000 U+1200 U+002D \
       U+2000 U+2010 U+2200 U+002D U+3000 U+3010
       MACE: -abc---4004g0--x00000g0g0--40040g

   (f) U+0100 U+0102 U+0200 U+002D U+0201 U+002D U+03FE U+0061 U+0234
       MACE: zo02w0g0--z1--vv-a-ua

   (g) U+3000 U+002D U+3010 U+0061 U+3100 U+310F U+31FF
       MACE: x400--zgg-a-ogfng

   (h) U+20000 U+002D U+20100 U+0061 U+20010 U+20012 U+200FF
       MACE: y2000--zo0-a-og2nd

12. Security Considerations

   Users expect each domain name in DNS to be controlled by a single
   authority. If a Unicode string intended for use as a domain label
   could map to multiple ACE labels, then an internationalized domain
   name could map to multiple ACE domain names, each controlled by a
   different authority, some of which could be spoofs that hijack
   service requests intended for another. Therefore MACE is designed
   so that each Unicode string has a unique encoding.

13. References

   [UNICODE]  The Unicode Consortium, "The Unicode Standard",
              http://www.unicode.org/unicode/standard/standard.html

   [IDN]      Internationalized Domain Names (IETF Working Group),
              http://www.i-d-n.net/, idn@ops.ietf.org

   [IDNA]     Patrik Falstrom, Paul Hoffman, "Internationalizing Host
              Names In Applications (IDNA)", draft-ietf-idn-idna-01

   [STD13]    Paul Mockapetris, "DOMAIN NAMES - IMPLEMENTATION AND
              SPECIFICATION", Nov 1987, STD 13 (RFC 1035)

   [RFC952]   K. Harrenstien, M. Stahl, E. Feinler, "DOD Internet Host
              Table Specification", Oct 1985, RFC 952

   [NAMEPREP] Paul Hoffman, Marc Blanchet, "Preparation of
              Internationalized Host Names", Feb 2001,
              draft-ietf-idn-nameprep-03

   [ACEID]    Naomasa Maruyama, Yoshiro Yoneya, "Proposal for a
              determining process of ACE identifier", Jun 2001,
              draft-ietf-idn-aceid-02

   [BRACE]    Adam M. Costello, "BRACE: Bi-mode Row-based
              ASCII-Compatible Encoding for IDN", Sep 2000,
              draft-ietf-idn-brace-00

   [DUDE]     Mark Welter, Brian W. Spolarich, Adam M. Costello,
              "Differential Unicode Domain Encoding (DUDE)", Jun 2001,
              draft-ietf-idn-dude-02

14. Acknowledgements

   Some of the ideas in MACE are taken from other ACE proposals.

   The idea of Literal/Non-Literal modes is taken from the BRACE draft
   [BRACE] by Adam M. Costello.

   The idea of differential encoding used by Compress submode is taken
   from DUDE [DUDE], by Mark Welter, Brian W. Spolarich and Adam M.
   Costello.

   The structure of this document and the text of some sections are
   borrowed from the AMC-ACE series of drafts
   (draft-ietf-idn-amc-ace-*) by Adam M. Costello.

15. Authors' Address

   Makoto Ishisone
   Software Research Associates, Inc.
   4-16-10, Chigasaki-Minami, Tsuzuki-ku, Yokohama,
   Kanagawa 224-0037 Japan

   Yoshiro Yoneya
   Japan Network Information Center (JPNIC)
   Fuundo Bldg 1F, 1-2 Kanda-ogawamachi,
   Chiyoda-ku Tokyo 101-0052, Japan