idnits 2.17.1

draft-ietf-idn-mace-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 3 instances of too long lines in the document, the longest one
     being 1 character in excess of 72.

  ** There are 140 instances of lines with control characters in the document.

  ** The abstract seems to contain references ([UNICODE], [IDN]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords.

     RFC 2119 keyword, line 185: '... [STD13]), it MUST NOT be converted....'

     RFC 2119 keyword, line 416: '... MUST treat uppercase leters and low...'

     RFC 2119 keyword, line 458: '... If it is, decoding process MUST fail....'

     RFC 2119 keyword, line 463: '... decoding process MUST fail....'

     RFC 2119 keyword, line 522: '...dditional checks MUST be performed aft...'


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 154 has weird spacing: '...decimal hexad...'

  == Line 220 has weird spacing: '...submode intro...'

  == Line 242 has weird spacing: '...submode chara...'

  == Line 252 has weird spacing: '...aracter subm...'

  == Line 265 has weird spacing: '...submode chara...'

  == (2 more instances...)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- Couldn't find a document date in the document -- date freshness check
     skipped.

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'RFC952' is defined on line 601, but no explicit
     reference was found in the text

  == Unused Reference: 'NAMEPREP' is defined on line 604, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'UNICODE'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'IDN'

  == Outdated reference: A later version (-13) exists of
     draft-ietf-idn-idna-01

  ** Downref: Normative reference to an Unknown state RFC: RFC  952

  == Outdated reference: A later version (-10) exists of
     draft-ietf-idn-nameprep-03

  -- Possible downref: Normative reference to a draft: ref. 'ACEID' 

  -- Possible downref: Normative reference to a draft: ref. 'BRACE' 

  -- Possible downref: Normative reference to a draft: ref. 'DUDE' 


     Summary: 9 errors (**), 0 flaws (~~), 11 warnings (==), 8 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet Draft                                             M. Ishisone
3	draft-ietf-idn-mace-00.txt                                         SRA
4	Jun 21, 2001                                                 Y. Yoneya
5	Expires Dec 21, 2001                                             JPNIC

7		    MACE: Modal ASCII Compatible Encoding for IDN

9	Status of this Memo

11	   This document is an Internet-Draft and is subject to all provisions
12	   of Section 10 of RFC2026.

14	   Internet-Drafts are working documents of the Internet Engineering
15	   Task Force (IETF), its areas, and its working groups.  Note that
16	   other groups may also distribute working documents as
17	   Internet-Drafts.

19	   Internet-Drafts are draft documents valid for a maximum of six months
20	   and may be updated, replaced, or obsoleted by other documents at any
21	   time.  It is inappropriate to use Internet-Drafts as reference
22	   material or to cite them other than as "work in progress."

24	   The list of current Internet-Drafts can be accessed at
25	   http://www.ietf.org/1id-abstracts.html

27	   The list of Internet-Draft Shadow Directories can be accessed at
28	   http://www.ietf.org/shadow.html

30	Abstract

32	   MACE is a reversible transformation method from a sequence of Unicode
33	   [UNICODE] characters to a sequence of ASCII letters, digits and
34	   hyphens (LDH characters).  It is designed to be used as an encoding
35	   for internationalized domain names [IDN].

37	Contents

39	   1. Introduction
40	   2. Terminology
41	   3. Overview
42	   4. Base32 Format
43	   5. Notations
44	   6. Encoding Description
45	   7. Encoding Procedure
46	   8. Decoding Description
47	   9. Decoding Procedure
48	  10. ACE Identifier
49	  11. Examples

51	                      Expires December 21th, 2001              [Page  1]
52	  12. Security Considerations
53	  13. References
54	  14. Acknowledgements
55	  15. Authors' Address

57	1. Introduction

59	   MACE is intended to be used as an ACE in the IDNA architecture
60	   [IDNA], and encodes a sequence of Unicode (ISO/IEC 10646) characters
61	   in the range U+0000-U+10FFFF as a sequence of LDH characters.

63	   MACE is designed to have the following features:

65	      Completeness: Every Unicode string has a map to an LDH character
66	      string.

68	      Uniqueness: Every Unicode string maps to at most one LDH character
69	      string.

71	      Reversibility: The original Unicode string can be obtained from an
72	      LDH character string to which the Unicode string maps.

74	      Efficiency: The ratio of encoded size to original size is small.
75	      If the code points of the Unicode string are clustered, a
76	      compression algorithm enables a compact encoding.  Even if they
77	      are not, the encoded size is still kept small.

79	      Simplicity: The encoding/decoding algorithms are fairly simple to
80	      implement.

82	2. Terminology

84	   LDH characters are the letters A-Z and a-z, the digits 0-9, and
85	   hyphen-minus.

87	   As in the Unicode Standard [UNICODE], a Unicode character is denoted
88	   by "U+" followed by four to six hexadecimal digits representing its
89	   UCS-4 code point.  A range of Unicode characters is denoted by the
90	   form "U+xxxx-U+yyyy".

92	3. Overview

94	   MACE encodes a sequence of Unicode (ISO/IEC 10646) characters in the
95	   range U+0000-U+10FFFF as a sequence of LDH characters.

97	   MACE is a modal encoding.  There are two major modes, one of which
98	   has four submodes.  Each character is encoded in a specific
99	   mode/submode.  The mode/submode is chosen according to the code point
102	   of the character and possibly its neighboring characters.  The modal
103	   encoding enables compact representation of each character, and the
104	   modes are chosen so that mode change occurs rather infrequently as
105	   long as the source string is written in a single language.

107	   LDH characters are represented literally, for the compactness of the
108	   encoded result.  Other Unicode characters are represented as base32
109	   format strings.  Each Unicode character in the Basic Multilingual
110	   Plane (BMP, U+0000-U+FFFF) except LDH characters is encoded as a
111	   3-octet base32 format string, while each non-BMP (U+10000-U+10FFFF)
112	   character is encoded as a 4-octet base32 format string.

114	   To achieve fairly good compression for non-LDH characters, there is
115	   also a submode for differential encoding.  Using this submode, a
116	   character is encoded as the bitwise XOR of the code points of the
117	   previous non-LDH character and the current character.  In this submode
118	   a character is encoded as a 1 or 2 octet base32 format string.

120	   So if the code points of the input string are clustered in a small
121	   region, an effective compression algorithm enables 1 or 2
122	   octets/character encoding (plus some overhead for mode changes).
123	   Even if the code points are widely scattered and difficult to
124	   compress (such as CJK Han characters), 3 octets/character (for BMP)
125	   or 4 octets/character (for Non-BMP) encoding (plus some overhead for
126	   mode changes) can be achieved.

128	4. Base32 Format

130	   MACE uses base32 format strings to encode non-negative integers.  The
131	   base32 format used for MACE is:

133	       "0" =  0 = 0x00 = 00000      "g" = 16 = 0x10 = 10000
134	       "1" =  1 = 0x01 = 00001      "h" = 17 = 0x11 = 10001
135	       "2" =  2 = 0x02 = 00010      "i" = 18 = 0x12 = 10010
136	       "3" =  3 = 0x03 = 00011      "j" = 19 = 0x13 = 10011
137	       "4" =  4 = 0x04 = 00100      "k" = 20 = 0x14 = 10100
138	       "5" =  5 = 0x05 = 00101      "l" = 21 = 0x15 = 10101
139	       "6" =  6 = 0x06 = 00110      "m" = 22 = 0x16 = 10110
140	       "7" =  7 = 0x07 = 00111      "n" = 23 = 0x17 = 10111
141	       "8" =  8 = 0x08 = 01000      "o" = 24 = 0x18 = 11000
142	       "9" =  9 = 0x09 = 01001      "p" = 25 = 0x19 = 11001
143	       "a" = 10 = 0x0A = 01010      "q" = 26 = 0x1A = 11010
144	       "b" = 11 = 0x0B = 01011      "r" = 27 = 0x1B = 11011
145	       "c" = 12 = 0x0C = 01100      "s" = 28 = 0x1C = 11100
146	       "d" = 13 = 0x0D = 01101      "t" = 29 = 0x1D = 11101
147	       "e" = 14 = 0x0E = 01110      "u" = 30 = 0x1E = 11110
148	       "f" = 15 = 0x0F = 01111      "v" = 31 = 0x1F = 11111

150	   The encoding is big-endian (most-significant bits first).  The
151	   following shows some examples.

154	      decimal  hexadecimal       binary         base32 string
155	      -------------------------------------------------------
156		  40       0x28           00001 01000   "18"
157		9876     0x2694     01001 10100 10100   "9kk"
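
   The table and the two examples above can be reproduced with a few
   lines of code.  The following Python sketch is illustrative only and
   is not part of this specification; the names ALPHABET, base32_encode
   and base32_decode are chosen here for illustration.  Accepting
   uppercase input when decoding is an assumption made because encoded
   names are meant to be case-insensitive.

```python
# Illustrative sketch only; not part of the MACE specification.
ALPHABET = "0123456789abcdefghijklmnopqrstuv"  # value 0 -> "0", ..., 31 -> "v"


def base32_encode(n, length):
    """Return n as a big-endian base32 string of exactly `length` octets."""
    if n < 0 or n >= 32 ** length:
        raise ValueError("value does not fit in the requested length")
    digits = []
    for _ in range(length):
        digits.append(ALPHABET[n & 0x1F])  # take the low 5 bits
        n >>= 5
    return "".join(reversed(digits))       # most-significant quintet first


def base32_decode(s):
    """Return the integer represented by the base32 string s."""
    n = 0
    for ch in s.lower():                   # assumption: case-insensitive input
        n = n * 32 + ALPHABET.index(ch)    # ValueError for non-base32 octets
    return n


print(base32_encode(40, 2))    # prints 18
print(base32_encode(9876, 3))  # prints 9kk
print(base32_decode("9KK"))    # prints 9876
```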

159	5. Notations

161	   In the following description, the following five functions are used.

163	   base32_encode(N, LEN)
164	      denotes a base32 format string of LEN octets representing number
165	      N.  If LEN is larger than needed to represent N, the string is
166	      padded on the left with "0".

168	   base32_decode(S)
169	      denotes a number which corresponds to a base32 format string S.

171	   codepoint(C)
172	      denotes a UCS-4 code point value for character C.

174	   character(N)
175	      denotes a Unicode character whose UCS-4 code point is N.

177	   xor(N, M)
178	      denotes the bit-wise XOR of integers N and M.

180	6. Encoding Description

182	   MACE can encode Unicode/ISO10646 characters in the range
183	   U+0000-U+10FFFF.  If the input string contains other characters, or
184	   it is a non-internationalized host name part (that is, it conforms to
185	   [STD13]), it MUST NOT be converted.

187	   MACE has several encoding modes/submodes.  There are two major modes,
188	   `Literal' and `Non-Literal'.  Non-Literal mode has four submodes,
189	   while Literal mode has none.  Each character is encoded in a specific
190	   mode/submode.  The encoding process of a character is:

192	      1. Determine the mode/submode to encode the character.
193	      2. If and only if it is necessary to change the current mode,
194		 output ASCII hyphen-minus to change the mode.
195	      3. If and only if it is necessary to change the current submode,
196		 output the submode introducer octet (described below) to change
197		 the submode.
198	      4. Encode the character in the mode/submode.

200	   ASCII letter and digit characters are encoded in Literal mode, while
201	   non-LDH characters are encoded in Non-Literal mode.  ASCII hyphen
204	   character (U+002D) can be encoded in either mode, and is always
205	   encoded as a sequence of two hyphen-minus ("--").  Switching between
206	   Literal mode and Non-Literal mode is indicated by an ASCII hyphen not
207	   followed by another hyphen.  The initial mode is Non-Literal.

209	   In Literal mode, characters are encoded as they are.  For example
210	   ASCII character "a" is encoded as "a".  In Non-Literal mode,
211	   characters are encoded as a base32 format string.

213	   Non-Literal mode further comprises four submodes, `BMP-A', `BMP-B',
214	   `Non-BMP' and `Compress'.  Every non-LDH character is encoded in one of
215	   these submodes.  Shifting to each submode is indicated by a certain
216	   octet (called introducer octet) shown below.  These introducer octets
217	   can be distinguished from the base32 string since they never appear
218	   in the base32 string used by MACE.

220	       submode  introducer octet
221	      ---------------------------
222	       BMP-A      "w"
223	       BMP-B      "x"
224	       Non-BMP    "y"
225	       Compress   "z"

227	   Switching between Literal mode and Non-Literal mode doesn't affect
228	   the current submode; that is, on returning from Literal mode, the
229	   previous submode is restored.  This reduces the need for submode
230	   changes.  The initial submode is BMP-A.

232	   BMP-A and BMP-B submodes are used for encoding characters in Unicode
233	   Basic Multilingual Plane (U+0000-U+FFFF), except LDH characters.  In
234	   these submodes, a character is encoded as base32 format string of 3
235	   octets.  BMP-A is used for characters in the range U+0000-U+1FFF and
236	   U+A000-U+FFFF, covering most of Western/Middle-Eastern scripts and
237	   Hangul.  BMP-B is used for characters in the range U+2000-U+9FFF,
238	   covering the CJK unified ideograph area.  Those characters are first
239	   mapped to integers in the range 0x0000-0x7FFF (15-bit integers), then
240	   converted to base32 format strings using the following scheme:

242	      submode  character range  encoding
243	      -----------------------------------------------------------------
244	      BMP-A    U+0000-U+1FFF	base32_encode(codepoint(C), 3)
245		       U+A000-U+FFFF	base32_encode(codepoint(C) - 0x8000, 3)

247	      BMP-B    U+2000-U+9FFF	base32_encode(codepoint(C) - 0x2000, 3)

250	   Here are some examples:

252	      character   submode  integer   base32 string
253	      ---------------------------------------------
254	      U+00B0	  BMP-A    0xb0      "05g"
255	      U+5678	  BMP-B    0x3678    "djo"
256	      U+BCDE	  BMP-A    0x3CDE    "f6u"

258	   Non-BMP submode is used for encoding Unicode characters outside Basic
259	   Multilingual Plane (U+10000-U+10FFFF).  In this submode a character is
260	   encoded as a base32 format string of 4 octets.  Characters
261	   U+10000-U+10FFFF are first mapped to integers in the range
262	   0x00000-0xFFFFF (20-bit integers), then converted to base32 format
263	   strings using the following scheme:

265	      submode  character range   encoding
266	      -------------------------------------------------------------------
267	      Non-BMP  U+10000-U+10FFFF  base32_encode(codepoint(C) - 0x10000, 4)
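
   The three fixed-length submodes (BMP-A, BMP-B and Non-BMP) described
   above can be sketched as a single mapping function.  The Python code
   below is an illustration only; the function name encode_fixed is
   hypothetical, and the Compress submode is deliberately ignored here.

```python
# Illustrative sketch only; helper names are assumptions, not part of MACE.
ALPHABET = "0123456789abcdefghijklmnopqrstuv"


def base32_encode(n, length):
    """Big-endian base32 string of exactly `length` octets."""
    return "".join(ALPHABET[(n >> (5 * i)) & 0x1F]
                   for i in reversed(range(length)))


def encode_fixed(cp):
    """Return (submode, base32 string) for a non-LDH code point cp,
    using only the fixed-length submodes (Compress is not considered)."""
    if 0x0000 <= cp <= 0x1FFF:
        return "BMP-A", base32_encode(cp, 3)
    if 0xA000 <= cp <= 0xFFFF:
        return "BMP-A", base32_encode(cp - 0x8000, 3)
    if 0x2000 <= cp <= 0x9FFF:
        return "BMP-B", base32_encode(cp - 0x2000, 3)
    if 0x10000 <= cp <= 0x10FFFF:
        return "Non-BMP", base32_encode(cp - 0x10000, 4)
    raise ValueError("code point out of range")


print(encode_fixed(0x00B0))   # ('BMP-A', '05g')
print(encode_fixed(0x5678))   # ('BMP-B', 'djo')
print(encode_fixed(0xBCDE))   # ('BMP-A', 'f6u')
print(encode_fixed(0x40001))  # ('Non-BMP', '6001')
```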

269	   Compress submode is used for the efficient encoding of non-LDH
270	   characters.  It can be used for any non-LDH character if a certain
271	   condition is met.  In this submode, a character is encoded as the
272	   bit-wise XOR of the code point of the character (called C) and that
273	   of the last non-LDH character before C (called PREV).  The XOR value
274	   (xor(codepoint(PREV), codepoint(C))) must be less than 0x200, or the
275	   Compress submode cannot be used.  If the XOR value is less than 16,
276	   it is encoded as a base32 format string of 1 octet.  Otherwise 0x200
277	   is added to the XOR value, then it is encoded as a base32 format
278	   string of 2 octets.  This lets the decoder determine the encoded
279	   length by looking at the first octet.

281	      submode   character range  encoding                     condition
282	      ------------------------------------------------------------------
283	      Compress  U+0000-U+10FFFF  base32_encode(X, 1)	      if X < 16
284	                                 base32_encode(X + 0x200, 2)  if X >= 16
285	   	  [where X is xor(codepoint(PREV), codepoint(C))]
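
   The Compress encoding above, and the way its length can be recovered
   from the first octet, can be sketched as follows.  This Python code
   is illustrative only; encode_compress and compress_unit_length are
   hypothetical names, and returning None when the XOR value is too
   large is a convention chosen for this sketch.

```python
# Illustrative sketch only; helper names are assumptions, not part of MACE.
ALPHABET = "0123456789abcdefghijklmnopqrstuv"


def base32_encode(n, length):
    """Big-endian base32 string of exactly `length` octets."""
    return "".join(ALPHABET[(n >> (5 * i)) & 0x1F]
                   for i in reversed(range(length)))


def encode_compress(prev_cp, cp):
    """Encode cp relative to prev_cp, or return None if the XOR value
    does not fit (Compress submode cannot be used)."""
    x = prev_cp ^ cp
    if x > 0x1FF:
        return None
    if x < 16:
        return base32_encode(x, 1)
    return base32_encode(x + 0x200, 2)   # first octet is then 16 or more


def compress_unit_length(first_octet):
    """Number of octets in a Compress unit, judged by its first octet."""
    return 1 if ALPHABET.index(first_octet.lower()) < 16 else 2


print(encode_compress(0x0000, 0x0100))  # prints o0
print(encode_compress(0x0100, 0x0102))  # prints 2
print(encode_compress(0x0201, 0x03FE))  # prints vv
```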

287	   There are two possible submodes for encoding a non-LDH character C,
288	   one of which is Compress, and the other is one of the other three
289	   (BMP-A, BMP-B, Non-BMP).  The submode is determined using the
290	   following algorithm.  This algorithm is designed so that it chooses
291	   the submode which produces the shorter encoded result.

293	      1. Let PREV be the last non-LDH character before C, and let NXT be
294		 the first non-LDH character after C.  In case C is the first
295		 non-LDH character of the input string, let PREV be U+0000.
296	      2. If xor(codepoint(PREV), codepoint(C)) > 0x1FF, go to 4.
297	      3. If at least one of the following conditions holds, choose
298		 `Compress'.  Otherwise go to 4.
299		  a) the current submode is `Compress'
300		  b) C is a non-BMP character (U+10000-U+10FFFF)
303		  c) xor(codepoint(PREV), codepoint(C)) is less than 16
304		  d) NXT exists and xor(codepoint(C), codepoint(NXT)) <= 0x1ff
305	      4. If C is in the range U+0000-U+1FFF or U+A000-U+FFFF, choose
306		 `BMP-A'.
307	      5. If C is in the range U+2000-U+9FFF, choose `BMP-B'.
308	      6. Otherwise choose `Non-BMP'.
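
   The six-step selection algorithm above can be written compactly.
   The Python sketch below is illustrative only; choose_submode is a
   hypothetical name, code points are passed as integers, and next_cp
   is None when NXT does not exist.

```python
# Illustrative sketch only; names are assumptions, not part of MACE.
def choose_submode(current, prev_cp, cp, next_cp=None):
    """Pick the submode for non-LDH code point cp.

    current : submode in effect before cp
    prev_cp : code point of PREV (0 if cp is the first non-LDH character)
    next_cp : code point of NXT, or None if there is no NXT
    """
    x = prev_cp ^ cp
    if x <= 0x1FF:                                    # step 2
        if (current == "Compress"                     # step 3 a)
                or cp >= 0x10000                      # step 3 b)
                or x < 16                             # step 3 c)
                or (next_cp is not None
                    and (cp ^ next_cp) <= 0x1FF)):    # step 3 d)
            return "Compress"
    if cp <= 0x1FFF or 0xA000 <= cp <= 0xFFFF:        # step 4
        return "BMP-A"
    if 0x2000 <= cp <= 0x9FFF:                        # step 5
        return "BMP-B"
    return "Non-BMP"                                  # step 6


print(choose_submode("BMP-A", 0x0000, 0x0100, 0x0102))  # prints Compress
print(choose_submode("BMP-B", 0x2000, 0x2010, 0x2200))  # prints BMP-B
```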

310	   Initial state is set as follows.

312		mode    : Non-Literal
313		submode : BMP-A
314		PREV    : U+0000

316	7. Encoding Procedure

318	   procedure encode(INPUT)
319	       MODE = `Non-Literal'
320	       SUBMODE = `BMP-A'
321	       PREV = U+0000

323	       while (is_not_empty(INPUT))
324		   C = read_one_character(INPUT)
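325		   if (C is not in the range U+0000-U+10FFFF)
326		       fail (INPUT cannot be encoded)
327		   else if (C is ASCII hyphen-minus)
328		       output("--")
329		   else if (C is an ASCII letter or digit)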
330		       if (MODE != `Literal')
331			   output("-")
332			   MODE = `Literal'
333		       endif
334		       output(C)
335		   else
336		       if (MODE != `Non-Literal')
337			   output("-")
338			   MODE = `Non-Literal'
339		       endif

341		       if (compressible(SUBMODE, C, PREV, INPUT) == TRUE)
342			   NEW_SUBMODE = `Compress'
343			   V = xor(codepoint(PREV), codepoint(C))
344			   if (V >= 16)
345			       V = V + 0x200
346			       LEN = 2
347			   else
348			       LEN = 1
349			   endif
350		       else
351			   V = codepoint(C)
352			   if (0x0000 <= V <= 0x1FFF)
353			       NEW_SUBMODE = `BMP-A'
356			       LEN = 3
357			   else if (0xA000 <= V <= 0xFFFF)
358			       NEW_SUBMODE = `BMP-A'
359			       V = V - 0x8000
360			       LEN = 3
361			   else if (0x2000 <= V <= 0x9FFF)
362			       NEW_SUBMODE = `BMP-B'
363			       V = V - 0x2000
364			       LEN = 3
365			   else
366			       NEW_SUBMODE = `Non-BMP'
367			       V = V - 0x10000
368			       LEN = 4
369			   endif
370		       endif
371		       if (NEW_SUBMODE != SUBMODE)
372			   output(introducer octet of NEW_SUBMODE)
373			   SUBMODE = NEW_SUBMODE
374		       endif
375		       output(base32_encode(V, LEN))
376		       PREV = C
377		   endif
378	       end
379	   end

381	   function compressible(SUBMODE, C, PREV, INPUT)
382	       if (xor(codepoint(C), codepoint(PREV)) > 0x1FF)
383		   return (FALSE)
384	       endif

386	       # The difference between C and PREV is confined to lower 9 bits.
387	       if (SUBMODE == `Compress')
388		   return (TRUE)
389	       else if (codepoint(C) >= 0x10000)
390		   return (TRUE)
391	       else if (xor(codepoint(C), codepoint(PREV)) < 16)
392		   return (TRUE)
393	       else
394		   NXT = first non-LDH character after C in INPUT, if any
395		   if (NXT exists and
396		       xor(codepoint(NXT), codepoint(C)) <= 0x1FF)
397		       return (TRUE)
398		   endif
399	       endif
400	       return (FALSE)
401	   end
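
   The encoding procedure can also be expressed as runnable code.  The
   following Python sketch is one possible rendering and is not
   normative; the names mace_encode, compressible, is_ldh and
   INTRODUCER are chosen for illustration, the input is assumed to be a
   list of integer code points, and the check that the input is not an
   [STD13] conforming name is assumed to be done by the caller.

```python
# Illustrative sketch only; helper names are assumptions, not part of MACE.
ALPHABET = "0123456789abcdefghijklmnopqrstuv"
INTRODUCER = {"BMP-A": "w", "BMP-B": "x", "Non-BMP": "y", "Compress": "z"}


def base32_encode(n, length):
    return "".join(ALPHABET[(n >> (5 * i)) & 0x1F]
                   for i in reversed(range(length)))


def is_letter_or_digit(cp):
    return 0x30 <= cp <= 0x39 or 0x41 <= cp <= 0x5A or 0x61 <= cp <= 0x7A


def is_ldh(cp):
    return cp == 0x2D or is_letter_or_digit(cp)


def compressible(submode, cp, prev, rest):
    x = cp ^ prev
    if x > 0x1FF:
        return False
    if submode == "Compress" or cp >= 0x10000 or x < 16:
        return True
    # NXT: the first non-LDH code point after C, if any.
    nxt = next((n for n in rest if not is_ldh(n)), None)
    return nxt is not None and (nxt ^ cp) <= 0x1FF


def mace_encode(code_points):
    mode, submode, prev = "Non-Literal", "BMP-A", 0x0000
    out = []
    for i, cp in enumerate(code_points):
        if not 0 <= cp <= 0x10FFFF:
            raise ValueError("code point out of range")
        if cp == 0x2D:                       # hyphen-minus, either mode
            out.append("--")
        elif is_letter_or_digit(cp):         # Literal mode
            if mode != "Literal":
                out.append("-")
                mode = "Literal"
            out.append(chr(cp))
        else:                                # Non-Literal mode
            if mode != "Non-Literal":
                out.append("-")
                mode = "Non-Literal"
            if compressible(submode, cp, prev, code_points[i + 1:]):
                new_submode, v = "Compress", cp ^ prev
                v, length = (v + 0x200, 2) if v >= 16 else (v, 1)
            elif cp <= 0x1FFF:
                new_submode, v, length = "BMP-A", cp, 3
            elif 0xA000 <= cp <= 0xFFFF:
                new_submode, v, length = "BMP-A", cp - 0x8000, 3
            elif cp <= 0x9FFF:
                new_submode, v, length = "BMP-B", cp - 0x2000, 3
            else:
                new_submode, v, length = "Non-BMP", cp - 0x10000, 4
            if new_submode != submode:
                out.append(INTRODUCER[new_submode])
                submode = new_submode
            out.append(base32_encode(v, length))
            prev = cp
    return "".join(out)


# U+3000 U+002D U+3010 U+0061 U+3100 U+310F U+31FF
print(mace_encode([0x3000, 0x2D, 0x3010, 0x61, 0x3100, 0x310F, 0x31FF]))
# prints x400--zgg-a-ogfng
```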

403	8. Decoding Description

405	   Like encoding, the MACE decoding process keeps track of the current
408	   mode/submode to decode each character.  The initial state for
409	   decoding is the same as that of encoding.

411		mode    : Non-Literal
412		submode : BMP-A
413		PREV    : U+0000

415	   Because ASCII domain names are case-insensitive, decoding process
416	   MUST treat uppercase letters and lowercase letters equally.

418	   Two consecutive ASCII hyphen-minus characters are always decoded
419	   as a single ASCII hyphen-minus, regardless of the current
420	   mode/submode.  ASCII hyphen-minus not followed by another
421	   hyphen-minus indicates mode switching between Literal mode and
422	   Non-Literal mode.

424	   In Literal mode, all ASCII letter and digit characters are decoded as
425	   they are.

427	   In Non-Literal mode, every character is either a submode introducer
428	   or a part of base32 format string.  If a character is a submode
429	   introducer, the current submode is changed to the corresponding
430	   submode.  If it isn't, it is a part of base32 format string.

432	   To decode base32 format string in a certain submode, first determine
433	   the length of the string which is decoded to a single Unicode
434	   character. For submodes other than Compress, the number of octets
435	   which encodes a character is fixed (3 for BMP-A and BMP-B, 4 for
436	   Non-BMP).  For Compress submode, the number of octets is variable (1
437	   or 2), and can be determined by looking at the first octet.  If the
438	   first octet represents a number less than 16 in base32 (either 0-9,
439	   a-f or A-F) the number of octets is one, otherwise two.  The
440	   following list shows the length of the string S and how to get the
441	   decoded character in each submode.

443		submode   length  decoded character             condition
444	        --------------------------------------------------------------
445		BMP-A     3       character(N)                  if N < 0x2000
446			          character(N + 0x8000)         if N >= 0x2000
447		BMP-B     3       character(N + 0x2000)
448	        Non-BMP   4       character(N + 0x10000)
449		Compress  1       character(xor(P, N))
450		          2       character(xor(P, N - 0x200))
451		   [where N is base32_decode(S), P is codepoint(PREV)]
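
   The table above translates directly into a small function that
   decodes one base32 unit.  The Python sketch below is illustrative
   only; decode_unit is a hypothetical name, and the caller is assumed
   to have already split off a string S of the correct length.

```python
# Illustrative sketch only; helper names are assumptions, not part of MACE.
ALPHABET = "0123456789abcdefghijklmnopqrstuv"


def base32_decode(s):
    n = 0
    for ch in s.lower():                  # uppercase and lowercase are equal
        n = n * 32 + ALPHABET.index(ch)   # ValueError for non-base32 octets
    return n


def decode_unit(submode, s, prev_cp):
    """Return the code point encoded by base32 string s in `submode`.
    prev_cp is the code point of PREV (only used by Compress)."""
    n = base32_decode(s)
    if submode == "BMP-A":
        return n if n < 0x2000 else n + 0x8000
    if submode == "BMP-B":
        return n + 0x2000
    if submode == "Non-BMP":
        return n + 0x10000
    if submode == "Compress":
        return prev_cp ^ (n if len(s) == 1 else n - 0x200)
    raise ValueError("unknown submode")


print(hex(decode_unit("BMP-A", "f6u", 0)))        # prints 0xbcde
print(hex(decode_unit("Compress", "2", 0x0100)))  # prints 0x102
```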

453	   The MACE decoding process can accept invalidly-encoded strings as
454	   well.  In order to guarantee the unique mapping, the following two
455	   checks must be performed.

457	     1) The decoded string must be checked to see whether it is a
458	        [STD13] conforming name.  If it is, decoding process MUST fail.
461	     2) The decoded string must be re-encoded and compared to the input
462	        string.  If they are not equal (allowing case-difference),
463	        decoding process MUST fail.

465	9. Decoding Procedure

467	   procedure decode(INPUT)
468	       MODE = `Non-Literal'
469	       SUBMODE = `BMP-A'
470	       PREV = U+0000

472	       while (is_not_empty(INPUT))
473		   C = read_one_character(INPUT)
474		   if (C is ASCII hyphen-minus)
475		       NXT = read_one_character(INPUT)
476		       if (NXT is ASCII hyphen-minus)
477			   output("-")
478		       else
479			   push NXT back to INPUT
480			   if (MODE == `Literal')
481			       MODE = `Non-Literal'
482			   else
483			       MODE = `Literal'
484			   endif
485		       endif
486		   else if (MODE == `Literal')
487		       output(C)
488		   else if (C is a submode introducer)
489		       SUBMODE = submode corresponding to C
490		   else
491		       push C back to INPUT
492		       if (SUBMODE == `BMP-A')
493			   S = read_string_of_length(INPUT, 3)
494			   V = base32_decode(S)
495			   if (V >= 0x2000)
496			       V = V + 0x8000
497			   endif
498		       else if (SUBMODE == `BMP-B')
499			   S = read_string_of_length(INPUT, 3)
500			   V = base32_decode(S) + 0x2000
501		       else if (SUBMODE == `Non-BMP')
502			   S = read_string_of_length(INPUT, 4)
503			   V = base32_decode(S) + 0x10000
504		       else if (SUBMODE == `Compress')
505		           if (first octet of INPUT represents a number less than 16)
506			       S = read_string_of_length(INPUT, 1)
507			       V = base32_decode(S)
508			   else
509			       S = read_string_of_length(INPUT, 2)
510			       V = base32_decode(S) - 0x200
511			   endif
512			   V = xor(codepoint(PREV), V)
513		       endif
514		       output(character(V))
515		       PREV = character(V)
516		   endif
517	       end
518	   end
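
   The decoding procedure above can be rendered as runnable code.  The
   Python sketch below is one possible reading and is not normative;
   the names mace_decode, SUBMODE_OF and FIXED_LENGTH are chosen for
   illustration, the result is returned as a list of integer code
   points, and the additional validity checks required after decoding
   are assumed to be performed separately by the caller.

```python
# Illustrative sketch only; helper names are assumptions, not part of MACE.
ALPHABET = "0123456789abcdefghijklmnopqrstuv"
SUBMODE_OF = {"w": "BMP-A", "x": "BMP-B", "y": "Non-BMP", "z": "Compress"}
FIXED_LENGTH = {"BMP-A": 3, "BMP-B": 3, "Non-BMP": 4}


def base32_decode(s):
    n = 0
    for ch in s:
        n = n * 32 + ALPHABET.index(ch)   # ValueError for non-base32 octets
    return n


def mace_decode(text):
    mode, submode, prev = "Non-Literal", "BMP-A", 0x0000
    out = []
    i = 0
    while i < len(text):
        c = text[i]
        if c == "-":
            if text[i + 1:i + 2] == "-":  # "--" is a literal hyphen-minus
                out.append(0x2D)
                i += 2
            else:                         # single "-" toggles the mode
                mode = "Literal" if mode == "Non-Literal" else "Non-Literal"
                i += 1
        elif mode == "Literal":
            out.append(ord(c))            # kept as written, case included
            i += 1
        elif c.lower() in SUBMODE_OF:     # submode introducer
            submode = SUBMODE_OF[c.lower()]
            i += 1
        else:
            if submode == "Compress":
                length = 1 if base32_decode(c.lower()) < 16 else 2
            else:
                length = FIXED_LENGTH[submode]
            s = text[i:i + length].lower()
            if len(s) < length:
                raise ValueError("truncated base32 string")
            n = base32_decode(s)
            if submode == "BMP-A":
                v = n if n < 0x2000 else n + 0x8000
            elif submode == "BMP-B":
                v = n + 0x2000
            elif submode == "Non-BMP":
                v = n + 0x10000
            else:
                v = prev ^ (n if length == 1 else n - 0x200)
            out.append(v)
            prev = v
            i += length
    return out


print([hex(cp) for cp in mace_decode("zo02w0g0--z1--vv-a-ua")])
# prints ['0x100', '0x102', '0x200', '0x2d', '0x201', '0x2d', '0x3fe', '0x61', '0x234']
```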

520	   The above decoding procedure accepts invalidly-encoded strings as
521	   well.  In order to guarantee the unique mapping, the following two
522	   additional checks MUST be performed after decoding:

524	     1) that the decoded string is NOT a [STD13] conforming name.
525	     2) that the string which is the result of re-encoding of the
526	        decoded string matches the original string.

528	10. ACE Identifier

530	   In order to use MACE as an ACE, there must be a certain prefix or
531	   suffix string which is unlikely to be used in normal domain names and
532	   thus identifies MACE-encoded domain name parts.  Since MACE-encoded
533	   names can begin with hyphen-minus and names beginning with
534	   hyphen-minus do not conform to [STD13], a prefix string should be used.
535	   So if MACE is used for encoding domain name parts, the encoded names
536	   should be prefixed by the prefix string.

538	   This document does not specify the prefix string for MACE.  The
539	   actual selection should be left to an authority such as IANA
540	   [ACEID].

542	   For testing purposes, there is a registry of test prefix strings for
543	   ACEs on the IETF IDN working group web site [IDN].

545	11. Examples

547	   The following examples are meaningless strings, but they are designed
548	   to exercise various aspects of the algorithm in order to verify the
549	   correctness of the implementation.

551	   (a) U+0200 U+4000 U+002D U+B001 U+40001 U+0061
552	       MACE: 0g0x800--wc01y6001-a

554	   (b) U+0061 U+002D U+0300 U+0062 U+0400 U+3000 U+002D U+5000
555	       MACE: -a---0o0-b-100x400--c00

557	   (c) U+1FFF U+2000 U+9FFF U+A000 U+FFFF U+10000 U+10FFFF
558	       MACE: 7vvx000vvvw800vvvy0000vvvv

560	   (d) U+0200 U+002F U+0030 U+0039 U+003A U+0200 U+0040 U+0041 \
561	         U+005A U+005B U+0200 U+0060 U+0061 U+007A U+007B
562	       MACE: 0g001f-09-01q0g0020-AZ-02r0g0030-az-03r

564	   (e) U+0061 U+0062 U+0063 U+002D U+1000 U+1200 U+002D \
565		  U+2000 U+2010 U+2200 U+002D U+3000 U+3010
566	       MACE: -abc---4004g0--x00000g0g0--40040g

568	   (f) U+0100 U+0102 U+0200 U+002D U+0201 U+002D U+03FE U+0061 U+0234
569	       MACE: zo02w0g0--z1--vv-a-ua

571	   (g) U+3000 U+002D U+3010 U+0061 U+3100 U+310F U+31FF
572	       MACE: x400--zgg-a-ogfng

574	   (h) U+20000 U+002D U+20100 U+0061 U+20010 U+20012 U+200FF
575	       MACE: y2000--zo0-a-og2nd

577	12. Security Considerations

579	   Users expect each domain name in DNS to be controlled by a single
580	   authority.  If a Unicode string intended for use as a domain label
581	   could map to multiple ACE labels, then an internationalized domain
582	   name could map to multiple ACE domain names, each controlled by a
583	   different authority, some of which could be spoofs that hijack
584	   service requests intended for another.  Therefore MACE is designed so
585	   that each Unicode string has a unique encoding.

587	13. References

589	   [UNICODE]  The Unicode Consortium, "The Unicode Standard",
590	   http://www.unicode.org/unicode/standard/standard.html

592	   [IDN]  Internationalized Domain Names (IETF Working Group),
593	   http://www.i-d-n.net/,  idn@ops.ietf.org

595	   [IDNA]  Patrik Faltstrom, Paul Hoffman, "Internationalizing Host
596	   Names In Applications (IDNA)",  draft-ietf-idn-idna-01

598	   [STD13]  Paul Mockapetris, "DOMAIN NAMES - IMPLEMENTATION AND
599	   SPECIFICATION",  Nov 1987,  STD 13 (RFC 1035)

601	   [RFC952]  K. Harrenstien, M. Stahl, E. Feinler,  "DOD Internet Host
602	   Table Specification",  Oct 1985,  RFC 952

604	   [NAMEPREP]  Paul Hoffman, Marc Blanchet,  "Preparation of
605	   Internationalized Host Names",  Feb 2001,
606	   draft-ietf-idn-nameprep-03

608	   [ACEID] Naomasa Maruyama, Yoshiro Yoneya, "Proposal for a determining
609	   process of ACE identifier", Jun 2001, draft-ietf-idn-aceid-02

611	   [BRACE]  Adam M. Costello, "BRACE: Bi-mode Row-based
612	   ASCII-Compatible Encoding for IDN", Sep 2000,
613	   draft-ietf-idn-brace-00

615	   [DUDE]  Mark Welter, Brian W. Spolarich, Adam M. Costello,
616	   "Differential Unicode Domain Encoding (DUDE)", Jun 2001,
617	   draft-ietf-idn-dude-02

619	14. Acknowledgements

621	   Some of the ideas in MACE are taken from other ACE proposals.

623	   The idea of Literal/Non-Literal mode is taken from the BRACE draft
624	   [BRACE] by Adam M. Costello.

626	   The idea of differential encoding used by Compress submode is taken
627	   from DUDE [DUDE], by Mark Welter, Brian W. Spolarich and Adam M.
628	   Costello.

630	   The structure of this document and text of some sections are borrowed
631	   from the AMC-ACE series of drafts (draft-ietf-idn-amc-ace-*) by Adam
632	   M. Costello.

634	15. Authors' Address

636	   Makoto Ishisone
637	   Software Research Associates, Inc.
638	   4-16-10, Chigasaki-Minami, Tsuzuki-ku, Yokohama,
639	   Kanagawa 224-0037 Japan
640	   

642	   Yoshiro Yoneya
643	   Japan Network Information Center (JPNIC)
644	   Fuundo Bldg 1F, 1-2 Kanda-ogawamachi,
645	   Chiyoda-ku Tokyo 101-0052, Japan
646