idnits 2.17.1 

draft-ietf-idn-sace-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Line 73 has weird spacing: '...  value  chara...'

  == The document doesn't use any RFC 2119 keywords, yet seems to have RFC
     2119 boilerplate text.

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (27 August 2000) is 8642 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'RFC2279' is defined on line 174, but no explicit
     reference was found in the text

  == Unused Reference: 'Unicode' is defined on line 181, but no explicit
     reference was found in the text

  == Unused Reference: 'IDNREQ' is defined on line 186, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 2279 (Obsoleted by RFC 3629)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO10646'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Unicode'

  -- No information found for draft-ietf-idn-requirement - is the name
     correct?

  -- Possible downref: Normative reference to a draft: ref. 'IDNREQ' 

  -- Possible downref: Normative reference to a draft: ref. 'RACE' 


     Summary: 5 errors (**), 0 flaws (~~), 6 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Internet Draft                                            Dan Oscarsson
2	draft-ietf-idn-sace-00.txt                                Telia ProSoft
3	Expires: 27 February 2001                                 27 August 2000

5	                Simple ASCII Compatible Encoding (SACE)

7	Status of this memo

9	   This document is an Internet-Draft and is in full conformance with
10	   all provisions of Section 10 of RFC2026.

12	   Internet-Drafts are working documents of the Internet Engineering
13	   Task Force (IETF), its areas, and its working groups. Note that other
14	   groups may also distribute working documents as Internet-Drafts.

16	   Internet-Drafts are draft documents valid for a maximum of six months
17	   and may be updated, replaced, or obsoleted by other documents at any
18	   time. It is inappropriate to use Internet-Drafts as reference
19	   material or to cite them other than as "work in progress."

21	     The list of current Internet-Drafts can be accessed at
22	     http://www.ietf.org/ietf/1id-abstracts.txt

24	     The list of Internet-Draft Shadow Directories can be accessed at
25	     http://www.ietf.org/shadow.html.

27	Abstract

29	   This document describes an encoding of non-ASCII characters in host
30	   names that is completely compatible with the ASCII-only host names
31	   currently used in DNS. It can be used both with DNS, to support
32	   software that handles only ASCII host names, and as a way to
33	   downgrade from 8-bit text to ASCII in protocols.

35	1. Introduction

37	   This document defines an ASCII Compatible Encoding (ACE) of names
38	   that can be used when communicating with DNS. It is needed during a
39	   transition period when non-ASCII names are introduced in DNS to avoid
40	   breaking programs expecting ASCII only.

42	   The Simple ASCII Compatible Encoding (SACE) defined here can be
43	   compared to [RACE]. The main differences are:
44	    - RACE encodes by first compressing and then encoding the resulting
45	      bit stream into ASCII. SACE encodes each character directly in one
46	      pass.
47	    - SACE recognises that a lot of Latin-based names are mostly
48	      composed of ASCII characters and gives a higher compression for
49	      those.  Within the 63 byte limit of DNS, RACE will allow 36
50	      characters for ISO 8859-1, and fewer if additional Latin
51	      characters are needed. SACE will allow around 40 characters if
52	      about 10 % of a Latin name is non-ASCII (in the UCS [ISO10646]
53	      range 0-0x217).  SACE is closer than RACE to the compression that
54	      UTF-8 has.
55	    - Most ASCII characters will not be encoded, so Latin-based names
56	      composed of mostly ASCII characters will be somewhat readable.

58	1.1 Terminology

60	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
61	   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
62	   document are to be interpreted as described in [RFC2119].

64	2. Simple ASCII Compatible Encoding

66	   The encoding encodes values using the available characters allowed in
67	   an ASCII host name (a-z0-9 and hyphen).

69	   Values are encoded as follows:

71	                    Character - value mapping

73	      value  character               value  character
74	         0       a                     18       s
75	         1       b                     19       t
76	         2       c                     20       u
77	         3       d                     21       v
78	         4       e                     22       w
79	         5       f                     23       x
80	         6       g                     24       y
81	         7       h                     25       z
82	         8       i                     26       1
83	         9       j                     27       2
84	        10       k                     28       3
85	        11       l                     29       4
86	        12       m                     30       7
87	        13       n                     31       9
88	        14       o                     32       0
89	        15       p                     33       8
90	        16       q                     34       5
91	        17       r                     35       6

93	   The following syntax is used in the description below:
94	      B => one value in the range 0-35 mapped to a character as above
95	      X => one value in the range 0-31 mapped to a character as above
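
	   The mapping table and the B/X notation above can be sketched as a
	   lookup string. This is only an illustration; the names
	   SACE_ALPHABET, SACE_VALUE, b_char and x_char are hypothetical and
	   do not appear in this draft.

```python
# Index in this string = value in the "Character - value mapping" table.
SACE_ALPHABET = "abcdefghijklmnopqrstuvwxyz1234790856"

# Reverse lookup: character -> value.
SACE_VALUE = {ch: value for value, ch in enumerate(SACE_ALPHABET)}


def b_char(value):
    """Map a B value (range 0-35) to its character."""
    if not 0 <= value <= 35:
        raise ValueError("B value must be in the range 0-35")
    return SACE_ALPHABET[value]


def x_char(value):
    """Map an X value (range 0-31) to its character."""
    if not 0 <= value <= 31:
        raise ValueError("X value must be in the range 0-31")
    return SACE_ALPHABET[value]
```

	   Note that the digits 0, 8, 5 and 6 carry the values 32-35, so they
	   can never appear as an X value.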

97	   Each UCS character is identified as follows:
98	      latin  => a character in the range 0-0x217
99	      10bit  => a character in the range 0x218-0x2FFF
100	      base36 => all other characters
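
	   The three character classes above can be expressed as a small
	   function. This is a minimal sketch; the function name classify is
	   hypothetical and the code point is assumed to be given as an
	   integer.

```python
def classify(code_point):
    """Return the SACE class name for a UCS code point."""
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    if code_point <= 0x217:
        return "latin"
    if code_point <= 0x2FFF:
        return "10bit"
    return "base36"
```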

102	   While a string is encoded or decoded, a current mode is kept. In each
103	   mode characters are encoded like this:
104	      latin  => as themselves, 00 for 0, 88 for 8 or as 10 bit value
105	                encoded as 0XX (two 5 bit values)
106	      10bit  => as 15 bits represented by its current prefix of 5 bits
107	                followed by 10 bits encoded as XX
108	                (the value is the 5 bits of prefix and
109	                10 bits concatenated)
110	      base36 => as a base 36 value represented by its current base 36
111	                prefix followed by three base 36 digits encoded as BBB
112	                (the value is prefix*36*36*36+B*36*36+B*36+B)
113	                Before encoding the character value must first be
114	                reduced:
115	                  if >= 0xd800 reduce by 8192 (private/surrogate start)
116	                  then reduce by 0x2FFF.
117	                After decoding the character value needs to be
118	                restored:
119	                  add 0x2FFF
120	                  followed by adding 8192 if >= 0xd800

122	2.1 Decoding a string

124	   Decoding starts in the following state:
125	      Mode: latin
126	      10bit prefix: 0
127	      base36 prefix: 0

129	   Then the characters in an encoded string are interpreted as follows
130	   depending on current mode:

132	    When in latin mode:
133	      00  => the character 0
134	      0XX => XX represents 10 bits which decodes to one character
135	      88  => the character 8
136	      85  => switch to 10bit mode with same prefix as last time
137	      8X5 => switch to 10bit mode setting X as current 10bit prefix
138	      87  => switch to base36 mode with same prefix as last time
139	      8X7 => switch to base36 mode setting X as current base36 prefix
140	      other  => the character represents itself

142	    When in 10bit mode:
143	      - => the character -
144	      0 => switch to latin mode
145	      X5 => switch to 10bit mode using X as current prefix
146	      7  => switch to base36 mode with same prefix as last time
147	      X7 => switch to base36 mode using X as current prefix
148	      XX => current 10bit prefix plus XX gives the character

150	    When in base36 mode:
151	      -- => the character -
152	      -0 => switch to latin mode
153	      -5 => switch to 10bit mode with same prefix as last time
154	      -X5 => switch to 10bit mode setting X as current prefix
155	      -X7 => switch to base36 mode setting X as current prefix
156	      XXX => current base36 prefix plus XXX as base 36 values gives
157	   character

159	2.2 Encoding a string

161	   To encode a string you start with the data as UCS characters and:
162	      Mode: latin
163	      10bit prefix: 0
164	      base36 prefix: 0

166	   Then for each UCS character, the mode and/or prefix is switched if
167	   needed and then the character is encoded as defined above.
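
	   A minimal sketch of the encoding loop, restricted to characters in
	   the latin class so that no mode or prefix switch is needed, could
	   look like this. The names encode_latin and LITERAL are
	   hypothetical. It is an assumption that exactly the characters a-z,
	   1-7, 9 and hyphen represent themselves and that every other latin
	   character is written as 0XX with the higher 5 bits first.

```python
# Index in this string = value in the character - value mapping table.
SACE_ALPHABET = "abcdefghijklmnopqrstuvwxyz1234790856"

# Assumption: these characters are emitted unchanged in latin mode.
LITERAL = set("abcdefghijklmnopqrstuvwxyz12345679-")


def encode_latin(text):
    """Encode a string whose characters are all in the range 0-0x217."""
    out = []
    for ch in text:
        code_point = ord(ch)
        if code_point > 0x217:
            raise NotImplementedError("mode switching is not covered here")
        if ch == "0":
            out.append("00")
        elif ch == "8":
            out.append("88")
        elif ch in LITERAL:
            out.append(ch)
        else:
            # 10 bit value as two 5 bit values, higher bits first (assumed).
            out.append("0" + SACE_ALPHABET[code_point >> 5]
                       + SACE_ALPHABET[code_point & 31])
    return "".join(out)
```

	   Each non-ASCII latin character costs three bytes in this sketch,
	   while most ASCII characters cost one.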

169	3. References

171	   [RFC2119]  Scott Bradner, "Key words for use in RFCs to Indicate
172	              Requirement Levels", RFC 2119, March 1997.

174	   [RFC2279]  F. Yergeau, "UTF-8, a transformation format of ISO 10646",
175	              RFC 2279, January 1998.

177	   [ISO10646] ISO/IEC 10646-1:2000. International Standard --
178	              Information technology -- Universal Multiple-Octet Coded
179	              Character Set (UCS)

181	   [Unicode]  The Unicode Consortium, "The Unicode Standard -- Version
182	              3.0", ISBN 0-201-61633-5. Described at
183	              http://www.unicode.org/unicode/standard/versions/
184	              Unicode3.0.html

186	   [IDNREQ]   James Seng, "Requirements of Internationalized Domain
187	              Names", draft-ietf-idn-requirement.

189	   [RACE]     Paul Hoffman, "RACE: Row-based ASCII Compatible Encoding
190	              for IDN", draft-ietf-idn-race.

192	4. Acknowledgements

194	   Thanks to Paul Hoffman for many good ideas.

196	Author's Address

198	   Dan Oscarsson
199	   Telia ProSoft AB
200	   Box 85
201	   201 20 Malmo
202	   Sweden

204	   E-mail: Dan.Oscarsson@trab.se