Internet Draft                                      Paul Hoffman, Editor
draft-ietf-idn-ace-report-00.txt
June 14, 2001
Expires in six months

Report of the IDN ACE Design Team

Status of this memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

This document is a summary of the work of the ACE design team of the
Internationalized Domain Name (IDN) Working Group. If the IDN WG selects
a single ACE, the design team suggests DUDE.
There are many factors that the IDN WG might consider that may lead it
to choose a different ACE; those factors, and the proposals that some
of the design team members favored, are also described in this
document.

1. Introduction

The chairs of the IDN WG appointed an ACE design team to study the many
ACE proposals that had come to the working group and to make a
recommendation based on that study. The design team consisted of Adam
Costello, Paul Hoffman, Makoto Ishisone, David Lawrence, Brian
Spolarich, and Rick Wesson. There were three advisors: Marc Blanchet,
Patrik Faltstrom, and Erik Nordmark.

The design team evaluated the large number of ACEs that have been
proposed in the IDN WG. In comparing them, we looked primarily at two
factors:

- how easy they are to understand and implement

- whether they would restrict long names that are likely to be used

Neither factor turned out to be easy to measure, which also made it
difficult to decide how to weigh the two against each other.

2. Recommendation

Based on the two factors, the design team recommends that the IDN WG
pick the DUDE algorithm as the ACE to be used in its protocol. There
was general agreement that DUDE was fairly easy to implement
(particularly with the design changes starting with the -02 draft) and
did not restrict long names that were likely to be used in domain names.

There was not complete agreement in the design team on recommending
DUDE. Members disagreed about how much less complex DUDE was compared
to the other proposed ACEs. In addition, some members felt that the
higher complexity of other proposed ACEs was a price worth paying for
the greater compression that they give.

The design team chose the DUDE algorithm after the release of the -02
draft, which has some significant design changes from earlier drafts.
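Section 4.1 summarizes DUDE as a one-step encoding that XORs each character with the previous one and writes the result in Base32. To show why that style of "one-pass algorithm with binary arithmetic" is considered easy to implement, here is a minimal sketch of the general idea. It is NOT the DUDE-02 algorithm: the digit alphabet, the use of bit 4 to mark the first 4-bit nybble of each character, and the starting "previous" value of 0 are all hypothetical choices made for illustration. The decoder re-encodes its result and rejects any mismatch, one simple way to get the one-to-one property discussed in Section 5.

```python
from typing import Optional

# Hypothetical digit alphabet: the RFC 4648 base32 symbols, lowercased.
# DUDE-02 defines its own digits; this one is only for illustration.
ALPHABET = "abcdefghijklmnopqrstuvwxyz234567"
# Assumption: bit 4 of a 5-bit digit marks the first nybble of a character.
START_FLAG = 0x10


def encode(text: str) -> str:
    """XOR each code point with the previous one (starting from an assumed
    value of 0) and write the difference as 4-bit nybbles, most significant
    first, one base-32 digit per nybble."""
    out = []
    prev = 0
    for ch in text:
        diff = ord(ch) ^ prev
        nybbles = []
        while True:  # collect nybbles; a zero difference still emits one
            nybbles.append(diff & 0xF)
            diff >>= 4
            if diff == 0:
                break
        nybbles.reverse()
        out.append(ALPHABET[START_FLAG | nybbles[0]])
        out.extend(ALPHABET[n] for n in nybbles[1:])
        prev = ord(ch)
    return "".join(out)


def decode(ace: str) -> Optional[str]:
    """Invert encode(); return None if `ace` is not a valid encoding."""
    code_points = []
    prev = 0
    diff = None  # difference being accumulated for the current character
    for c in ace:
        value = ALPHABET.find(c)
        if value < 0:
            return None  # character outside the alphabet
        if value & START_FLAG:
            if diff is not None:  # finish the previous character
                prev ^= diff
                code_points.append(prev)
            diff = value & 0xF
        elif diff is None:
            return None  # input must begin with a start digit
        else:
            diff = (diff << 4) | value
    if diff is not None:
        prev ^= diff
        code_points.append(prev)
    try:
        text = "".join(chr(cp) for cp in code_points)
    except (ValueError, OverflowError):
        return None  # difference decoded to an impossible code point
    # Re-encoding rejects non-canonical forms (such as a leading zero
    # nybble), so each ACE string decodes to at most one string.
    return text if encode(text) == ace else None
```

For example, encode("abc") returns "wbtr" and decode("wbtr") returns "abc", while decode("qb") returns None because "qb" spells U+0001 with a redundant leading zero nybble. Because two characters in the same row differ only in their low 8 bits, their XOR needs at most two digits, which is where the differential approach saves space for one-row text.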
Even among the DUDE supporters, there was not universal acclaim. Some
felt that the discussion of "mixed-case annotation" should be removed,
but were willing to recommend the protocol anyway and ask the IDN WG to
remove that optional part of the protocol later.

It is important to note that the ACEs we considered most seriously do
not provide for special treatment of any particular script or language.
The design team members felt that there was no way to provide for such
handling that would not dramatically increase the complexity of the
protocol, and the apparent benefits in efficiency were relatively
modest. All the algorithms provide for relatively efficient treatment of
all scripts, and do not impose unreasonable limitations on label size
for users of particular scripts; the variation for particular scripts is
small in the proposed ACEs.

3. Weighing the Design Goals

3.1 Complexity

It is very difficult to analyze how complex an algorithm is. The
proposed ACE algorithms had different types of complexity and were
therefore difficult to compare accurately. For example, it is not clear
how to compare the complexity of a two-pass algorithm such as RACE or
LACE with that of a one-pass algorithm that uses binary arithmetic, such
as DUDE. It was pointed out that other algorithms that are quite complex
have been implemented well on the Internet.

It is not clear how important complexity is in the long run. One
argument says that most applications that use ACE will use an ACE
conversion toolkit supplied by an outside source, and there is likely to
be only a small number of such toolkits. An opposing argument is that,
even if that is true in most cases, there will still have to be dozens,
if not hundreds, of toolkits for the various platforms on which IDN will
be supported.
Further, many companies insist on writing all their own software, even
when it is complex (the IPsec market is a good example of this).

3.2 Restrictions on long names that are likely to be used

The IDN WG had earlier agreed with the statement that the purpose of
compression is not to reduce the number of octets on the wire, but to
allow longer sensible name parts within the 63-octet limit on a label.
Unfortunately, it is impossible to determine how long "sensible name
parts" would be in various scripts and languages. Some of what makes a
name part sensible is its usefulness in non-computer environments such
as on billboards, business cards, and radio commercials. Stringing
together many words is common in most languages, but it reduces the
reproducibility of a name.

The other side of this argument is that the domain name system requires
every name at a particular level of the name hierarchy to be unique. It
is quite common to see English names in the .com zone that clearly are
not the first choice of the companies or people who got them, most
likely because the desired (shorter) name was already taken. Because of
name exhaustion and the currently tight restrictions on the set of TLDs,
sensible names are longer than they might be if more TLDs were
available.

After asking many language experts, some of the people on the design
team came to the conclusion that 15 characters for Han-based languages
and 30 characters for alphabetic languages would put very few
restrictions on names that would reasonably be expected to be used. Of
course, any limit can be viewed as too restrictive, even the
63-character limit for current names. For example, the name:

computerengineeringdepartmentatuniversityofcaliforniasantabarbara

makes linguistic sense, but at 65 characters it already exceeds that
limit, and it is unlikely to be used anyway because it runs together too
many words and would be unwieldy to type.

4. Analysis of the ACEs

The ACE drafts considered by the design team are listed here. Note that
these are not long-term documents and are therefore not listed in the
references section of this document.

4.1 All ACE proposals

draft-ietf-idn-altdude-00.txt -- AltDUDE. Withdrawn by author.

draft-ietf-idn-amc-ace-*-00.txt -- A series of one-step encodings with
varying degrees of complexity and compression.

draft-ietf-idn-brace-00.txt -- BRACE: Bi-mode Row-based ASCII-Compatible
Encoding for IDN. Withdrawn by author.

draft-ietf-idn-dude-02.txt -- Differential Unicode Domain Encoding
(DUDE). Uses a one-step encoding that takes the binary XOR of successive
characters and encodes the result with Base32.

draft-ietf-idn-dunce-00.txt -- DUNCE: A proposal for a Definitely
Unencumbered New Compatible [ACE] Encoding. Specifies multiple different
ways to encode strings directly but does not say how to make the
encoding unique. It also does not specify a compression mechanism.

draft-ietf-idn-lace-01.txt -- LACE: Length-based ASCII Compatible
Encoding for IDN. Uses a two-step encoding: first compress (using a
simple run-length encoding algorithm), then use Base32 on the compressed
string.

draft-ietf-idn-race-03.txt -- RACE: Row-based ASCII Compatible Encoding
for IDN. This document expired.

draft-ietf-idn-sace-01.txt -- Simple ASCII Compatible Encoding (SACE).
This document expired.

draft-ietf-idn-step-00.txt -- StepCode - A User Access Oriented IDN
Encoding. Denotes Chinese characters with their phonetic elements. It
does not apply to other languages or scripts and is not based on the
ISO/IEC 10646 character repertoire.

draft-ietf-idn-utf6-00.txt -- UTF-6 - Yet Another ASCII-Compatible
Encoding for IDN. This document expired.

draft-ietf-idn-vidn-01.txt -- Virtually Internationalized Domain Names
(VIDN). Uses phonetic transliteration to create ACEs.
Many problems with this approach for various languages were pointed out
on the WG mailing list. The proposal is at least partially covered by a
patent.

A draft on MACE, Modal ASCII-Compatible Encoding, is expected to be
published soon. The design team considered a preliminary version of the
encoding described by Makoto Ishisone.

4.2 Primary choices

The design team focused on three classes of ACE: LACE, DUDE, and the AMC
series. The ACEs had different levels of complexity and different
amounts of compression for mixes of one-row and multi-row input (in
ISO/IEC 10646, a row is a block of 256 consecutive code points).

The following table summarizes the maximum length for an input string
for two cases: the entire string is a typical mix from one row (such as
a single-row script), and the entire string is in Han, which usually is
a mix of widely divergent rows. Other comparisons are possible, of
course; you might compare how well each ACE does for primarily Latin
names (which use a mix from two rows), or names that are mostly
non-ASCII characters but use an occasional ASCII character such as a
hyphen.

         All one row, typical      Han, typical
         Equation    Max           Equation    Max
DUDE     1.5n        39            3.8n        15
AMC-W    1.5n        39            1+3n        19
AMC-V    1.5n        39            1+3n        19
LACE     3.2+1.6n    34            1.6+3.2n    17

Two observations come out of this:

- All of the proposals allow at least 34 characters in the typical
  one-row case. Except for strung-together names and some very long
  German or Thai nouns, that is probably sufficient for most typical
  names.

- All of the proposals allow at least 15 characters in the typical Han
  case. Again, that is probably fine for the vast majority of names,
  even those with a few sub-names strung together.

Although LACE allows names with two more Han characters than DUDE, the
authors of LACE feel that the added complexity of its two-step process
does not justify choosing it over DUDE.
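The Max columns in the table in Section 4.2 appear to follow directly from the Equation columns. The sketch below checks this under two assumptions that this report does not state: that each equation gives the approximate encoded length for an input of n characters, and that a 4-character ACE prefix (a hypothetical value here) leaves 59 of a label's 63 octets for the encoded name. Under those assumptions it reproduces every Max value in the table; with the full 63 octets and no prefix, DUDE's one-row figure would instead be 42.

```python
import math
from fractions import Fraction

# Assumption: 63-octet label limit minus a 4-character ACE prefix
# (the prefix length is not stated in this report).
LABEL_LIMIT = 63
ASSUMED_PREFIX_LEN = 4
BUDGET = LABEL_LIMIT - ASSUMED_PREFIX_LEN  # 59 characters

# Length equations from the table as (constant, per-character) terms,
# read as: encoded length ~= constant + per_char * n for n input characters.
EQUATIONS = {
    "DUDE": {"one row": ("0", "1.5"), "Han": ("0", "3.8")},
    "AMC-W": {"one row": ("0", "1.5"), "Han": ("1", "3")},
    "AMC-V": {"one row": ("0", "1.5"), "Han": ("1", "3")},
    "LACE": {"one row": ("3.2", "1.6"), "Han": ("1.6", "3.2")},
}


def max_input_chars(constant: str, per_char: str, budget: int = BUDGET) -> int:
    """Largest n with constant + per_char * n <= budget.

    Fraction keeps the decimal coefficients exact, so the floor is not
    disturbed by binary floating-point rounding."""
    c, k = Fraction(constant), Fraction(per_char)
    return math.floor((budget - c) / k)


if __name__ == "__main__":
    for ace, cases in EQUATIONS.items():
        one_row = max_input_chars(*cases["one row"])
        han = max_input_chars(*cases["Han"])
        print(f"{ace:6} one row: {one_row:2}  Han: {han:2}")
```

Running it yields 39/15 for DUDE, 39/19 for AMC-W and AMC-V, and 34/17 for LACE, matching the table. Exact fractions are used because a coefficient such as 3.2 has no exact binary floating-point representation.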
Compared to DUDE, AMC-W and AMC-V allow four more Han characters with
no loss of one-row characters. However, they are both more complicated
than DUDE. The design team members disagreed about how much more
complicated they were, with some saying that they were "much more
complicated" and others saying "only a little more complicated".

5. Security Considerations

The design team did not perform security reviews on the ACE candidates.
A cursory review was done to see whether every Unicode string could
result in only one ACE string, and every ACE string could result in zero
or one Unicode strings. It is assumed that the authors of each ACE
proposal did more thorough testing for the one-to-one correspondence.

6. References

References to particular ACE proposals are not given here because none
are currently RFCs and it is assumed that only one (or a small number)
will eventually reach RFC status.

7. Editor Contact Information

Paul Hoffman
Internet Mail Consortium and VPN Consortium
127 Segre Place
Santa Cruz, CA 95060 USA
paul.hoffman@imc.org and paul.hoffman@vpnc.org