idnits 2.17.1 

draft-klensin-ima-constraints-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 15.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 694.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 671.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 678.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 684.

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to RFC
     4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (February 26, 2006) is 6627 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'RFC2119' is defined on line 579, but no explicit
     reference was found in the text

  == Unused Reference: 'Klensin-emailaddr' is defined on line 617, but no
     explicit reference was found in the text

  ** Obsolete normative reference: RFC 2821 (Obsoleted by RFC 5321)

  ** Obsolete normative reference: RFC 3490 (Obsoleted by RFC 5890, RFC 5891)

  == Outdated reference: A later version (-01) exists of
     draft-klensin-ima-framework-00

  == Outdated reference: A later version (-06) exists of
     draft-iab-idn-nextsteps-03

  -- Obsolete informational reference (is this intentional?): RFC 1341
     (Obsoleted by RFC 1521)


     Summary: 6 errors (**), 0 flaws (~~), 6 warnings (==), 8 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                         J. Klensin
3	Internet-Draft                                         February 26, 2006
4	Expires: August 30, 2006

6	 Internationalization in Internet Applications: Issues, Tradeoffs, and
7	                            Email Addresses
8	                 draft-klensin-ima-constraints-00.txt

10	Status of this Memo

12	   By submitting this Internet-Draft, each author represents that any
13	   applicable patent or other IPR claims of which he or she is aware
14	   have been or will be disclosed, and any of which he or she becomes
15	   aware will be disclosed, in accordance with Section 6 of BCP 79.

17	   Internet-Drafts are working documents of the Internet Engineering
18	   Task Force (IETF), its areas, and its working groups.  Note that
19	   other groups may also distribute working documents as Internet-
20	   Drafts.

22	   Internet-Drafts are draft documents valid for a maximum of six months
23	   and may be updated, replaced, or obsoleted by other documents at any
24	   time.  It is inappropriate to use Internet-Drafts as reference
25	   material or to cite them other than as "work in progress."

27	   The list of current Internet-Drafts can be accessed at
28	   http://www.ietf.org/ietf/1id-abstracts.txt.

30	   The list of Internet-Draft Shadow Directories can be accessed at
31	   http://www.ietf.org/shadow.html.

33	   This Internet-Draft will expire on August 30, 2006.

35	Copyright Notice

37	   Copyright (C) The Internet Society (2006).

39	Abstract

41	   The discussions of internationalized email addresses in the IETF have
42	   led to a number of stated requirements.  This document identifies
43	   some of those requirements in the context of general issues of
44	   internationalization of Internet name spaces, demonstrates that the
45	   combination of all of the requirements that appear reasonable on
46	   first glance adds up to a null solution space, and then suggests a
47	   different model for proceeding.

49	Table of Contents

51	   1.  Introduction
52	   2.  Environment for Internationalization and Fragmentation
53	       Risks
54	     2.1.  Climate for Internationalization: The DNS History
55	     2.2.  Technology
56	   3.  Consequences and Implications
57	     3.1.  Choosing and mixing scripts and languages
58	     3.2.  Confusable characters and communications accuracy
59	     3.3.  Communication across languages and cultures
60	     3.4.  The place of internationalization in a global Internet
61	   4.  Specific Impact of I18N Email Addressing
62	   5.  Security Considerations
63	   6.  Acknowledgements
64	   7.  References
65	     7.1.  Normative References
66	     7.2.  Informative References
67	   Author's Address
68	   Intellectual Property and Copyright Statements

70	1.  Introduction

72	   In general, internationalization has been approached in the IETF on
73	   the assumption that, if one can get the character sets and perhaps
74	   language tags right, other issues will take care of themselves.  An
75	   "internationalization considerations" section is strongly suggested
76	   for RFCs (see RFC 2277, Section 6 [RFC2277] and note that Section 3.1
77	   of that document requires UTF-8 support of all protocols, hence all
78	   protocol documents "deal with internationalization issues at
79	   all"), but there are no real
80	   guidelines about what should be in it and the requirement has not
81	   always been enforced.  There are also some additional requirements,
82	   e.g., for UTF-8 support [RFC2277].  Particular protocols have gone
83	   beyond these guidelines.  In particular, the standards for
84	   internationalized domain names, IDNA [RFC3490], use Unicode as a base
85	   but utilize their own encoding of Unicode, punycode [RFC3492].  Those
86	   standards carefully avoid identification of languages, since domain
87	   names inherently consist of more or less arbitrary strings, not
88	   "words" or other language elements.

90	   That body of work generally ignores an important observation and its
91	   consequences.  When user-chosen words, names, and non-ASCII scripts
92	   are used at the applications layer, users will often treat them as
93	   language elements having meaning, and often pronunciations, in those
94	   languages, not merely as strings of characters.  The assumptions of
95	   meaning or pronunciation, in turn, will often introduce age-old
96	   problems of cross-language reading and understanding into the design
97	   of applications, or applications protocols, that are intended to work
98	   globally: if one person cannot read or understand the language of
99	   another, the fact imposes limitations on communication that, in
100	   general, cannot be solved by protocol design.  In the most extreme
101	   cases, differences in the languages and character sets that people
102	   find normal and convenient impose practical limits on
103	   interoperability: choices must be made between compatibility and
104	   convenience within a linguistic and cultural community and global
105	   interoperability that will, inevitably, be less convenient for some
106	   groups and cultures than others.  In some cases, solutions are
107	   feasible that make things convenient within a cultural or linguistic
108	   group and provide a less-convenient mechanism for getting between
109	   groups; in others, even more difficult choices will need to be made.
110	   And, in some cases (fortunately a gradually declining number), the
111	   realities of character codings, presentation, and operating systems
112	   make obvious solutions to problems impractical.

114	   While these issues have appeared in the context of internationalized
115	   domain names and in other applications, recent work to permit non-
116	   ASCII local parts of electronic mail addresses without violating the
117	   constraints of the mail protocols themselves has brought several of
118	   the issues into better focus.  This document discusses some of the
119	   issues and problems -- both technical and in terms of user
120	   expectations -- in general form and then reviews some of the
121	   implications for email and other protocols that impose their own
122	   constraints on strings and their interpretation.

124	   While changes in lower-level Internet protocols and interfaces must
125	   almost always occur at the protocol level (i.e., be visible "on the
126	   wire" -- see below), there are at least three choices for
127	   internationalization at the applications layer.  Picking the right
128	   one requires some understanding of how the features will be used, the
129	   degree to which localization will be appropriately overlaid on the
130	   basic internationalization features, and some general wisdom about
131	   design.  The option that is obvious at first is not necessarily the
132	   best choice.  The options are:

134	   o  Protocol changes, i.e., features that appear "on the wire" in the
135	      interactions between client and server or between peer hosts.  The
136	      internationalization provisions for MIME body parts [RFC1341] are
137	      examples of protocol-level mechanisms, since they appear in the
138	      client-server interactions.
139	   o  Client-side changes, i.e., features that have characteristics
140	      similar to protocol ones, but that are implemented entirely on the
141	      client, without "on the wire" visibility.  Domain name
142	      internationalization using the IDNA specification [RFC3490] is an
143	      example of a strictly client-side mechanism since non-ASCII
144	      characters do not appear on the wire and the DNS server is not
145	      required to be aware that internationalized names are being used.
146	   o  Adding a new layer or new abstraction, i.e., accomplishing
147	      internationalization or localization not by somehow
148	      internationalizing an existing protocol or introducing a
149	      replacement protocol, but by adding new facilities that rest on
150	      top of an unmodified non-internationalized protocol.  Localization
151	      facilities might also be added as a new layer on top of an
152	      internationalized lower layer.  Various efforts to add "keywords"
153	      or other "above DNS search" mechanisms, the standardization of an
154	      internationalized version of the URI [RFC3986] as an IRI
155	      [RFC3987], and similar arrangements are "new layer" approaches.

157	2.  Environment for Internationalization and Fragmentation Risks

159	   In looking at the combination of efforts to internationalize the
160	   Internet, especially at the protocol level, we encounter two large
161	   groups of issues.  One has to do with the social, cultural, and
162	   political climate associated with the making of any decision about
163	   internationalization in recent years and the other is about the
164	   technology.  The subsections that follow address both since, in
165	   practice, it is impossible to deal with them separately.  In
166	   particular, as this document illustrates, if one examines the
167	   technical issues, the desire to avoid constraints on global end to
168	   end communications, and to minimize the risks of incorrect
169	   identification of destination hosts or users, the likely conclusion
170	   would be that almost any internationalization at the protocol
171	   level is a bad idea.  On the other hand, if the social and cultural
172	   context is examined, it becomes clear that avoiding any
173	   internationalization at the protocol level will lead to a different
174	   type of fragmentation and, if that context is examined alone, demands
175	   will arise for protocol changes that are not plausible in practice.

177	2.1.  Climate for Internationalization: The DNS History

179	   The biggest potential for network fragmentation due to introduction
180	   of mutually-incomprehensible scripts occurred with the development of
181	   domain names that are not intended to be presented as ASCII strings.
182	   There was considerable resistance in the technical community to that
183	   set of decisions based on the belief that domain names were
184	   ultimately protocol elements that should remain, at least for
185	   application purposes, in a restricted subset of ASCII (a subset that
186	   is compatible with ISO 646 BV [ISO.646.1991]).  At least part of that
187	   community also concluded that internationalization should occur in a
188	   protocol layer closer to the user, i.e., "above the DNS" [RFC3467].
189	   This layer might be thought of as the "presentation layer" of the
190	   classical OSI model although the analogy is not exact.  Those who
191	   resisted DNS changes suggested that it might make sense to
192	   distinguish what actions were taken in the DNS from a presentation
193	   layer in which some new name spaces or resource identifiers might
194	   occur.  In that context, URIs [RFC3986], with their potentially
195	   elaborate syntax, are no one's idea of "user friendly" even if one
196	   ignores the desire for non-ASCII scripts entirely.  The
197	   internationalized form, IRIs [RFC3987] solve part of the non-ASCII
198	   script problem, but are really no better: they permit
199	   internationalization of the strings that make up URIs, but do not
200	   address the complexity of the syntax or the ASCII syntax elements.
201	   Such a presentation layer could make more culturally-reasonable forms
202	   visible to the user while preserving clear layering over the
203	   fundamental URI types and domain names that would remain unchanged.
204	   That model would provide at least the potential for good localization
205	   while preserving a common script, syntax, and set of conventions for
206	   dealing with the actual elements of the network.

208	   Although the idea of layering internationalization on top of an ASCII
209	   protocol substrate seems to come back each time an application issue
210	   is examined carefully, it has not gained significant traction in
211	   practice other than as, e.g., DNS alternatives.  Hence, the argument
212	   has been lost, several times and in several different ways.  It
213	   became clear that, if the IETF had not provided some rational and
214	   standardized ways to represent internationalized (non-ASCII) domain
215	   names, we would have ended up with chaos -- different coded character
216	   sets in different zones with some of them probably treated as binary
217	   labels.  We would see some Shift-JIS form in Japan, GB forms in
218	   China, ISO 8859-1 in Western Europe and other ISO 8859 variations in
219	   some other areas, and unpredictable other variations in the rest of
220	   the world.  Worse, the only way to determine which particular coded
221	   character set (CCS) was being used would be out of band knowledge,
222	   since none of the people promoting those approaches came forward with
223	   any realistic plans for how to label "charsets" (essentially a
224	   combination of a script and a coding system for those who have not
225	   followed the MIME version of that discussion; see [RFC2978] and
226	   [RFC2277] for more precise definitions, further discussion, and
227	   references) in the DNS.  Indeed, in spite of the standard, we have
228	   already seen the beginnings of fragmenting developments in some
229	   domains along with special "improved, enhanced, and
230	   internationalized" (and not quite interoperable) DNS servers being
231	   offered by some companies.

233	   So, despite some misgivings, the IETF defined IDNs via IDNA [RFC3490]
234	   (including exclusive use of Unicode as the defining character set).
235	   From the standpoint of this discussion, the interesting thing about
236	   IDNA is that it doesn't change the DNS at all.  It is a strictly
237	   client-side protocol, with Unicode strings being pushed through a
238	   canonicalization process and then transformed into an "ASCII-
239	   compatible" form (called "punycode") that, to the DNS and
240	   applications that have not been upgraded, looks like (and is)
241	   hostname-format names, i.e., ASCII letters, digits and hyphens.  It
242	   was done that way because of a belief that the coding system would
243	   lead to very rapid deployment without any negative impact on systems
244	   or applications that had not been upgraded.  Its most passionate
245	   advocates were convinced that, once there was wide deployment, no one
246	   would ever see the internal coding.
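
   As a rough illustration of that client-side transformation, the
   sketch below uses the "idna" codec that ships with Python, which
   follows the RFC 3490 ToASCII and ToUnicode operations.  The label
   "bücher.example" and the helper names to_ascii and to_unicode are
   assumptions chosen for illustration; they do not come from this
   document.  It is a sketch, not a reference implementation.

```python
# Minimal sketch of client-side IDNA conversion, assuming Python's
# built-in "idna" codec (RFC 3490 ToASCII / ToUnicode).
# The helper names and the sample label are hypothetical.

def to_ascii(domain: str) -> str:
    """Return the ASCII-compatible ("xn--") form of a Unicode domain name."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Return the Unicode form of an ASCII-compatible domain name."""
    return domain.encode("ascii").decode("idna")


name = "b\u00fccher.example"      # "bücher.example"
ace = to_ascii(name)

print(ace)                         # xn--bcher-kva.example
print(to_unicode(ace) == name)     # True

# Canonicalization happens before encoding: an upper-case spelling of the
# non-ASCII label maps to the same ASCII-compatible form.
print(to_ascii("B\u00dcCHER.example") == ace)   # True
```

   The DNS itself only ever sees the "xn--" form, which is why
   servers and unmodified applications need no changes.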

248	   From the standpoint of global interoperability, the good news is that
249	   they were wrong -- we have some other problems to cope with, but one
250	   of them is not "you can't get there because you can't read or type
251	   the string".  If the application permits you to get to it, you can
252	   always access and type the punycode string rather than whatever might
253	   show up in characters you can't read, can't type, and maybe can't
254	   even render.  Of course, this requires that all applications support
255	   entry of Roman characters, even if such entry is not convenient.

257	   The choice of Unicode was, however, very important, not because it is
258	   wonderful as a character set, but because it avoids the issues of
259	   identifying what CCS is being used and, the WG hoped, of picking
260	   which characters would be valid and which ones would not be.

262	   Avoiding determining which characters should be valid and which ones
263	   should not has also been less successful than one might have hoped;
264	   both the IAB (see [IDN-Nextsteps]) and the Unicode Consortium (see
265	   [UTR36] and [UTR39]) are struggling with approaches to that problem
266	   for which they did not foresee a need when IDNA was adopted.

268	   But, ultimately, it is important to remember as we talk about any of
269	   this that the choice was never between "figure out some way to
270	   internationalize the DNS" and "don't do it because it was a bad
271	   idea".  The choice was only between whether we did it in a global,
272	   standard way that was fairly safe as far as DNS operations were
273	   concerned or whether we ended up with a collection of different
274	   mechanisms that would not interoperate cleanly and unambiguously
275	   within a single domain name system.

277	2.2.  Technology

279	   As the result of these factors and tensions, IDNA became a completely
280	   client-side IDN protocol.  Several of the worst fears of the
281	   pessimists have come true: we have confusion over look-alike
282	   characters, we have the potential to receive and see characters we
283	   can't read or type, the Unicode Consortium's beliefs about how widely
284	   Unicode is available and about smooth conversions between codings
285	   are, at best, very controversial, some implementers have "improved"
286	   on the standard tables, and so on.  Email MIME textual body parts
287	   should be safe against character set problems due to the presence of
288	   the "charset" parameter.  However, in practice, problems in which one
289	   character is mapped into an entirely different one are fairly
290	   routine, most notably as the result of forwarding or otherwise
291	   including all or part of one message in a body part that is
292	   constructed locally according to different character set conventions.
293	   Copying of text that was developed in one character coding context
294	   and pasting it into another is not completely reliable for related
295	   reasons.  These problems are symptomatic of those we will certainly
296	   encounter in the future as the Internet becomes increasingly
297	   international and multilingual.  Probably the worst is yet to come.

299	   As was the case with the pre-MIME internationalized mail body
300	   approaches and with the development of IDNA, the local solutions
301	   --the ones that are not interoperable globally-- will work, and work
302	   well, within the relevant cultural and linguistic communities.
303	   Realistically, the IETF cannot ignore the issues and problems and
304	   either hope they will go away or decide to do nothing because the
305	   problems will cause disruption.  To do so is to guarantee that local
306	   solutions will be developed and that people who use them will be
307	   unable to communicate internationally (at least with the same tools
308	   they use locally) and that people outside their communities will be
309	   unable to communicate with them.

311	   The key question is what the difficulties with the global solutions
312	   or the development of local solutions actually do to
313	   interoperability.  The Internet community is probably in for a bad
314	   time as reality catches up with many fantasies and delusions about
315	   how systems and people work, but there is some reason for optimism
316	   about the long term.  To take one (admittedly-extreme) reality as an
317	   example, suppose one user's primary language were written only in Old
318	   Futhark Runic and that user does not read or speak any other
319	   languages or write any other script.  Assume further, stretching the
320	   imagination a bit, that the only keyboards available to that user
321	   have only runes on them.  That user would have some serious problems
322	   in communications.  In particular, she would have been dead for
323	   centuries: as far as is known, no living person really knows how
324	   those languages and scripts worked (although there is a lot of
325	   speculation) and it is unclear whether some of the Unicode decisions
326	   in coding the runes are actually correct, much less optimal.  She is
327	   also not on the Internet in any significant way: the hypothetical
328	   keyboard does not exist, there is no way to type a URL or email
329	   address on it, etc.  So, for that user, the net effect of permitting
330	   IDNs in Runic, which IDNA now permits, is going to be just about zero
331	   except maybe in terms of helping with her cultural pride.  More
332	   important, if she can find a few other living exclusive users of the
333	   relevant scripts and languages, her ability to use those scripts and
334	   languages in either content or domain names _might_ enhance their
335	   ability to communicate with each other, but they certainly are not
336	   going to increase or decrease anyone else's ability to communicate
337	   with any of them.

339	   On the other hand, suppose a different user can speak, read, and
340	   write Russian as well as Old Viking Runic, but nothing else.  If he
341	   wants to communicate on the Internet, he can send notes (and use
342	   domain names, etc.) that some reasonably large number of people will
343	   be able to read easily, and a larger number will be able to get
344	   through with a struggle, but, for anyone who does not read Russian or
345	   recognize Cyrillic characters, he might as well have used Runic --
346	   the symbols are useless either way.  This problem is, of course,
347	   centuries old.  IDNs don't make it any worse although they don't help
348	   either.

350	   While Runic is a far-fetched example, some of the African languages
351	   and scripts are not.  And, unlike Runic, some of those African
352	   scripts have not even been coded into Unicode yet.

354	3.  Consequences and Implications

356	   The Internet community is probably in for a nasty learning curve, but
357	   things should work out as people accept reality.  Within a language
358	   and cultural community, IDNs --and, even more important, email
359	   addresses with non-ASCII characters in the local parts-- are almost
360	   certain to be very important, especially among groups of people who
361	   are not comfortable with Roman-based characters.  They are going to
362	   prove helpful just as the ability to use native/local characters in
363	   content has proven helpful.  That helpfulness is going to be
364	   important to spreading accessibility to the Internet into some
365	   population groups (although, until there is a great deal of content
366	   in their languages, probably not as much as some of the IDN advocates
367	   around WSIS and ICANN have believed).  But, for communication between
368	   different language and cultural groups, we are going to find that we
369	   need to do what people have done through history, even before
370	   computer networking entered the equation: we will have to figure out,
371	   probably out of band, what languages and scripts we share with
372	   particular correspondents and then pick a member of that set.

374	3.1.  Choosing and mixing scripts and languages

376	   The choice of a common and shared script or language is going to be
377	   far more complicated for many cases than any of our existing content-
378	   negotiation ideas anticipate.  We will need to remember that some
379	   people may be able to understand a spoken language but not read it in
380	   some or all of the scripts in which it is normally written and that,
381	   especially for alphabetic scripts, the ability to read the script
382	   (and even to crudely pronounce the sounds it implies) does not imply
383	   the ability to understand any of the languages normally written in
384	   it.  These differences may relate to the ability to recognize
385	   characters in a table, use a keyboard, recognize characters that
386	   might appear in an IRI or email address, and so on.  Ugly and nasty
387	   as punycode may be, we will need to pass domain names around in it
388	   unless we know in advance that our readers will know the relevant
389	   scripts well and be able to type them, cut and paste them accurately,
390	   and so on.  If we choose to use non-ASCII email local parts, we will
391	   discover that we need to keep ASCII alternative aliases around for
392	   communicating more broadly and that those ASCII alternatives will
393	   not, in the general case, be derivable algorithmically.  Once we get
394	   the email internationalization situation under control, nothing
395	   should prevent a speaker of Norwegian, say Torbjorn Torbjornson (with
396	   slashes across the second "o" in each name), from having an email
397	   address of torbjorn@example.com (U+00F8 as the sixth character, i.e.,
398	   with a slash across the "o") but, if he and a Russian-speaker want to
399	   communicate with each other, he would be well-advised to retain the
400	   ability to receive mail at torbjorn@example.com (or some other
401	   address), especially if the software of the Russian reader is going
402	   to magically transform the U+00F8 character into "j", which would be
403	   predicted by getting ISO 8859-1 and ISO 8859-5 confused.  And, if his
404	   alternative is not torbjorn@example.com but
405	   torbjorn@torbjorn.example.com (with a slash over the sixth character
406	   in the domain name), then the Russian users or their software must be
407	   able to generate and use torbjorn@xn--torbjrn-u1a.example.com
408	   instead.
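
   Both halves of that example can be checked with a short sketch
   using only the Python standard library.  It assumes the local part
   is written out as ISO 8859-1 bytes and then misread as ISO 8859-5,
   and it uses Python's built-in "idna" codec for the domain name.
   The variable names are illustrative only.

```python
# Sketch of the two situations in the Norwegian example, standard library only.
import unicodedata

local = "torbj\u00f8rn"                  # U+00F8 is the sixth character

# 1. Charset confusion: bytes produced as ISO 8859-1, read as ISO 8859-5.
wire_bytes = local.encode("iso-8859-1")
misread = wire_bytes.decode("iso-8859-5")

print(unicodedata.name(local[5]))        # LATIN SMALL LETTER O WITH STROKE
print(unicodedata.name(misread[5]))      # CYRILLIC SMALL LETTER JE
print(misread == local)                  # False: no longer the same address

# 2. The ASCII-compatible form of the internationalized domain name.
domain = local + ".example.com"
ace = domain.encode("idna").decode("ascii")
print(ace)                               # xn--torbjrn-u1a.example.com
```

   The misread character is the Cyrillic letter that looks like "j",
   so the address silently becomes a different string rather than
   failing visibly.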

410	   It may be useful to note that "have an alternate address available
411	   and let people know" bears a strong resemblance to the traditional
412	   two-sided Asian business cards.  The Chinese, Korean, or Japanese
413	   characters on the front may be the correct ones but, if the owner of
414	   the card wants to have communications with illiterate westerners, the
415	   Roman characters on the back will rapidly become very important.  Of
416	   course, many people in those populations make exactly that choice:
417	   their business cards do not have Roman characters on them.
418	   Consequently, they have no expectations of communication with people
419	   who do not read and speak the relevant languages.

421	3.2.  Confusable characters and communications accuracy

423	   The common example of similarity between the printed form of a
424	   Cyrillic "A" and a Roman one raises issues similar to the Norwegian
425	   example above.  If one sees the character in a domain name in context
426	   with other Cyrillic (or Roman) characters, it will probably lead to
427	   the right guess unless someone is being deliberately deceptive or
428	   cute.  If the context is not available, a good guess might still be
429	   possible based on whether the character appears on a sign in a rural
430	   community in Russia or the US (in Moscow or New York, one would
431	   probably need to know about specific neighborhoods and the guess
432	   would be less reliable).  Reducing the odds of a deception based on
433	   confusion between the characters that some would consider similar in
434	   appearance is a topic of active discussion, mostly about what DNS
435	   registries should be permitted to register.  But, if the person
436	   writing that message out is really concerned about accuracy, then
437	   either some explicit hints or, for domain names, the punycode string,
438	   had best appear on the business card or sign... if they do not, the
439	   negative reinforcement from confused and irritated users will
440	   gradually get the message across that they should.
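
   To make the look-alike problem concrete, the sketch below compares
   the Roman and Cyrillic capital "A" and applies a deliberately crude
   mixed-script check.  The scripts_used helper and the sample label
   are hypothetical, and guessing a script from the first word of a
   Unicode character name is only a heuristic assumed here for
   illustration; it is not the method of any registry or of the
   Unicode Consortium reports cited above.

```python
# Sketch: visually similar characters are distinct code points, and a
# rough heuristic can flag labels that mix scripts.  Standard library only.
import unicodedata


def scripts_used(label: str) -> set:
    """Guess scripts from the first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


latin_a = "A"              # U+0041
cyrillic_a = "\u0410"      # U+0410, usually rendered identically

print(latin_a == cyrillic_a)             # False
print(unicodedata.name(latin_a))         # LATIN CAPITAL LETTER A
print(unicodedata.name(cyrillic_a))      # CYRILLIC CAPITAL LETTER A

# A label that looks like "example" but contains a Cyrillic small "a".
suspect = "ex\u0430mple"
print(sorted(scripts_used(suspect)))     # ['CYRILLIC', 'LATIN']
print(sorted(scripts_used("example")))   # ['LATIN']
```

   A check of this kind can only flag mixed-script labels; a label
   written entirely in one script can still be mistaken for one in
   another, which is why the surrounding context matters.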

442	3.3.  Communication across languages and cultures

444	   All of this implies that those who communicate across language and
445	   cultural groups will be required to learn, if they do not understand
446	   already, to be quite self-aware about the use of internationalized
447	   identifiers, as well as other examples of characters or languages,
448	   across those boundaries.  There will be a lower level of demands on
449	   those who communicate only in a single language and within a single
450	   culture.  This is, of course, not an issue that originated with the
451	   introduction of the Internet: it has been this way since languages
452	   and scripts started to differentiate from each other and since
453	   different cultures came into contact.  As we internationalize the
454	   network, a user of a given language that cannot be fully expressed in
455	   ASCII will always be faced with a choice between insisting on the
456	   purism of an email address local part and domain name in the script
457	   associated with the local language and maximizing the number of
458	   people who can communicate with her conveniently.  In some cases, the
459	   right answer will be "local language", in others, it will be "ASCII",
460	   and in still others it will be "maintain two addresses".  We are not
461	   required, and should not try, to make that choice for users: the
462	   users should make the best choices for their own needs, preferably
463	   after understanding the consequences of the choices.  As a community,
464	   we will need to be very clever about user interfaces.  As an example
465	   much more general than email, if someone with no ability to read
466	   Chinese characters sees a domain name written in those characters and
467	   decides she wants to copy and paste it somewhere, the copy mechanism
468	   is probably going to need to provide for both "copy the Chinese" and
469	   "convert quietly to punycode and copy that".  Either choice, by
470	   itself, will be wrong sometimes.  Users who want both to use Chinese-
471	   script domain names and to communicate outside that language, script,
472	   or culture will either learn to understand the difference and
473	   relationship or develop some good rituals that work; otherwise the
474	   network will keep slapping them in the head with failed lookups or
475	   bounced mail until they learn.  Of course, substantially any language
476	   or script could be substituted for "Chinese" in that example.
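
The two copy options described above might be sketched as follows.
This is only an illustration, assuming Python's built-in "idna" codec,
which implements the IDNA conversion of [RFC3490], including the
punycode step of [RFC3492].  The function name copy_choices and the
sample domain are hypothetical, and a real user interface would have
many more cases to handle.

```python
def copy_choices(domain: str) -> dict:
    """Return both forms that a copy mechanism might offer the user."""
    # The "idna" codec converts label by label, so ASCII labels such
    # as "example" pass through unchanged.
    ascii_form = domain.encode("idna").decode("ascii")
    return {"native": domain, "ascii": ascii_form}


choices = copy_choices("\u4e2d\u6587.example")  # two Chinese characters
print(choices["ascii"])

# The ASCII form can be converted back to the native form for display.
round_trip = choices["ascii"].encode("ascii").decode("idna")
print(round_trip == choices["native"])
```

Returning both forms, rather than picking one, mirrors the point that
either choice by itself will be wrong sometimes.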

478	3.4.  The place of internationalization in a global Internet

480	   Does that make internationalized domain names a bad idea and
481	   internationalized email addresses an even worse idea?  Globally,
482	   maybe... perhaps even probably if our exclusive focus is on global
483	   uses of the Internet.  But that is where we get back to examples
484	   similar to the Runic one.  If we have a population in an Arabic-
485	   speaking country that only reads and writes in Arabic and only wants
486	   to communicate with each other, internationalization extensions let
487	   them get themselves onto the Internet and communicate with each other
488	   and to do so without causing any harm to the rest of the Internet.
489	   It appears that is A Good Thing or at least not harmful in any
490	   significant way.  Will it help them communicate with someone who
491	   cannot read Arabic or help that person communicate with them?  Not a
492	   bit, at least in the absence of a translator who is competent in Arabic
493	   and has the right computer tools.  The alternative, stated in its
494	   most extreme form, is "everyone who really wants to be an effective
495	   user of the global Internet had better be able to function in
496	   English".  At one level, that is probably true, politically incorrect
497	   though it may be.  But, at another, it is a very different statement
498	   than requiring that everyone who wants to communicate in Amharic,
499	   with other Amharic-speakers, be forced to translate to and from
500	   English (or at least to and from a subset of ASCII characters) to
501	   manage that communication rather than being able to use their own
502	   language and (Ethiopic) script.

504	   We need to be very careful to not make interoperability (or
505	   reliability of references and the like) worse among those who can now
506	   communicate.  It does not appear that either IDNs or i18n email
507	   addresses will necessarily make things worse, but we should remain
508	   vigilant to be sure that doesn't change.  Until everyone learns good
509	   habits we may rediscover an important part of the X.400 model-in-
510	   practice: sooner or later, a non-speaker of Chinese will get a
511	   message from a Chinese colleague with a return address that is all-
512	   Chinese.  The recipient will have no hope of using it in a reply
513	   unless cut and paste works, and will not be able to reliably verify
514	   whether or not it worked.  That user (message recipient) will have to
515	   deal with the message and replying to it by selecting an out-of-band
516	   communications path --a different address or the telephone are the
517	   most likely-- to get in touch with that person and either deliver the
518	   reply over that path or use it to say "I just got something from you,
519	   if in fact it was you, and I have no possible way to reply to it as
520	   written.  So what other address or path would you like me to use?"

522	   Clearly, that would not be ideal.  But there is no ideal solution as
523	   long as people persist in speaking different languages and writing in
524	   different scripts.  It does not appear that the use of different
525	   languages and scripts is likely to stop any time soon and, in
526	   general, it is not desirable that it do so.

528	4.  Specific Impact of I18N Email Addressing

530	   As discussed in [I18Nemail-Framework], the requirement that nothing
531	   inspect or alter an email local-part other than the final delivery
532	   server (see [RFC2821]) imposes strong constraints on automatic
533	   transformations of internationalized email addresses to ASCII form.
534	   If we insist on reliable cutting and pasting, regardless of the
535	   operational character coding of mail user agents, we are probably
536	   constrained to avoid non-ASCII forms entirely: only putting the
537	   internationalized string in encoded words and leaving the address
538	   exclusively in ASCII will work in a large number of cases, but even
539	   that can fail occasionally.  So, if we try to impose a rule in which
540	   the only email addresses that are permitted are those that will
541	   always be usable globally, the consequence will be a conclusion that
542	   non-ASCII local parts are impossible.
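
The "encoded words plus ASCII address" arrangement mentioned above
can be illustrated with a short Python sketch.  It assumes the
standard library email package; the function name ascii_safe_mailbox,
the display name, and the address are hypothetical examples, and
nothing here addresses the cases in which this arrangement fails.

```python
from email.header import decode_header, make_header
from email.utils import formataddr, parseaddr


def ascii_safe_mailbox(display_name: str, ascii_address: str) -> str:
    """Build a mailbox string whose wire form is ASCII only.

    The internationalized string goes into a MIME encoded word; the
    address itself must already be ASCII (formataddr raises
    UnicodeEncodeError otherwise).
    """
    return formataddr((display_name, ascii_address), charset="utf-8")


mailbox = ascii_safe_mailbox("\u5f20\u4e09", "zhang@example.com")
print(mailbox)

# A receiving agent can recover the original display name.
encoded_name, address = parseaddr(mailbox)
decoded_name = str(make_header(decode_header(encoded_name)))
```

The address survives cutting and pasting in any coding that can carry
ASCII, while the non-ASCII name is visible only to agents that decode
the encoded word.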

544	   Unfortunately, that conclusion is a recipe for local, non-
545	   interoperable, solutions -- probably ones based on "just use our
546	   local characters and character coding" -- and the consequent de facto
547	   network fragmentation that would follow from it, as discussed above.
548	   A better approach is to adopt a more realistic set of goals, starting
549	   from the realization that people who have no need or desire to
550	   communicate outside their language or cultural group are not going to
551	   do so and then focusing on (i) permitting them to communicate as they
552	   wish without creating risks for other Internet users and (ii)
553	   providing reasonable facilities for those who do wish to communicate
554	   across language groups to do so.

556	5.  Security Considerations

558	   This document discusses a series of internationalization issues that
559	   bear on interoperability and might indirectly bear on security.  As
560	   such, it may suggest some issues that should be considered in
561	   security evaluations of internationalized protocols.  Its conclusions
562	   also reinforce the well-understood point that expanding the range of
563	   characters in which identifiers can be expressed will tend to
564	   complicate the design of security-related protocols, and user
565	   interfaces to them, that utilize such internationalized identifiers.
566	   However, it raises no new security issues in itself.

568	6.  Acknowledgements

570	   The author would like to thank Alex Zinin and Dmitry Burkov for
571	   initiating a conversation about the relationship between Internet
572	   internationalization and fragmentation.  That conversation ultimately
573	   led to this memo. ...More to be supplied...

575	7.  References

577	7.1.  Normative References

579	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
580	              Requirement Levels", RFC 2119, March 1997.

582	   [RFC2821]  Klensin, J., "Simple Mail Transfer Protocol", RFC 2821,
583	              April 2001.

585	   [RFC2978]  Freed, N. and J. Postel, "IANA Charset Registration
586	              Procedures", BCP 19, RFC 2978, October 2000.

588	   [RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello,
589	              "Internationalizing Domain Names in Applications (IDNA)",
590	              RFC 3490, March 2003.

592	   [RFC3492]  Costello, A., "Punycode: A Bootstring encoding of Unicode
593	              for Internationalized Domain Names in Applications
594	              (IDNA)", RFC 3492, March 2003.

596	7.2.  Informative References

598	   [I18Nemail-Framework]
599	              Klensin, J. and Y. Ko, "Overview and Framework for
600	              Internationalized Email",
601	              draft-klensin-ima-framework-00.txt (work in progress),
602	              September 2005, <http://www.ietf.org/internet-drafts/
603	              draft-klensin-ima-framework-00.txt>.

605	   [IDN-Nextsteps]
606	              Klensin, J. and P. Faltstrom, "Review and Recommendations
607	              for Internationalized Domain Names (IDN)",
608	              draft-iab-idn-nextsteps-03.txt (work in progress),
609	              February 2006, <http://www.ietf.org/internet-drafts/
610	              draft-iab-idn-nextsteps-03.txt>.

612	   [ISO.646.1991]
613	              International Organization for Standardization,
614	              "Information technology - ISO 7-bit coded character set
615	              for information interchange", ISO Standard 646, 1991.

617	   [Klensin-emailaddr]
618	              Klensin, J., "Internationalization of Email Addresses",
619	              draft-klensin-emailaddr-i18n-03 (work in progress),
620	              July 2005.

622	   [RFC1341]  Borenstein, N. and N. Freed, "MIME (Multipurpose Internet
623	              Mail Extensions): Mechanisms for Specifying and Describing
624	              the Format of Internet Message Bodies", RFC 1341,
625	              June 1992.

627	   [RFC2277]  Alvestrand, H., "IETF Policy on Character Sets and
628	              Languages", BCP 18, RFC 2277, January 1998.

630	   [RFC3467]  Klensin, J., "Role of the Domain Name System (DNS)",
631	              RFC 3467, February 2003.

633	   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
634	              Resource Identifier (URI): Generic Syntax", STD 66,
635	              RFC 3986, January 2005.

637	   [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
638	              Identifiers (IRIs)", RFC 3987, January 2005.

640	   [UTR36]    Davis, M. and M. Suignard, "Unicode Technical Report #36:
641	              Unicode Security Considerations", November 2005,
642	              <http://www.unicode.org/draft/reports/tr36/tr36.html>.

644	              Working Draft for Proposed Update

646	   [UTR39]    Davis, M. and M. Suignard, "Unicode Technical Standard #39
647	              (proposed): Unicode Security Considerations", July 2005,
648	              <http://www.unicode.org/draft/reports/tr39/tr39.html>.

650	              Working Draft for Proposed Draft

652	Author's Address

654	   John C Klensin
655	   1770 Massachusetts Ave, #322
656	   Cambridge, MA  02140
657	   USA

659	   Phone: +1 617 491 5735
660	   Email: john-ietf@jck.com

662	Intellectual Property Statement

664	   The IETF takes no position regarding the validity or scope of any
665	   Intellectual Property Rights or other rights that might be claimed to
666	   pertain to the implementation or use of the technology described in
667	   this document or the extent to which any license under such rights
668	   might or might not be available; nor does it represent that it has
669	   made any independent effort to identify any such rights.  Information
670	   on the procedures with respect to rights in RFC documents can be
671	   found in BCP 78 and BCP 79.

673	   Copies of IPR disclosures made to the IETF Secretariat and any
674	   assurances of licenses to be made available, or the result of an
675	   attempt made to obtain a general license or permission for the use of
676	   such proprietary rights by implementers or users of this
677	   specification can be obtained from the IETF on-line IPR repository at
678	   http://www.ietf.org/ipr.

680	   The IETF invites any interested party to bring to its attention any
681	   copyrights, patents or patent applications, or other proprietary
682	   rights that may cover technology that may be required to implement
683	   this standard.  Please address the information to the IETF at
684	   ietf-ipr@ietf.org.

686	Disclaimer of Validity

688	   This document and the information contained herein are provided on an
689	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
690	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
691	   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
692	   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
693	   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
694	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

696	Copyright Statement

698	   Copyright (C) The Internet Society (2006).  This document is subject
699	   to the rights, licenses and restrictions contained in BCP 78, and
700	   except as set forth therein, the authors retain all their rights.

702	Acknowledgment

704	   Funding for the RFC Editor function is currently provided by the
705	   Internet Society.