Network Working Group                                         J. Klensin
Internet-Draft                                         February 26, 2006
Expires: August 30, 2006


 Internationalization in Internet Applications: Issues, Tradeoffs, and
                            Email Addresses
                 draft-klensin-ima-constraints-00.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on August 30, 2006.

Copyright Notice

   Copyright (C) The Internet Society (2006).

Abstract

   The discussions of internationalized email addresses in the IETF have
   led to a number of stated requirements.  This document identifies
   some of those requirements in the context of general issues of
   internationalization of Internet name spaces, demonstrates that the
   combination of all of the requirements that appear reasonable on
   first glance adds up to a null solution space, and then suggests a
   different model for proceeding.


Klensin                  Expires August 30, 2006                [Page 1]

Internet-Draft           I18N Email Constraints            February 2006


Table of Contents

   1.  Introduction
   2.  Environment for Internationalization and Fragmentation Risks
     2.1.  Climate for Internationalization: The DNS History
     2.2.  Technology
   3.  Consequences and Implications
     3.1.  Choosing and mixing scripts and languages
     3.2.  Confusable characters and communications accuracy
     3.3.  Communication across languages and cultures
     3.4.  The place of internationalization in a global Internet
   4.  Specific Impact of I18N Email Addressing
   5.  Security Considerations
   6.  Acknowledgements
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Author's Address
   Intellectual Property and Copyright Statements


1.  Introduction

   In general, internationalization has been approached in the IETF on
   the assumption that, if one can get the character sets and perhaps
   language tags right, other issues will take care of themselves.  An
   "internationalization considerations" section is strongly suggested
   for RFCs (see RFC 2277, Section 6 [RFC2277], and note that Section
   3.1 of that document requires UTF-8 support in all protocols, so all
   protocols, and hence all protocol documents, "deal with
   internationalization issues at all"), but there are no real
   guidelines about what should be in it and the requirement has not
   always been enforced.  There are also some additional requirements,
   e.g., for UTF-8 support [RFC2277].  Particular protocols have gone
   beyond these guidelines.  In particular, the standards for
   internationalized domain names, IDNA [RFC3490], use Unicode as a base
   but utilize their own encoding of Unicode, punycode [RFC3492].  Those
   standards carefully avoid identification of languages, since domain
   names inherently consist of more or less arbitrary strings, not
   "words" or other language elements.

   That body of work generally ignores an important observation and its
   consequences.  When user-chosen words, names, and non-ASCII scripts
   are used at the applications layer, users will often treat them as
   language elements having meaning, and often pronunciations, in those
   languages, not merely as strings of characters.  The assumptions of
   meaning or pronunciation, in turn, will often introduce age-old
   problems of cross-language reading and understanding into the design
   of applications, or applications protocols, that are intended to work
   globally: if one person cannot read or understand the language of
   another, the fact imposes limitations on communication that, in
   general, cannot be solved by protocol design.  In the most extreme
   cases, differences in the languages and character sets that people
   find normal and convenient impose practical limits on
   interoperability: choices must be made between compatibility and
   convenience within a linguistic and cultural community, on the one
   hand, and global interoperability, on the other, which will
   inevitably be less convenient for some groups and cultures than for
   others.  In some cases, solutions are feasible that make things
   convenient within a cultural or linguistic group and provide a
   less-convenient mechanism for communicating between groups; in
   others, even more difficult choices will need to be made.
   And, in some cases (fortunately a gradually declining number), the
   realities of character codings, presentation, and operating systems
   make obvious solutions to problems impractical.

   While these issues have appeared in the context of internationalized
   domain names and in other applications, recent work to permit non-
   ASCII local parts of electronic mail addresses without violating the
   constraints of the mail protocols themselves has brought several of
   the issues into better focus.  This document discusses some of the
   issues and problems -- both technical and in terms of user
   expectations -- in general form and then reviews some of the
   implications for email and other protocols that impose their own
   constraints on strings and their interpretation.

   While changes in lower-level Internet protocols and interfaces must
   almost always occur at the protocol level (i.e., be visible "on the
   wire" -- see below), there are at least three choices for
   internationalization at the applications layer.  Picking the right
   one requires some understanding of how the features will be used, the
   degree to which localization will be appropriately overlaid on the
   basic internationalization features, and some general wisdom about
   design.  The option that is obvious at first is not necessarily the
   best choice.  The options are:

   o  Protocol changes, i.e., features that appear "on the wire" in the
      interactions between client and server or between peer hosts.  The
      internationalization provisions for MIME body parts [RFC1341] are
      examples of protocol-level mechanisms, since they appear in the
      client-server interactions.
   o  Client-side changes, i.e., features that have characteristics
      similar to protocol ones, but that are implemented entirely on the
      client, without "on the wire" visibility.  Domain name
      internationalization using the IDNA specification [RFC3490] is an
      example of a strictly client-side mechanism since non-ASCII
      characters do not appear on the wire and the DNS server is not
      required to be aware that internationalized names are being used.
   o  Adding a new layer or new abstraction, i.e., accomplishing
      internationalization or localization not by somehow
      internationalizing an existing protocol or introducing a
      replacement protocol, but by adding new facilities that rest on
      top of an unmodified non-internationalized protocol.  Localization
      facilities might also be added as a new layer on top of an
      internationalized lower layer.  Various efforts to add "keywords"
      or other "above DNS search" mechanisms, the standardization of an
      internationalized version of the URI [RFC3986] as an IRI
      [RFC3987], and similar arrangements are "new layer" approaches.
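   As a rough illustration of the first, protocol-level, option, the
   sketch below uses Python's standard email package to build a MIME
   text body part.  The charset label is carried in the Content-Type
   header field, i.e., "on the wire", so a receiver can decode the body
   without out-of-band knowledge.  The message text and the choice of
   ISO 8859-1 are assumptions made only for the example.

```python
from email.mime.text import MIMEText

# Protocol-level internationalization: the "charset" parameter is part of
# the transmitted message, so the receiver learns the coding in-band.
body = "Hilsen fra Torbj\u00f8rn"      # U+00F8 is the Norwegian o-with-stroke
part = MIMEText(body, "plain", "iso-8859-1")

# The label travels with the data in the Content-Type header field.
print(part["Content-Type"])

# A receiver undoes the transfer encoding, then applies the labelled charset.
raw = part.get_payload(decode=True)   # bytes after quoted-printable decoding
label = part.get_content_charset()    # "iso-8859-1", read from the header
print(raw.decode(label))              # Hilsen fra Torbjørn
```

   A strictly client-side mechanism such as IDNA leaves nothing of this
   kind on the wire: the server only ever sees ASCII.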


2.  Environment for Internationalization and Fragmentation Risks

   In looking at the combination of efforts to internationalize the
   Internet, especially at the protocol level, we encounter two large
   groups of issues.  One has to do with the social, cultural, and
   political climate associated with the making of any decision about
   internationalization in recent years and the other is about the
   technology.  The subsections that follow address both since, in
   practice, it is impossible to deal with them separately.  In
   particular, as this document illustrates, if one examines the
   technical issues, the desire to avoid constraints on global
   end-to-end communications, and the need to minimize the risks of
   incorrect identification of destination hosts or users, the
   conclusion is likely to be that almost any internationalization at
   the protocol
   level is a bad idea.  On the other hand, if the social and cultural
   context is examined, it becomes clear that avoiding any
   internationalization at the protocol level will lead to a different
   type of fragmentation and, if that context is examined alone, demands
   will arise for protocol changes that are not plausible in practice.

2.1.  Climate for Internationalization: The DNS History

   The biggest potential for network fragmentation due to introduction
   of mutually-incomprehensible scripts occurred with the development of
   domain names that are not intended to be presented as ASCII strings.
   There was considerable resistance in the technical community to that
   set of decisions based on the belief that domain names were
   ultimately protocol elements that should remain, at least for
   application purposes, in a restricted subset of ASCII (a subset that
   is compatible with ISO 646 BV [ISO.646.1991]).  At least part of that
   community also concluded that internationalization should occur in a
   protocol layer closer to the user, i.e., "above the DNS" [RFC3467].
   This layer might be thought of as the "presentation layer" of the
   classical OSI model although the analogy is not exact.  Those who
   resisted DNS changes suggested that it might make sense to
   distinguish what actions were taken in the DNS from a presentation
   layer in which some new name spaces or resource identifiers might
   occur.  In that context, URIs [RFC3986], with their potentially
   elaborate syntax, are no one's idea of "user friendly" even if one
   ignores the desire for non-ASCII scripts entirely.  The
   internationalized form, IRIs [RFC3987], solves part of the non-ASCII
   script problem, but is really no better: IRIs permit
   internationalization of the strings that make up URIs, but do not
   address the complexity of the syntax or the ASCII syntax elements.
   Such a presentation layer could make more culturally-reasonable forms
   visible to the user while preserving clear layering over the
   fundamental URI types and domain names that would remain unchanged.
   That model would provide at least the potential for good localization
   while preserving a common script, syntax, and set of conventions for
   dealing with the actual elements of the network.

   Although the idea of layering internationalization on top of an ASCII
   protocol substrate seems to come back each time an application issue
   is examined carefully, it has not gained significant traction in
   practice other than as, e.g., DNS alternatives.  Hence, the argument
   has been lost, several times and in several different ways.  It
   became clear that, if the IETF had not provided some rational and
   standardized ways to represent internationalized (non-ASCII) domain
   names, we would have ended up with chaos -- different coded character
   sets in different zones with some of them probably treated as binary
   labels.  We would see some shift-JIS form in Japan, GB forms in
   China, ISO 8859-1 in Western Europe and other ISO 8859 variations in
   some other areas, and unpredictable other variations in the rest of
   the world.  Worse, the only way to determine which particular coded
   character set (CCS) was being used would be out of band knowledge,
   since none of the people promoting those approaches came forward with
   any realistic plans for how to label "charsets" (essentially a
   combination of a script and a coding system for those who have not
   followed the MIME version of that discussion; see [RFC2978] and
   [RFC2277] for more precise definitions, further discussion, and
   references) in the DNS.  Indeed, in spite of the standard, we have
   already seen the beginnings of fragmenting developments in some
   domains along with special "improved, enhanced, and
   internationalized" (and not quite interoperable) DNS servers being
   offered by some companies.

   So, despite some misgivings, the IETF defined IDNs via IDNA [RFC3490]
   (including exclusive use of Unicode as the defining character set).
   From the standpoint of this discussion, the interesting thing about
   IDNA is that it doesn't change the DNS at all.  It is a strictly
   client-side protocol, with Unicode strings being pushed through a
   canonicalization process and then transformed into an "ASCII-
   compatible" form (called "punycode") that, to the DNS and
   applications that have not been upgraded, looks like (and is)
   hostname-format names, i.e., ASCII letters, digits and hyphens.  It
   was done that way because of a belief that the coding system would
   lead to very rapid deployment without any negative impact on systems
   or applications that had not been upgraded.  Its most passionate
   advocates were convinced that, once there was wide deployment, no one
   would ever see the internal coding.
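   Python's built-in "idna" codec follows the RFC 3490 ToASCII and
   ToUnicode operations, so it can serve as a small illustration of the
   client-side conversion described above.  It also shows that the
   punycode form can always be typed in place of the Unicode one.  The
   host name is invented for the example.

```python
# Client-side internationalization: only the ASCII-compatible ("xn--")
# form is ever sent to the DNS; the Unicode form lives in the application.
unicode_name = "b\u00fccher.example"     # "bücher.example", an invented name

ace_name = unicode_name.encode("idna")   # nameprep + punycode, label by label
print(ace_name)                          # b'xn--bcher-kva.example'

# Un-upgraded software sees an ordinary letters-digits-hyphens host name.
print(ace_name.isascii())                # True

# A user who cannot read or type the Unicode form can still type the ACE
# string; an upgraded client converts it back for display.
print(ace_name.decode("idna"))           # bücher.example
```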

   From the standpoint of global interoperability, the good news is that
   they were wrong -- we have some other problems to cope with, but one
   of them is not "you can't get there because you can't read or type
   the string".  If the application permits you to get to it, you can
   always access and type the punycode string rather than whatever might
   show up in characters you can't read, can't type, and maybe can't
   even render.  Of course, this requires that all applications support
   entry of Roman characters, even if such entry is not convenient.

   The choice of Unicode was, however, very important, not because it is
   wonderful as a character set, but because it avoids the issues of
   identifying what CCS is being used and, the WG hoped, of picking
   which characters would be valid and which ones would not be.


   Avoiding determining which characters should be valid and which ones
   should not has also been less successful than one might have hoped;
   both the IAB (see [IDN-Nextsteps]) and the Unicode Consortium (see
   [UTR36] and [UTR39]) are struggling with approaches to that problem
   for which they did not foresee a need when IDNA was adopted.

   But, ultimately, it is important to remember as we talk about any of
   this that the choice was never between "figure out some way to
   internationalize the DNS" and "don't do it because it was a bad
   idea".  The choice was only between doing it in a global, standard
   way that was fairly safe as far as DNS operations were concerned and
   ending up with a collection of different mechanisms that would not
   interoperate cleanly and unambiguously within a single domain name
   system.

2.2.  Technology

   As the result of these factors and tensions, IDNA became a completely
   client-side IDN protocol.  Several of the worst fears of the
   pessimists have come true: we have confusion over look-alike
   characters, we have the potential to receive and see characters we
   can't read or type, the Unicode Consortium's beliefs about how widely
   Unicode is available and about smooth conversions between codings
   are, at best, very controversial, some implementers have "improved"
   on the standard tables, and so on.  Email MIME textual body parts
   should be safe against character set problems due to the presence of
   the "charset" parameter.  However, in practice, problems in which one
   character is mapped into an entirely different one are fairly
   routine, most notably as the result of forwarding or otherwise
   including all or part of one message in a body part that is
   constructed locally according to different character set conventions.
   Copying of text that was developed in one character coding context
   and pasting it into another is not completely reliable for related
   reasons.  These problems are symptomatic of those we will certainly
   encounter in the future as the Internet becomes increasingly
   international and multilingual.  Probably the worst is yet to come.
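   The kind of silent substitution described above is easy to
   reproduce.  In this sketch, the message text and both character sets
   are assumptions chosen for the example.  A body written in ISO
   8859-1 is copied byte for byte into a forwarded part that a local
   agent labels as ISO 8859-5.  All the ASCII characters survive, but
   U+00F8 becomes a different character that happens to look like "j".

```python
# An original body part, correctly labelled as ISO 8859-1.
original_text = "Hilsen fra Torbj\u00f8rn"
original_bytes = original_text.encode("iso-8859-1")   # U+00F8 -> byte 0xF8

# A forwarding agent copies those bytes into a new body part but labels it
# according to its own local convention (assumed here to be ISO 8859-5).
forwarded_label = "iso-8859-5"

# No error is raised: every byte value is defined in ISO 8859-5, so the
# recipient's software silently shows a different character.
seen_by_recipient = original_bytes.decode(forwarded_label)
print(seen_by_recipient)
print(hex(ord(seen_by_recipient[-3])))   # 0x458, CYRILLIC SMALL LETTER JE
```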

   As was the case with the pre-MIME internationalized mail body
   approaches and with the development of IDNA, the local solutions
   --the ones that are not interoperable globally-- will work, and work
   well, within the relevant cultural and linguistic communities.
   Realistically, the IETF cannot ignore the issues and problems and
   either hope they will go away or decide to do nothing because the
   problems will cause disruption.  To do so is to guarantee that local
   solutions will be developed, that people who use them will be
   unable to communicate internationally (at least with the same tools
   they use locally) and that people outside their communities will be
   unable to communicate with them.


   The key question is what the difficulties with the global solutions
   or the development of local solutions actually do to
   interoperability.  The Internet community is probably in for a bad
   time as reality catches up with many fantasies and delusions about
   how systems and people work, but there is some reason for optimism
   about the long term.  To take one (admittedly extreme) case as an
   example, suppose one user's primary language were written only in Old
   Futhark Runic and that user did not read or speak any other
   languages or write any other script.  Assume further, stretching the
   imagination a bit, that the only keyboards available to that user
   have only runes on them.  That user would have some serious problems
   in communications.  In particular, she would have been dead for
   centuries: as far as is known, no living person really knows how
   those languages and scripts worked (although there is a lot of
   speculation) and it is unclear whether some of the Unicode decisions
   in coding the runes are actually correct, much less optimal.  She is
   also not on the Internet in any significant way: the hypothetical
   keyboard does not exist, there is no way to type a URL or email
   address on it, etc.  So, for that user, the net effect of permitting
   IDNs in Runic, which IDNA now permits, is going to be just about zero
   except maybe in terms of helping with her cultural pride.  More
   important, if she can find a few other living exclusive users of the
   relevant scripts and languages, her ability to use those scripts and
   languages in either content or domain names _might_ enhance their
   ability to communicate with each other, but they certainly are not
   going to increase or decrease anyone else's ability to communicate
   with any of them.

   On the other hand, suppose a different user can speak, read, and
   write Russian as well as Old Viking Runic, but nothing else.  If he
   wants to communicate on the Internet, he can send notes (and use
   domain names, etc.) that some reasonably large number of people will
   be able to read easily, and a larger number will be able to get
   through with a struggle, but, for anyone who does not read Russian or
   recognize Cyrillic characters, he might as well have used Runic --
   the symbols are useless either way.  This problem is, of course,
   centuries old.  IDNs don't make it any worse although they don't help
   either.

   While Runic is a far-fetched example, some of the African languages
   and scripts are not.  And, unlike Runic, some of those African
   scripts have not even been coded into Unicode yet.


3.  Consequences and Implications

   The Internet community is probably in for a nasty learning curve, but
   things should work out as people accept reality.  Within a language
   and cultural community, IDNs --and, even more important, email
   addresses with non-ASCII characters in the local parts-- are almost
   certain to be very important, especially among groups of people who
   are not comfortable with Roman-based characters.  They are going to
   prove helpful just as the ability to use native/local characters in
   content has proven helpful.  That helpfulness is going to be
   important to spreading accessibility to the Internet into some
   population groups (although, until there is a great deal of content
   in their languages, probably not as much as some of the IDN advocates
   around WSIS and ICANN have believed).  But, for communication between
   different language and cultural groups, we are going to find that we
   need to do what people have done through history, even before
   computer networking entered the equation: we will have to figure out,
   probably out of band, what languages and scripts we share with
   particular correspondents and then pick a member of that set.

3.1.  Choosing and mixing scripts and languages

   The choice of a common and shared script or language is going to be
   far more complicated for many cases than any of our existing content-
   negotiation ideas anticipate.  We will need to remember that some
   people may be able to understand a spoken language but not read it in
   some or all of the scripts in which it is normally written and that,
   especially for alphabetic scripts, the ability to read the script
   (and even to crudely pronounce the sounds it implies) does not imply
   the ability to understand any of the languages normally written in
   it.  These differences may relate to the ability to recognize
   characters in a table, use a keyboard, recognize characters that
   might appear in an IRI or email address, and so on.  Ugly and nasty
   as punycode may be, we will need to pass domain names around in it
   unless we know in advance that our readers will know the relevant
   scripts well and be able to type them, cut and paste them accurately,
   and so on.  If we choose to use non-ASCII email local parts, we will
   discover that we need to keep ASCII alternative aliases around for
   communicating more broadly and that those ASCII alternatives will
   not, in the general case, be derivable algorithmically.  Once we get
   the email internationalization situation under control, nothing
   should prevent a speaker of Norwegian, say Torbjorn Torbjornson (with
   slashes across the second "o" in each name), from having an email
   address of torbjorn@example.com (U+00F8 as the sixth character, i.e.,
   with a slash across the "o") but, if he and a Russian speaker want to
   communicate with each other, he would be well-advised to retain the
   ability to receive mail at the all-ASCII torbjorn@example.com (or
   some other ASCII address), especially if the Russian reader's
   software is going to silently transform the U+00F8 character into
   something that looks like "j" (in fact CYRILLIC SMALL LETTER JE,
   U+0458), which is exactly what getting ISO 8859-1 and ISO 8859-5
   confused would produce.  And, if his
   alternative is not torbjorn@example.com but
   torbjorn@torbjorn.example.com (with a slash over the sixth character
   in the domain name), then the Russian users or their software must be
   able to generate and use torbjorn@xn--torbjrn-u1a.example.com
   instead.
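   The asymmetry in this example can be sketched in a few lines.  A
   client can algorithmically derive the ASCII-compatible form of the
   domain part.  It has no comparable way to derive an ASCII alternative
   for a non-ASCII local part, which only the final delivery server may
   interpret.  The function name "ascii_form" is hypothetical.  The
   simple split on the last "@" is a simplifying assumption that ignores
   quoted local parts and the rest of the full address syntax.

```python
def ascii_form(address: str) -> str:
    """Hypothetical helper: ASCII-compatible form of local@domain.

    The domain is converted with the RFC 3490 ToASCII operation (Python's
    built-in "idna" codec).  The local part is opaque to everyone except
    the final delivery server, so it is never transformed.
    """
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("not a local@domain address")
    if not local.isascii():
        # No algorithm yields the ASCII alias the owner may (or may not)
        # have set up; it has to be learned out of band.
        raise ValueError("non-ASCII local part has no derivable ASCII form")
    return local + "@" + domain.encode("idna").decode("ascii")


print(ascii_form("torbjorn@torbj\u00f8rn.example.com"))
# torbjorn@xn--torbjrn-u1a.example.com

try:
    ascii_form("torbj\u00f8rn@example.com")
except ValueError as err:
    print("cannot convert:", err)
```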

   It may be useful to note that "have an alternate address available
   and let people know" bears a strong resemblance to the traditional
   two-sided Asian business cards.  The Chinese, Korean, or Japanese
   characters on the front may be the correct ones but, if the owner of
   the card wants to have communications with illiterate westerners, the
   Roman characters on the back will rapidly become very important.  Of
   course, many people in those populations make that choice
   deliberately: their business cards do not have Roman characters on
   them, and they have no expectation of communicating with people who
   do not read and speak the relevant languages.

3.2.  Confusable characters and communications accuracy

   The common example of similarity between the printed form of a
   Cyrillic "A" and a Roman one raises issues similar to the Norwegian
   example above.  If one sees the character in a domain name in context
   with other Cyrillic (or Roman) characters, it will probably lead to
   the right guess unless someone is being deliberately deceptive or
   cute.  If the context is not available, a good guess might still be
   possible based on whether the character appears on a sign in a rural
   community in Russia or the US (in Moscow or New York, one would
   probably need to know about specific neighborhoods and the guess
   would be less reliable).  Reducing the odds of a deception based on
   confusion between the characters that some would consider similar in
   appearance is a topic of active discussion, mostly about what DNS
   registries should be permitted to register.  But, if the person
   writing that message out is really concerned about accuracy, then
   either some explicit hints or, for domain names, the punycode string,
   had best appear on the business card or sign... if they do not, the
   negative reinforcement from confused and irritated users will
   gradually get the message across that they should.
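   As a rough illustration of why context matters, the sketch below
   flags labels that mix letters from more than one script.  It uses the
   first word of each character's Unicode name as a crude stand-in for
   its script.  That shortcut is an assumption made for brevity, not the
   mixed-script or confusable detection described in [UTR39].  The
   function name "scripts_in" and the example labels are hypothetical.

```python
import unicodedata


def scripts_in(label: str) -> set[str]:
    """Hypothetical helper: rough set of scripts used by a label's letters.

    The first word of each letter's Unicode name ("LATIN", "CYRILLIC", ...)
    stands in for its script; digits and hyphens are ignored.  This is a
    crude heuristic, not the UTR #39 mixed-script algorithm.
    """
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


print("\u0410" == "A")             # False: CYRILLIC vs LATIN CAPITAL LETTER A

plain = "example"
mixed = "ex\u0430mple"             # third letter is CYRILLIC SMALL LETTER A
print(sorted(scripts_in(plain)))   # ['LATIN']
print(sorted(scripts_in(mixed)))   # ['CYRILLIC', 'LATIN']

# The punycode form exposes the difference to anyone who looks at it.
print(plain.encode("idna"))        # b'example'
print(mixed.encode("idna"))        # an "xn--" label, not b'example'
```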

3.3.  Communication across languages and cultures

   All of this implies that those who communicate across language and
   cultural groups will be required to learn, if they do not understand
   already, to be quite self-aware about the use of internationalized
   identifiers, as well as other examples of characters or languages,
   across those boundaries.  There will be a lower level of demands on
   those who communicate only in a single language and within a single
   culture.  This is, of course, not an issue that originated with the
   introduction of the Internet: it has been this way since languages
   and scripts started to differentiate from each other and since
   different cultures came into contact.  As we internationalize the
   network, a user of a given language that cannot be fully expressed in
   ASCII will always be faced with a choice between insisting on the
   purism of an email address local part and domain name in the script
   associated with the local language and maximizing the number of
   people who can communicate with her conveniently.  In some cases, the
   right answer will be "local language", in others, it will be "ASCII",
   and in still others it will be "maintain two addresses".  We are not
   required, and should not try, to make that choice for users: the
   users should make the best choices for their own needs, preferably
   after understanding the consequences of the choices.  As a community,
   we will need to be very clever about user interfaces.  As an example
   much more general than email, if someone with no ability to read
   Chinese characters sees a domain name written in those characters and
   decides she wants to copy and paste it somewhere, the copy mechanism
   is probably going to need to provide for both "copy the Chinese" and
   "convert quietly to punycode and copy that".  Either choice, by
   itself, will be wrong sometimes.  Users who both want to use Chinese-
   script domain names and communicate outside that language or script
   or culture are going to either learn to understand the difference and
   relationship, or develop some good rituals that work, or the network
   will keep slapping them in the head with failed lookups or bounced
   mail until they do learn.  Of course, substantially any language or
   script could be substituted for "Chinese" in that example.
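   A "copy" command of the kind suggested above could, for instance,
   give both representations to the user.  The sketch assumes a
   hypothetical "clipboard_forms" helper and an invented Chinese-script
   host name.  How a real user interface would present the choice is
   left open.

```python
from typing import NamedTuple


class ClipboardForms(NamedTuple):
    """Hypothetical pair of representations offered by a "copy" command."""
    as_displayed: str   # exactly what the user sees (here, Chinese script)
    ascii_form: str     # ACE ("xn--") form that any keyboard can reproduce


def clipboard_forms(host_name: str) -> ClipboardForms:
    # RFC 3490 ToASCII via Python's built-in "idna" codec.  Neither form is
    # right for every destination, so both are kept for the user to choose.
    ace = host_name.encode("idna").decode("ascii")
    return ClipboardForms(as_displayed=host_name, ascii_form=ace)


forms = clipboard_forms("\u4e2d\u6587.example")   # invented Chinese-script name
print(forms.as_displayed)   # the Chinese characters, unchanged
print(forms.ascii_form)     # an "xn--" label followed by ".example"
```

   Returning an immutable pair, rather than silently picking one form,
   reflects the point that either choice by itself will sometimes be
   wrong.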

3.4.  The place of internationalization in a global Internet

   Does that make internationalized domain names a bad idea and
   internationalized email addresses an even worse idea?  Globally,
   maybe... perhaps even probably if our exclusive focus is on global
   uses of the Internet.  But that is where we get back to examples
   similar to the Runic one.  If we have a population in an Arabic-
   speaking country that only reads and writes in Arabic and only wants
   to communicate with each other, internationalization extensions let
   them get themselves onto the Internet and communicate with each other
   and to do so without causing any harm to the rest of the Internet.
   It appears that is A Good Thing or at least not harmful in any
   significant way.  Will it help them communicate with someone who
   cannot read Arabic or help that person communicate with them?  Not a
   bit, at least in the absence of a translator who is competent in Arabic
   and has the right computer tools.  The alternative, stated in its
   most extreme form, is "everyone who really wants to be an effective
   user of the global Internet had better be able to function in
   English".  At one level, that is probably true, politically-incorrect
   though it may be.  But, at another, it is a very different statement
   than requiring that everyone who wants to communicate in Amharic,
   with other Amharic-speakers, be forced to translate to and from
   English (or at least to and from a subset of ASCII characters) to
   manage that communication rather than being able to use their own
   language and (Ethiopic) script.

   We need to be very careful not to make interoperability (or the
   reliability of references and the like) worse among those who can
   now communicate.  It does not appear that either IDNs or i18n email
   addresses will necessarily make things worse, but we should remain
   vigilant to ensure that this remains true.  Until everyone learns
   good habits, we may rediscover an important part of the X.400
   model-in-practice: sooner or later, a non-speaker of Chinese will get
   a message from a Chinese colleague with a return address that is
   all-Chinese.  The recipient will have no hope of using it in a reply
   unless cut and paste works, and will not be able to reliably verify
   whether or not it did.  That user (the message recipient) will have
   to deal with the message, and with replying to it, by selecting an
   out-of-band communications path (a different address or the
   telephone being the most likely) to get in touch with the sender and
   either deliver the reply over that path or use it to say "I just got
   something from you, if in fact it was you, and I have no possible way
   to reply to it as written.  So what other address or path would you
   like me to use?"

   Clearly, that would not be ideal.  But there is no ideal solution as
   long as people persist in speaking different languages and writing in
   different scripts.  It does not appear that the use of different
   languages and scripts is likely to stop any time soon and, in
   general, it is not desirable that it do so.


4.  Specific Impact of I18N Email Addressing

   As discussed in [I18Nemail-Framework], the requirement that nothing
   inspect or alter an email local-part other than the final delivery
   server (see [RFC2821]) imposes strong constraints on automatic
   transformations of internationalized email addresses to ASCII form.
   If we insist on reliable cutting and pasting, regardless of the
   operational character coding of mail user agents, we are probably
   constrained to avoid non-ASCII forms entirely: placing the
   internationalized string only in encoded words, while keeping the
   address itself exclusively in ASCII, will work in a large number of
   cases, but even that can occasionally fail.  So, if we try to impose
   a rule under which the only permitted email addresses are those that
   will always be usable globally, we will be forced to conclude that
   non-ASCII local parts are impossible.
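
   As an illustration of the ASCII-only approach described above, the
   following Python sketch (hypothetical helpers, not part of any
   specification) keeps the internationalized text only in a MIME
   encoded word (as defined in RFC 2047) and leaves the address itself
   in ASCII.  It assumes that applying IDNA ToASCII [RFC3490] to the
   domain part is acceptable, using Python's built-in "idna" codec,
   which implements that IDNA 2003 conversion.  Because only the final
   delivery server may interpret the local-part, the sketch refuses,
   rather than attempts, to convert a non-ASCII local-part.  The
   function names and the example address are invented for
   illustration.

```python
from email.header import Header


def to_ascii_address(local_part: str, domain: str) -> str:
    """Build an all-ASCII addr-spec, or raise if that is not possible.

    Hypothetical helper.  The domain is converted with the stdlib "idna"
    codec (IDNA 2003 ToASCII, RFC 3490).  The local-part is treated as
    opaque: only the final delivery server may interpret it, so it is
    never transformed here.
    """
    if not local_part.isascii():
        raise ValueError("non-ASCII local-part; no safe ASCII form exists")
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local_part}@{ascii_domain}"


def make_from_header(display_name: str, local_part: str, domain: str) -> str:
    """Return a From: header value that is pure ASCII on the wire.

    Hypothetical helper.  The internationalized display name goes into
    an RFC 2047 encoded word; the address itself must already be
    expressible in ASCII.
    """
    address = to_ascii_address(local_part, domain)
    encoded_name = Header(display_name, "utf-8").encode()
    return f"{encoded_name} <{address}>"


if __name__ == "__main__":
    # Invented example: Chinese display name, IDN domain, ASCII local-part.
    print(make_from_header("张三", "zhang", "bücher.example"))
    # A non-ASCII local-part cannot be converted without breaking opacity.
    try:
        to_ascii_address("张三", "example.com")
    except ValueError as err:
        print("rejected:", err)
```

   Run as a script, it should print an encoded word followed by
   "<zhang@xn--bcher-kva.example>", and then report that the
   all-Chinese local-part was rejected.  Rejecting the local-part,
   rather than transliterating or Punycode-encoding it, reflects the
   constraint cited above: no intermediate agent may alter it, so a
   sending agent has no automatic ASCII fallback to offer.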

   Unfortunately, that conclusion is a recipe for local, non-
   interoperable, solutions -- probably ones based on "just use our
   local characters and character coding" -- and the consequent de facto
   network fragmentation that would follow from it, as discussed above.
   A better approach is to adopt a more realistic set of goals, starting
   from the realization that people who have no need or desire to
   communicate outside their language or cultural group are not going to
   do so and then focusing on (i) permitting them to communicate as they
   wish without creating risks for other Internet users and (ii)
   providing reasonable facilities for those who do wish to communicate
   across language groups to do so.


5.  Security Considerations

   This document discusses a series of internationalization issues that
   bear on interoperability and might indirectly bear on security.  As
   such, it may suggest some issues that should be considered in
   security evaluations of internationalized protocols.  Its conclusions
   also reinforce the well-understood point that expanding the range of
   characters in which identifiers can be expressed will tend to
   complicate the design of security-related protocols, and user
   interfaces to them, that utilize such internationalized identifiers.
   However, it raises no new security issues in itself.


6.  Acknowledgements

   The author would like to thank Alex Zinin and Dmitry Burkov for
   initiating a conversation about the relationship between Internet
   internationalization and fragmentation.  That conversation ultimately
   led to this memo. ...More to be supplied...


7.  References

7.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2821]  Klensin, J., "Simple Mail Transfer Protocol", RFC 2821,
              April 2001.

   [RFC2978]  Freed, N. and J. Postel, "IANA Charset Registration
              Procedures", BCP 19, RFC 2978, October 2000.

   [RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello,
              "Internationalizing Domain Names in Applications (IDNA)",
              RFC 3490, March 2003.

   [RFC3492]  Costello, A., "Punycode: A Bootstring encoding of Unicode
              for Internationalized Domain Names in Applications
              (IDNA)", RFC 3492, March 2003.

7.2.  Informative References

   [I18Nemail-Framework]
              Klensin, J. and Y. Ko, "Overview and Framework for
              Internationalized Email",
              draft-klensin-ima-framework-00.txt (work in progress),
              September 2005, <http://www.ietf.org/internet-drafts/
              draft-klensin-ima-framework-00.txt>.

   [IDN-Nextsteps]
              Klensin, J. and P. Faltstrom, "Review and Recommendations
              for Internationalized Domain Names (IDN)",
              draft-iab-idn-nextsteps-03.txt (work in progress),
              February 2006, <http://www.ietf.org/internet-drafts/
              draft-iab-idn-nextsteps-03.txt>.

   [ISO.646.1991]
              International Organization for Standardization,
              "Information technology - ISO 7-bit coded character set
              for information interchange", ISO Standard 646, 1991.

   [Klensin-emailaddr]
              Klensin, J., "Internationalization of Email Addresses",
              draft-klensin-emailaddr-i18n-03 (work in progress),
              July 2005.

   [RFC1341]  Borenstein, N. and N. Freed, "MIME (Multipurpose Internet
              Mail Extensions): Mechanisms for Specifying and Describing
              the Format of Internet Message Bodies", RFC 1341,
              June 1992.

   [RFC2277]  Alvestrand, H., "IETF Policy on Character Sets and
              Languages", BCP 18, RFC 2277, January 1998.

   [RFC3467]  Klensin, J., "Role of the Domain Name System (DNS)",
              RFC 3467, February 2003.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66,
              RFC 3986, January 2005.

   [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
              Identifiers (IRIs)", RFC 3987, January 2005.

   [UTR36]    Davis, M. and M. Suignard, "Unicode Technical Report #36:
              Unicode Security Considerations", Working Draft for
              Proposed Update, November 2005,
              <http://www.unicode.org/draft/reports/tr36/tr36.html>.

   [UTR39]    Davis, M. and M. Suignard, "Unicode Technical Standard #39
              (proposed): Unicode Security Considerations", Working Draft
              for Proposed Draft, July 2005,
              <http://www.unicode.org/draft/reports/tr39/tr39.html>.


Author's Address

   John C Klensin
   1770 Massachusetts Ave, #322
   Cambridge, MA  02140
   USA

   Phone: +1 617 491 5735
   Email: john-ietf@jck.com


Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.


Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Copyright Statement

   Copyright (C) The Internet Society (2006).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.


Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.