Internet Draft                                 D. Crocker
     draft-crocker-idn-idna-00.txt Brandenburg InternetWorking
     Expires in six months                       23 June  2002
                              
                              
                              
                              
               Internationalizing Domain Names
                   in Applications (IDNA)
                              
     
     
     Status of this Memo
     
     This document is an Internet-Draft and is in full
     conformance with all provisions of Section 10 of
     RFC2026.
     
     Internet-Drafts are working documents of the Internet
     Engineering Task Force (IETF), its areas, and its
     working groups. Note that other groups may also
     distribute working documents as Internet-Drafts.
     
     Internet-Drafts are draft documents valid for a maximum
     of six months and may be updated, replaced, or obsoleted
     by other documents at any time. It is inappropriate to
     use Internet-Drafts as reference material or to cite
     them other than as "work in progress."
     
     The list of current Internet-Drafts can be accessed at
     http://www.ietf.org/ietf/1id-abstracts.txt
     
     The list of Internet-Draft Shadow Directories can be
     accessed at http://www.ietf.org/shadow.html.
     
     
     Abstract
     
     Internationalized Domain Names (IDN) use Unicode for
     domain names, rather than using a subset of ASCII.  This
     increased name space, as well as the requirement to
     maintain compatibility with the existing domain name
     service, means that IDNs must be encoded in a form that
     can be supported without changes to any portion of the
     DNS that does not participate in the upgrade to IDN.
     This specification defines a mechanism called IDNA for
     handling them in a standard fashion and specifies an
     IDNA profile for domain names used as host references.
     IDNA allows non-ASCII characters to be represented using
     the same octets used in so-called host names today. This
     representation allows IDNs to be introduced with minimal
     changes to the existing DNS infrastructure. IDNA is only
     meant for processing domain names, not free text.



0.   Document Change Notes --
     
     This is a revision to draft-ietf-idn-idna-09.txt.  It is
     being distributed independently to facilitate
     discussion.
     
     The goal is to gain consensus about revisions to the IDN
     working group document, specifically for the following
     changes:
        
        a.   Split the document into two, one for defining
           Internationalized Domain Names (IDN) and the other for
           defining an encoding method of IDNs, namely IDNA using ACE.
        
        b. Distinguish general IDNA from its specific use for host
           names (IDNA-Host), by factoring the two into separate
           specification sections.  Use for host names is specified more
           precisely, in terms of a specific syntax BNF rule from the
           relevant existing DNS specification, so that IDNA-Host will
           apply precisely to all DNS record fields and protocol units
           conforming to that BNF.
        
        c. Distinguish Domain Name character set enhancement (IDN)
           from the encoding approach for 'non-native' representations
           (IDNA).
        
        d. Further clarify the distinction between the IDNA world
           and the non-IDNA world.

        e. Remove historical commentary.  At the least, it needs to
           be outside of the sections with normative text, or otherwise
           distinguished as being non-normative.

        f. Except for specification of the ACE-based mechanism,
           move software, API and other host-specific discussion into a
           non-normative appendix, so that the specification is
           restricted to protocol-only details.

        g. Distinguish user presentation from protocol and storage
           encoding.

        h. Change the anthropomorphic, ambiguous use of 'aware' and
           'unaware' to refer to the nature of encoding as IDN-native
           and IDN-ACE.
     
     Notations:
     
     Text, such as citations, that needs to be provided is
     indicated by <<???>>.  Personal comments are indicated
     by << // xqqy // /Dave >>
     
     The changes are extensive, so that providing change
     marks would be more distracting than helpful.  Still,
     most of the changes are slight language modifications
     and some moving of text around.  Most of the original
     text is still present.



1.   Introduction
     
     Expansion of the DNS namespace to permit Unicode, rather
     than a subset of ASCII, requires special handling of the
     binary data, within an ASCII DNS environment.  This
     document proceeds from <id: draft-idn-idn-00.txt> and
     defines:
          
          1)   A mechanism called IDNA for handling IDNs in a
               standard fashion within the current, ASCII-based
               DNS, using an ASCII-compatible encoding (ACE) of
               the IDN string

          2)   An IDNA profile, called IDNA-Host, for domain
               names used as host references, such as for URLs
               and email addresses.
     
     IDNA allows applications to use ASCII name labels that
     begin with a special prefix, to represent non-ASCII name
     labels. Protocols that transport domain names need not
     support this mapping; therefore IDNA does not require
     changes to any protocol infrastructure. Equally, IDNA is
     transparent to DNS servers and resolvers that do not yet
     participate in the IDNA enhancement; the ASCII name
     service provided by the existing DNS is sufficient for
     handling IDNA ACE strings.
     
     Therefore, the IDNA service also does not require any
     applications to conform to IDNA, except applications
     that elect to use IDNA in order to support IDN, while
     maintaining interoperability with the existing, ASCII-
     based DNS infrastructure. Adding IDNA support to an
     existing application entails changes to the application
     only -- or to a "shim" layer below the application and
     above the existing transport and DNS protocol layers.



2.   Terminology
     
     The key words "MUST", "SHALL", "REQUIRED", "SHOULD",
     "RECOMMENDED", and "MAY" in this document are to be
     interpreted as described in RFC 2119 [RFC2119].
     
     ACE
          
          means ASCII Compatible Encoding.
     
     ACE label
          
          refers to an internationalized label that can be
          represented using only ASCII characters but is
          equivalent to a label containing non-ASCII
          characters. More rigorously, an ACE label is
          defined to be any label that the ToUnicode
          operation would alter. For every internationalized
          label that cannot be directly represented in ASCII,
          there is an equivalent ACE label. An ACE label
          always begins with the ACE prefix defined in
          section 5.1. The conversion of labels to and from
          the ACE form is specified in section 5.4.
     
     ACE prefix
          
          is defined to be a string of ASCII characters that
          appears at the beginning of every ACE label. It is
          specified in section 5.1.
     
     Equivalence of labels
          
          is defined in IDNA in terms of the ToASCII
          operation, which constructs an ASCII form for a
          given label.  Labels are defined to be equivalent
          if and only if their ASCII forms produced by
          ToASCII match using a case-insensitive ASCII
          comparison. Traditional ASCII labels already have a
          notion of equivalence: upper case and lower case
          are considered equivalent.  The IDNA notion of
          equivalence is an extension of the old notion.
          Equivalent labels in IDNA are treated as alternate
          forms of the same label, just as "foo" and "Foo"
          are treated as alternate forms of the same label.
     
     Internationalized domain name for applications (IDNA)
          
          refers to a domain name subject to the technical
          enhancements for supporting IDN in the case of
          general domain names.  Procedurally, this is a
          domain name that can be mapped from IDN-native to
          IDN-ACE with the ToASCII operation (see section
          5.4.1) applied to each label without failing.
     
     IDN-ACE
          
          is a domain name slot that is not an IDN-native
          domain name slot. Obviously, this includes any
          domain name slot whose specification predates IDNA.



3.   IDN for Applications (IDNA)
     
     In order to permit IDN functionality without requiring
     changes to existing DNS infrastructure servers and
     resolvers, IDNA uses ASCII Compatible Encoding (ACE).
     IDNA-ACE represents IDN labels within the current, ASCII-
     based DNS protocol and storage infrastructure.  That is,
     domain names that use Unicode values in their labels are
     encoded to occupy a reserved portion of the existing,
     ASCII-based domain name space.
     
     Components of an IDNA-enhanced DNS are:
     
     Resolver-ASCII-1--|
                       |
                       |--Server-A--|
                       |            |--Server-A-ASCII-admin
     Resolver-ACE-2 ---|
                       |
                       |--Server-B--|
                                    |--Server-B-ACE-admin
     
     The components labeled with ASCII do not support IDN.
     The components labeled with ACE support IDN through
     IDNA's ACE conventions.
     
     The protocol between any resolver and any server is
     unmodified.
     
     The software and procedures for administering Server-A
     are unmodified.  Server-A therefore maintains only slots
     with original, ASCII values.  It maintains no IDN slots.
     
     Server-B is unmodified.  However Server-B-ACE-admin is
     modified to support creation and modification of IDN
     slots, based on IDNA's ACE conventions.  Hence, Server-B
     can hold IDN labels.
     
     Resolver-ASCII-1 is unmodified and supports only ASCII
     domain names.  It therefore can process an IDN string
     only in its ACE form.
     
     Resolver-ACE-2 is modified to support IDN through
     IDNA's ACE conventions.  Hence it can convert ACE
     strings to their "native" Unicode, for display according
     to local host Unicode mechanisms.  The modification to
     Resolver-ACE-2 may be a change to the resolver itself,
     or may be effected through an independent module that
     is called as a surrogate for the Resolver and that, in
     turn, calls an unmodified Resolver-ASCII module.



4.   IDNA for Host Domain Names (IDNA-Host)
     
     ASCII domain names used within URLs and email addresses
     are subject to restrictions specified in [STD3] for host
     names.  Internationalized host domain names (IDN-Host)
     enhance the permitted range of host addresses by
     continuing the ASCII-related restrictions, but
     permitting use of Unicode values.
     
     IDNA mechanisms support IDN-Host as IDNA-Host.  IDNA-
     Host is IDNA with [STD3] host naming restrictions
     applied to ASCII and Unicode domain names.



5.   ACE
     
     ASCII Compatible Encoding (ACE) maps between Unicode
     "native" strings and an ASCII-readable representation of
     the Unicode.
     
     ACE domain labels comprise an ACE prefix string,
     followed by the ACE version of the Unicode.


5.1. ACE prefix
     
     [[ Note to the IESG and Internet Draft readers: The two
     uses of the string "IESG--" below are to be changed at
     time of publication to a prefix which fulfills the
     requirements in the first paragraph. IANA will assign
     this value. ]]
     
     The ACE prefix, used in the conversion operations
     (section 5.4), is two alphanumeric ASCII characters
     followed by two hyphen-minuses. It cannot be any of the
     prefixes already used in earlier documents, which
     include the following: "bl--", "bq--", "dq--", "lq--",
     "mq--", "ra--", "wq--" and "zq--". The ToASCII and
     ToUnicode operations MUST recognize the ACE prefix in a
     case-insensitive manner.
     
     The ACE prefix for IDNA is "IESG--".
     
     This means that an ACE label might be
     "IESG--de-jg4avhby1noc0d", where "de-jg4avhby1noc0d" is
     the part of the ACE label that is generated by the
     encoding steps in [PUNYCODE].
     
     While all ACE labels begin with the ACE prefix, not all
     labels beginning with the ACE prefix are necessarily ACE
     labels.  Non-ACE labels that begin with the ACE prefix
     will confuse users and SHOULD NOT be allowed in DNS
     zones.
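
     As an illustration only, the prefix handling described
     above might be sketched in Python as follows.  The
     constant ACE_PREFIX holds the placeholder "IESG--" from
     this draft, the helper names are hypothetical, and
     Python's standard "punycode" codec is assumed to stand in
     for the encoding in [PUNYCODE].

```python
ACE_PREFIX = "IESG--"  # placeholder from this draft, not an assigned value


def has_ace_prefix(label: str) -> bool:
    """Recognize the ACE prefix in a case-insensitive manner."""
    return label[:len(ACE_PREFIX)].lower() == ACE_PREFIX.lower()


def make_ace_label(nameprepped_label: str) -> str:
    """Prepend the ACE prefix to the Punycode form of a label.

    Assumes the label has already been processed by [NAMEPREP].
    """
    encoded = nameprepped_label.encode("punycode").decode("ascii")
    return ACE_PREFIX + encoded
```

     Note that has_ace_prefix() returning true does not by
     itself show that a label is a valid ACE label; it only
     shows that the label begins with the prefix.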
     


5.2. ACE Enforcement
     
     Whenever a domain name is put into an IDN-ACE domain
     name slot, it MUST contain only ASCII characters.
     
     Given an internationalized domain name (IDN), an
     equivalent domain name satisfying this requirement can
     be obtained by applying the ToASCII operation (see
     section 5.4.1) to each label and, if dots are used as
     label separators, changing all the label separators to
     U+002E.


5.3. ACE Display
     
     ACE labels obtained from domain name slots SHOULD be
     hidden from users except when the use of the non-ASCII
     form would cause problems or when the ACE form is
     explicitly requested.  Given an internationalized domain
     name, an equivalent domain name containing no ACE labels
     can be obtained by applying the ToUnicode operation (see
     section 5.4.2) to each label.  When the requirements of
     sections 5.2 and 5.3 both apply, the requirement of
     section 5.2 takes precedence.


5.4. ACE Conversion operations
     
     An application converts a domain name put into an IDN-
     ACE slot or displayed to a user. This section specifies
     the steps to perform in the conversion, and the ToASCII
     and ToUnicode operations.
     
     The input to ToASCII or ToUnicode is a single label that
     is a sequence of Unicode code points (remember that all
     ASCII code points are also Unicode code points). If a
     domain name is represented using a character set other
     than Unicode or US-ASCII, it will first need to be
     transcoded to Unicode.
     
     Starting from a whole domain name, the steps that an
     application takes to do the conversions are:
     
     1)   Decide whether the domain name is a "stored string" or a
          "query string" as described in [STRINGPREP]. If this
          conversion follows the "queries" rule from [STRINGPREP], set
          the flag called "AllowUnassigned".


     2)   Split the domain name into individual labels as
          described in section 3. The labels do not include the
          separator.

     3)   Decide whether or not to enforce the restrictions on
          ASCII characters in host names [STD3]. If the restrictions
          are to be enforced, set the flag called "UseSTD3ASCIIRules".

     4)   Process each label with either the ToASCII or the
          ToUnicode operation. Use the ToASCII operation if you are
          about to put the name into an IDN-ACE slot. Use the ToUnicode
          operation if you are displaying the name to a user.
     
     If ToASCII was applied in step 4 and dots are used as
     label separators, change all the label separators to
     U+002E (full stop).
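
     The whole-name steps above might be sketched in Python
     as follows.  This is a minimal sketch under stated
     assumptions: the function name is hypothetical, the
     per_label callback stands in for ToASCII or ToUnicode,
     and the set of dot-like separators (U+3002, U+FF0E,
     U+FF61 in addition to U+002E) is an assumption that this
     section does not itself enumerate.

```python
import re
from typing import Callable

# Assumed separator set: full stop plus three ideographic/fullwidth
# variants. This section's text names only U+002E explicitly.
LABEL_SEPARATORS = re.compile("[\u002e\u3002\uff0e\uff61]")


def convert_domain_name(name: str, per_label: Callable[[str], str]) -> str:
    """Split a name into labels, convert each, and rejoin with U+002E."""
    # Step 2: the labels do not include the separator.
    labels = LABEL_SEPARATORS.split(name)
    # Step 4: apply the chosen per-label operation.
    converted = [per_label(label) for label in labels]
    # Final step: use U+002E (full stop) as the only separator.
    return "\u002e".join(converted)
```

     The sketch always rejoins with U+002E, which the text
     requires only when ToASCII was applied; a display-
     oriented caller might choose to preserve the original
     separators instead.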
     
     The following two subsections define the ToASCII and
     ToUnicode operations that are used in step 4.
     
     
     5.4.1.    ToASCII
     
     The ToASCII operation takes a sequence of Unicode code
     points that make up one label and transforms it into a
     sequence of code points in the ASCII range (0..7F). If
     ToASCII succeeds, the original sequence and the
     resulting sequence are equivalent labels.
     
     It is important to note that the ToASCII operation can
     fail. If the ToASCII operation fails on any label in a
     domain name, that domain name MUST NOT be used as an
     internationalized domain name. The application needs to
     have some method of dealing with this failure.
     
     The inputs to ToASCII are a sequence of code points, the
     AllowUnassigned flag and the UseSTD3ASCIIRules flag. The
     output of ToASCII is either a sequence of ASCII code
     points or a failure condition.
     
     ToASCII never alters a sequence of code points that are
     all in the ASCII range to begin with (although it could
     fail). Applying the ToASCII operation multiple times has
     exactly the same effect as applying it just once.
     
     ToASCII consists of the following steps:
     
     1.  If all code points in the sequence are in the ASCII
         range (0..7F) then skip to step 3.

     2.  Perform the steps specified in [NAMEPREP] and fail if
         there is an error. The AllowUnassigned flag is used in
         [NAMEPREP].
 
     3.  If the UseSTD3ASCIIRules flag is set, then perform these
         checks:
          
          (a)  Verify the absence of non-LDH ASCII code points; that
               is, the absence of 0..2C, 2E..2F, 3A..40, 5B..60, and 7B..7F.

          (b)  Verify the absence of leading and trailing hyphen-minus;
               that is, the absence of U+002D at the beginning and end of
               the sequence.
     
     4.  If all code points in the sequence are in the ASCII
         range (0..7F), then skip to step 8.

     5.  Verify that the sequence does NOT begin with the ACE
         prefix.

     6.  Encode the sequence using the encoding algorithm in
         [PUNYCODE] and fail if there is an error.

     7.  Prepend the ACE prefix.

     8.  Verify that the number of code points is in the range 1
         to 63 inclusive.
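
     The eight steps above might be sketched in Python as
     follows.  This is not a reference implementation.  It
     assumes that Python's encodings.idna.nameprep() and
     "punycode" codec are acceptable stand-ins for [NAMEPREP]
     and [PUNYCODE], uses the placeholder prefix "IESG--", and
     introduces the hypothetical names to_ascii and
     ToASCIIError.  Because the library nameprep() call takes
     no AllowUnassigned flag, the sketch approximates the flag
     with a separate table lookup.

```python
import encodings.idna
import string
import stringprep

ACE_PREFIX = "IESG--"  # placeholder from this draft
LDH = set(string.ascii_letters + string.digits + "-")


class ToASCIIError(ValueError):
    """Raised when a label cannot be converted (hypothetical name)."""


def _is_ascii(label: str) -> bool:
    return all(ord(c) <= 0x7F for c in label)


def to_ascii(label: str, allow_unassigned: bool = False,
             use_std3_ascii_rules: bool = False) -> str:
    # Steps 1-2: only non-ASCII input goes through nameprep.
    if not _is_ascii(label):
        try:
            label = encodings.idna.nameprep(label)
        except UnicodeError as exc:
            raise ToASCIIError("nameprep failed") from exc
        # Assumption: approximate the AllowUnassigned flag by
        # rejecting unassigned code points when it is not set.
        if not allow_unassigned and any(
                stringprep.in_table_a1(c) for c in label):
            raise ToASCIIError("unassigned code point")
    # Step 3: optional host name checks.
    if use_std3_ascii_rules:
        if any(ord(c) <= 0x7F and c not in LDH for c in label):
            raise ToASCIIError("non-LDH ASCII code point")
        if label.startswith("-") or label.endswith("-"):
            raise ToASCIIError("leading or trailing hyphen-minus")
    # Steps 4-7: only labels that are still non-ASCII are encoded.
    if not _is_ascii(label):
        if label.lower().startswith(ACE_PREFIX.lower()):
            raise ToASCIIError("label begins with the ACE prefix")
        label = ACE_PREFIX + label.encode("punycode").decode("ascii")
    # Step 8: length check.
    if not 1 <= len(label) <= 63:
        raise ToASCIIError("length outside 1..63")
    return label
```

     With this sketch, an all-ASCII label such as "Foo" is
     returned unaltered, and applying to_ascii to its own
     output returns that output unchanged, matching the
     properties stated before the step list.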
     
     
     5.4.2.    ToUnicode
     
     The ToUnicode operation takes a sequence of Unicode code
     points that make up one label and returns a sequence of
     Unicode code points. If the input sequence is a label in
     ACE form, then the result is an equivalent
     internationalized label that is not in ACE form,
     otherwise the original sequence is returned unaltered.
     
     ToUnicode never fails. If any step fails, then the
     original input sequence is returned immediately in that
     step.
     
     The inputs to ToUnicode are a sequence of code points,
     the AllowUnassigned flag and the UseSTD3ASCIIRules flag.
     The output of ToUnicode is always a sequence of Unicode
     code points.
     
     1.   If all code points in the sequence are in the ASCII
          range (0..7F) then skip to step 3.

     2.   Perform the steps specified in [NAMEPREP] and fail if
          there is an error. (If step 3 of ToASCII is also performed
          here, it will not affect the overall behavior of ToUnicode,
          but it is not necessary.) The AllowUnassigned flag is used in
          [NAMEPREP].

     3.   Verify that the sequence begins with the ACE prefix, and
          save a copy of the sequence.

     4.   Remove the ACE prefix.
 
     5.   Decode the sequence using the decoding algorithm in
          [PUNYCODE] and fail if there is an error. Save a copy of the
          result of this step.

     6.   Apply ToASCII.

     7.   Verify that the result of step 6 matches the saved copy
          from step 3, using a case-insensitive ASCII comparison.

     8.   Return the saved copy from step 5.
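
     The ToUnicode steps might be sketched in Python as
     follows.  This is a sketch under stated assumptions, not
     a definitive implementation: it uses the placeholder
     prefix "IESG--", relies on Python's
     encodings.idna.nameprep() and "punycode" codec as stand-
     ins for [NAMEPREP] and [PUNYCODE], omits both flags, and
     carries a compact hypothetical helper _to_ascii for step
     6.

```python
import encodings.idna

ACE_PREFIX = "IESG--"  # placeholder from this draft


def _is_ascii(label: str) -> bool:
    return all(ord(c) <= 0x7F for c in label)


def _to_ascii(label: str) -> str:
    """Compact ToASCII without the optional STD3 checks."""
    if not _is_ascii(label):
        label = encodings.idna.nameprep(label)
    if not _is_ascii(label):
        if label.lower().startswith(ACE_PREFIX.lower()):
            raise UnicodeError("label begins with the ACE prefix")
        label = ACE_PREFIX + label.encode("punycode").decode("ascii")
    if not 1 <= len(label) <= 63:
        raise UnicodeError("length outside 1..63")
    return label


def to_unicode(label: str) -> str:
    original = label
    try:
        if not _is_ascii(label):                        # steps 1-2
            label = encodings.idna.nameprep(label)
        if not label.lower().startswith(ACE_PREFIX.lower()):
            return original                             # step 3 fails
        saved = label                                   # step 3
        encoded = label[len(ACE_PREFIX):]               # step 4
        decoded = encoded.encode("ascii").decode("punycode")  # step 5
        round_trip = _to_ascii(decoded)                 # step 6
        if round_trip.lower() != saved.lower():         # step 7
            return original
        return decoded                                  # step 8
    except UnicodeError:
        # ToUnicode never fails: any failing step returns the input.
        return original
```

     The round trip in steps 6 and 7 is what rejects a label
     such as "IESG--abc-", whose decoded form is plain ASCII
     and therefore does not convert back to the same ACE
     string.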


5.5. ACE Comparison
     
     Whenever two labels are compared, they MUST be
     considered to match if and only if they are equivalent,
     that is, their ASCII forms (obtained by applying
     ToASCII) match using a case-insensitive ASCII
     comparison.
     
     Whenever two names are compared, they MUST be considered
     to match if and only if their corresponding labels
     match, regardless of whether the names use the same
     forms of label separators.
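
     As a rough illustration, name comparison might be
     sketched with Python's built-in "idna" codec.  Note the
     assumption: that codec implements the later published
     form of IDNA and uses the prefix "xn--" rather than this
     draft's placeholder, and it accepts several dot-like
     separators.  The function name names_match is
     hypothetical.

```python
def names_match(a: str, b: str) -> bool:
    """Compare two domain names by their ASCII forms.

    Each name is converted label by label with the codec's ToASCII,
    and the results are compared case-insensitively.
    """
    return a.encode("idna").lower() == b.encode("idna").lower()
```

     A name whose labels cannot be converted raises
     UnicodeError in this sketch; a caller would need to
     decide how to treat that case.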



6.   Implications for Components in DNS


6.1. Implications for typical applications using DNS
     
     In IDNA, applications perform the processing needed to
     input internationalized domain names from users, display
     internationalized domain names to users and process the
     inputs and outputs from DNS and other protocols that
     carry domain names.
     
     The components and interfaces between them can be
     represented pictorially as:
                      +------+
                      | User |
                      +------+
                           | Input and display:
                           | local interface methods
                           | (pen, keyboard, video, ...)
       +-------------------|-------------------------------+
       |                   v                               |
       |          +-----------------------------+          |
       |          |      Application            |          |
       |          |   (ToASCII and ToUnicode    |          |
       |          |    operations may be        |          |
       |          |    called here)             |          |
       |          +-----------------------------+          |
       |                   ^        ^                      |End
       |                   |        |                      |sys
       | Call to resolver: |        | Application-specific |
       |              ACE  |        | protocol:            |
       |                   v        | ACE unless the       |
       |           +----------+     | protocol is updated  |
       |           | Resolver |     | to handle other      |
       |           +----------+     | encodings            |
       |                 ^          |                      |
       +-----------------|----------|----------------------+
           DNS protocol: |          |
                     ACE |          |
                         v          v
              +-------------+    +---------------------+
              | DNS servers |    | Application servers |
              +-------------+    +---------------------+
     
     The box labeled "Application" is where the application
     splits a host name into labels, sets the appropriate
     flags, and performs the ToASCII and ToUnicode
     operations. This is described in section 5.4.
     
     
     6.1.1.    Entry and display in applications
     
     Applications can accept domain names using any character
     set or sets desired by the application developer, and
     can display domain names in any character set. That is,
     the IDNA protocol does not affect the interface between
     users and applications.
     
     An IDNA-native application can accept and display
     internationalized domain names in two formats: the
     internationalized character set(s) supported by the
     application, and as an ACE label. ACE labels that are
     displayed or input MUST always include the ACE prefix.
     Applications MAY allow input and display of ACE labels,
     but are not encouraged to do so except as an interface
     for special purposes, possibly for debugging.
          
          ACE encoding is opaque and ugly, and should thus
          only be exposed to users who absolutely need it.
     
     Because name labels encoded as ACE name labels can be
     rendered either as the encoded ASCII characters or the
     proper decoded characters, the application MAY have an
     option for the user to select the preferred method of
     display; if it does, rendering the ACE SHOULD NOT be the
     default.
     
     Domain names are often stored and transported in many
     places. For example, they are part of documents such as
     mail messages and web pages. They are transported in
     many parts of many protocols, such as both the control
     commands and the RFC 2822 body parts of SMTP, and the
     headers and the body content in HTTP. It is important to
     remember that domain names appear both in domain name
     slots and in the content that is passed over protocols.
     
     In protocols and document formats that define how to
     handle specification or negotiation of charsets, labels
     can be encoded in any charset allowed by the protocol or
     document format. If a protocol or document format only
     allows one charset, the labels MUST be given in that
     charset.
     
     In any place where a protocol or document format allows
     transmission of the characters in internationalized
     labels, internationalized labels SHOULD be transmitted
     using whatever character encoding and escape mechanism
     the protocol or document format uses at that place.
     
     All protocols that use domain name slots already have
     the capacity for handling domain names in the ASCII
     charset. Thus, ACE labels (internationalized labels that
     have been processed with the ToASCII operation) can
     inherently be handled by those protocols.
     
     
     6.1.2.    Applications and resolver libraries
     
     Applications normally use functions in the operating
     system when they resolve DNS queries. Those functions in
     the operating system are often called "the resolver
     library", and the applications communicate with the
     resolver libraries through a programming interface
     (API).
     
     Because these resolver libraries today expect only
     domain names in ASCII, applications MUST prepare labels
     that are passed to the resolver library using the
     ToASCII operation. Labels received from the resolver
     library contain only ASCII characters; internationalized
     labels that cannot be represented directly in ASCII use
     the ACE form. ACE labels always include the ACE prefix.
     
     IDNA-native applications MUST be able to work with both
     non-internationalized labels (those that conform to
     [STD13] and [STD3]) and internationalized labels.
     
     It is expected that new versions of the resolver
     libraries in the future will be able to accept domain
     names in other formats than ASCII, and application
     developers might one day pass not only domain names in
     Unicode, but also in local script to a new API for the
     resolver libraries in the operating system. Thus the
     ToASCII and ToUnicode operations might be performed
     inside these new versions of the resolver libraries.
     
     Domain names stored in zones follow the rules for
     "stored strings" from [STRINGPREP]. Domain names passed
     to resolvers or put into the question section of DNS
     requests follow the rules for "queries" from
     [STRINGPREP].
     
     
     6.1.3.    DNS servers
     
     An operating system might have a set of libraries for
     performing the ToASCII operation. The input to such a
     library might be in one or more charsets that are used
     in applications (UTF-8 and UTF-16 are likely candidates
     for almost any operating system, and script-specific
     charsets are likely for localized operating systems).
     
     For internationalized labels that cannot be represented
     directly in ASCII, DNS servers MUST use the ACE form
     produced by the ToASCII operation. All IDNs served by
     DNS servers MUST contain only ASCII characters.
     
     If a signaling system that makes negotiation possible
     between old and new DNS clients and servers is
     standardized in the future, the encoding of the query in
     the DNS protocol itself can be changed from ACE to
     something else, such as UTF-8. The question whether or
     not this should be used is, however, a separate problem
     and is not discussed in this memo.
     
     
     6.1.4.    Avoiding exposing users to the raw ACE encoding
     
     All applications that might show the user a domain name
     obtained from a domain name slot, such as from
     gethostbyaddr or part of a mail header, SHOULD be
     updated as soon as possible in order to prevent users
     from seeing the ACE.
     
     If an application decodes an ACE name using ToUnicode
     but cannot show all of the characters in the decoded
     name, such as if the name contains characters that the
     output system cannot display, the application SHOULD
     show the name in ACE format (which always includes the
     ACE prefix) instead of displaying the name with the
     replacement character (U+FFFD). This is to make it
     easier for the user to transfer the name correctly to
     other programs. Programs that by default show the ACE
     form when they cannot show all the characters in a name
     label SHOULD also have a mechanism to show the name that
     is produced by the ToUnicode operation with as many
     characters as possible and replacement characters in the
     positions where characters cannot be displayed.
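
     The fallback described above might be sketched in Python
     as follows.  The function name and parameters are
     hypothetical, and the sketch assumes that "can be
     encoded in the output system's character encoding" is an
     adequate proxy for "can be displayed", which will not
     hold for every output system (font coverage, for
     example, is not checked).

```python
def display_form(ace_label: str, decoded_label: str,
                 output_encoding: str) -> str:
    """Choose between the decoded label and its ACE form for display.

    Falls back to the ACE form when the decoded label contains a
    character that the output encoding cannot represent.
    """
    try:
        decoded_label.encode(output_encoding)
    except UnicodeEncodeError:
        return ace_label
    return decoded_label
```

     For example, a label decoding to "b\u00fccher" would be
     shown decoded on a Latin-1 output system and in ACE form
     on an ASCII-only one.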
     
     The ToUnicode operation does not alter labels that are
     not valid ACE labels, even if they begin with the ACE
     prefix. After ToUnicode has been applied, if a label
     still begins with the ACE prefix, then it is not a valid
     ACE label, and is not equivalent to any of the
     intermediate Unicode strings constructed by ToUnicode.
     
     
     6.1.5.    Bidirectional text in domain names
     
     The display of domain names that contain bidirectional
     text is not covered in this document. It may be covered
     in a future version of this document, or may be covered
     in a different document.
     
     For developers interested in displaying domain names
     that have bidirectional text, the Unicode standard has
     an extensive discussion of how to reorder glyphs for
     display when dealing with bidirectional text such as
     Arabic or Hebrew. See [UAX9] for more information. In
     particular, all Unicode text is stored in logical order.
     
     
     6.1.6.    DNSSEC authentication of IDN domain names
     
     DNS Security [DNSSEC] is a method for supplying
     cryptographic verification information along with DNS
     messages. Public Key Cryptography is used in conjunction
     with digital signatures to provide a means for a
     requester of domain information to authenticate the
     source of the data. This ensures that it can be traced
     back to a trusted source, either directly, or via a
     chain of trust linking the source of the information to
     the top of the DNS hierarchy.
     
     IDNA specifies that all internationalized domain names
     served by DNS servers that cannot be represented
     directly in ASCII must use the ACE form produced by the
     ToASCII operation. This operation must be performed
     prior to a zone being signed by the private key for that
     zone. Because of this ordering, it is important to
     recognize that DNSSEC authenticates the ASCII domain
     name, not the Unicode form or the mapping between the
     Unicode form and the ASCII form. In other words, the
     output of ToASCII is the canonical name. In the presence
     of DNSSEC, this is the name that MUST be signed in the
     zone and MUST be validated against.
     
     One consequence of this for sites deploying IDNA in the
     presence of DNSSEC is that any special purpose proxies
     or forwarders used to transform user input into IDNs
     must be earlier in the resolution flow than DNSSEC
     authenticating nameservers for DNSSEC to work.
     
     
     
     6.1.7.    Limitations of IDNA
     
     The IDNA protocol does not solve all linguistic issues
     with users inputting names in different scripts. Many
     important language-based and script-based mappings are
     not covered in IDNA and must be handled outside the
     protocol. For example, names that are entered in a mix
     of traditional and simplified Chinese characters will
     not be mapped to a single canonical name. As another
     example, Scandinavian names that are entered with
     U+00F6 (LATIN SMALL LETTER O WITH DIAERESIS) will not be
     mapped to U+00F8 (LATIN SMALL LETTER O WITH STROKE).
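     
     The second limitation can be observed with a short
     sketch. It assumes that Python's built-in "idna" codec
     is an acceptable stand-in for the ToASCII operation;
     that codec and the ACE prefix it emits are assumptions
     of this example, not something this document
     prescribes.

```python
# Assumption: Python's built-in "idna" codec stands in for ToASCII.
o_diaeresis = "k\u00f6ln"  # contains U+00F6 (o with diaeresis)
o_stroke = "k\u00f8ln"     # contains U+00F8 (o with stroke)

ace_diaeresis = o_diaeresis.encode("idna")
ace_stroke = o_stroke.encode("idna")

# The two spellings yield different ACE labels, so the DNS
# treats them as two unrelated names.
names_differ = ace_diaeresis != ace_stroke
print(ace_diaeresis, ace_stroke, names_differ)
```

     Any mapping between such spellings therefore has to be
     done before the name reaches IDNA, for example in the
     user interface or by registering both names.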


6.2. Name Server Considerations
     
     Internationalized domain name data in zone files (as
     specified by section 5 of RFC 1035) MUST be processed
     with ToASCII before it is entered in the zone files.
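     
     As a minimal sketch of this preprocessing step, the
     hypothetical helper below converts a name to its ASCII
     form before it is written to a zone file. It assumes
     that Python's built-in "idna" codec, which works label
     by label, approximates ToASCII; the helper name and the
     example domain are illustrative only.

```python
def to_zone_name(name: str) -> str:
    """Return the ASCII form of a domain name for use in a zone file.

    Assumption: the built-in "idna" codec approximates ToASCII.
    It raises UnicodeError if a label cannot be converted.
    """
    return name.encode("idna").decode("ascii")


zone_name = to_zone_name("b\u00fccher.example")
print(zone_name)
```

     Names that are already plain ASCII pass through
     unchanged, so the helper can be applied to every owner
     name in the zone data.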
     
     It is imperative that there be only one ASCII encoding
     for a particular domain name. Thus, a primary master
     name server MUST NOT contain an ACE-encoded label that
     decodes to an ASCII label. The ToASCII operation assures
     that no such names are ever output from the operation.
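     
     A zone maintainer could check for such labels along the
     following lines. This is a sketch only: the function
     name is hypothetical, the "xn--" prefix is an assumed
     value for the ACE prefix, and Python's "punycode" codec
     is assumed to match [PUNYCODE].

```python
ACE_PREFIX = "xn--"  # assumption: illustrative ACE prefix value


def ace_decodes_to_ascii(label: str) -> bool:
    """Return True if an ACE-prefixed label decodes to pure ASCII.

    Such a label would give one name two ASCII encodings and must
    not appear in a zone.
    """
    if not label.lower().startswith(ACE_PREFIX):
        return False
    payload = label[len(ACE_PREFIX):]
    try:
        decoded = payload.encode("ascii").decode("punycode")
    except UnicodeError:
        # Malformed payload: not this particular problem.
        return False
    return decoded.isascii()


print(ace_decodes_to_ascii("xn--abc-"))
print(ace_decodes_to_ascii("xn--bcher-kva"))
```

     The label "xn--abc-" is flagged because its payload
     decodes to the ASCII string "abc", whereas
     "xn--bcher-kva" decodes to a label containing a
     non-ASCII character and is accepted.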
     
     Name servers MUST NOT serve records with domain names
     that contain non-ASCII characters; such names MUST be
     converted to ACE form by the ToASCII operation in order
     to be served. If names that are not processed by
     ToASCII are passed to an application, it will result in
     unpredictable behavior. Note that [STRINGPREP]
     describes how to handle versioning of unallocated
     codepoints.


6.3. Root Server Considerations
     
     IDNA strings are likely to be somewhat longer than
     current host names, so the bandwidth needed by the root
     servers is expected to go up by a small amount. In
     addition, queries and responses using IDNA strings will
     probably be somewhat longer than typical queries today,
     so more queries and responses may be forced to go to
     TCP instead of UDP.
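     
     The size increase can be estimated for a given name
     with a sketch such as the following. It assumes that
     Python's built-in "idna" codec approximates ToASCII;
     the function name and the sample name are hypothetical.

```python
def length_growth(name: str) -> tuple:
    """Return (characters in the Unicode form, octets in the ACE form).

    Assumption: the built-in "idna" codec approximates ToASCII.
    """
    ace = name.encode("idna")
    return len(name), len(ace)


unicode_len, ace_len = length_growth("b\u00fccher.example")
print(unicode_len, ace_len)
```

     For this sample the ACE form is 21 octets against 14
     characters in the Unicode form; the growth for other
     names depends on the script and on the label length.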



7.   References


7.1. Normative references
     
     [PUNYCODE] Adam Costello, "Punycode: An encoding of
     Unicode for use with IDNA", draft-ietf-idn-punycode.
     
     [NAMEPREP] Paul Hoffman and Marc Blanchet, "Nameprep: A
     Stringprep Profile for Internationalized Domain Names",
     draft-ietf-idn-nameprep.
     
     [STD3] Bob Braden, "Requirements for Internet Hosts --
     Communication Layers" (RFC 1122) and "Requirements for
     Internet Hosts -- Application and Support" (RFC 1123),
     STD 3, October 1989.
     
     [STD13] Paul Mockapetris, "Domain names - concepts and
     facilities" (RFC 1034) and "Domain names -
     implementation and specification" (RFC 1035), STD 13,
     November 1987.
     
     [STRINGPREP] Paul Hoffman and Marc Blanchet,
     "Preparation of Internationalized Strings
     ("stringprep")", draft-hoffman-stringprep, work in
     progress.


7.2. Informative references
     
     [DNSSEC] Don Eastlake, "Domain Name System Security
     Extensions", RFC 2535, March 1999.
     
     [RFC2119] Scott Bradner, "Key words for use in RFCs to
     Indicate Requirement Levels", March 1997, RFC 2119.
     
     [UAX9] Unicode Standard Annex #9, The Bidirectional
     Algorithm,
     <http://www.unicode.org/unicode/reports/tr9/>.
     
     [UNICODE] The Unicode Standard, Version 3.1.0: The
     Unicode Consortium. The Unicode Standard, Version 3.0.
     Reading, MA, Addison-Wesley Developers Press, 2000. ISBN
     0-201-61633-5, as amended by: Unicode Standard Annex
     #27: Unicode 3.1,
     <http://www.unicode.org/unicode/reports/tr27/tr27-4.html>.
     
     [USASCII] Vint Cerf, "ASCII format for Network
     Interchange", October 1969, RFC 20.



8.   Security Considerations
     
     Security on the Internet partly relies on the DNS. Thus,
     any change to the characteristics of the DNS can change
     the security of much of the Internet.
     
     This memo describes an algorithm that encodes characters
     that are not valid according to STD3 and STD13 into
     octet values that are valid. No security issues such as
     string length increases or new allowed values are
     introduced by the encoding process or the use of these
     encoded values, apart from those introduced by the ACE
     encoding itself.
     
     Domain names are used by users to connect to Internet
     servers. The security of the Internet would be
     compromised if a user entering a single
     internationalized name could be connected to different
     servers based on different interpretations of the
     internationalized domain name.
     
     Because this document normatively refers to [NAMEPREP],
     it includes the security considerations from that
     document as well.



9.   Authors' Addresses
     
     Patrik Faltstrom
     Cisco Systems
     Arstaangsvagen 31 J
     S-117 43 Stockholm  Sweden
     paf@cisco.com
     
     Paul Hoffman
     Internet Mail Consortium and VPN Consortium
     127 Segre Place
     Santa Cruz, CA  95060  USA
     phoffman@imc.org
     
     Adam M. Costello
     University of California, Berkeley
     idna-spec.amc @ nicemice.net




   APPENDIX


A.1. Brief overview for application developers
     
     Applications can use IDNA to support internationalized
     domain names anywhere that ASCII domain names are
     already supported, including DNS master files and
     resolver interfaces. (Applications can also define
     protocols and interfaces that support IDNs directly
     using non-ASCII representations. IDNA does not prescribe
     any particular representation for new protocols, but it
     still defines which names are valid and how they are
     compared.)
     
     The IDNA protocol is contained completely within
     applications. It is not a client-server or peer-to-peer
     protocol: everything is done inside the application
     itself. When used with a DNS resolver library, IDNA is
     inserted as a "shim" between the application and the
     resolver library. When used for writing names into a DNS
     zone, IDNA is used just before the name is committed to
     the zone.
     
     There are two operations described in section 4 of this
     document:
     
     - The ToASCII operation is used before sending an IDN to
     something that expects ASCII names (such as a resolver)
     or writing an IDN into a place that expects ASCII names
     (such as a DNS master file).
     
     - The ToUnicode operation is used when displaying names
     to users, for example names obtained from a DNS zone.
     
     It is important to note that the ToASCII operation can
     fail. If it fails when processing a domain name, that
     domain name cannot be used as an internationalized
     domain name and the application has to have some method
     of dealing with this failure.
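     
     The two operations and the failure case might be
     wrapped as shown in the sketch below. It assumes that
     Python's built-in "idna" codec approximates ToASCII and
     ToUnicode; the wrapper names are hypothetical, and
     returning None is only one possible way of reporting
     the failure to the caller.

```python
from typing import Optional


def to_ascii_or_none(name: str) -> Optional[str]:
    """Apply ToASCII; return None if the name cannot be converted."""
    try:
        return name.encode("idna").decode("ascii")
    except UnicodeError:
        # ToASCII failed, e.g. a label is too long once encoded.
        return None


def to_unicode(name: str) -> str:
    """Apply ToUnicode to an ASCII name, e.g. before display."""
    return name.encode("ascii").decode("idna")


print(to_ascii_or_none("b\u00fccher.example"))
print(to_ascii_or_none("\u00fc" * 70 + ".example"))
print(to_unicode("xn--bcher-kva.example"))
```

     The second call fails because the encoded label would
     exceed the 63-octet limit on DNS labels, so the
     application has to reject the name or report the error
     to the user.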
     
     IDNA requires that implementations process input strings
     with Nameprep [NAMEPREP], which is a profile of
     Stringprep [STRINGPREP], and then with Punycode
     [PUNYCODE]. Implementations of IDNA MUST fully implement
     Nameprep and Punycode; neither Nameprep nor Punycode is
     optional.
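     
     The order of the two steps can be illustrated with a
     deliberately simplified sketch for a single label. It
     assumes that the nameprep function in Python's
     encodings.idna module matches [NAMEPREP], that the
     "punycode" codec matches [PUNYCODE], and that "xn--" is
     the ACE prefix. It omits the other checks of the
     ToASCII operation in section 4, such as the length
     limit, and is not a substitute for it.

```python
from encodings.idna import nameprep

ACE_PREFIX = b"xn--"  # assumption: illustrative ACE prefix value


def to_ascii_label(label: str) -> bytes:
    """Simplified sketch: Nameprep first, then Punycode."""
    prepared = nameprep(label)  # mapping, normalization, prohibition
    if prepared.isascii():
        return prepared.encode("ascii")  # nothing left to encode
    return ACE_PREFIX + prepared.encode("punycode")


ace_label = to_ascii_label("B\u00fccher")
print(ace_label)
```

     Running Nameprep first is what makes "Bücher" and
     "bücher" produce the same ACE label; applying Punycode
     alone would encode them differently.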