< draft-resman-idna2008-mappings-00.txt   draft-resman-idna2008-mappings-01.txt >
Network Working Group                                         P. Resnick
Internet-Draft                                     Qualcomm Incorporated
Intended status: Informational                                P. Hoffman
< Expires: October 15, 2010                               VPN Consortium
<                                                         April 13, 2010
> Expires: October 21, 2010                               VPN Consortium
>                                                         April 19, 2010
                     Mapping Characters in IDNA2008
< draft-resman-idna2008-mappings-00
> draft-resman-idna2008-mappings-01
Abstract
In the original version of the Internationalized Domain Names in
Applications (IDNA) protocol, any Unicode code points taken from user
input were mapped into a set of Unicode code points that "made
sense", and then encoded and passed to the domain name system (DNS).
The IDNA2008 protocol presumes that the input to the protocol comes
from a set of "permitted" code points, which it then encodes and
passes to the DNS, but does not specify what to do with the result of
skipping to change at line 39 (-00) / page 1, line 40 (-01)
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
< This Internet-Draft will expire on October 15, 2010.
> This Internet-Draft will expire on October 21, 2010.
Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
skipping to change at line 116 (-00) / page 3, line 24 (-01)
context-free mapping without considering the user interface
properties has the potential of doing exactly the wrong thing for the
user.
The original version of IDNA conflated user interface processing and
protocol. It took whatever characters the user produced in whatever
encoding the application used, assumed some conversion to Unicode
code points, and then without regard to context, locale, or anything
about the user's intentions, mapped them into a particular set of
other characters, and then re-encoded them in Punycode, in order to have
< the entire operation be contained within the protocol. This made for
< a much simpler implementation, making it it significantly less
< complicated for the application developer, but at the expense of
< minimizing "user surprise" for consumers and producers of domain
< names.
> the entire operation be contained within the protocol. Ignoring
> context, locale, and user preference in the IDNA protocol made life
> significantly less complicated for the application developer, but at
> the expense of violating the principle of "least user surprise" for
> consumers and producers of domain names.
In IDNA2008, the dividing line between "user interface" and
"protocol" is clear. The IDNA2008 specification defines the protocol
part of IDNA: it explicitly does not deal with the user interface.
Mappings such as the one described in this document explicitly deal
with the user interface and not the protocol. That is, a mapping is
only to be applied before a string of characters is treated as a
domain name (in the "user interface") and is never to be applied
during domain name processing (in the "protocol").
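The paragraph above draws a hard line between a user-interface mapping and protocol processing. The following is a minimal Python sketch of that layering, not the draft's method. Its assumptions are stated plainly:

- `ui_map` and `to_a_label` are hypothetical names.
- The user-interface mapping here is only lowercasing plus NFC.
- The protocol side replaces IDNA2008's real validity rules with a trivial placeholder check.

The structural point is that the protocol function never maps; it only accepts or rejects. Python's built-in `punycode` codec (RFC 3492) is used for encoding. The built-in `idna` codec implements IDNA2003 (RFC 3490), so it is deliberately avoided.

```python
import unicodedata


def ui_map(user_input: str) -> str:
    """User-interface layer: turn what the user typed into candidate code points.

    Hypothetical, deliberately simplified mapping: lowercase, then NFC.
    """
    return unicodedata.normalize("NFC", user_input.lower())


def to_a_label(label: str) -> str:
    """Protocol layer: encode an already-mapped label and never map it.

    The check below is a placeholder standing in for IDNA2008's
    validity rules; it is not a reimplementation of them.
    """
    if label != unicodedata.normalize("NFC", label) or label != label.lower():
        raise ValueError(f"label {label!r} was not mapped before protocol processing")
    if label.isascii():
        return label  # all-ASCII labels need no Punycode encoding
    # The stdlib "punycode" codec omits the ACE prefix, so add it here.
    return "xn--" + label.encode("punycode").decode("ascii")


print(to_a_label(ui_map("Bücher")))  # prints xn--bcher-kva
```

Passing the raw input `"Bücher"` straight to `to_a_label` raises `ValueError` instead of silently lowercasing it. This is the "never map during domain name processing" rule expressed in code.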
skipping to change at line 150 (-00) / page 4, line 10 (-01)
for quite large populations of people.
A good mapping in the real world might use the "sensible and friendly
and mostly obvious" design goal but come up with a different
algorithm. Many algorithms will have results that are close to what
is described here, but will differ in assumptions about the users'
way of thinking or typing. Having said that, it is likely that some
mappings will be significantly different. For example, a mapping
might apply to a spoken user interface instead of a typed one.
Another example is that a mapping might be different for users typing
< than for users using copy-and-paste from different applications.
> than for users using copy-and-paste from different applications. Yet
> another example is that a user interface that allows typed input that
> is transliterated from Latin characters could have very different
> mappings than one that applies to typing in other character sets;
> this would be typical in a Pinyin input method for Chinese
> characters.
2. The General Procedure
This section defines a general algorithm that applications ought to
implement in order to produce Unicode code points that will be valid
under the IDNA protocol. An application might implement the full
mapping as described below, or can choose a different mapping. This
mapping is very general and was designed to be very acceptable to the
widest user community, but as stated above, it does not take into
account any particular context, culture, or locale.
skipping to change at line 175 (-00) / page 4, line 40 (-01)
1. Upper case characters are mapped to their lower case equivalents
by using the algorithm for mapping case in Unicode characters.
This step was chosen because the output will behave more like
ASCII host names behave.
2. Full-width and half-width characters (those defined with
Decomposition Types <wide> and <narrow>) are mapped to their
decomposition mappings as shown in the Unicode character
database. This step was chosen because many input mechanisms,
< particularly in Asia, do no allow you to easily enter characters
> particularly in Asia, do not allow you to easily enter characters
in the form used by IDNA2008. Even if they do allow the correct
character form, the user might not know which form they are
entering.
3. All characters are mapped using Unicode Normalization Form C
(NFC). This step was chosen because it maps combinations of
combining characters into canonical composed form. As with the
full-width/half-width mapping, users are not generally aware of
the particular form of characters that they are entering, and
IDNA2008 requires that only the canonical composed forms from NFC
End of changes. 6 change blocks.
11 lines changed or deleted, 16 lines changed or added.
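The first three steps of the general procedure in section 2 can be sketched in Python as follows. The excerpt cuts off partway through step 3, so any later steps of the draft are not covered. This is an illustrative sketch under stated assumptions, not the draft's reference implementation:

- `map_width` and `general_mapping` are hypothetical names.
- Python's `str.lower()`, which applies the Unicode default full lowercase mapping, is assumed as "the algorithm for mapping case".
- `unicodedata.decomposition()` is used to find the <wide> and <narrow> decomposition mappings in the Unicode character database.

```python
import unicodedata


def map_width(text: str) -> str:
    """Step 2: replace <wide>/<narrow> characters with their decomposition mappings."""
    out = []
    for ch in text:
        # decomposition() returns e.g. "<wide> 0041", "0065 0301", or "".
        tag, _, codes = unicodedata.decomposition(ch).partition(" ")
        if tag in ("<wide>", "<narrow>"):
            out.extend(chr(int(cp, 16)) for cp in codes.split())
        else:
            out.append(ch)  # leave every other character untouched
    return "".join(out)


def general_mapping(user_input: str) -> str:
    """Steps 1-3 of the general procedure, applied in the order listed."""
    lowered = user_input.lower()                  # 1. case mapping (assumed: str.lower)
    narrowed = map_width(lowered)                 # 2. full-width / half-width mapping
    return unicodedata.normalize("NFC", narrowed)  # 3. Normalization Form C


print(general_mapping("\uff22\u00dc\uff23\uff28\uff25\uff32"))  # full-width "ＢÜＣＨＥＲ" -> bücher
```

Here are some example results. `general_mapping("BU\u0308CHER")` (a decomposed "Ü") also returns "bücher", because NFC composes "u" plus U+0308. `general_mapping("\uff76\uff9e")` (half-width "ｶﾞ") returns "\u30ac" ("ガ"). The half-width voiced sound mark decomposes to the combining mark U+3099, which step 3 then composes with the preceding katakana. This is one reason the width mapping runs before NFC.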

This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/