< draft-ietf-idnabis-mappings-00.txt   draft-ietf-idnabis-mappings-01.txt >
IDNABIS P. Resnick, Ed. IDNABIS P. Resnick, Ed.
Internet-Draft Qualcomm Incorporated Internet-Draft Qualcomm Incorporated
Intended status: Standards Track May 25, 2009 Intended status: Standards Track P. Hoffman
Expires: November 26, 2009 Expires: January 4, 2010 VPN Consortium
July 3, 2009
Mapping Characters in IDNA Mapping Characters in IDNA
draft-ietf-idnabis-mappings-00 draft-ietf-idnabis-mappings-01
Status of this Memo Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79. This document may contain material provisions of BCP 78 and BCP 79. This document may contain material
from IETF Documents or IETF Contributions published or made publicly from IETF Documents or IETF Contributions published or made publicly
available before November 10, 2008. The person(s) controlling the available before November 10, 2008. The person(s) controlling the
copyright in some of this material may not have granted the IETF copyright in some of this material may not have granted the IETF
Trust the right to allow modifications of such material outside the Trust the right to allow modifications of such material outside the
IETF Standards Process. Without obtaining an adequate license from IETF Standards Process. Without obtaining an adequate license from
skipping to change at page 1, line 42 skipping to change at page 1, line 43
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on November 26, 2009. This Internet-Draft will expire on January 4, 2010.
Copyright Notice Copyright Notice
Copyright (c) 2009 IETF Trust and the persons identified as the Copyright (c) 2009 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents in effect on the date of Provisions Relating to IETF Documents in effect on the date of
publication of this document (http://trustee.ietf.org/license-info). publication of this document (http://trustee.ietf.org/license-info).
Please review these documents carefully, as they describe your rights Please review these documents carefully, as they describe your rights
skipping to change at page 2, line 18 skipping to change at page 2, line 20
Abstract Abstract
In the original version of the Internationalized Domain Names in In the original version of the Internationalized Domain Names in
Applications (IDNA) protocol, any Unicode code points taken from user Applications (IDNA) protocol, any Unicode code points taken from user
input were mapped into a set of Unicode code points that "make input were mapped into a set of Unicode code points that "make
sense", which were then encoded and passed to the domain name system sense", which were then encoded and passed to the domain name system
(DNS). The current version of IDNA presumes that the input to the (DNS). The current version of IDNA presumes that the input to the
protocol comes from a set of "permitted" code points, which it then protocol comes from a set of "permitted" code points, which it then
encodes and passes to the DNS, but does not specify what to do with encodes and passes to the DNS, but does not specify what to do with
the result of user input. This document specifies the actions taken the result of user input. This document describes the actions taken
by an implementation between user input and passing permitted code by an implementation between user input and passing permitted code
points to the new IDNA protocol. points to the new IDNA protocol.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1. Requirements Language . . . . . . . . . . . . . . . . . . . 4
2. Architectural Principles . . . . . . . . . . . . . . . . . . . 4
3. The General Procedure . . . . . . . . . . . . . . . . . . . . . 6
4. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 6
5. Security Considerations . . . . . . . . . . . . . . . . . . . . 7
Appendix A. Backwards-compatible Mapping Algorithm . . . . . . . . 7
Appendix B. Acknowledgements . . . . . . . . . . . . . . . . . . . 7
6. Normative References . . . . . . . . . . . . . . . . . . . . . 7
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 8
1. Introduction 1. Introduction
This document specifies the operations that applications apply to This document describes the operations that can be applied to user
user input in order to get it into a form acceptable by the input in order to get it into a form acceptable by the
Internationalized Domain Names in Applications (IDNA) protocol Internationalized Domain Names in Applications (IDNA) protocol
[I-D.ietf-idnabis-protocol]. The document describes the [I-D.ietf-idnabis-protocol]. The document describes the underlying
architectural principles that underlie this function in section 2, architectural principles (in section 2) and the general implementation
describes a general procedure that an application SHOULD implement in procedure (in section 3).
section 3, and specifies an algorithm and mapping that an application
MAY implement in order to remain reasonably backward compatible with
the original version of the IDNA protocol in appendix A.
It should be noted that this document is NOT specifying the behavior It should be noted that this document does not specify the behavior
of a protocol that appears "on the wire". It specifies an operation of a protocol that appears "on the wire". It describes an operation
that is to be applied to user input in order to prepare that user that is to be applied to user input in order to prepare that user
input for use in an "on the network" protocol. As unusual as this input for use in an "on the network" protocol. As unusual as this
may be for an IETF protocol document, it is a necessary operation to may be for an IETF protocol document, it is a necessary operation to
maintain interoperability. maintain interoperability.
1.1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
2. Architectural Principles 2. Architectural Principles
An application that implements the IDNA protocol An application that implements the IDNA protocol
[I-D.ietf-idnabis-protocol] must take a set of user input and convert [I-D.ietf-idnabis-protocol] will always take any user input and
that input to a set of Unicode code points. That user input might be convert it to a set of Unicode code points. That user input may be
acquired by any of several different input methods, all with
differing conversion processes to be taken into consideration (e.g.,
typed on a keyboard, written by hand onto some sort of digitizer, typed on a keyboard, written by hand onto some sort of digitizer,
spoken into a microphone and interpreted by a speech-to-text engine, spoken into a microphone and interpreted by a speech-to-text engine,
or otherwise. The process of taking any particular user input and etc.). The process of taking any particular user input and mapping
mapping it into a Unicode code point may be a simple one: If a user it into a Unicode code point may be a simple one: If a user strikes
strikes the "A" key on a US English keyboard, without any modifiers the "A" key on a US English keyboard, without any modifiers such as
such as the "Shift" key held down, in order to draw a Latin small the "Shift" key held down, in order to draw a Latin small letter A
letter A ("a"), many (perhaps most) modern operating system input ("a"), many (perhaps most) modern operating system input methods will
methods will produce to the calling application the code point produce to the calling application the code point U+0061, encoded in
U+0061, encoded in a single octet. Sometimes the process is somewhat a single octet.
more complicated: A user might strike a particular set of keys to
represent a combining macron followed by striking the "A" key in Sometimes the process is somewhat more complicated: a user might
order to draw a Latin small letter A with a macron above it. strike a particular set of keys to represent a combining macron
Depending on the operating system, the input method chosen by the followed by striking the "A" key in order to draw a Latin small
user, and even the parameters with which the application communicates letter A with a macron above it. Depending on the operating system,
with the input method, the result might be the code point U+0101 the input method chosen by the user, and even the parameters with
(encoded as two octets in UTF-8 or UTF-16, four octets in UTF-32, which the application communicates with the input method, the result
etc.), the code point U+0061 followed by the code point U+0304 might be the code point U+0101 (encoded as two octets in UTF-8 or
(again, encoded in three or more octets, depending upon the encoding UTF-16, four octets in UTF-32, etc.), the code point U+0061 followed
used) or even the code point U+FF41 followed by the code point U+0304 by the code point U+0304 (again, encoded in three or more octets,
(and encoded in some form). And these examples leave aside the issue depending upon the encoding used) or even the code point U+FF41
of operating systems and input methods that do not use Unicode code followed by the code point U+0304 (and encoded in some form). And
points for their character set. In every case, applications (with these examples leave aside the issue of operating systems and input
the help of the operating systems on which they run and the input methods that do not use Unicode code points for their character set.
methods used) MUST perform a mapping from user input into Unicode
code points. In every case, applications (with the help of the operating systems
on which they run and the input methods used) need to perform a
mapping from user input into Unicode code points.
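The macron example above can be checked with a minimal Python sketch. It is not part of either draft; the names CANDIDATES, octets, and report are hypothetical, and the "-be" codec variants are chosen only so that no byte order mark is counted. It lists the three possible input-method results, their encoded sizes, and whether Normalization Form C alone brings each to U+0101.

```python
import unicodedata

# The three possible input-method results named in the text above.
# (Dictionary keys are illustrative labels, not terms from the draft.)
CANDIDATES = {
    "precomposed": "\u0101",       # LATIN SMALL LETTER A WITH MACRON
    "decomposed": "\u0061\u0304",  # "a" followed by COMBINING MACRON
    "fullwidth": "\uff41\u0304",   # FULLWIDTH "a" followed by COMBINING MACRON
}


def octets(text, encoding):
    """Return the number of octets needed to encode text."""
    return len(text.encode(encoding))


def report():
    """Collect code points, encoded sizes, and the NFC result for each candidate."""
    rows = []
    for name, text in CANDIDATES.items():
        rows.append((
            name,
            [f"U+{ord(ch):04X}" for ch in text],
            octets(text, "utf-8"),
            octets(text, "utf-16-be"),
            octets(text, "utf-32-be"),
            # True when NFC alone yields the precomposed form.
            unicodedata.normalize("NFC", text) == CANDIDATES["precomposed"],
        ))
    return rows


if __name__ == "__main__":
    for row in report():
        print(row)
```

In this sketch NFC folds the decomposed sequence into U+0101 but leaves the full-width sequence unchanged, which is one reason a separate width mapping is discussed later in the document.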
The original version of the IDNA protocol [RFC3490] used a model The original version of the IDNA protocol [RFC3490] used a model
whereby input was taken from the user, mapped (via whatever input whereby input was taken from the user, mapped (via whatever input
method mechanisms were used) to a set of Unicode code points, and method mechanisms were used) to a set of Unicode code points, and
then further mapped to a set of Unicode code points using the then further mapped to a set of Unicode code points using the
Nameprep profile specified in [RFC3491]. In this procedure, there Nameprep profile specified in [RFC3491]. In this procedure, there
are two separate mapping steps: First, a mapping done by the input are two separate mapping steps: First, a mapping done by the input
method (which might be controlled by the operating system, the method (which might be controlled by the operating system, the
application, or some combination) and then a second mapping performed application, or some combination) and then a second mapping performed
by the Nameprep portion of the IDNA protocol. The mapping done in by the Nameprep portion of the IDNA protocol. The mapping done in
skipping to change at page 6, line 11 skipping to change at page 4, line 27
whatever mapping it requires to convert input into Unicode code whatever mapping it requires to convert input into Unicode code
points. This has the advantage of giving flexibility to the points. This has the advantage of giving flexibility to the
application to choose a mapping that is suitable for its user given application to choose a mapping that is suitable for its user given
specific user requirements, and avoids the two-step mapping of the specific user requirements, and avoids the two-step mapping of the
original protocol. Instead of a mapping, the current version of IDNA original protocol. Instead of a mapping, the current version of IDNA
provides a set of categories that can be used to specify the valid provides a set of categories that can be used to specify the valid
code points allowed in a domain name. code points allowed in a domain name.
In principle, an application ought to take user input of a domain In principle, an application ought to take user input of a domain
name and convert it to the set of Unicode code points that represent name and convert it to the set of Unicode code points that represent
the domain name the user _intends_. As a practical matter, of the domain name the user intends. As a practical matter, of course,
course, determining user desires is a tricky business, so an determining user intent is a tricky business, so an application needs
application needs to choose a reasonable mapping from user input. to choose a reasonable mapping from user input. That may differ
That may differ based on the particular circumstances of a user, based on the particular circumstances of a user, depending on locale,
depending on locale, language, type of input method, etc. It is up language, type of input method, etc. It is up to the application to
to the application to make a reasonable choice. make a reasonable choice.
In the next section, this document specifies a general algorithm that
applications SHOULD implement in order to produce Unicode code points
that will be valid under the IDNA protocol. Then, in appendix A, a
full mapping is specified that is substantially compatible with the
original IDNA protocol. An application MAY implement the full
mapping or MAY choose a different mapping.
3. The General Procedure 3. The General Procedure
This section defines a general algorithm that applications ought to
implement in order to produce Unicode code points that will be valid
under the IDNA protocol. An application might implement the full
mapping as described below, or can choose a different mapping. In
fact, an application might want to implement a full mapping that is
substantially compatible with the original IDNA protocol instead of
the algorithm given here.
The general algorithm that an application (or the input method The general algorithm that an application (or the input method
provided by an operating system) should use is relatively provided by an operating system) ought to use is relatively
straightforward and generally follows section 5 of straightforward and generally follows section 5 of
[I-D.ietf-idnabis-protocol]: [I-D.ietf-idnabis-protocol]:
1. All characters are mapped using Unicode Normalization Form C 1. All characters are mapped using Unicode Normalization Form C
(NFC). [Unicode51] (NFC).
2. Capital (upper case) characters are mapped to their small (lower 2. Upper case characters are mapped to their lower case equivalents
case) equivalents. [[anchor2: Need reference to "toLowerCase"]] by using the algorithm for mapping Unicode characters.
3. Full-width and half-width CJK characters are mapped to their 3. Full-width and half-width characters (those defined with
equivalents. [[anchor3: Handwaving for how that's supposed to Decomposition Types <wide> and <narrow>) are mapped to their
happen]] decomposition mappings as shown in the Unicode character
database.
These are the minimal mappings that an application SHOULD do. Of Definitions for the rules in this algorithm can be found in
course, there are many others that MAY be done. In particular, a [Unicode51]. Specifically:
mapping that is substantially compatible with [RFC3490] appears below
in appendix A.
4. IANA Considerations o Unicode Normalization Form C can be found in Annex #15 of
[Unicode51].
This memo includes no request to IANA. o In order to map upper case characters to their lower case
equivalents (defined in section 3.13 of [Unicode51]), first map
characters to the "Lowercase_Mapping" property (the "<lower>"
entry in the second column) in
<http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt>, if any.
Then, map characters to the "Simple_Lowercase_Mapping" property
(the fourteenth column) in
<http://www.unicode.org/Public/UNIDATA/UnicodeData.txt>, if any.
5. Security Considerations o In order to map full-width and half-width characters to their
decomposition mappings, map any character whose
"Decomposition_Type" (contained in the first part of the sixth
column) in <http://www.unicode.org/Public/UNIDATA/UnicodeData.txt>
is either "<wide>" or "<narrow>" to the "Decomposition_Mapping" of
that character (contained in the second part of the sixth column)
in <http://www.unicode.org/Public/UNIDATA/UnicodeData.txt>.
Appendix A. Backwards-compatible Mapping Algorithm o The <http://www.unicode.org/Public/UNIDATA/UCD.html> web page has
useful descriptions of the contents of these files.
The following mapping is mostly backwards-compatible with the If the mappings in this document are applied to versions of Unicode
original version of the IDNA protocol [RFC3490]. One important later than Unicode 5.1, the later versions of the Unicode Standard
change is that the original IDNA specification mapped some characters should be consulted.
to nothing that the current IDNA specification permits. Those
characters are not re-mapped in this algorithm.
[[anchor4: This is filler; needs to be completed.]] These are a minimal set of mappings that an application should
strongly consider doing. Of course, there are many others that might
be done.
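The three steps of the general procedure in the -01 text can be sketched in Python as follows. This is an illustration under stated assumptions, not the drafts' reference implementation: the function names map_width and map_label are hypothetical, str.lower() stands in for the SpecialCasing.txt and UnicodeData.txt lookups described above, and the standard unicodedata module uses whatever Unicode version ships with the interpreter rather than Unicode 5.1.

```python
import unicodedata


def map_width(ch):
    """Step 3: replace a <wide> or <narrow> character by its decomposition mapping."""
    # unicodedata.decomposition() returns e.g. "<wide> 0061", or "" if none.
    decomp = unicodedata.decomposition(ch)
    if decomp.startswith(("<wide>", "<narrow>")):
        _tag, *code_points = decomp.split()
        return "".join(chr(int(cp, 16)) for cp in code_points)
    return ch


def map_label(text):
    """Apply the three mappings in the order the procedure lists them."""
    # Step 1: Unicode Normalization Form C.
    text = unicodedata.normalize("NFC", text)
    # Step 2: lower case mapping (assumption: str.lower() approximates the
    # Lowercase_Mapping / Simple_Lowercase_Mapping lookup described above).
    text = text.lower()
    # Step 3: full-width and half-width characters.
    return "".join(map_width(ch) for ch in text)


if __name__ == "__main__":
    for sample in ("\uff21", "A\u0304", "\uff71"):
        print([f"U+{ord(ch):04X}" for ch in map_label(sample)])
```

Because the width mapping runs last in this sketch, its output is not guaranteed to be in NFC: U+FF41 followed by U+0304 maps to U+0061 followed by U+0304. The text above does not say whether to normalize again, so a caller that needs NFC output would have to apply it separately.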
1. Map using table B.1 and B.2 from [RFC3454]. 4. IANA Considerations
2. Normalize using Unicode Normalization Form KC. [Unicode51] This memo includes no request to IANA.
3. Prohibit using tables C.1.2, C.3, C.4, C.5, C.6, C.7, C.8, and 5. Security Considerations
C.9 from [RFC3454].
Appendix B. Acknowledgements This document suggests creating mappings that might cause confusion
for some users while alleviating confusion in other users. Such
confusion is not covered in any depth in this document (nor in the
other IDNA-related documents).
6. Normative References 6. Normative References
[I-D.ietf-idnabis-protocol] [I-D.ietf-idnabis-protocol]
Klensin, J., "Internationalized Domain Names in Klensin, J., "Internationalized Domain Names in
Applications (IDNA): Protocol", Applications (IDNA): Protocol",
draft-ietf-idnabis-protocol-12 (work in progress), draft-ietf-idnabis-protocol-12 (work in progress),
May 2009. May 2009.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3454] Hoffman, P. and M. Blanchet, "Preparation of
Internationalized Strings ("stringprep")", RFC 3454,
December 2002.
[RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello,
"Internationalizing Domain Names in Applications (IDNA)", "Internationalizing Domain Names in Applications (IDNA)",
RFC 3490, March 2003. RFC 3490, March 2003.
[RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
Profile for Internationalized Domain Names (IDN)", Profile for Internationalized Domain Names (IDN)",
RFC 3491, March 2003. RFC 3491, March 2003.
[Unicode51] [Unicode51]
The Unicode Consortium, "The Unicode Standard, Version The Unicode Consortium, "The Unicode Standard, Version
5.1.0", 2008. 5.1.0", 2008.
defined by: The Unicode Standard, Version 5.0, Boston, MA, defined by: The Unicode Standard, Version 5.0, Boston, MA,
Addison-Wesley, 2007, ISBN 0-321-48091-0, as amended by Addison-Wesley, 2007, ISBN 0-321-48091-0, as amended by
Unicode 5.1.0 Unicode 5.1.0
(http://www.unicode.org/versions/Unicode5.1.0/). (<http://www.unicode.org/versions/Unicode5.1.0/>).
Author's Address Authors' Addresses
Peter W. Resnick (editor) Peter W. Resnick (editor)
Qualcomm Incorporated Qualcomm Incorporated
5775 Morehouse Drive 5775 Morehouse Drive
San Diego, CA 92121-1714 San Diego, CA 92121-1714
US US
Phone: +1 858 651 4478 Phone: +1 858 651 4478
Email: presnick@qualcomm.com Email: presnick@qualcomm.com
URI: http://www.qualcomm.com/~presnick/ URI: http://www.qualcomm.com/~presnick/
Paul Hoffman
VPN Consortium
127 Segre Place
Santa Cruz, CA 95060
US
Phone: 1-831-426-9827
Email: paul.hoffman@vpnc.org
 End of changes. 32 change blocks. 
107 lines changed or deleted 100 lines changed or added
