| < draft-ietf-idnabis-mappings-00.txt | draft-ietf-idnabis-mappings-01.txt > | |||
|---|---|---|---|---|
| IDNABIS P. Resnick, Ed. | IDNABIS P. Resnick, Ed. | |||
| Internet-Draft Qualcomm Incorporated | Internet-Draft Qualcomm Incorporated | |||
| Intended status: Standards Track May 25, 2009 | Intended status: Standards Track P. Hoffman | |||
| Expires: November 26, 2009 | Expires: January 4, 2010 VPN Consortium | |||
| July 3, 2009 | ||||
| Mapping Characters in IDNA | Mapping Characters in IDNA | |||
| draft-ietf-idnabis-mappings-00 | draft-ietf-idnabis-mappings-01 | |||
| Status of this Memo | Status of this Memo | |||
| This Internet-Draft is submitted to IETF in full conformance with the | This Internet-Draft is submitted to IETF in full conformance with the | |||
| provisions of BCP 78 and BCP 79. This document may contain material | provisions of BCP 78 and BCP 79. This document may contain material | |||
| from IETF Documents or IETF Contributions published or made publicly | from IETF Documents or IETF Contributions published or made publicly | |||
| available before November 10, 2008. The person(s) controlling the | available before November 10, 2008. The person(s) controlling the | |||
| copyright in some of this material may not have granted the IETF | copyright in some of this material may not have granted the IETF | |||
| Trust the right to allow modifications of such material outside the | Trust the right to allow modifications of such material outside the | |||
| IETF Standards Process. Without obtaining an adequate license from | IETF Standards Process. Without obtaining an adequate license from | |||
| skipping to change at page 1, line 42 | skipping to change at page 1, line 43 | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
| http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
| The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
| http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
| This Internet-Draft will expire on November 26, 2009. | This Internet-Draft will expire on January 4, 2010. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2009 IETF Trust and the persons identified as the | Copyright (c) 2009 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents in effect on the date of | Provisions Relating to IETF Documents in effect on the date of | |||
| publication of this document (http://trustee.ietf.org/license-info). | publication of this document (http://trustee.ietf.org/license-info). | |||
| Please review these documents carefully, as they describe your rights | Please review these documents carefully, as they describe your rights | |||
| skipping to change at page 2, line 18 | skipping to change at page 2, line 20 | |||
| Abstract | Abstract | |||
| In the original version of the Internationalized Domain Names in | In the original version of the Internationalized Domain Names in | |||
| Applications (IDNA) protocol, any Unicode code points taken from user | Applications (IDNA) protocol, any Unicode code points taken from user | |||
| input were mapped into a set of Unicode code points that "make | input were mapped into a set of Unicode code points that "make | |||
| sense", which were then encoded and passed to the domain name system | sense", which were then encoded and passed to the domain name system | |||
| (DNS). The current version of IDNA presumes that the input to the | (DNS). The current version of IDNA presumes that the input to the | |||
| protocol comes from a set of "permitted" code points, which it then | protocol comes from a set of "permitted" code points, which it then | |||
| encodes and passes to the DNS, but does not specify what to do with | encodes and passes to the DNS, but does not specify what to do with | |||
| the result of user input. This document specifies the actions taken | the result of user input. This document describes the actions taken | |||
| by an implementation between user input and passing permitted code | by an implementation between user input and passing permitted code | |||
| points to the new IDNA protocol. | points to the new IDNA protocol. | |||
| Table of Contents | ||||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | ||||
| 1.1. Requirements Language . . . . . . . . . . . . . . . . . . . 4 | ||||
| 2. Architectural Principles . . . . . . . . . . . . . . . . . . . 4 | ||||
| 3. The General Procedure . . . . . . . . . . . . . . . . . . . . . 6 | ||||
| 4. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 6 | ||||
| 5. Security Considerations . . . . . . . . . . . . . . . . . . . . 7 | ||||
| Appendix A. Backwards-compatible Mapping Algorithm . . . . . . . . 7 | ||||
| Appendix B. Acknowledgements . . . . . . . . . . . . . . . . . . . 7 | ||||
| 6. Normative References . . . . . . . . . . . . . . . . . . . . . 7 | ||||
| Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 8 | ||||
| 1. Introduction | 1. Introduction | |||
| This document specifies the operations that applications apply to | This document describes the operations that can be applied to user | |||
| user input in order to get it into a form acceptable by the | input in order to get it into a form acceptable by the | |||
| Internationalized Domain Names in Applications (IDNA) protocol | Internationalized Domain Names in Applications (IDNA) protocol | |||
| [I-D.ietf-idnabis-protocol]. The document describes the | [I-D.ietf-idnabis-protocol]. The document describes the underlying | |||
| architectural principles that underlie this function in section 2, | architectural principles (in section 2) and the general implementation | |||
| describes a general procedure that an application SHOULD implement in | procedure (in section 3). | |||
| section 3, and specifies an algorithm and mapping that an application | ||||
| MAY implement in order to remain reasonably backward compatible with | ||||
| the original version of the IDNA protocol in appendix A. | ||||
| It should be noted that this document is NOT specifying the behavior | It should be noted that this document does not specify the behavior | |||
| of a protocol that appears "on the wire". It specifies an operation | of a protocol that appears "on the wire". It describes an operation | |||
| that is to be applied to user input in order to prepare that user | that is to be applied to user input in order to prepare that user | |||
| input for use in an "on the network" protocol. As unusual as this | input for use in an "on the network" protocol. As unusual as this | |||
| may be for an IETF protocol document, it is a necessary operation to | may be for an IETF protocol document, it is a necessary operation to | |||
| maintain interoperability. | maintain interoperability. | |||
| 1.1. Requirements Language | ||||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | ||||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | ||||
| document are to be interpreted as described in RFC 2119 [RFC2119]. | ||||
| 2. Architectural Principles | 2. Architectural Principles | |||
| An application that implements the IDNA protocol | An application that implements the IDNA protocol | |||
| [I-D.ietf-idnabis-protocol] must take a set of user input and convert | [I-D.ietf-idnabis-protocol] will always take any user input and | |||
| that input to a set of Unicode code points. That user input might be | convert it to a set of Unicode code points. That user input may be | |||
| acquired by any of several different input methods, all with | ||||
| differing conversion processes to be taken into consideration (e.g., | ||||
| typed on a keyboard, written by hand onto some sort of digitizer, | typed on a keyboard, written by hand onto some sort of digitizer, | |||
| spoken into a microphone and interpreted by a speech-to-text engine, | spoken into a microphone and interpreted by a speech-to-text engine, | |||
| or otherwise). The process of taking any particular user input and | etc.). The process of taking any particular user input and mapping | |||
| mapping it into a Unicode code point may be a simple one: If a user | it into a Unicode code point may be a simple one: If a user strikes | |||
| strikes the "A" key on a US English keyboard, without any modifiers | the "A" key on a US English keyboard, without any modifiers such as | |||
| such as the "Shift" key held down, in order to draw a Latin small | the "Shift" key held down, in order to draw a Latin small letter A | |||
| letter A ("a"), many (perhaps most) modern operating system input | ("a"), many (perhaps most) modern operating system input methods will | |||
| methods will produce to the calling application the code point | produce to the calling application the code point U+0061, encoded in | |||
| U+0061, encoded in a single octet. Sometimes the process is somewhat | a single octet. | |||
| more complicated: A user might strike a particular set of keys to | ||||
| represent a combining macron followed by striking the "A" key in | Sometimes the process is somewhat more complicated: a user might | |||
| order to draw a Latin small letter A with a macron above it. | strike a particular set of keys to represent a combining macron | |||
| Depending on the operating system, the input method chosen by the | followed by striking the "A" key in order to draw a Latin small | |||
| user, and even the parameters with which the application communicates | letter A with a macron above it. Depending on the operating system, | |||
| with the input method, the result might be the code point U+0101 | the input method chosen by the user, and even the parameters with | |||
| (encoded as two octets in UTF-8 or UTF-16, four octets in UTF-32, | which the application communicates with the input method, the result | |||
| etc.), the code point U+0061 followed by the code point U+0304 | might be the code point U+0101 (encoded as two octets in UTF-8 or | |||
| (again, encoded in three or more octets, depending upon the encoding | UTF-16, four octets in UTF-32, etc.), the code point U+0061 followed | |||
| used) or even the code point U+FF41 followed by the code point U+0304 | by the code point U+0304 (again, encoded in three or more octets, | |||
| (and encoded in some form). And these examples leave aside the issue | depending upon the encoding used) or even the code point U+FF41 | |||
| of operating systems and input methods that do not use Unicode code | followed by the code point U+0304 (and encoded in some form). And | |||
| points for their character set. In every case, applications (with | these examples leave aside the issue of operating systems and input | |||
| the help of the operating systems on which they run and the input | methods that do not use Unicode code points for their character set. | |||
| methods used) MUST perform a mapping from user input into Unicode | ||||
| code points. | In every case, applications (with the help of the operating systems | |||
| on which they run and the input methods used) need to perform a | ||||
| mapping from user input into Unicode code points. | ||||
| The original version of the IDNA protocol [RFC3490] used a model | The original version of the IDNA protocol [RFC3490] used a model | |||
| whereby input was taken from the user, mapped (via whatever input | whereby input was taken from the user, mapped (via whatever input | |||
| method mechanisms were used) to a set of Unicode code points, and | method mechanisms were used) to a set of Unicode code points, and | |||
| then further mapped to a set of Unicode code points using the | then further mapped to a set of Unicode code points using the | |||
| Nameprep profile specified in [RFC3491]. In this procedure, there | Nameprep profile specified in [RFC3491]. In this procedure, there | |||
| are two separate mapping steps: First, a mapping done by the input | are two separate mapping steps: First, a mapping done by the input | |||
| method (which might be controlled by the operating system, the | method (which might be controlled by the operating system, the | |||
| application, or some combination) and then a second mapping performed | application, or some combination) and then a second mapping performed | |||
| by the Nameprep portion of the IDNA protocol. The mapping done in | by the Nameprep portion of the IDNA protocol. The mapping done in | |||
| skipping to change at page 6, line 11 | skipping to change at page 4, line 27 | |||
| whatever mapping it requires to convert input into Unicode code | whatever mapping it requires to convert input into Unicode code | |||
| points. This has the advantage of giving flexibility to the | points. This has the advantage of giving flexibility to the | |||
| application to choose a mapping that is suitable for its user given | application to choose a mapping that is suitable for its user given | |||
| specific user requirements, and avoids the two-step mapping of the | specific user requirements, and avoids the two-step mapping of the | |||
| original protocol. Instead of a mapping, the current version of IDNA | original protocol. Instead of a mapping, the current version of IDNA | |||
| provides a set of categories that can be used to specify the valid | provides a set of categories that can be used to specify the valid | |||
| code points allowed in a domain name. | code points allowed in a domain name. | |||
| In principle, an application ought to take user input of a domain | In principle, an application ought to take user input of a domain | |||
| name and convert it to the set of Unicode code points that represent | name and convert it to the set of Unicode code points that represent | |||
| the domain name the user _intends_. As a practical matter, of | the domain name the user intends. As a practical matter, of course, | |||
| course, determining user desires is a tricky business, so an | determining user intent is a tricky business, so an application needs | |||
| application needs to choose a reasonable mapping from user input. | to choose a reasonable mapping from user input. That may differ | |||
| That may differ based on the particular circumstances of a user, | based on the particular circumstances of a user, depending on locale, | |||
| depending on locale, language, type of input method, etc. It is up | language, type of input method, etc. It is up to the application to | |||
| to the application to make a reasonable choice. | make a reasonable choice. | |||
| In the next section, this document specifies a general algorithm that | ||||
| applications SHOULD implement in order to produce Unicode code points | ||||
| that will be valid under the IDNA protocol. Then, in appendix A, a | ||||
| full mapping is specified that is substantially compatible with the | ||||
| original IDNA protocol. An application MAY implement the full | ||||
| mapping or MAY choose a different mapping. | ||||
| 3. The General Procedure | 3. The General Procedure | |||
| This section defines a general algorithm that applications ought to | ||||
| implement in order to produce Unicode code points that will be valid | ||||
| under the IDNA protocol. An application might implement the full | ||||
| mapping as described below, or can choose a different mapping. In | ||||
| fact, an application might want to implement a full mapping that is | ||||
| substantially compatible with the original IDNA protocol instead of | ||||
| the algorithm given here. | ||||
| The general algorithm that an application (or the input method | The general algorithm that an application (or the input method | |||
| provided by an operating system) should use is relatively | provided by an operating system) ought to use is relatively | |||
| straightforward and generally follows section 5 of | straightforward and generally follows section 5 of | |||
| [I-D.ietf-idnabis-protocol]: | [I-D.ietf-idnabis-protocol]: | |||
| 1. All characters are mapped using Unicode Normalization Form C | 1. All characters are mapped using Unicode Normalization Form C | |||
| (NFC). [Unicode51] | (NFC). | |||
| 2. Capital (upper case) characters are mapped to their small (lower | 2. Upper case characters are mapped to their lower case equivalents | |||
| case) equivalents. [[anchor2: Need reference to "toLowerCase"]] | by using the algorithm for mapping Unicode characters. | |||
| 3. Full-width and half-width CJK characters are mapped to their | 3. Full-width and half-width characters (those defined with | |||
| equivalents. [[anchor3: Handwaving for how that's supposed to | Decomposition Types <wide> and <narrow>) are mapped to their | |||
| happen]] | decomposition mappings as shown in the Unicode character | |||
| database. | ||||
| These are the minimal mappings that an application SHOULD do. Of | Definitions for the rules in this algorithm can be found in | |||
| course, there are many others that MAY be done. In particular, a | [Unicode51]. Specifically: | |||
| mapping that is substantially compatible with [RFC3490] appears below | ||||
| in appendix A. | ||||
| 4. IANA Considerations | o Unicode Normalization Form C can be found in Annex #15 of | |||
| [Unicode51]. | ||||
| This memo includes no request to IANA. | o In order to map upper case characters to their lower case | |||
| equivalents (defined in section 3.13 of [Unicode51]), first map | ||||
| characters to the "Lowercase_Mapping" property (the "<lower>" | ||||
| entry in the second column) in | ||||
| <http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt>, if any. | ||||
| Then, map characters to the "Simple_Lowercase_Mapping" property | ||||
| (the fourteenth column) in | ||||
| <http://www.unicode.org/Public/UNIDATA/UnicodeData.txt>, if any. | ||||
| 5. Security Considerations | o In order to map full-width and half-width characters to their | |||
| decomposition mappings, map any character whose | ||||
| "Decomposition_Type" (contained in the first part of the sixth | ||||
| column) in <http://www.unicode.org/Public/UNIDATA/UnicodeData.txt> | ||||
| is either "<wide>" or "<narrow>" to the "Decomposition_Mapping" of | ||||
| that character (contained in the second part of the sixth column) | ||||
| in <http://www.unicode.org/Public/UNIDATA/UnicodeData.txt>. | ||||
| Appendix A. Backwards-compatible Mapping Algorithm | o The <http://www.unicode.org/Public/UNIDATA/UCD.html> web page has | |||
| useful descriptions of the contents of these files. | ||||
| The following mapping is mostly backwards-compatible with the | If the mappings in this document are applied to versions of Unicode | |||
| original version of the IDNA protocol [RFC3490]. One important | later than Unicode 5.1, the later versions of the Unicode Standard | |||
| change is that the original IDNA specification mapped some characters | should be consulted. | |||
| to nothing that the current IDNA specification permits. Those | ||||
| characters are not re-mapped in this algorithm. | ||||
| [[anchor4: This is filler; needs to be completed.]] | These are a minimal set of mappings that an application should | |||
| strongly consider doing. Of course, there are many others that might | ||||
| be done. | ||||
| 1. Map using table B.1 and B.2 from [RFC3454]. | 4. IANA Considerations | |||
| 2. Normalize using Unicode Normalization Form KC. [Unicode51] | This memo includes no request to IANA. | |||
| 3. Prohibit using tables C.1.2, C.3, C.4, C.5, C.6, C.7, C.8, and | 5. Security Considerations | |||
| C.9 from [RFC3454]. | ||||
| Appendix B. Acknowledgements | This document suggests creating mappings that might cause confusion | |||
| for some users while alleviating confusion in other users. Such | ||||
| confusion is not covered in any depth in this document (nor in the | ||||
| other IDNA-related documents). | ||||
| 6. Normative References | 6. Normative References | |||
| [I-D.ietf-idnabis-protocol] | [I-D.ietf-idnabis-protocol] | |||
| Klensin, J., "Internationalized Domain Names in | Klensin, J., "Internationalized Domain Names in | |||
| Applications (IDNA): Protocol", | Applications (IDNA): Protocol", | |||
| draft-ietf-idnabis-protocol-12 (work in progress), | draft-ietf-idnabis-protocol-12 (work in progress), | |||
| May 2009. | May 2009. | |||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | ||||
| Requirement Levels", BCP 14, RFC 2119, March 1997. | ||||
| [RFC3454] Hoffman, P. and M. Blanchet, "Preparation of | ||||
| Internationalized Strings ("stringprep")", RFC 3454, | ||||
| December 2002. | ||||
| [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, | [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, | |||
| "Internationalizing Domain Names in Applications (IDNA)", | "Internationalizing Domain Names in Applications (IDNA)", | |||
| RFC 3490, March 2003. | RFC 3490, March 2003. | |||
| [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep | [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep | |||
| Profile for Internationalized Domain Names (IDN)", | Profile for Internationalized Domain Names (IDN)", | |||
| RFC 3491, March 2003. | RFC 3491, March 2003. | |||
| [Unicode51] | [Unicode51] | |||
| The Unicode Consortium, "The Unicode Standard, Version | The Unicode Consortium, "The Unicode Standard, Version | |||
| 5.1.0", 2008. | 5.1.0", 2008. | |||
| defined by: The Unicode Standard, Version 5.0, Boston, MA, | defined by: The Unicode Standard, Version 5.0, Boston, MA, | |||
| Addison-Wesley, 2007, ISBN 0-321-48091-0, as amended by | Addison-Wesley, 2007, ISBN 0-321-48091-0, as amended by | |||
| Unicode 5.1.0 | Unicode 5.1.0 | |||
| (http://www.unicode.org/versions/Unicode5.1.0/). | (<http://www.unicode.org/versions/Unicode5.1.0/>). | |||
| Author's Address | Authors' Addresses | |||
| Peter W. Resnick (editor) | Peter W. Resnick (editor) | |||
| Qualcomm Incorporated | Qualcomm Incorporated | |||
| 5775 Morehouse Drive | 5775 Morehouse Drive | |||
| San Diego, CA 92121-1714 | San Diego, CA 92121-1714 | |||
| US | US | |||
| Phone: +1 858 651 4478 | Phone: +1 858 651 4478 | |||
| Email: presnick@qualcomm.com | Email: presnick@qualcomm.com | |||
| URI: http://www.qualcomm.com/~presnick/ | URI: http://www.qualcomm.com/~presnick/ | |||
| Paul Hoffman | ||||
| VPN Consortium | ||||
| 127 Segre Place | ||||
| Santa Cruz, CA 95060 | ||||
| US | ||||
| Phone: 1-831-426-9827 | ||||
| Email: paul.hoffman@vpnc.org | ||||
| End of changes. 32 change blocks. | ||||
| 107 lines changed or deleted | 100 lines changed or added | |||
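
The three-step general procedure in section 3 of the -01 text (NFC, lower-casing, then mapping characters whose Decomposition_Type is <wide> or <narrow>) can be illustrated with a minimal Python sketch. It is not part of either draft. The function names `map_width` and `map_user_input` are hypothetical, and applying the steps in the order they are listed is an assumption, since the draft does not state an order. Python's `str.lower()` stands in for the SpecialCasing.txt and UnicodeData.txt lookups the draft describes, and the `unicodedata` module reflects whatever Unicode version ships with the interpreter rather than Unicode 5.1, so results for some characters may differ from what the draft's procedure would give.

```python
import unicodedata


def map_width(ch: str) -> str:
    """Map one character to its Decomposition_Mapping if its
    Decomposition_Type is <wide> or <narrow>; otherwise return it unchanged."""
    # unicodedata.decomposition() returns the sixth UnicodeData.txt field,
    # e.g. "<wide> 0041" for U+FF21, or "" when there is no decomposition.
    decomp = unicodedata.decomposition(ch)
    if decomp.startswith(("<wide>", "<narrow>")):
        # Everything after the type tag is a list of hex code points.
        return "".join(chr(int(cp, 16)) for cp in decomp.split()[1:])
    return ch


def map_user_input(text: str) -> str:
    """Apply the three mappings in the order the draft lists them (assumed)."""
    # Step 1: Unicode Normalization Form C.
    text = unicodedata.normalize("NFC", text)
    # Step 2: upper case to lower case (str.lower() used as an approximation).
    text = text.lower()
    # Step 3: full-width and half-width characters to their decomposition mappings.
    return "".join(map_width(ch) for ch in text)


if __name__ == "__main__":
    # Print the code points of the mapped result for a few sample inputs.
    for sample in ["Example", "A\u0304", "\uFF21", "\uFF76"]:
        mapped = map_user_input(sample)
        print([f"U+{ord(c):04X}" for c in mapped])
```

With the steps in this order, an input such as U+FF41 followed by U+0304 ends up as U+0061 U+0304, which is no longer in NFC. An implementation that needs normalized output could normalize again after the width mapping; the draft text shown here does not address this.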