Network Working Group M. Blanchet Internet-Draft Viagenie Intended status: Informational A. Sullivan Expires: April 21, 2011 October 18, 2010 Stringprep Revision Problem Statement draft-ietf-precis-problem-statement-00.txt Abstract Using Unicode codepoints in protocol strings that expect comparison with other strings [[anchor1: The WG will need to decide whether "other strings" is too broad. In particular, what about protocol slots that can take strings other than plain ASCII? --ajs@shinkuro.com]] requires preparation of the string that contains the Unicode codepoints. Internationalizing Domain Names in Applications (IDNA2003) defined and used Stringprep and Nameprep. Other protocols subsequently defined Stringprep profiles. A new approach different from Stringprep and Nameprep is used for a revision of IDNA2003 (called IDNA2008). Other Stringprep profiles need to be similarly updated or a replacement of Stringprep need to be designed. This document outlines the issues to be faced by those designing a Stringprep replacement. Status of this Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on April 21, 2011. Copyright Notice Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Blanchet & Sullivan Expires April 21, 2011 [Page 1] Internet-Draft Stringprep Revision Problem Statement October 2010 Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. This document may contain material from IETF Documents or IETF Contributions published or made publicly available before November 10, 2008. The person(s) controlling the copyright in some of this material may not have granted the IETF Trust the right to allow modifications of such material outside the IETF Standards Process. Without obtaining an adequate license from the person(s) controlling the copyright in such materials, this document may not be modified outside the IETF Standards Process, and derivative works of it may not be created outside the IETF Standards Process, except to format it for publication as an RFC or to translate it into languages other than English. Blanchet & Sullivan Expires April 21, 2011 [Page 2] Internet-Draft Stringprep Revision Problem Statement October 2010 Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Usage and Issues of Stringprep . . . . . . . . . . . . . . . . 5 2.1. Issues raised during newprep BOF . . . . . . . . . . . . . 5 2.2. Specific issues with particular Stringprep profiles . . . 6 2.3. Inclusion vs. exclusion of characters . . . . . . . . . . 6 2.4. Stringprep and NFKC . . . . . . . . . . . . . . . . . . . 7 2.5. Case mapping . . . . . . . . . . . . . . . . . . . . . . . 7 2.6. Whether to use ASCII-compatible encoding . . . . . . . . . 7 2.7. Issues with delimiters . . . . . . . . . . . . . . . . . . 8 3. Considerations for Stringprep replacement . . . . . . . . . . 8 4. Security Considerations . . . . . . . . . . . . . . . . . . . 9 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 9 6. Discussion home for this draft . . . . . . . . . . . . . . . . 9 7. Informative References . . . . . . . . . . . . . . . . . . . . 9 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 12 Blanchet & Sullivan Expires April 21, 2011 [Page 3] Internet-Draft Stringprep Revision Problem Statement October 2010 1. Introduction Internationalizing Domain Names in Applications (IDNA2003) [RFC3490], [RFC3491], [RFC3492], [RFC3454] described a mechanism for encoding UTF-8 labels making up Internationalized Domain Names (IDNs) as standard DNS labels. The labels were processed using a method called Nameprep [RFC3491] and Punycode [RFC3492]. That method was specific to IDNA2003, but is generalized as Stringprep [RFC3454]. The general mechanism can be used to help other protocols with similar needs, but with different constraints than IDNA2003. Stringprep defines a framework within which protocols define their Stringprep profiles. Known IETF specifications using Stringprep are listed below: o The Nameprep profile [RFC3490] for use in Internationalized Domain Names (IDNs); o NFSv4 [RFC3530] and NFSv4.1 [RFC5661]; o The iSCSI profile [RFC3722] for use in Internet Small Computer Systems Interface (iSCSI) Names; o EAP [RFC3748]; o The Nodeprep and Resourceprep profiles [RFC3920] for use in the Extensible Messaging and Presence Protocol (XMPP), and the XMPP to CPIM mapping [RFC3922]; o The Policy MIB profile [RFC4011] for use in the Simple Network Management Protocol (SNMP); o The SASLprep profile [RFC4013] for use in the Simple Authentication and Security Layer (SASL), and SASL itself [RFC4422]; o TLS [RFC4279]; o IMAP4 using SASLprep [RFC4314]; o The trace profile [RFC4505] for use with the SASL ANONYMOUS mechanism; o The LDAP profile [RFC4518] for use with LDAP [RFC4511] and its authentication methods [RFC4513]; o Plain SASL using SASLprep [RFC4616]; o NNTP using SASLprep [RFC4643]; o PKIX subject identification using LDAPprep [RFC4683]; o Internet Application Protocol Collation Registry [RFC4790]; o SMTP Auth using SASLprep [RFC4954]; o POP3 Auth using SASLprep [RFC5034]; o TLS SRP using SASLprep [RFC5054]; o IRI and URI in XMPP [RFC5122]; o PKIX CRL using LDAPprep [RFC5280]; o IAX using Nameprep [RFC5456]; o SASL SCRAM using SASLprep [RFC5802]; o Remote management of Sieve using SASLprep [RFC5804]; Blanchet & Sullivan Expires April 21, 2011 [Page 4] Internet-Draft Stringprep Revision Problem Statement October 2010 o The i;unicode-casemap Unicode Collation [RFC5051]. There turned out to be some difficulties with IDNA2003, documented in [RFC4690]. These difficulties led to a new IDN specification, called IDNA2008 [RFC5890], [RFC5891], [RFC5892], [RFC5893]. Additional background and explanations of the decisions embodied in IDNA2008 is presented in [RFC5894]. One of the effects of IDNA2008 is that Nameprep and Stringprep are not used at all. Instead, an algorithm based on Unicode properties of codepoints is defined. That algorithm generates a stable and complete table of the supported Unicode codepoints. This algorithm is based on an inclusion-based approach, instead of the exclusion-based approach of Stringprep/Nameprep. This document lists the shortcomings and issues found by protocols listed above that defined Stringprep profiles. It also lists some early conclusions and requirements for a potential replacement of Stringprep. 2. Usage and Issues of Stringprep 2.1. Issues raised during newprep BOF During IETF 77, a BOF discussed the current state of the protocols that have defined Stringprep profiles [NEWPREP]. The main conclusions are : o Stringprep is bound to a specific version of Unicode: 3.2. Stringprep has not been updated to new versions of Unicode. Therefore, the protocols using Stringprep are stuck to Unicode 3.2. o The protocols need to be updated to support new versions of Unicode. The protocols would like to not be bound to a specific version of Unicode, but rather have better Unicode agility in the way of IDNA2008. This is important partly because it is usually impossible for an application to require Unicode 3.2; the application gets whatever version of Unicode is available on the host. o The protocols require better bidirectional support (bidi) than currently offered by Stringprep. o If the protocols are updated to use a new version of Stringprep or another framework, then backward compatibility is an important requirement. For example, Stringprep is based on and may use NFKC [UAX15], while IDNA2008 mostly uses NFC [UAX15]. o Protocols use each other; for example, a protocol can use user identifiers that are later passed to SASL, LDAP or another authentication mechanism. Therefore, common set of rules or classes of strings are preferred over specific rules for each protocol. Blanchet & Sullivan Expires April 21, 2011 [Page 5] Internet-Draft Stringprep Revision Problem Statement October 2010 Protocols that use Stringprep profiles use strings for different purposes: o XMPP uses a different Stringprep profile for each part of the XMPP address (JID): a localpart which is similar to a username and used for authentication, a domainpart which is a domain name and a resource part which is less restrictive than the localpart. o iSCSI uses a Stringprep profile for the IQN, which is very similar to (often is) a DNS domain name. o SASL and LDAP uses a Stringprep profile for usernames. o LDAP uses a set of Stringprep profiles. During the newprep BOF, it was the consensus of the attendees that it would be highly preferable to have a replacement of Stringprep, with similar characteristics to IDNA2008. That replacement should be defined so that the protocols could use internationalized strings without a lot of specialized internationalization work, since internationalization expertise is not available in the respective protocols or working groups. 2.2. Specific issues with particular Stringprep profiles [[anchor6: This section is where issues raised in the individual profile reviews goes. A review of the WG trac state on 2010-10-06 of the tracker suggests those reviews haven't happened yet. --ajs@shinkuro.com]] 2.3. Inclusion vs. exclusion of characters One of the primary changes of IDNA2008 is in the way it approaches Unicode characters. IDNA2003 created an explicit list of excluded or mapped-away characters; anything in Unicode 3.2 that was not so listed could be assumed to be allowed under the protocol. IDNA2008 begins instead from the assumption that characters are disallowed, and then relies on Unicode properties to derive whether a given character actually is allowed in the protocol. Moreover, there is more than one class of "allowed in the protocol". While some characters are simply disallowed, some are allowed only in certain contexts. The reasons for the context-dependent rules have to do with the way some characters are used. For instance, the ZERO WIDTH JOINER and ZERO WIDTH NON-JOINER characters (ZWJ, U+200D and ZWNJ, U+200C) are allowed with contextual rules because they are required in some circumstances, yet are considered punctuation by Unicode and would therefore be DISALLOWED under the usual IDNA2008 derivation rules. The working group needs to decide whether similar contextual cases need to be supported. Blanchet & Sullivan Expires April 21, 2011 [Page 6] Internet-Draft Stringprep Revision Problem Statement October 2010 2.4. Stringprep and NFKC Stringprep profiles may use normalization. If they do, they use NFKC [UAX15]. It is not clear that NFKC is the right normalization to use in all cases. In [UAX15], there is the following observation regarding Normalization Forms KC and KD: "It is best to think of these Normalization Forms as being like uppercase or lowercase mappings: useful in certain contexts for identifying core meanings, but also performing modifications to the text that may not always be appropriate." For things like the spelling of users' names, then, NKFC may not be the best form to use. At the same time, one of the nice things about NFKC is that it deals with the width of characters that are otherwise similar, by canonicalizing half-width to full- width. This mapping step can be crucial in practice. The WG will need to analyze the different use profiles and consider whether NFKC or NFC is a better normalization for each profile. 2.5. Case mapping In IDNA2003, labels are always mapped to lower case before the Punycode transformation. In IDNA2003, there is no mapping at all: input is either a valid U-label or it is not. At the same time, upper-case characters are by definition not valid U-labels, because they fall into the Unstable category (category B) of [RFC5892]. If there are protocols that require upper and lower cases be preserved, then the analogy with IDNA2008 will break down. The working group will need to decide whether there are any cases that require upper case, and what to do about it if so. 2.6. Whether to use ASCII-compatible encoding The development of IDNA2008 depended on the notion that there was a narrow repertoire of reasonable traditional labels, and what was necessary was to internationalize that repertoire rather than to incorporate any characters into domain name labels. More exactly, the idea was to internationalize the traditional hostname rules (the "LDH rule". See [RFC4690], section 5.1.). Efforts to internationalize email ([RFC5336]) have started from different assumptions. The email example suggests that in some cases, the right answer might be to internationalize the target protocol rather than to depend on a technology to ensure protocol slots can use only ASCII. The working group will need to determine which approach is correct for the different use-cases. Blanchet & Sullivan Expires April 21, 2011 [Page 7] Internet-Draft Stringprep Revision Problem Statement October 2010 2.7. Issues with delimiters There are two kinds of issues to address with delimiters. First, exactly where a delimiter will appear on the screen when dealing with bidirectional parts of a string can be extremely surprising. In the case of IDNA2008, just what to do in these cases remains a display issue (there is no question about the wire format, because the wire format is an A-label and it is always left to right). Second, there is the question of whether to include different kinds of protocol separators. For instance, FULL STOP, U+002E (.) may not be available on all keyboards. In addition, in some languages there is more than one full stop which are variants of one another. The working group will need to decide how to handle such cases: whether there will be a mapping, some restrictions, or something else. 3. Considerations for Stringprep replacement The above suggests the following direction for the working group: o A stringprep replacement should be defined. o The replacement should take an approach similar to IDNA2008, in that it enables Unicode agility. o Protocols share similar characteristics of strings. Therefore, defining i18n preparation algorithms for a (small) set of string classes may be sufficient for most cases and provides the coherence among a set of protocol friends. o The sets of string classes need to be evaluated for the following properties: * the normalization needed (NFC vs NFKC); * whether case-folding, case preservation, and case-insensitive matching is needed; * what restrictions on input are reasonable for the class (i.e. whether there is something like an "LDH rule" for the class), or whether the ASCII-only input in the protocol slot is lightly constrained; * the extent to which bidi considerations are important for the class. Existing deployments already depend on Stringprep profiles. Therefore, the working group will need to consider the effects of any new strategy on existing deployments. By way of comparison, it is worth noting that some characters were acceptable in IDNA labels under IDNA2003, but are not protocol-valid under IDNA2008 (and conversely). Different implementers may make different decisions about what to do in such cases; this could have interoperability effects. The working group will need to trade better support for different linguistic environments against the potential side effects Blanchet & Sullivan Expires April 21, 2011 [Page 8] Internet-Draft Stringprep Revision Problem Statement October 2010 of backward incompatibility. 4. Security Considerations This document merely states what problems are to be solved, and does not define a protocol. There are undoubtedly security implications of the particular results that will come from the work to be completed. 5. IANA Considerations This document has no actions for IANA. 6. Discussion home for this draft This document is intended to define the problem space discussed on the precis@ietf.org mailing list. 7. Informative References [NEWPREP] "Newprep BoF Meeting Minutes", March 2010. [RFC3454] Hoffman, P. and M. Blanchet, "Preparation of Internationalized Strings ("stringprep")", RFC 3454, December 2002. [RFC3490] Faltstrom, P., Hoffman, P., and A. Costello, "Internationalizing Domain Names in Applications (IDNA)", RFC 3490, March 2003. [RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN)", RFC 3491, March 2003. [RFC3492] Costello, A., "Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA)", RFC 3492, March 2003. [RFC3530] Shepler, S., Callaghan, B., Robinson, D., Thurlow, R., Beame, C., Eisler, M., and D. Noveck, "Network File System (NFS) version 4 Protocol", RFC 3530, April 2003. [RFC3722] Bakke, M., "String Profile for Internet Small Computer Systems Interface (iSCSI) Names", RFC 3722, April 2004. Blanchet & Sullivan Expires April 21, 2011 [Page 9] Internet-Draft Stringprep Revision Problem Statement October 2010 [RFC3748] Aboba, B., Blunk, L., Vollbrecht, J., Carlson, J., and H. Levkowetz, "Extensible Authentication Protocol (EAP)", RFC 3748, June 2004. [RFC3920] Saint-Andre, P., Ed., "Extensible Messaging and Presence Protocol (XMPP): Core", RFC 3920, October 2004. [RFC3922] Saint-Andre, P., "Mapping the Extensible Messaging and Presence Protocol (XMPP) to Common Presence and Instant Messaging (CPIM)", RFC 3922, October 2004. [RFC4011] Waldbusser, S., Saperia, J., and T. Hongal, "Policy Based Management MIB", RFC 4011, March 2005. [RFC4013] Zeilenga, K., "SASLprep: Stringprep Profile for User Names and Passwords", RFC 4013, February 2005. [RFC4279] Eronen, P. and H. Tschofenig, "Pre-Shared Key Ciphersuites for Transport Layer Security (TLS)", RFC 4279, December 2005. [RFC4314] Melnikov, A., "IMAP4 Access Control List (ACL) Extension", RFC 4314, December 2005. [RFC4422] Melnikov, A. and K. Zeilenga, "Simple Authentication and Security Layer (SASL)", RFC 4422, June 2006. [RFC4505] Zeilenga, K., "Anonymous Simple Authentication and Security Layer (SASL) Mechanism", RFC 4505, June 2006. [RFC4511] Sermersheim, J., "Lightweight Directory Access Protocol (LDAP): The Protocol", RFC 4511, June 2006. [RFC4513] Harrison, R., "Lightweight Directory Access Protocol (LDAP): Authentication Methods and Security Mechanisms", RFC 4513, June 2006. [RFC4518] Zeilenga, K., "Lightweight Directory Access Protocol (LDAP): Internationalized String Preparation", RFC 4518, June 2006. [RFC4616] Zeilenga, K., "The PLAIN Simple Authentication and Security Layer (SASL) Mechanism", RFC 4616, August 2006. [RFC4643] Vinocur, J. and K. Murchison, "Network News Transfer Protocol (NNTP) Extension for Authentication", RFC 4643, October 2006. Blanchet & Sullivan Expires April 21, 2011 [Page 10] Internet-Draft Stringprep Revision Problem Statement October 2010 [RFC4683] Park, J., Lee, J., Lee, H., Park, S., and T. Polk, "Internet X.509 Public Key Infrastructure Subject Identification Method (SIM)", RFC 4683, October 2006. [RFC4690] Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and Recommendations for Internationalized Domain Names (IDNs)", RFC 4690, September 2006. [RFC4790] Newman, C., Duerst, M., and A. Gulbrandsen, "Internet Application Protocol Collation Registry", RFC 4790, March 2007. [RFC4954] Siemborski, R. and A. Melnikov, "SMTP Service Extension for Authentication", RFC 4954, July 2007. [RFC5034] Siemborski, R. and A. Menon-Sen, "The Post Office Protocol (POP3) Simple Authentication and Security Layer (SASL) Authentication Mechanism", RFC 5034, July 2007. [RFC5051] Crispin, M., "i;unicode-casemap - Simple Unicode Collation Algorithm", RFC 5051, October 2007. [RFC5054] Taylor, D., Wu, T., Mavrogiannopoulos, N., and T. Perrin, "Using the Secure Remote Password (SRP) Protocol for TLS Authentication", RFC 5054, November 2007. [RFC5122] Saint-Andre, P., "Internationalized Resource Identifiers (IRIs) and Uniform Resource Identifiers (URIs) for the Extensible Messaging and Presence Protocol (XMPP)", RFC 5122, February 2008. [RFC5280] Cooper, D., Santesson, S., Farrell, S., Boeyen, S., Housley, R., and W. Polk, "Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile", RFC 5280, May 2008. [RFC5336] Yao, J. and W. Mao, "SMTP Extension for Internationalized Email Addresses", RFC 5336, September 2008. [RFC5456] Spencer, M., Capouch, B., Guy, E., Miller, F., and K. Shumard, "IAX: Inter-Asterisk eXchange Version 2", RFC 5456, February 2010. [RFC5661] Shepler, S., Eisler, M., and D. Noveck, "Network File System (NFS) Version 4 Minor Version 1 Protocol", RFC 5661, January 2010. [RFC5802] Newman, C., Menon-Sen, A., Melnikov, A., and N. Williams, Blanchet & Sullivan Expires April 21, 2011 [Page 11] Internet-Draft Stringprep Revision Problem Statement October 2010 "Salted Challenge Response Authentication Mechanism (SCRAM) SASL and GSS-API Mechanisms", RFC 5802, July 2010. [RFC5804] Melnikov, A. and T. Martin, "A Protocol for Remotely Managing Sieve Scripts", RFC 5804, July 2010. [RFC5890] Klensin, J., "Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework", RFC 5890, August 2010. [RFC5891] Klensin, J., "Internationalized Domain Names in Applications (IDNA): Protocol", RFC 5891, August 2010. [RFC5892] Faltstrom, P., "The Unicode Code Points and Internationalized Domain Names for Applications (IDNA)", RFC 5892, August 2010. [RFC5893] Alvestrand, H. and C. Karp, "Right-to-Left Scripts for Internationalized Domain Names for Applications (IDNA)", RFC 5893, August 2010. [RFC5894] Klensin, J., "Internationalized Domain Names for Applications (IDNA): Background, Explanation, and Rationale", RFC 5894, August 2010. [UAX15] "Unicode Standard Annex #15: Unicode Normalization Forms", UAX 15, September 2009. Authors' Addresses Marc Blanchet Viagenie 2600 boul. Laurier, suite 625 Quebec, QC G1V 4W1 Canada Email: Marc.Blanchet@viagenie.ca URI: http://viagenie.ca Andrew Sullivan 519 Maitland St. London, ON N6B 2Z5 Canada Email: ajs@crankycanuck.ca Blanchet & Sullivan Expires April 21, 2011 [Page 12]