Preparation and Comparison of Internationalized Strings Representing Simple User Names and Passwords
Cisco Systems, Inc., 1899 Wynkoop Street, Suite 600, Denver, CO 80202, USA, +1-303-308-3282, psaintan@cisco.com
Isode Ltd, 5 Castle Business Village, 36 Station Road, Hampton, Middlesex, TW12 2BX, UK, Alexey.Melnikov@isode.com
Applications
PRECIS, Username, Password, Unicode, Internationalization, SASLprep
This document describes how to handle Unicode strings representing simple user names and passwords, primarily for purposes of comparison. This profile is intended to be used by Simple Authentication and Security Layer (SASL) mechanisms (such as PLAIN and SCRAM-SHA-1), as well as other protocols that exchange simple user names or passwords. This document obsoletes RFC 4013.
User names and passwords are used pervasively in authentication and authorization on the Internet. To increase the likelihood that the input and comparison of user names and passwords will work in ways that make sense for typical users throughout the world, this document defines rules for preparing and comparing internationalized strings that represent simple user names and passwords. (In many authentication technologies, passwords are not directly compared, because the actual password is used as input to an algorithm such as a hash function; however, non-ASCII code points in the input string still need to be handled correctly.)
The algorithms defined in this document assume that all strings are composed of characters from the Unicode character set .
The algorithms are designed for use in Simple Authentication and Security Layer (SASL) mechanisms, such as PLAIN and SCRAM-SHA-1 . However, they might be applicable wherever simple user names or passwords are used. This profile is not intended for use in preparing strings that are not simple user names (e.g., email addresses, DNS domain names, LDAP distinguished names), nor in cases where identifiers or secrets are not strings (e.g., keys or certificates) or require different handling (e.g., case folding).
This document builds upon the PRECIS framework defined in , which differs fundamentally from the stringprep technology used in SASLprep .
The primary difference is that stringprep profiles allowed all characters except those which were explicitly disallowed, whereas PRECIS profiles disallow all characters except those which are explicitly allowed (this "inclusion model" was originally used for internationalized domain names in ; see for further discussion). It is important to keep this distinction in mind when comparing the technology defined in this document to SASLprep .
This document obsoletes RFC 4013.
Many important terms used in this document are defined in , , , , and . The term "non-ASCII space" refers to any Unicode code point with a general category of "Zs", with the exception of U+0020 (here called "ASCII space").
As used here, the term "password" is not literally limited to a word; i.e., a password could be a passphrase consisting of more than one word, perhaps separated by spaces or other such characters.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in .
Some SASL mechanisms (e.g., CRAM-MD5, DIGEST-MD5, and SCRAM) specify that the authentication identity used in the context of such mechanisms is a "simple user name" (see Section 2 of as well as ).
However, the exact form of a simple user name in any particular mechanism or deployment thereof is a local matter, and a simple user name does not necessarily map to an application identifier such as the localpart of an email address.
For purposes of preparation and comparison of authentication identities, this document specifies that a simple user name is a string of Unicode code points , encoded using UTF-8 , and structured either as an ordered sequence of "simpleparts" (where the complete simple user name can consist of a single simplepart or a space-separated sequence of simpleparts) or as a simplepart@domainpart (where the domainpart is an IP literal, an IPv4 address, or a fully qualified domain name).
Therefore, the syntax for a simple user name is defined as follows using the Augmented Backus-Naur Form (ABNF) as specified in .
Note well that all code points and blocks not explicitly allowed in the PRECIS IdentifierClass are disallowed; this includes private use characters, surrogate code points, and the other code points and blocks defined as "Prohibited Output" in Section 2.3 of RFC 4013.
Note also that common constructions such as "user@example.com" are allowed as simple user names when using software that conforms to this specification, as they were under .
A simple user name MUST NOT be zero bytes in length.
This rule is to be enforced after any normalization and mapping of code points.
Each simplepart of a simple user name MUST conform to the definition of the PRECIS IdentifierClass provided in , where the normalization, casemapping, and directionality rules are as described below.
1. Unicode Normalization Form C (NFC) MUST be applied to all characters.
2. Uppercase and titlecase characters MUST be mapped to their lowercase equivalents.
3. Additional mappings MAY be applied, such as those defined in .
4. With regard to directionality, the "Bidi Rule" provided in applies.
For purposes of preparation and comparison of passwords, this document specifies that a password is a string of Unicode code points , encoded using UTF-8 , and conformant to the PRECIS FreeformClass.
Therefore, the syntax for a password is defined as follows using the Augmented Backus-Naur Form (ABNF) as specified in .
Note well that all code points and blocks not explicitly allowed in the PRECIS FreeformClass are disallowed; this includes private use characters, surrogate code points, and the other code points and blocks defined as "Prohibited Output" in Section 2.3 of RFC 4013.
A password MUST NOT be zero bytes in length. This rule is to be enforced after any normalization and mapping of code points.
A password MUST be treated as follows, where the operations specified MUST be completed in the order shown:
1. Apply Unicode Normalization Form C (NFC) to all characters.
2. Map any instances of non-ASCII space to ASCII space (U+0020).
3. Ensure that the resulting string conforms to the definition of the PRECIS FreeformClass.
With regard to directionality, the "Bidi Rule" (defined in ) and similar rules are unnecessary and inapplicable to passwords, since they can reduce the range of characters that are allowed in a string and therefore reduce the amount of entropy that is possible in a password.
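The preparation rules for simple user names and passwords can be sketched in Python using the standard `unicodedata` module. This is a minimal, non-normative sketch that rests on several assumptions:

- The function names are hypothetical.
- `str.lower()` approximates mapping uppercase and titlecase characters to their lowercase equivalents.
- Simpleparts are separated by exactly one ASCII space.
- The IdentifierClass and FreeformClass conformance checks, the Bidi Rule, and domainpart validation are not implemented.

Also, `unicodedata` uses the Unicode version bundled with the running Python. Its set of "Zs" code points can therefore differ from Unicode 6.1. For example, U+180E MONGOLIAN VOWEL SEPARATOR is "Zs" in Unicode 6.1 but was reclassified in later versions.

```python
import unicodedata

ASCII_SPACE = "\u0020"


def is_non_ascii_space(ch: str) -> bool:
    """True for any code point of general category Zs other than U+0020."""
    return ch != ASCII_SPACE and unicodedata.category(ch) == "Zs"


def prepare_simple_user_name(name: str) -> str:
    """Hypothetical helper: NFC, lowercase mapping, zero-length check.

    Does NOT check the PRECIS IdentifierClass, the Bidi Rule, or the domainpart.
    """
    # Normalization: Unicode Normalization Form C.
    prepared = unicodedata.normalize("NFC", name)
    # Casemapping: str.lower() approximates mapping uppercase and
    # titlecase characters to their lowercase equivalents.
    prepared = prepared.lower()
    # Zero-length rule, enforced after normalization and mapping.
    if not prepared:
        raise ValueError("simple user name must not be empty")
    # Assumption: exactly one ASCII space between simpleparts, so leading,
    # trailing, or repeated spaces yield an empty simplepart.
    if any(part == "" for part in prepared.split(ASCII_SPACE)):
        raise ValueError("empty simplepart")
    return prepared


def prepare_password(password: str) -> str:
    """Hypothetical helper applying the ordered password steps.

    Step 3 (FreeformClass conformance) is NOT implemented here.
    """
    # Step 1: Unicode Normalization Form C.
    prepared = unicodedata.normalize("NFC", password)
    # Step 2: map every non-ASCII space to ASCII space (U+0020).
    prepared = "".join(ASCII_SPACE if is_non_ascii_space(ch) else ch
                       for ch in prepared)
    # Zero-length rule, enforced after normalization and mapping.
    if not prepared:
        raise ValueError("password must not be empty")
    return prepared


if __name__ == "__main__":
    print(prepare_simple_user_name("Juliet Capulet"))     # juliet capulet
    print(prepare_simple_user_name("User@Example.COM"))   # user@example.com
    print(repr(prepare_password("correct\u00A0horse")))   # 'correct horse'
```

Performing the zero-length check after mapping matters. For example, a future mapping step that removed characters could otherwise let an input that is non-empty on entry become empty.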
Furthermore, the Bidi Rule and similar rules are intended to minimize the possibility that the same string will be displayed differently on a system set for right-to-left display and a system set for left-to-right display; however, passwords are typically not displayed at all and are rarely meant to be interoperable across different systems in the way that non-secret strings like domain names and user names are.
The rules defined in this specification differ slightly from those defined by the SASLprep specification . The following sections describe these differences, along with their implications for migration, in more detail.
Deployments that currently use SASLprep for handling user names might need to scrub existing data when migrating to the rules defined in this specification. In particular:
SASLprep specified the use of Unicode Normalization Form KC (NFKC), whereas this usage of the PRECIS IdentifierClass employs Unicode Normalization Form C (NFC). In practice, this change is unlikely to cause significant problems, because NFKC provides methods for mapping Unicode code points with compatibility equivalents to those equivalents, whereas the PRECIS IdentifierClass entirely disallows Unicode code points with compatibility equivalents (i.e., during comparison NFKC is more "aggressive" about finding matches than is NFC). A few examples might suffice to indicate the nature of the problem: (1) U+017F LATIN SMALL LETTER LONG S is compatibility equivalent to U+0073 LATIN SMALL LETTER S; (2) U+2163 ROMAN NUMERAL FOUR is compatibility equivalent to U+0049 LATIN CAPITAL LETTER I and U+0056 LATIN CAPITAL LETTER V; and (3) U+FB01 LATIN SMALL LIGATURE FI is compatibility equivalent to U+0066 LATIN SMALL LETTER F and U+0069 LATIN SMALL LETTER I. Under SASLprep, the use of NFKC also handled the mapping of fullwidth and halfwidth code points to their decomposition equivalents (see ).
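The compatibility mappings in these examples can be checked with Python's standard `unicodedata` module. The sketch below compares NFC and NFKC for each example. It also includes U+FF21 FULLWIDTH LATIN CAPITAL LETTER A, added here only to illustrate the fullwidth case.

It also defines a hypothetical helper, `compat_code_points`, that flags code points whose NFKC form differs from their NFC form. A deployment might use something similar when searching stored user names. Results depend on the Unicode version bundled with the running Python.

```python
import unicodedata
from typing import List

EXAMPLES = [
    "\u017F",  # LATIN SMALL LETTER LONG S
    "\u2163",  # ROMAN NUMERAL FOUR
    "\uFB01",  # LATIN SMALL LIGATURE FI
    "\uFF21",  # FULLWIDTH LATIN CAPITAL LETTER A (added for illustration)
]


def has_compat_mapping(ch: str) -> bool:
    """True if NFKC maps this code point to something NFC does not."""
    return unicodedata.normalize("NFKC", ch) != unicodedata.normalize("NFC", ch)


def compat_code_points(name: str) -> List[str]:
    """Hypothetical migration helper: code points that NFKC would remap."""
    return [f"U+{ord(ch):04X}" for ch in name if has_compat_mapping(ch)]


if __name__ == "__main__":
    for ch in EXAMPLES:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
              f"NFC={unicodedata.normalize('NFC', ch)!r} "
              f"NFKC={unicodedata.normalize('NFKC', ch)!r}")
    print(compat_code_points("\uFB01le"))  # ['U+FB01']
```

NFC leaves each example unchanged, while NFKC maps it to its compatibility equivalent ("s", "IV", "fi", "A").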
Although it is expected that code points with compatibility equivalents are rare in existing user names, for migration purposes deployments might want to search their database of user names for Unicode code points with compatibility equivalents and map those code points to their compatibility equivalents.
SASLprep mapped non-ASCII spaces to ASCII space (U+0020), whereas the PRECIS IdentifierClass entirely disallows non-ASCII spaces. The non-ASCII space characters are U+00A0 NO-BREAK SPACE, U+1680 OGHAM SPACE MARK, U+180E MONGOLIAN VOWEL SEPARATOR, U+2000 EN QUAD through U+200A HAIR SPACE, U+202F NARROW NO-BREAK SPACE, U+205F MEDIUM MATHEMATICAL SPACE, and U+3000 IDEOGRAPHIC SPACE. For migration purposes, deployments might want to convert non-ASCII space characters to ASCII space in simple user names.
SASLprep mapped the "characters commonly mapped to nothing" from Appendix B.1 of RFC 3454 to nothing, whereas the PRECIS IdentifierClass entirely disallows most of these characters, which correspond to the code points from the "M" category defined under Section 6.13 of (with the exception of U+1806 MONGOLIAN TODO SOFT HYPHEN, which was "commonly mapped to nothing" in Unicode 3.2 but at the time of this writing does not have a derived property of Default_Ignorable_Code_Point in Unicode 6.1). For migration purposes, deployments might want to remove code points contained in the PRECIS "M" category from simple user names.
SASLprep allowed uppercase and titlecase characters, whereas this usage of the PRECIS IdentifierClass maps uppercase and titlecase characters to their lowercase equivalents.
For migration purposes, deployments can either convert uppercase and titlecase characters to their lowercase equivalents in simple user names (thus losing the case information) or preserve uppercase and titlecase characters and ignore the case difference when comparing simple user names.
Depending on local service policy, migration from RFC 4013 to this specification might not involve any scrubbing of data (since passwords might not be stored in the clear anyway); however, service providers need to be aware of possible issues that might arise during migration. In particular:
SASLprep specified the use of Unicode Normalization Form KC (NFKC), whereas this usage of the PRECIS FreeformClass employs Unicode Normalization Form C (NFC). Because NFKC is more aggressive about finding matches than NFC, in practice this change is unlikely to cause significant problems and indeed has the security benefit of probably resulting in fewer false positives when comparing passwords. A few examples might suffice to indicate the nature of the problem: (1) U+017F LATIN SMALL LETTER LONG S is compatibility equivalent to U+0073 LATIN SMALL LETTER S; (2) U+2163 ROMAN NUMERAL FOUR is compatibility equivalent to U+0049 LATIN CAPITAL LETTER I and U+0056 LATIN CAPITAL LETTER V; and (3) U+FB01 LATIN SMALL LIGATURE FI is compatibility equivalent to U+0066 LATIN SMALL LETTER F and U+0069 LATIN SMALL LETTER I. Under SASLprep, the use of NFKC also handled the mapping of fullwidth and halfwidth code points to their decomposition equivalents (see ).
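The sketch below shows why a password that matched under SASLprep might stop matching. It derives a salted key from a password containing U+FB01 LATIN SMALL LIGATURE FI twice:

- once after NFKC, a simplified stand-in for SASLprep whose other mappings are omitted;
- once after NFC, as in this specification.

It uses Python's `hashlib.pbkdf2_hmac` with SHA-1, the kind of salted derivation SCRAM-SHA-1 applies to the prepared password. The salt, iteration count, and `derive_key` name are hypothetical values chosen for illustration.

```python
import hashlib
import unicodedata

SALT = b"hypothetical-salt"  # illustration only; real deployments use random salts
ITERATIONS = 4096            # illustration only


def derive_key(prepared_password: str) -> bytes:
    """PBKDF2-HMAC-SHA-1 over the UTF-8 encoding of an already-prepared password."""
    return hashlib.pbkdf2_hmac("sha1", prepared_password.encode("utf-8"),
                               SALT, ITERATIONS)


typed = "\uFB01sh"  # "fish" typed with U+FB01 LATIN SMALL LIGATURE FI

# NFKC only: a simplified stand-in for SASLprep (its other mappings are omitted).
old_prepared = unicodedata.normalize("NFKC", typed)
# NFC, as used by this specification (the FreeformClass check is omitted).
new_prepared = unicodedata.normalize("NFC", typed)

# Key that would have been stored while SASLprep was in use.
stored_key = derive_key(old_prepared)

if __name__ == "__main__":
    print(repr(old_prepared), repr(new_prepared))
    print(derive_key(new_prepared) == stored_key)  # False
```

A password without compatibility characters, such as a plain "fish", yields the same key under both normalizations.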
Although it is expected that code points with compatibility equivalents are rare in existing passwords, some passwords that matched when SASLprep was used might no longer work when the rules in this specification are applied.
SASLprep mapped the "characters commonly mapped to nothing" from Appendix B.1 of RFC 3454 to nothing, whereas the PRECIS FreeformClass entirely disallows most of these characters, which correspond to the code points from the "M" category defined under Section 6.13 of (with the exception of U+1806 MONGOLIAN TODO SOFT HYPHEN, which was "commonly mapped to nothing" in Unicode 3.2 but at the time of this writing is allowed in the FreeformClass under Unicode 6.1). In practice, this change will probably have no effect on comparison, but user-oriented software might reject such code points instead of ignoring them during password preparation.
The ability to include a wide range of characters in passwords and passphrases can increase the potential for creating a strong password with high entropy. However, in practice, the ability to include such characters ought to be weighed against the possible need to reproduce them on various devices using various input methods.
The security considerations described in apply to the "IdentifierClass" and "FreeformClass" base string classes used in this document for simple user names and passwords, respectively.
The security considerations described in apply to the use of Unicode characters in user names and passwords.
The IANA shall add an entry to the PRECIS Usage Registry for reuse of the PRECIS IdentifierClass in SASL, as follows:
Usernames in SASL and Kerberos.
IdentifierClass.
No.
The SASLprep profile of Stringprep.
NFC.
Map uppercase and titlecase characters to lowercase.
None.
The "Bidi Rule" defined in RFC 5893 applies.
RFC XXXX.
[Note to RFC Editor: please change XXXX to the number issued for this specification.]
The IANA shall add an entry to the PRECIS Usage Registry for reuse of the PRECIS FreeformClass in SASL, as follows:
Passwords in SASL and Kerberos.
FreeformClass.
No.
The SASLprep profile of Stringprep.
NFC.
None.
Map non-ASCII space characters to ASCII space.
None.
RFC XXXX.
[Note to RFC Editor: please change XXXX to the number issued for this specification.]
We need to compare the output obtained when applying the new rules with Unicode 3.2 and Unicode 6.1 data to the output obtained when applying the SASLprep rules with Unicode 3.2 data, then make sure that the PRECIS Working Group and KITTEN Working Group are comfortable with any changes to the Unicode characters that are allowed and disallowed. (See also the migration issues described under .)
Precis Framework: Handling Internationalized Strings in Protocols. Cisco, Viagenie. Application protocols using Unicode code points in protocol strings need to prepare such strings in order to perform comparison operations (e.g., for purposes of authentication or authorization). This document defines a framework enabling application protocols to handle various classes of strings in a way that depends on the properties of Unicode code points and that is agile with respect to versions of Unicode; as a result, this framework provides a more sustainable approach to the handling of internationalized strings than the previous framework, known as Stringprep (RFC 3454). A specification that reuses this framework can either directly use the base string classes or subclass the base string classes as needed. This framework takes an approach similar to the revised internationalized domain names in applications (IDNA) technology (RFC 5890, RFC 5891, RFC 5892, RFC 5893, RFC 5894) and thus adheres to the high-level design goals described in RFC 4690, albeit for application technologies other than the Domain Name System (DNS). This document obsoletes RFC 3454.
Key words for use in RFCs to Indicate Requirement Levels. Harvard University, 1350 Mass. Ave., Cambridge, MA 02138, +1 617 495 3864, sob@harvard.edu
In many standards track documents several words are used to signify
the requirements in the specification. These words are often
capitalized. This document defines these words as they should be
interpreted in IETF documents. Authors who follow these guidelines
should incorporate this phrase near the beginning of their document:
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
RFC 2119.
Note that the force of these words is modified by the requirement
level of the document in which they are used.
UTF-8, a transformation format of ISO 10646. ISO/IEC 10646-1 defines a large character set called the Universal Character Set (UCS) which encompasses most of the world's writing systems. The originally proposed encodings of the UCS, however, were not compatible with many current applications and protocols, and this has led to the development of UTF-8, the object of this memo. UTF-8 has the characteristic of preserving the full US-ASCII range, providing compatibility with file systems, parsers and other software that rely on US-ASCII values but are transparent to other values. This memo obsoletes and replaces RFC 2279.
Augmented BNF for Syntax Specifications: ABNF. Internet technical specifications often need to define a formal syntax. Over the years, a modified version of Backus-Naur Form (BNF), called Augmented BNF (ABNF), has been popular among many Internet specifications. The current specification documents ABNF. It balances compactness and simplicity with reasonable representational power. The differences between standard BNF and ABNF involve naming rules, repetition, alternatives, order-independence, and value ranges. This specification also supplies additional rule definitions and encoding for a core lexical analyzer of the type common to several Internet specifications. [STANDARDS-TRACK]
The Unicode Standard, Version 6.1. The Unicode Consortium.
Mapping characters for PRECIS classes. Preparation and comparison of internationalized strings ("PRECIS") Framework [I-D.ietf-precis-framework] is defining several classes of strings for preparation and comparison. In the document, case mapping is defined because many of protocols handle case sensitive or case insensitive string comparison and therefore preparation of string is mandatory. As described in IDNA mapping [RFC5895] and PRECIS problem statement [I-D.ietf-precis-problem-statement], mappings in internationalized strings are not limited to case, but also width, delimiters and/or other specials are taken into consideration. This document considers mappings other than case mapping in PRECIS context.
Preparation of Internationalized Strings ("stringprep").
SASLprep: Stringprep Profile for User Names and Passwords. This document describes how to prepare Unicode strings representing user names and passwords for comparison. The document defines the "SASLprep" profile of the "stringprep" algorithm to be used for both user names and passwords. This profile is intended to be used by Simple Authentication and Security Layer (SASL) mechanisms (such as PLAIN, CRAM-MD5, and DIGEST-MD5), as well as other protocols exchanging simple user names and/or passwords. [STANDARDS-TRACK]
Simple Authentication and Security Layer (SASL). The Simple Authentication and Security Layer (SASL) is a framework for providing authentication and data security services in connection-oriented protocols via replaceable mechanisms. It provides a structured interface between protocols and mechanisms. The resulting framework allows new protocols to reuse existing mechanisms and allows old protocols to make use of new mechanisms. The framework also provides a protocol for securing subsequent protocol exchanges within a data security layer. This document describes how a SASL mechanism is structured, describes how protocols include support for SASL, and defines the protocol for carrying a data security layer over a connection. In addition, this document defines one SASL mechanism, the EXTERNAL mechanism. This document obsoletes RFC 2222. [STANDARDS TRACK]
The PLAIN Simple Authentication and Security Layer (SASL) Mechanism. This document defines a simple clear-text user/password Simple Authentication and Security Layer (SASL) mechanism called the PLAIN mechanism. The PLAIN mechanism is intended to be used, in combination with data confidentiality services provided by a lower layer, in protocols that lack a simple password authentication command. [STANDARDS-TRACK]
Salted Challenge Response Authentication Mechanism (SCRAM) SASL and GSS-API Mechanisms. The secure authentication mechanism most widely deployed and used by Internet application protocols is the transmission of clear-text passwords over a channel protected by Transport Layer Security (TLS). There are some significant security concerns with that mechanism, which could be addressed by the use of a challenge response authentication mechanism protected by TLS. Unfortunately, the challenge response mechanisms presently on the standards track all fail to meet requirements necessary for widespread deployment, and have had success only in limited use. This specification describes a family of Simple Authentication and Security Layer (SASL; RFC 4422) authentication mechanisms called the Salted Challenge Response Authentication Mechanism (SCRAM), which addresses the security concerns and meets the deployability requirements. When used in combination with TLS or an equivalent security layer, a mechanism from this family could improve the status quo for application protocol authentication and provide a suitable choice for a mandatory-to-implement mechanism for future application protocol standards. [STANDARDS-TRACK]
Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework. This document is one of a collection that, together, describe the protocol and usage context for a revision of Internationalized Domain Names for Applications (IDNA), superseding the earlier version. It describes the document collection and provides definitions and other material that are common to the set. [STANDARDS TRACK]
Internationalized Domain Names in Applications (IDNA): Protocol. This document is the revised protocol definition for Internationalized Domain Names (IDNs). The rationale for changes, the relationship to the older specification, and important terminology are provided in other documents. This document specifies the protocol mechanism, called Internationalized Domain Names in Applications (IDNA), for registering and looking up IDNs in a way that does not require changes to the DNS itself. IDNA is only meant for processing domain names, not free text. [STANDARDS TRACK]
Right-to-Left Scripts for Internationalized Domain Names for Applications (IDNA). The use of right-to-left scripts in Internationalized Domain Names (IDNs) has presented several challenges. This memo provides a new Bidi rule for Internationalized Domain Names for Applications (IDNA) labels, based on the encountered problems with some scripts and some shortcomings in the 2003 IDNA Bidi criterion. [STANDARDS-TRACK]
Internationalized Domain Names for Applications (IDNA): Background, Explanation, and Rationale. Several years have passed since the original protocol for Internationalized Domain Names (IDNs) was completed and deployed. During that time, a number of issues have arisen, including the need to update the system to deal with newer versions of Unicode. Some of these issues require tuning of the existing protocols and the tables on which they depend. This document provides an overview of a revised system and provides explanatory material for its components. This document is not an Internet Standards Track specification; it is published for informational purposes.
Terminology Used in Internationalization in the IETF. This document provides a list of terms used in the IETF when discussing internationalization. The purpose is to help frame discussions of internationalization in the various areas of the IETF and to help introduce the main concepts to IETF participants. This memo documents an Internet Best Current Practice.
Unicode Technical Report #39: Unicode Security Mechanisms. The Unicode Consortium.
The following substantive modifications were made from RFC 4013.
A single SASLprep algorithm was replaced by two separate algorithms: one for simple user names and another for passwords.
The new preparation algorithms use PRECIS instead of a stringprep profile. The new algorithms work independently of Unicode versions.
As recommended in the PRECIS framework, the Unicode normalization form was changed from NFKC to NFC.
Some Unicode code points that were mapped to nothing in RFC 4013 are simply disallowed by PRECIS.
Thanks to Yoshiro YONEYA and Takahiro NEMOTO for implementation feedback. Thanks also to Marc Blanchet, Joe Hildebrand, Alan DeKok, Simon Josefsson, Jonathan Lennox, Matt Miller, Chris Newman, Pete Resnick, Andrew Sullivan, and Nico Williams for their input.
This document borrows some text from RFC 4013 and RFC 6120.