XMPP                                                     P. Saint-Andre
Internet-Draft                                                    Cisco
Updates: 3920 (if approved)                             January 6, 2011
Intended status: Standards Track
Expires: July 10, 2011


   Extensible Messaging and Presence Protocol (XMPP): Address Format
                       draft-ietf-xmpp-address-09

Abstract

   This document defines the format for addresses used in the Extensible
   Messaging and Presence Protocol (XMPP), including support for non-
   ASCII characters.  This document updates RFC 3920.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 10, 2011.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Overview
     1.2.  Terminology
   2.  Addresses
     2.1.  Fundamentals
     2.2.  Domainpart
     2.3.  Localpart
     2.4.  Resourcepart
   3.  Internationalization Considerations
   4.  Security Considerations
     4.1.  Reuse of Stringprep
     4.2.  Reuse of Unicode
     4.3.  Address Spoofing
       4.3.1.  Address Forging
       4.3.2.  Address Mimicking
   5.  IANA Considerations
     5.1.  Nodeprep Profile of Stringprep
     5.2.  Resourceprep Profile of Stringprep
   6.  Conformance Requirements
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Appendix A.  Nodeprep
     A.1.  Introduction
     A.2.  Character Repertoire
     A.3.  Mapping
     A.4.  Normalization
     A.5.  Prohibited Output
     A.6.  Bidirectional Characters
     A.7.  Notes
   Appendix B.  Resourceprep
     B.1.  Introduction
     B.2.  Character Repertoire
     B.3.  Mapping
     B.4.  Normalization
     B.5.  Prohibited Output
     B.6.  Bidirectional Characters
   Appendix C.  Differences From RFC 3920
   Appendix D.  Acknowledgements
   Author's Address

1.  Introduction

1.1.  Overview

   The Extensible Messaging and Presence Protocol (XMPP) is an
   application profile of the Extensible Markup Language [XML] for
   streaming XML data in close to real time between any two or more
   network-aware entities.  The address format for XMPP entities was
   originally developed in the Jabber open-source community in 1999,
   first described by [XEP-0029] in 2002, and defined canonically by
   [RFC3920] in 2004.

   As specified in RFC 3920, the XMPP address format re-uses the
   "stringprep" technology for preparation of non-ASCII characters
   [STRINGPREP], including the Nameprep profile for internationalized
   domain names as specified in [NAMEPREP] and [IDNA2003] along with two
   XMPP-specific profiles for the localpart and resourcepart.
   Since the publication of RFC 3920, IDNA2003 has been superseded by
   IDNA2008 (see [IDNA-PROTO] and related documents), which is not based
   on stringprep.  Following the lead of the IDNA community, other
   technology communities that use stringprep have begun discussions
   about migrating away from stringprep toward more "modern" approaches.
   The XMPP community is participating in those discussions (mostly
   within the PRECIS Working Group) in order to find a replacement for
   the Nodeprep and Resourceprep profiles of stringprep defined in RFC
   3920.  Because all other aspects of revised documentation for XMPP
   have been incorporated into [XMPP], the XMPP Working Group decided to
   temporarily split the XMPP address format into a separate document so
   as not to significantly delay publication of improved documentation
   for XMPP.  It is expected that this document will be obsoleted as
   soon as work on a new approach to preparation and comparison of
   internationalized addresses has been completed.

   Therefore, this specification provides corrected documentation of the
   XMPP address format using the internationalization technologies
   available in 2004 (when RFC 3920 was published).  Although this
   document normatively references [IDNA2003] and [NAMEPREP], XMPP
   software implementations are encouraged to begin migrating to
   IDNA2008 (see [IDNA-PROTO] and related documents) because the
   specification that obsoletes this one will re-use IDNA2008 rather
   than IDNA2003.

   This document updates RFC 3920.

1.2.  Terminology

   Many important terms used in this document are defined in [IDNA2003],
   [STRINGPREP], [UNICODE], and [XMPP].

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in RFC
   2119 [KEYWORDS].

2.  Addresses

2.1.  Fundamentals

   An XMPP entity is anything that is network-addressable and that can
   communicate using XMPP.  For historical reasons, the native address
   of an XMPP entity is called a Jabber Identifier or JID.  A valid JID
   is a string of [UNICODE] code points, encoded using [UTF-8], and
   structured as an ordered sequence of localpart, domainpart, and
   resourcepart (where the first two parts are demarcated by the '@'
   character used as a separator, and the last two parts are similarly
   demarcated by the '/' character).

   The syntax for a JID is defined as follows using the Augmented
   Backus-Naur Form as specified in [ABNF].

      jid             = [ localpart "@" ] domainpart [ "/" resourcepart ]
      localpart       = 1*(nodepoint)
                        ;
                        ; a "nodepoint" is a UTF-8 encoded Unicode code
                        ; point that satisfies the Nodeprep profile of
                        ; stringprep
                        ;
      domainpart      = IP-literal / IPv4address / ifqdn
                        ;
                        ; the "IPv4address" and "IP-literal" rules are
                        ; defined in RFC 3986, and the first-match-wins
                        ; (a.k.a. "greedy") algorithm described in RFC
                        ; 3986 applies to the matching process
                        ;
                        ; note well that re-use of the IP-literal rule
                        ; from RFC 3986 implies that IPv6 addresses are
                        ; enclosed in square brackets (i.e., beginning
                        ; with '[' and ending with ']'), which was not
                        ; the case in RFC 3920
                        ;
      ifqdn           = 1*(namepoint)
                        ;
                        ; a "namepoint" is a UTF-8 encoded Unicode
                        ; code point that satisfies the Nameprep
                        ; profile of stringprep
                        ;
      resourcepart    = 1*(resourcepoint)
                        ;
                        ; a "resourcepoint" is a UTF-8 encoded Unicode
                        ; code point that satisfies the Resourceprep
                        ; profile of stringprep
                        ;

   All JIDs are based on the foregoing structure.
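   As a non-normative illustration, the Python sketch below splits a raw
   JID into the three parts of the ABNF above.  It locates the
   separators before any stringprep profile is applied, as the
   Implementation Note later in this section requires: the first '/'
   ends the domainpart (so the resourcepart may itself contain '/' or
   '@'), and an '@' before that point ends the localpart.  The names JID
   and split_jid are hypothetical.  The 1023-byte check runs on the
   unprepared input only; because Sections 2.2 through 2.4 apply that
   limit after preparation, an implementation would repeat the check
   once the profiles have been applied.

```python
from typing import NamedTuple, Optional

MAX_PART_BYTES = 1023  # per-part limit stated in Section 2.1


class JID(NamedTuple):
    localpart: Optional[str]
    domainpart: str
    resourcepart: Optional[str]


def split_jid(jid: str) -> JID:
    """Split a raw JID into (localpart, domainpart, resourcepart).

    Hypothetical helper: no Nameprep, Nodeprep, or Resourceprep is applied,
    so separators are matched before any Unicode transformation.
    """
    # The first '/' separates the resourcepart; any later '/' or '@'
    # characters belong to the resourcepart itself.
    bare, slash, rest = jid.partition("/")
    resource = rest if slash else None
    # Within the bare JID, an '@' separates the localpart from the domainpart.
    if "@" in bare:
        local, _, domain = bare.partition("@")
    else:
        local, domain = None, bare

    parts = (("localpart", local), ("domainpart", domain),
             ("resourcepart", resource))
    for name, value in parts:
        if value is None:  # optional part that is simply absent
            continue
        size = len(value.encode("utf-8"))  # limits are in bytes, not characters
        if not 1 <= size <= MAX_PART_BYTES:
            raise ValueError(f"{name} must be 1-{MAX_PART_BYTES} bytes, got {size}")
    return JID(local, domain, resource)


if __name__ == "__main__":
    print(split_jid("juliet@example.com/balcony"))
    # JID(localpart='juliet', domainpart='example.com', resourcepart='balcony')
    print(split_jid("room@chat.example.com/user@host"))
    # JID(localpart='room', domainpart='chat.example.com', resourcepart='user@host')
    print(split_jid("example.com"))
    # JID(localpart=None, domainpart='example.com', resourcepart=None)
```

   Splitting on the first '/' rather than the last is the design choice
   that keeps resourceparts such as "user@host" or "foo/bar" intact.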
   Each allowable portion of a JID (localpart, domainpart, and
   resourcepart) MUST NOT be zero bytes in length and MUST NOT be more
   than 1023 bytes in length, resulting in a maximum total size of 3071
   bytes (three parts of 1023 bytes each, plus the one-byte '@' and '/'
   separators).

   For the purpose of communication over an XMPP network (e.g., in the
   'to' or 'from' address of an XMPP stanza), an entity's address MUST
   be represented as a JID, not as a Uniform Resource Identifier [URI]
   or Internationalized Resource Identifier [IRI].  An XMPP IRI
   [XMPP-URI] is in essence a JID prepended with 'xmpp:'; however, the
   native addressing format used in XMPP is that of a mere JID without a
   URI scheme.  [XMPP-URI] is provided only for identification and
   interaction outside the context of XMPP itself, for example when
   linking to a JID from a web page.  See [XMPP-URI] for a description
   of the process for securely extracting a JID from an XMPP URI or IRI.

   Implementation Note: When dividing a JID into its component parts,
      an implementation needs to match the separator characters '@' and
      '/' before applying any transformation algorithms, which might
      decompose certain Unicode code points to the separator characters
      (e.g., U+FE6B SMALL COMMERCIAL AT might decompose into U+0040
      COMMERCIAL AT).

2.2.  Domainpart

   The domainpart of a JID is that portion after the '@' character (if
   any) and before the '/' character (if any); it is the primary
   identifier and is the only REQUIRED element of a JID (a mere
   domainpart is a valid JID).  Typically a domainpart identifies the
   "home" server to which clients connect for XML routing and data
   management functionality.  However, it is not necessary for an XMPP
   domainpart to identify an entity that provides core XMPP server
   functionality (e.g., a domainpart can identify an entity such as a
   multi-user chat service, a publish-subscribe service, or a user
   directory).
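   The domainpart rules set out in the rest of this section (strip a
   final label separator first, then require every label to pass
   ToASCII with UseSTD3ASCIIRules set) can be approximated with the
   IDNA2003 helpers in Python's standard encodings.idna module.  This is
   a sketch under stated assumptions, not a conformant implementation:
   that module's ToASCII() assumes UseSTD3ASCIIRules is false, so the
   letters-digits-hyphens check is done by a hypothetical _check_std3()
   helper; its nameprep() allows unassigned code points (query-string
   semantics), which is laxer than a service storing identifiers might
   want; and bracketed IP-literals are assumed to be recognized before
   the hypothetical prepare_domainpart() is called.

```python
import re
import encodings.idna  # standard-library IDNA2003 (RFC 3490) helpers

# Code points that IDNA2003 treats as label separators.
LABEL_SEPARATORS = "\u002e\u3002\uff0e\uff61"
MAX_PART_BYTES = 1023


def _check_std3(label: str) -> None:
    """UseSTD3ASCIIRules: ASCII code points must be letters, digits, or '-'."""
    for ch in label:
        if ch < "\x80" and not (ch.isalnum() or ch == "-"):
            raise ValueError(f"disallowed ASCII character {ch!r} in {label!r}")
    if label.startswith("-") or label.endswith("-"):
        raise ValueError(f"label {label!r} begins or ends with '-'")


def prepare_domainpart(domainpart: str) -> str:
    """Return a prepared FQDN-style domainpart or raise ValueError.

    Sketch only: IP-literals such as [::1] are assumed to be handled
    before this function is called.
    """
    # 1. Strip one final label separator before any other step.
    if domainpart and domainpart[-1] in LABEL_SEPARATORS:
        domainpart = domainpart[:-1]
    prepared = []
    for label in re.split(f"[{LABEL_SEPARATORS}]", domainpart):
        # 2. Nameprep: case folding, NFKC, prohibited output, bidi checks.
        label = encodings.idna.nameprep(label)
        # 3. Enforce the LDH rule that ToASCII applies when STD3 rules are on.
        _check_std3(label)
        # 4. ToASCII must succeed (it rejects empty or over-long labels);
        #    the ACE output is not sent on the wire, so it is discarded.
        encodings.idna.ToASCII(label)
        prepared.append(label)
    result = ".".join(prepared)
    # 5. The byte-length limit applies after preparation.
    if not 1 <= len(result.encode("utf-8")) <= MAX_PART_BYTES:
        raise ValueError("domainpart must be 1-1023 bytes after preparation")
    return result


if __name__ == "__main__":
    print(prepare_domainpart("Example.COM."))  # example.com
    # 'u' followed by U+0308 COMBINING DIAERESIS is composed by NFKC.
    print(prepare_domainpart("bu\u0308cher.example") == "b\u00fccher.example")  # True
```

   Errors from nameprep() and ToASCII() are raised as UnicodeError,
   which is a subclass of ValueError, so callers can catch a single
   exception type.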
   The domainpart for every XMPP service MUST be a fully qualified
   domain name ("FQDN"; see [DNS]), IPv4 address, IPv6 address, or
   unqualified hostname (i.e., a text label that is resolvable on a
   local network).

   Interoperability Note: Domainparts that are IP addresses might not
      be accepted by other services for server-to-server communication,
      and domainparts that are unqualified hostnames cannot be used on
      public networks because they are resolvable only on a local
      network.

   If the domainpart includes a final character considered to be a label
   separator (dot) by [IDNA2003] or [DNS], this character MUST be
   stripped from the domainpart before the JID of which it is a part is
   used for the purpose of routing an XML stanza, comparing against
   another JID, or constructing an [XMPP-URI]; in particular, the
   character MUST be stripped before any other canonicalization steps
   are taken, such as application of the [NAMEPREP] profile of
   [STRINGPREP] or completion of the ToASCII operation as described in
   [IDNA2003].

   A domainpart consisting of a fully qualified domain name MUST be an
   "internationalized domain name" as defined in [IDNA2003], that is, it
   MUST be "a domain name in which every label is an internationalized
   label" and MUST follow the rules for construction of
   internationalized domain names specified in [IDNA2003].  When
   preparing a text label (consisting of a sequence of UTF-8 encoded
   Unicode code points) for representation as an internationalized label
   in the process of constructing an XMPP domainpart or comparing two
   XMPP domainparts, an application MUST ensure that the ToASCII
   operation specified in [IDNA2003] can be applied to each text label
   without failing, with the UseSTD3ASCIIRules flag set (thus
   forbidding ASCII code points other than letters, digits, and
   hyphens).
   If the ToASCII operation can be applied without failing,
   then the label is an internationalized label.  (Note: The ToASCII
   operation includes application of the [NAMEPREP] profile of
   [STRINGPREP] and encoding using the algorithm specified in
   [PUNYCODE]; for details, see [IDNA2003].)  Although XMPP applications
   do not communicate the output of the ToASCII operation (called an
   "ACE label") over the wire, it MUST be possible to apply that
   operation without failing to each internationalized label.  If an
   XMPP application receives as input an ACE label, it SHOULD convert
   that ACE label to an internationalized label using the ToUnicode
   operation (see [IDNA2003]) before including the label in an XMPP
   domainpart that will be communicated over the wire on an XMPP network
   (however, there are legitimate reasons why an application might
   instead refuse the input altogether and return an error to the entity
   that provided the offending data).

   A domainpart MUST NOT be zero bytes in length and MUST NOT be more
   than 1023 bytes in length.  This rule is to be enforced after any
   mapping or normalization resulting from application of the Nameprep
   profile of stringprep (e.g., in Nameprep some characters can be
   mapped to nothing, which might result in a string of zero length).
   Naturally, the length limits of [DNS] apply, and nothing in this
   document is to be interpreted as overriding those more fundamental
   limits.

   In the terms of IDNA2008 [IDNA-DEFS], the domainpart of a JID is a
   "domain name slot".

2.3.  Localpart

   The localpart of a JID is an optional identifier placed before the
   domainpart and separated from the latter by the '@' character.
   Typically a localpart uniquely identifies the entity requesting and
   using network access provided by a server (i.e., a local account),
   although it can also represent other kinds of entities (e.g., a chat
   room associated with a multi-user chat service).  The entity
   represented by an XMPP localpart is addressed within the context of a
   specific domain (i.e., <localpart@domainpart>).

   A localpart MUST be formatted such that the Nodeprep profile of
   [STRINGPREP] can be applied without failing (see Appendix A).  Before
   comparing two localparts, an application MUST first ensure that the
   Nodeprep profile has been applied to each identifier (the profile
   need not be applied each time a comparison is made, as long as it has
   been applied before comparison).

   A localpart MUST NOT be zero bytes in length and MUST NOT be more
   than 1023 bytes in length.  This rule is to be enforced after any
   mapping or normalization resulting from application of the Nodeprep
   profile of stringprep (e.g., in Nodeprep some characters can be
   mapped to nothing, which might result in a string of zero length).

2.4.  Resourcepart

   The resourcepart of a JID is an optional identifier placed after the
   domainpart and separated from the latter by the '/' character.  A
   resourcepart can modify either a <localpart@domainpart> address or a
   mere <domainpart> address.  Typically a resourcepart uniquely
   identifies a specific connection (e.g., a device or location) or
   object (e.g., an occupant in a multi-user chat room) belonging to the
   entity associated with an XMPP localpart at a domain (i.e.,
   <localpart@domainpart>).

   A resourcepart MUST be formatted such that the Resourceprep profile
   of [STRINGPREP] can be applied without failing (see Appendix B).
   Before comparing two resourceparts, an application MUST first ensure
   that the Resourceprep profile has been applied to each identifier
   (the profile need not be applied each time a comparison is made, as
   long as it has been applied before comparison).

   A resourcepart MUST NOT be zero bytes in length and MUST NOT be more
   than 1023 bytes in length.  This rule is to be enforced after any
   mapping or normalization resulting from application of the
   Resourceprep profile of stringprep (e.g., in Resourceprep some
   characters can be mapped to nothing, which might result in a string
   of zero length).

   Informational Note: For historical reasons, the term "resource
      identifier" is often used in XMPP to refer to the optional portion
      of an XMPP address that follows the domainpart and the "/"
      separator character; to help prevent confusion between an XMPP
      "resource identifier" and the meanings of "resource" and
      "identifier" provided in Section 1.1 of [URI], this specification
      uses the term "resourcepart" instead of the term "resource
      identifier" used in RFC 3920.

   XMPP entities SHOULD consider resourceparts to be opaque strings and
   SHOULD NOT impute meaning to any given resourcepart.  In particular:

   o  Use of the '/' character as a separator between the domainpart and
      the resourcepart does not imply that XMPP addresses are
      hierarchical in the way that, say, HTTP addresses are
      hierarchical; thus for example an XMPP address of the form
      <localpart@domainpart/foo/bar> does not identify a resource "bar"
      that exists below a resource "foo" in a hierarchy of resources
      associated with the entity <localpart@domainpart>.

   o  The '@' character is allowed in the resourcepart and is often
      used in the "nick" shown in XMPP chatrooms.  For example, the JID
      <room@chat.example.com/user@host> describes an entity who is an
      occupant of the room <room@chat.example.com> with an (asserted)
      nick of <user@host>.
      However, chatroom services do not
      necessarily check such an asserted nick against the occupant's
      real JID.

3.  Internationalization Considerations

   XMPP servers MUST, and XMPP clients SHOULD, support [IDNA2003] for
   domainparts (including the [NAMEPREP] profile of [STRINGPREP]), the
   Nodeprep (Appendix A) profile of [STRINGPREP] for localparts, and the
   Resourceprep (Appendix B) profile of [STRINGPREP] for resourceparts;
   this enables XMPP addresses to include a wide variety of characters
   outside the US-ASCII range.  Rules for enforcement of the XMPP
   address format are provided in [XMPP].

4.  Security Considerations

4.1.  Reuse of Stringprep

   The security considerations described in [STRINGPREP] apply to the
   Nodeprep (Appendix A) and Resourceprep (Appendix B) profiles defined
   in this document for XMPP localparts and resourceparts.  The security
   considerations described in [STRINGPREP] and [NAMEPREP] apply to the
   Nameprep profile that is re-used here for XMPP domainparts.

4.2.  Reuse of Unicode

   The security considerations described in [UNICODE-SEC] apply to the
   use of Unicode characters in XMPP addresses.

4.3.  Address Spoofing

   There are two forms of address spoofing: forging and mimicking.

4.3.1.  Address Forging

   In the context of XMPP technologies, address forging occurs when an
   entity is able to generate an XML stanza whose 'from' address does
   not correspond to the account credentials with which the entity
   authenticated onto the network (or an authorization identity provided
   during negotiation of SASL authentication [SASL] as described in
   [XMPP]).  For example, address forging occurs if an entity that
   authenticated as "juliet@im.example.com" is able to send XML stanzas
   from "nurse@im.example.com" or "romeo@example.net".
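   To make the stamping requirement described next more concrete, the
   sketch below models a simplified server policy; it is an assumption
   for illustration, not the normative procedure in [XMPP].  The server
   always uses the full JID bound to the authenticated session as the
   'from' address, and it reports any client-supplied value other than
   that full JID or its bare form as a forgery attempt.  The function
   names are hypothetical, and both inputs are assumed to be JIDs that
   have already been prepared.

```python
from typing import Optional, Tuple


def bare_jid(full_jid: str) -> str:
    """Return the bare JID: everything before the first '/'."""
    return full_jid.partition("/")[0]


def stamp_from(session_jid: str, claimed_from: Optional[str]) -> Tuple[str, bool]:
    """Return (the 'from' value to stamp, whether the client's claim is forged).

    Hypothetical, simplified policy: session_jid is the full JID bound to
    the authenticated stream, and it always replaces whatever the client
    put in the 'from' attribute.
    """
    acceptable = {session_jid, bare_jid(session_jid)}
    forged = claimed_from is not None and claimed_from not in acceptable
    return session_jid, forged


if __name__ == "__main__":
    session = "juliet@im.example.com/balcony"
    print(stamp_from(session, None))
    # ('juliet@im.example.com/balcony', False)
    print(stamp_from(session, "nurse@im.example.com"))
    # ('juliet@im.example.com/balcony', True)
```

   Because the stamped value never depends on the client's claim, the
   "nurse@im.example.com" example above fails at a correctly
   implemented server; the first bullet below describes what happens
   when a server skips this step.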
   Address forging is difficult in XMPP systems, given the requirement
   for sending servers to stamp 'from' addresses and for receiving
   servers to verify sending domains via server-to-server authentication
   (see [XMPP]).  However, address forging is possible if:

   o  A poorly implemented server ignores the requirement for stamping
      the 'from' address.  This would enable any entity that
      authenticated with the server to send stanzas from any
      localpart@domainpart as long as the domainpart matches the sending
      domain of the server.

   o  An actively malicious server generates stanzas on behalf of any
      registered account.

   Therefore, an entity outside the security perimeter of a particular
   server cannot reliably distinguish between JIDs of the form
   <localpart@domainpart> at that server and thus can authenticate only
   the domainpart of such JIDs with any level of assurance.  This
   specification does not define methods for discovering or
   counteracting such poorly implemented or rogue servers.  However, the
   end-to-end authentication or signing of XMPP stanzas could help to
   mitigate this risk, since it would require the rogue server to
   generate false credentials in addition to modifying 'from' addresses.

   Furthermore, it is possible for an attacker to forge JIDs at other
   domains by means of a DNS poisoning attack if DNS security extensions
   [DNSSEC] are not used.

4.3.2.  Address Mimicking

   Address mimicking occurs when an entity provides legitimate
   authentication credentials for and sends XML stanzas from an account
   whose JID appears to a human user to be the same as another JID.
   For example, in some XMPP clients the address "ju1iet@example.org"
   (spelled with the number one as the third character of the localpart)
   might appear to be the same as "juliet@example.org" (spelled with the
   lower-case version of the letter "L"), especially on casual visual
   inspection; this phenomenon is sometimes called "typejacking".  A
   more sophisticated example of address mimicking might involve the use
   of characters from outside the familiar Latin extended-A block of
   Unicode code points, such as the characters U+13DA U+13A2 U+13B5
   U+13AC U+13A2 U+13AC U+13D2 from the Cherokee block instead of the
   similar-looking US-ASCII characters "STPETER".

   In some examples of address mimicking, it is unlikely that the
   average user could tell the difference between the real JID and the
   fake JID.  (Indeed, there is no programmatic way to distinguish with
   full certainty which is the fake JID and which is the real JID; in
   some communication contexts, the JID formed of Cherokee characters
   might be the real JID and the JID formed of US-ASCII characters might
   thus appear to be the fake JID.)  Because JIDs can contain almost any
   properly-encoded Unicode code point, it can be relatively easy to
   mimic some JIDs in XMPP systems.  The possibility of address
   mimicking introduces security vulnerabilities of the kind that have
   also plagued the World Wide Web, specifically the phenomenon known as
   phishing.

   These problems arise because Unicode and ISO/IEC 10646 repertoires
   have many characters that look similar (so-called "confusable
   characters" or "confusables").  In many cases, XMPP users might
   perform visual matching, such as when comparing the JIDs of
   communication partners.
   Because it is impossible to map similar-looking
   characters without a great deal of context (such as knowing the fonts
   used), stringprep and stringprep-based technologies such as Nameprep,
   Nodeprep, and Resourceprep do nothing to map similar-looking
   characters together, nor do they prohibit some characters because
   they look like others.  As a result, XMPP localparts and
   resourceparts could contain confusable characters, producing JIDs
   that appear to mimic other JIDs and thus leading to security
   vulnerabilities such as the following:

   o  A localpart can be employed as one part of an entity's address in
      XMPP.  One common usage is as the username of an instant messaging
      user; another is as the name of a multi-user chat room; and many
      other kinds of entities could use localparts as part of their
      addresses.  The security of such services could be compromised
      based on different interpretations of the internationalized
      localpart; for example, a user entering a single internationalized
      localpart could access another user's account information, or a
      user could gain access to a hidden or otherwise restricted chat
      room or service.

   o  A resourcepart can be employed as one part of an entity's address
      in XMPP.  One common usage is as the name for an instant messaging
      user's connected resource; another is as the nickname of a user in
      a multi-user chat room; and many other kinds of entities could use
      resourceparts as part of their addresses.  The security of such
      services could be compromised based on different interpretations
      of the internationalized resourcepart; for example, two or more
      confusable resources could be bound at the same time to the same
      account (resulting in inconsistent authorization decisions in an
      XMPP application that uses full JIDs), or a user could send a
      message to someone other than the intended recipient in a
      multi-user chat room.
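   The claim that stringprep-based profiles do not fold look-alike
   characters together can be checked with Python's unicodedata module,
   using NFKC (the normalization form those profiles apply) as a
   stand-in for full preparation.  Fullwidth Latin letters are
   compatibility variants and do normalize to US-ASCII, whereas the
   Cherokee look-alikes for "STPETER" cited above stay distinct.  The
   sketch also includes a crude mixed-script check of the kind the
   client presentation policy below encourages; it approximates a
   character's script by the first word of its Unicode character name,
   which is an assumption of this sketch and not the Unicode Script
   property.  All helper names are hypothetical.

```python
import unicodedata
from typing import Set

ASCII_FORM = "STPETER"
# Fullwidth Latin capitals (U+FF21..U+FF3A) mirror A..Z one-for-one.
FULLWIDTH_FORM = "".join(chr(ord(c) - ord("A") + 0xFF21) for c in ASCII_FORM)
# The Cherokee look-alikes cited in the text, in the same order.
CHEROKEE_FORM = "\u13DA\u13A2\u13B5\u13AC\u13A2\u13AC\u13D2"


def nfkc(text: str) -> str:
    return unicodedata.normalize("NFKC", text)


def letter_scripts(text: str) -> Set[str]:
    """Rough script set: the first word of each letter's Unicode name.

    Heuristic only (e.g. 'LATIN', 'CHEROKEE'); it is not the Unicode
    Script property. Digits and punctuation are ignored as script-neutral.
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if unicodedata.category(ch).startswith("L")
    }


def should_warn(localpart: str, preferred_scripts: Set[str]) -> bool:
    """Warn if the localpart mixes scripts or leaves the user's usual ones."""
    scripts = letter_scripts(localpart)
    return len(scripts) > 1 or not scripts <= preferred_scripts


if __name__ == "__main__":
    print(nfkc(FULLWIDTH_FORM) == ASCII_FORM)  # True: compatibility variant
    print(nfkc(CHEROKEE_FORM) == ASCII_FORM)   # False: merely looks alike
    print(should_warn(CHEROKEE_FORM[:2] + "ETER", {"LATIN"}))  # True: mixed
    print(should_warn("ju1iet", {"LATIN"}))    # False: typejacking slips past
```

   As the final line shows, a single-script mimic such as "ju1iet"
   passes unflagged, consistent with the observation below that such
   attacks are not easy to combat.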
   Despite the fact that some specific suggestions about identification
   and handling of confusable characters appear in the Unicode Security
   Considerations [UNICODE-SEC], it is also true (as noted in
   [IDNA-DEFS]) that "there are no comprehensive technical solutions to
   the problems of confusable characters".  Mimicked JIDs that involve
   characters from only one script, or from the script typically
   employed by a particular user or community of language users, are not
   easy to combat (e.g., the simple typejacking attack previously
   described, which relies on a surface similarity between the
   characters "1" and "l" in some presentations).  However, mimicked
   addresses that involve characters from more than one script, or from
   a script not typically employed by a particular user or community of
   language users, can be mitigated somewhat through the application of
   appropriate registration policies at XMPP services and presentation
   policies in XMPP client software.  Therefore the following policies
   are encouraged:

   1.  Because an XMPP service that allows registration of XMPP user
       accounts (localparts) plays a role similar to that of a registry
       for DNS domain names, such a service SHOULD establish a policy
       about the scripts or blocks of characters it will allow in
       localparts at the service.  Such a policy is likely to be
       informed by the languages and scripts that are used to write
       registered account names; in particular, to reduce confusion, the
       service MAY forbid registration of XMPP localparts that contain
       characters from more than one script and MAY restrict
       registrations to characters drawn from a very small number of
       scripts (e.g., scripts that are well-understood by the
       administrators of the service).
       Such policies are also
       appropriate for XMPP services that allow temporary or permanent
       registration of XMPP resourceparts, e.g., during resource binding
       [XMPP] or upon joining an XMPP-based chat room [XEP-0045].  For
       related considerations in the context of domain name
       registration, refer to Section 4.3 of [IDNA-PROTO] and Section
       3.2 of [IDNA-RATIONALE].  Note well that methods for enforcing
       such restrictions are out of scope for this document.

   2.  Because every human user of an XMPP client presumably has a
       preferred language (or, in some cases, a small set of preferred
       languages), an XMPP client SHOULD gather that information either
       explicitly from the user or implicitly via the operating system
       of the user's device.  Furthermore, because most languages are
       typically represented by a single script (or a small set of
       scripts) and most scripts are typically contained in one or more
       blocks of characters, an XMPP client SHOULD warn the user when
       presenting a JID that mixes characters from more than one script
       or block, or that uses characters outside the normal range of the
       user's preferred language(s).  This recommendation is not
       intended to discourage communication across different communities
       of language users; instead, it recognizes the existence of such
       communities and encourages due caution when presenting unfamiliar
       scripts or characters to human users.

5.  IANA Considerations

   The following sections update the registrations provided in
   [RFC3920].

5.1.  Nodeprep Profile of Stringprep

   The Nodeprep profile of stringprep is defined under Nodeprep
   (Appendix A).  The IANA has registered Nodeprep in the stringprep
   profile registry.

   Name of this profile:

      Nodeprep

   RFC in which the profile is defined:

      RFC XXXX

   Indicator whether or not this is the newest version of the profile:

      This is the first version of Nodeprep

5.2.  Resourceprep Profile of Stringprep

   The Resourceprep profile of stringprep is defined under Resourceprep
   (Appendix B).  The IANA has registered Resourceprep in the stringprep
   profile registry.

   Name of this profile:

      Resourceprep

   RFC in which the profile is defined:

      RFC XXXX

   Indicator whether or not this is the newest version of the profile:

      This is the first version of Resourceprep

6.  Conformance Requirements

   This section describes a protocol feature set that summarizes the
   conformance requirements of this specification.  This feature set is
   appropriate for use in software certification, interoperability
   testing, and implementation reports.  For each feature, this section
   provides the following information:

   o  A human-readable name

   o  An informational description

   o  A reference to the particular section of this document that
      normatively defines the feature

   o  Whether the feature applies to the Client role, the Server role,
      or both (where "N/A" signifies that the feature is not applicable
      to the specified role)

   o  Whether the feature MUST or SHOULD be implemented, where the
      capitalized terms are to be understood as described in [KEYWORDS]

   The feature set specified here attempts to adhere to the concepts and
   formats proposed by Larry Masinter within the IETF's NEWTRK Working
   Group in 2005, as captured in [INTEROP].  Although this feature set
   is more detailed than called for by [REPORTS], it provides a suitable
   basis for the generation of implementation reports to be submitted in
   support of advancing this specification from Proposed Standard to
   Draft Standard in accordance with [PROCESS].

   Feature:  address-domain-length
   Description:  Ensure that the domainpart of an XMPP address is at
      least one byte in length and at most 1023 bytes in length, and
      conforms to the underlying length limits of the DNS.
   Section: Section 2.2
   Roles: Both MUST.

   Feature: address-domain-prep
   Description: Ensure that the domainpart of an XMPP address conforms
      to the Nameprep profile of Stringprep.
   Section: Section 2.2
   Roles: Client SHOULD, Server MUST.

   Feature: address-localpart-length
   Description: Ensure that the localpart of an XMPP address is at
      least one byte in length and at most 1023 bytes in length.
   Section: Section 2.3
   Roles: Both MUST.

   Feature: address-localpart-prep
   Description: Ensure that the localpart of an XMPP address conforms
      to the Nodeprep profile of Stringprep.
   Section: Section 2.3
   Roles: Client SHOULD, Server MUST.

   Feature: address-resource-length
   Description: Ensure that the resourcepart of an XMPP address is at
      least one byte in length and at most 1023 bytes in length.
   Section: Section 2.4
   Roles: Both MUST.

   Feature: address-resource-prep
   Description: Ensure that the resourcepart of an XMPP address
      conforms to the Resourceprep profile of Stringprep.
   Section: Section 2.4
   Roles: Client SHOULD, Server MUST.

7.  References

7.1.  Normative References

   [ABNF]     Crocker, D. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234, January 2008.

   [DNS]      Mockapetris, P., "Domain names - implementation and
              specification", STD 13, RFC 1035, November 1987.

   [IDNA2003]
              Faltstrom, P., Hoffman, P., and A. Costello,
              "Internationalizing Domain Names in Applications (IDNA)",
              RFC 3490, March 2003.

              See Section 1 for an explanation of why the normative
              reference to an obsoleted specification is needed.

   [KEYWORDS]
              Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [NAMEPREP]
              Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep
              Profile for Internationalized Domain Names (IDN)",
              RFC 3491, March 2003.
              See Section 1 for an explanation of why the normative
              reference to an obsoleted specification is needed.

   [STRINGPREP]
              Hoffman, P. and M. Blanchet, "Preparation of
              Internationalized Strings ("stringprep")", RFC 3454,
              December 2002.

   [UNICODE]  The Unicode Consortium, "The Unicode Standard, Version
              3.2.0", 2000.

              The Unicode Standard, Version 3.2.0 is defined by The
              Unicode Standard, Version 3.0 (Reading, MA,
              Addison-Wesley, 2000. ISBN 0-201-61633-5), as amended by
              the Unicode Standard Annex #27: Unicode 3.1
              (http://www.unicode.org/reports/tr27/) and by the Unicode
              Standard Annex #28: Unicode 3.2
              (http://www.unicode.org/reports/tr28/).

   [UNICODE-SEC]
              The Unicode Consortium, "Unicode Technical Report #36:
              Unicode Security Considerations", 2008.

   [UTF-8]    Yergeau, F., "UTF-8, a transformation format of ISO
              10646", STD 63, RFC 3629, November 2003.

   [XMPP]     Saint-Andre, P., "Extensible Messaging and Presence
              Protocol (XMPP): Core", draft-ietf-xmpp-3920bis-22 (work
              in progress), December 2010.

7.2.  Informative References

   [DNSSEC]   Arends, R., Austein, R., Larson, M., Massey, D., and S.
              Rose, "DNS Security Introduction and Requirements",
              RFC 4033, March 2005.

   [IDNA-DEFS]
              Klensin, J., "Internationalized Domain Names for
              Applications (IDNA): Definitions and Document Framework",
              RFC 5890, August 2010.

   [IDNA-PROTO]
              Klensin, J., "Internationalized Domain Names in
              Applications (IDNA): Protocol", RFC 5891, August 2010.

   [IDNA-RATIONALE]
              Klensin, J., "Internationalized Domain Names for
              Applications (IDNA): Background, Explanation, and
              Rationale", RFC 5894, August 2010.

   [INTEROP]  Masinter, L., "Formalizing IETF Interoperability
              Reporting", draft-ietf-newtrk-interop-reports-00 (work in
              progress), October 2005.

   [IRI]      Duerst, M. and M. Suignard, "Internationalized Resource
              Identifiers (IRIs)", RFC 3987, January 2005.
   [PROCESS]  Bradner, S., "The Internet Standards Process -- Revision
              3", BCP 9, RFC 2026, October 1996.

   [PUNYCODE]
              Costello, A., "Punycode: A Bootstring encoding of Unicode
              for Internationalized Domain Names in Applications
              (IDNA)", RFC 3492, March 2003.

   [REPORTS]  Dusseault, L. and R. Sparks, "Guidance on Interoperation
              and Implementation Reports for Advancement to Draft
              Standard", BCP 9, RFC 5657, September 2009.

   [RFC3920]  Saint-Andre, P., Ed., "Extensible Messaging and Presence
              Protocol (XMPP): Core", RFC 3920, October 2004.

   [SASL]     Melnikov, A. and K. Zeilenga, "Simple Authentication and
              Security Layer (SASL)", RFC 4422, June 2006.

   [URI]      Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66,
              RFC 3986, January 2005.

   [XEP-0029]
              Kaes, C., "Definition of Jabber Identifiers (JIDs)", XSF
              XEP 0029, October 2003.

   [XEP-0030]
              Hildebrand, J., Millard, P., Eatmon, R., and P.
              Saint-Andre, "Service Discovery", XSF XEP 0030, June 2008.

   [XEP-0045]
              Saint-Andre, P., "Multi-User Chat", XSF XEP 0045,
              July 2008.

   [XEP-0060]
              Millard, P., Saint-Andre, P., and R. Meijer,
              "Publish-Subscribe", XSF XEP 0060, September 2008.

   [XEP-0165]
              Saint-Andre, P., "Best Practices to Discourage JID
              Mimicking", XSF XEP 0165, December 2007.

   [XML]      Paoli, J., Maler, E., Sperberg-McQueen, C., Yergeau, F.,
              and T. Bray, "Extensible Markup Language (XML) 1.0 (Fourth
              Edition)", World Wide Web Consortium Recommendation
              REC-xml-20060816, August 2006.

   [XMPP-URI]
              Saint-Andre, P., "Internationalized Resource Identifiers
              (IRIs) and Uniform Resource Identifiers (URIs) for the
              Extensible Messaging and Presence Protocol (XMPP)",
              RFC 5122, February 2008.

Appendix A.  Nodeprep

A.1.  Introduction

   This appendix defines the "Nodeprep" profile of stringprep.
   As such, it specifies processing rules that will enable users to
   enter internationalized localparts in the Extensible Messaging and
   Presence Protocol (XMPP) and have the highest chance of getting the
   content of the strings correct.  (An XMPP localpart is the optional
   portion of an XMPP address that precedes an XMPP domainpart and the
   '@' separator; it is often but not exclusively associated with an
   instant messaging username.)  These processing rules are intended
   only for XMPP localparts and are not intended for arbitrary text or
   any other aspect of an XMPP address.

   This profile defines the following, as required by [STRINGPREP]:

   o  The intended applicability of the profile: internationalized
      localparts within XMPP
   o  The character repertoire that is the input and output to
      stringprep: Unicode 3.2, specified in Appendix A.2
   o  The mappings used: specified in Appendix A.3
   o  The Unicode normalization used: specified in Appendix A.4
   o  The characters that are prohibited as output: specified in
      Appendix A.5
   o  Bidirectional character handling: specified in Appendix A.6

A.2.  Character Repertoire

   This profile uses Unicode 3.2 with the list of unassigned code points
   being Table A.1, both defined in Appendix A of [STRINGPREP].

A.3.  Mapping

   This profile specifies mapping using the following tables from
   [STRINGPREP]:

      Table B.1
      Table B.2

A.4.  Normalization

   This profile specifies the use of Unicode normalization form KC, as
   described in [STRINGPREP].

A.5.  Prohibited Output

   This profile prohibits the characters listed in the following tables
   from [STRINGPREP]:
      Table C.1.1
      Table C.1.2
      Table C.2.1
      Table C.2.2
      Table C.3
      Table C.4
      Table C.5
      Table C.6
      Table C.7
      Table C.8
      Table C.9

   In addition, the following Unicode characters are prohibited:

      U+0022 (QUOTATION MARK), i.e., "
      U+0026 (AMPERSAND), i.e., &
      U+0027 (APOSTROPHE), i.e., '
      U+002F (SOLIDUS), i.e., /
      U+003A (COLON), i.e., :
      U+003C (LESS-THAN SIGN), i.e., <
      U+003E (GREATER-THAN SIGN), i.e., >
      U+0040 (COMMERCIAL AT), i.e., @

A.6.  Bidirectional Characters

   This profile specifies checking bidirectional strings, as described
   in Section 6 of [STRINGPREP].

A.7.  Notes

   Because the additional characters prohibited by Nodeprep are
   prohibited after normalization, an implementation MUST NOT enable a
   human user to input any Unicode code point whose decomposition
   includes those characters; such code points include but are not
   necessarily limited to the following (refer to [UNICODE] for complete
   information):

   o  U+2100 (ACCOUNT OF)
   o  U+2101 (ADDRESSED TO THE SUBJECT)
   o  U+2105 (CARE OF)
   o  U+2106 (CADA UNA)
   o  U+226E (NOT LESS-THAN)
   o  U+226F (NOT GREATER-THAN)
   o  U+2A74 (DOUBLE COLON EQUAL)
   o  U+FE13 (SMALL COLON)
   o  U+FE60 (SMALL AMPERSAND)
   o  U+FE64 (SMALL LESS-THAN SIGN)
   o  U+FE65 (SMALL GREATER-THAN SIGN)
   o  U+FE6B (SMALL COMMERCIAL AT)
   o  U+FF02 (FULLWIDTH QUOTATION MARK)
   o  U+FF06 (FULLWIDTH AMPERSAND)
   o  U+FF07 (FULLWIDTH APOSTROPHE)
   o  U+FF0F (FULLWIDTH SOLIDUS)
   o  U+FF1A (FULLWIDTH COLON)
   o  U+FF1C (FULLWIDTH LESS-THAN SIGN)
   o  U+FF1E (FULLWIDTH GREATER-THAN SIGN)
   o  U+FF20 (FULLWIDTH COMMERCIAL AT)

Appendix B.  Resourceprep

B.1.  Introduction

   This appendix defines the "Resourceprep" profile of stringprep.
   As such, it specifies processing rules that will enable users to
   enter internationalized resourceparts in the Extensible Messaging and
   Presence Protocol (XMPP) and have the highest chance of getting the
   content of the strings correct.  (An XMPP resourcepart is the
   optional portion of an XMPP address that follows an XMPP domainpart
   and the '/' separator.)  These processing rules are intended only for
   XMPP resourceparts and are not intended for arbitrary text or any
   other aspect of an XMPP address.

   This profile defines the following, as required by [STRINGPREP]:

   o  The intended applicability of the profile: internationalized
      resourceparts within XMPP
   o  The character repertoire that is the input and output to
      stringprep: Unicode 3.2, specified in Appendix B.2
   o  The mappings used: specified in Appendix B.3
   o  The Unicode normalization used: specified in Appendix B.4
   o  The characters that are prohibited as output: specified in
      Appendix B.5
   o  Bidirectional character handling: specified in Appendix B.6

B.2.  Character Repertoire

   This profile uses Unicode 3.2 with the list of unassigned code points
   being Table A.1, both defined in Appendix A of [STRINGPREP].

B.3.  Mapping

   This profile specifies mapping using the following tables from
   [STRINGPREP]:

      Table B.1

B.4.  Normalization

   This profile specifies the use of Unicode normalization form KC, as
   described in [STRINGPREP].

B.5.  Prohibited Output

   This profile prohibits the characters listed in the following tables
   from [STRINGPREP]:

      Table C.1.2
      Table C.2.1
      Table C.2.2
      Table C.3
      Table C.4
      Table C.5
      Table C.6
      Table C.7
      Table C.8
      Table C.9

B.6.  Bidirectional Characters

   This profile specifies checking bidirectional strings, as described
   in Section 6 of [STRINGPREP].

Appendix C.  Differences From RFC 3920

   Based on consensus derived from implementation and deployment
   experience as well as formal interoperability testing, the following
   substantive modifications were made relative to RFC 3920:

   o  Corrected the ABNF syntax to ensure consistency with [URI] and
      [IRI], including consistency with RFC 3986 and RFC 5952 with
      regard to IPv6 addresses (e.g., enclosing the IPv6 address in
      square brackets '[' and ']').
   o  Corrected the ABNF syntax to prevent zero-length localparts,
      domainparts, and resourceparts (and also noted that the underlying
      length limits from the DNS apply to domainparts).
   o  To avoid confusion with the term "node" as used in [XEP-0030] and
      [XEP-0060], changed the term "node identifier" to "localpart" (but
      retained the name "Nodeprep" for backward compatibility).
   o  To avoid confusion with the terms "resource" and "identifier" as
      used in [URI], changed the term "resource identifier" to
      "resourcepart".
   o  Corrected the Nameprep processing rules to require use of the
      UseSTD3ASCIIRules flag.

Appendix D.  Acknowledgements

   Thanks to Ben Campbell, Waqas Hussain, Jehan Pages, and Florian Zeitz
   for their feedback.  Thanks also to Richard Barnes and Elwyn Davies
   for their reviews on behalf of the Security Directorate and the
   General Area Review Team, respectively.

   The Working Group chairs were Ben Campbell and Joe Hildebrand.  The
   responsible Area Director was Gonzalo Camarillo.

   Some text in this document was borrowed or adapted from [IDNA-DEFS],
   [IDNA-PROTO], [IDNA-RATIONALE], and [XEP-0165].

Author's Address

   Peter Saint-Andre
   Cisco
   1899 Wyknoop Street, Suite 600
   Denver, CO  80202
   USA

   Phone: +1-303-308-3282
   Email: psaintan@cisco.com