idnits 2.17.1 draft-ietf-xmpp-address-07.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). -- The document date (November 17, 2010) is 4908 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) ** Obsolete normative reference: RFC 3490 (ref. 'IDNA2003') (Obsoleted by RFC 5890, RFC 5891) ** Obsolete normative reference: RFC 3491 (ref. 'NAMEPREP') (Obsoleted by RFC 5891) ** Obsolete normative reference: RFC 3454 (ref. 'STRINGPREP') (Obsoleted by RFC 7564) -- Possible downref: Non-RFC (?) normative reference: ref. 'UNICODE' -- Possible downref: Non-RFC (?) normative reference: ref. 'UNICODE-SEC' == Outdated reference: A later version (-22) exists of draft-ietf-xmpp-3920bis-19 -- Obsolete informational reference (is this intentional?): RFC 3920 (Obsoleted by RFC 6120) Summary: 3 errors (**), 0 flaws (~~), 3 warnings (==), 4 comments (--). 
Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 XMPP P. Saint-Andre 3 Internet-Draft Cisco 4 Intended status: Standards Track November 17, 2010 5 Expires: May 21, 2011 7 Extensible Messaging and Presence Protocol (XMPP): Address Format 8 draft-ietf-xmpp-address-07 10 Abstract 12 This document defines the format for addresses used in the Extensible 13 Messaging and Presence Protocol (XMPP), including support for non- 14 ASCII characters. 16 Status of this Memo 18 This Internet-Draft is submitted in full conformance with the 19 provisions of BCP 78 and BCP 79. 21 Internet-Drafts are working documents of the Internet Engineering 22 Task Force (IETF). Note that other groups may also distribute 23 working documents as Internet-Drafts. The list of current Internet- 24 Drafts is at http://datatracker.ietf.org/drafts/current/. 26 Internet-Drafts are draft documents valid for a maximum of six months 27 and may be updated, replaced, or obsoleted by other documents at any 28 time. It is inappropriate to use Internet-Drafts as reference 29 material or to cite them other than as "work in progress." 31 This Internet-Draft will expire on May 21, 2011. 33 Copyright Notice 35 Copyright (c) 2010 IETF Trust and the persons identified as the 36 document authors. All rights reserved. 38 This document is subject to BCP 78 and the IETF Trust's Legal 39 Provisions Relating to IETF Documents 40 (http://trustee.ietf.org/license-info) in effect on the date of 41 publication of this document. Please review these documents 42 carefully, as they describe your rights and restrictions with respect 43 to this document. Code Components extracted from this document must 44 include Simplified BSD License text as described in Section 4.e of 45 the Trust Legal Provisions and are provided without warranty as 46 described in the Simplified BSD License. 48 Table of Contents 50 1. 
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 51 1.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . . 3 52 1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4 53 1.3. Acknowledgements . . . . . . . . . . . . . . . . . . . . . 4 54 2. Addresses . . . . . . . . . . . . . . . . . . . . . . . . . . 4 55 2.1. Fundamentals . . . . . . . . . . . . . . . . . . . . . . . 4 56 2.2. Domainpart . . . . . . . . . . . . . . . . . . . . . . . . 6 57 2.3. Localpart . . . . . . . . . . . . . . . . . . . . . . . . 7 58 2.4. Resourcepart . . . . . . . . . . . . . . . . . . . . . . . 8 59 3. Internationalization Considerations . . . . . . . . . . . . . 9 60 4. Security Considerations . . . . . . . . . . . . . . . . . . . 9 61 4.1. Reuse of Stringprep . . . . . . . . . . . . . . . . . . . 9 62 4.2. Reuse of Unicode . . . . . . . . . . . . . . . . . . . . . 9 63 4.3. Address Spoofing . . . . . . . . . . . . . . . . . . . . . 9 64 4.3.1. Address Forging . . . . . . . . . . . . . . . . . . . 9 65 4.3.2. Address Mimicking . . . . . . . . . . . . . . . . . . 10 66 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13 67 5.1. Nodeprep Profile of Stringprep . . . . . . . . . . . . . . 13 68 5.2. Resourceprep Profile of Stringprep . . . . . . . . . . . . 13 69 6. Conformance Requirements . . . . . . . . . . . . . . . . . . . 13 70 7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 15 71 7.1. Normative References . . . . . . . . . . . . . . . . . . . 15 72 7.2. Informative References . . . . . . . . . . . . . . . . . . 16 73 Appendix A. Nodeprep . . . . . . . . . . . . . . . . . . . . . . 18 74 A.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 18 75 A.2. Character Repertoire . . . . . . . . . . . . . . . . . . . 18 76 A.3. Mapping . . . . . . . . . . . . . . . . . . . . . . . . . 18 77 A.4. Normalization . . . . . . . . . . . . . . . . . . . . . . 19 78 A.5. Prohibited Output . . . . . . . . . . . . . . . . . 
. . . 19 79 A.6. Bidirectional Characters . . . . . . . . . . . . . . . . . 19 80 A.7. Notes . . . . . . . . . . . . . . . . . . . . . . . . . . 19 81 Appendix B. Resourceprep . . . . . . . . . . . . . . . . . . . . 20 82 B.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 20 83 B.2. Character Repertoire . . . . . . . . . . . . . . . . . . . 21 84 B.3. Mapping . . . . . . . . . . . . . . . . . . . . . . . . . 21 85 B.4. Normalization . . . . . . . . . . . . . . . . . . . . . . 21 86 B.5. Prohibited Output . . . . . . . . . . . . . . . . . . . . 21 87 B.6. Bidirectional Characters . . . . . . . . . . . . . . . . . 21 88 Appendix C. Differences From RFC 3920 . . . . . . . . . . . . . . 21 89 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 22 91 1. Introduction 93 1.1. Overview 95 The Extensible Messaging and Presence Protocol (XMPP) is an 96 application profile of the Extensible Markup Language [XML] for 97 streaming XML data in close to real time between any two or more 98 network-aware entities. The address format for XMPP entities was 99 originally developed in the Jabber open-source community in 1999, 100 first described by [XEP-0029] in 2002, and defined canonically by 101 [RFC3920] in 2004. 103 As specified in RFC 3920, the XMPP address format re-uses the 104 "stringprep" technology for preparation of non-ASCII characters 105 [STRINGPREP], including the Nameprep profile for internationalized 106 domain names as specified in [NAMEPREP] and [IDNA2003] along with two 107 XMPP-specific profiles for the localpart and resourcepart. 109 Since the publication of RFC 3920, IDNA2003 has been superseded by 110 IDNA2008 (see [IDNA-PROTO] and related documents), which is not based 111 on stringprep. Following the lead of the IDNA community, other 112 technology communities that use stringprep have begun discussions 113 about migrating away from stringprep toward more "modern" approaches. 
114 The XMPP community is participating in those discussions in order to 115 find a replacement for the Nodeprep and Resourceprep profiles of 116 stringprep defined in RFC 3920. However, work on updated handling of 117 internationalized addresses is currently in progress within the 118 PRECIS Working Group and at the time of this writing it seems that 119 such work might take several years to complete. Because all other 120 aspects of revised documentation for XMPP have been incorporated into 121 [XMPP], the XMPP Working Group decided to split the XMPP address 122 format into a separate specification so as not to significantly delay 123 publication of improved documentation for XMPP while awaiting the 124 conclusion of work on updated handling of internationalized 125 addresses. 127 Therefore, this specification provides corrected documentation of the 128 XMPP address format using the internationalization technologies 129 available in 2004 (when RFC 3920 was published), with the intent that 130 this specification will be superseded as soon as work on a new 131 approach to preparation and comparison of internationalized strings 132 has been defined by the PRECIS Working Group and applied to the 133 specific cases of XMPP localparts and resourceparts. In the 134 meantime, this document normatively references [IDNA2003] and 135 [NAMEPREP]; XMPP software implementations are encouraged to begin 136 migrating to IDNA2008 (see [IDNA-PROTO] and related documents) 137 because it is nearly certain that the specification superseding this 138 one will re-use IDNA2008. 140 1.2. Terminology 142 Many important terms used in this document are defined in [IDNA2003], 143 [STRINGPREP], [UNICODE], and [XMPP]. 145 The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 146 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and 147 "OPTIONAL" in this document are to be interpreted as described in 148 [KEYWORDS]. 150 1.3. 
Acknowledgements 152 Thanks to Ben Campbell, Waqas Hussain, Jehan Pages and Florian Zeitz 153 for their feedback. Thanks also to Richard Barnes for his review on 154 behalf of the Security Directorate. 156 The Working Group chairs were Ben Campbell and Joe Hildebrand. 158 The responsible Area Director was Gonzalo Camarillo. 160 Some text in this document was borrowed or adapted from [IDNA-DEFS], 161 [IDNA-PROTO], [IDNA-RATIONALE], and [XEP-0165]. 163 2. Addresses 165 2.1. Fundamentals 167 An XMPP entity is anything that is network-addressable and that can 168 communicate using XMPP. For historical reasons, the native address 169 of an XMPP entity is called a Jabber Identifier or JID. A valid JID 170 is a string of [UNICODE] code points, encoded using [UTF-8], and 171 structured as an ordered sequence of localpart, domainpart, and 172 resourcepart (where the first two parts are demarcated by the '@' 173 character used as a separator, and the last two parts are similarly 174 demarcated by the '/' character). 176 The syntax for a JID is defined as follows using the Augmented 177 Backus-Naur Form as specified in [ABNF]. 179 jid = [ localpart "@" ] domainpart [ "/" resourcepart ] 180 localpart = 1*(nodepoint) 181 ; a "nodepoint" is a UTF-8 encoded Unicode code 182 ; point that satisfies the Nodeprep profile of 183 ; stringprep 184 domainpart = IP-literal / IPv4address / ifqdn 185 ; the "IPv4address" and "IP-literal" rules are 186 ; defined in RFC 3986, and the first-match-wins 187 ; (a.k.a. "greedy") algorithm described in RFC 188 ; 3986 applies to the matching process 189 ifqdn = 1*(namepoint) 190 ; a "namepoint" is a UTF-8 encoded Unicode 191 ; code point that satisfies the Nameprep 192 ; profile of stringprep 193 resourcepart = 1*(resourcepoint) 194 ; a "resourcepoint" is a UTF-8 encoded Unicode 195 ; code point that satisfies the Resourceprep 196 ; profile of stringprep 198 All JIDs are based on the foregoing structure. 
One common use of 199 this structure is to identify a messaging and presence account, the 200 server that hosts the account, and a connected resource (e.g., a 201 specific device) in the form of <localpart@domainpart/resourcepart>. 202 However, localparts that identify entities other than clients are possible; for example, a 203 specific chat room offered by a multi-user chat service (see 204 [XEP-0045]) is addressed as <room@service> (where "room" is the name 205 of the chat room and "service" is the hostname of the multi-user chat 206 service) and a specific occupant of such a room could be addressed as 207 <room@service/nick> (where "nick" is the occupant's room nickname). 208 Many other JID types are possible (e.g., <domain/resource> 209 could be a server-side script or service). 211 Each allowable portion of a JID (localpart, domainpart, and 212 resourcepart) MUST NOT be zero bytes in length and MUST NOT be more 213 than 1023 bytes in length, resulting in a maximum total size 214 (including the '@' and '/' separators) of 3071 bytes. 216 For the purpose of communication over an XMPP network (e.g., in the 217 'to' or 'from' address of an XMPP stanza), an entity's address MUST 218 be represented as a JID, not as a Uniform Resource Identifier [URI] 219 or Internationalized Resource Identifier [IRI]. An XMPP URI or IRI 220 [XMPP-URI] is in essence a JID prepended with 'xmpp:', but the native 221 addressing format used in XMPP is that of a mere JID without a URI 222 scheme. [XMPP-URI] is provided only for identification and 223 interaction outside the context of XMPP itself, for example when 224 linking to a JID from a web page. See [XMPP-URI] for a description 225 of the process for securely extracting a JID from an XMPP URI or IRI. 
227 Implementation Note: When dividing a JID into its component parts, 228 an implementation needs to match the separator characters '@' and 229 '/' before applying any transformation algorithms, which might 230 decompose certain Unicode code points to the separator characters 231 (e.g., U+FE6B SMALL COMMERCIAL AT might decompose into U+0040 232 COMMERCIAL AT). 234 2.2. Domainpart 236 The DOMAINPART of a JID is that portion after the '@' character (if 237 any) and before the '/' character (if any); it is the primary 238 identifier and is the only REQUIRED element of a JID (a mere 239 domainpart is a valid JID). Typically a domainpart identifies the 240 "home" server to which clients connect for XML routing and data 241 management functionality. However, it is not necessary for an XMPP 242 domainpart to identify an entity that provides core XMPP server 243 functionality (e.g., a domainpart can identify an entity such as a 244 multi-user chat service, a publish-subscribe service, or a user 245 directory). 247 The domainpart for every XMPP service MUST be a fully qualified 248 domain name ("FQDN"; see [DNS]), IPv4 address, IPv6 address, or 249 unqualified hostname (i.e., a text label that is resolvable on a local 250 network). 252 Interoperability Note: Domainparts that are IP addresses might not 253 be accepted by other services for the sake of server-to-server 254 communication, and domainparts that are unqualified hostnames 255 cannot be used on public networks because they are resolvable only 256 on a local network. 258 A domainpart MUST NOT be zero bytes in length and MUST NOT be more 259 than 1023 bytes in length. 
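The Implementation Note and the length limits above can be illustrated with a minimal Python sketch. The names split_jid, _check_length, and MAX_PART_BYTES are hypothetical (they do not appear in this specification), and the sketch deliberately stops short of applying Nameprep, Nodeprep, or Resourceprep: it only matches the separators on the raw input and enforces the limit of 1 to 1023 bytes on each part that is present.

```python
MAX_PART_BYTES = 1023  # per-part limit stated in Sections 2.1 and 2.2


def _check_length(name, part):
    """Raise ValueError unless the UTF-8 encoding of part is 1..1023 bytes."""
    size = len(part.encode("utf-8"))
    if not 1 <= size <= MAX_PART_BYTES:
        raise ValueError(f"{name} must be 1..{MAX_PART_BYTES} bytes, got {size}")


def split_jid(jid):
    """Split a JID into (localpart, domainpart, resourcepart).

    Absent optional parts are returned as None. No stringprep profile is
    applied here; separators are matched on the untransformed input.
    """
    # The resourcepart may itself contain '/' and '@', so split on the
    # first '/' and only then look for '@' in what precedes it.
    bare, slash, resource = jid.partition("/")
    local, at, domain = bare.partition("@")
    if not at:
        # No '@' present: the whole bare portion is the domainpart.
        local, domain = "", bare

    if at:
        _check_length("localpart", local)
    _check_length("domainpart", domain)
    if slash:
        _check_length("resourcepart", resource)

    return (local if at else None, domain, resource if slash else None)
```

Because the split happens on the first '/' and then on the first '@' of what precedes it, a resourcepart such as "user@host" or "foo/bar" is kept intact, and a code point such as U+FE6B is not treated as a separator even though a later normalization step might turn it into '@'. The maximum total size follows from the same limit: 3 * 1023 bytes plus the two separators gives 3071 bytes.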
261 If the domainpart includes a final character considered to be a label 262 separator (dot) by [IDNA2003] or [DNS], this character MUST be 263 stripped from the domainpart before the JID of which it is a part is 264 used for the purpose of routing an XML stanza, comparing against 265 another JID, or constructing an [XMPP-URI]; in particular, the 266 character MUST be stripped before any other canonicalization steps 267 are taken, such as application of the [NAMEPREP] profile of 268 [STRINGPREP] or completion of the ToASCII operation as described in 269 [IDNA2003]. 271 A domainpart consisting of a fully qualified domain name MUST be an 272 "internationalized domain name" as defined in [IDNA2003], that is, it 273 MUST be "a domain name in which every label is an internationalized 274 label" and MUST follow the rules for construction of 275 internationalized domain names specified in [IDNA2003]. When 276 preparing a text label (consisting of a sequence of UTF-8 encoded 277 Unicode code points) for representation as an internationalized label 278 in the process of constructing an XMPP domainpart or comparing two 279 XMPP domainparts, an application MUST ensure that for each text label 280 it is possible to apply without failing the ToASCII operation 281 specified in [IDNA2003] with the UseSTD3ASCIIRules flag set (thus 282 forbidding ASCII code points other than letters, digits, and 283 hyphens). If the ToASCII operation can be applied without failing, 284 then the label is an internationalized label. (Note: The ToASCII 285 operation includes application of the [NAMEPREP] profile of 286 [STRINGPREP] and encoding using the algorithm specified in 287 [PUNYCODE]; for details, see [IDNA2003].) Although XMPP applications 288 do not communicate the output of the ToASCII operation (called an 289 "ACE label") over the wire, it MUST be possible to apply that 290 operation without failing to each internationalized label. 
If an 291 XMPP application receives as input an ACE label, it SHOULD convert 292 that ACE label to an internationalized label using the ToUnicode 293 operation (see [IDNA2003]) before including the label in an XMPP 294 domainpart that will be communicated over the wire on an XMPP network 295 (however, instead of converting the label, there are legitimate 296 reasons why an application might instead refuse the input altogether 297 and return an error to the entity that provided the offending data). 299 In the terms of IDNA2008 [IDNA-DEFS], the domainpart of a JID is a 300 "domain name slot". 302 2.3. Localpart 304 The LOCALPART of a JID is an optional identifier placed before the 305 domainpart and separated from the latter by the '@' character. 306 Typically a localpart uniquely identifies the entity requesting and 307 using network access provided by a server (i.e., a local account), 308 although it can also represent other kinds of entities (e.g., a chat 309 room associated with a multi-user chat service). The entity 310 represented by an XMPP localpart is addressed within the context of a 311 specific domain. 313 A localpart MUST NOT be zero bytes in length and MUST NOT be more 314 than 1023 bytes in length. 316 A localpart MUST be formatted such that the Nodeprep profile of 317 [STRINGPREP] can be applied without failing (see Appendix A). Before 318 comparing two localparts, an application MUST first ensure that the 319 Nodeprep profile has been applied to each identifier (the profile 320 need not be applied each time a comparison is made, as long as it has 321 been applied before comparison). 323 2.4. Resourcepart 325 The RESOURCEPART of a JID is an optional identifier placed after the 326 domainpart and separated from the latter by the '/' character. A 327 resourcepart can modify either a <localpart@domainpart> address or a 328 mere <domainpart> address. 
Typically a resourcepart uniquely 329 identifies a specific connection (e.g., a device or location) or 330 object (e.g., an occupant in a multi-user chat room) belonging to the 331 entity associated with an XMPP localpart at a local domain. 333 When an XMPP address does not include a resourcepart (i.e., when it 334 is of the form <domainpart> or <localpart@domainpart>), it is 335 referred to as a BARE JID. When an XMPP address includes a 336 resourcepart (i.e., when it is of the form <domainpart/resourcepart> 337 or <localpart@domainpart/resourcepart>), it is referred to as a FULL 338 JID. 340 A resourcepart MUST NOT be zero bytes in length and MUST NOT be more 341 than 1023 bytes in length. 343 A resourcepart MUST be formatted such that the Resourceprep profile 344 of [STRINGPREP] can be applied without failing (see Appendix B). 345 Before comparing two resourceparts, an application MUST first ensure 346 that the Resourceprep profile has been applied to each identifier 347 (the profile need not be applied each time a comparison is made, as 348 long as it has been applied before comparison). 350 Informational Note: For historical reasons, the term "resource 351 identifier" is often used in XMPP to refer to the optional portion 352 of an XMPP address that follows the domainpart and the "/" 353 separator character; to help prevent confusion between an XMPP 354 "resource identifier" and the meanings of "resource" and 355 "identifier" provided in Section 1.1 of [URI], this specification 356 uses the term "resourcepart" instead of "resource identifier" (as 357 in RFC 3920). 359 XMPP entities SHOULD consider resourceparts to be opaque strings and 360 SHOULD NOT impute meaning to any given resourcepart. 
In particular: 362 o Use of the '/' character as a separator between the domainpart and 363 the resourcepart does not imply that XMPP addresses are 364 hierarchical in the way that, say, HTTP addresses are 365 hierarchical; thus for example an XMPP address of the form 366 <localpart@domainpart/foo/bar> does not identify a resource "bar" 367 that exists below a resource "foo" in a hierarchy of resources 368 associated with the entity "localpart@domain". 370 o The '@' character is allowed in the resourcepart, and is often 371 used in the "nick" shown in XMPP chatrooms. For example, the JID 372 <room@chat.example.com/user@host> describes an entity who is an 373 occupant of the room <room@chat.example.com> with an (asserted) 374 nick of <user@host>. However, chatroom services do not 375 necessarily check such an asserted nick against the occupant's 376 real JID. 378 3. Internationalization Considerations 380 XMPP servers MUST, and XMPP clients SHOULD, support [IDNA2003] for 381 domainparts (including the [NAMEPREP] profile of [STRINGPREP]), the 382 Nodeprep (Appendix A) profile of [STRINGPREP] for localparts, and the 383 Resourceprep (Appendix B) profile of [STRINGPREP] for resourceparts; 384 this enables XMPP addresses to include a wide variety of characters 385 outside the US-ASCII range. Rules for enforcement of the XMPP 386 address format are provided in [XMPP]. 388 4. Security Considerations 390 4.1. Reuse of Stringprep 392 The security considerations described in [STRINGPREP] apply to the 393 Nodeprep (Appendix A) and Resourceprep (Appendix B) profiles defined 394 in this document for XMPP localparts and resourceparts. The security 395 considerations described in [STRINGPREP] and [NAMEPREP] apply to the 396 Nameprep profile that is re-used here for XMPP domainparts. 398 4.2. Reuse of Unicode 400 The security considerations described in [UNICODE-SEC] apply to the 401 use of Unicode characters in XMPP addresses. 403 4.3. 
Address Spoofing 405 There are two forms of address spoofing: forging and mimicking. 407 4.3.1. 
Address Forging 409 In the context of XMPP technologies, address forging occurs when an 410 entity is able to generate an XML stanza whose 'from' address does 411 not correspond to the account credentials with which the entity 412 authenticated onto the network (or an authorization identity provided 413 during SASL negotiation). For example, address forging occurs if an 414 entity that authenticated as "juliet@im.example.com" is able to send 415 XML stanzas from "nurse@im.example.com" or "romeo@example.net". 417 Address forging is difficult in XMPP systems, given the requirement 418 for sending servers to stamp 'from' addresses and for receiving 419 servers to verify sending domains via server-to-server authentication 420 (see [XMPP]). However, address forging is possible if: 422 o A poorly implemented server ignores the requirement for stamping 423 the 'from' address. This would enable any entity that 424 authenticated with the server to send stanzas from any 425 localpart@domainpart as long as the domainpart matches the sending 426 domain of the server. 428 o An actively malicious server generates stanzas on behalf of any 429 registered account. 431 Therefore, an entity outside the security perimeter of a particular 432 server cannot reliably distinguish between bare JIDs of the form 433 <localpart@domainpart> at that server and thus can authenticate only 434 the domainpart of such JIDs with any level of assurance. This 435 specification does not define methods for discovering or 436 counteracting such poorly implemented or rogue servers. However, the 437 end-to-end authentication or signing of XMPP stanzas could help to 438 mitigate this risk, since it would require the rogue server to 439 generate false credentials in addition to modifying 'from' addresses. 441 Furthermore, it is possible for an attacker to forge JIDs at other 442 domains by means of a DNS poisoning attack if DNS security extensions 443 [DNSSEC] are not used. 445 4.3.2. 
Address Mimicking 447 Address mimicking occurs when an entity provides legitimate 448 authentication credentials for and sends XML stanzas from an account 449 whose JID appears to a human user to be the same as another JID. For 450 example, in some XMPP clients the address "ju1iet@example.org" 451 (spelled with the number one as the third character of the localpart) 452 might appear to be the same as "juliet@example.org" (spelled with the 453 lower-case version of the letter "L"), especially on casual visual 454 inspection; this phenomenon is sometimes called "typejacking". A 455 more sophisticated example of address mimicking might involve the use 456 of characters from outside the familiar Latin extended-A block of 457 Unicode code points, such as the characters U+13DA U+13A2 U+13B5 458 U+13AC U+13A2 U+13AC U+13D2 from the Cherokee block instead of the 459 similar-looking US-ASCII characters "STPETER". 461 In some examples of address mimicking, it is unlikely that the 462 average user could tell the difference between the real JID and the 463 fake JID. (Indeed, there is no programmatic way to distinguish with 464 full certainty which is the fake JID and which is the real JID; in 465 some communication contexts, the JID formed of Cherokee characters 466 might be the real JID and the JID formed of US-ASCII characters might 467 thus appear to be the fake JID.) Because JIDs can contain almost any 468 properly-encoded Unicode code point, it can be relatively easy to 469 mimic some JIDs in XMPP systems. The possibility of address 470 mimicking introduces security vulnerabilities of the kind that have 471 also plagued the World Wide Web, specifically the phenomenon known as 472 phishing. 474 These problems arise because Unicode and ISO/IEC 10646 repertoires 475 have many characters that look similar (so-called "confusable 476 characters" or "confusables"). 
In many cases, XMPP users might 477 perform visual matching, such as when comparing the JIDs of 478 communication partners. Because it is impossible to map similar- 479 looking characters without a great deal of context (such as knowing 480 the fonts used), stringprep and stringprep-based technologies such as 481 Nameprep, Nodeprep, and Resourceprep do nothing to map similar- 482 looking characters together, nor do they prohibit some characters 483 because they look like others. As a result, XMPP localparts and 484 resourceparts could contain confusable characters, producing JIDs 485 that appear to mimic other JIDs and thus leading to security 486 vulnerabilities such as the following: 488 o A localpart can be employed as one part of an entity's address in 489 XMPP. One common usage is as the username of an instant messaging 490 user; another is as the name of a multi-user chat room; and many 491 other kinds of entities could use localparts as part of their 492 addresses. The security of such services could be compromised 493 based on different interpretations of the internationalized 494 localpart; for example, a user entering a single internationalized 495 localpart could access another user's account information, or a 496 user could gain access to a hidden or otherwise restricted chat 497 room or service. 499 o A resourcepart can be employed as one part of an entity's address 500 in XMPP. One common usage is as the name for an instant messaging 501 user's connected resource; another is as the nickname of a user in 502 a multi-user chat room; and many other kinds of entities could use 503 resourceparts as part of their addresses. The security of such 504 services could be compromised based on different interpretations 505 of the internationalized resourcepart; for example, a user could 506 attempt to bind multiple resources with the same name, or a user 507 could send a message to someone other than the intended recipient 508 in a multi-user chat room. 
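The point that stringprep-based normalization does not map similar-looking characters together can be checked with a small Python sketch using the Cherokee code points quoted above. The helper names nfkc and same_after_nfkc are hypothetical, and the sketch applies only NFKC normalization rather than a full Nodeprep or Resourceprep implementation. It also assumes that the newer Unicode tables shipped with Python's unicodedata module behave, for these particular code points, like the Unicode 3.2 tables on which stringprep is based.

```python
import unicodedata

# Code points quoted in the text above (Cherokee block) and the US-ASCII
# string they visually resemble in many fonts.
CHEROKEE = "\u13DA\u13A2\u13B5\u13AC\u13A2\u13AC\u13D2"
ASCII_LOOKALIKE = "STPETER"


def nfkc(text):
    """Apply Unicode NFKC normalization only (not a full stringprep profile)."""
    return unicodedata.normalize("NFKC", text)


def same_after_nfkc(a, b):
    """Return True if the two strings are identical after NFKC normalization."""
    return nfkc(a) == nfkc(b)


if __name__ == "__main__":
    # Print the official character names to show which script is in use.
    for ch in CHEROKEE:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
    print(same_after_nfkc(CHEROKEE, ASCII_LOOKALIKE))
```

Normalization leaves the Cherokee string unchanged, so a comparison based on it treats the two strings as different identifiers even though a human reader might not; this is why the mitigation has to come from registration and presentation policies rather than from string preparation.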
510 Despite the fact that some specific suggestions about identification 511 and handling of confusable characters appear in the Unicode Security 512 Considerations [UNICODE-SEC], it is also true (as noted in 514 [IDNA-DEFS]) that "there are no comprehensive technical solutions to 515 the problems of confusable characters". Mimicked JIDs that involve 516 characters from only one script, or from the script typically 517 employed by a particular user or community of language users, are not 518 easy to combat (e.g., the simple typejacking attack previously 519 described, which relies on a surface similarity between the 520 characters "1" and "l" in some presentations). However, mimicked 521 addresses that involve characters from more than one script, or from 522 a script not typically employed by a particular user or community of 523 language users, can be mitigated somewhat through the application of 524 appropriate registration policies at XMPP services and presentation 525 policies in XMPP client software. Therefore the following policies 526 are encouraged: 528 1. Because an XMPP service that allows registration of XMPP user 529 accounts (localparts) plays a role similar to that of a registry 530 for DNS domain names, such a service SHOULD establish a policy 531 about the scripts or blocks of characters it will allow in 532 localparts at the service. Such a policy is likely to be 533 informed by the languages and scripts that are used to write 534 registered account names; in particular, to reduce confusion, the 535 service MAY forbid registration of XMPP localparts that contain 536 characters from more than one script and MAY restrict 537 registrations to characters drawn from a very small number of 538 scripts (e.g., scripts that are well-understood by the 539 administrators of the service). 
Such policies are also 540 appropriate for XMPP services that allow temporary or permanent 541 registration of XMPP resourceparts, e.g., during resource binding 542 [XMPP] or upon joining an XMPP-based chat room [XEP-0045]. For 543 related considerations in the context of domain name 544 registration, refer to Section 4.3 of [IDNA-PROTO] and Section 545 3.2 of [IDNA-RATIONALE]. Note well that methods for enforcing 546 such restrictions are out of scope for this document. 548 2. Because every human user of an XMPP client presumably has a 549 preferred language (or, in some cases, a small set of preferred 550 languages), an XMPP client SHOULD gather that information either 551 explicitly from the user or implicitly via the operating system 552 of the user's device. Furthermore, because most languages are 553 typically represented by a single script (or a small set of 554 scripts) and most scripts are typically contained in one or more 555 blocks of characters, an XMPP client SHOULD warn the user when 556 presenting a JID that mixes characters from more than one script 557 or block, or that uses characters outside the normal range of the 558 user's preferred language(s). This recommendation is not 559 intended to discourage communication across different communities 560 of language users; instead, it recognizes the existence of such 561 communities and encourages due caution when presenting unfamiliar 562 scripts or characters to human users. 564 5. IANA Considerations 566 The following sections update the registrations provided in 567 [RFC3920]. 569 5.1. Nodeprep Profile of Stringprep 571 The Nodeprep profile of stringprep is defined under Nodeprep 572 (Appendix A). The IANA has registered Nodeprep in the stringprep 573 profile registry. 575 Name of this profile: 577 Nodeprep 579 RFC in which the profile is defined: 581 XXXX 583 Indicator whether or not this is the newest version of the profile: 585 This is the first version of Nodeprep 587 5.2. 
Resourceprep Profile of Stringprep 589 The Resourceprep profile of stringprep is defined under Resourceprep 590 (Appendix B). The IANA has registered Resourceprep in the stringprep 591 profile registry. 593 Name of this profile: 595 Resourceprep 597 RFC in which the profile is defined: 599 XXXX 601 Indicator whether or not this is the newest version of the profile: 603 This is the first version of Resourceprep 605 6. Conformance Requirements 607 This section describes a protocol feature set that summarizes the 608 conformance requirements of this specification. This feature set is 609 appropriate for use in software certification, interoperability 610 testing, and implementation reports. For each feature, this section 611 provides the following information: 613 o A human-readable name 615 o An informational description 617 o A reference to the particular section of this document that 618 normatively defines the feature 620 o Whether the feature applies to the Client role, the Server role, 621 or both (where "N/A" signifies that the feature is not applicable 622 to the specified role) 624 o Whether the feature MUST or SHOULD be implemented, where the 625 capitalized terms are to be understood as described in [KEYWORDS] 627 The feature set specified here attempts to adhere to the concepts and 628 formats proposed by Larry Masinter within the IETF's NEWTRK Working 629 Group in 2005, as captured in [INTEROP]. Although this feature set 630 is more detailed than called for by [REPORTS], it provides a suitable 631 basis for the generation of implementation reports to be submitted in 632 support of advancing this specification from Proposed Standard to 633 Draft Standard in accordance with [PROCESS]. 635 Feature: address-domain-length 636 Description: Ensure that the domainpart of an XMPP address is at 637 least one byte in length and at most 1023 bytes in length. 638 Section: Section 2.2 639 Roles: Both MUST. 
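As a rough illustration of the length features in this section (address-domain-length, address-localpart-length, and address-resource-length), the following Python sketch checks the bound of 1 to 1023 bytes. The function name part_length_ok is hypothetical, and the sketch assumes that length is counted on the UTF-8 encoding of a string that has already been prepared; neither detail is quoted from the feature descriptions themselves.

```python
MAX_PART_BYTES = 1023  # upper bound stated in the length features


def part_length_ok(part: str) -> bool:
    """Return True if `part` is 1 to 1023 bytes long.

    Assumption: the byte count is taken on the UTF-8 encoding of the
    (already prepared) localpart, domainpart, or resourcepart.
    """
    return 1 <= len(part.encode("utf-8")) <= MAX_PART_BYTES
```

Counting encoded bytes rather than characters matters for non-ASCII input: a string of 512 two-byte characters is already over the limit even though it is well under 1023 characters.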
641 Feature: address-domain-prep 642 Description: Ensure that the domainpart of an XMPP address conforms 643 to the Nameprep profile of Stringprep. 644 Section: Section 2.2 645 Roles: Client SHOULD, Server MUST. 647 Feature: address-localpart-length 648 Description: Ensure that the localpart of an XMPP address is at 649 least one byte in length and at most 1023 bytes in length. 650 Section: Section 2.3 651 Roles: Both MUST. 653 Feature: address-localpart-prep 654 Description: Ensure that the localpart of an XMPP address conforms 655 to the Nodeprep profile of Stringprep. 656 Section: Section 2.3 657 Roles: Client SHOULD, Server MUST. 659 Feature: address-resource-length 660 Description: Ensure that the resourcepart of an XMPP address is at 661 least one byte in length and at most 1023 bytes in length. 662 Section: Section 2.4 663 Roles: Both MUST. 665 Feature: address-resource-prep 666 Description: Ensure that the resourcepart of an XMPP address 667 conforms to the Resourceprep profile of Stringprep. 668 Section: Section 2.4 669 Roles: Client SHOULD, Server MUST. 671 7. References 673 7.1. Normative References 675 [ABNF] Crocker, D. and P. Overell, "Augmented BNF for Syntax 676 Specifications: ABNF", STD 68, RFC 5234, January 2008. 678 [IDNA2003] 679 Faltstrom, P., Hoffman, P., and A. Costello, 680 "Internationalizing Domain Names in Applications (IDNA)", 681 RFC 3490, March 2003. 683 See Section 1 for an explanation of why the normative 684 reference to an obsoleted specification is needed. 686 [KEYWORDS] 687 Bradner, S., "Key words for use in RFCs to Indicate 688 Requirement Levels", BCP 14, RFC 2119, March 1997. 690 [NAMEPREP] 691 Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep 692 Profile for Internationalized Domain Names (IDN)", 693 RFC 3491, March 2003. 695 See Section 1 for an explanation of why the normative 696 reference to an obsoleted specification is needed. 698 [STRINGPREP] 699 Hoffman, P. and M.
Blanchet, "Preparation of 700 Internationalized Strings ("stringprep")", RFC 3454, 701 December 2002. 703 [UNICODE] The Unicode Consortium, "The Unicode Standard, Version 704 3.2.0", 2000. 706 The Unicode Standard, Version 3.2.0 is defined by The 707 Unicode Standard, Version 3.0 (Reading, MA, Addison- 708 Wesley, 2000. ISBN 0-201-61633-5), as amended by the 709 Unicode Standard Annex #27: Unicode 3.1 710 (http://www.unicode.org/reports/tr27/) and by the Unicode 711 Standard Annex #28: Unicode 3.2 712 (http://www.unicode.org/reports/tr28/). 714 [UNICODE-SEC] 715 The Unicode Consortium, "Unicode Technical Report #36: 716 Unicode Security Considerations", 2008. 718 [UTF-8] Yergeau, F., "UTF-8, a transformation format of ISO 719 10646", STD 63, RFC 3629, November 2003. 721 [XMPP] Saint-Andre, P., "Extensible Messaging and Presence 722 Protocol (XMPP): Core", draft-ietf-xmpp-3920bis-19 (work 723 in progress), November 2010. 725 7.2. Informative References 727 [DNS] Mockapetris, P., "Domain names - implementation and 728 specification", STD 13, RFC 1035, November 1987. 730 [DNSSEC] Arends, R., Austein, R., Larson, M., Massey, D., and S. 731 Rose, "DNS Security Introduction and Requirements", 732 RFC 4033, March 2005. 734 [IDNA-DEFS] 735 Klensin, J., "Internationalized Domain Names for 736 Applications (IDNA): Definitions and Document Framework", 737 RFC 5890, August 2010. 739 [IDNA-PROTO] 740 Klensin, J., "Internationalized Domain Names in 741 Applications (IDNA): Protocol", RFC 5891, August 2010. 743 [IDNA-RATIONALE] 744 Klensin, J., "Internationalized Domain Names for 745 Applications (IDNA): Background, Explanation, and 746 Rationale", RFC 5894, August 2010. 748 [INTEROP] Masinter, L., "Formalizing IETF Interoperability 749 Reporting", draft-ietf-newtrk-interop-reports-00 (work in 750 progress), October 2005. 752 [IRI] Duerst, M. and M. Suignard, "Internationalized Resource 753 Identifiers (IRIs)", RFC 3987, January 2005. 
755 [PROCESS] Bradner, S., "The Internet Standards Process -- Revision 756 3", BCP 9, RFC 2026, October 1996. 758 [PUNYCODE] 759 Costello, A., "Punycode: A Bootstring encoding of Unicode 760 for Internationalized Domain Names in Applications 761 (IDNA)", RFC 3492, March 2003. 763 [REPORTS] Dusseault, L. and R. Sparks, "Guidance on Interoperation 764 and Implementation Reports for Advancement to Draft 765 Standard", BCP 9, RFC 5657, September 2009. 767 [RFC3920] Saint-Andre, P., Ed., "Extensible Messaging and Presence 768 Protocol (XMPP): Core", RFC 3920, October 2004. 770 [URI] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform 771 Resource Identifier (URI): Generic Syntax", STD 66, 772 RFC 3986, January 2005. 774 [XEP-0029] 775 Kaes, C., "Definition of Jabber Identifiers (JIDs)", XSF 776 XEP 0029, October 2003. 778 [XEP-0030] 779 Hildebrand, J., Millard, P., Eatmon, R., and P. Saint- 780 Andre, "Service Discovery", XSF XEP 0030, June 2008. 782 [XEP-0045] 783 Saint-Andre, P., "Multi-User Chat", XSF XEP 0045, 784 July 2008. 786 [XEP-0060] 787 Millard, P., Saint-Andre, P., and R. Meijer, "Publish- 788 Subscribe", XSF XEP 0060, September 2008. 790 [XEP-0165] 791 Saint-Andre, P., "Best Practices to Discourage JID 792 Mimicking", XSF XEP 0165, December 2007. 794 [XML] Paoli, J., Maler, E., Sperberg-McQueen, C., Yergeau, F., 795 and T. Bray, "Extensible Markup Language (XML) 1.0 (Fourth 796 Edition)", World Wide Web Consortium Recommendation REC- 797 xml-20060816, August 2006, 798 . 800 [XMPP-URI] 801 Saint-Andre, P., "Internationalized Resource Identifiers 802 (IRIs) and Uniform Resource Identifiers (URIs) for the 803 Extensible Messaging and Presence Protocol (XMPP)", 804 RFC 5122, February 2008. 806 Appendix A. Nodeprep 808 A.1. Introduction 810 This appendix defines the "Nodeprep" profile of stringprep.
As such, 811 it specifies processing rules that will enable users to enter 812 internationalized localparts in the Extensible Messaging and Presence 813 Protocol (XMPP) and have the highest chance of getting the content of 814 the strings correct. (An XMPP localpart is the optional portion of 815 an XMPP address that precedes an XMPP domainpart and the '@' 816 separator; it is often but not exclusively associated with an instant 817 messaging username.) These processing rules are intended only for 818 XMPP localparts and are not intended for arbitrary text or any other 819 aspect of an XMPP address. 821 This profile defines the following, as required by [STRINGPREP]: 823 o The intended applicability of the profile: internationalized 824 localparts within XMPP 825 o The character repertoire that is the input and output to 826 stringprep: Unicode 3.2, specified in Section 2 of this Appendix 827 o The mappings used: specified in Section 3 828 o The Unicode normalization used: specified in Section 4 829 o The characters that are prohibited as output: specified in Section 830 5 831 o Bidirectional character handling: specified in Section 6 833 A.2. Character Repertoire 835 This profile uses Unicode 3.2 with the list of unassigned code points 836 being Table A.1, both defined in Appendix A of [STRINGPREP]. 838 A.3. Mapping 840 This profile specifies mapping using the following tables from 841 [STRINGPREP]: 843 Table B.1 844 Table B.2 846 A.4. Normalization 848 This profile specifies the use of Unicode normalization form KC, as 849 described in [STRINGPREP]. 851 A.5. Prohibited Output 853 This profile specifies the prohibition of using the following tables 854 from [STRINGPREP]. 
856 Table C.1.1 857 Table C.1.2 858 Table C.2.1 859 Table C.2.2 860 Table C.3 861 Table C.4 862 Table C.5 863 Table C.6 864 Table C.7 865 Table C.8 866 Table C.9 868 In addition, the following Unicode characters are 869 prohibited: 871 U+0022 (QUOTATION MARK), i.e., " 872 U+0026 (AMPERSAND), i.e., & 873 U+0027 (APOSTROPHE), i.e., ' 874 U+002F (SOLIDUS), i.e., / 875 U+003A (COLON), i.e., : 876 U+003C (LESS-THAN SIGN), i.e., < 877 U+003E (GREATER-THAN SIGN), i.e., > 878 U+0040 (COMMERCIAL AT), i.e., @ 880 A.6. Bidirectional Characters 882 This profile specifies checking bidirectional strings, as described 883 in Section 6 of [STRINGPREP]. 885 A.7. Notes 887 Because the additional characters prohibited by Nodeprep are 888 prohibited after normalization, an implementation MUST NOT enable a 889 human user to input any Unicode code point whose decomposition 890 includes those characters; such code points include but are not 891 necessarily limited to the following (refer to [UNICODE] for complete 892 information): 894 o U+2100 (ACCOUNT OF) 895 o U+2101 (ADDRESSED TO THE SUBJECT) 896 o U+2105 (CARE OF) 897 o U+2106 (CADA UNA) 898 o U+226E (NOT LESS-THAN) 899 o U+226F (NOT GREATER-THAN) 900 o U+2A74 (DOUBLE COLON EQUAL) 901 o U+FE55 (SMALL COLON) 902 o U+FE60 (SMALL AMPERSAND) 903 o U+FE64 (SMALL LESS-THAN SIGN) 904 o U+FE65 (SMALL GREATER-THAN SIGN) 905 o U+FE6B (SMALL COMMERCIAL AT) 906 o U+FF02 (FULLWIDTH QUOTATION MARK) 907 o U+FF06 (FULLWIDTH AMPERSAND) 908 o U+FF07 (FULLWIDTH APOSTROPHE) 909 o U+FF0F (FULLWIDTH SOLIDUS) 910 o U+FF1A (FULLWIDTH COLON) 911 o U+FF1C (FULLWIDTH LESS-THAN SIGN) 912 o U+FF1E (FULLWIDTH GREATER-THAN SIGN) 913 o U+FF20 (FULLWIDTH COMMERCIAL AT) 915 Appendix B. Resourceprep 917 B.1. Introduction 919 This appendix defines the "Resourceprep" profile of stringprep.
As 920 such, it specifies processing rules that will enable users to enter 921 internationalized resourceparts in the Extensible Messaging and 922 Presence Protocol (XMPP) and have the highest chance of getting the 923 content of the strings correct. (An XMPP resourcepart is the 924 optional portion of an XMPP address that follows an XMPP domainpart 925 and the '/' separator.) These processing rules are intended only for 926 XMPP resourceparts and are not intended for arbitrary text or any 927 other aspect of an XMPP address. 929 This profile defines the following, as required by [STRINGPREP]: 931 o The intended applicability of the profile: internationalized 932 resourceparts within XMPP 933 o The character repertoire that is the input and output to 934 stringprep: Unicode 3.2, specified in Section 2 of this Appendix 935 o The mappings used: specified in Section 3 936 o The Unicode normalization used: specified in Section 4 937 o The characters that are prohibited as output: specified in Section 938 5 939 o Bidirectional character handling: specified in Section 6 941 B.2. Character Repertoire 943 This profile uses Unicode 3.2 with the list of unassigned code points 944 being Table A.1, both defined in Appendix A of [STRINGPREP]. 946 B.3. Mapping 948 This profile specifies mapping using the following tables from 949 [STRINGPREP]: 951 Table B.1 953 B.4. Normalization 955 This profile specifies the use of Unicode normalization form KC, as 956 described in [STRINGPREP]. 958 B.5. Prohibited Output 960 This profile specifies the prohibition of using the following tables 961 from [STRINGPREP]. 963 Table C.1.2 964 Table C.2.1 965 Table C.2.2 966 Table C.3 967 Table C.4 968 Table C.5 969 Table C.6 970 Table C.7 971 Table C.8 972 Table C.9 974 B.6. Bidirectional Characters 976 This profile specifies checking bidirectional strings, as described 977 in Section 6 of [STRINGPREP]. 979 Appendix C. 
Differences From RFC 3920 981 Based on consensus derived from implementation and deployment 982 experience as well as formal interoperability testing, the following 983 substantive modifications were made relative to RFC 3920. 985 o Corrected the ABNF syntax to (1) ensure consistency with [URI] and 986 [IRI], and (2) prevent zero-length localparts, domainparts, and 987 resourceparts. 988 o To avoid confusion with the term "node" as used in [XEP-0030] and 989 [XEP-0060], changed the term "node identifier" to "localpart" (but 990 retained the name "Nodeprep" for backward compatibility). 991 o To avoid confusion with the terms "resource" and "identifier" as 992 used in [URI], changed the term "resource identifier" to 993 "resourcepart". 994 o Corrected the nameprep processing rules to require use of the 995 UseSTD3ASCIIRules flag. 997 Author's Address 999 Peter Saint-Andre 1000 Cisco 1001 1899 Wynkoop Street, Suite 600 1002 Denver, CO 80202 1003 USA 1005 Phone: +1-303-308-3282 1006 Email: psaintan@cisco.com