IETF                                                          J. Klensin
Internet-Draft                                              S. Moonesamy
Obsoletes: 3987 (if approved)                               July 9, 2012
Intended status: Standards Track
Expires: January 10, 2013


                An XML-based Simple Resource Identifier
                      draft-klensin-iri-sri-00.txt

Abstract

   While the URI specification has been widely deployed, it has long
   been recognized that many valid URIs, especially those that contain
   extensive information in the "tail", are unsuitable for user
   presentation, especially in internationalized environments.  IRIs
   have been proposed as a solution for that problem but inherit (and
   are constrained by) the complex and sometimes method-dependent syntax
   model of URIs as well as positional and ordering assumptions that
   make them more difficult to localize than is desirable.

   This specification illustrates a way to define an "above URI" model
   for a localization-friendly simple resource identifier (SRI) that
   explicitly identifies fields and is more appropriate than IRIs to
   support localization.  The current version is intended simply to
   initiate a discussion.  In particular, while it is written to use an
   XML element syntax model, variations using JSON or some other system
   with explicitly-labeled data fields might be as appropriate, or more
   so.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 10, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Terminology
     1.2.  Status and Discussion
   2.  Tagged Elements
   3.  Data Element Description
     3.1.  scheme Element
     3.2.  authority Element
       3.2.1.  user-info Element
       3.2.2.  host Element
       3.2.3.  port Element
     3.3.  path Element
     3.4.  query Element
     3.5.  fragment Element
   4.  Internationalization and Escapes
   5.  Examples
   6.  Acknowledgements
   7.  IANA Considerations
   8.  Security Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  This Specification and the IRI Approach
   Appendix B.  XML DTD
   Authors' Addresses

1.  Introduction

   While the URI specification [RFC3986] has been widely deployed, it
   has long been recognized that many valid URIs, especially those that
   contain extensive information in the "tail", are unsuitable for user
   presentation, especially in internationalized environments.  IRIs
   [RFC3987] have been proposed as a solution for that problem but
   inherit (and are constrained by) the complex and sometimes method-
   dependent syntax model of URIs as well as positional and ordering
   assumptions that make them more difficult to localize than is
   desirable.

   This specification illustrates a way to define a localization-
   friendly "above URI" simple syntax (an "SRI") that explicitly
   identifies fields and is more appropriate than IRIs to support
   localization.

   [[anchor2: Note in Draft: "Simple" is chosen in the grand tradition
   of "simple" protocols like SMTP and SIP.  Certainly the parsing of
   the compound identifier into components is simpler than the URI
   model.  But suggestions for alternate terms would be welcome if
   "simple" turns into flame-bait.]]

   This specification obviates most, if not all, of the perceived need
   for IRIs and hence obsoletes the specification of them in RFC 3987.
   A discussion of the reasons for that action appears in Appendix A.

1.1.  Terminology

   The terms "i18n" and "l10n" are liberally used as abbreviations for
   "internationalization" and "localization", respectively, in this
   specification.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

1.2.  Status and Discussion

   [[anchor5: RFC Editor: Please remove this subsection.]]

   This draft is a pre-proposal to stimulate discussion of the IRI
   approach and alternatives to it.  While it is deliberately
   incomplete, the path to an actual proposal should be clear.  Also,
   the choice of an XML element syntax model [XML] was fairly
   arbitrary.  It would probably be equally reasonable to support a JSON
   [RFC4627] or other structure instead (or additionally) as long as the
   basic syntax chosen supports clear identification of data elements
   and a very precise and context-independent syntax for element values.

   Discussion of this draft should occur on the IRI WG mailing list.
   Details about subscription and archives for the list may be found at
   XXXXX.

2.  Tagged Elements

   Much of the complexity in the URI specification lies in trying to
   identify and extract the various parts of a URI.  That process is
   complicated by scheme-dependent elements and the associated
   delimiters, which may be reserved or not depending on the scheme.
   That work may be appropriate if some system actually needs to parse
   and execute a URI -- an activity that requires understanding the
   scheme in any event -- but may be less appropriate for an i18n / l10n
   overlay.

   This specification overcomes that problem and the associated
   complexities introduced by characters outside the ASCII repertoire,
   URI escaping conventions, and so on by eliminating the constraint of
   forward compatibility with URIs in favor of a more international
   format that can be easily localized and just as easily mapped into
   URI syntax.

3.  Data Element Description

   This section maps the various components of URIs into XML elements.
   For purposes of this specification, the URI syntax is discarded; only
   the data elements are retained.  The mapping from an XML-structured
   document using these elements to URI syntax should be fairly obvious
   [[anchor8: ...and possibly covered in more detail in a future version
   of this spec]].  It is obviously possible to specify a collection of
   elements with this specification that, when mapped back into URI
   syntax, will be invalid or confusing for a particular scheme.  If
   that is perceived as an issue, specific lists of what elements are
   valid for which schemes should be easy to compile.

   The basic structure starts with a localization-friendly element that
   contains all other elements (and has no direct textual content):

   [[anchor9: Note in Draft: Each of the subsections that follow can
   probably benefit from some fleshing-out.  For this version, the
   general intent should be clear.  It is likely that several more
   subsidiary elements are needed, but that is a topic for future
   discussion.]]

3.1.  scheme Element

      <scheme>SchemeName</scheme>

   The scheme element has no subsidiary elements.

3.2.  authority Element

      <authority>
         Authority elements as below.
      </authority>

   The authority element has the subsidiary elements listed in the
   subsections below.

3.2.1.  user-info Element

3.2.2.  host Element

   Domain names are subject to special rules because of IDNA
   considerations, so the normal content of the host element is a domain
   element.  [Domain-]relative URIs do not use the domain element.

3.2.2.1.  domain Element

      <domain>Fully-qualified-domain-name</domain>

3.2.3.  port Element

      <port>NN</port>

   NN is a numeric port number.

3.3.  path Element

      <path>PathString</path>

   [[anchor16: Subsidiary elements here, including and/or
   when appropriate.]]

3.4.  query Element

      <query>QueryString</query>

   [[anchor18: Subsidiary elements here, including and/or
   when appropriate.]]

3.5.  fragment Element

      <fragment>FragmentName or other identifier</fragment>

   The fragment element has no subsidiary elements.

4.  Internationalization and Escapes

   Part of the goal for the format specified here is to express the
   abstract components of a URI as naturally as possible.  Consequently,
   any text component of any element can be expressed in UTF-8 in
   normalization form NFC.  Escapes ("%" or otherwise) are prohibited
   except as required by XML.  If "%" appears, it must be doubled when
   mapping to URI syntax.

5.  Examples

   [[anchor22: There should be several of these, each showing a URI and
   the matching SRI form.]]

   The URI
      http://example.com/test?sri=http://example.net/
   would appear in this form as:

      http
      example.com
      /test
      http
      example.net

   [[anchor23: Note in draft: RFC (and I-D) constraints prohibit showing
   one of these data structures with characters in it outside the ASCII
   repertoire.  If the document ever progresses to RFC, an alternate
   form that can show examples including such characters should be a
   requirement.]]

6.  Acknowledgements

   Some of the structuring information for this document was derived
   from a W3C working draft on URLs [W3C-URL] as well as the URI
   specification.  The thinking that led to this work started with a
   discussion many years ago with James Seng in which he pointed out
   that the "natural" ordering of components of compound identifiers
   differed by culture.

7.  IANA Considerations

   [[anchor24: RFC Editor: Please remove this section before
   publication.]]

   This memo includes no requests to or actions for IANA.

8.  Security Considerations

   The model introduced in this specification does not raise any
   security issues that are not already present in the URI specification
   and that would not be caught by a URI processor.  Because it is less
   subtle and complex than the URI specification, it may actually lead
   to a reduction in vulnerabilities.

9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
              Resource Identifier (URI): Generic Syntax", STD 66,
              RFC 3986, January 2005.

   [RFC3987]  Duerst, M. and M. Suignard, "Internationalized Resource
              Identifiers (IRIs)", RFC 3987, January 2005.

   [XML]      Bray, T., Ed., Paoli, J., Ed., Sperberg-McQueen, C., Ed.,
              and E. Maler, Ed., "Extensible Markup Language (XML) 1.0
              (Second Edition), W3C Recommendation", October 2000.

9.2.  Informative References

   [IRI-Charter]
              IETF, "Internationalized Resource Identifiers (iri)",
              Captured 2012-07-05, 2019.

   [RFC3490]  Faltstrom, P., Hoffman, P., and A. Costello,
              "Internationalizing Domain Names in Applications (IDNA)",
              RFC 3490, March 2003.

   [RFC4627]  Crockford, D., "The application/json Media Type for
              JavaScript Object Notation (JSON)", RFC 4627, July 2006.

   [RFC5890]  Klensin, J., "Internationalized Domain Names for
              Applications (IDNA): Definitions and Document Framework",
              RFC 5890, August 2010.

   [RFC6055]  Thaler, D., Klensin, J., and S. Cheshire, "IAB Thoughts on
              Encodings for Internationalized Domain Names", RFC 6055,
              February 2011.

   [W3C-URL]  W3C, "URL", Captured 2012-07-03, 2012.

Appendix A.  This Specification and the IRI Approach

   The original IRI specification [RFC3987] was intended as a strict
   superset of the URI syntax [RFC3986] with all URI forms being
   permitted but with the use of non-escaped UTF-8 strings also being
   allowed.  IRIs were not separate protocol identifiers or intended for
   use "on the wire".  Instead, they were intended as an overlay for
   URIs that was more convenient for users.  In part because of the
   interaction with the original [RFC3490] and revised [RFC5890]
   versions of the IDNA specification, the mapping from IRIs to URIs was
   not unique: one could map a domain name expressed as a UTF-8 string
   into either a URI escape sequence or into a set of IDNA A-labels.
   That choice interacted badly with the domain name encoding
   considerations discussed by the IAB [RFC6055] and, more importantly,
   with URI comparisons in caches and similar contexts.

   Based on those and other considerations, an IETF WG charged with IRI
   revision [IRI-Charter] concluded that IRIs should be treated as a
   separate protocol identifier, primarily for use in new protocols,
   rather than as a strictly-forward-compatible URI overlay.  That
   decision immediately raised the question of whether it was more
   valuable to preserve a URI-like syntax or depart from it entirely.
   This specification resulted from the desire to explore the
   possibilities that would be opened up by abandoning the constraint of
   apparent similarity to the URI syntax.
   But, just as the decision to move to a separate protocol identifier
   essentially recognizes that IRIs as defined in RFC 3987 were not
   feasible, and just as an IRI variation that defined a new protocol
   element while retaining the general form of the URI syntax would
   obsolete RFC 3987, this specification obsoletes it as well.  Whether
   the underlying syntax model is changed or not, the WG has concluded
   that IRIs as defined in RFC 3987 are inappropriate for general use on
   the public Internet.

Appendix B.  XML DTD

Authors' Addresses

   John C Klensin
   1770 Massachusetts Ave, Ste 322
   Cambridge, MA  02140
   USA

   Phone: +1 617 491 5735
   Email: john-ietf@jck.com


   Subramanian Moonesamy
   76, Ylang Ylang Avenue
   Quatre Bornes
   Mauritius

   Email: sm+ietf@elandsys.com
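
Illustrative Sketch: Mapping Tagged Elements to URI Syntax

   Section 3 states that the mapping from the tagged elements to URI
   syntax should be fairly obvious, and Section 4 requires that a "%"
   be doubled in that mapping.  The following is a minimal sketch of
   one way such a mapping might look, not a definitive implementation.
   It rests on several assumptions that this document does not settle:
   the element names are taken from the section headings (scheme,
   authority, user-info, host, domain, port, path, query, fragment);
   the enclosing root element is called "sri" in the sample input,
   which is a hypothetical name; the function names are invented for
   illustration; nested identifiers inside a query are not handled;
   and the domain is copied unchanged, with no IDNA processing.

```python
import xml.etree.ElementTree as ET


def child(parent, tag):
    """Return the first direct child of `parent` named `tag`, or None."""
    if parent is None:
        return None
    for node in parent:
        if node.tag == tag:
            return node
    return None


def text_of(parent, tag):
    """Return the stripped text of child `tag`, or None if absent or empty."""
    node = child(parent, tag)
    if node is None or node.text is None:
        return None
    return node.text.strip() or None


def double_percent(value):
    # Section 4: a "%" in element text is doubled when mapping to URI syntax.
    return value.replace("%", "%%")


def sri_to_uri(xml_text):
    """Assemble a URI-style string from a tagged-element document.

    The root element name is not checked; only its children are read.
    """
    root = ET.fromstring(xml_text)
    out = []
    scheme = text_of(root, "scheme")
    if scheme:
        out.append(scheme + ":")
    authority = child(root, "authority")
    if authority is not None:
        out.append("//")
        user_info = text_of(authority, "user-info")
        if user_info:
            out.append(double_percent(user_info) + "@")
        # Assumption: the domain element sits inside the host element.
        domain = text_of(child(authority, "host"), "domain")
        if domain:
            out.append(domain)
        port = text_of(authority, "port")
        if port:
            out.append(":" + port)
    path = text_of(root, "path")
    if path:
        out.append(double_percent(path))
    query = text_of(root, "query")
    if query:
        out.append("?" + double_percent(query))
    fragment = text_of(root, "fragment")
    if fragment:
        out.append("#" + double_percent(fragment))
    return "".join(out)


# Hypothetical sample input; "sri" as the root element name is an assumption.
example = (
    "<sri><scheme>http</scheme>"
    "<authority><host><domain>example.com</domain></host></authority>"
    "<path>/test</path></sri>"
)
print(sri_to_uri(example))  # prints http://example.com/test
```

   Reading each field by name rather than by position is the point of
   the tagged-element model: the order of the child elements in the
   input does not affect the result.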