draft-ietf-urlreg-guide-05.txt
INTERNET-DRAFT                                            Larry Masinter
                                                    Harald T. Alvestrand
March 25, 1999                                               Dan Zigmond
                                                              Rich Petke

                     Guidelines for new URL Schemes

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC 2026. Internet-Drafts are
   working documents of the Internet Engineering Task Force (IETF), its
   areas, and its working groups. Note that other groups may also
   distribute working documents as Internet-Drafts.
   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress." The list
   of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt
   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   Distribution of this memo is unlimited.

   This Internet Draft expires September 25, 1999.

Copyright Notice

   Copyright (C) The Internet Society (1999). All Rights Reserved.

Abstract

   A Uniform Resource Locator (URL) is a compact string representation
   of the location for a resource that is available via the Internet.
   This document provides guidelines for the definition of new URL
   schemes.

1. Introduction

   A Uniform Resource Locator (URL) is a compact string representation
   of the location for a resource that is available via the Internet.
   RFC 2396 [1] defines the general syntax and semantics of URIs, and,
   by inclusion, URLs. URLs are designated by including a "<scheme>:"
   and then a "<scheme-specific-part>". Many URL schemes are already
   defined.

   This document provides guidelines for the definition of new URL
   schemes, for consideration by those who are defining and
   registering or evaluating those definitions.

   The process by which new URL schemes are registered is defined in
   RFC [URL-PROCESS] [2].

2. Guidelines for new URL schemes

   Because new URL schemes potentially complicate client software, new
   schemes must have demonstrable utility and operability, as well as
   compatibility with existing URL schemes. This section elaborates
   these criteria.

2.1 Syntactic compatibility

   New URL schemes should follow the same syntactic conventions as
   existing schemes when appropriate.
   If a URI scheme that has embedded links in content accessed by that
   scheme does not share syntax with a different scheme, the same
   content cannot be served up under different schemes without
   rewriting the content. This can already be a problem, and with
   future digital signature schemes, rewriting may not even be
   possible. Deployment of other schemes in the future could therefore
   become extremely difficult.

2.1.1 Motivations for syntactic compatibility

   Why should new URL schemes share as much of the generic URI syntax
   (that makes sense to share) as possible? Consider the following:

   o If fragment syntax isn't shared between two schemes, (e.g. ""),
     you can't move individual, completely self-referential documents
     between schemes without rewriting the embedded references within
     the document. In the Web, the fragment syntax is a property of
     the media type, and evaluated by the client.

   o If fragment syntax is not shared between different media types of
     the same capability (e.g. HTML, XML, Word, or image types such as
     GIF, JPEG, PNG) then you can't have a URI reference that can
     evolve to superior media types as they become available, or that
     is even likely to work properly today with content negotiation.

   o If relative syntax (to the extent of understanding the URI is
     relative, and what part of the URI string is relative) isn't
     shared between two schemes, (e.g. ""), you can't move sets of
     documents that are internally self-referential between schemes
     without rewriting the embedded URIs.

   o If the ".." syntax as a path component in relative URIs isn't
     shared between schemes, you can't easily have sets of document
     sets and refer to them between schemes without rewriting the
     embedded references.

   o If the "/" syntax (to the extent of understanding that the URI
     refers to a path relative to the current naming authority, see
     section 2.1.1) isn't shared, you can't have multiple sets of
     documents easily be moved up or down in a relative hierarchy of
     names and share a common set of documents between them, without
     rewriting the content, shared either in that scheme or between
     schemes. The best example is a site that has a common set of
     GIF, JPEG and PNG images, and you want to reorganize the site
     changing the depth of a subtree from one depth to another, or
     from one directory to another where the depth isn't the same.

   o If naming authority syntax (e.g. what comes after "//" in most URL
     schemes, see section 2.1.1) and relative path syntax (to the
     extent of understanding that the URI has a naming authority, and
     what part of the URI string is the naming authority vs. path)
     aren't shared between two schemes, you can't share identical name
     spaces and serve them up via different schemes. (The naming
     authority syntax is a property of the scheme.) The fact that
     HTTP and FTP have the same syntax, for example, has often been
     exploited by sites transitioning from FTP archive service to HTTP
     archive service so that the URLs can be identical between schemes
     except for the scheme; the same content can be served via two
     schemes simultaneously.

2.1.2 Improper use of "//" following "<scheme>:"

   Contrary to some examples set in past years, the use of double
   slashes as the first component of the <scheme-specific-part> of a
   URL is not simply an artistic indicator that what follows is a URL:
   Double slashes are used ONLY when the syntax of the URL's
   <scheme-specific-part> contains a hierarchical structure as
   described in RFC 2396. In URLs from such schemes, the use of double
   slashes indicates that what follows is the top hierarchical element
   for a naming authority.
   (See section 3 of RFC 2396 for more details.) URL schemes which do
   not contain a conformant hierarchical structure in their
   <scheme-specific-part> should not use double slashes following the
   "<scheme>:" string.

2.1.3 Compatibility with relative URLs

   URL schemes should use the generic URL syntax if they are intended
   to be used with relative URLs. A description of the allowed
   relative forms should be included in the scheme's definition.
   Many applications use relative URLs extensively. Specifically,

   o Can the scheme be parsed according to RFC 2396 - that is, if the
     tokens "//", "/", ";", "?" and "#" are used, do they have the
     meaning given in RFC 2396?

   o Does it make sense to use the scheme in relative URLs like those
     RFC 2396 specifies?

   o If the scheme syntax is designed to be broken into pieces, does
     the documentation for the scheme's syntax specify what those
     pieces are, why it should be broken in this way, and why the
     breaks aren't where RFC 2396 says that they usually should be?

   o If the scheme has a hierarchy, does it go left-to-right and with
     slash separators like RFC 2396? If not, why not?

2.1.4 Compatibility with fragment syntax

   Fragment syntax should be shared across URL schemes whenever
   possible. Fragments indicate a location within a particular
   document, of a particular media type. As media types evolve,
   and content negotiation becomes deployed, a shared fragment syntax
   allows a fragment to point to the correct location within documents
   of different media types. For example, a named fragment (#foo)
   should be able to point to the foo label in either an HTML
   document or an XML document. Similarly for fragments identifying a
   location in an image, where the image may want to evolve from GIF,
   to JPEG, to PNG, the fragment ID should point to the same location.

2.2 Is the scheme well defined?

   It is important that the semantics of the "resource" that a URL
   "locates" be well defined. This might mean different things
   depending on the nature of the URL scheme.

2.2.1 Clear mapping from other name spaces

   In many cases, new URL schemes are defined as ways to translate
   other protocols and name spaces into the general framework of
   URLs. The "ftp" URL scheme translates from the FTP protocol, while
   the "mid" URL scheme translates from the Message-ID field of
   messages.

   In either case, the description of the mapping must be complete,
   must describe how characters get encoded or not in URLs, must
   describe exactly how all legal values of the base standard can be
   represented using the URL scheme, and exactly which modifiers,
   alternate forms and other artifacts from the base standards are
   included or not included. These requirements are elaborated
   below.

2.2.2 URL schemes associated with network protocols

   Most new URL schemes are associated with network resources that
   have one or several network protocols that can access them. The
   'ftp', 'news', and 'http' schemes are of this nature. For such
   schemes, the specification should completely describe how URLs are
   translated into protocol actions in sufficient detail to make the
   access of the network resource unambiguous. If an implementation
   of the URL scheme requires some configuration, the configuration
   elements must be clearly identified. (For example, the 'news'
   scheme, if implemented using NNTP, requires configuration of the
   NNTP server.)

2.2.3 Definition of non-protocol URL schemes

   In some cases, URL schemes do not have particular network protocols
   associated with them, because their use is limited to contexts
   where the access method is understood. This is the case, for
   example, with the "cid" and "mid" URL schemes.
   For these URL schemes, the specification should describe the
   notation of the scheme and a complete mapping of the locator from
   its source.

2.2.4 Definition of URL schemes not associated with data resources

   Most URL schemes locate Internet resources that correspond
   to data objects that can be retrieved or modified. This is the
   case with "ftp" and "http", for example. However, some URL schemes
   do not; for example, the "mailto" URL scheme corresponds to an
   Internet mail address.

   If a new URL scheme does not locate resources that are data
   objects, the properties of names in the new space must be clearly
   defined.

2.2.5 Character encoding

   When describing URL schemes in which (some of) the elements of
   the URL are actually representations of sequences of characters,
   care should be taken not to introduce unnecessary variety in the
   ways in which characters are encoded into octets and then into
   URL characters. Unless there is some compelling reason for a
   particular scheme to do otherwise, translating character sequences
   into UTF-8 (RFC 2279) [3] and then subsequently using the %HH
   encoding for unsafe octets is recommended.

2.2.6 Definition of operations

   In some contexts (for example, HTML forms) it is possible to
   specify any one of a list of operations to be performed on a
   specific URL. (Outside forms, it is generally assumed to be
   something you GET.)

   The URL scheme definition should describe all well-defined
   operations on the URL identifier, and what they are supposed to
   do.

   Some URL schemes (for example, "telnet") provide location
   information for hooking onto bi-directional data streams, and don't
   fit the "infoaccess" paradigm of most URLs very well; this should
   be documented.

   NOTE: It is perfectly valid to say that "no operation apart from
   GET is defined for this URL".
   It is also valid to say that "there's only one operation defined
   for this URL, and it's not very GET-like". The important point is
   that what is defined on this type is described.

2.3 Demonstrated utility

   URL schemes should have demonstrated utility. New URL schemes are
   expensive things to support. Often they require special code in
   browsers, proxies, and/or servers. Having a lot of ways to say the
   same thing needlessly complicates these programs without adding
   value to the Internet.

   The kinds of things that are useful include:

   o Things that cannot be referred to in any other way.

   o Things where it is much easier to get at them using this scheme
     than (for instance) a proxy gateway.

2.3.1 Proxy into HTTP/HTML

   One way to provide a demonstration of utility is via a gateway
   which provides objects in the new scheme for clients using an
   existing protocol. It is much easier to deploy gateways to a new
   service than it is to deploy browsers that understand the new URL
   object.

   Things to look for when thinking about a proxy are:

   o Is there a single global resolution mechanism whereby any proxy
     can find the referenced object?
   o If not, is there a way in which the user can find any object of
     this type, and "run his own proxy"?
   o Are the operations mappable one-to-one (or possibly using
     modifiers) to HTTP operations?
   o Is the type of returned objects well defined?
     - as MIME content-types?
     - as something that can be translated to HTML?
   o Is there running code for a proxy?

2.4 Are there security considerations?

   Above and beyond the security considerations of the base mechanism
   a scheme builds upon, one must think of things that can happen in
   the normal course of URL usage.

   In particular:

   o Does the user need to be warned that such a thing is happening
     without an explicit request (GET for the source of an IMG tag,
     for instance)?
     This has implications for the design of a proxy gateway, of
     course.

   o Is it possible to fake URLs of this type that point to different
     things in a dangerous way?

   o Are there mechanisms for identifying the requester that can be
     used or need to be used with this mechanism (the From: field in a
     mailto: URL, or the Kerberos login required for AFS access in the
     AFS: URL, for instance)?

   o Does the mechanism contain passwords or other security
     information that are passed inside the referring document in the
     clear (as in the "ftp" URL, for instance)?

2.5 Does it start with UR?

   Any scheme starting with the letters "U" and "R", in particular if
   it attaches any of the meanings "uniform", "universal" or
   "unifying" to the first letter, is going to cause intense debate,
   and generate much heat (but maybe little light).

   Any such proposal should either make sure that there is a large
   consensus behind it that it will be the only scheme of its type, or
   pick another name.

2.6 Non-considerations

   Some issues that are often raised but are not relevant to new URL
   schemes include the following.

2.6.1 Are all objects accessible?

   Can all objects in the world that are validly identified by a
   scheme be accessed by any UA implementing it?

   Sometimes the answer will be yes and sometimes no; often it will
   depend on factors (like firewalls or client configuration) not
   directly related to the scheme itself.

3. Security considerations

   New URL schemes are required to address all security considerations
   in their definitions.

4. References

   [1] Berners-Lee, T., Fielding, R., Masinter, L., "Uniform Resource
       Identifiers (URI): Generic Syntax", RFC 2396, August 1998.

   [2] Petke, R., "Registration Procedures for URL Scheme Names",
       RFC [URL-PROCESS], November 1998.

   [3] Yergeau, F., "UTF-8, A Transformation Format of Unicode and ISO
       10646", RFC 2279, January 1998.

5. Authors' Addresses

   Larry Masinter
   Xerox Corporation
   Palo Alto Research Center
   3333 Coyote Hill Road
   Palo Alto, CA 94304
   Fax: +1-415-812-4333
   EMail: masinter@parc.xerox.com

   Harald Tveit Alvestrand
   Maxware, Pirsenteret
   N-7005 Trondheim
   NORWAY
   Voice: +47 73 54 57 00
   EMail: harald.alvestrand@maxware.no

   Dan Zigmond
   WebTV Networks, Inc.
   305 Lytton Avenue
   Palo Alto, CA 94301
   USA
   Voice: +1-650-614-6071
   EMail: djz@corp.webtv.net

   Rich Petke
   UUNET Technologies
   5000 Britton Road
   P. O. Box 5000
   Hilliard, OH 43026-5000
   Voice: +1-614-723-4157
   Fax: +1-614-723-1333
   EMail: rpetke@wcom.net
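
   Two of the guidelines in this memo can be illustrated with a short
   sketch: the UTF-8 followed by %HH recommendation of section 2.2.5,
   and the argument of section 2.1.1 that schemes sharing the generic
   syntax resolve the same relative reference in the same way. The
   sketch is not part of the guidelines. It assumes Python's standard
   urllib.parse module; the host example.com, the paths, the sample
   string, and the helper name encode_component are hypothetical and
   chosen only for illustration.

```python
from urllib.parse import quote, unquote, urljoin


def encode_component(text: str) -> str:
    """Hypothetical helper: characters -> UTF-8 octets -> %HH escapes.

    With safe="" only letters, digits and "_.-~" are left unescaped;
    every other octet is written as %HH.
    """
    return quote(text.encode("utf-8"), safe="")


# Section 2.2.5: encode a character sequence as UTF-8, then %HH-escape it.
encoded = encode_component("Trondheim/Norway \u00e5")
print(encoded)           # Trondheim%2FNorway%20%C3%A5
print(unquote(encoded))  # unquote() decodes the %HH octets as UTF-8

# Section 2.1.1: "http" and "ftp" share the generic hierarchical syntax,
# so one relative reference resolves identically under either scheme.
for scheme in ("http", "ftp"):
    base = f"{scheme}://example.com/docs/guide/index.html"
    print(urljoin(base, "../images/logo.gif"))
```

   In this sketch the two resolved URLs differ only in their scheme,
   which is the property that lets the same self-referential document
   set be served under both schemes without rewriting its content.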