INTERNET DRAFT                                              Ron Daniel
draft-ietf-urn-http-conv-00.txt         Los Alamos National Laboratory
                                                          21 Nov, 1996

           Conventions for the Use of HTTP for URN Resolution

Status of this Memo
===================

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).

This draft expires 21 May, 1997.

Abstract:
=========

The URN-WG was formed to specify persistent, location-independent
names for network accessible resources, and resolution mechanisms to
retrieve the resources given such a name. At this time the URN-WG is
considering one particular resolution mechanism, the NAPTR proposal
[1]. That proposal does not get the client software all the way from
the URN to the resource. Instead, it gets the client from a URN to a
"resolver", which is a system that can then tell the client where the
resource is. The NAPTR draft defines a "resolution protocol" to be
the protocol used to speak to a resolver in order to obtain the
resource, its location(s), or other information about the resource.
The NAPTR proposal allows different resolution protocols to be used
for communicating with resolvers. This draft establishes conventions
for encoding URN resolution requests and responses in HTTP 1.0 (and
1.1) requests and responses.
The primary goal of this draft is to define a convention that is
simple to implement and will allow existing HTTP servers to easily
add support for URN resolution. We expect that the resolution
databases that arise will be useful when more sophisticated
resolution protocols are developed later.

1.0 Introduction:
==================

The NAPTR draft [1] describes a way of using DNS to locate resolvers
for URIs. That draft provides places to specify the "resolution
protocol" spoken by the resolver, as well as the "resolution
services" it offers. As of this writing, the "resolution protocols"
allowed by the NAPTR draft are HTTP, RCDS, HDL, and RWHOIS. (That
list is expected to grow over time.) The NAPTR draft also lists a
variety of resolution services, such as N2L (given a URN, return a
URL), N2R (given a URN, return the named resource), etc. This draft
specifies the conventions to follow to encode resolution service
requests in the HTTP protocol, allowing widely available HTTP daemons
to serve as URN resolvers.

The reader is assumed to be familiar with the HTTP/1.0 [2] and 1.1
[3] specifications.

2.0 General Approach:
=====================

The general approach used to encode resolution service requests in
HTTP is quite simple:

    GET /uri-res/<service>/<uri> HTTP/1.0

For example, if we have the URN "cid:foo@huh.com" and want a URL, we
would send the request:

    GET /uri-res/N2L/cid:foo@huh.com HTTP/1.0

Because of the character set limitations on URIs, we might wish to
encode the '@' character as its hex equivalent, thus the request
would be:

    GET /uri-res/N2L/cid:foo%40huh.com HTTP/1.0

The request could also be encoded as an HTTP 1.1 request. This would
look like:

    GET /uri-res/N2L/cid:foo%40huh.com HTTP/1.1
    Host: <host>

Handling these requests on the server side is easy to implement in a
number of ways. The N2L request could be handled by a CGI script that
took the incoming URN, looked it up in a database, and returned the
URL as an HTTP redirect.
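
On the client side, forming the request line amounts to percent-encoding
the URI and appending it to the /uri-res/ prefix and service name. The
following Python sketch is illustrative only: the helper name
build_request_line and the choice to leave ':' unescaped are assumptions
of this example, not part of the convention.

```python
from urllib.parse import quote

URI_RES_PREFIX = "/uri-res"  # path prefix used by the convention above


def build_request_line(service, uri, http_version="1.0"):
    """Return the GET request line for a resolution service request.

    Hypothetical helper: the name and signature are assumptions made
    for illustration only.
    """
    # Keep ':' literal so the namespace separator stays readable; other
    # characters outside the unreserved set, such as '@', become %XX
    # escapes.
    encoded = quote(uri, safe=":")
    return "GET {}/{}/{} HTTP/{}".format(
        URI_RES_PREFIX, service, encoded, http_version
    )


print(build_request_line("N2L", "cid:foo@huh.com"))
```

With safe=":" a '/' inside the URI is escaped as well, which keeps the
whole URI in a single path segment; a resolver that expects literal
slashes would need a different setting.
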
Service requests like N2R or N2C could be set up so that the daemon
answered the request by returning files out of N2R/ and N2C/
directories, or they could be handled by a script.

One caveat should be kept in mind. The "urn:" prefix is still the
subject of controversy, so some URN documents advocate treating it as
optional. HTTP resolvers MUST return identical results for URIs that
do and do not contain the "urn:" prefix. For example, the two
requests below must return identical results:

    GET /uri-res/N2L/cid:foo%40huh.com HTTP/1.0
    GET /uri-res/N2L/urn:cid:foo%40huh.com HTTP/1.0

Responses from the HTTP server follow standard HTTP practice. Status
codes, such as 200 (OK) or 404 (Not Found), shall be returned. The
normal rules for determining cacheability, negotiating formats, etc.
apply.

3.0 Service-specific details:
=============================

This section goes through the various resolution services established
in the URN Framework draft [4] and states how to encode each of them,
how the results should be returned, and any special status codes that
are likely to arise. Unless stated otherwise, the HTTP requests are
formed according to the simple convention above, either for HTTP/1.0
or HTTP/1.1. The response is assumed to be an entity with normal
headers and body unless stated otherwise. (N2L is the only request
that does not return a body.)

3.1 N2L (URN to URL):
----------------------

The request is encoded as above. The URL MUST be returned in a
Location: header for the convenience of the most common case of
wanting the resource. A 30X status line SHOULD be returned. HTTP/1.1
clients should be sent the 303 status code. HTTP/1.0 clients should
be sent the 302 (Moved Temporarily) status code unless the resolver
has particular reasons for using the 301 (Moved Permanently) or 304
(Not Modified) codes.

3.2 N2Ls (URN to URLs):
------------------------

The request is encoded as above. The result is a list of 0 or more
URLs.
The Internet Media Type (aka Content-Type) of the result may be
negotiated using standard HTTP mechanisms if desired. At a minimum
the resolver should support the text/uri-list media type. (See
Appendix A for the definition of this media type.) That media type is
suitable for machine-processing of the list of URLs. Resolvers may
also return the results as text/html, text/plain, or any other media
type they deem suitable. No matter what the particular media type,
the result MUST be a list of the URLs which may be used to obtain an
instance of the resource identified by the URN. All URIs shall be
encoded according to the URI specification [5].

If the client has requested the result be returned as text/html or
application/html, the result should be encoded as:

where the strings ...url n... are replaced by the n'th URL in the
list.

3.3 N2R (URN to Resource):
---------------------------

The request is encoded as above. The resource is returned using
standard HTTP mechanisms. The request may be modified using the
Accept: header as in normal HTTP to specify that the result be given
in a preferred Internet Media Type.

3.4 N2Rs (URN to Resources):
-----------------------------

This resolution service returns multiple instances of a resource, for
example, GIF and JPEG versions of an image. The judgment about the
resources being "the same" resides with the naming authority that
issued the URN.

The request is encoded as above. The result shall be a MIME
multipart/alternative message with the alternative versions of the
resource in separate body parts. If there is only one version of the
resource identified by the URN, it MAY be returned without the
multipart/alternative wrapper. Resolver software SHOULD look at the
Accept: header, if any, and only return versions of the resource that
are acceptable according to that header.

3.5 N2C (URN to URC):
----------------------

URCs (Uniform Resource Characteristics) are descriptions of other
resources.
This request allows us to obtain a description of the resource
identified by a URN, as opposed to the resource itself. The
description might be a bibliographic citation, a digital signature, a
revision history, etc. This draft does not specify the content of any
response to a URC request. That content is expected to vary from one
resolver to another.

The format of any response to an N2C request MUST be communicated
using the Content-Type header, as is standard HTTP practice. The
Accept: header SHOULD be honored.

3.6 N2Ns (URN to URNs):
------------------------

While URNs are supposed to identify one and only one resource, that
does not mean that a resource may have one and only one URN. For
example, consider a resource that has something like
"current-weather-map" for one URN and "weather-map-for-datetime-x"
for another URN. The N2Ns service request lets us obtain lists of
URNs that are believed equivalent at the time of the request. As the
weather map example shows, some of the equivalences will be
transitory, so the standard HTTP mechanisms for communicating
cacheability MUST be honored.

The request is encoded as above. The result is a list of all the
URNs, known to the resolver, which identify the same resource as the
input URN. The result shall be encoded as for the N2Ls request above
(text/uri-list unless specified otherwise by an Accept: header).

3.7 L2Ns (URL to URNs):
------------------------

The request is encoded as above. The response is a list of any URNs
known to be assigned to the resource at the given URL. The result
shall be encoded as for the N2Ls and N2Ns requests.

3.8 L2Ls (URL to URLs):
------------------------

The request is encoded as described above. The result is a list of
all the URLs that the resolver knows are associated with the resource
located by the given URL. This is encoded as for the N2Ls, N2Ns, and
L2Ns requests.
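
The list-valued services (N2Ls, N2Ns, L2Ns, and L2Ls) all return
text/uri-list by default, so a client needs only one small parser for
them. The following Python sketch follows the rules given in Appendix A
(comment lines start with '#', every other line is a URI, lines end with
CR LF). The function name parse_uri_list and the decision to skip blank
lines are assumptions of this example, not requirements of the draft.

```python
from typing import List


def parse_uri_list(body: str) -> List[str]:
    """Extract the URIs from a text/uri-list entity body.

    Hypothetical helper: the name is an assumption for illustration.
    """
    uris = []
    # splitlines() accepts the CR LF terminators required for text/*
    # formats and also tolerates bare LF.
    for line in body.splitlines():
        # '#' marks a comment only as the first character of a line.
        if line.startswith("#"):
            continue
        line = line.strip()
        if not line:
            # Assumption: blank lines are skipped rather than rejected.
            continue
        uris.append(line)
    return uris


# Sample body taken from Figure 1 in Appendix A.
sample = (
    "# urn:cid:foo@huh.org\r\n"
    "http://www.huh.org/cid/foo.html\r\n"
    "http://www.huh.org/cid/foo.pdf\r\n"
    "ftp://ftp.foo.org/cid/foo.txt\r\n"
)
for uri in parse_uri_list(sample):
    print(uri)
```

A URI containing '#' elsewhere in the line, such as a fragment
identifier, is kept intact because only the first character is checked.
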
3.9 L2C (URL to URC):
----------------------

The request is encoded as above; the response is the same as for the
N2C request.

Implementation Notes:
=====================

This section gives an example of how to configure a web server to
respond to the N2L resolution request. It is not intended to specify
standard behavior; it is provided here merely as a courtesy for
implementors.

First, we assume the presence of a CGI script, n2l.pl, that processes
the provided URN, converting it into a canonical format. It would
remove any "urn:" prefix, decode any %encoded special characters,
normalize case-insensitive parts of the URN to lower case, etc. It
would then use the normalized URN as the key for a search in a
database, which we assume returns the URL as a string. A sample of
our implementation of that script is provided as Appendix B. We will
further assume that the n2l.pl script is in the cgi-bin directory of
the web server.

The easiest way to invoke the n2l.pl script in response to N2L
requests is to set up a Redirect directive in the srm.conf file.
(This works for servers based on the original NCSA HTTP daemon, such
as Apache.) The relevant Redirect directives might look like:

    Redirect /uri-res/N2L http://urn.acl.lanl.gov/cgi-bin/n2l.pl
    Redirect /uri-res/L2N http://urn.acl.lanl.gov/cgi-bin/l2n.pl

Appendix A: The text/uri-list Internet Media Type
=================================================

[This appendix will be augmented or replaced by the registration of
the text/uri-list IMT once that registration has been performed].

Several of the resolution service requests, such as N2Ls, N2Ns, L2Ns,
and L2Ls, result in a list of URIs being returned to the client. The
text/uri-list Internet Media Type is defined to provide a simple
format for the automatic processing of such lists of URIs.

The format of text/uri-list resources is:

1) Any lines beginning with the '#' character are comment lines and
   are ignored during processing.
   (Note that '#' is a character that may appear in URIs, so it only
   denotes a comment when it is the first character on a line.)
2) The remaining non-comment lines MUST be URIs (URNs or URLs),
   encoded according to the URI specification [5]. Each URI shall
   appear on one and only one line.
3) As for all text/* formats, lines are terminated with a CR LF pair.

In applications where one URI has been mapped to a list of URIs, such
as in response to the N2Ls request, the first line of the
text/uri-list response SHOULD be a comment giving the original URI.
An example of such a result for the N2Ls request is shown below in
figure 1.

    # urn:cid:foo@huh.org
    http://www.huh.org/cid/foo.html
    http://www.huh.org/cid/foo.pdf
    ftp://ftp.foo.org/cid/foo.txt

        Figure 1: Example of the text/uri-list format

Appendix B: n2l.pl script
==========================

This is a simple perl script for the N2L resolution service. It
assumes the presence of a DBM database to store the URN to URL
mappings.

   #!/bin/perl
   # N2L - performs urn to url resolution

   $n2l_File = "...filename for DBM database...";

   # The URN arrives as the extra path information after the script name.
   $urn = $ENV{'PATH_INFO'};
   if(length($urn)<3)
   {
      $error=1;
   }

   if(!$error)
   {
      # Strip the leading '/' and any optional "urn:" prefix.
      $urn =~ s/^(\/)(urn:)?(.*)/$3/i;
      # Additional canonicalization should be performed here
      dbmopen(%lu,$n2l_File,0444);
      if($lu{$urn})
      {
         $url=$lu{$urn};
         # Emitting only a Location: header asks the server to redirect.
         print STDOUT "Location: $url\n\n";
      }else{
         $error=2;
      }
      dbmclose(%lu);
   }

   # Any failure is reported to the client as a small HTML page.
   if($error)
   {
      print "Content-Type: text/html \n\n";
      print "<html>\n";
      print "<head><title>URN Resolution: N2L</title></head>\n";
      print "<BODY>\n";
      print "<h1>URN to URL resolution failed for the URN:</h1>\n";
      print "<hr><h3>$urn</h3>\n";
      print "</body>\n";
      print "</html>\n";
   }

   exit;

References:
===========

[1] Ron Daniel and Michael Mealling, "Resolution of Uniform Resource
    Identifiers using the Domain Name System",
    draft-ietf-urn-naptr-01.txt, November, 1996.
[2] RFC 1945, "Hypertext Transfer Protocol -- HTTP/1.0",
    T. Berners-Lee, R. Fielding, H. Frystyk, May 1996.
[3] R. Fielding, J. Gettys, J.C. Mogul, H. Frystyk, T. Berners-Lee,
    "Hypertext Transfer Protocol -- HTTP/1.1",
    draft-ietf-http-v11-spec-06, July 1996.
[4] URN Framework draft -
[5] RFC 1630, "Universal Resource Identifiers in WWW: A Unifying
    Syntax for the Expression of Names and Addresses of Objects on
    the Network as used in the World-Wide Web", T. Berners-Lee,
    June 1994.

Security Considerations
=======================

Communications with a resolver may be of a sensitive nature. Some
resolvers will hold information that should only be released to
authorized users. The results from resolvers may be the target of
spoofing, especially once electronic commerce transactions are common
and there is money to be made by directing users to pirate
repositories rather than repositories which pay royalties to
rightsholders. Resolution requests may be of interest to traffic
analysts. The requests may also be subject to spoofing.

The requests and responses in this draft are amenable to encoding,
signing, and authentication in the manner of any other HTTP traffic.

Author Contact Information:
===========================

Ron Daniel
Los Alamos National Laboratory
MS B287
Los Alamos, NM, USA, 87545
voice: +1 505 665 0597
fax: +1 505 665 4939
email: rdaniel@lanl.gov

Ron Daniel Jr.                 email: rdaniel@acl.lanl.gov
Advanced Computing Lab         voice: (505) 665-0597
MS B-287 TA-3 Bldg. 2011       fax: (505) 665-4939
Los Alamos National Lab        http://www.acl.lanl.gov/~rdaniel/
Los Alamos, NM, 87545