Internet-Draft Herbert Document: draft-vandesompel-info-uri-00.txt Van de Sompel Expires: March 2004 Los Alamos National Laboratory Tony Hammond Elsevier Eamonn Neylon Manifest Solutions Stuart L. Weibel OCLC Online Computer Library Center September 2003 The "info" URI Scheme for Information Assets with Identifiers in Public Namespaces Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC 2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. Abstract This document defines the "info" Uniform Resource Identifier (URI) scheme for information assets with identifiers in public namespaces. Namespaces participating in the "info" URI scheme are regulated by an "info" Registry mechanism. Table of Contents 1 Introduction..................................................2 Van de Sompel Expires ū March 2004 [Page 1] The "info" URI Scheme September 2003 2 Terminology...................................................3 3 Application of the "info" URI Scheme..........................3 4 The "info" Registry...........................................4 5 The "info" URI Scheme.........................................5 6 Normalization and Comparison of "info" URIs...................8 7 Rationale.....................................................8 8 Security Considerations.......................................9 9 Acknowledgements.............................................10 10 Normative References.......................................10 11 Non-Normative References...................................10 12 Authors' Addresses.........................................11 13 Full Copyright Statement...................................12 1 Introduction This document defines the "info" Uniform Resource Identifier (URI) scheme for information assets that have identifiers in public namespaces that are not part of the URI allocation. By information asset this document intends any information construct ū i.e. any abstraction or manifestation - that has identity within a public namespace. There exist many information assets with identifiers in public namespaces that are not referenceable by URI schemes. Examples of such namespaces include Dewey Decimal Classifications [DEWEY], Library of Congress Control Numbers [LCCN], NASA Astrophysics Data System Bibcodes [BIBCODE], and Open Archives Initiative identifiers [OAI]. Other possible namespaces could include National Library of Medicine PubMed identifiers [PUBMED], International DOI Foundation Digital Object Identifiers (DOI) [DOI], Publisher Item Identifiers (PII) [PII], NISO Serial Item and Contribution Identifier (SICI) codes [SICI], and NISO OpenURL Framework identifiers [OFI]. The "info" URI scheme facilitates the referencing of information assets that have identifiers in such public namespaces by means of URIs. When referencing an information asset by means of its "info" URI, the asset SHALL be considered a "resource" as defined in RFC 2396 [RFC2396]. As such, the "info" URI scheme enables public namespaces that are not part of the URI allocation to become part of the URI allocation. The "info" URI scheme thus provides a bridging mechanism to allow public namespaces to become part of the URI allocation. Namespaces declared under the "info" URI scheme are regulated by an "info" Registry mechanism. The "info" Registry allows a public namespace that is not part of the URI allocation to be declared in a registration process by the organization that manages it (the Namespace Authority). The "info" Registry supports the Van de Sompel Expires - March 2004 [Page 2] The "info" URI Scheme September 2003 declaration of public namespaces that are not part of the URI allocation in a manner that facilitates the construction of URIs for information assets without imposing the necessity of URI maintenance on the Namespace Authority. Information assets identified within a registered namespace SHALL be added or deleted according to the business processes of the Namespace Authority, and yet MAY be referenced within network applications via the "info" URI in an open, standardized way without additional action on the part of the Namespace Authority. The "info" URI scheme explicitly decouples identification from resolution. Applications SHOULD NOT assume that an "info" URI can be dereferenced to a representation of the resource identified by the URI, though some business processes MAY make "info" URIs resolvable either directly or conditionally. The purposes of the "info" URI scheme are the identification of information assets and the standardization of rules for declaring and comparing identity of information assets without regard to any resolution of the URI or even whether the information asset identified by the URI is accessible on the Internet. 2 Terminology In this document the key words "MUST", "MUST NOT", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", and "MAY" are to be interpreted as described in RFC 2119 [RFC2119] and indicate requirement levels for compliant implementations. 3 Application of the "info" URI Scheme Public namespaces that are used for the identification of information assets, and that are not referenceable within the URI allocation, MAY be registered as namespaces within the "info" Registry. Namespace Authorities MAY register these namespaces in the "info" URI Registry, thereby making these namespaces addressable in applications that use URIs. Registrations of public namespaces that are not part of the URI allocation by parties other than the Namespace Authority SHALL NOT be permitted, thereby insuring against hostile usurpation or inappropriate usage of registered service marks or the public namespaces of others. Registration under the "info" Registry of a public namespace that is not part of the URI allocation implies no particular functionalities of the identifiers from the registered namespace other than the identification of information assets within the namespace being registered. There is no assurance of resolution mechanisms associated with the "info" URI scheme, though for any particular namespace there MAY exist mechanisms for resolving identifiers to information assets. The definition of such services Van de Sompel Expires - March 2004 [Page 3] The "info" URI Scheme September 2003 falls outside the scope of the "info" URI scheme. Registration does not define namespace-specific semantics for identifiers within a registered namespace, though allowable character sets and normalization rules are specified in sections 5 and 6 so as to ensure that the URIs created using such identifiers are compliant with applications that use URIs. The registration of a public namespace in the "info" Registry SHALL NOT preclude further development of services associated with that namespace which MAY qualify the namespace for additional publication elsewhere within the URI allocation. 4 The "info" Registry The "info" Registry provides a mechanism for the registration of public namespaces that are used for the identification of information assets, irrespective of whether such assets are accessible on the Internet. NISO [NISO], the National Information Standards Organization, will act as the Maintenance Authority for the "info" Registry, and will delegate the day-to-day operation of the "info" Registry to a Registry Operator. As the Maintenance Agency, NISO will ensure that the Registry Operator operates the "info" Registry in accordance with publicly articulated policy established under NISO governance and available at . The "info" Registry policy defines a review process for candidate namespaces and provides measures of quality control and suitability for entry of namespaces. 4.1 Management Characteristics of the "info" Registry The "info" Registry will be managed according to policies established under the auspices of NISO. All such policies, as well as the namespace declarations in the "info" Registry, will be public. 4.2 Functional Characteristics of the "info" Registry The "info" Registry will be publicly accessible and will support discovery (by both humans and machines) of: . string literals identifying the namespaces . names and contact information of Namespace Authorities . syntax requirement for identifiers maintained in such namespaces . normalization methodology for identifiers maintained in such namespaces Van de Sompel Expires - March 2004 [Page 4] The "info" URI Scheme September 2003 . if applicable, resolution mechanism, and accompanying security considerations, for identifiers maintained in such namespaces where the Registry entries refer to the corresponding "namespace" and "identifier" components which are defined in the ABNF given in section 5.1 of this document. 4.3 Maintenance of the "info" Registry The public namespaces that MAY be registered in the "info" Registry will be those of interest to the communities served by NISO and therefore NISO is committed to act as Maintenance Authority for the "info" Registry, and to assign a Registry Operator to operate it on a day-to-day basis. NISO, a non-profit association accredited by the American National Standards Institute (ANSI), identifies, develops, maintains, and publishes technical standards to manage information in the digital environment. NISO standards apply technologies to the full range of information-related needs, including retrieval, re-purposing, storage, metadata, and preservation. Founded in 1939, incorporated as a not-for-profit education association in 1983, and assuming its current name the following year, NISO draws its support from the communities it serves. The leaders of over 70 organizations in the fields of publishing, libraries, IT and media serve as its voting members. Hundreds of experts and practitioners serve on NISO committees and as officers of the association. NISO has been designated by ANSI to represent US interests to the International Organization for Standardization's (ISO) Technical Committee 46 on Information and Documentation. The NISO headquarters office is located at: 4733 Bethesda Ave., Bethesda, MD 20814, USA. 5 The "info" URI Scheme 5.1 Definition of "info" URI Syntax The "info" URI syntax presented in this document conforms to the generic URI syntax defined in RFC 2396 [RFC2396]. This specification uses the Augmented Backus-Naur Form (ABNF) notation of RFC 2234 [RFC2234] to define the URI. The following core ABNF productions are used by this specification as defined by Section Van de Sompel Expires - March 2004 [Page 5] The "info" URI Scheme September 2003 6.1 of RFC 2234: ALPHA, DIGIT, HEXDIG. The complete URI syntax is as follows: info-uri = info-scheme ":" info-identifier info-scheme = "info" info-identifier = info-namespace "/" identifier info-namespace = ALPHA *name-char identifier = *path-char name-char = ALPHA / DIGIT / "+" / "-" / "." path-char = unescaped / escaped unescaped = ALPHA / DIGIT / "-" / "_" / "." / "!" / "~" / "*" / "'" / "(" / ")" / ";" / ":" / "@" / "&" / "=" / "+" / "$" / "," escaped = "%" HEXDIG HEXDIG An "info" URI has an "info-identifier" as its scheme-specific part. An "info-identifier" is constructed by appending an "identifier" component to an "info-namespace" component separated by a slash "/" character. The "info" scheme is hierarchical as indicated by the presence of the slash "/" character but is pruned to be one level deep. Values for the "info-namespace" component of the "info" URI are name tokens composed of name characters ("name-char") only. They identify the public namespace in which the value for the "identifier" component originates, and are registered in the "info" Registry, which guarantees their uniqueness. Although the "info-namespace" component is case-insensitive, the canonical form is lowercase and documents that specify values for the "info- namespace" component MUST do so using lowercase letters. An implementation SHOULD accept uppercase letters as equivalent to lowercase in "info-namespace" names, for the sake of robustness, but SHOULD only generate lowercase "info-namespace" names, for consistency. Values for the "identifier" component of the "info" URI are opaque strings composed of path segment characters ("path-char"). In their originating public namespace, the values for the "identifier" component identify an information asset. The values for the "identifier" component MUST be %-escaped, and thus display Van de Sompel Expires - March 2004 [Page 6] The "info" URI Scheme September 2003 no visible URI structure. The "identifier" component MUST be treated as case-sensitive, although the "info" Registry MAY record the case-sensitivity of identifiers from within particular registered public namespaces. 5.2 Allowed Characters Under the "info" URI Scheme The "info" URI syntax uses the same set of allowed US-ASCII characters as specified in RFC 2396 [RFC2396] for a generic URI. Reserved characters as well as excluded US-ASCII characters and non-US-ASCII characters MUST be %-escaped before forming the URI. Details of the %-escape encoding can be found in RFC 2396, section 2.4. 5.3 Examples of "info" URIs Some examples of syntactically valid "info" URIs are given below: a) info:ddc/22%2Feng%2F%2F004.678 where "ddc" is the "info-namespace" component for the namespace of Dewey Decimal Classifications [DEWEY] and "22%2Feng%2F%2F004.678" is the "identifier" component for an identifier of an information asset within that namespace in %-escaped form. In unescaped form this identifier is "22/eng//004.678", i.e. the (22nd Ed.) English- language Dewey Decimal Classification for "Internet". b) info:lccn/2002022641 where "lccn" is the "info-namespace" component for a Library of Congress Control Number [LCCN] namespace and "2002022641" is the "identifier" component for an identifier of an information asset within that namespace. c) where "bibcode" is the "info-namespace" component for a NASA ADS Bibcode [BIBCODE] namespace and "2003Icar..163..263Z " is the "identifier" component for an identifier of an information asset within that namespace. d) info:oai/arXiv.org:hep-th%2F9901001 where "oai" is the "info-namespace" component for the namespace of OAI identifiers [OAI] and "arXiv.org:hep-th%2F9901001" is the "identifier" component for an identifier of an information asset within that namespace in %-escaped form. In unescaped form this identifier is represented using the Unicode [UNICODE] character Van de Sompel Expires - March 2004 [Page 7] The "info" URI Scheme September 2003 set and is encoded in UTF-8 [RFC2279] as "arXiv.org:hep- th/9901001". 6 Normalization and Comparison of "info" URIs In order to facilitate comparison of "info" URIs, a sequence of normalization steps SHOULD be applied to minimize the amount of software processing. Since the "info" URI is case-sensitive, a canonical form MAY only be arrived at by consulting the "info" Registry for possible information on the case-sensitivity of identifiers from a registered public namespace, and any case normalization step to apply. The following normalization steps SHOULD be applied: a) Normalize the case of the URI scheme token to be lowercase b) Normalize the case of the "info-namespace" component to be lowercase c) Unescape all unreserved %-escaped characters d) Normalize the case of any %-escaped characters to be uppercase The following unnormalized forms of an "info" URI U1. INFO:OAI/arXiv.org:hep-th%2F9901001 U2. info:oai/ARXIV.ORG:hep-th%2f9901001 U3. info:oai/arXiv.org:hep-th%2f9901001 U4. info:OAI/arXiv.org%3AHEP-TH%2F9901001 are normalized to the following respective forms N1. info:oai/arXiv.org:hep-th%2F9901001 N2. info:oai/ARXIV.ORG:hep-th%2F9901001 N3. info:oai/arXiv.org:hep-th%2F9901001 N4. info:oai/arXiv.org:HEP-TH%2F9901001 If the "oai" public namespace were registered in the "info" Registry as being case-insensitive and normalized to a lowercase form, then the above URI forms can be reduced to the following unique canonical form N0. info:oai/arxiv.org:hep-th%2F9901001 7 Rationale Van de Sompel Expires - March 2004 [Page 8] The "info" URI Scheme September 2003 7.1 Why Create a New URI Scheme for Identifiers from Public Namespaces? Under RFC 2718, "Guidelines for new URL Schemes" [RFC2718], it is stated that a URI scheme SHOULD have a "demonstrated utility", and in particular SHOULD be applied to "things that cannot be referred to in any other way". The "info" URI scheme allows identifiers within registered public namespaces, used for the identification of information assets, to be referred to within the URI allocation. Once a namespace is registered in the "info" Registry, the "info" URI scheme enables an information asset with an identifier in that namespace to be referenced by means of a URI. As a result, the information asset SHALL be considered a resource as defined in RFC 2396 [RFC2396]. 7.2 Why Not Use a URN Namespace ID for Identifiers from Public Namespaces? RFC 2396 [RFC2396] states that a "URN differs from a URL in that itĘs primary purpose is persistent labeling of a resource with an identifier". An "info" URI on the other hand does not assert persistence of resource names or of the resource itself, but rather declares namespaces of identifiers managed according to the policies and business models of the Namespace Authorities. Some of these namespaces will not have persistence of identifiers as a primary purpose, while others will have locator semantics as well as name semantics. It would therefore be inappropriate to employ a URN Namespace ID for such namespaces. 8 Security Considerations The "info" URI scheme syntax is subject to the same security considerations as the generic URI syntax described in RFC 2396 [RFC2396]. An "info" URI provides no assurance that any resolution mechanism is deployed and therefore security considerations regarding the use of "info" URIs MUST be limited solely to the nature of the identifier string itself. This document makes no guarantees about the safety of resolving "info" URIs. Should a namespace registered under the "info" scheme provide mechanisms for resolving "info" URIs to information assets, an implementation MUST consult the "info" Registry for any security considerations. The definition of such resolution mechanisms falls outside the scope of the "info" URI scheme. If an implementation for whatever reason cannot consult the "info" Registry for entries regarding a particular "info" namespace it Van de Sompel Expires - March 2004 [Page 9] The "info" URI Scheme September 2003 MUST NOT make any assumptions regarding security arrangements for that namespace beyond the nature of the identifier string itself. 9 Acknowledgements The authors acknowledge the contributions of Michael Mealling, Verisign, and Patrick Hochstenbach, Los Alamos National Laboratory. 10 Normative References [RFC2119] Bradner, S., "Key Words for Use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. Retrieved September 20, 2003 from . [RFC2234] Crocker, D.H. and Overell, P., "Augmented BNF for Syntax Specifications: ABNF", RFC 2234, November 1997. Retrieved September 20, 2003 from . [RFC2279] Yergeau, F., "UTF-8, A Transformation Format for Unicode and ISO10646", RFC 2279, October 1996. Retrieved September 20, 2003 from . [RFC2396] Berners-Lee, T., R. Fielding and L. Manister, "Uniform Resource Identifiers (URI): Generic Syntax", RFC 2396, August 1998. Retrieved September 20, 2003 from . [RFC2718] Masinter, L., H. Alvestrand, D. Zigmond and P. Petke, "Guidelines for new URL Schemes", RFC 2718, November 1999. Retrieved September 20, 2003 from . [UNICODE] The Unicode Consortium. The Unicode Standard, Version 4.0.0, defined by: The Unicode Standard, Version 4.0 (Reading, MA, Addison-Wesley, 2003). ISBN 0-321-18578-1. 11 Non-Normative References [BIBCODE] "NASA Astrophysics Data System Bibliographic Code". Retrieved August 1, 2003 from . [DEWEY] "Dewey Decimal Classification". Retrieved September 20, 2003 from . Van de Sompel Expires - March 2004 [Page 10] The "info" URI Scheme September 2003 [DOI] ANSI/NISO Z39.84-2000, "Syntax for the Digital Object Identifier", ISBN 1-880124-47-5. Retrieved September 25, 2003 from . [LCCN] "Library of Congress Control Number". Retrieved August 1, 2003 from . [NISO] National Information Standards Organization. Retrieved August 1, 2003 from . [OAI] Lagoze, C., H. Van de Sompel, M. Nelson and S. Warner. "Specification and XML Schema for the OAI Identifier Format", June 2002. Retrieved September 4, 2003 from . [OFI] Draft Standard for Trial Use ANSI/NISO Z39.88, "The OpenURL Framework for Context-Sensitive Services". Retrieved September 20, 2003 from [PII] "Publisher Item Identifier as a means of document identification". Retrieved September 25, 2003 from . [PUBMED] "PubMed Overview", National Library of Medicine. Retrieved September 25, 2003 from . [SICI] ANSI/NISO Z39.56-1996 (R2002), "Serial Item and Contribution Identifier (SICI)", ISBN 1-880124-28-9. Retrieved September 25, 2003 from 12 Authors' Addresses Herbert Van de Sompel Los Alamos National Laboratory, Research Library, MS-P362, PO Box 1663, Los Alamos, NM 87545-1362, USA herbertv@lanl.gov Tony Hammond Van de Sompel Expires - March 2004 [Page 11] The "info" URI Scheme September 2003 Elsevier Ltd 32 Jamestown Road London, NW1 7BY, UK t.hammond@elsevier.com Eamonn Neylon Manifest Solutions Bicester Oxfordshire, OX26 2HX, UK eneylon@manifestsolutions.com Stu Weibel OCLC Online Computer Library Center, Inc. 6565 Frantz Road Dublin, OH 43017-3395, USA weibel@oclc.org 13 Full Copyright Statement Copyright (C) The Internet Society (2003). All Rights Reserved. This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process MUST be followed, or as required to translate it into languages other than English. The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns. This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Van de Sompel Expires - March 2004 [Page 12]