Network Working Group Sam X. Sun INTERNET-DRAFT CNRI Expires Jan, 16, 1999 July 16, 1998 draft-sun-handle-system-01.txt Handle System: A Persistent Global Name Service Overview and Syntax Status of this Memo This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." To learn the current status of any Internet-Draft, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast). Abstract The Handle System (r) is a comprehensive system for assigning, managing, and resolving persistent identifiers, known as 'handles' for digital objects and other resources on the Internet. Handles can be used as Uniform Resource Names (URNs). The Handle System defines:(a) an open set of protocols, (b) a global namespace, and (c) a distributed service model that provides the global name service. The system allows Internet resources to be named as handles. A handle may contain the information necessary to locate and access its named resources. This associated information can be changed as needed to reflect the current state of the identified resource without changing the handle, thus allowing the name of the item to persist over changes of location and other state information. Combined with a centrally administered naming authority registration service, the Handle System provides a general purpose, distributed global name service for the reliable management of information on networks over long periods of time. (Note that in this document we do not attempt to distinguish between the terms 'name' and 'indentifier' and will use them interchangably.) 1. Introduction The Handle System is a distributed information system that provides a persistent naming service for use on networks such as the Internet. Handles can be used to identify any network resources. Each handle may be assigned with a set of typed values that describes its named object. The Handle System provides the handle resolution service that allows these values to be retrieved. It also provides the handle administration service that allows individual handles to have their own administrator(s) assigned, and be administrated over the distributed environment. The Handle System ensures that every handle is unique within the context of the Handle System and may be retained and resolved over long time periods. The resolution information associated with each handle can be changed as needed, allowing the handle to persist over changes in location and other states of the named resource. Specifically, the Handle System was designed to address the following problems in network resource identification: * Persistence A named resource can outlast any specific computer system or organization. Any resource name which is inextricably linked with a specific system or name of an organization will not be able to survive the demise or radical change of that computer system or organization. By separating the object's name from location, ownership, and other state information, the Handle System allows that identifier to persist over time. * Location independence With handles, the name of the item is unrelated to the location of the item. This allows easy reorganization of information. Handles make it possible to transfer resources from one organization to another without affecting or breaking the existing user references (i.e., handles) to those collections. This is not possible using location based references. * Multiple instances of an item A single handle can refer to more than one instance of a network resource. A network service may thus define multiple entry points for its service with a single handle name. This allows the service to distribute its service load into multiple instances. The Handle System has been implemented and is currently in use in a number of prototype projects, including efforts with the Library of Congress, the Association of American Publishers, the Defense Technical Information Center, and the United States Information Agency. This is the first of a series of planned documents that will specify the handle protocol and services, and relate the Handle System to other IETF activities in URN/URI/URL working groups. This document provides a concise overview of the system and the syntax of handles. Additional information can be found on CNRI and related project web sites [4, 5, 6, 8, 16, 17, 18, 19]. 2. Handle Syntax Every handle in the Handle System is defined in two parts: its naming authority, otherwise know as its prefix, and a unique local name under that naming authority, otherwise known as its suffix. The Handle System protocol mandates UTF-8 [2] as the only encoding for any handles specified in the protocol packet. The naming authority identifies the administrative unit of the underlying handles. It is globally unique and will be persistent once obtained. Naming authorities may consist of any UTF-8 encoded characters defined in the Unicode 2.0 [1] standard except '.' (%x0E) or '/' (%x2F). ASCII characters in the naming authority are case insensitive and are converted into upper case before resolution taking place. The local name under the naming authority may consist of any UTF-8 encoded characters defined in the Unicode 2.0. It does not impose any reserved or excluded characters. ACSII characters within the local name are case insensitive, and are converted into upper case before resolution taking place. The following is the handle syntax described in ABNF [21] notation: = "/" = *( "." ) = 1*( %x00-FF ) ; any octets that map to UTF-8 encoded ; Unicode 2.0 characters. = 1*( %x00-2D / %x30-FF ) ; any octets that map to UTF-8 encoded ; Unicode 2.0 characters except octets ; %x2E-2F which map to ASCII characters ; '.' and '/'. Here are some examples of valid handles that may be used in the Handle System protocol: cnri.dlib/july95-arms 10.1002/0002-8231(199601)47:1<1:SPOTEO>2.3.TX;2-K any-printable-characters/a-zA-Z0-9!@#$%^&*()_"<>,.?/`~|\ handles-in-germany/Universit~{#?~}-Karlsruhe 3. Handle data A handle within the Handle System is associated with, and can be resolved to, one or more elements of typed data. Examples of data types in use include URLs, object request brokers, and other URNs. Other examples might include e-mail addresses or public key certificates. There is a controlled set of named types accepted by the system. This list can be extended as needed at the system level. Each handle will also have its administrative data. The administrative data, e.g., permissions to create handles or edit handle data, is initially provided by the handle server when the handle was first created. The administrative data can be used to define the handle administrator that manages the handle data. This administrative data is not returned as part of the handle resolution but is used for handle administration only. Other than the relationship between the Global Handle Registry and local handle services described above, there are no hierarchical relationships assumed among handle records. Note, however, that handles can include in their associated data references to other handles, thus allowing hierarchical or other relationships to be constructed as needed. 4. Using Handles in the World Wide Web 4.1 Handle URI syntax The Handle Syntax in section 2 defines the encoding rules for handles transferred over the wire via the Handle System protocol. Handles may also be referenced as a URI [22], which can be used in Web browsers or in HTML mark-up documents to refer to persistent Internet resources. The Handle URI syntax defines the syntax rule for handles specified in the URI format. Handles defined as a Handle URI may be resolved by the Handle System Resolver [4]. The Handle System Resolver will convert the URI into the Handle (as defined in section 2) before doing the resolution. The Handle URI Syntax is defined as follows: = "hdl:" [ "@" ] = [ ] [ "type=" ] = 1*40( %x01-7F ) ; A registered charset name [23] from IANA, ; which may be any printable ASCII characters. = 1*(%x30-39) ; digits 0 - 9. = 1*( %x00-%xFF ) ; Octets that encodes a using the ; in the optional . ; If no specified, UTF-8 encoding ; is the default encoding. When UTF-8 is the encoding used, the Handle URI Syntax has two reserved characters, % and ". The character % is used for hex encoding, which is necessary to allow any handles specified from the standard keyboard. And the character " is reserved to allow handles to be separated from the surrounding text in HTML documents. Reserved characters must be hex encoded when used in the URI context. The choice of % and hex encoding is also compatible with the current URI practice. Because some browser implementation (incorrectly!) drops the # character when processing the URI regardless of its scheme, hex encoding of character # is also recommended. Examples of handles using Handle URI Syntax are: hdl:cnri.dlib/july95-arms (which refers to handle "cnri.dlib/july95-arms") and hdl:handle-with-hex-encoding/handle%25abc (which refers to handle "handle-with-hex-encoding/handle%abc") It's worth noting that the handle namespace by itself does not impose any hex or escape encoding, nor does the underlying Handle System. The reserved characters and hex encoding are introduced only when handles are used in the URI context. It is the client software's responsibility to decode any hex encoding in the handle URI before sending the handles out for resolution. And on systems where other character set encoding is used, it is also the client software's responsibility to convert a natively displayed handle to its UTF-8 encoding before sending it out for resolution. 4.2 Handle Resolution service from Web browsers Handles specified using Handle URI Syntax (ie, hdl:) can be resolved from a Web browser directly using the Handle System Resolver [4]. The Resolver is a freely available extension to the current popular Web browsers. It resolves handles into corresponding URIs, which are then retrieved by the browser in the normal fashion. This is the suggested way to resolve the handles in the future, because it provides better performance, is more scalable, and is locally configurable. Handles can also be resolved using proxy services using Handle Proxy Syntax (ie, http:/). In this case, the proxy server performs the handle resolution task, and sends the resulting URL to the client browser for processing. Currently, CNRI provides global handle proxy server through "hdl.handle.net", and "dx.doi.org". The proxy server allows handles to be resolved without additional software for the client. For example, a handle "cnri.dlib/july95-arms" may be entered as "http://hdl.handle.net/cnri.dlib/july95-arms" resolvable by any browser. It is worth noting that even though using the proxy server approach is straight-forward and doesn't require any customer software customization, it has the effect of connecting the handles with the proxy server's URL location. Hence the selection of a proxy server should be made with care. 4.3 Creating handles for network resources The Handle System allows handles to be created in a distributed fashion. Organizations in need of providing a naming service for their persistent internet resources will be able to contact CNRI or other organizations to register for their own handle naming authority, as well as their own local handle services. This will enable them to create handles for their own organizational use. Policies and procedures for Naming Authority registration are currently under development. As an initiative for general public service, CNRI has established a public handle registration service for the IETF community. This service provides an open channel to allow individuals to create handles and experiment with the handle system. The service is provided for testing purposes only. Future availability of this service is not guaranteed. Details on how to use this service, as well as its terms and conditions can be obtained from http://www.handle.net/ietf/handle/register_handle.html. 5. Handle System Service Architecture The Handle System is distributed, scalable, and designed for widespread deployment. The current implementation consists of one global service and many local handle services. Each handle service consists of one of more physically distributed handle servers. (Currently, the global service consists of two servers in Virginia and two in California. A European location is planned.) And each handle server can have one or more secondary servers for mirroring. In addition, handle caching servers are provided for faster resolution service for a local environment, and they can also be used to provide proxy service through firewalls. 5.1 Handle services The Handle System consists of many services. Each service is responsible for part of the handle namespace. One specific service, called the Global Handle Registry, is globally unique, and has a special function, which is to know of the existence, location, and namespace responsibilities of all other public services, or local handle services. There can be an unlimited number of local handle services, managed by various organizations. In the current implementation each local handle service is registered with the Global Handle Registry to ensure efficient resolution. Policies and procedures for disconnected local handle services are under development. The primary issue here is to guarantee identifier uniqueness in disconnected systems. 5.2 Handle servers within a service Each handle service consists of one or more handle servers. Typically, each handle server runs on a separate computer but multiple handle servers can run on a single computer. Within a handle service, the distribution of handles across its constituent servers is determined by a hash table such that each of N servers within a service will be responsible for 1/N handles. The number of servers can be adjusted as required to meet the needs of a service. 5.3 Server replication Additionally, it may be desirable to mirror the contents of any of the handle servers within a service, presumably on a separate computer. This is referred to as replication and is accomplished by creating one or more additional servers whose sole purpose is to mirror the contents of the original server. Within each set of replicated servers, the initial server is called the primary server and all others are called secondary servers. The creation and administration of handles always takes place on the primary server, but resolution can use either the primary or any of its secondaries. This provides fault tolerance, as well as the potential for performance improvement. 5.4 Caching Server The Handle System Caching Server has been built to reduce the network traffic between handle clients and handle services and its use is strongly encouraged. Caching handle data or routing information on the caching server allows some handle resolution to be performed within an organization's local area network. 5.5 Proxy Server The Handle System Proxy Server has been developed to act as a client to the Handle System, allowing handles to be resolved using Handle Proxy Syntax (ie, http:/). Using this syntax, the browser passes a handle to the proxy server, which in turn passes the handle to the appropriate handle service for resolution. If the handle can be resolved into one or more URLs, a URL is returned from the handle server to the proxy, and from the proxy to the client browser. 5.6 Handle System Resolver The Handle System Resolver [4] is a software component which extends Netscape or Microsoft Web browsers, and allows handles to be resolved using Handle URI Syntax (ie, hdl:). Using this syntax, the browser passes the handle directly to the appropriate handle service for resolution. If the handle can be resolved into one or more URLs, one of the URLs is returned to the browser which then transparently retrieves and displays the intended content. 6. Handle resolution Handle clients and handle services use the Handle Resolution Protocol [5] to conduct resolution transactions. The Handle Resolution protocol uses registered port number 2641. By default, a handle resolution request will be answered with all of the typed data associated with a handle, with the exception of the administrative data. It is also possible to request data only of a certain type. Handle clients that do not know which handle service to query for a given handle start with the Global Handle Registry, which is guaranteed to know which service contains a given handle. Within a given service, a client uses the hash table specific to the service to discover the individual server, or set of replicated servers, which can resolve the given handle. A number of handle resolution clients have been constructed, all of which utilize the Handle Client Library [6], which is currently implemented as a C library. The clients include a Web proxy server, the Handle System Resolver [4], and the Grail Web browser [7]. 7. Handle administration Handle System administration is carried out using the Handle System Administration Protocol [8]. This protocol allows the creation and administration of handles and their associated data within the Handle System. A series of APIs currently under construction on top of this protocol will be made publicly available. 8. Security Consideration The Handle System has been designed to enable secure transactions between clients and servers and to allow secure and stable storage of handle data. Development and documentation of secure practices and policies is underway. A handle does not in itself pose a security threat. When specified or used in URL context, it is subject to all the security considerations in the URL specification [3]. 9. Handle System and URL/URN/URI While the Handle System is designed to be usable in many contexts and is not a subset or extension of current UR* schemes, it can be used in conjunction with those schemes. When used within those schemes it is, of course, subject to their constraints. The Handle System is designed to provide all the fundamental requirements outlined in the URN/URI specifications [9,10]. On the other hand, the Handle System differs from the current proposed URN implementations [11,12,13] discussed in the IETF URN working group in the following ways. First of all, the Handle System defines a namespace independent of URI, and is not subject to the current namespace constraints of URI. The namespace of handles is Unicode based, and imposes no reserved or excluded characters on the handle string. This allows handles to be specified in any national language natively in a globally unique and unambiguous manner. The elimination of any reserved characters also allows any legacy naming system, such as SICI [14], to be used with no or minimum change. The Handle System is designed to support, instead of exclude, the use of user friendly names in any native language. There are situations in which using descriptive names may hurt the persistence of the name once the identified object changes its association. Objects of this nature may be better served using non-descriptive names; for example, digits only. On the other hand, there are objects for which descriptive names are desirable. The current URN/URI was defined "generally to be for machine, rather than human, consumption" [20]. It uses a subset of ASCII character set, and requires a set of reserved/excluded characters. A Human Friendly Name Service is expected to work with it. URN services may be used to resolve handles from the Handle System. For example, the handle "cnri.dlib/july95-arms" may be specified as "urn:hdl:cnri.dlib/july95-arms". This will allow any URN-aware browsers to resolve the handle as a URN. Handles specified as an URN must follow the URN syntax [13]. 10. History and Acknowledgment The initial design and implementation of the Handle System was part of the Computer Science Technical Reports project, funded by the Defense Advanced Projects Agency (DARPA) under Grant No. MDA-972-92-J-1029. One aspect of this project was to develop a framework for the underlying infrastructure of digital libraries. It is described in a paper by Robert Kahn and Robert Wilensky [15]. The first implementation was created at CNRI in the fall of 1994. Subsequent work on the Handle System has been supported in part by the Advanced Research Projects Agency under Grant No. MDA972-92-J-1029. The following people have contributed to the Handle System design and implementation: David Ely, William Arms, Navjeet Chabbewal, Judith Grass, Robert Kahn, Timothy Kendall, Connie McLindon, Charles Orth, Ed Overly, Varna Puvvada, John Stewart, Allison Yu-McNamara, Ron Ely, Catherine Rey, Jane Euler, Larry Lannom, and Sam Sun. We also want to acknowledge the contribution of the other members of the Computer Science Technical Reports project. 11. Author's Address Sam X. Sun 1895 Preston White Dr. Suite 100 Reston, VA 20191-5434 (703) 620-8990 ssun@cnri.reston.va.us 12. References [1] The Unicode Consortium, "The Unicode Standard, Version 2.0", Addison-Wesley Developers Press, 1996. ISBN 0-201-48345-9 [2] Yergeau, Francois, "UTF-8, A Transform Format for Unicode and ISO10646", RFC2044, October 1996. http://ds.internic.net/rfc/rfc2044.txt [3] Berners-Lee, T., Masinter, L., McCahill, M., et al., "Uniform Resource Locators (URL)", RFC1738, December 1994. http://ds.internic.net/rfc/rfc1738.txt [4] Handle System Resolver. http://www.handle.net/resolver/ [5] Handle System Client Library download site. http://www.handle.net/download.html [6] Handle Resolution Protocol. http://www.handle.net/client_spec.html [7] The Grail Internet Browser. http://grail.cnri.reston.va.us/grail/ [8] Handle Administration Protocol. http://www.handle.net/handle_admin.html [9] Sollins, K. and L. Masinter, "Functional Requirements for Uniform Resource Names", RFC 1737, December 1994. http://ds.internic.net/rfc/rfc1737.txt [10] Berners-Lee, T., "Universal Resource Identifiers in WWW" RFC 1630, June 1994. http://ds.internic.net/rfc/rfc1630.txt [11] Daniel, Ron and Michael Mealling, "Resolution of Uniform Resource Identifiers using the Domain Name System", RFC 2168, June 1997. http://ds.internic.net/rfc/rfc2168.txt [12] Daniel, Jr., Ron, "A Trivial Convention for using HTTP in URN Resolution", RFC-2169, June 1997. http://ds.internic.net/rfc/rfc2169.txt [13] Moats, Ryan, "URN Syntax", RFC-2141, May 1997. http://ds.internic.net/rfc/rfc2141.txt [14] Serial Item and Contribution Identifier (SICI) Standard. http://sunsite.berkeley.edu/SICI/ [15] Kahn, Robert and Wilensky, Robert. "A Framework for Distributed Digital Object Services", May, 1995. http://www.cnri.reston.va.us/tmp_hp/k-w.html [16] Digital Object Identifier System. http://hdl.handle.net/10.1000/1 [17] National Digital Library Program. http://hdl.handle.net/4263537/003 [18] The CNRI Registry. http://hdl.handle.net/4263537/001 [19] Defense Virtual Library. http://hdl.handle.net/4263537/002 [20] Sollins, K., "Architectural Principles of Uniform Resource Name Resolution", September 26, 1997, Work in Progress. ftp://ftp.ietf.org/internet-drafts/draft-ietf-urn-req-frame-04.txt [21] D. Crocker, Ed., P. Overell, "Augmented BNF for Syntax Specifications: ABNF", RFC 2234, November 1997, http://info.internet.isi.edu/in-notes/rfc/files/rfc2234.txt [22] T. Berners-Lee, L. Masinter, R. Fielding, "Uniform Resource Identifiers (URI): Generic Syntax", work in progress, June 1998, ftp://ftp.ietf.org/internet-drafts/draft-fielding- uri-syntax-03.txt [23] List of IANA registered charset names. ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets INTERNET-DRAFT draft-ietf-handle-system-01.txt Expires Jan 16, 1999