idnits 2.17.1 

draft-kunze-ark-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.  Found some kind of copyright notice around line 1783 but
     it does not match any copyright boilerplate known by this tool.

     Expected boilerplate is as follows today (2024-04-18) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:
        This Internet-Draft is submitted in full conformance with the provisions
        of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:
        Copyright (c) 2024 IETF Trust and the persons identified as the document
        authors.  All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:
        This document is subject to BCP 78 and the IETF Trust's Legal Provisions
        Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document.  Please review these documents
        carefully, as they describe your rights and restrictions with
        respect to this document.  Code Components extracted from this
        document must include Simplified BSD License text as described in
        Section 4.e of the Trust Legal Provisions and are provided
        without warranty as described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents -- however, there's a paragraph
     with a matching beginning. Boilerplate error?

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 39
     longer pages, the longest (page 2) being 63 lines

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 40 pages


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 3 instances of too long lines in the document, the longest one
     being 5 characters in excess of 72.

  ** There are 1098 instances of lines with control characters in the
     document.

  == There are 13 instances of lines with non-RFC2606-compliant FQDNs in the
     document.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == Line 557 has weird spacing: '...eful to  remem...'

  == Line 749 has weird spacing: '... regexp  repla...'

  == Line 1793 has weird spacing: '...for	the  purpo...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (20 February 2002) is 8093 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'MD5' is defined on line 1683, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'DCORE'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'DOI'

  ** Obsolete normative reference: RFC  822 (ref. 'EMHDRS') (Obsoleted by RFC
     2822)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ERC'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'HKMP'

  ** Obsolete normative reference: RFC 2616 (ref. 'HTTP') (Obsoleted by RFC
     7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235)

  ** Downref: Normative reference to an Informational RFC: RFC 1321 (ref.
     'MD5')

  -- Possible downref: Non-RFC (?) normative reference: ref. 'NAPTR'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'NLMPerm'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'PURL'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'REG'

  ** Obsolete normative reference: RFC 2396 (ref. 'URI') (Obsoleted by RFC
     3986)

  ** Downref: Normative reference to an Informational RFC: RFC 2288 (ref.
     'URNBIB')

  ** Obsolete normative reference: RFC 2141 (ref. 'URNSYN') (Obsoleted by RFC
     8141)

  ** Obsolete normative reference: RFC 2611 (ref. 'URNNID') (Obsoleted by RFC
     3406)


     Summary: 16 errors (**), 0 flaws (~~), 9 warnings (==), 11 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet-Draft:	draft-kunze-ark-03.txt				J. Kunze
3	ARK Identifier Scheme			 University of California (UCSF)
4	Expires	20 August 2002					R. P. C. Rodgers
5						 US National Library of	Medicine
6								20 February 2002

8			  The ARK Persistent Identifier	Scheme

10	      (http://www.ietf.org/internet-drafts/draft-kunze-ark-03.txt)

12	Status of this Document

14	   This	document is an Internet-Draft and is in	full conformance with
15	   all provisions of Section 10	of RFC2026.

17	   Internet-Drafts are working documents of the	Internet Engineering
18	   Task	Force (IETF), its areas, and its working groups.  Note that
19	   other groups	may also distribute working documents as Internet-
20	   Drafts.

22	   Internet-Drafts are draft documents valid for a maximum of six months
23	   and may be updated, replaced, or obsoleted by other documents at any
24	   time.  It is	inappropriate to use Internet-Drafts as	reference
25	   material or to cite them other than as ``work in progress.''

27	   The list of current Internet-Drafts can be accessed at
28	   http://www.ietf.org/ietf/1id-abstracts.txt

30	   The list of Internet-Draft Shadow Directories can be	accessed at
31	   http://www.ietf.org/shadow.html.

33	   Distribution	of this	document is unlimited.	Please send comments to
34	   jak@ckm.ucsf.edu.

36	   Copyright (C) The Internet Society (2002).  All Rights Reserved.

38	Abstract

40	   The ARK (Archival Resource Key) is a	scheme intended	to facilitate
41	   the persistent naming and retrieval of information objects.	It
42	   comprises an	identifier syntax and three services.  An ARK has four
43	   components:

45			  [http://NMAH/]ark:/NAAN/Name

47	   an optional and mutable Name	Mapping	Authority Hostport part	(NMAH,
48	   where "hostport" is a hostname followed optionally by a colon and
49	   port	number), the "ark:" label, the Name Assigning Authority	Number
50	   (NAAN), and the assigned Name.  The NAAN and	Name together form the
51	   immutable persistent	identifier for the object.

53	   An ARK request is an	ARK with a service request and a question mark
54	   appended to it.  Use	of an ARK request proceeds in two steps.  First,
55	   the NMAH, if	not specified, is discovered based on the NAAN.	 Two
56	   discovery methods are proposed:  one	is file	based, the other based
57	   on the DNS NAPTR record.  Second, the ARK request is	submitted to the
58	   NMAH.  Three	ARK services are defined, gaining access to:  (1) the
59	   object (or a	sensible substitute), (2) a description	of the object
60	   (metadata), and (3) a description of	the commitment made by the NMA
61	   regarding the persistence of	the object (policy).  These services are
62	   defined initially to	use the	HTTP protocol.	When the NMAH is
63	   specified, the ARK is a valid URL that can gain access to ARK
64	   services using an unmodified	Web client.

66	1.  Introduction

68	   This	document describes a scheme for	the high-quality naming	of
69	   information resources.  The scheme, called the Archival Resource Key
70	   (ARK), is well suited to long-term access and identification	for any
71	   information resources that accommodate reasonably regular electronic
72	   description.	 This includes digital documents, databases, software,
73	   and websites, as well as physical objects (such as books, bones, and
74	   statues) and	intangible objects (chemicals, diseases, vocabulary
75	   terms, performances).  Hereafter the	term "object" refers to	an
76	   information resource.  The term ARK itself refers both to the scheme
77	   and to any single identifier	that conforms to it.

79	   Schemes for persistent identification of network-accessible objects
80	   are not new.  In the early 1990s, the design of the Uniform Resource
81	   Name	[URNSYN] responded to the observed failure rate	of URLs	by
82	   articulating	an indirect, non-hostname-based	naming scheme and the
83	   need	for responsible	name management.  Meanwhile, promoters of the
84	   Digital Object Identifier [DOI] succeeded in	building a community of
85	   providers around a mature software system that supports name
86	   management.  The Persistent Uniform Resource Locator [PURL] is a
87	   third scheme that has the unique advantage of working with unmodified
88	   web browsers.  The ARK scheme is a new approach.

90	   A founding principle	of the ARK is that persistence is purely a
91	   matter of service.  Persistence is neither inherent in an object nor
92	   conferred on	it by a	particular naming syntax.  Rather, persistence
93	   is achieved through a provider's successful stewardship of objects
94	   and their identifiers.  The highest level of	persistence will be
95	   reinforced by a provider's robust contingency, redundancy, and
96	   succession strategies.  It is further safeguarded to	the extent that
97	   a provider's	mission	is shielded from marketplace and political
98	   instabilities.

100	1.1.  Three Reasons to Use ARKs

102	   The first requirement of an ARK is to give users a link from	an
103	   object to a promise of stewardship for it.  That promise is a multi-
104	   faceted covenant that binds the word	of an identified service
105	   provider to a specific set of responsibilities.  No one can tell if
106	   successful stewardship will take place because no one can predict the
107	   future.  Reasonable conjecture, however, may	be based on past
108	   performance.	 There must be a way to	tie a promise of persistence to
109	   a provider's	demonstrated or	perceived ability -- its reputation --
110	   in that arena.  Provider reputations	would then rise	and fall as
111	   promises are	observed variously to be kept and broken.  This	is
112	   perhaps the best way	we have	for gauging the	strength of any
113	   persistence promise.

115	   The second requirement of an	ARK is to give users a link from an
116	   object to a description of it.  The problem with a naked identifier
117	   is that without a description real identification is	incomplete.
118	   Identifiers common today are	relatively opaque, though some contain
119	   ad hoc clues	that reflect fleeting life cycle events	such as	the
120	   address of a	short stay in a	filesystem hierarchy.  Possession of
121	   both	an identifier and an object is some improvement, but positive
122	   identification may still be elusive since the object	itself need not
123	   include a matching identifier or be transparent enough to reveal its
124	   identity without significant	research.  In either case, what	is
125	   called for is a record bearing witness to the identifier's
126	   association with the	object,	as supported by	a recorded set of object
127	   characteristics.  This descriptive record is	partly an identification
128	   "receipt" with which	users and archivists can verify	an object's
129	   identity after brief	inspection and a plausible match with recorded
130	   characteristics such	as title and size.

132	   The final requirement of an ARK is to give users a link to the object
133	   itself (or to a copy) if at all possible.  Persistent access	is the
134	   central duty	of an ARK, with	persistent identification playing a
135	   vital but supporting	role.  Object access may not be	feasible for
136	   various reasons, such as catastrophic loss of the object, a licensing
137	   agreement that keeps	an archive "dark" for a	period of years, or when
138	   an object's own lack	of tangible existence precludes	normal concepts
139	   of access (e.g., a vocabulary term might be accessed	through	its
140	   definition).	 In such cases the ARK's identification	role assumes a
141	   much	higher profile.	 But attempts to simplify the persistence
142	   problem by decoupling access	from identification and	concentrating
143	   exclusively on the latter are of questionable utility.  A perfect
144	   system for assigning	forever	unique identifiers might be created, but
145	   if it did so	without	reducing access	failure	rates, no one would be
146	   interested.	The central issue -- which may be summed up as the "HTTP
147	   404 Not Found" problem -- would not have been addressed.

149	1.2.  Organizing Support for ARKs

151	   Co-location of persistent access and	identification services	is
152	   natural.  Any organization that undertakes ongoing support of true
153	   persistent identification (which includes description) is well-served
154	   if it controls, owns, or otherwise has clear	internal access	to the
155	   identified objects, and this	gives it an advantage if it wishes also
156	   to support persistent external access.  Conversely, the latter
157	   implies a commitment	to collection management activities such as
158	   monitoring, acquisition, verification, and change control over
159	   objects that	are persistently identified at least for the sake of
160	   internal record keeping and accountability; this covers the major
161	   prerequisite	for external support of	persistent identification.
162	   Organizing ARK services under one roof thus tends to	make sense.

164	   ARK support is not for everybody.  By requiring specific, revealed
165	   commitments to preservation,	object access, and description,	the bar
166	   for providing ARK services is high.	On the other hand, it would be
167	   hard	to grant credence to a persistence promise from	an organization
168	   that	could not muster the minimum ARK services.  Not	that there isn't
169	   a business model for	an ARK-like, description-only service built on
170	   top of another organization's full complement of ARK	services.  For
171	   example, there might	be competition at the description level	for
172	   abstracting and indexing a body of scientific literature archived in
173	   a combination of open and fee-based repositories.  Such a business
174	   would benefit more from persistence than it would directly support
175	   it.

177	1.3.  A	Definition of Identifier

179	   Heretofore, persistence discussion has been hampered	by a borrowed
180	   meaning for "identifier" that emerged as a side effect of defining
181	   the Uniform Resource	Identifier in [URI]:

183		(formerly)  An identifier is a sequence	of characters with a
184		restricted syntax ... that can act as a	reference to something
185		that has identity.

187	   The term works in context, but falters when employed	for persistence.
188	   Troubling phrases arise, such as,

190		"The goal is to	create an identifier that does not break."

192	   As defined this kind	of identifier "breaks" when it sustains	damage
193	   to its character sequence, but really what breaks has to do with the
194	   identifier's	reference role.	 The following definition is proposed.

196		(new definition)  An identifier	is an association between a
197		string (a sequence of characters) and an information resource.
198		That association is made manifest by a record (e.g., a
199		cataloging or other metadata record) that binds	the identifier
200		string to a set	of identifying resource	characteristics.

202	   The identifier (the association) must be vouched for	by some	sort of
203	   record.  In the complete absence of any testimony (e.g., metadata)
204	   regarding an	association, a would-be	identifier string is a
205	   meaningless sequence	of characters.	To keep	an externally visible
206	   but otherwise internal identifier string opaque to outsiders, for
207	   example, it suffices	for an organization not	to disclose the	nature
208	   of its association.	For our	immediate purpose, actual existence of
209	   an association record is more important than	its authenticity.  If
210	   one is lucky	an object carries its own identifier as	part of	itself
211	   (e.g., imprinted on the first page),	but in processes such as
212	   resource discovery and retrieval the	typical	object is often	unwieldy
213	   or unavailable (such	as when	licensing restrictions are in effect).
214	   A metadata record that includes the identifier string is the	next
215	   best	thing -- a conveniently	manipulable surrogate that can act as
216	   both	an association "receipt" and "declaration".

218	   It now makes	sense to speak of preventing an	identifier, as an
219	   association,	from breaking.	Having said that, this document	still
220	   (ab)uses the	terms "ARK" and	"identifier" as	shorthands to refer to
221	   identifier strings, in other	words, to sequences of characters.  Thus
222	   a discussion	of ARK syntax refers to	a string format, not an
223	   association format.	The context should make	the meaning clear.

225	2.  ARK	Anatomy

227	   An ARK is represented by a sequence of characters (a	string)	that
228	   contains the	label, "ark:", optionally preceded by the beginning part
229	   of a	URL.  Here is a	diagrammed example.

231		     http://foobar.zaf.org/ark:/12025/654xz321
232		     \___________________/ \__/	\___/ \______/
233			(optional)	    |	  |	 |
234			    |	     ARK Label	  |    Name (assigned by the NAA)
235			    |			  |
236	      Name Mapping Authority		 Name Assigning	Authority
237		     Hostport (NMAH)		  Number (NAAN)

239	   The ARK syntax can be summarized as follows:

241			  [http://NMAH/]ark:/NAAN/Name

243	   where the NMAH part is in brackets to indicate that it is optional.

245	2.1.  The Name Mapping Authority Hostport (NMAH)

247	   Before the "ark:" label may appear an optional Name Mapping Authority
248	   Hostport (NMAH) that	is a temporary address where ARK service
249	   requests may	be sent.  It consists of "http://" (or any service
250	   specification valid for a URL) followed by an Internet hostname or
251	   hostport combination	having the same	format and semantics as	the
252	   hostport part of a URL.  The	most important thing about the NMAH is
253	   that	it is "identity	inert" from the	point of view of object
254	   identification.  In other words, ARKs that differ only in the
255	   optional NMAH part identify the same	object.	 Thus, for example, the
256	   following three ARKs	are synonyms for but one information resource:

258		     http://foobar.zaf.org/ark:/12025/654xz321
259		   http://sneezy.dopey.com/ark:/12025/654xz321
260					   ark:/12025/654xz321

262	   The NMAH part makes an ARK into an actionable URL.  Conversely, any
263	   URL whose path component begins with	"ark:/"	stands a reasonable
264	   chance of being an ARK (only	because	such URLs are not common), but
265	   further verification	is still required (such	as probing the URL for
266	   the three ARK services).

268	   The NMAH part is temporary, disposable, and replaceable.  Over time
269	   the NMAH will likely	stop working and have to be replaced with a
270	   currently active service provider.  This relies on a	mapping
271	   authority discovery process,	of which two alternate methods are
272	   outlined in a later section.	 Meanwhile, a carefully	chosen NMAH can
273	   be as durable as any	Internet domain	name, and so may last for a
274	   decade or longer.  Users should be prepared,	however, to refresh the
275	   NMAH	because	the one	found in the URL form of the ARK may have
276	   stopped working.

278	   The above method for	creating an actionable identifier from a basic
279	   ARK (prepending "http://" and an NMAH) is itself temporary.	Assuming
280	   that	the reign of [HTTP] in information retrieval will end one day,
281	   ARKs	will have to be	converted into new kinds of actionable
282	   identifiers.	 In any	event, if ARKs see widespread use, web browsers
283	   would presumably evolve to perform this (currently simple)
284	   transformation automatically.

286	2.2.  The Name Assigning Authority Number (NAAN)

288	   The part of the ARK directly	following the "ark:" is	the Name
289	   Assigning Authority Number (NAAN) enclosed in `/' (slash) characters.
290	   This	part is	always required, as it identifies the organization that
291	   originally assigned the Name	of the object.	It is used to discover a
292	   currently valid NMAH	and to provide top-level partitioning of the
293	   space of all	ARKs.  NAANs are registered in a manner	similar	to URN
294	   Namespaces, but they	are pure numbers consisting of 5 digits	or 9
295	   digits.  Thus, the first 100,000 registered NAAs fit	compactly into
296	   the 5 digits, and if	growth warrants, the next billion fit into the 9
297	   digit form.	In either case the fixed odd number of digits helps
298	   reduce the chances of finding a NAAN	out of context and confusing it
299	   with	nearby quantities such as 4-digit dates.
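
   A minimal sketch of the digit rule, assuming nothing beyond what this
   section states (exactly 5 or exactly 9 decimal digits).  The helper
   name is_valid_naan is hypothetical, and the check says nothing about
   whether a NAAN has actually been registered.

```python
import re


def is_valid_naan(naan):
    """True if naan is a string of exactly 5 or exactly 9 ASCII digits."""
    return re.fullmatch(r"[0-9]{5}|[0-9]{9}", naan) is not None


# Capacity of each form: 5 digits give 100,000 values, 9 digits a billion.
FIVE_DIGIT_CAPACITY = 10 ** 5
NINE_DIGIT_CAPACITY = 10 ** 9
```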

301	2.3.  The Name Part

303	   The final part of the ARK is	the Name assigned by the NAA, and it is
304	   also	required.  The Name is a string	of visible ASCII characters and
305	   should be less than 128 bytes in length.  The length	restriction
306	   keeps the ARK short enough to append	ordinary ARK request strings
307	   without running into	transport restrictions within HTTP GET requests.
308	   Characters may be letters, digits, or any of these seven characters:

310	       =   @   $   _   *   +   #

312	   The following characters may	also be	used, but in limited ways:

314	       /   .   -   %

316	   The characters `/' and `.' are ignored if either appears as the last
317	   character of	an ARK.	 If used internally, they allow	a name assigning
318	   authority to	reveal object hierarchy	and object variants as described
319	   in the next two sections.

321	   A `-' (hyphen) may appear in	an ARK,	but must be ignored in lexical
322	   comparisons.	 The `%' character is reserved for %-encoding all other
323	   octets that would appear in the ARK string, in the same manner as for
324	   URIs	[URI].	A %-encoded octet consists of a	`%' followed by	two hex
325	   digits; for example,	"%7d" stands in	for `}'.  Lower	case hex digits
326	   are preferred to reduce the chances of false	acronym	recognition;
327	   thus	it is better to	use "%acT" instead of "%ACT".  The character `%'
328	   itself must be represented using "%25".  As with URNs, %-encoding
329	   permits ARKs	to support legacy namespaces (e.g., ISBN, ISSN,	SICI)
330	   that	have less restricted character repertoires [URNBIB].
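
   The %-encoding rule can be sketched as follows.  This is an
   illustration under stated assumptions, not a definitive encoder: the
   name encode_name is hypothetical, the input is assumed to be a legacy
   name that has not been encoded yet, and it is converted to octets as
   UTF-8, a choice this document does not make.  Slash, period, and
   hyphen are passed through untouched even though they carry special
   meaning in an ARK.

```python
import string

# Characters this section allows to appear literally in a Name.
PLAIN = set(string.ascii_letters + string.digits + "=@$_*+#")
LIMITED = set("/.-")  # allowed, but with special meaning in an ARK


def encode_name(raw):
    """Percent-encode every octet that may not appear literally."""
    out = []
    for octet in raw.encode("utf-8"):  # assumption: UTF-8 octets
        ch = chr(octet)
        if ch in PLAIN or ch in LIMITED:
            out.append(ch)
        else:
            out.append("%%%02x" % octet)  # lower case hex digits
    return "".join(out)
```

   With this sketch `}' becomes "%7d" and a literal `%' becomes "%25",
   matching the examples in the text.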

332	   The creation	of names that include linguistically based constructs
333	   (having recognizable	meaning	from natural language) is strongly
334	   discouraged if long-term persistence	is a naming priority.  Such
335	   names do not	age or travel well.  Names that	look more or less like
336	   numbers avoid common	problems that defeat persistence and
337	   international acceptance.  The use of digits	is highly recommended.
338	   Mixing in non-vowel alphabetic characters is	a relatively safe and
339	   easy	way to achieve more compact names, although any	character
340	   repertoire can work if potentially troublesome names	will be
341	   discarded during a screening	process.  More on naming considerations
342	   is given in a later section.

344	2.3.1.	Names that Reveal Object Hierarchy

346	   A name assigning authority may choose to reveal the presence	of a
347	   hierarchical	relationship between objects using the `/' (slash)
348	   character in	the Name part of an ARK.  If the Name contains an
349	   internal slash, the piece to	its left indicates a containing	object.
350	   For example,	publishing an ARK of the form,

352			       ark:/12025/654/xz/321

354	   is equivalent to publishing three ARKs,

356			       ark:/12025/654/xz/321
357			       ark:/12025/654/xz
358			       ark:/12025/654

360	   together with a declaration that the	first object is	contained in the
361	   second object, and that the second object is	contained in the third.

363	   Revealing the presence of hierarchy is completely up	to the assigning
364	   authority.  It is hard enough to commit to one object's name, let
365	   alone to three objects' names and to	a specific, ongoing relatedness
366	   among them.	Thus, regardless of whether hierarchy was present
367	   initially, the assigning authority, by not using slashes, reveals no
368	   shared inferences about hierarchical	or other inter-relatedness in
369	   the following ARKs:

371			       ark:/12025/654_xz_321
372			       ark:/12025/654_xz
373			       ark:/12025/654xz321
374			       ark:/12025/654xz
375			       ark:/12025/654

377	   Note	that slashes around the	ARK's NAAN (/12025/ in these examples)
378	   are not part	of the ARK's Name and therefore	do not indicate	the
379	   existence of	some sort of NAAN super	object containing all objects in
380	   its namespace.  A slash must	have at	least one non-structural
381	   character (one that is neither a slash nor a	period)	on both	sides in
382	   order for it	to separate recognizable structural components.	 So
383	   initial or final slashes may	be removed, and	double slashes may be
384	   converted into single slashes.
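
   The equivalence described in this section can be sketched in a few
   lines.  The function name containing_arks is hypothetical, and the
   sketch assumes a basic ARK (no NMAH) whose Name has no suffixes.
   Empty pieces left by initial, final, or doubled slashes are dropped,
   in line with the last paragraph above.

```python
def containing_arks(ark):
    """Return the ARK followed by the ARKs of its containing objects."""
    prefix = "ark:/"
    if not ark.startswith(prefix):
        raise ValueError("expected an ARK without an NMAH: %r" % ark)
    naan, _, name = ark[len(prefix):].partition("/")
    # Keep only pieces that contain at least one character.
    parts = [piece for piece in name.split("/") if piece]
    return [prefix + naan + "/" + "/".join(parts[:count])
            for count in range(len(parts), 0, -1)]
```

   A Name without internal slashes yields a single-element list, since
   it reveals no hierarchy.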

386	2.3.2.	Names that Reveal Object Variants

388	   A name assigning authority may choose to reveal the possible	presence
389	   of variant objects using the	`.' (period) character in the Name part
390	   of an ARK.  If the Name contains an internal	period,	the piece to its
391	   left	is a base name and the piece to	its right up to	the end	of the
392	   ARK or to the next period is	a suffix.  A Name may have more	than one
393	   suffix, for example,

395			       ark:/12025/654.24
396			       ark:/12025/xz4/654.24
397			       ark:/12025/654.f55.g78.v20

399	   There are two main rules.  First, if	two ARKs share the same	base
400	   name	but have different suffixes, the corresponding objects were
401	   considered variants of each other (different	formats, languages,
402	   versions, etc.) by the assigning authority.	Thus, the following ARKs
403	   are variants	of each	other:

405			       ark:/12025/654.f55.g78.v20
406			       ark:/12025/654.321xz
407			       ark:/12025/654.44

409	   Second, publishing an ARK with a suffix implies the existence of at
410	   least one variant identified	by the ARK without its suffix.	The ARK
411	   otherwise permits no	further	assumptions about what variants	might
412	   exist.  So publishing the ARK,

414			       ark:/12025/654.f55.g78.v20

416	   is equivalent to publishing the four	ARKs,

418			       ark:/12025/654.f55.g78.v20
419			       ark:/12025/654.f55.g78
420			       ark:/12025/654.f55
421			       ark:/12025/654

423	   Revealing the possibility of	variants is completely up to the
424	   assigning authority.	 It is hard enough to commit to	one object's
425	   name, let alone to multiple variants' names and to a	specific,
426	   ongoing relatedness among them.  The	assigning authority is the sole
427	   arbiter of what constitutes a variant within	its namespace, and
428	   whether to reveal that kind of relatedness by using periods within
429	   its names.

431	   A period must have at least one non-structural character (one that is
432	   neither a slash nor a period) on both sides in order	for it to
433	   separate recognizable structural components.	 So initial or final
434	   periods may be removed, and double periods may be converted into
435	   single periods.  Multiple suffixes should be	arranged in sorted order
436	   (pure ASCII collating sequence) at the end of an ARK.
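
   The two main rules can be illustrated with a short sketch.  The names
   base_name and implied_variants are hypothetical, and the code assumes
   that any suffixes appear only at the end of the ARK, after the last
   slash.

```python
def base_name(ark):
    """Return the ARK with all suffixes removed."""
    head, slash, last = ark.rpartition("/")
    return head + slash + last.split(".")[0]


def implied_variants(ark):
    """Return the ARK plus the ARKs implied by dropping trailing suffixes."""
    head, slash, last = ark.rpartition("/")
    base, *suffixes = last.split(".")
    return [head + slash + ".".join([base] + suffixes[:count])
            for count in range(len(suffixes), -1, -1)]
```

   Two ARKs are variants of each other under the first rule when
   base_name returns the same string for both and their suffixes differ.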

438	2.3.3.	Hyphens	are Ignored

440	   Hyphens are always ignored in ARKs.	Hyphens	may be added to	an ARK's
441	   Name	part for readability, or during	the formatting and wrapping of
442	   text	lines, but (as in phone	numbers) they are treated as if	they
443	   were	not present.  Thus, like the NMAH, hyphens are "identity inert"
444	   in comparing	ARKs for equivalence.  For example, the	following ARKs
445	   are equivalent for purposes of comparison and ARK service access:

447	                           ark:/12025/65-4-xz-321
448	   http://sneezy.dopey.com/ark:/12025/654--xz32-1
449	                           ark:/12025/654xz321

451	2.4.  Normalization and	Lexical	Equivalence

453	   To determine	if two or more ARKs identify the same object, the ARKs
454	   are compared	for lexical equivalence	after first being normalized.
455	   Since ARK strings may appear	in various forms (e.g.,	having different
456	   NMAHs), normalizing them minimizes the chances that comparing two ARK
457	   strings for equality	will fail unless they actually identify
458	   different objects.  In a specified-host ARK (one having an NMAH), the
459	   NMAH	never participates in such comparisons.

461	   Normalization of an ARK for the purpose of octet-by-octet equality
462	   comparison with another ARK consists	of four	steps.	First, any upper
463	   case	letters	in the "ark:" label and	the two	characters following a
464	   `%' are converted to	lower case.  The case of all other letters in
465	   the ARK string must be preserved.  Second, any NMAH part is removed
466	   (everything from an initial "http://" up to the next	slash) and all
467	   hyphens are removed.

469	   Third, structural characters (slash and period) are normalized.
470	   Initial and final occurrences are removed, and two structural
471	   characters in a row (e.g., // or ./) are replaced by the first
472	   character, iterating until each occurrence has a non-structural
473	   character on both sides.  In the same step, if any component has a
474	   period on its left and a slash on its right, either the component
475	   and its preceding period must be moved to the end of the Name part
476	   or the ARK must be thrown out as malformed.

478	   The fourth and final	step is	to arrange the suffixes	in ASCII
479	   collating sequence (that is,	to sort	them) and to remove duplicate
480	   suffixes, if	any.  It is also permissible to	throw out ARKs for which
481	   the suffixes	are not	sorted.

483	   The resulting ARK string is now normalized.	Comparisons between
484	   normalized ARKs are case-sensitive, meaning that upper case letters
485	   are considered different from their lower case counterparts.
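
	   As an informal illustration only, the first two normalization steps
	   might be sketched in Python as follows.  The function name is
	   hypothetical, and the sketch assumes that an NMAH may appear either
	   as an "http://host/" prefix or between the "ark:" label and the
	   first slash, as in the equivalence examples given earlier.  Steps
	   three and four (structural characters and suffix ordering) are
	   deliberately left out.

```python
import re


def normalize_ark_basic(ark):
    """Apply normalization steps one and two only (a partial sketch)."""
    # Step two, first half: drop a leading "http://host/" NMAH, if present.
    ark = re.sub(r"^http://[^/]*/", "", ark, flags=re.IGNORECASE)

    # Step one, first half: the "ark:" label is matched case-insensitively
    # and re-emitted in lower case at the end.
    label = re.match(r"ark:", ark, flags=re.IGNORECASE)
    if label is None:
        raise ValueError("no ark: label found in %r" % ark)
    rest = ark[label.end():]

    # Assumption: an NMAH written between "ark:" and the first slash is
    # also dropped, since the NMAH never participates in comparison.
    slash = rest.find("/")
    if slash < 0:
        raise ValueError("no NAAN found in %r" % ark)
    rest = rest[slash:]

    # Step one, second half: lower-case the two characters after each '%'.
    rest = re.sub(r"%(..)", lambda m: "%" + m.group(1).lower(), rest)

    # Step two, second half: remove all hyphens.
    rest = rest.replace("-", "")

    # The case of all other letters is preserved.
    return "ark:" + rest
```

	   Under these assumptions the three equivalent ARKs shown at the end
	   of the previous section all reduce to the same string.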

487	   To keep ARK string variation	to a minimum, no reserved ARK characters
488	   should be %-encoded unless it is deliberately to conceal their
489	   reserved meanings.  No non-reserved ARK characters should ever be %-
490	   encoded.  Finally, no %-encoded character should ever appear	in an
491	   ARK in its decoded form.

493	2.5.  Naming Considerations

495	   The ARK has different goals from the	URI, so	it has different
496	   character set requirements.	Because	linguistic constructs imperil
497	   persistence,	for ARKs non-ASCII character support is	unimportant.
498	   ARKs	and URIs share goals of	transcribability and transportability
499	   within web documents, so characters are required to be visible, non-
500	   conflicting with HTML/XML syntax, and not subject to	tampering during
501	   transmission	across common transport	gateways.  Add the goal	of
502	   making an undelimited ARK recognizable in running prose, as in
503	   ark:/12025/=@_22*$, and certain punctuation characters (e.g., comma,
504	   period) end up being	excluded from the ARK lest the end of a	phrase
505	   or sentence be mistaken for part of the ARK.

507	   A valuable technique	for provision of persistent objects is to try to
508	   arrange for the complete identifier to appear on, with, or near its
509	   retrieved object.  An object	encountered at a moment	in time	when its
510	   discovery context has long since disappeared	could then easily be
511	   traced back to its metadata,	to alternate versions, to updates, etc.
512	   This	has seen reasonable success, for example, in book publishing and
513	   software distribution.

515	   If persistence is the goal, a deliberate local strategy for
516	   systematic name assignment is crucial.  Names must be chosen	with
517	   great care.	Poorly chosen and managed names	will devastate any
518	   persistence strategy, and they do not discriminate based on naming
519	   scheme.  Whether a mistakenly re-assigned identifier	is a URN, DOI,
520	   PURL, URL, or ARK, the damage -- failed access and confusion	-- is
521	   not mitigated more in one scheme than in another.  Conversely, in-
522	   house efforts to manage names responsibly will go much further
523	   towards safeguarding	persistence than any choice of naming scheme or
524	   name	resolution technology.

526	   Hostnames appearing in any identifier meant to be persistent	must be
527	   chosen with extra care.  The	tendency in hostname selection has
528	   traditionally been to choose	a token	with recognizable attributes,
529	   such	as a corporate brand, but that tendency	wreaks havoc with
530	   persistence that is to outlive brands, corporations,	subject
531	   classifications, and	natural	language semantics (e.g., what did the
532	   letters "gay" mean in 1958, 1978, and 1998?).  Today's recognized
533	   and correct attributes are tomorrow's stale or incorrect attributes.
534	   In making hostnames (any names, actually) long-term persistent, it
535	   helps to eliminate recognizable attributes to the extent possible.
536	   This	affects	selection of any name based on URLs, including PURLs and
537	   the explicitly disposable NMAHs.  There is no excuse	for a provider
538	   that	manages	its internal names impeccably not to exercise the same
539	   care	in choosing what could be an exceptionally durable hostname,
540	   especially if it would form the prefix for all the provider's URL-
541	   based external names.  Registering an opaque	hostname in the	".org"
542	   or ".net" domain would not be a bad start.

544	   Dubious persistence speculation does	not make selecting naming
545	   strategies any easier.  For example,	despite	rumors to the contrary,
546	   there are really no obvious reasons why the organizations registering
547	   DNS names, URN Namespaces, and DOI publisher	IDs should have	among
548	   them	one that is intrinsically more fallible	than the next.
549	   Moreover, it	is a misconception that	the demise of DNS and of HTTP
550	   need	adversely affect the persistence of URLs.  At such a time,
551	   certainly URLs from the present day might not then be actionable by
552	   our present-day mechanisms, but resolution systems for future non-
553	   actionable URLs are no harder to imagine than resolution systems for
554	   present-day non-actionable URNs and DOIs.  There is no more stable a
555	   namespace than one that is dead and frozen, and that	would then
556	   characterize	the space of names bearing the "http://" prefix.  It is
557	   useful to remember that just because hostnames have been carelessly
558	   chosen in their brief history does not mean that they are unsuitable
559	   in NMAHs (and URLs) intended	for use	in situations demanding	the
560	   highest level of persistence	available in the Internet environment.
561	   A well-planned name assignment strategy is everything.

563	3.  Assigners of ARKs

565	   A Name Assigning Authority (NAA) is an organization that creates (or
566	   delegates creation of) long-term associations between identifiers and
567	   information objects.	 Examples of NAAs include national libraries,
568	   national archives, and publishers.  An NAA may arrange with an
569	   external organization for identifier	assignment.  The US Library of
570	   Congress, for example, allows OCLC (the Online Computer Library
571	   Center, a major world cataloger of books) to	create associations
572	   between Library of Congress call numbers (LCCNs) and	the books that
573	   OCLC	processes.  A cataloging record	is generated that testifies to
574	   each	association, and the identifier	is included by the publisher,
575	   for example,	in the front matter of a book.

577	   An NAA does not so much create an identifier	as create an
578	   association.	 The NAA first draws an	unused identifier string from
579	   its namespace, which	is the set of all identifiers under its	control.
580	   It then records the assignment of the identifier to an information
581	   object having sundry	witnessed characteristics, such	as a particular
582	   author and modification date.  A namespace is usually reserved for an
583	   NAA by agreement with recognized community organizations (such as
584	   IANA	and ISO) that all names	containing a particular	string be under
585	   its control.	 In the	ARK an NAA is represented by the Name Assigning
586	   Authority Number (NAAN).

588	   The ARK namespace reserved for an NAA is the	set of names bearing its
589	   particular NAAN.  For example, all strings beginning	with
590	   "ark:/12025/" are under control of the NAA registered under 12025,
591	   which might be the National Library of Finland.  Because each NAA has
592	   a different NAAN, names from	one namespace cannot conflict with those
593	   from	another.  Each NAA is free to assign names from	its namespace
594	   (or delegate	assignment) according to its own policies.  These
595	   policies must be documented in a manner similar to the declarations
596	   required for	URN Namespace registration [URNNID].

598	   For now, registration of ARK	NAAs is	in a bootstrapping phase.  To
599	   register, please read about the mapping authority discovery file in
600	   the next section and	send email to jak@ckm.ucsf.edu.

602	4.  Finding a Name Mapping Authority

604	   In order to derive an actionable identifier (these days, a URL) from
605	   an ARK, a hostport (hostname	or hostname plus port combination) for a
606	   working Name	Mapping	Authority (NMA)	must be	found.	An NMA is a
607	   service that	is able	to respond to the three	basic ARK service
608	   requests.  Relying on registration and client-side discovery, NMAs
609	   make	known which NAAs' identifiers they are willing to service.

611	   Upon	encountering an	ARK, a user (or	client software) looks inside it
612	   for the optional NMAH part (the hostport of the NMA's ARK service).
613	   If it contains an NMAH that is working, this	NMAH discovery step may
614	   be skipped; the NMAH	effectively uses the beginning of an ARK to
615	   cache the results of	a prior	mapping	authority discovery process.  If
616	   a new NMAH needs to be found, the client looks inside the ARK again
617	   for the NAAN (Name Assigning Authority Number).  Querying a global
618	   database, it	then uses the NAAN to look up all current NMAHs	that
619	   service ARKs	issued by the identified NAA.  The global database is
620	   key,	and two	specific methods for querying it are given in this
621	   section.
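
	   The client-side decision just described might look roughly like the
	   following Python sketch.  All names here are hypothetical: "discover"
	   stands in for whichever global database query is used, "is_working"
	   stands in for whatever liveness check a client applies, and the
	   parsing assumes the "ark:[NMAH]/NAAN/Name" shape seen in this
	   document's examples.

```python
def split_ark(ark):
    """Return (nmah, naan, name); nmah is None if the ARK names no host."""
    if not ark.lower().startswith("ark:"):
        raise ValueError("not an ARK: %r" % ark)
    # Whatever sits between "ark:" and the first slash is the optional NMAH.
    nmah, _, rest = ark[4:].partition("/")
    naan, _, name = rest.partition("/")
    if not naan:
        raise ValueError("no NAAN found in %r" % ark)
    return (nmah or None, naan, name)


def find_nmah(ark, discover, is_working=lambda hostport: True):
    """Prefer the NMAH cached in the ARK; otherwise look one up by NAAN."""
    nmah, naan, _ = split_ark(ark)
    if nmah and is_working(nmah):
        return nmah                      # discovery step skipped
    for candidate in discover(naan):     # query the global database
        if is_working(candidate):
            return candidate
    return None                          # no mapping authority found
```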

623	   In the interests of long-term persistence, however, ARK mechanisms
624	   are first defined in	high-level, protocol-independent terms so that
625	   mechanisms may evolve and be	replaced over time without compromising
626	   fundamental service objectives.  Either or both specific methods
627	   given here may eventually be	supplanted by better methods since, by
628	   design, the ARK scheme does not depend on a particular method, but
629	   only	on having some method to locate	an active NMAH.

631	   At the time of issuance, at least one NMAH for an ARK should	be
632	   prepared to service it.  That NMA may or may	not be administered by
633	   the Name Assigning Authority	(NAA) that created it.	Consider the
634	   following hypothetical example of providing long-term access	to a
635	   cancer research journal.  The publisher wishes to turn a profit and
636	   the National	Library	of Medicine wishes to preserve the scholarly
637	   record.  An agreement might be struck whereby the publisher would act
638	   as the NAA and the national library would archive the journal issue
639	   when	it appears, but	without	providing direct access	for the	first
640	   six months.	During the first six months of peak commercial
641	   viability, the publisher would retain exclusive delivery rights and
642	   would charge	access fees.  Again, by	agreement, both	the library and
643	   the publisher would act as NMAs, but	during that initial period the
644	   library would redirect requests for issues less than	six months old
645	   to the publisher.  At the end of the	waiting	period,	the library
646	   would then begin servicing requests for issues older	than six months
647	   by tapping directly into its	own archives.  Meanwhile, the publisher
648	   might routinely redirect incoming requests for older	issues to the
649	   library.  Long-term access is thereby preserved, and	so is the
650	   commercial incentive	to publish content.

652	   There is never a requirement	that an	NAA also run an	NMA service,
653	   although it seems not an unlikely scenario.	Over time NAAs and NMAs
654	   would come and go.  One NMA would succeed another, and there	might be
655	   many	NMAs serving the same ARKs simultaneously (e.g., as mirrors or
656	   as competitors).  There might also be asymmetric but	coordinated NMAs
657	   as in the library-publisher example above.

659	4.1.  Looking Up NMAHs in a Globally Accessible	File

661	   This	subsection describes a way to look up NMAHs using a simple text
662	   file.  For efficient	access the file	may be stored in a local
663	   filesystem, but it needs to be reloaded periodically	to incorporate
664	   updates.  It	is not expected	that the size of the file or frequency
665	   of update should impose an undue maintenance	or searching burden any
666	   time	soon, for even primitive linear	search of a file with ten-
667	   thousand NAAs is a subsecond	operation on modern server machines.
668	   The proposed	file strategy is similar to the	/etc/hosts file	strategy
669	   that	supported Internet host	address	lookup for a period of years
670	   before the advent of	the Domain Name	System [DNS].

672	   A copy of the current file (at the time of writing) appears in an
673	   appendix and	is available on	the web.  A minimal version of the file
674	   appears below.  Comment lines (lines	that begin with	`#') explain the
675	   format and give the file's modification time, reloading address, and
676	   NAA registration instructions.  There is even a Perl	script that
677	   processes the file embedded in the file's comments.	Because	this is
678	   still a proposed file, none of the values in	it are real.

680	       #
681	       # Name Assigning	Authority / Name Mapping Authority Lookup Table
682	       #     Last change:   22 February	2001
683	       #     Reload from:   http://ark.nlm.nih.gov/etc/natab
684	       #     Mirrored at:   http://www.ckm.ucsf.edu/people/jak/home/etc/natab
685	       #		    http://....../etc/natab
686	       #     To	register:   mailto:jak@ckm.ucsf.edu?Subject=naareg
687	       #     Process with:  Perl script	at end of this file (optional)
688	       #
689	       # Each NAA appears at the beginning of a	line with the NAA Number
690	       # first,	a colon, and an	ARK or URL to a	statement of naming policy
691	       # (see http://ark.nlm.nih.gov/naapolicyeg.html for an example).
692	       # All the NMA hostports that service an NAA are listed, one per
693	       # line, indented, after the corresponding NAA line.
694	       #
695	       #   US National Library of Medicine
696	       12025:  http://www.nlm.nih.gov/xxx/naapolicy.html
697		       lhc.nlm.nih.gov:8080 USNLM
698		       foobar.zaf.org UCSF
699		       sneezy.dopey.com	BIREME
700	       #
701	       #   US Library of Congress
702	       12026:  http://www.loc.gov/xxx/naapolicy.html
703		       foobar.zaf.org USLC
704		       sneezy.dopey.com	USLC
705	       #
706	       #   US National Agriculture Library
707	       12027:  http://www.nal.gov/xxx/naapolicy.html
708		       foobar.zaf.gov:80 USNAL
709	       #
710	       #--- end	of data	---
711	       # The enclosed Perl script takes	an NAA as argument and outputs
712	       # the NMAs in this file listed under any	matching NAA.
713	       #
714	       # my $naa = shift;
715	       # while (<>) {
716	       #     next if (!	/^$naa:/);
717	       #     while (<>)	{
718	       #	 last if (! /^[#\s]./);
719	       #	 print "$1\n" if (/^\s+(\S+)/);
720	       #     }
721	       # }
722	       # end of	file
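
	   For readers who prefer not to use the embedded Perl, the same lookup
	   can be sketched in Python.  This is an illustration under stated
	   assumptions, not a reference implementation: the function name and
	   sample text are hypothetical, NAA lines are assumed to start in the
	   first column, and NMA lines are assumed to be indented, as the file's
	   own comments describe.

```python
NATAB_SAMPLE = """\
# Name Assigning Authority / Name Mapping Authority Lookup Table
12025:  http://www.nlm.nih.gov/xxx/naapolicy.html
        lhc.nlm.nih.gov:8080 USNLM
        foobar.zaf.org UCSF
        sneezy.dopey.com BIREME
#
12026:  http://www.loc.gov/xxx/naapolicy.html
        foobar.zaf.org USLC
        sneezy.dopey.com USLC
"""


def lookup_nmahs(natab_text, naan):
    """Return the NMA hostports listed under the given NAA Number."""
    nmahs = []
    current = None                      # NAAN of the block being read
    for line in natab_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue                    # blank line or comment
        if line[0].isspace():
            # Indented line: an NMA hostport followed by a label.
            if current == naan:
                nmahs.append(line.split()[0])
        else:
            # Unindented line: "NAAN: policy-statement-location".
            current = line.split(":", 1)[0].strip()
    return nmahs
```

	   A simple linear scan of this kind is all that the file format
	   requires, which is consistent with the performance remarks above.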

724	4.2.  Looking up NMAHs Distributed via DNS

726	   This	subsection introduces a	method for looking up NMAHs that is
727	   based on the	method for discovering URN resolvers described in
728	   [NAPTR].  It	relies on querying the DNS system already installed in
729	   the background infrastructure of most networked computers.  A query
730	   is submitted	to DNS asking for a list of resolvers that match a given
731	   NAAN.  DNS distributes the query to the particular DNS servers that
732	   can best provide the	answer,	unless the answer can be found more
733	   quickly in a	local DNS cache	as a side-effect of a recent query.
734	   Responses come back inside Name Authority Pointer (NAPTR) records.
735	   The normal result is	one or more candidate NMAHs.

737	   In its full generality the [NAPTR] algorithm	ambitiously accommodates
738	   a complex set of preferences, orderings, protocols, mapping services,
739	   regular expression rewriting	rules, and DNS record types.  This
740	   subsection proposes a drastic simplification	of it for the special
741	   case	of ARK mapping authority discovery.  The simplified algorithm is
742	   called Maptr.  It uses only one DNS record type (NAPTR) and restricts
743	   most	of its field values to constants.  The following hypothetical
744	   excerpt from	a DNS data file	for the	NAAN known as 12026 shows three
745	   example NAPTR records ready to use with the Maptr algorithm.

747	     12026.ark.arpa.
748	     ;;	US Library of Congress
749	     ;;	      order pref flags service regexp  replacement
750	      IN NAPTR	0     0	  "h"  "ark"   "USLC"  lhc.nlm.nih.gov:8080
751	      IN NAPTR	0     0	  "h"  "ark"   "USLC"  foobar.zaf.org
752	      IN NAPTR	0     0	  "h"  "ark"   "USLC"  sneezy.dopey.com

754	   All the fields are held constant for	Maptr except for the "flags",
755	   "regexp", and "replacement" fields.	The "service" field contains the
756	   constant value "ark"	so that	NAPTR records participating in the Maptr
757	   algorithm will not be confused with other NAPTR records.  The "order"
758	   and "pref" fields are held to 0 (zero) and otherwise	ignored	for now;
759	   the algorithm may evolve to use these fields	for ranking decisions
760	   when	usage patterns and local administrative	needs are better
761	   understood.

763	   When	a Maptr	query returns a	record with a flags field of "h" (for
764	   hostport, a Maptr extension to the NAPTR flags), the	replacement
765	   field contains the NMAH (hostport) of an ARK	service	provider.  When
766	   a query returns a record with a flags field of "" (the empty	string),
767	   the client needs to submit a	new query containing the domain	name
768	   found in the	replacement field.  This second	sort of	record exploits
769	   the distributed nature of DNS by redirecting	the query to another
770	   domain name.	 It looks like this.

772	     12345.ark.arpa.
773	     ;;	Digital	Library	Consortium
774	     ;;	      order pref flags service regexp replacement
775	      IN NAPTR	0     0	   ""  "ark"	 ""   dlc.spct.org.

777	   Here	is the Maptr algorithm for ARK mapping authority discovery.  In
778	   it replace <NAAN> with the NAAN from	the ARK	for which an NMAH is
779	   sought.

781		(1) Initialize the DNS query:  type=NAPTR,
782		query=<NAAN>.ark.arpa.

784		(2) Submit the query to	DNS and	retrieve (NAPTR) records,
785		discarding any record that does	not have "ark" for the service
786		field.

788		(3) All remaining records with a flags field of "h" contain
789		candidate NMAHs	in their replacement fields.  Set them aside, if
790		any.

792		(4) Any	record with an empty flags field ("") has a replacement
793		field containing a new domain name to which a subsequent query
794		should be redirected.  For each	such record, set
795		query=<replacement> then go to step (2).  When all such	records
796		have been recursively exhausted, go to step (5).

798		(5) All	redirected queries have	been resolved and a set	of
799		candidate NMAHs has been accumulated from step (3).  If there
800		are zero NMAHs, exit -- no mapping authority was found.  If
801		there are one or more NMAHs, choose one using any criteria you
802		wish, then exit.

804	   A Perl script that implements this algorithm	is included here.

806	   #!/depot/bin/perl

808	   use Net::DNS;			   # include simple DNS	package
809	   my $qtype = "NAPTR";			   # initialize	query type
810	   my $naa = shift;			   # get NAAN script argument
811	   my $mad = new Net::DNS::Resolver;	   # mapping authority discovery

813	   &maptr("$naa.ark.arpa");		   # call maptr	- that's it

815	   sub maptr {				   # recursive maptr algorithm
816		   my $dname = shift;		   # domain name as argument
817		   my ($rr,			   # record being examined
818			   $flags, $replacement);  # the fields Maptr acts on
819		   my $query = $mad->query($dname, $qtype);
820		   return			   # non-productive query
821			   if (! $query || ! $query->answer);
822		   foreach $rr ($query->answer) {
823			   next if ($rr->type ne $qtype);	# wrong record type
824			   next if (lc($rr->service) ne "ark"); # step (2) filter
825			   ($flags, $replacement) =	   # accessors return
826				   (lc($rr->flags), $rr->replacement); # unquoted values
827			   if ($flags eq "") {
828				   &maptr($replacement);   # recurse
829			   } elsif ($flags eq "h") {
830				   print "$replacement\n"; # candidate NMAH
831			   }
832		   }
833	   }
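
	   The control flow of the algorithm can also be exercised without a
	   live DNS.  The Python sketch below is illustrative only: "lookup" is
	   a hypothetical stand-in for a NAPTR query, the zone contents are
	   simulated (the records under dlc.spct.org are invented for the
	   example), and the guard against revisiting a domain is an addition
	   that the algorithm above does not specify.

```python
def maptr(naan, lookup):
    """Return candidate NMAHs for a NAAN, following Maptr steps (1)-(5).

    `lookup` maps a domain name to a list of (flags, service, replacement)
    tuples, standing in for the NAPTR records a DNS query would return.
    """
    candidates = []
    visited = set()            # loop guard; not part of the stated algorithm

    def resolve(domain):
        domain = domain.rstrip(".").lower()
        if domain in visited:
            return
        visited.add(domain)
        for flags, service, replacement in lookup(domain):   # step (2)
            if service.lower() != "ark":
                continue                   # discard non-ARK records
            if flags.lower() == "h":       # step (3): candidate NMAH
                candidates.append(replacement)
            elif flags == "":              # step (4): redirected query
                resolve(replacement)

    resolve("%s.ark.arpa" % naan)          # step (1)
    return candidates                      # step (5): caller chooses one


# Simulated zone data; only the first two entries come from this document.
ZONE = {
    "12026.ark.arpa": [("h", "ark", "lhc.nlm.nih.gov:8080"),
                       ("h", "ark", "foobar.zaf.org"),
                       ("h", "ark", "sneezy.dopey.com")],
    "12345.ark.arpa": [("", "ark", "dlc.spct.org.")],
    "dlc.spct.org": [("h", "ark", "foobar.zaf.org"),
                     ("h", "other", "ignored.example")],
}


def fake_lookup(domain):
    return ZONE.get(domain, [])
```

	   With this simulated data, a query for NAAN 12345 follows the
	   redirection to dlc.spct.org and yields the single "ark" hostport
	   listed there, while an unregistered NAAN yields an empty list.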

835	   The global database thus distributed	via DNS	and the	Maptr algorithm
836	   can easily be seen to mirror	the contents of	the Name Authority Table
837	   file described in the previous subsection.

839	5.  Generic ARK	Service	Definition

841	   An ARK request's output is delivered	information; examples include
842	   the object itself, a	policy declaration (e.g., a promise of support),
843	   a descriptive metadata record, or an	error message.	ARK services
844	   must	be couched in high-level, protocol-independent terms if
845	   persistence is to outlive today's networking	infrastructural
846	   assumptions.	 The high-level	ARK service definitions	listed below are
847	   followed in the next	section	by a concrete method (one of many
848	   possible methods) for delivering these services with	today's
849	   technology.

851	5.1.  Generic ARK Access Service (access, location)

853	   Returns (a copy of) the object or a redirect	to the same, although a
854	   sensible object proxy may be	substituted.  Examples of sensible
855	   substitutes include,

857	     - a table of contents instead of a	large complex document,
858	     - a home page instead of an entire	web site hierarchy,
859	     - a rights	clearance challenge before accessing protected data,
860	     - directions for access to	an offline object (e.g., a book),
861	     - a description of	an intangible object (a	disease, an event), or
862	     - an applet acting	as "player" for	a large	multimedia object.

864	   May also return a discriminated list	of alternate object locators.
865	   If access is	denied,	returns	an explanation of the object's current
866	   (perhaps permanent) inaccessibility.

868	5.2.  Generic Policy Service (permanence, naming, etc.)

870	   Returns declarations	of policy and support commitments for given
871	   ARKs.  Declarations are returned in either a	structured metadata
872	   format or a human readable text format; sometimes one format	may
873	   serve both purposes.	 Policy	subareas may be	addressed in separate
874	   requests, but the following areas should be covered:  object
875	   permanence, object naming, object fragment addressing, and
876	   operational service support.

878	   The permanence declaration for an object is a rating	defined	with
879	   respect to an identified permanence provider	(guarantor), and may
880	   include the following aspects.  One permanence rating framework is
881	   given in [NLMPerm].

883		(a) "object availability" -- whether and how access to the
884		object is supported (e.g., online 24x7,	or offline only),

886		(b) "identifier	validity" -- under what	conditions the
887		identifier will	be or has been re-assigned,

889		(c) "content invariance" -- under what conditions the content of
890		the object is subject to change, and

892		(d) "change history" --	documentation, whether abbreviated or
893		detailed, of any or all	corrections, migrations, revisions, etc.

895	   Naming policy for an	object includes	an historical description of the
896	   NAA's (and its successor NAA's) policies regarding differentiation of
897	   objects.  It	may include the	following aspects.

899		(e) "similarity" -- (or	"unity") the limit, defined by the NAA,
900		to the level of	dissimilarity beyond which two similar objects
901		warrant	separate identifiers but before	which they share one
902		single identifier, and

904		(f) "granularity" -- the limit,	defined	by the NAA, to the level
905		of object subdivision beyond which sub-objects do not warrant
906		separately assigned identifiers	but before which sub-objects are
907		assigned separate identifiers.

909	   Addressing policy for an object includes a description of how, during
910	   access, object components (e.g., paragraphs,	sections) or views
911	   (e.g., image	conversions) may or may	not be "addressed", in other
912	   words, how the NMA permits arguments	or parameters to modify	the
913	   object delivered as the result of an	ARK request.  If supported,
914	   these sorts of operations would provide things like byte-ranged
915	   fragment delivery and open-ended format conversions,	or any set of
916	   possible transformations that would be too numerous to list or to
917	   identify with separately assigned ARKs.

919	   Operational service support policy includes a description of	general
920	   operational aspects of the NMA service, such	as after-hours staffing
921	   and trouble reporting procedures.

923	5.3.  Generic Description Service

925	   Returns a description of the	object.	 Descriptions are returned in
926	   either a structured metadata	format or a human readable text	format;
927	   sometimes one format	may serve both purposes.  A description	must at
928	   a minimum answer the	who, what, when, and where questions concerning
929	   an expression of the	object.	 Standalone descriptions should	be
930	   accompanied by the modification date	and source of the description
931	   itself.  May	also return discriminated lists	of ARKs	that are related
932	   to the given	ARK.

934	6.  Overview of	the HTTP Key Mapping Protocol (HKMP)

936	   The HTTP Key	Mapping	Protocol (HKMP)	is a way of taking a key (a kind
937	   of identifier) and asking such questions as,	what information does
938	   this	identify and how permanent is it?  [HKMP] is in	fact one
939	   specific method under development for delivering ARK	services.  The
940	   protocol runs over HTTP to exploit the web browser's	current	pre-
941	   eminence as user interface to the Internet.	HKMP is	designed so that
942	   a person can	enter ARK requests directly into the location field of
943	   current browser interfaces.	Because	it runs	over HTTP, HKMP	can be
944	   simulated and tested	within keyboard-based [TELNET] sessions.

946	   The asker (a	person or client program) starts with an identifier,
947	   such	as an ARK or a URL.  The identifier reveals to the asker (or
948	   allows the asker to infer) the Internet host	name and port number of
949	   a server system that	responds to questions.	Here, this is just the
950	   NMAH	that is	obtained by inspection and possibly lookup based on the
951	   ARK's NAAN.	The asker then sets up an HTTP session with the	server
952	   system, sends a question via	an HKMP	request	(contained within an
953	   HTTP	request), receives an answer via an HKMP response (contained
954	   within an HTTP response), and closes	the session.  That concludes the
955	   connected portion of	the protocol.

957	   An HKMP request is a	string of characters beginning with a `?'
958	   (question mark) that	is appended to the identifier string.  The
959	   resulting string is sent as an argument to HTTP's GET command.
960	   Request strings too long for	GET may	be sent	using HTTP's POST
961	   command.  The three most common requests correspond to three
962	   degenerate special cases that keep the user's learning and typing
963	   burden low.	First, a simple	key with no request at all is the same
964	   as an ordinary access request.  Thus	a plain	ARK entered into a
965	   browser's location field behaves much like a	plain URL, and returns
966	   access to the primary identified object, for	instance, an HTML
967	   document.

969	   The second special case is a	minimal	ARK description	request	string
970	   consisting of just "?".  For	example, entering the string,

972		   ark.nlm.nih.gov/12025/psbbantu?

974	   into	the browser's location field directly precipitates a request for
975	   a metadata record describing	the object identified by
976	   ark:/12025/psbbantu.	 The browser, unaware of HKMP, prepares	and
977	   sends an HTTP GET request in	the same manner	as for a URL.  HKMP is
978	   designed so that the	response (indicated by the returned HTTP content
979	   type) is normally displayed,	whether	the output is structured for
980	   machine processing (text/plain) or formatted	for human consumption
981	   (text/html).

983	   In the following example HKMP session, each line has	been annotated
984	   to include a	line number and	whether	it was the client or server that
985	   sent	it.  Without going into	much depth, the	session	has three pieces
986	   separated from each other by	blank lines:  the client's piece (lines
987	   1-3), the server's HTTP/HKMP	response headers (4-7),	and the	body of
988	   the server's	response (8-13).  The first and	last lines (1 and 13)
989	   correspond to the client's steps to start the TCP session and the
990	   server's steps to end it, respectively.

992	    1  C: [opens session]
993	       C: GET http://ark.nlm.nih.gov/ark:/12025/psbbantu? HTTP/1.1
994	       C:
995	       S: HTTP/1.1 200 OK
996	    5  S: Content-Type:	text/plain
997	       S: HKMP-Status: 0.1 200 OK
998	       S:
999	       S: erc:
1000	       S: who:	  Lederberg, Joshua
1001	   10  S: what:	  Studies of Human Families for	Genetic	Linkage
1002	       S: when:	  1974
1003	       S: where:  http://profiles.nlm.nih.gov/BB/A/N/T/U/_/bbantu.pdf
1004	       S: [closes session]

1006	   The first two server	response lines (4-5) above are typical of HTTP.
1007	   The next line (6) is	peculiar to HKMP, and indicates	the HKMP version
1008	   and a normal return status.  The balance of the response (8-12) is
1009	   the single metadata record that comprises the ARK description service
1010	   response.  The record is in the format of an	Electronic Resource
1011	   Citation [ERC], which is discussed in more detail in	the next
1012	   section.  For now, note that	it contains four elements that answer
1013	   the top priority questions regarding	an expression of the object:
1014	   who played a	major role in expressing it, what the expression was
1015	   called, when it was created, and where the expression may be found.
1016	   This	quartet	of elements comes up again and again in	ERCs.

1018	   The third degenerate	special	case of	an ARK request (and no other
1019	   cases will be described in this document) is	the string "??",
1020	   corresponding to a minimal permanence policy	request.  It can be seen
1021	   in use appended to an ARK (on line 2) in the	example	session	that
1022	   follows.

1024	    1  C: [opens session]
1025	       C: GET http://ark.nlm.nih.gov/ark:/12025/psbbantu?? HTTP/1.1
1026	       C:
1027	       S: HTTP/1.1 200 OK
1028	    5  S: Content-Type:	text/plain
1029	       S: HKMP-Status: 0.1 200 OK
1030	       S:
1031	       S: erc:
1032	       S: who:	  Lederberg, Joshua
1033	   10  S: what:	  Studies of Human Families for	Genetic	Linkage
1034	       S: when:	  1974
1035	       S: where:  http://profiles.nlm.nih.gov/BB/A/N/T/U/_/bbantu.pdf
1036	       S: erc-support:
1037	       S: who:	  USNLM
1038	   15  S: what:	  Permanent, Unchanging	Content
1039	       S: when:	  2001 04 21
1040	       S: where:  http://ark.nlm.nih.gov/yy22948
1041	       S: [closes session]

1043	   Again, a single metadata record (lines 8-17)	is returned, but it
1044	   consists of two segments.  The first	segment	(8-12) gives the same
1045	   basic citation information as in the	previous example.  It is
1046	   returned in order to	establish context for the persistence
1047	   declaration in the second segment (13-17).
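
	   The three request forms seen so far differ only in the string
	   appended to the identifier.  A rough Python sketch of building the
	   request target and reading the HKMP-Status header follows.  The
	   function and table names are hypothetical, and the header layout
	   (version, numeric code, reason text) is inferred solely from the
	   example sessions above rather than taken from [HKMP].

```python
# Suffix appended to the ARK for each of the three degenerate requests.
REQUEST_SUFFIX = {
    "access": "",         # plain key: ordinary access request
    "description": "?",   # minimal description request
    "policy": "??",       # minimal permanence policy request
}


def hkmp_url(nmah, ark, request="access"):
    """Build the URL sent in the HTTP GET, as in line 2 of the sessions."""
    return "http://%s/%s%s" % (nmah, ark, REQUEST_SUFFIX[request])


def parse_hkmp_status(header_line):
    """Split 'HKMP-Status: 0.1 200 OK' into (version, code, reason)."""
    name, _, value = header_line.partition(":")
    if name.strip().lower() != "hkmp-status":
        raise ValueError("not an HKMP-Status header: %r" % header_line)
    version, code, reason = value.strip().split(" ", 2)
    return version, int(code), reason
```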

1049	   Each	segment	in an ERC tells	a different story relating to the
1050	   object, so although the same	four questions (elements) appear in
1051	   each, the answers depend on the segment's story type.  While	the
1052	   first segment tells the story of an expression of the object, the
1053	   second segment tells	the story of the support commitment made to it:
1054	   who made the	commitment, what the nature of the commitment was, when
1055	   it was made,	and where a fuller explanation of the commitment may be
1056	   found.
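
	   To make the segment structure concrete, here is a small Python
	   sketch that splits a response body into segments.  It is based only
	   on the two example sessions above and is not a parser for the full
	   format defined in [ERC]: it assumes that a label with an empty value
	   (such as "erc:" or "erc-support:") opens a new segment and that
	   every other line is a simple label-colon-value element.  The
	   function name is hypothetical.

```python
def parse_erc(text):
    """Return a list of (segment_label, {element: value}) pairs."""
    segments = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only, so URLs in values stay intact.
        label, _, value = line.partition(":")
        label, value = label.strip(), value.strip()
        if value == "":
            segments.append((label, {}))      # a new segment begins
        elif segments:
            segments[-1][1][label] = value    # element of current segment
    return segments


SAMPLE = """\
erc:
who:    Lederberg, Joshua
what:   Studies of Human Families for Genetic Linkage
when:   1974
where:  http://profiles.nlm.nih.gov/BB/A/N/T/U/_/bbantu.pdf
erc-support:
who:    USNLM
what:   Permanent, Unchanging Content
when:   2001 04 21
where:  http://ark.nlm.nih.gov/yy22948
"""
```

	   Applied to the sample, the sketch yields two segments, each with its
	   own who, what, when, and where answers.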

1058	7.  Overview of	Electronic Resource Citations (ERCs)

1060	   An Electronic Resource Citation (or ERC, pronounced e-r-c) [ERC] is a
1061	   simple, compact, and	printable record designed to hold data
1062	   associated with an information resource.  By	design,	the ERC	is a
1063	   metadata format that	balances the needs for expressive power, very
1064	   simple machine processing, and direct human manipulation.

1066	   A founding principle	of the ERC is that direct human	contact	with
1067	   metadata will be a necessary and sufficient condition for the
1068	   near-term rapid development of metadata standards, systems, and services.
1069	   Thus	the machine-processable	ERC format must	only minimally strain
1070	   people's ability to read, understand, change, and transmit ERCs
1071	   without their relying on intermediation with	specialized software
1072	   tools.  The basic ERC needs to be succinct, transparent, and
1073	   trivially parseable by software.

1075	   In the current Internet, it is natural to consider seriously using
1076	   XML as an exchange format because of	predictions that it will obviate
1077	   many	ad hoc formats and programs, and unify much of the world's
1078	   information under one reliable data structuring discipline that is
1079	   easy	to generate, verify, parse, and	render.	 It appears, however,
1080	   that	XML is still only catching on after years of standards work and
1081	   implementation experience.  The reasons for this are unclear, but for
1082	   now very simple XML interpretation is still out of reach.  Another
1083	   important caution is	that XML structures are	hard on	the eyeballs,
1084	   taking up an	amount of display (and page) space that	significantly
1085	   exceeds that	of traditional formats.	 Until these conflicts with ERC
1086	   principle are resolved, XML is not a	first choice for representing
1087	   ERCs.  Borrowing instead from the data structuring format that
1088	   underlies the successful spread of email and	web services, the first
1089	   ERC format is based on email	and HTTP headers (RFC822) [EMHDRS].
1090	   There is a naturalness to its label-colon-value format (seen	in the
1091	   previous section) that barely needs explanation to a	person beginning
1092	   to enter ERC	metadata.

1094	   Besides simplicity of ERC system implementation and data entry
1095	   mechanics, ERC semantics (what the record and its constituent parts
1096	   mean) must also be easy to explain.	ERC semantics are based	on a
1097	   reformulation and extension of the Dublin Core [DCORE] hypothesis,
1098	   which suggests that the fifteen Dublin Core metadata	elements have a
1099	   key role to play in cross-domain resource description.  The ERC
1100	   design recognizes that the Dublin Core's primary contribution is the
1101	   international, interdisciplinary consensus that identified fifteen
1102	   semantic buckets (element categories), regardless of	how they are
1103	   labeled.  The ERC then adds a definition for	a record and some
1104	   minimal compliance rules.  In pursuing the limits of	simplicity, the
1105	   ERC design combines and relabels some Dublin	Core buckets to	isolate
1106	   a tiny kernel (subset) of four elements for basic cross-domain
1107	   resource description.

1109	   For the cross-domain	kernel,	the ERC	uses the four basic elements --
1110	   who,	what, when, and	where -- to pretend that every object in the
1111	   universe can	have a uniform minimal description.  Each has a	name or
1112	   other identifier, a location, some responsible person or party, and a
1113	   date.  It doesn't matter what type of object	it is, or whether one
1114	   plans to read it, interact with it, smoke it, wear it, or navigate
1115	   it.	Of course, this	approach is flawed because uniformity of
1116	   description for some	object types requires more semantic contortion
1117	   and sacrifice than for others.  That	is why at the beginning	of this
1118	   document, the ARK was said to be suited to objects that accommodate
1119	   reasonably regular electronic description.

1121	   While insisting on uniformity at the	most basic level provides
1122	   powerful cross-domain leverage, the semantic	sacrifice is great for
1123	   many	applications.  So the ERC also permits a semantically rich and
1124	   nuanced description to co-exist in a	record along with a basic
1125	   description.	 In that way both sophisticated	and naive recipients of
1126	   the record can extract the level of meaning from it that best suits
1127	   their needs and abilities.  Key to unlocking	the richer description
1128	   is a	controlled vocabulary of ERC record types (not explained in this
1129	   document) that permit knowledgeable recipients to apply defined sets
1130	   of additional assumptions to	the record.

1132	7.1.  ERC Syntax

1134	   An ERC record is a sequence of metadata elements ending in a	blank
1135	   line.  An element consists of a label, a colon, and an optional
1136	   value.  Here	is an example of a record with five elements.

1138		erc:
1139		who: Gibbon, Edward
1140		what: The Decline and Fall of the Roman	Empire
1141		when: 1781
1142		where: http://www.ccel.org/g/gibbon/decline/

1144	   A long value	may be folded (continued) onto the next	line by
1145	   inserting a newline and indenting the next line.  A value can be thus
1146	   folded across multiple lines.  Here are two example elements, each
1147	   folded across four lines.

1149		who/created: University	of California, San Francisco, AIDS
1150		     Program at	San Francisco General Hospital | University
1151		     of	California, San	Francisco, Center for AIDS Prevention
1152		     Studies
1153		what/Topic:
1154		      Heart Attack | Heart Failure
1155		     | Heart
1156				      Diseases

1158	   An element value folded across several lines	is treated as if the
1159	   lines were joined together on one long line.	 For example, the second
1160	   element from	the previous example is	considered equivalent to

1162		what/Topic: Heart Attack | Heart Failure | Heart Diseases

1164	   An element value may	contain	multiple values, each one separated from
1165	   the next by a `|' (pipe) character.	The element from the previous
1166	   example contains three values.

1168	   For annotation purposes, any	line beginning with a `#' (hash)
1169	   character is	treated	as if it were not present; this	is a "comment"
1170	   line	(a feature not available in email or HTTP headers).  For
1171	   example, the	following element is spread across four	lines and
1172	   contains two	values:

1174		what/Topic:
1175		     Heart Attack
1176		#    | Heart Failure  -- hold off until	next review cycle
1177		     | Heart Diseases
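
	   The syntax rules above can be sketched in a few lines of Python.
	   This is only an illustrative sketch, not a reference
	   implementation: the function name parse_erc_record is
	   hypothetical, and joining folded lines with a single space is an
	   assumption, since this document does not say how whitespace at a
	   fold is treated.

```python
def parse_erc_record(text):
    """Parse one ERC record into a list of (label, values) pairs."""
    raw = []  # one [label, joined value string] entry per element
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # comment line: treated as if it were not present
        if not line.strip():
            break  # a blank line ends the record
        if line[0] in " \t":
            # Indented line: continuation of the previous element's value.
            # Assumption: folded pieces are joined with one space.
            if raw:
                raw[-1][1] = (raw[-1][1] + " " + line.strip()).strip()
            continue
        # Split on the first colon only, so URLs in values stay intact.
        label, _, value = line.partition(":")
        raw.append([label.strip(), value.strip()])
    # Split each value on the `|' separator; an empty value has no parts.
    return [
        (label, [v.strip() for v in value.split("|")] if value else [])
        for label, value in raw
    ]


record = """\
what/Topic:
     Heart Attack
#    | Heart Failure  -- hold off until next review cycle
     | Heart Diseases

"""
print(parse_erc_record(record))
```

	   Applied to the commented example above, the sketch yields one
	   element labeled "what/Topic" that carries the two values "Heart
	   Attack" and "Heart Diseases".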

1179	7.2.  ERC Stories

1181	   An ERC record is organized into one or more distinct segments, where
1182	   each segment tells a story about a different aspect of the
1183	   information resource.  A segment boundary occurs whenever a segment
1184	   label (an element beginning with "erc") is encountered.  The	basic
1185	   label "erc:"	introduces the story of	an object's expression (e.g.,
1186	   its publication, installation, or performance).  The label
1187	   "erc-about:" introduces the story of an object's content (what it is
1188	   about) and "erc-support:" introduces	the story of a support
1189	   commitment made to it.  A story segment that	concerns the ERC itself
1190	   is introduced by the	label "erc-from:".  It is an important segment
1191	   that	tells the story	of the ERC's provenance.  Elements beginning
1192	   with	"erc" are reserved for segment labels and their	associated story
1193	   types.  From	an earlier example, here is an ERC with	two segments.

1195	       erc:
1196	       who:    Lederberg, Joshua
1197	       what:   Studies of Human	Families for Genetic Linkage
1198	       when:   1974
1199	       where:  http://profiles.nlm.nih.gov/BB/A/N/T/U/_/bbantu.pdf
1200	       erc-support:
1201	       who:    NIH/NLM/LHNCBC
1202	       what:   Permanent, Unchanging Content
1203	       # Note to ops staff:  date needs	verification.
1204	       when:   2001 04 21
1205	       where:  http://ark.nlm.nih.gov/yy22948

1207	   Segment stories are told according to journalistic tradition.  While
1208	   any number of pertinent elements may	appear in a segment, priority is
1209	   placed on answering the questions who, what,	when, and where	at the
1210	   beginning of	each segment so	that readers can make the most important
1211	   selection or	rejection decisions as soon as possible.  To make things
1212	   simple, the listed ordering of the questions	is maintained in each
1213	   segment (as it happens, most people who have been exposed to this
1214	   storytelling technique are already familiar with the above
1215	   ordering).

1217	   The four questions are answered by using corresponding element
1218	   labels.  The	four element labels can	be re-used in each story
1219	   segment, but	their meaning changes depending	on the segment (the
1220	   story type) in which	they appear.  In the example above, "who" is
1221	   first used to name a	document's author and subsequently used	to name
1222	   the permanence guarantor (provider).	 Similarly, "when" first lists
1223	   the date of object creation and in the next segment lists the date of
1224	   a commitment	decision.  Four	labels appearing across	three segments
1225	   effectively map to twelve semantically distinct elements.  Distinct
1226	   element meanings are	mapped to Dublin Core elements in a later
1227	   section.
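
	   The way a label's meaning depends on its segment can be sketched
	   as follows.  This is a minimal Python illustration under stated
	   assumptions: the helper name split_segments is hypothetical,
	   elements are assumed to be already parsed into (label, value)
	   pairs, and elements that precede any segment label are grouped
	   under None.

```python
def split_segments(elements):
    """Group (label, value) pairs into (segment_label, members) pairs."""
    segments = []
    for label, value in elements:
        if label.startswith("erc"):
            # A label beginning with "erc" marks a segment boundary.
            # Any value on the segment label line is ignored in this sketch.
            segments.append((label, []))
        else:
            if not segments:
                # Assumption: elements seen before any segment label
                # are collected under a segment named None.
                segments.append((None, []))
            segments[-1][1].append((label, value))
    return segments


elements = [
    ("erc", ""),
    ("who", "Lederberg, Joshua"),
    ("when", "1974"),
    ("erc-support", ""),
    ("who", "NIH/NLM/LHNCBC"),
    ("when", "2001 04 21"),
]
for story, members in split_segments(elements):
    for label, value in members:
        # The (story, label) pair, not the label alone, carries the meaning.
        print((story, label), value)
```

	   Keying each element by the pair (segment, label) keeps the two
	   uses of "who" in this example distinct.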

1229	7.3.  The ERC Anchoring	Story

1231	   Each	ERC contains an	anchoring story.  It is	usually	the first
1232	   segment labeled "erc:" and it concerns an "anchoring" expression of
1233	   the object.	An "anchoring" expression is the one that a provider
1234	   deemed the most suitable basic referent given the audience and
1235	   application for which it produced the ERC.  If it sounds like the
1236	   provider has	great latitude in choosing its anchoring expression, it
1237	   is because it does.	A typical anchoring story in an	ERC for	a born-
1238	   digital document would be the story of the document's release on a
1239	   web site; such a document would then	be the anchoring expression.

1241	   An anchoring	story need not be the central descriptive goal of an ERC
1242	   record.  For	example, a museum provider may create an ERC for a
1243	   digitized photograph	of a painting but choose to anchor it in the
1244	   story of the	original painting instead of the story of the electronic
1245	   likeness; although the ERC may through other	segments prove to be
1246	   centrally concerned with describing the electronic likeness,	the
1247	   provider may	have chosen this particular anchoring story in order to
1248	   make	the ERC	visible	in a way that is most natural to patrons (who
1249	   would find the Mona Lisa under da Vinci sooner than they would find
1250	   it under the	name of	the person who snapped the photograph or scanned
1251	   the image).	In another example, a provider that creates an ERC for a
1252	   dramatic play as an abstract	work has the task of describing	a piece
1253	   of intangible intellectual property.	 To anchor this	abstract object
1254	   in the concrete world, if only through a derivative expression, it
1255	   makes sense for the provider	to choose a suitable printed edition of
1256	   the play as the anchoring object expression (to describe in the
1257	   anchoring story) of the ERC.

1259	   The anchoring story has special rules designed to keep ERC processing
1260	   simple and predictable.  Each of the	four basic elements (who, what,
1261	   when, and where) must be present, unless a best effort to supply it
1262	   fails.  In the event	of failure, the	element	still appears but a
1263	   special value (described later) is used to explain the missing value.
1264	   While the requirement that each of the four elements	be present only
1265	   applies to the anchoring story segment, as usual these elements
1266	   appear at the beginning of the segment and may only be used in the
1267	   prescribed order.  A	minimal	ERC would normally consist of just an
1268	   anchoring story and the element quartet, as illustrated in the next
1269	   example.

1271	       erc:
1272	       who:   National Research	Council
1273	       what:  The Digital Dilemma
1274	       when:  2000
1275	       where: http://books.nap.edu/html/digital%5Fdilemma

1277	   A minimal ERC can be	abbreviated so that it resembles a traditional
1278	   compact bibliographic citation that is nonetheless completely machine
1279	   processable.  The required elements and ordering make it possible to
1280	   eliminate the element labels, as shown here.

1282	       erc: National Research Council |	The Digital Dilemma | 2000
1283		      |	http://books.nap.edu/html/digital%5Fdilemma
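
	   Because the order of the four elements is fixed, the abbreviated
	   form can be expanded mechanically.  The Python sketch below is
	   illustrative only; the names QUARTET, expand_abbreviated, and
	   has_anchoring_quartet are hypothetical, and the value is assumed
	   to have been unfolded onto one line already.

```python
QUARTET = ("who", "what", "when", "where")


def expand_abbreviated(value):
    """Expand the value of an abbreviated "erc:" line into labeled elements."""
    parts = [p.strip() for p in value.split("|")]
    if len(parts) != len(QUARTET):
        raise ValueError("an abbreviated ERC needs exactly four values")
    return list(zip(QUARTET, parts))


def has_anchoring_quartet(elements):
    """Check that a segment begins with who, what, when, where in order.

    Only unqualified labels are considered in this sketch.
    """
    return tuple(label for label, _ in elements[:4]) == QUARTET


abbreviated = (
    "National Research Council | The Digital Dilemma | 2000 "
    "| http://books.nap.edu/html/digital%5Fdilemma"
)
for label, value in expand_abbreviated(abbreviated):
    print(label + ":", value)
```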

1285	7.4.  ERC Elements

1287	   As mentioned, the four basic	ERC elements (who, what, when, and
1288	   where) take on different specific meanings depending	on the story
1289	   segment in which they are used.  By appearing in each segment, albeit
1290	   in different	guises,	the four elements serve	as a valuable mnemonic
1291	   device -- a kind of checklist -- for	constructing minimal story
1292	   segments from scratch.  Again, it is	only in	the anchoring segment
1293	   that	all four elements are mandatory.

1295	   Here	are some mappings between ERC elements and Dublin Core [DCORE]
1296	   elements.

1298		Segment	    ERC	Element	    Equivalent Dublin Core Element
1299	       ---------    -----------	    ------------------------------
1300		  erc	       who	    Creator/Contributor/Publisher
1301		  erc	       what		   Title
1302		  erc	       when		   Date
1303		  erc	       where		   Identifier
1304	       erc-about       who		    <none>
1305	       erc-about       what		   Subject
1306	       erc-about       when		   Coverage (temporal)
1307	       erc-about       where		   Coverage (spatial)

1309	   The basic element labels may	also be	qualified to add nuances to the
1310	   semantic categories that they identify.  Elements are qualified by
1311	   appending a `/' (slash) and a qualifier term.  Often	qualifier terms
1312	   appear as the past tense form of a verb because it makes re-using
1313	   qualifiers among elements easier.

1315	       who/published:  ...
1316	       when/published: ...
1317	       where/published:	...

1319	   Using past tense verbs for qualifiers also reminds providers	and
1320	   recipients that element values contain transient assertions that may
1321	   have	been true once,	but that tend to become	less true over time.
1322	   Recipients that don't understand the	meaning	of a qualifier can fall
1323	   back	onto the semantic category (bucket) designated by the
1324	   unqualified element label.  Inevitably recipients (people and
1325	   software) will have diverse abilities in understanding elements and
1326	   qualifiers.

1328	   Any number of other elements	and qualifiers may be used in
1329	   conjunction with the	quartet	of basic segment questions.  The only
1330	   semantic requirement	is that	they pertain to	the segment's story.
1331	   Also, it is only the	four basic elements that change	meaning
1332	   depending on	their segment context.	All other elements have	meaning
1333	   independent of the segment in which they appear.  If	an element label
1334	   stripped of its qualifier is	still not recognized by	the recipient, a
1335	   second fall back position is	to ignore it and rely on the four basic
1336	   elements.
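
	   The two fallback positions can be sketched as a small lookup.
	   This Python fragment is an illustration under assumptions: the
	   name resolve_label and the "understood" set (the labels a given
	   recipient happens to recognize) are hypothetical and are not
	   defined by this document.

```python
def resolve_label(label, understood):
    """Decide how a recipient treats a possibly qualified element label.

    Returns a (how, label_to_use) pair.
    """
    if label in understood:
        return ("exact", label)
    base = label.split("/", 1)[0]
    if base in understood:
        # First fallback: drop the qualifier and use the semantic
        # category named by the unqualified label.
        return ("unqualified", base)
    # Second fallback: ignore the element and rely on the basic four.
    return ("ignored", None)


understood = {"who", "what", "when", "where", "who/published"}
for label in ("who/published", "when/Reviewed", "IDcode"):
    print(label, "->", resolve_label(label, understood))
```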

1338	   Elements may be Canonical, Provisional, or Local.  Canonical
1339	   elements are	officially recognized via a registry as	part of	the
1340	   metadata vernacular.	 All elements, qualifiers, and segment labels
1341	   used	in this	document up until now belong to	that vernacular.
1342	   Provisional elements	are also officially recognized via the registry,
1343	   but have only been proposed for inclusion in	the vernacular.	 To be
1344	   promoted to the vernacular, a provisional element passes through a
1345	   vetting process during which	its documentation must be in order and
1346	   its community acceptance demonstrated.  Local elements are any
1347	   elements not	officially recognized in the registry.	The registry
1348	   [REG] is a work in progress.

1350	   Local elements are immediately distinguishable from Canonical or
1351	   Provisional elements	because	all terms that begin with an upper case
1352	   letter are reserved for spontaneous local use.  No term beginning
1353	   with	an upper case letter will ever be assigned Canonical or
1354	   Provisional status, so it should be safe to use such	terms for local
1355	   purposes.  Any recipient of external	ERCs containing	such terms will
1356	   understand them to be part of the originating provider's local
1357	   metadata dialect.  Here's an	example	ERC with three segments, one
1358	   local element, and two local	qualifiers.  The segment boundaries have
1359	   been	emphasized by comment lines (which, as before, are ignored by
1360	   processors).

1362	       erc:
1363	       who: Bullock, TH	| Achimowicz, JZ | Duckrow, RB
1364		       | Spencer, SS | Iragui-Madoz, VJ
1365	       what: Bicoherence of intracranial EEG in	sleep,
1366		       wakefulness and seizures
1367	       when: 1997 12 00
1368	       where: http://cogprints.soton.ac.uk/%{
1369		       documents/disk0/00/00/01/22/index.html %}
1370	       in: EEG Clin Neurophysiol | 1997	12 00 |	v103, i6, p661-678
1371	       IDcode: cog00000122
1372	       # ---- new segment ----
1373	       erc-about:
1374	       what/Subcategory: Bispectrum | Nonlinearity | Epilepsy
1375		       | Cooperativity | Subdural | Hippocampus	| Higher moment
1376	       # ---- new segment ----
1377	       erc-from:
1378	       who: NIH/NLM/NCBI
1379	       what: pm9546494
1380	       when/Reviewed: 1998 04 18 021600
1381	       where: http://ark.nlm.nih.gov/12025/pm9546494?

1383	   The local element "IDcode" immediately precedes the "erc-about"
1384	   segment, which itself contains an element with the local qualifier
1385	   "Subcategory".  The second to last element also carries the local
1386	   qualifier "Reviewed".  Finally, what	might be a provisional element
1387	   "in"	appears	near the end of	the first segment.  It might have been
1388	   proposed as a way to	complete a citation for	an object originally
1389	   appearing inside another object (such as an article appearing in a
1390	   journal or an encyclopedia).
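
	   The upper case convention makes local terms detectable without
	   consulting the registry.  A minimal Python sketch follows; the
	   function name local_terms is hypothetical.  Telling Canonical
	   terms from Provisional ones would still require the registry,
	   which this sketch does not attempt.

```python
def local_terms(label):
    """Return the local terms (element and/or qualifier) in a label."""
    element, _, qualifier = label.partition("/")
    # A term is local if it begins with an upper case letter.
    return [term for term in (element, qualifier) if term[:1].isupper()]


for label in ("IDcode", "what/Subcategory", "when/Reviewed", "in", "who"):
    print(label, "->", local_terms(label))
```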

1392	7.5.  ERC Element Values

1394	   ERC element values tend to be straightforward strings.  If the
1395	   provider intends something special for an element, it will so
1396	   indicate with markers at the	beginning of its value string.	The
1397	   markers are designed	to be uncommon enough that they	would not likely
1398	   occur in normal data	except by deliberate intent.  Markers can only
1399	   occur near the beginning of a string, and once any octet of non-
1400	   marker data has been	encountered, no	further	marker processing is
1401	   done	for the	element	value.	In the absence of markers the string is
1402	   considered pure data; this has been the case	with all the examples
1403	   seen	thus far.  The fullest form of an element value	with all three
1404	   optional markers in place looks like	this.

1406	       VALUE =	  [markup_flags]    (:ccode)	,    DATA

1408	   In processing, the first non-whitespace character of	an ERC element
1409	   value is examined.  An initial `[' is reserved to introduce a
1410	   bracketed set of markup flags (not described	in this	document) that
1411	   ends	with `]'.  If ERC data is machine-generated, each value	string
1412	   may be preceded by "[]" to prevent any of its data from being
1413	   mistaken for	markup flags.  Once past the optional markup, the
1414	   remaining value may optionally begin	with a controlled code.	 A
1415	   controlled code always has the form "(:ccode)", for example,

1417	       who: (:unkn) Anonymous
1418	       what: (:791) Bee	Stings

1420	   Any string after such a code	is taken to be an uncontrolled (e.g.,
1421	   natural language) equivalent.  The code "unkn" indicates a
1422	   conventional	explanation for	a missing value	(stating that the value
1423	   is unknown).	 The remainder of the string makes an equivalent
1424	   statement in	a form that the	provider deemed	most suitable to its
1425	   (probably human) audience.  The code	"791" could be a fixed numeric
1426	   topic identifier within an unspecified topic	vocabulary.  Any code
1427	   may be ignored by those that	do not understand it.

1429	   There are several codes to explain different	ways in	which a	required
1430	   element's value may go missing.

1432	       (:unkn)	 unknown (e.g.,	Anonymous, Inconnue)
1433	       (:unav)	 value unavailable indefinitely
1434	       (:unac)	 temporarily inaccessible
1435	       (:unap)	 not applicable, makes no sense
1436	       (:unas)	 value unassigned (e.g., Untitled)
1437	       (:none)	 never had a value, never will
1438	       (:null)	 explicitly empty
1439	       (:unal)	 unallowed, suppressed intentionally
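
	   Marker processing at the start of a value can be sketched with
	   one regular expression that follows the VALUE layout shown
	   earlier in this section.  The Python below is an illustration,
	   not a normative grammar: the names parse_value, is_missing, and
	   MISSING_VALUE_CODES are hypothetical, and the contents of the
	   markup flags are left uninterpreted because this document does
	   not describe them.

```python
import re

# VALUE = [markup_flags] (:ccode) , DATA   (all three markers optional)
_VALUE = re.compile(
    r"\s*(?:\[(?P<flags>[^\]]*)\])?"   # optional bracketed markup flags
    r"\s*(?:\(:(?P<ccode>[^)]*)\))?"   # optional controlled code
    r"\s*(?P<comma>,)?"                # optional sort-friendly marker
    r"\s*(?P<data>.*)",                # everything else is data
    re.DOTALL,
)

# The codes listed above for a missing required value.
MISSING_VALUE_CODES = {
    "unkn", "unav", "unac", "unap", "unas", "none", "null", "unal",
}


def parse_value(value):
    """Split an element value into its optional markers and its data."""
    m = _VALUE.match(value)  # always matches: every part is optional
    return {
        "flags": m.group("flags"),
        "ccode": m.group("ccode"),
        "sort_friendly": m.group("comma") is not None,
        "data": m.group("data").strip(),
    }


def is_missing(value):
    """True if the value carries one of the missing-value codes."""
    return parse_value(value)["ccode"] in MISSING_VALUE_CODES


print(parse_value("(:unkn) Anonymous"))
print(parse_value("[] (:791) Bee Stings"))
```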

1441	   Once	past an	optional controlled code, the remaining	string value is
1442	   subjected to one final test.  If the next non-whitespace
1443	   character is	a `,' (comma), it indicates that the string value is
1444	   "sort-friendly".  This means	that the value is (a) laid out with an
1445	   inverted word order useful for sorting items	having comparably laid
1446	   out element values (items might be the containing ERC records) and
1447	   (b) that the	value may contain other	commas that indicate inversion
1448	   points should it become necessary to	recover	the value in natural
1449	   word	order.	Typically, this	feature	is used	to express Western-style
1450	   personal names in family-name-given-name order.  It can also	be used
1451	   wherever natural word order might make sorting tricky, such as when
1452	   data	contains titles	or corporate names.  Here are some example
1453	   elements.

1455	       who:   ,	 van Gogh, Vincent
1456	       who:,Howell, III, PhD, 1922-1987, Thurston
1457	       who:, Acme Rocket Factory, Inc.,	The
1458	       who:, Mao Tse Tung
1459	       who:, McCartney,	Paul, Sir,
1460	       what:, Health and Human Services, United	States Government
1461		       Department of, The,

1462	   There are rules to use in recovering a copy of the value in natural
1463	   word	order, if desired.  The	above example strings have the following
1464	   natural word	order values, respectively.

1466	       Vincent van Gogh
1467	       Thurston	Howell,	III, PhD, 1922-1987
1468	       The Acme	Rocket Factory,	Inc.
1469	       Mao Tse Tung
1470	       Sir Paul	McCartney
1471	       The United States Government Department of Health and Human Services
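
	   This document does not spell out the recovery rules.  One
	   heuristic that happens to reproduce all six examples above is
	   sketched below in Python.  It is an inference from the examples
	   only, not the defined algorithm: if the value ends with a comma,
	   every comma is taken as an inversion point and the pieces are
	   reversed; otherwise only the last comma is an inversion point.
	   The function name natural_order is hypothetical, and its input is
	   the data that follows the initial sort-friendly comma.

```python
def natural_order(sort_value):
    """Recover natural word order from a sort-friendly value (heuristic)."""
    text = " ".join(sort_value.split())  # collapse folds and extra spaces
    if text.endswith(","):
        # Assumed rule: a final comma makes every comma an inversion point.
        parts = [p.strip() for p in text[:-1].split(",")]
        return " ".join(reversed(parts))
    head, sep, tail = text.rpartition(",")
    if not sep:
        return text  # no comma, nothing to invert
    # Assumed rule: otherwise only the last comma is an inversion point.
    return tail.strip() + " " + head.strip()


for value in ("van Gogh, Vincent", "McCartney, Paul, Sir,"):
    print(natural_order(value))
```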

1473	7.6.  ERC Element Encoding and Dates

1475	   Some	characters that	need to	appear in ERC element values might
1476	   conflict with special characters used for structuring ERCs, so there
1477	   needs to be a way to	include	them as	literal	characters that	are
1478	   protected from special interpretation.  This	is accomplished	through
1479	   an encoding mechanism that resembles	the %-encoding familiar	to [URI]
1480	   handlers.

1482	   The ERC encoding mechanism also uses	`%', but instead of taking two
1483	   following hexadecimal digits, it takes one non-alphanumeric character
1484	   or two alphabetic characters	that cannot be mistaken	for hex	digits.
1485	   It is designed not to be confused with normal web-style %-encoding.
1486	   In particular it can	be decoded without risking unintended decoding
1487	   of normal %-encoded data (which would introduce errors).  Here are
1488	   the one-character (non-alphanumeric)	ERC encoding extensions.

1490	       ERC	 Purpose
1491	       ---     ------------------------------------------------
1492	       %!      decodes to the element separator	`|'
1493	       %%      decodes to a percent sign `%'
1494	       %.      decodes to a comma `,'
1495	       %_      a non-character used as syntax shim
1496	       %{      a non-character that begins an expansion	block
1497	       %}      a non-character that ends an expansion block

1499	   One particularly useful construct in	ERC element values is the pair
1500	   of special encoding markers ("%{" and "%}") that indicates an
1501	   "expansion" block.  Whatever string of characters they enclose will
1502	   be treated as if none of the	contained whitespace (SPACEs, TABs,
1503	   Newlines) were present.  This comes in handy	for writing long,
1504	   multi-part URLs in a readable way.  For example, the value in

1505	       where: http://foo.bar.org/node%{
1506			  ? db = foo
1507			  & start = 1
1508			  & end	= 5
1509			  & buf	= 2
1510			  & query = foo	+ bar +	zaf
1511		      %}

1513	   is decoded into an equivalent element, but with a correct and intact
1514	   URL:

1516	   where:
1517	    http://foo.bar.org/node?db=foo&start=1&end=5&buf=2&query=foo+bar+zaf
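
	   Decoding of the one-character extensions and of expansion blocks
	   can be sketched as follows.  This Python fragment is illustrative
	   and makes some assumptions: the name decode_erc is hypothetical,
	   the two-letter alphabetic codes are not handled because they are
	   not listed here, and decoding is assumed to take place only after
	   the value has been split on `|' and its markers examined, so that
	   decoded separators and commas are not mistaken for structure.

```python
import re

# One-character ERC encoding extensions from the table above.
# "%_" is a non-character, so it decodes to nothing.
_ONE_CHAR = {"!": "|", "%": "%", ".": ",", "_": ""}


def decode_erc(value):
    """Decode ERC %-extensions, leaving web-style %XX sequences alone."""

    def squeeze(match):
        # Inside an expansion block, all whitespace is treated as absent.
        return re.sub(r"\s+", "", match.group(1))

    value = re.sub(r"%\{(.*?)%\}", squeeze, value, flags=re.DOTALL)
    # A single pass, so the "%" produced by "%%" is never re-decoded.
    return re.sub(r"%([!%._])", lambda m: _ONE_CHAR[m.group(1)], value)


where = """http://foo.bar.org/node%{
              ? db = foo
              & start = 1
              & end = 5
              & buf = 2
              & query = foo + bar + zaf
          %}"""
print(decode_erc(where))
```

	   Because none of the characters after `%' in these extensions is a
	   hexadecimal digit, a web-style sequence such as "%5F" passes
	   through this sketch unchanged.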

1519	   In a	parting	word about ERC element values, a commonly recurring
1520	   value type is a date, possibly followed by a	time.  ERC dates take on
1521	   one of the following	forms:

1523	       1999		   (four digit year)
1524	       2000 12 29	   (year, month, day)
1525	       2000 12 29 235955   (year, month, day, hour, minute, second)

1527	       21 Spring    31 1st quarter    25 Spring (so. hemisphere)
1528	       22 Summer    32 2nd quarter    26 Summer (so. hemisphere)
1529	       23 Fall      33 3rd quarter    27 Fall   (so. hemisphere)
1530	       24 Winter    34 4th quarter    28 Winter (so. hemisphere)

1531	   In dates, all internal whitespace is squeezed out to achieve a
1532	   normalized form suitable for lexical comparison and sorting.  This
1533	   means that the following dates

1535	       2000 12 29 235955	   (recommended	for readability)
1536	       2000 12 29 23 59	55
1537	       20001229	23 59 55
1538	       20001229235955		   (normalized date and	time)

1540	   are all equivalent.	The first form is recommended for readability.
1541	   The last form (shortest and easiest to compute with)	is the
1542	   normalized form.  Hyphens and commas	are reserved to	create date
1543	   ranges and lists, for example,

1545	       1996-2000		   (a range of four years)
1546	       1952, 1957, 1969		   (a list of three years)
1547	       1952, 1958-1967,	1985	   (a mixed list of dates and ranges)
1548	       20001229-20001231	   (a range of three days)
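
	   Date normalization and the splitting of lists and ranges can be
	   sketched briefly.  The Python below is an illustration only; the
	   names normalize_date and parse_dates are hypothetical, and no
	   validation of the digits is attempted.

```python
def normalize_date(date):
    """Squeeze out all internal whitespace to get the normalized form."""
    return "".join(date.split())


def parse_dates(value):
    """Split a date value into single dates and (start, end) ranges."""
    items = []
    for item in value.split(","):        # commas separate list members
        start, sep, end = item.partition("-")
        if sep:                          # a hyphen marks a range
            items.append((normalize_date(start), normalize_date(end)))
        else:
            items.append(normalize_date(start))
    return items


print(normalize_date("2000 12 29 23 59 55"))
print(parse_dates("1952, 1958-1967, 1985"))
```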

1550	7.7.  ERC Stub Records and Internal Support

1552	   The ERC design introduces the concept of a "stub" record, which is an
1553	   incomplete ERC record intended to be	supplemented with additional
1554	   elements before being released as a standalone ERC record.  A stub
1555	   ERC record has no minimum required elements.	 It is just a group of
1556	   elements that does not begin	with "erc:" but	otherwise conforms to
1557	   the ERC record syntax.

1559	   ERC stubs may be useful in supporting internal procedures using the
1560	   ERC syntax.	Often they rely	on the convenience and accuracy	of
1561	   automatically supplied elements, even the basic ones.  To be	ready
1562	   for external	use, however, an ERC stub must be transformed into a
1563	   complete ERC	record having the usual	required elements.  An ERC stub
1564	   record can be convenient for	metadata embedded in a document, where
1565	   elements such as location, modification date, and size -- which one
1566	   would not omit from an externalized record -- are omitted simply
1567	   because they	are much better	supplied by a computation.  A separate
1568	   local administrative procedure, not defined for ERCs in general,
1569	   would effect	the promotion of stubs into complete records.

1571	   While the ERC is a general-purpose container	for exchange of	resource
1572	   descriptions, it does not dictate how records must be internally
1573	   stored, laid	out, or	assembled by data providers or recipients.
1574	   Arbitrary internal descriptive frameworks can support ERCs simply by
1575	   mapping (e.g., on demand) local records to the ERC container	format
1576	   and making them available for export.  Therefore, to	support	ERCs
1577	   there is no need for	a data provider	to convert internal data to be
1578	   stored in an	ERC format.  On	the other hand,	any provider (such as
1579	   one just getting started in the business of resource	description) may
1580	   choose to store and manipulate local	data natively in the ERC format.

1582	8.  Advice to Web Clients

1584	   This	section	offers some advice to web client software developers.
1585	   It is hard to write about because it	tries to anticipate a series of
1586	   events that might lead to native web	browser	support	for ARKs.

1588	   ARKs	are envisaged to appear	wherever durable object	references are
1589	   planned.  Library cataloging	records, literature citations, and
1590	   bibliographies are important	examples.  In many of these places URLs
1591	   (Uniform Resource Locators) currently stand in, and URNs, DOIs, and
1592	   PURLs have been proposed as alternatives.

1594	   The strings representing ARKs are also envisaged to appear in some of
1595	   the places where URLs currently appear:  in hypertext links (where
1596	   they	are not	normally shown to users) and in	rendered text (displayed
1597	   or printed).	 Internet search engines, for example, tend to include
1598	   both	actionable and manifest	links when listing each	item found.  A
1599	   normal HTML link for	which the URL is not displayed looks like this.

1601		<a href = "http://foo.bar.org/index.htm"> Click Here </a>

1603	   The same link with an ARK instead of	a URL:

1605		<a href = "ark:/14697/b12345x"> Click Here </a>

1607	   Web browsers	would in general require a small modification to
1608	   recognize and convert this ARK, via mapping authority discovery, to
1609	   the URL form.

1611		<a href = "http://a.b.org/ark:/14697/b12345x"> Click Here </a>

1613	   A browser that knows	how to make that conversion could also
1614	   automatically detect	and replace a non-working NMAH.
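
	   The string manipulation involved in that conversion is small, as
	   the following Python sketch suggests.  It is illustrative only:
	   the function names are hypothetical, mapping authority discovery
	   is out of scope and the NMAH is simply passed in, and the
	   replacement host "c.d.org" is a made-up value.

```python
def ark_to_url(ark, nmah):
    """Build the URL form of an ARK for a given NMAH (host or host:port)."""
    if not ark.startswith("ark:/"):
        raise ValueError("not an ARK: " + ark)
    return "http://" + nmah + "/" + ark


def replace_nmah(url, new_nmah):
    """Swap a non-working NMAH for another, keeping the ARK itself."""
    ark = url[url.index("ark:/"):]  # raises ValueError if no ARK is present
    return ark_to_url(ark, new_nmah)


print(ark_to_url("ark:/14697/b12345x", "a.b.org"))
print(replace_nmah("http://a.b.org/ark:/14697/b12345x", "c.d.org"))
```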

1616	   An NAA will typically make known the	associations it	creates	by
1617	   publishing them in catalogs, actively advertising them, or simply
1618	   leaving them	on web sites for visitors (e.g., users,	indexing
1619	   spiders) to stumble across in browsing.

1621	9.  Security Considerations

1623	   The ARK naming scheme poses no direct risk to computers and networks.
1624	   Implementors	of ARK services	need to	be aware of security issues when
1625	   querying networks and filesystems for Name Mapping Authority
1626	   services, and the concomitant risks from spoofing and obtaining
1627	   incorrect information.  These risks are no greater for ARK mapping
1628	   authority discovery than for	other kinds of service discovery.  For
1629	   example, recipients of ARKs with a specified	hostport (NMAH)	should
1630	   treat it like a URL and be aware that the identified	ARK service may
1631	   no longer be	operational.

1633	   Apart from mapping authority	discovery, ARK clients and servers
1634	   subject themselves to all the risks that accompany normal operation
1635	   of the protocols underlying mapping services	(e.g., HTTP, Z39.50).
1636	   As a specialization of such protocols, an ARK service may limit
1637	   exposure to the usual risks.	 Indeed, ARK services may enhance a kind
1638	   of security by helping users	identify long-term reliable references
1639	   to information objects.

1641	10.  Authors' Addresses

1643	   John	A. Kunze
1644	   Center for Knowledge	Management
1645	   University of California, San Francisco
1646	   530 Parnassus Ave, Box 0840
1647	   San Francisco, CA  94143-0840, USA

1649	   Fax:	  +1 415-476-4653
1650	   EMail: jak@ckm.ucsf.edu

1651	   R. P. C. Rodgers
1652	   US National Library of Medicine
1653	   8600	Rockville Pike,	Bldg. 38A
1654	   Bethesda, MD	 20894

1656	   Fax:	  +1 301-496-0673
1657	   EMail: rodgers@nlm.nih.gov

1659	11.  References

1661	   [DCORE]    Dublin Core Metadata Initiative, "Dublin Core Metadata
1662		      Element Set, Version 1.1:	 Reference Description", July
1663		      1999, http://dublincore.org/documents/dces/.

1665	   [DNS]      P.V. Mockapetris,	"Domain	Names -	Concepts and
1666		      Facilities", RFC 1034, November 1987.

1668	   [DOI]      International DOI	Foundation, "The Digital Object
1669		      Identifier (DOI) System",	February 2001,
1670		      http://dx.doi.org/10.1000/203.

1672	   [EMHDRS]   D. Crocker, "Standard for	the format of ARPA Internet text
1673		      messages", RFC 822, August 1982.

1675	   [ERC]      J. Kunze,	"Electronic Resource Citations", work in
1676		      progress.

1678	   [HKMP]     J. Kunze,	"HTTP Key Mapping Protocol", work in progress.

1680	   [HTTP]     R. Fielding, et al, "Hypertext Transfer Protocol --
1681		      HTTP/1.1", RFC 2616, June	1999.

1683	   [MD5]      R. Rivest, "The MD5 Message-Digest Algorithm", RFC 1321,
1684		      April 1992.

1686	   [NAPTR]    M. Mealling, R. Daniel, "The Naming Authority Pointer
1687		      (NAPTR) DNS Resource Record", RFC	2915, September	2000.

1689	   [NLMPerm]  M. Byrnes, "Defining NLM's Commitment to the Permanence of
1690		      Electronic Information", ARL 212:8-9, October 2000,
1691		      http://www.arl.org/newsltr/212/nlm.html.

1693	   [PURL]     K. Shafer, et al,	"Introduction to Persistent Uniform
1694		      Resource Locators", 1996,
1695		      http://purl.oclc.org/OCLC/PURL/INET96.

1697	   [REG]      J. Kunze,	"Resource Metadata Vocabulary",	work in
1698		      progress.

1700	   [URI]      T. Berners-Lee, et al, "Uniform Resource Identifiers
1701		      (URI):  Generic Syntax", RFC 2396, August	1998.

1703	   [URNBIB]   C. Lynch,	et al, "Using Existing Bibliographic Identifiers
1704		      as Uniform Resource Names", RFC 2288, February 1998.

1706	   [URNSYN]   R. Moats,	"URN Syntax", RFC 2141,	May 1997.

1708	   [URNNID]   L. Daigle, et al,	"URN Namespace Definition Mechanisms",
1709		      RFC 2611,	June 1999.

1711	   [TELNET]   J. Postel, J.K. Reynolds,	"Telnet	Protocol Specification",
1712		      RFC 854, May 1983.

1714	12.  Appendix:	An NLM Prototype ARK Service

1716	   The US National Library of Medicine (NLM) has an experimental,
1717	   prototype ARK service under development.  It	is being made available
1718	   for purposes	of demonstrating various aspects of the	ARK system, but
1719	   is subject to temporary or permanent	withdrawal (without notice)
1720	   depending upon the circumstances of the small research group
1721	   responsible for making it available.	 It is described at:

1723	       http://ark.nlm.nih.gov/

1725	   Comments and	feedback may be	addressed to rodgers@nlm.nih.gov.

1727	13.  Appendix:	Current	ARK Name Authority Table

1729	   This	appendix contains a copy of the	Name Authority Table (a	file) at
1730	   the time of writing.	 It may	be loaded into a local filesystem (e.g.,
1731	   /etc/natab) for use in mapping NAAs (Name Assigning Authorities) to
1732	   NMAHs (Name Mapping Authority Hostports).  It contains Perl code that
1733	   can be copied into a	standalone script that processes the table (as a
1734	   file).  Because this	is still a proposed file, none of the values in
1735	   it are real.

1737	   #
1738	   # Name Assigning Authority /	Name Mapping Authority Lookup Table
1739	   #	 Last change:	22 February 2001
1740	   #	 Reload	from:	http://ark.nlm.nih.gov/etc/natab
1741	   #	 Mirrored at:	http://www.ckm.ucsf.edu/people/jak/home/etc/natab
1742	   #			http://....../etc/natab
1743	   #	 To register:	mailto:jak@ckm.ucsf.edu?Subject=naareg
1744	   #	 Process with:	Perl script at end of this file	(optional)
1745	   #
1746	   # Each NAA appears at the beginning of a line with the NAA Number
1747	   # first, a colon, and an ARK	or URL to a statement of naming	policy
1748	   # (see http://ark.nlm.nih.gov/naapolicyeg.html for an example).
1749	   # All the NMA hostports that	service	an NAA are listed, one per
1750	   # line, indented, after the corresponding NAA line.
1751	   #
1752	   #   US Library of Congress
1753	   12025:  http://www.loc.gov/xxx/naapolicy.html
1754		   foobar.zaf.org
1755		   sneezy.dopey.com
1756	   #
1757	   #   US National Library of Medicine
1758	   12026:  http://www.nlm.nih.gov/xxx/naapolicy.html
1759		   lhc.nlm.nih.gov:8080
1760		   foobar.zaf.org
1761		   sneezy.dopey.com
1762	   #
1763	   #   US National Agriculture Library
1764	   12027:  http://www.nal.gov/xxx/naapolicy.html
1765		   foobar.zaf.gov:80
1766	   #
1767	   #---	end of data ---
1768	   # The enclosed Perl script takes an NAA as argument and outputs
1769	   # the NMAs in this file listed under	any matching NAA.
1770	   #
1771	   # my	$naa = shift;
1772	   # while (<>)	{
1773	   #	 next if (! /^$naa:/);
1774	   #	 while (<>) {
1775	   #	     last if (!	/^[#\s]./);
1776	   #	     print "$1\n" if (/^\s+(\S+)/);
1777	   #	 }
1778	   # }
1779	   # end of file
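
   For readers who prefer Python, the lookup performed by the Perl
   script above can be sketched as follows.  This is an illustrative
   sketch only: it assumes the table is stored with NAA lines and
   comment lines starting in column one (that is, without the
   indentation used to display it in this document), and the names
   lookup_nmahs and SAMPLE are hypothetical.  The sample data repeats
   the placeholder values shown above.

```python
import re

SAMPLE = """\
#   US National Library of Medicine
12026:  http://www.nlm.nih.gov/xxx/naapolicy.html
        lhc.nlm.nih.gov:8080
        foobar.zaf.org
        sneezy.dopey.com
#
#   US National Agriculture Library
12027:  http://www.nal.gov/xxx/naapolicy.html
        foobar.zaf.gov:80
#
#--- end of data ---
"""


def lookup_nmahs(natab_text, naa):
    """Return the NMA hostports listed under the NAA number `naa`."""
    hostports = []
    in_block = False
    for line in natab_text.splitlines():
        if re.match(r"\d+:", line):
            # An NAA line: digits and a colon starting in column one.
            in_block = line.startswith(naa + ":")
            continue
        if in_block:
            # Hostports are the indented lines; comment lines are skipped.
            found = re.match(r"\s+(\S+)", line)
            if found:
                hostports.append(found.group(1))
    return hostports


if __name__ == "__main__":
    for hostport in lookup_nmahs(SAMPLE, "12026"):
        print(hostport)
```

   Unlike the Perl script, which reads the table from standard input
   or from files named on the command line, this sketch takes the
   table as a string so that it can be exercised without a local
   /etc/natab file.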

1781	14.  Copyright Notice

1783	   Copyright (C) The Internet Society (2002).  All Rights Reserved.

1785	   This	document and translations of it	may be copied and furnished to
1786	   others, and derivative works	that comment on	or otherwise explain it
1787	   or assist in	its implementation may be prepared, copied, published
1788	   and distributed, in whole or	in part, without restriction of	any
1789	   kind, provided that the above copyright notice and this paragraph are
1790	   included on all such	copies and derivative works.  However, this
1791	   document itself may not be modified in any way, such	as by removing
1792	   the copyright notice	or references to the Internet Society or other
1793	   Internet organizations, except as needed for	the  purpose of
1794	   developing Internet standards in which case the procedures for
1795	   copyrights defined in the Internet Standards	process	must be
1796	   followed, or	as required to translate it into languages other than
1797	   English.

1799	   The limited permissions granted above are perpetual and will	not be
1800	   revoked by the Internet Society or its successors or	assigns.

1802	   This	document and the information contained herein is provided on an
1803	   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
1804	   TASK	FORCE DISCLAIMS	ALL WARRANTIES,	EXPRESS	OR IMPLIED, INCLUDING
1805	   BUT NOT LIMITED TO ANY WARRANTY THAT	THE USE	OF THE INFORMATION
1806	   HEREIN WILL NOT INFRINGE ANY	RIGHTS OR ANY IMPLIED WARRANTIES OF
1807	   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

1809	   The IETF invites any	interested party to bring to its attention any
1810	   copyrights, patents or patent applications, or other	proprietary
1811	   rights which	may cover technology that may be required to practice
1812	   this	standard.  Please address the information to the IETF Executive
1813	   Director.

1815	Expires	20 August 2002
1816				   Table of Contents

1818	Status of this Document
1819	Abstract
1820	1.  Introduction
1821	1.1.  Three Reasons to Use ARKs
1822	1.2.  Organizing Support for ARKs
1823	1.3.  A Definition of Identifier
1824	2.  ARK Anatomy
1825	2.1.  The Name Mapping Authority Hostport (NMAH)
1826	2.2.  The Name Assigning Authority Number (NAAN)
1827	2.3.  The Name Part
1828	2.3.1.  Names that Reveal Object Hierarchy
1829	2.3.2.  Names that Reveal Object Variants
1830	2.3.3.  Hyphens are Ignored
1831	2.4.  Normalization and Lexical Equivalence
1832	2.5.  Naming Considerations
1833	3.  Assigners of ARKs
1834	4.  Finding a Name Mapping Authority
1835	4.1.  Looking Up NMAHs in a Globally Accessible File
1836	4.2.  Looking up NMAHs Distributed via DNS
1837	5.  Generic ARK Service Definition
1838	5.1.  Generic ARK Access Service (access, location)
1839	5.2.  Generic Policy Service (permanence, naming, etc.)
1840	5.3.  Generic Description Service
1841	6.  Overview of the HTTP Key Mapping Protocol (HKMP)
1842	7.  Overview of Electronic Resource Citations (ERCs)
1843	7.1.  ERC Syntax
1844	7.2.  ERC Stories
1845	7.3.  The ERC Anchoring Story
1846	7.4.  ERC Elements
1847	7.5.  ERC Element Values
1848	7.6.  ERC Element Encoding and Dates
1849	7.7.  ERC Stub Records and Internal Support
1850	8.  Advice to Web Clients
1851	9.  Security Considerations
1852	10.  Authors' Addresses
1853	11.  References
1854	12.  Appendix:  An NLM Prototype ARK Service
1855	13.  Appendix:  Current ARK Name Authority Table
1856	14.  Copyright Notice