idnits 2.17.1 

draft-kunze-ark-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 4 instances of too long lines in the document, the longest one
     being 5 characters in excess of 72.

  ** The abstract seems to contain references ([NMAH]), which it shouldn't. 
     Please replace those with straight textual mentions of the documents in
     question.

  == There are 13 instances of lines with non-RFC2606-compliant FQDNs in the
     document.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == Line 439 has weird spacing: '...eful to  remem...'

  == Line 1673 has weird spacing: '...for the  purpo...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (8 March 2001) is 8444 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'DCORE'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'DOI'

  ** Obsolete normative reference: RFC  822 (ref. 'EMHDRS') (Obsoleted by RFC
     2822)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ERC'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'HKMP'

  ** Obsolete normative reference: RFC 2616 (ref. 'HTTP') (Obsoleted by RFC
     7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235)

  ** Downref: Normative reference to an Informational RFC: RFC 1321 (ref.
     'MD5')

  ** Obsolete normative reference: RFC 2915 (ref. 'NAPTR') (Obsoleted by RFC
     3401, RFC 3402, RFC 3403, RFC 3404)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'NLMPerm'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'PURL'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'REG'

  ** Obsolete normative reference: RFC 2396 (ref. 'URI') (Obsoleted by RFC
     3986)

  ** Downref: Normative reference to an Informational RFC: RFC 2288 (ref.
     'URNBIB')

  ** Obsolete normative reference: RFC 2141 (ref. 'URNSYN') (Obsoleted by RFC
     8141)

  ** Obsolete normative reference: RFC 2611 (ref. 'URNNID') (Obsoleted by RFC
     3406)


     Summary: 15 errors (**), 0 flaws (~~), 5 warnings (==), 10 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Internet-Draft: draft-kunze-ark-01.txt                          J. Kunze
2	ARK Identifier Scheme                    University of California (UCSF)
3	Expires 8 September 2001                                R. P. C. Rodgers
4	                                         US National Library of Medicine
5	                                                            8 March 2001

7	                  The ARK Persistent Identifier Scheme

9	          (http://www.ckm.ucsf.edu/people/jak/home/ark-01.txt)
10	          (http://www.ckm.ucsf.edu/people/jak/home/ark-01.ps)

12	Status of this Document

14	   This document is an Internet-Draft and is in full conformance with
15	   all provisions of Section 10 of RFC2026.

17	   Internet-Drafts are working documents of the Internet Engineering
18	   Task Force (IETF), its areas, and its working groups.  Note that
19	   other groups may also distribute working documents as Internet-
20	   Drafts.

22	   Internet-Drafts are draft documents valid for a maximum of six months
23	   and may be updated, replaced, or obsoleted by other documents at any
24	   time.  It is inappropriate to use Internet-Drafts as reference
25	   material or to cite them other than as ``work in progress.''

27	   The list of current Internet-Drafts can be accessed at
28	   http://www.ietf.org/ietf/1id-abstracts.txt

30	   The list of Internet-Draft Shadow Directories can be accessed at
31	   http://www.ietf.org/shadow.html.

33	   Distribution of this document is unlimited.  Please send comments to
34	   jak@ckm.ucsf.edu.

36	   Copyright (C) The Internet Society (2001).  All Rights Reserved.

38	Abstract

40	   The ARK (Archival Resource Key) is a scheme intended to facilitate
41	   the persistent naming and retrieval of information objects.  It
42	   comprises an identifier syntax and three services.  An ARK has four
43	   components:

45	                  ark:[NMAH]/NAAN/Name

47	   the prefix "ark:", the (optional and mutable) Name Mapping Authority
48	   Hostport (NMAH, where "hostport" is a hostname followed optionally by
49	   a colon and port number), the Name Assigning Authority Number (NAAN),
50	   and the assigned Name.  The NAAN and Name together form the immutable
51	   persistent identifier for the object.

53	   An ARK request is an ARK to which is appended a service request
54	   beginning with a question mark.  Use of an ARK request proceeds in
55	   two steps.  First, the NMAH, if not specified, is discovered based on
56	   the NAAN.  Two methods for discovery are proposed:  one is file
57	   based, the other based on the DNS NAPTR record.  Second, the ARK
58	   request is submitted to the NMAH.  Three ARK services are defined,
59	   gaining access to:  (1) the object (or a sensible substitute), (2) a
60	   description of the object (metadata), and (3) a description of the
61	   commitment made by the NMA regarding the persistence of the object
62	   (policy).  These services are defined initially to use the HTTP
63	   protocol, given the World Wide Web's pre-eminence among Internet
64	   information retrieval systems.  When the NMAH is specified, the
65	   "ark:" prefix may be replaced with "http://", to produce a valid URL
66	   that can gain access to ARK services using an unmodified Web client.

68	1.  Introduction

70	   This document describes a scheme for the high-quality naming of
71	   information resources.  The scheme, called the Archival Resource Key
72	   (ARK), is well suited to long-term access and identification for any
73	   information resources that accommodate reasonably regular electronic
74	   description.  This includes digital documents, databases, software,
75	   and websites, as well as physical objects (such as books, bones, and
76	   statues) and intangible objects (chemicals, diseases, vocabulary
77	   terms, performances).  Hereafter the term "object" refers to an
78	   information resource.  The term ARK itself refers both to the scheme
79	   and to any single identifier that conforms to it.

81	   Schemes for persistent identification of network-accessible objects
82	   are not new.  In the early 1990's, the design of the Uniform Resource
83	   Name [URNSYN] responded to the observed failure rate of URLs by
84	   articulating an indirect, non-hostname-based naming scheme and the
85	   need for responsible name management.  Meanwhile, promoters of the
86	   Digital Object Identifier [DOI] succeeded in building a community of
87	   providers around a mature software system that supports name
88	   management.  The Persistent Uniform Resource Locator [PURL] is a
89	   third scheme that has the unique advantage of working with unmodified
90	   web browsers.  The ARK scheme is a new approach.

92	   A founding principle of the ARK is that persistence is purely a
93	   matter of service.  Persistence is neither inherent in an object nor
94	   conferred on it by a particular naming syntax.  Rather, persistence
95	   is achieved through a provider's successful stewardship of objects
96	   and their identifiers.  The highest level of persistence will be
97	   reinforced by a provider's robust contingency, redundancy, and
98	   succession strategies.  It is further safeguarded to the extent that
99	   a provider's mission is shielded from marketplace and political
100	   instabilities.

102	1.1.  Three Reasons to Use ARKs

104	   The first requirement of an ARK is to give users a link from an
105	   object to a promise of stewardship for it.  That promise is a multi-
106	   faceted covenant that binds the word of an identified service
107	   provider to a specific set of responsibilities.  No one can tell if
108	   successful stewardship will take place because no one can predict the
109	   future.  Reasonable conjecture, however, may be based on past
110	   performance.  There must be a way to tie a promise of persistence to
111	   a provider's demonstrated or perceived ability -- its reputation --
112	   in that arena.  Provider reputations would then rise and fall as
113	   promises are observed variously to be kept and broken.  This is
114	   perhaps the best way we have for gauging the strength of any
115	   persistence promise.

117	   The second requirement of an ARK is to give users a link from an
118	   object to a description of it.  The problem with a naked identifier
119	   is that without a description real identification is incomplete.
120	   Identifiers common today are relatively opaque, though some contain
121	   ad hoc clues that reflect fleeting life cycle events such as the
122	   address of a short stay in a filesystem hierarchy.  Possession of
123	   both an identifier and an object is some improvement, but positive
124	   identification may still be elusive since the object itself need not
125	   include a matching identifier or be transparent enough to reveal its
126	   identity without significant research.  In either case, what is
127	   called for is a record bearing witness to the identifier's
128	   association with the object, as supported by a recorded set of object
129	   characteristics.  This descriptive record is partly an identification
130	   "receipt" with which users and archivists can verify an object's
131	   identity after brief inspection and a plausible match with recorded
132	   characteristics such as title and size.  Among the recorded
133	   characteristics, a checksum (e.g., [MD5]) recorded at the time of
134	   last handling may assist automated identification of digital objects
135	   (although checksums will require recomputation periodically if
136	   extremely persistent objects' bitstreams change as predicted due to
137	   inevitable media migration).

139	   The final requirement of an ARK is to give users a link to the object
140	   itself (or to a copy) if at all possible.  Persistent access is the
141	   central duty of an ARK, with persistent identification playing a
142	   vital but supporting role.  Object access may not be feasible for
143	   various reasons, such as catastrophic loss of the object, a licensing
144	   agreement that keeps an archive "dark" for a period of years, or when
145	   an object's own lack of tangible existence precludes normal concepts
146	   of access (e.g., a vocabulary term might be accessed through its
147	   definition).  In such cases the ARK's identification role assumes a
148	   much higher profile.  But attempts to simplify the persistence
149	   problem by decoupling access from identification and concentrating
150	   exclusively on the latter are of questionable utility.  A perfect
151	   system for assigning forever unique identifiers might be created, but
152	   if it did so without reducing access failure rates, no one would be
153	   interested.  The central issue -- which may be summed up as the "HTTP
154	   404 Not Found" problem -- would not have been addressed.

156	1.2.  Organizing Support for ARKs

158	   Co-location of persistent access and identification services is
159	   natural.  Any organization undertaking persistent identification and
160	   description is in an advantaged position to undertake persistent
161	   access, and vice versa.  The former task becomes all the easier if
162	   the organization controls, owns, or otherwise has clear access to the
163	   objects.  Similarly, the latter cannot be managed without at least
164	   internal support for the former, since collection management
165	   activities such as monitoring, acquisition, verification, and change
166	   control all require record keeping and accountability.  Organizing
167	   ARK services under one roof tends to make sense.

169	   ARK support is not for everybody.  By requiring specific, revealed
170	   commitments to preservation, object access, and description, the bar
171	   for providing ARK services is high.  On the other hand, it would be
172	   hard to grant credence to a persistence promise from an organization
173	   that could not muster the minimum ARK services.  Not that there isn't
174	   a business model for an ARK-like, description-only service built on
175	   top of another organization's full complement of ARK services.  For
176	   example, there might be competition at the description level for
177	   abstracting and indexing a body of scientific literature archived in
178	   a combination of open and fee-based repositories.  Such a business
179	   would benefit more from persistence than it would directly support
180	   it.

182	1.3.  A Definition of Identifier

184	   Heretofore, persistence discussion has been hampered by a borrowed
185	   meaning for "identifier" that emerged as a side effect of defining
186	   the Uniform Resource Identifier in [URI]:

188	        (formerly)  An identifier is a sequence of characters with a
189	        restricted syntax ... that can act as a reference to something
190	        that has identity.

192	   The term works in context, but falters when employed for persistence.
193	   Troubling phrases arise, such as,

195	        "The goal is to create an identifier that does not break."

197	   As defined this kind of identifier "breaks" when it sustains damage
198	   to its character sequence, but really what breaks has to do with the
199	   identifier's reference role.  The following definition is proposed.

201	        (new definition)  An identifier is an association between a
202	        string (a sequence of characters) and an information resource.
203	        That association is made manifest by a record (e.g., a
204	        cataloging or other metadata record) that binds the identifier
205	        string to a set of identifying resource characteristics.

207	   The identifier (the association) must be vouched for by some sort of
208	   record.  In the complete absence of any testimony (e.g., metadata)
209	   regarding an association, a would-be identifier string is a
210	   meaningless sequence of characters.  To keep an externally visible
211	   but otherwise internal identifier string opaque to outsiders, for
212	   example, it suffices for an organization not to disclose the nature
213	   of its association.  For our immediate purpose, actual existence of
214	   an association record is more important than its authenticity.  If
215	   one is lucky an object carries its own identifier as part of itself
216	   (e.g., imprinted on the first page), but in processes such as
217	   resource discovery and retrieval the typical object is often unwieldy
218	   or unavailable (such as when licensing restrictions are in effect).
219	   A metadata record that includes the identifier string is the next
220	   best thing -- a conveniently manipulable surrogate that can act as
221	   both an association "receipt" and "declaration".

223	   It now makes sense to speak of preventing an identifier, as an
224	   association, from breaking.  Having said that, this document still
225	   (ab)uses the terms "ARK" and "identifier" as shorthands to refer to
226	   identifier strings, in other words, to sequences of characters.  Thus
227	   a discussion of ARK syntax refers to a string format, not an
228	   association format.  The context should make the meaning clear.

230	2.  ARK Anatomy

232	   An ARK is represented by a sequence of characters (a string) that
233	   begins with the prefix "ark:".  Here is a diagrammed example.

235	                  ark:foobar.zaf.org/12025/654xz321
236	                  \__/\____________/ \___/ \______/
237	                   |    (optional)     |      |
238	           ARK Prefix       |          |    Name (assigned by the NAA)
239	                            |          |
240	         Name Mapping Authority       Name Assigning Authority
241	                Hostport (NMAH)        Number (NAAN)

243	   The ARK syntax can be summarized,

245	                  ark:[NMAH]/NAAN/Name

247	   where the NMAH is in brackets to indicate that it is optional.
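
   As an illustration only, the summarized syntax can be split into its
   parts with a short Python sketch.  The function name parse_ark and
   the regular expression are assumptions made for this example, not
   part of the scheme; the sketch also assumes the 5-digit or 9-digit
   NAAN form described in the NAAN section.

```python
import re

# Hypothetical pattern for ark:[NMAH]/NAAN/Name (an illustration, not a
# normative grammar). The NMAH is whatever sits between "ark:" and the
# first slash; it may be empty.
ARK_PATTERN = re.compile(
    r"^ark:(?P<nmah>[^/]*)/(?P<naan>\d{5}|\d{9})/(?P<name>[^/?]+)$",
    re.IGNORECASE,
)


def parse_ark(ark):
    """Return (nmah, naan, name); nmah is None when it is omitted."""
    match = ARK_PATTERN.match(ark)
    if match is None:
        raise ValueError("not a recognizable ARK: %r" % ark)
    nmah = match.group("nmah") or None
    return nmah, match.group("naan"), match.group("name")
```

   For example, parse_ark("ark:/12025/654xz321") yields
   (None, "12025", "654xz321").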

249	2.1.  The Name Mapping Authority Hostport (NMAH)

251	   After the prefix may appear an optional Name Mapping Authority
252	   Hostport (NMAH) that is a temporary address where ARK service
253	   requests may be sent.  It consists of an Internet hostname or
254	   hostport combination having the same format and semantics as the
255	   hostport part of a URL.  The most important thing about the NMAH is
256	   that it is "identity inert" from the point of view of object
257	   identification.  In other words, ARKs that differ only in the
258	   optional NMAH part identify the same object.  Thus, for example, the
259	   following three ARKs are synonyms for but one information resource:

261	                    ark:foobar.zaf.org/12025/654xz321
262	                  ark:sneezy.dopey.com/12025/654xz321
263	                                  ark:/12025/654xz321

265	   The NMAH makes it easy to derive an identifier that is actionable in
266	   today's web browsers (i.e., a URL).  This amounts to substituting
267	   "http://" for the "ark:" prefix (although in most browsers simply
268	   deleting "ark:" works).  The first example ARK above thus becomes

270	                  http://foobar.zaf.org/12025/654xz321
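
   The substitution just described can be sketched in a few lines of
   Python.  The helper name ark_to_url is hypothetical, and the sketch
   assumes the ARK already carries an NMAH; an ARK without one needs
   the discovery step first.

```python
def ark_to_url(ark):
    """Replace the "ark:" prefix with "http://" (illustrative helper)."""
    prefix = "ark:"
    if ark[:len(prefix)].lower() != prefix:
        raise ValueError("not an ARK: %r" % ark)
    rest = ark[len(prefix):]
    # An ARK that starts "ark:/" has no NMAH, so no host to contact yet.
    if rest.startswith("/") or not rest:
        raise ValueError("ARK has no NMAH; discover one first")
    return "http://" + rest
```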

272	   The NMAH part is temporary, disposable, and replaceable.  Over time
273	   the NMAH will likely stop working and have to be replaced with a
274	   currently active service provider.  This relies on a mapping
275	   authority discovery process, of which two alternate methods are
276	   outlined in a later section.  Meanwhile, a carefully chosen NMAH can
277	   be as durable as any Internet domain name, and so may last for a
278	   decade or longer.  Users should be prepared, however, to refresh the
279	   NMAH because the one found in an ARK may have stopped working.

281	   The above method for creating an actionable identifier from an ARK
282	   (replacing "ark:" with "http://") is also temporary.  Assuming that
283	   the reign of [HTTP] in information retrieval will end one day, ARKs
284	   will have to be converted into new kinds of actionable identifiers.
285	   In any event, if ARKs see widespread use, web browsers would
286	   presumably evolve to perform this simple transformation
287	   automatically.

289	2.2.  The Name Assigning Authority Number (NAAN)

291	   The next part of the ARK is the Name Assigning Authority Number
292	   (NAAN) enclosed in `/' (slash) characters.  This part is always
293	   required, as it identifies the organization that originally assigned
294	   the Name of the object.  It is used to discover a currently valid
295	   NMAH and to provide top-level partitioning of the space of all ARKs.
296	   NAANs are registered in a manner similar to URN Namespaces, but they
297	   are pure numbers consisting of 5 digits or 9 digits.  Thus, the first
298	   100,000 registered NAAs fit compactly into the 5 digits, and if
299	   growth warrants, the next billion fit into the 9 digit form.  In
300	   either case the fixed odd number of digits helps reduce the chances
301	   of finding a NAAN out of context and confusing it with nearby
302	   quantities such as 4-digit dates.
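
   A minimal check of the NAAN form might look as follows.  The
   function name is an assumption for this example; the check covers
   only the digit count and says nothing about whether the number is
   actually registered.

```python
import re


def is_well_formed_naan(naan):
    """True if naan consists of exactly 5 or exactly 9 ASCII digits."""
    return re.fullmatch(r"[0-9]{5}|[0-9]{9}", naan) is not None
```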

304	2.3.  The Name Part

306	   The final part of the ARK is the Name assigned by the NAA, and it is
307	   also required.  The Name is a string of visible ASCII characters and
308	   should be less than 128 bytes in length.  The length restriction
309	   keeps the ARK short enough to append ordinary ARK request strings
310	   without running into transport restrictions within HTTP GET requests.
311	   Characters may be letters, digits, or any of these seven characters:

313	       =   @   $   _   *   '   #

315	   The characters `/', `+', and `?' are reserved and must not be used at
316	   this time.  A `-' (hyphen) may appear in an ARK, but must be ignored
317	   in lexical comparisons.  The `%' character is reserved for %-encoding
318	   all other octets that would appear in the ARK string, in the same
319	   manner as for URIs [URI].  A %-encoded octet consists of a `%'
320	   followed by two hex digits; for example, "%7d" stands in for `}'.
321	   Lower case hex digits are preferred to reduce the chances of false
322	   acronym recognition; thus it is better to use "%acT" instead of
323	   "%ACT".  The character `%' itself must be represented using "%25".
324	   As with URNs, %-encoding permits ARKs to support legacy namespaces
325	   (e.g., ISBN, ISSN, SICI) that have less restricted character
326	   repertoires [URNBIB].
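
   The character rules above can be approximated by the following
   Python sketch.  The names NAME_PATTERN and is_valid_name are
   hypothetical, and the sketch treats the "should be less than 128
   bytes" guidance as a hard limit, which is a simplifying assumption.

```python
import re

# Letters, digits, the seven listed characters, the hyphen, or a
# %-encoded octet ("%" followed by two hex digits). The reserved
# characters "/", "+", and "?" are deliberately absent.
NAME_PATTERN = re.compile(r"(?:[A-Za-z0-9=@$_*'#-]|%[0-9A-Fa-f]{2})+")


def is_valid_name(name):
    """Check a Name part against the character and length rules."""
    return len(name) < 128 and NAME_PATTERN.fullmatch(name) is not None
```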

328	   The creation of names that include linguistically based constructs
329	   (having recognizable meaning from natural language) is strongly
330	   discouraged if long-term persistence is a naming priority.  Such
331	   names do not age or travel well.  Names that look more or less like
332	   numbers avoid common problems that defeat persistence and
333	   international acceptance.  The use of digits is highly recommended.
334	   Mixing in non-vowel alphabetic characters is a relatively safe and
335	   easy way to achieve more compact names, although any character
336	   repertoire can work if potentially troublesome names will be
337	   discarded during a screening process.

339	2.4.  Lexical Equivalence

341	   Hyphens are always ignored in ARKs.  Hyphens may be added to an ARK's
342	   Name part for readability, or during the formatting and wrapping of
343	   text lines, but (as in phone numbers) they are treated as if they
344	   were not present.  Thus, like the NMAH, hyphens are "identity inert"
345	   in comparing ARKs for equivalence.  For example, the following ARKs
346	   are equivalent for purposes of comparison and ARK service access:

348	                                  ark:/12025/65-4-xz-321
349	                  ark:sneezy.dopey.com/12025/654--xz32-1
350	                                  ark:/12025/654xz321

352	   To determine if two or more ARKs identify the same object, the ARKs
353	   are compared for lexical equivalence after first being normalized.
354	   Since ARK strings may appear in various forms (e.g., having different
355	   NMAHs), normalizing them minimizes the chances that comparing two ARK
356	   strings for equality will fail unless they actually identify
357	   different objects.  In a specified-host ARK (one having an NMAH), the
358	   NMAH never participates in such comparisons.

360	   Normalization of ARKs for the purpose of octet-by-octet equality
361	   comparison consists of two steps for each ARK.  First, any upper case
362	   letters in the "ark:" prefix and %-encoded hex digits are converted
363	   to lower case.  The case of all other letters in the ARK string must
364	   be preserved.  Then, any NMAH is removed and all hyphens are removed.
365	   The resulting ARK string is now normalized.  Comparisons between
366	   normalized ARKs are case-sensitive, meaning that upper case letters
367	   are considered different from their lower case counterparts.
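
   The two normalization steps can be sketched in Python as follows.
   The function name normalize_ark is an assumption for illustration;
   the sketch does not validate the NAAN or Name, it only applies the
   steps described above.

```python
import re


def normalize_ark(ark):
    """Normalize an ARK string for octet-by-octet comparison."""
    if ark[:4].lower() != "ark:":
        raise ValueError("not an ARK: %r" % ark)
    # Step 1: lower-case the prefix and the hex digits of %-encoded
    # octets; the case of every other letter is preserved.
    body = re.sub(r"%[0-9A-Fa-f]{2}", lambda m: m.group(0).lower(), ark[4:])
    # Step 2: remove any NMAH (the text before the first slash) and
    # remove all hyphens.
    slash = body.find("/")
    if slash < 0:
        raise ValueError("ARK lacks a NAAN: %r" % ark)
    return "ark:" + body[slash:].replace("-", "")
```

   Under this sketch the three equivalent ARKs shown earlier in this
   section all normalize to "ark:/12025/654xz321", so a plain string
   comparison of the results reports them as equal.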

369	   To keep ARK string variation to a minimum, no reserved ARK characters
370	   should be %-encoded unless it is deliberately to conceal their
371	   reserved meanings.  No non-reserved ARK characters should ever be %-
372	   encoded.  Finally, no %-encoded character should ever appear in an
373	   ARK in its decoded form.

375	2.5.  Naming Considerations

377	   The ARK has different goals from the URI, so it has different
378	   character set requirements.  Because linguistic constructs imperil
379	   persistence, for ARKs non-ASCII character support is unimportant.
380	   ARKs and URIs share goals of transcribability and transportability
381	   within web documents, so characters are required to be visible, non-
382	   conflicting with HTML/XML syntax, and not subject to tampering during
383	   transmission across common transport gateways.  Add the goal of
384	   making an undelimited ARK recognizable in running prose, as in
385	   ark:/12025/=@_22*$, and certain punctuation characters (e.g., comma,
386	   period) end up being excluded from the ARK lest the end of a phrase
387	   or sentence be mistaken as part of the ARK.

389	   A valuable technique for provision of persistent objects is to try to
390	   have the complete identifier appear on, with, or near its retrieved
391	   object.  An object encountered at a moment in time when its discovery
392	   context has long since disappeared could then easily be traced back
393	   to its metadata, to alternate versions, to updates, etc.  This has
394	   seen reasonable success, for example, in book publishing and software
395	   distribution.

397	   If persistence is the goal, a deliberate local strategy for
398	   systematic name assignment is crucial.  Names must be chosen with
399	   great care.  Poorly chosen and managed names will devastate any
400	   persistence strategy, and they do not discriminate based on naming
401	   scheme.  Whether a mistakenly re-assigned identifier is a URN, DOI,
402	   PURL, URL, or ARK, the damage -- failed access and confusion -- is
403	   not mitigated more in one scheme than in another.  Conversely, in-
404	   house efforts to manage names responsibly will go much further
405	   towards safeguarding persistence than any choice of naming scheme or
406	   name resolution technology.

408	   Hostnames appearing in any identifier meant to be persistent must be
409	   chosen with extra care.  The tendency in hostname selection has
410	   traditionally been to choose a token with recognizable attributes,
411	   such as a corporate brand, but that tendency wreaks havoc with
412	   persistence that is to outlive brands, corporations, subject
413	   classifications, and natural language semantics (e.g., what did the
414	   three letters "gay" mean forty, twenty, and two years ago?).  Today's
415	   recognized and correct attributes are tomorrow's stale or incorrect
416	   attributes.  In making hostnames (any names, actually) long-term
417	   persistent, it helps to eliminate recognizable identity to the extent
418	   possible.  This affects selection of any name based on URLs,
419	   including PURLs and the explicitly disposable NMAHs.  There is no
420	   excuse for a provider that manages its internal names impeccably not
421	   to exercise the same care in choosing what could be an exceptionally
422	   durable hostname, especially if it would form the prefix for all the
423	   provider's URL-based external names.  Registering an opaque hostname
424	   in the ".org" or ".net" domain would not be a bad start.

426	   Dubious persistence speculation does not make selecting naming
427	   strategies any easier.  For example, despite rumors to the contrary,
428	   there are really no obvious reasons why the organizations registering
429	   DNS names, URN Namespaces, and DOI publisher IDs should have among
430	   them one that is intrinsically more fallible than the next.
431	   Moreover, it is a misconception that the demise of DNS and of HTTP
432	   need adversely affect the persistence of URLs.  At such a time,
433	   certainly URLs from the present day would not then be actionable by
434	   our present-day mechanisms, but resolution systems for future non-
435	   actionable URLs are no harder to imagine than resolution systems for
436	   present-day non-actionable URNs and DOIs.  There is no more stable a
437	   namespace than one that is dead and frozen, and that would then
438	   characterize the space of names bearing the "http://" prefix.  It is
439	   useful to  remember that just because hostnames have been carelessly
440	   chosen in their brief history does not mean that they are unsuitable
441	   in NMAHs (and URLs) intended for use in situations demanding the
442	   highest level of persistence available in the Internet environment.
443	   A well-planned name assignment strategy is everything.

445	3.  Assigners of ARKs

447	   A Name Assigning Authority (NAA) is an organization that creates (or
448	   delegates creation of) long-term associations between identifiers and
449	   information objects.  Examples of NAAs include national libraries,
450	   national archives, and publishers.  An NAA may arrange with an
451	   external organization for identifier assignment.  The US Library of
452	   Congress, for example, allows OCLC (the Online Computer Library
453	   Center, a major world cataloger of books) to create associations
454	   between Library of Congress call numbers (LCCNs) and the books that
455	   OCLC processes.  A cataloging record is generated that testifies to
456	   each association, and the identifier is included by the publisher in
457	   places like the front matter of a book.

459	   An NAA does not so much create an identifier as create an
460	   association.  The NAA first draws an identifier from its namespace,
461	   which is the set of all identifiers under its control.  It then
462	   records the assignment of the identifier to an information object
463	   having sundry witnessed characteristics, such as a particular author
464	   and modification date.  A namespace is usually reserved for an NAA by
465	   agreement with recognized community organizations (such as IANA and
466	   ISO) that all names containing a particular string be under its
467	   control.  In the ARK an NAA is represented by the Name Assigning
468	   Authority Number (NAAN).

470	   The ARK namespace reserved for an NAA is the set of names bearing its
471	   particular NAAN.  For example, all strings beginning with
472	   "ark:/12025/" are under control of the NAA registered under 12025,
473	   which might be the National Library of Finland.  Because each NAA has
474	   a different NAAN, names from one namespace cannot conflict with those
475	   from another.  Each NAA is free to assign names from its namespace
476	   (or delegate assignment) according to its own policies.  These
477	   policies must be documented in a manner similar to the declarations
478	   required for URN Namespace registration [URNNID].
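
   As an illustration of the namespace rule above, here is a minimal
   Python sketch that extracts the NAAN from a string of the form
   "ark:/NAAN/Name".  The function name naan_of is hypothetical and not
   part of this specification; the sketch assumes the NAAN is whatever
   lies between "ark:/" and the next "/".

```python
def naan_of(ark):
    """Return the NAAN of a string shaped like 'ark:/NAAN/Name', else None."""
    prefix = "ark:/"
    if not ark.startswith(prefix):
        return None
    naan, sep, _name = ark[len(prefix):].partition("/")
    # Require a non-empty NAAN followed by a "/" separator.
    if not naan or not sep:
        return None
    return naan
```

   Under that assumption, two ARKs fall in the same NAA's namespace
   exactly when the function returns the same NAAN for both.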

480	   For now, registration of ARK NAAs is in a bootstrapping phase.  To
481	   register, please read about the mapping authority discovery file in
482	   the next section and send email to jak@ckm.ucsf.edu.

484	4.  Finding a Name Mapping Authority

486	   In order to derive an actionable identifier from an ARK, a hostport
487	   (hostname or hostname plus port combination) for a working Name
488	   Mapping Authority (NMA) must be found.  An NMA is a service that is
489	   able to respond to the three basic ARK service requests.  Relying on
490	   registration and client-side discovery, NMAs make known which NAAs'
491	   identifiers they are willing to service.

493	   Upon encountering an ARK, a user (or client software) looks inside it
494	   for the optional NMAH part (the hostport of the NMA's ARK service).
495	   If it contains an NMAH that is working, this NMAH discovery step may
496	   be skipped; the NMAH effectively uses the beginning of an ARK to
497	   cache the results of a prior mapping authority discovery process.  If
498	   a new NMAH needs to be found, the client looks inside the ARK again
499	   for the NAAN (Name Assigning Authority Number).  Querying a global
500	   database, it then uses the NAAN to look up all current NMAHs that
501	   service ARKs issued by the identified NAA.  The global database is
502	   key, and two specific methods for querying it are given in this
503	   section.

505	   In the interests of long-term persistence, however, ARK mechanisms
506	   are first defined in high-level, protocol-independent terms so that
507	   mechanisms may evolve and be replaced over time without compromising
508	   fundamental service objectives.  Either or both specific methods
509	   given here may eventually be supplanted by better methods since, by
510	   design, the ARK scheme does not depend on a particular method, but
511	   only on having some method to locate an active NMAH.

513	   At the time of issuance, at least one NMAH for an ARK should be
514	   prepared to service it.  That NMA may or may not be administered by
515	   the Name Assigning Authority (NAA) that created it.  Consider the
516	   following hypothetical example of providing long-term access to a
517	   cancer research journal.  The publisher wishes to turn a profit and
518	   the National Library of Medicine wishes to preserve the scholarly
519	   record.  An agreement might be struck whereby the publisher would act
520	   as the NAA and the national library would archive the journal issue
521	   when it appears, but without providing direct access for the first
522	   six months.  During the first six months of peak commercial
523	   viability, the publisher would retain exclusive delivery rights and
524	   would charge access fees.  Again, by agreement, both the library and
525	   the publisher would act as NMAs, but during that initial period the
526	   library would redirect requests for issues less than six months old
527	   to the publisher.  At the end of the waiting period, the library
528	   would then begin servicing requests for issues older than six months
529	   by tapping directly into its own archives.  Meanwhile, the publisher
530	   might routinely redirect incoming requests for older issues to the
531	   library.  Long-term access is thereby preserved, and so is the
532	   commercial incentive to publish content.

534	   There is never a requirement that an NAA also run an NMA service,
535	   although that seems a likely scenario.  Over time NAAs and NMAs
536	   would come and go.  One NMA would succeed another, and there might be
537	   many NMAs serving the same ARKs simultaneously (e.g., as mirrors or
538	   as competitors).  There might also be asymmetric but coordinated NMAs
539	   as in the library-publisher example above.

541	4.1.  Looking Up NMAHs in a Globally Accessible File

543	   This subsection describes a way to look up NMAHs using a simple text
544	   file.  For efficient access the file may be stored in a local
545	   filesystem, but it needs to be reloaded periodically to incorporate
546	   updates.  It is not expected that the size of the file or frequency
547	   of update should impose an undue maintenance or searching burden any
548	   time soon, for even a primitive linear search of a file with ten
549	   thousand NAAs is a subsecond operation on modern server machines.
550	   The proposed file strategy is similar to the /etc/hosts file strategy
551	   that supported Internet host address lookup for a period of years
552	   before the advent of the Domain Name System [DNS].

554	   A copy of the current file (at the time of writing) appears in an
555	   appendix and is available on the web.  A minimal version of the file
556	   appears below.  Comment lines (lines that begin with `#') explain the
557	   format and give the file's modification time, reloading address, and
558	   NAA registration instructions.  There is even a Perl script that
559	   processes the file embedded in the file's comments.  Because this is
560	   still a proposed file, none of the values in it are real.

562	       #
563	       # Name Assigning Authority / Name Mapping Authority Lookup Table
564	       #     Last change:   22 February 2001
565	       #     Reload from:   http://ark.nlm.nih.gov/etc/natab
566	       #     Mirrored at:   http://www.ckm.ucsf.edu/people/jak/home/etc/natab
567	       #                    http://....../etc/natab
568	       #     To register:   mailto:jak@ckm.ucsf.edu?Subject=naareg
569	       #     Process with:  Perl script at end of this file (optional)
570	       #
571	       # Each NAA appears at the beginning of a line with the NAA Number
572	       # first, a colon, and an ARK or URL to a statement of naming policy
573	       # (see http://ark.nlm.nih.gov/naapolicyeg.html for an example).
574	       # All the NMA hostports that service an NAA are listed, one per
575	       # line, indented, after the corresponding NAA line.
576	       #
577	       #   US Library of Congress
578	       12025:  http://www.loc.gov/xxx/naapolicy.html
579	               foobar.zaf.org
580	               sneezy.dopey.com
581	       #
582	       #   US National Library of Medicine
583	       12026:  http://www.nlm.nih.gov/xxx/naapolicy.html
584	               lhc.nlm.nih.gov:8080
585	               foobar.zaf.org
586	               sneezy.dopey.com
587	       #
588	       #   US National Agriculture Library
589	       12027:  http://www.nal.gov/xxx/naapolicy.html
590	               foobar.zaf.gov:80
591	       #
592	       #--- end of data ---
593	       # The enclosed Perl script takes an NAA as argument and outputs
594	       # the NMAs in this file listed under any matching NAA.
595	       #
596	       # my $naa = shift;
597	       # while (<>) {
598	       #     next if (! /^$naa:/);
599	       #     while (<>) {
600	       #         last if (! /^[#\s]./);
601	       #         print "$1\n" if (/^\s+(\S+)/);
602	       #     }
603	       # }
604	       # end of file
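
   For readers who prefer Python to the embedded Perl, the following is
   a rough sketch of a reader for the table format shown above.  The
   function name parse_natab is hypothetical.  The sketch assumes that,
   in the real file, NAA lines start in the first column and NMA
   hostport lines are indented (the listing above is indented only for
   display), and that everything after the "end of data" marker can be
   ignored.

```python
def parse_natab(text):
    """Map each NAA Number to (policy, [NMA hostports]) from table text."""
    table = {}
    current = None
    for line in text.splitlines():
        if line.startswith("#--- end of data"):
            break                      # the rest is the embedded script
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue                   # blank line or comment line
        if line[0].isspace():
            # Indented line: an NMA hostport for the preceding NAA.
            if current is not None:
                table[current][1].append(stripped.split()[0])
        else:
            # NAA line: number, colon, then an ARK or URL naming the policy.
            number, _, policy = stripped.partition(":")
            current = number.strip()
            table[current] = (policy.strip(), [])
    return table
```

   Looking up the NMAHs for a NAAN is then a dictionary access, for
   example parse_natab(text)["12026"][1].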

606	4.2.  Looking up NMAHs Distributed via DNS

608	   This subsection introduces a method for looking up NMAHs that is
609	   based on the method for discovering URN resolvers described in
610	   [NAPTR].  It relies on querying the DNS system already installed in
611	   the background infrastructure of most networked computers.  A query
612	   is submitted to DNS asking for a list of resolvers that match a given
613	   NAAN.  DNS distributes the query to the particular DNS servers that
614	   can best provide the answer, unless the answer can be found more
615	   quickly in a local DNS cache as a side-effect of a recent query.
616	   Responses come back inside Name Authority Pointer (NAPTR) records.
617	   The normal result is one or more candidate NMAHs.

619	   In its full generality the [NAPTR] algorithm ambitiously accommodates
620	   a complex set of preferences, orderings, protocols, mapping services,
621	   regular expression rewriting rules, and DNS record types.  This
622	   subsection proposes a drastic simplification of it for the special
623	   case of ARK mapping authority discovery.  The simplified algorithm is
624	   called Maptr.  It uses only one DNS record type (NAPTR) and restricts
625	   most of its field values to constants.  The following hypothetical
626	   excerpt from a DNS data file for the NAAN known as 12026 shows three
627	   example NAPTR records ready to use with the Maptr algorithm.

629	     12026.ark.arpa.
630	     ;; US National Library of Medicine
631	     ;;       order pref flags service regexp replacement
632	      IN NAPTR  0     0   "h"  "ark"     ""   lhc.nlm.nih.gov:8080
633	      IN NAPTR  0     0   "h"  "ark"     ""   foobar.zaf.org
634	      IN NAPTR  0     0   "h"  "ark"     ""   sneezy.dopey.com

636	   All the fields are held constant for Maptr except for the "flags" and
637	   "replacement" fields.  The "service" field contains the constant
638	   value "ark" so that NAPTR records participating in the Maptr
639	   algorithm will not be confused with other NAPTR records.  The "order"
640	   and "pref" fields are held to 0 (zero) and otherwise ignored for now;
641	   the algorithm may evolve to use these fields for ranking decisions
642	   when usage patterns and local administrative needs are better
643	   understood.

645	   When a Maptr query returns a record with a flags field of "h" (for
646	   hostport, a Maptr extension to the NAPTR flags), the replacement
647	   field contains the NMAH (hostport) of an ARK service provider.  When
648	   a query returns a record with a flags field of "" (the empty string),
649	   the client needs to submit a new query containing the domain name
650	   found in the replacement field.  This second sort of record exploits
651	   the distributed nature of DNS by redirecting the query to another
652	   domain name.  It looks like this.

654	     12345.ark.arpa.
655	     ;; Digital Library Consortium
656	     ;;       order pref flags service regexp replacement
657	      IN NAPTR  0     0    ""  "ark"     ""   dlc.spct.org.

659	   Here is the Maptr algorithm for ARK mapping authority discovery.  In
660	   it replace <NAAN> with the NAAN from the ARK for which an NMAH is
661	   sought.

663	        (1) Initialize the DNS query:  type=NAPTR,
664	        query=<NAAN>.ark.arpa.

666	        (2) Submit the query to DNS and retrieve (NAPTR) records,
667	        discarding any record that does not have "ark" for the service
668	        field.

670	        (3) All remaining records with a flags field of "h" contain
671	        candidate NMAHs in their replacement fields.  Set them aside, if
672	        any.

674	        (4) Any record with an empty flags field ("") has a replacement
675	        field containing a new domain name to which a subsequent query
676	        should be redirected.  For each such record, set
677	        query=<replacement> then go to step (2).  When all such records
678	        have been recursively exhausted, go to step (5).

680	        (5) All redirected queries have been resolved and a set of
681	        candidate NMAHs has been accumulated from step (3).  If there
682	        are zero NMAHs, exit -- no mapping authority was found.  If
683	        there are one or more NMAHs, choose one using any criteria you
684	        wish, then exit.

686	   The global database thus distributed via DNS and the Maptr algorithm
687	   can easily be seen to mirror the contents of the Name Authority Table
688	   file described in the previous section.
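
   The five Maptr steps can be sketched in Python as follows.  This is
   an illustration only, not a definitive implementation.  No real DNS
   library is used: lookup is a hypothetical caller-supplied function
   that returns the NAPTR records for a domain name as (flags, service,
   replacement) tuples.  The function name maptr and the guard against
   redirection loops are assumptions added for the sketch and are not
   part of the algorithm as stated.

```python
def maptr(naan, lookup):
    """Collect candidate NMAHs for a NAAN following Maptr steps (1)-(5)."""
    candidates = []
    seen = set()                      # added guard against redirect loops
    pending = [naan + ".ark.arpa."]   # step (1): the initial query name
    while pending:
        query = pending.pop()
        if query in seen:
            continue
        seen.add(query)
        for flags, service, replacement in lookup(query):   # step (2)
            if service != "ark":
                continue              # not a Maptr record; discard it
            if flags == "h":
                candidates.append(replacement)   # step (3): an NMAH
            elif flags == "":
                pending.append(replacement)      # step (4): redirect
    return candidates                 # step (5): empty means none found
```

   The caller then picks one NMAH from the returned list by whatever
   criteria it wishes, as step (5) allows.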

690	5.  Generic ARK Service Definition

692	   An ARK request's output is delivered information; examples include
693	   the object itself, a policy declaration (e.g., a promise of support),
694	   a descriptive metadata record, or an error message.  ARK services
695	   must be couched in high-level, protocol-independent terms if
696	   persistence is to outlive today's networking infrastructural
697	   assumptions.  The high-level ARK service definitions listed below are
698	   followed in the next section by a concrete method (one of many
699	   possible methods) for delivering these services with today's
700	   technology.

702	5.1.  Generic ARK Access Service (access, location)

704	   Returns (a copy of) the object or a redirect to the same, although a
705	   sensible object proxy may be substituted.  Examples of sensible
706	   substitutes include,

708	     - a table of contents instead of a large complex document,
709	     - a home page instead of an entire web site hierarchy,
710	     - a rights clearance challenge before accessing protected data,
711	     - directions for access to an offline object (e.g., a book),
712	     - a description of an intangible object (a disease, an event), or
713	     - an applet acting as "player" for a large multimedia object.

715	   May also return a discriminated list of alternate object locators.
716	   If access is denied, returns an explanation of the object's current
717	   (perhaps permanent) inaccessibility.

719	5.2.  Generic Policy Service (permanence, naming, etc.)

721	   Returns declarations of policy and support commitments for given
722	   ARKs.  Declarations are returned in either a structured metadata
723	   format or a human readable text format; sometimes one format may
724	   serve both purposes.  Policy subareas may be addressed in separate
725	   requests, but the following areas should be covered:  object
726	   permanence, object naming, object fragment addressing, and
727	   operational service support.

729	   The permanence declaration for an object is a rating defined with
730	   respect to an identified permanence provider (guarantor), and may
731	   include the following aspects.  One permanence rating framework is
732	   given in [NLMPerm].

734	        (a) "object availability" -- whether and how access to the
735	        object is supported (e.g., online 24x7, or offline only),

737	        (b) "identifier validity" -- under what conditions the
738	        identifier will be or has been re-assigned,

740	        (c) "content invariance" -- under what conditions the content of
741	        the object is subject to change, and

743	        (d) "change history" -- documentation, whether abbreviated or
744	        detailed, of any or all corrections, migrations, revisions, etc.

746	   Naming policy for an object includes an historical description of the
747	   NAA's (and its successor NAA's) policies regarding differentiation of
748	   objects.  It may include the following aspects.

750	        (e) "similarity" -- (or "unity") the limit, defined by the NAA,
751	        to the level of dissimilarity beyond which two similar objects
752	        warrant separate identifiers but before which they share one
753	        single identifier, and

755	        (f) "granularity" -- the limit, defined by the NAA, to the level
756	        of object subdivision beyond which sub-objects do not warrant
757	        separately assigned identifiers but before which sub-objects are
758	        assigned separate identifiers.

760	   Addressing policy for an object includes a description of how, during
761	   access, object components (e.g., paragraphs, sections) or views
762	   (e.g., image conversions) may or may not be "addressed", in other
763	   words, how the NMA permits arguments or parameters to modify the
764	   object delivered as the result of an ARK request.  If supported,
765	   these sorts of operations would provide things like byte-ranged
766	   fragment delivery and open-ended format conversions, or any set of
767	   possible transformations that would be too numerous to list or to
768	   identify with separately assigned ARKs.

770	   Operational service support policy includes a description of general
771	   operational aspects of the NMA service, such as after-hours staffing
772	   and trouble reporting procedures.

774	5.3.  Generic Description Service

776	   Returns a description of the object.  Descriptions are returned in
777	   either a structured metadata format or a human readable text format;
778	   sometimes one format may serve both purposes.  A description must at
779	   a minimum answer the who, what, when, and where questions concerning
780	   an expression of the object.  Standalone descriptions should be
781	   accompanied by the modification date and source of the description
782	   itself.  May also return discriminated lists of ARKs that are related
783	   to the given ARK.

785	6.  Overview of the HTTP Key Mapping Protocol (HKMP)

787	   The HTTP Key Mapping Protocol (HKMP) is a way of taking a key (a kind
788	   of identifier) and asking such questions as, what information does
789	   this identify and how permanent is it?  [HKMP] is in fact one
790	   specific method under development for delivering ARK services.  The
791	   protocol runs over HTTP to exploit the web browser's current pre-
792	   eminence as user interface to the Internet.  HKMP is designed so that
793	   a person can enter ARK requests directly into the location field of
794	   current browser interfaces.  Because it runs over HTTP, HKMP can be
795	   simulated and tested within keyboard-based [TELNET] sessions.

797	   The asker (a person or client program) starts with an identifier,
798	   such as an ARK or a URL.  The identifier reveals to the asker (or
799	   allows the asker to infer) the Internet host name and port number of
800	   a server system that responds to questions.  Here, this is just the
801	   NMAH that is obtained by inspection and possibly lookup based on the
802	   ARK's NAAN.  The asker then sets up an HTTP session with the server
803	   system, sends a question via an HKMP request (contained within an
804	   HTTP request), receives an answer via an HKMP response (contained
805	   within an HTTP response), and closes the session.  That concludes the
806	   connected portion of the protocol.

808	   An HKMP request is a string of characters beginning with a `?'
809	   (question mark) that is appended to the identifier string.  The
810	   resulting string is sent as an argument to HTTP's GET command.
811	   Request strings too long for GET may be sent using HTTP's POST
812	   command.  The three most common requests correspond to three
813	   degenerate special cases that keep the user's learning and typing
814	   burden low.  First, a simple key with no request at all is the same
815	   as an ordinary access request.  Thus a plain ARK entered into a
816	   browser's location field behaves much like a plain URL, and returns
817	   access to the primary identified object, for instance, an HTML
818	   document.

820	   The second special case is a minimal ARK description request string
821	   consisting of just "?".  For example, entering the string,

823	           ark.nlm.nih.gov/12025/psbbantu?

825	   into the browser's location field directly precipitates a request for
826	   a metadata record describing the object identified by
827	   ark:/12025/psbbantu.  The browser, unaware of HKMP, prepares and
828	   sends an HTTP GET request in the same manner as for a URL.  HKMP is
829	   designed so that the response (indicated by the returned HTTP content
830	   type) is normally displayed, whether the output is structured for
831	   machine processing (text/plain) or formatted for human consumption
832	   (text/html).

834	   In the following example HKMP session, each line has been annotated
835	   to include a line number and whether it was the client or server that
836	   sent it.  Without going into much depth, the session has three pieces
837	   separated from each other by blank lines:  the client's piece (lines
838	   1-3), the server's HTTP/HKMP response headers (4-7), and the body of
839	   the server's response (8-13).  The first and last lines (1 and 13)
840	   correspond to the client's steps to start the TCP session and the
841	   server's steps to end it, respectively.

843	    1  C: ...  [opens session]
844	       C: GET http://ark.nlm.nih.gov/12025/psbbantu? HTTP/1.1
845	       C:
846	       S: HTTP/1.1 200 OK
847	    5  S: Content-Type: text/plain
848	       S: HKMP-Status: 0.1 200 OK
849	       S:
850	       S: erc:
851	       S: who:    Lederberg, Joshua
852	   10  S: what:   Studies of Human Families for Genetic Linkage
853	       S: when:   1974
854	       S: where:  http://profiles.nlm.nih.gov/BB/A/N/T/U/_/bbantu.pdf
855	       S: ...  [closes session]

857	   The first two server response lines (4-5) above are typical of HTTP.
858	   The next line (6) is peculiar to HKMP, and indicates the HKMP version
859	   and a normal return status.  The balance of the response (8-12) is
860	   the single metadata record that comprises the ARK description service
861	   response.  The record is in the format of an Electronic Resource
862	   Citation [ERC], which is discussed in more detail in the next
863	   section.  For now, note that it contains four elements that answer
864	   the top priority questions regarding an expression of the object:
865	   who played a major role in expressing it, what the expression was
866	   called, when it was created, and where the expression may be found.
867	   This quartet of elements comes up again and again in ERCs.
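
   To make the three pieces of the session concrete, here is a small
   Python sketch that separates a server response into its status line,
   its headers, and its body.  The function name split_response is
   hypothetical, and the sketch assumes the response is already
   available as a list of lines with line endings removed.  Reading the
   HKMP-Status value as "version, code, reason" is inferred from the
   example above rather than taken from the [HKMP] specification.

```python
def split_response(lines):
    """Split response lines into (status_line, headers, body_lines)."""
    status_line = lines[0]
    headers = {}
    i = 1
    # Header lines run until the first blank line.
    while i < len(lines) and lines[i] != "":
        label, _, value = lines[i].partition(":")
        headers[label.strip()] = value.strip()
        i += 1
    # Everything after the blank line is the body (the metadata record).
    return status_line, headers, lines[i + 1:]


def split_hkmp_status(value):
    """Split an HKMP-Status value such as '0.1 200 OK' into three parts."""
    version, code, reason = value.split(" ", 2)
    return version, int(code), reason
```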

869	   The third degenerate special case of an ARK request (and no other
870	   cases will be described in this document) is the string "??",
871	   corresponding to a minimal permanence policy request.  It can be seen
872	   in use appended to an ARK (on line 2) in the example session that
873	   follows.

875	    1  C: ...  [opens session]
876	       C: GET http://ark.nlm.nih.gov/12025/psbbantu?? HTTP/1.1
877	       C:
878	       S: HTTP/1.1 200 OK
879	    5  S: Content-Type: text/plain
880	       S: HKMP-Status: 0.1 200 OK
881	       S:
882	       S: erc:
883	       S: who:    Lederberg, Joshua
884	   10  S: what:   Studies of Human Families for Genetic Linkage
885	       S: when:   1974
886	       S: where:  http://profiles.nlm.nih.gov/BB/A/N/T/U/_/bbantu.pdf
887	       S: erc-support:
888	       S: who:    NIH/NLM/LHNCBC
889	   15  S: what:   Permanent, Unchanging Content
890	       S: when:   2001 04 21
891	       S: where:  http://ark.nlm.nih.gov/yy22948
892	       S: ...  [closes session]

894	   Again, a single metadata record (lines 8-17) is returned, but it
895	   consists of two segments.  The first segment (8-12) gives the same
896	   basic citation information as in the previous example.  It is
897	   returned in order to establish context for the persistence
898	   declaration in the second segment (13-17).

900	   Each segment in an ERC tells a different story relating to the
901	   object, so although the same four questions (elements) appear in
902	   each, the answers depend on the segment's story type.  While the
903	   first segment tells the story of an expression of the object, the
904	   second segment tells the story of the support commitment made to it:
905	   who made the commitment, what the nature of the commitment was, when
906	   it was made, and where a fuller explanation of the commitment may be
907	   found.
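
   The three degenerate request forms described in this section (no
   request string, "?", and "??") can be summarized in a short Python
   sketch.  The function name hkmp_url and the service keywords
   "access", "description", and "policy" are hypothetical labels chosen
   for the illustration; the URL layout simply follows the example
   sessions above.

```python
def hkmp_url(nmah, naan, name, service="access"):
    """Build the URL for one of the three basic ARK requests."""
    suffix = {
        "access": "",        # plain key: ordinary access request
        "description": "?",  # minimal description request
        "policy": "??",      # minimal permanence policy request
    }[service]
    return "http://" + nmah + "/" + naan + "/" + name + suffix
```

   For instance, hkmp_url("ark.nlm.nih.gov", "12025", "psbbantu",
   "policy") yields the string sent on line 2 of the second example
   session.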

909	7.  Overview of Electronic Resource Citations (ERCs)

911	   An Electronic Resource Citation (or ERC, pronounced e-r-c) [ERC] is a
912	   simple, compact, and printable record designed to hold data
913	   associated with an information resource.  By design, the ERC is a
914	   metadata format that balances the needs for expressive power, very
915	   simple machine processing, and direct human manipulation.

917	   A founding principle of the ERC is that direct human contact with
918	   metadata will be a necessary and sufficient condition for the near
919	   term rapid development of metadata standards, systems, and services.
920	   Thus the machine-processable ERC format must only minimally strain
921	   people's ability to read, understand, change, and transmit ERCs
922	   without their relying on intermediation with specialized software
923	   tools.  The basic ERC needs to be succinct, transparent, and
924	   trivially parseable by software.

926	   In the current Internet, it is natural to seriously consider using
927	   XML as an exchange format because of predictions that it will obviate
928	   many ad hoc formats and programs, and unify much of the world's
929	   information under one reliable data structuring discipline that is
930	   easy to generate, verify, parse, and render.  It appears, however,
931	   that XML is still only catching on after years of standards work and
932	   implementation experience.  The reasons for it are unclear, but for
933	   now very simple XML interpretation is still out of reach.  Another
934	   important caution is that XML structures are hard on the eyeballs,
935	   taking up an amount of display (and page) space that significantly
936	   exceeds that of traditional formats.  Until these conflicts with ERC
937	   principle are resolved, XML is not a first choice for representing
938	   ERCs.  Borrowing instead from the data structuring format that
939	   underlies the successful spread of email and web services, the first
940	   ERC format is based on email and HTTP headers (RFC822) [EMHDRS].
941	   There is a naturalness to its label-colon-value format (seen in the
942	   previous section) that barely needs explanation to a person beginning
943	   to enter ERC metadata.

945	   Besides simplicity of ERC system implementation and data entry
946	   mechanics, ERC semantics (what the record and its constituent parts
947	   mean) must also be easy to explain.  ERC semantics are based on a
948	   reformulation and extension of the Dublin Core [DCORE] hypothesis,
949	   which suggests that the fifteen Dublin Core metadata elements have a
950	   key role to play in cross-domain resource description.  The ERC
951	   design recognizes that the Dublin Core's primary contribution is the
952	   international, interdisciplinary consensus that identified fifteen
953	   semantic buckets (element categories), regardless of how they are
954	   labeled.  The ERC then adds a definition for a record and some
955	   minimal compliance rules.  In pursuing the limits of simplicity, the
956	   ERC design combines and relabels some Dublin Core buckets to isolate
957	   a tiny kernel (subset) of four elements for basic cross-domain
958	   resource description.

960	   For the cross-domain kernel, the ERC uses the four basic elements --
961	   who, what, when, and where -- to pretend that every object in the
962	   universe can have a uniform minimal description.  Each has a name or
963	   other identifier, a location, some responsible person or party, and a
964	   date.  It doesn't matter what type of object it is, or whether one
965	   plans to read it, interact with it, smoke it, wear it, or navigate
966	   it.  Of course, this approach is flawed because uniformity of
967	   description for some object types requires more semantic contortion
968	   and sacrifice than for others.  That is why at the beginning of this
969	   document, the ARK was said to be suited to objects that accommodate
970	   reasonably regular electronic description.

972	   While insisting on uniformity at the most basic level provides
973	   powerful cross-domain leverage, the semantic sacrifice is great for
974	   many applications.  So the ERC also permits a semantically rich and
975	   nuanced description to co-exist in a record along with a basic
976	   description.  In that way both sophisticated and naive recipients of
977	   the record can extract the level of meaning from it that best suits
978	   their needs and abilities.  Key to unlocking the richer description
979	   is a controlled vocabulary of ERC record types (not explained in this
980	   document) that permit knowledgeable recipients to apply defined sets
981	   of additional assumptions to the record.

983	7.1.  ERC Syntax

985	   An ERC record is a sequence of metadata elements ending in a blank
986	   line.  An element consists of a label, a colon, and an optional
987	   value.  Here is an example of a record with five elements.

989	        erc:
990	        who: Gibbon, Edward
991	        what: The Decline and Fall of the Roman Empire
992	        when: 1781
993	        where: http://www.ccel.org/g/gibbon/decline/

995	   A long value may be folded (continued) onto the next line by
996	   inserting a newline and indenting the next line.  A value can be thus
997	   folded across multiple lines.  Here are two example elements, each
998	   folded across four lines.

1000	        who/created: University of California, San Francisco, AIDS
1001	             Program at San Francisco General Hospital | University
1002	             of California, San Francisco, Center for AIDS Prevention
1003	             Studies
1004	        what/Topic:
1005	              Heart Attack | Heart Failure
1006	             | Heart
1007	                              Diseases

1009	   An element value folded across several lines is treated as if the
1010	   lines were joined together on one long line.  For example, the second
1011	   element from the previous example is considered equivalent to

1013	        what/Topic: Heart Attack | Heart Failure | Heart Diseases

1015	   An element value may contain multiple values, each one separated from
1016	   the next by a `|' (pipe) character.  The element from the previous
1017	   example contains three values.

1019	   For annotation purposes, any line beginning with a `#' (hash)
1020	   character is treated as if it were not present; this is a "comment"
1021	   line (a feature not available in email or HTTP headers).  For
1022	   example, the following element is spread across four lines and
1023	   contains two values:

1025	        what/Topic:
1026	             Heart Attack
1027	        #    | Heart Failure  -- hold off until next review cycle
1028	             | Heart Diseases
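
   The syntax rules of this subsection (label-colon-value elements,
   folding, `|' separated values, and `#' comment lines) are simple
   enough to sketch in a few lines of Python.  The function name
   parse_erc_elements is hypothetical, and the sketch assumes a record
   whose element labels start in the first column (the examples above
   are indented only for display).  It is an illustration of the rules,
   not a reference parser.

```python
def parse_erc_elements(text):
    """Return a list of (label, [values]) pairs for one ERC record."""
    elements = []                      # each entry: [label, raw value]
    for line in text.splitlines():
        if line.startswith("#"):
            continue                   # comment line: treated as absent
        if not line.strip():
            break                      # a blank line ends the record
        if line[0].isspace():
            # Folded (indented) line: join it to the current value.
            if elements:
                elements[-1][1] += " " + line.strip()
        else:
            label, _, value = line.partition(":")
            elements.append([label.strip(), value.strip()])
    result = []
    for label, raw in elements:
        # Split on the pipe and collapse whitespace left by folding.
        values = [" ".join(v.split()) for v in raw.split("|") if v.strip()]
        result.append((label, values))
    return result
```

   Applied to the commented example above, the sketch returns a single
   what/Topic element holding the two values "Heart Attack" and "Heart
   Diseases".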

1030	7.2.  ERC Stories

1032	   An ERC record is organized into one or more distinct segments,
1033	   where each segment tells a story about a different aspect of the
1034	   information resource.  A segment boundary occurs whenever a segment
1035	   label (an element beginning with "erc") is encountered.  The basic
1036	   label "erc:" introduces the story of an object's expression (e.g.,
1037	   its publication, installation, or performance).  The label
1038	   "erc-about:" introduces the story of an object's content (what it is
1039	   about) and "erc-support:" introduces the story of a support
1040	   commitment made to it.  A story segment that concerns the ERC itself
1041	   is introduced by the label "erc-from:".  It is an important segment
1042	   that tells the story of the ERC's provenance.  Elements beginning
1043	   with "erc" are reserved for segment labels and their associated story
1044	   types.  From an earlier example, here is an ERC with two segments.

1046	       erc:
1047	       who:    Lederberg, Joshua
1048	       what:   Studies of Human Families for Genetic Linkage
1049	       when:   1974
1050	       where:  http://profiles.nlm.nih.gov/BB/A/N/T/U/_/bbantu.pdf
1051	       erc-support:
1052	       who:    NIH/NLM/LHNCBC
1053	       what:   Permanent, Unchanging Content
1054	       # Note to ops staff:  date needs verification.
1055	       when:   2001 04 21
1056	       where:  http://ark.nlm.nih.gov/yy22948

1058	   Segment stories are told according to journalistic tradition.  While
1059	   any number of pertinent elements may appear in a segment, priority is
1060	   placed on answering the questions who, what, when, and where at the
1061	   beginning of each segment so that readers can make the most important
1062	   selection or rejection decisions as soon as possible.  To make things
1063	   simple, the listed ordering of the questions is maintained in each
1064	   segment (as it happens, most people who have been exposed to this
1065	   storytelling technique are already familiar with the above
1066	   ordering).

1068	   The four questions are answered by using corresponding element
1069	   labels.  The four element labels can be re-used in each story
1070	   segment, but their meaning changes depending on the segment (the
1071	   story type) in which they appear.  In the example above, "who" is
1072	   first used to name a document's author and subsequently used to name
1073	   the permanence guarantor (provider).  Similarly, "when" first lists
1074	   the date of object creation and in the next segment lists the date of
1075	   a commitment decision.  Four labels appearing across three segments
1076	   effectively map to twelve semantically distinct elements.  Distinct
1077	   element meanings are mapped to Dublin Core elements in a later
1078	   section.
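
   As a rough illustration of segment boundaries and of how the same
   label takes on a different meaning per segment, here is a Python
   sketch.  The name split_segments is hypothetical, and the sketch
   assumes one element per line with no continuation lines.

```python
def split_segments(lines):
    """Group ERC lines into (segment_label, [(label, value), ...]) tuples.

    Assumes one element per line, with no continuation lines.
    """
    segments = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines carry no elements
        label, _, value = line.partition(":")
        label, value = label.strip(), value.strip()
        if label.startswith("erc"):
            segments.append((label, []))  # a segment label opens a new story
        elif segments:
            segments[-1][1].append((label, value))
    return segments


record = """\
erc:
who:    Lederberg, Joshua
what:   Studies of Human Families for Genetic Linkage
when:   1974
where:  http://profiles.nlm.nih.gov/BB/A/N/T/U/_/bbantu.pdf
erc-support:
who:    NIH/NLM/LHNCBC
what:   Permanent, Unchanging Content
# Note to ops staff:  date needs verification.
when:   2001 04 21
where:  http://ark.nlm.nih.gov/yy22948
""".splitlines()

# Keying on (segment, label) keeps the two meanings of "who" apart.
meaning = {(seg, label): value
           for seg, elements in split_segments(record)
           for label, value in elements}
```

   Here meaning[("erc", "who")] names the author, while
   meaning[("erc-support", "who")] names the permanence guarantor.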

1080	7.3.  The ERC Anchoring Story

1082	   Each ERC contains an anchoring story.  It is usually found in the
1083	   first segment labeled "erc:" and it concerns an "anchoring"
1084	   expression of the object.  An "anchoring" expression is the one that
1085	   a provider deemed the most suitable basic referent given the audience
1086	   and application for which it produced the ERC.  If it sounds like the
1087	   provider has great latitude in choosing its anchoring expression, it
1088	   is because it does.  A typical anchoring expression in an ERC for a
1089	   born-digital document would be described in the story of the
1090	   document's release on a web site.

1092	   An anchoring story need not be the central descriptive goal of an ERC
1093	   record.  For example, a museum provider may create an ERC for a
1094	   digitized photograph of a painting but choose to anchor it in the
1095	   story of the original painting instead of the story of the electronic
1096	   likeness; although the ERC may through other segments prove to be
1097	   centrally concerned with describing the electronic likeness, the
1098	   provider may have chosen this particular anchoring story in order to
1099	   make the ERC visible in a way that is most natural to patrons (who
1100	   would find the Mona Lisa under da Vinci sooner than they would find
1101	   it under the name of the person who snapped the photograph or scanned
1102	   the image).  In another example, a provider that creates an ERC for a
1103	   dramatic play as an abstract work has the task of describing a piece
1104	   of intangible intellectual property.  To anchor this abstract object
1105	   in the concrete world, if only through a derivative expression, it
1106	   makes sense for the provider to choose one printed edition of the
1107	   play as the anchoring object expression (to describe in the anchoring
1108	   story) of the ERC.

1110	   The anchoring story has special rules designed to keep ERC processing
1111	   simple and predictable.  Each of the four basic elements (who, what,
1112	   when, and where) must be present, unless a best effort to supply it
1113	   fails.  In the event of failure, the element still appears but a
1114	   special value (described later) is used to explain the missing value.
1115	   While the requirement that each of the four elements be present only
1116	   applies to the anchoring story segment, as usual these elements
1117	   appear at the beginning of the segment and may only be used in the
1118	   prescribed order.  A minimal ERC would normally consist of just an
1119	   anchoring story and the element quartet, as illustrated in the next
1120	   example.

1122	       erc:
1123	       who:   National Research Council
1124	       what:  The Digital Dilemma
1125	       when:  2000
1126	       where: http://books.nap.edu/html/digital%5Fdilemma
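
   The anchoring-story rules lend themselves to a simple check.  The
   Python sketch below is illustrative only: check_anchor is a
   hypothetical name, and treating a qualified label such as
   "who/created" as an instance of "who" is an assumption of the
   sketch.

```python
BASIC = ["who", "what", "when", "where"]


def check_anchor(elements):
    """Return (missing, in_order) for the (label, value) pairs of an "erc" segment."""
    # Assumption: a qualified label such as "who/created" counts as "who".
    seen = [label.split("/")[0] for label, _ in elements]
    seen = [label for label in seen if label in BASIC]
    # Every basic element must be present in the anchoring story.
    missing = [label for label in BASIC if label not in seen]
    # The basic elements may only be used in the prescribed order.
    in_order = seen == sorted(seen, key=BASIC.index)
    return missing, in_order


minimal = [
    ("who", "National Research Council"),
    ("what", "The Digital Dilemma"),
    ("when", "2000"),
    ("where", "http://books.nap.edu/html/digital%5Fdilemma"),
]
result = check_anchor(minimal)
```

   For the minimal ERC above, nothing is missing and the ordering
   holds.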

1128	   A minimal ERC can be abbreviated so that it resembles a traditional
1129	   compact bibliographic citation that is nonetheless completely machine
1130	   processable.  The required elements and ordering make it possible to
1131	   eliminate the element labels, as shown here.

1133	       erc: National Research Council | The Digital Dilemma | 2000
1134	              | http://books.nap.edu/html/digital%5Fdilemma

1136	   Although machine readable, this abbreviated ERC format is neither
1137	   required nor permitted at this time in HKMP responses.
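
   To show why the abbreviated form remains machine processable, the
   following Python sketch expands it back into labeled elements.  The
   function name expand_abbreviated is hypothetical, and the sketch
   assumes exactly four `|'-separated values after the "erc:" label.

```python
def expand_abbreviated(text):
    """Expand 'erc: who | what | when | where' into labeled elements."""
    label, _, body = text.partition(":")
    if label.strip() != "erc":
        raise ValueError("expected an abbreviated anchoring story")
    # Collapse line breaks and runs of whitespace before splitting.
    values = [v.strip() for v in " ".join(body.split()).split("|")]
    if len(values) != 4:
        raise ValueError("expected exactly four values")
    # The fixed ordering is what lets the labels be restored.
    return list(zip(["who", "what", "when", "where"], values))


abbreviated = ("erc: National Research Council | The Digital Dilemma | 2000\n"
               "       | http://books.nap.edu/html/digital%5Fdilemma")
expanded = expand_abbreviated(abbreviated)
```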

1139	7.4.  ERC Elements

1141	   As mentioned, the four basic ERC elements (who, what, when, and
1142	   where) take on different specific meanings depending on the story
1143	   segment in which they are used.  By appearing in each segment, albeit
1144	   in different guises, the four elements serve as a valuable mnemonic
1145	   device -- a kind of checklist -- for constructing minimal story
1146	   segments from scratch.  Again, it is only in the anchoring segment
1147	   that all four elements are mandatory.

1149	   An ERC segment allows an unlimited number of elements with the same
1150	   label.  Multiple such elements are treated as if they were combined
1151	   into one element with multiple values.  For example, the three
1152	   elements,

1154	       who: Bullock, T.H.
1155	       who: Achimowicz, J.Z. | Duckrow, R.B. | Spencer, S.S.
1156	       who: Iragui-Madoz, V.J.

1158	   are treated as if all the values were in one element with five
1159	   values:

1161	       who: Bullock, T.H.  | Achimowicz, J.Z. | Duckrow, R.B.
1162	               | Spencer, S.S.  | Iragui-Madoz, V.J.

1164	   Required ordering is preserved as long as all who's precede all
1165	   what's, all what's precede all when's, and all when's precede all
1166	   where's.
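
   The combining rule for repeated labels might be sketched in Python
   like this.  The name merge_elements is hypothetical; the sketch
   assumes elements arrive as (label, value) pairs in record order.

```python
def merge_elements(elements):
    """Combine same-label elements into one multi-valued entry per label."""
    merged = {}
    for label, value in elements:
        values = [v.strip() for v in value.split("|") if v.strip()]
        # Values keep the order in which they appeared in the segment.
        merged.setdefault(label, []).extend(values)
    return merged


authors = merge_elements([
    ("who", "Bullock, T.H."),
    ("who", "Achimowicz, J.Z. | Duckrow, R.B. | Spencer, S.S."),
    ("who", "Iragui-Madoz, V.J."),
])
```

   The three "who" elements collapse into a single entry holding five
   values.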

1168	   Here are some mappings between ERC elements and Dublin Core [DCORE]
1169	   elements.

1171	        Segment     ERC Element     Equivalent Dublin Core Element
1172	       ---------    -----------     ------------------------------
1173	          erc          who          Creator/Contributor/Publisher
1174	          erc          what                Title
1175	          erc          when                Date
1176	          erc          where               Identifier
1177	       erc-about       who                  <none>
1178	       erc-about       what                Subject
1179	       erc-about       when                Coverage (temporal)
1180	       erc-about       where               Coverage (spatial)

1182	   The basic element labels may also be qualified to add nuances to the
1183	   semantic categories that they identify.  Elements are qualified by
1184	   appending a `/' (slash) and a qualifier term.  Often qualifier terms
1185	   appear as the past tense form of a verb because it makes re-using
1186	   qualifiers among elements easier.  Such is the case for three out of
1187	   the four qualifiers appearing in the seven elements below.

1189	       who/created:    ...
1190	       who/published:  ...
1191	       who/modified:   ...
1192	       when/valid:     ...
1193	       when/modified:  ...
1194	       when/published: ...
1195	       when/created:   ...

1197	   Using past tense verbs for qualifiers also reminds providers and
1198	   recipients that element values contain transient assertions that may
1199	   have been true once, but that tend to become less true over time.
1200	   Recipients that don't understand the meaning of a qualifier can fall
1201	   back onto the semantic category (bucket) designated by the
1202	   unqualified element label.  Inevitably recipients (people and
1203	   software) will have diverse abilities in understanding elements and
1204	   qualifiers.

1206	   Any number of other elements and qualifiers may be used in
1207	   conjunction with the quartet of basic segment questions.  The only
1208	   semantic requirement is that they pertain to the segment's story.
1209	   Also, it is only the four basic elements that change meaning
1210	   depending on their segment context.  All other elements have meaning
1211	   independent of the segment in which they appear.  If an element label
1212	   stripped of its qualifier is still not recognized by the recipient, a
1213	   second fall back position is to ignore it and rely on the four basic
1214	   elements.
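
   The two fall back positions can be pictured with a short Python
   sketch.  The name interpret_label and the idea of a recipient
   holding a set of labels it understands are assumptions made for
   illustration.

```python
def interpret_label(label, understood):
    """Return the label a recipient should act on, or None to ignore it."""
    if label in understood:
        return label                 # qualifier (if any) is understood
    base = label.partition("/")[0]
    if base in understood:
        return base                  # first fall back: unqualified category
    return None                      # second fall back: ignore the element


understood = {"who", "what", "when", "where", "when/created"}
```

   A recipient with this set would read "when/created" as is, treat an
   unfamiliar "when/Reviewed" as plain "when", and ignore an element
   whose base label it does not recognize.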

1216	   Elements may be Canonical, Provisional, or Local.  Canonical
1217	   elements are officially recognized via a registry as part of the
1218	   metadata vernacular.  All elements, qualifiers, and segment labels
1219	   used in this document up until now belong to that vernacular.
1220	   Provisional elements are also officially recognized via the registry,
1221	   but have only been proposed for inclusion in the vernacular.  To be
1222	   promoted to the vernacular, a provisional element passes through a
1223	   vetting process during which its documentation must be in order and
1224	   its community acceptance demonstrated.  Local elements are any
1225	   elements not officially recognized in the registry.  The registry
1226	   [REG] is a work in progress.

1228	   Local elements are immediately distinguishable from Canonical or
1229	   Provisional elements because all terms that begin with an upper case
1230	   letter are reserved for spontaneous local use.  No term beginning
1231	   with an upper case letter will ever be assigned Canonical or
1232	   Provisional status, so it should be safe to use such terms for local
1233	   purposes.  Any recipient of external ERCs containing such terms will
1234	   understand them to be part of the originating provider's local
1235	   metadata dialect.  Here's an example ERC with three segments, one
1236	   local element, and two local qualifiers.  The segment boundaries have
1237	   been emphasized by comment lines (which, as before, are ignored by
1238	   processors).

1240	       erc:
1241	       who: Bullock, TH | Achimowicz, JZ | Duckrow, RB
1242	               | Spencer, SS | Iragui-Madoz, VJ
1243	       what: Bicoherence of intracranial EEG in sleep,
1244	               wakefulness and seizures
1245	       when: 1997 12 00
1246	       where: http://cogprints.soton.ac.uk/%{
1247	               documents/disk0/00/00/01/22/index.html %}
1248	       in: EEG Clin Neurophysiol | v103, i6, p661-678 | 1997 12 00
1249	       IDcode: cog00000122
1250	       # ---- new segment ----
1251	       erc-about:
1252	       what/Subcategory: Bispectrum | Nonlinearity | Epilepsy
1253	               | Cooperativity | Subdural | Hippocampus | Higher moment
1254	       # ---- new segment ----
1255	       erc-from:
1256	       who: NIH/NLM/NCBI
1257	       what: pm9546494
1258	       when/Reviewed: 1998 04 18 021600
1259	       where: http://ark.nlm.nih.gov/12025/pm9546494?

1261	   The local element "IDcode" immediately precedes the "erc-about"
1262	   segment, which itself contains an element with the local qualifier
1263	   "Subcategory".  The second to last element also carries the local
1264	   qualifier "Reviewed".  Finally, what might be a provisional element
1265	   "in" appears near the end of the first segment.  It might have been
1266	   proposed as a way to complete a citation for an object originally
1267	   appearing inside another object (such as an article appearing in a
1268	   journal or an encyclopedia).
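
   Because local terms are recognizable by their initial upper case
   letter alone, detecting them needs no registry lookup.  A minimal
   Python sketch, with the hypothetical name local_terms:

```python
def local_terms(label):
    """Return the parts of an element label that are local terms."""
    # A label is an element name optionally followed by '/' and a qualifier.
    return [part for part in label.split("/") if part[:1].isupper()]
```

   Applied to the example above, "IDcode" yields itself,
   "what/Subcategory" yields only "Subcategory", and "in" yields
   nothing.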

1270	7.5.  ERC Element Values

1272	   ERC element values tend to be straightforward strings.  If the
1273	   provider intends something special for an element, it will so
1274	   indicate with markers at the beginning of its value string.  The
1275	   markers are designed to be uncommon enough that they would not likely
1276	   occur in normal data except by deliberate intent.  Markers can only
1277	   occur near the beginning of a string, and once any octet of non-
1278	   marker data has been encountered, no further marker processing is
1279	   done for the element value.  In the absence of markers the string is
1280	   considered pure data; this has been the case with all the examples
1281	   seen thus far.  The fullest form of an element value with all three
1282	   optional markers in place looks like this.

1284	       VALUE =    [markup_flags]    (:ccode)    ,    DATA

1286	   In processing, the first non-whitespace character of an ERC element
1287	   value is examined.  An initial `[' is reserved to introduce a
1288	   bracketed set of markup flags (not described in this document) that
1289	   ends with `]'.  If ERC data is machine-generated, each value string
1290	   may be preceded by "[]" to prevent any of its data from being
1291	   mistaken for markup flags.  Once past the optional markup, the
1292	   remaining value may optionally begin with a controlled code.  A
1293	   controlled code always has the form "(:ccode)", for example,

1295	       who: (:unkn) Anonymous
1296	       what: (:791) Bee Stings

1298	   Any string after such a code is taken to be an uncontrolled (e.g.,
1299	   natural language) equivalent.  The code "unkn" indicates a
1300	   conventional explanation for a missing value (stating that the value
1301	   is unknown).  The remainder of the string makes an equivalent
1302	   statement in a form that the provider deemed most suitable to its
1303	   (probably human) audience.  The code "791" could be a fixed numeric
1304	   topic identifier within an unspecified topic vocabulary.  Any code
1305	   may be ignored by those that do not understand it.

1307	   There are several codes to explain different ways in which a required
1308	   element's value may go missing.

1310	       (:unkn)   unknown (e.g., Anonymous)
1311	       (:unav)   value unavailable indefinitely
1312	       (:none)   never had a value, never will
1313	       (:unac)   temporarily inaccessible
1314	       (:unap)   not applicable (makes no sense)
1315	       (:null)   explicitly empty
1316	       (:igno)   element here only to satisfy syntax
1317	       (:unas)   value unassigned (e.g., untitled painting)
1318	       (:elwr)   value appears elsewhere in record
1319	       (:remo)   withheld, suppressed intentionally

1321	   Once past an optional controlled code, the remaining string value is
1322	   subjected to one final test.  If the next non-whitespace
1323	   character is a `,' (comma), it indicates that the string value is
1324	   "sort-friendly".  This means that the value is (a) laid out with an
1325	   inverted word order useful for sorting items having comparably laid
1326	   out element values (items might be the containing ERC records) and
1327	   (b) that the value may contain other commas that indicate inversion
1328	   points should it become necessary to recover the value in natural
1329	   word order.  Typically, this feature is used to express Western-style
1330	   personal names in family-name-given-name order.  It can also be used
1331	   wherever natural word order might make sorting tricky, such as when
1332	   data contains titles or corporate names.  Here are some example
1333	   elements.

1335	       who:   ,  van Gogh, Vincent
1336	       who:,Howell, III, PhD, 1922-1987, Thurston
1337	       who:, Acme Rocket Factory, Inc., The
1338	       who:, Mao Tse Tung
1339	       who:, McCartney, Paul, Sir,
1340	       what:, Health and Human Services, United States Government
1341	               Department of, The,

1343	   There are rules to use in recovering a copy of the value in natural
1344	   word order, if desired.  The above example strings have the following
1345	   natural word order values, respectively.

1347	       Vincent van Gogh
1348	       Thurston Howell, III, PhD, 1922-1987
1349	       The Acme Rocket Factory, Inc.
1350	       Mao Tse Tung
1351	       Sir Paul McCartney
1352	       The United States Government Department of Health and Human Services
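
   The marker processing described in this section can be sketched in
   Python with one regular expression.  The name parse_value and the
   returned dictionary keys are hypothetical.  The rules for
   recovering natural word order are not given here, so the sketch
   stops at detecting the sort-friendly comma.

```python
import re

# VALUE = [markup_flags] (:ccode) , DATA   -- every marker is optional
_VALUE = re.compile(
    r"\s*(?:\[(?P<flags>[^\]]*)\])?"   # optional bracketed markup flags
    r"\s*(?:\(:(?P<ccode>[^)]*)\))?"   # optional controlled code
    r"\s*(?P<comma>,)?"                # optional sort-friendly marker
    r"\s*(?P<data>.*)",                # everything else is plain data
    re.S,
)


def parse_value(value):
    """Split an ERC element value into its optional markers and its data."""
    m = _VALUE.match(value)            # always matches: all parts are optional
    return {
        "flags": m.group("flags"),     # None when no [..] marker is present
        "ccode": m.group("ccode"),     # None when no (:..) marker is present
        "sort_friendly": m.group("comma") is not None,
        "data": m.group("data").strip(),
    }
```

   For instance, parse_value("(:unkn) Anonymous") reports the code
   "unkn" with data "Anonymous", and a value with no markers comes
   back as pure data.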

1354	7.6.  ERC Element Encoding and Dates

1356	   Some characters that need to appear in ERC element values might
1357	   conflict with special characters used for structuring ERCs, so there
1358	   needs to be a way to include them as literal characters that are
1359	   protected from special interpretation.  This is accomplished through
1360	   an encoding mechanism that resembles the %-encoding familiar to [URI]
1361	   handlers.

1363	   The ERC encoding mechanism also uses `%', but instead of taking two
1364	   following hexadecimal digits, it takes one non-alphanumeric character
1365	   or two alphabetic characters that cannot be mistaken for hex digits.
1366	   It is designed not to be confused with normal web-style %-encoding.
1367	   In particular it can be decoded without risking unintended decoding
1368	   of normal %-encoded data (which would introduce errors).  Here are
1369	   the one-character (non-alphanumeric) ERC encoding extensions.

1371	       ERC       Purpose
1372	       ---     ------------------------------------------------
1373	       %!      decodes to the element separator `|'
1374	       %%      decodes to a percent sign `%'
1375	       %.      decodes to a comma `,'
1376	       %_      a non-character used as syntax shim
1377	       %{      a non-character that begins a ws-squeezed block
1378	       %}      a non-character that ends a ws-squeezed block

1380	   One particularly useful construct in ERC element values is the pair
1381	   of special encoding markers ("%{" and "%}") that indicates a
1382	   "whitespace-squeezed" block.  Whatever string of characters they
1383	   enclose will be treated as if none of the contained whitespace
1384	   (SPACEs, TABs, Newlines) were present.  This comes in handy for
1385	   writing long, multi-part URLs in a readable way.  For example, the
1386	   value in

1388	       where: http://foo.bar.org/node%{
1389	                  ? db = foo
1390	                  & start = 1
1391	                  & end = 5
1392	                  & buf = 2
1393	                  & query = foo + bar + zaf
1394	              %}

1396	   is decoded into an equivalent element, but with a correct and intact
1397	   URL:

1399	   where:
1400	    http://foo.bar.org/node?db=foo&start=1&end=5&buf=2&query=foo+bar+zaf
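
   A single left-to-right pass is enough to decode the one-character
   extensions and whitespace-squeezed blocks.  The Python sketch below
   is an illustration under stated assumptions: erc_decode is a
   hypothetical name, the two-letter codes are not handled, and any
   other `%' sequence (such as web-style "%5F") is passed through
   untouched.

```python
# One-character ERC codes; "%_" is a non-character and decodes to nothing.
_SINGLE = {"!": "|", "%": "%", ".": ",", "_": ""}


def erc_decode(value):
    """Decode one-character ERC %-codes and %{ ... %} squeezed blocks."""
    out = []
    squeezing = False
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "%" and i + 1 < len(value):
            nxt = value[i + 1]
            if nxt == "{":               # begin whitespace-squeezed block
                squeezing, i = True, i + 2
                continue
            if nxt == "}":               # end whitespace-squeezed block
                squeezing, i = False, i + 2
                continue
            if nxt in _SINGLE:
                out.append(_SINGLE[nxt])
                i += 2
                continue
        if squeezing and ch.isspace():   # drop SPACEs, TABs, Newlines
            i += 1
            continue
        out.append(ch)                   # includes '%' of web-style codes
        i += 1
    return "".join(out)


encoded = ("http://foo.bar.org/node%{\n"
           "   ? db = foo\n   & start = 1\n   & end = 5\n"
           "   & buf = 2\n   & query = foo + bar + zaf\n%}")
decoded = erc_decode(encoded)
```

   Scanning once, rather than applying substitutions in sequence,
   avoids decoding the output of "%%" a second time.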

1402	   In a parting word about ERC element values, a commonly recurring
1403	   value type is a date, possibly followed by a time.  ERC dates take on
1404	   one of the following forms:

1406	       1999                (four digit year)
1407	       2000 12 29          (year, month, day)
1408	       2000 12 29 235955   (year, month, day, hour, minute, second)

1410	   In dates, all internal whitespace is squeezed out to achieve a
1411	   normalized form suitable for lexical comparison and sorting.  This
1412	   means that the following dates

1414	       2000 12 29 235955           (recommended for readability)
1415	       2000 12 29 23 59 55
1416	       20   001 229 235  9 5 5
1417	       20001229235955              (normalized date and time)

1419	   are all equivalent.  The first form is recommended for readability.
1420	   The last form (shortest and easiest to compute with) is the
1421	   normalized form.  Hyphens and commas are reserved to create date
1422	   ranges and lists, for example,

1424	       1996-2000                   (a range of five years)
1425	       1952, 1957, 1969            (a list of three years)
1426	       1952, 1958-1967, 1985       (a mixed list of dates and ranges)
1427	       20001229-20001231           (a range of three days)
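
   Date normalization and the range and list conventions can be
   sketched in Python as follows.  The names normalize_date and
   parse_dates are hypothetical, and representing a range as a
   (start, end) tuple is a choice made only for this illustration.

```python
def normalize_date(text):
    """Squeeze out all whitespace to reach the normalized form."""
    return "".join(text.split())


def parse_dates(text):
    """Split a date value into single dates and (start, end) ranges."""
    items = []
    for part in text.split(","):          # commas separate list items
        start, sep, end = part.partition("-")
        if sep:                           # a hyphen marks a range
            items.append((normalize_date(start), normalize_date(end)))
        else:
            items.append(normalize_date(start))
    return items
```

   Normalized values of equal length compare correctly as plain
   strings, which is what makes the form convenient for sorting.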

1429	7.7.  ERC Stub Records and Internal Support

1431	   The ERC design introduces the concept of a "stub" record, which is an
1432	   incomplete ERC record intended to be supplemented with additional
1433	   elements before being released as a standalone ERC record.  Stub ERC
1434	   records have no minimum required elements.  They may be useful in
1435	   supporting internal procedures using the ERC syntax.  Often they rely
1436	   on the convenience and accuracy of automatically supplied elements,
1437	   even the basic ones.  To be ready for external use, however, an ERC
1438	   stub must be transformed into a complete ERC record having the usual
1439	   required elements.  An ERC stub record can be convenient for metadata
1440	   embedded in a document, where elements such as location, modification
1441	   date, and size -- which one would not omit from an externalized
1442	   record -- are omitted simply because they are much better supplied by
1443	   a computation.  A separate local administrative procedure, not
1444	   defined for ERCs in general, would effect the promotion of stubs
1445	   into complete records.

1447	   While the ERC is a general-purpose container for exchange of resource
1448	   descriptions, it does not dictate how records must be internally
1449	   stored, laid out, or assembled by data providers or recipients.
1450	   Arbitrary internal descriptive frameworks can support ERCs simply by
1451	   mapping (e.g., on demand) local records to the ERC container format
1452	   and making them available for export.  Therefore, to support ERCs
1453	   there is no need for a data provider to convert internal data to be
1454	   stored in an ERC format.  On the other hand, any provider (such as
1455	   one just getting started in the business of resource description) may
1456	   choose to store and manipulate local data natively in the ERC format.

1458	8.  Advice to Web Clients

1460	   This section offers some advice to web client software developers.
1461	   It is hard to write about because it tries to anticipate a series of
1462	   events that might lead to native web browser support for ARKs.

1464	   ARKs are envisaged to appear wherever durable object references are
1465	   planned.  Library cataloging records, literature citations, and
1466	   bibliographies are important examples.  In many of these places URLs
1467	   (Uniform Resource Locators) currently stand in, and URNs, DOIs, and
1468	   PURLs have been proposed as alternatives.

1470	   The strings representing ARKs are also envisaged to appear in some of
1471	   the places where URLs currently appear:  in hypertext links (where
1472	   they are not normally shown to users) and in rendered text (displayed
1473	   or printed).  Internet search engines, for example, tend to include
1474	   both actionable and manifest links when listing each item found.  A
1475	   normal HTML link for which the URL is not displayed looks like this.

1477	        <a href = "http://foo.bar.org/index.htm"> Click Here </a>

1479	   The same link with an ARK instead of a URL:

1481	        <a href = "ark:/14697/b12345x"> Click Here </a>

1483	   Web browsers would in general require a small modification to
1484	   recognize and convert this ARK, via mapping authority discovery, to
1485	   an equivalent URL.

1487	        <a href = "http://a.b.org/14697/b12345x"> Click Here </a>
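
   The final rewriting step of that conversion might look like the
   Python sketch below.  It assumes that mapping authority discovery
   has already produced an NMAH (here "a.b.org"), and the name
   ark_to_url is hypothetical.

```python
def ark_to_url(ark, nmah):
    """Rewrite a hostless ARK as a URL served by the given NMAH."""
    prefix = "ark:/"
    if not ark.startswith(prefix):
        raise ValueError("not a hostless ARK: " + ark)
    # Keep the NAAN and Name part; swap the scheme for the NMAH.
    return "http://" + nmah + "/" + ark[len(prefix):]


url = ark_to_url("ark:/14697/b12345x", "a.b.org")
```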

1489	   A simple expedient that works for now without browser modification is
1490	   to use a specified-host ARK (one with an NMAH) but without the usual
1491	   "ark:" prefix (a prefix of "http://" is normally assumed), as in,

1493	        <a href = "a.b.org/b12345x"> Click Here </a>

1495	   An NAA will typically make known the associations it creates by
1496	   publishing them in catalogs, actively advertising them, or simply
1497	   leaving them on web sites for visitors (e.g., users, indexing
1498	   spiders) to stumble across in browsing.

1500	9.  Security Considerations

1502	   The ARK naming scheme poses no direct risk to computers and networks.
1503	   Implementors of ARK services need to be aware of security issues when
1504	   querying networks and filesystems for Name Mapping Authority
1505	   services, and the concomitant risks from spoofing and obtaining
1506	   incorrect information.  These risks are no greater for ARK mapping
1507	   authority discovery than for other kinds of service discovery.  For
1508	   example, recipients of ARKs with a specified hostport (NMAH) should
1509	   treat it like a URL and be aware that the identified ARK service may
1510	   no longer be operational.

1512	   Apart from mapping authority discovery, ARK clients and servers
1513	   subject themselves to all the risks that accompany normal operation
1514	   of the protocols (e.g., HTTP, Z39.50) underlying mapping services.
1515	   As a specialization of such protocols, an ARK service may limit
1516	   exposure to the usual risks.  Indeed, ARK services may enhance a kind
1517	   of security by helping users identify long-term reliable references
1518	   to information objects.

1520	10.  Authors' Addresses

1522	   John A. Kunze
1523	   Center for Knowledge Management
1524	   University of California, San Francisco
1525	   530 Parnassus Ave, Box 0840
1526	   San Francisco, CA  94143-0840, USA

1528	   Fax:   +1 415-476-4653
1529	   EMail: jak@ckm.ucsf.edu

1531	   R. P. C. Rodgers
1532	   US National Library of Medicine
1533	   8600 Rockville Pike, Bldg. 38A
1534	   Bethesda, MD  20894

1536	   Fax:   +1 301-496-0673
1537	   EMail: rodgers@nlm.nih.gov

1539	11.  References

1541	   [DCORE]    Dublin Core Metadata Initiative, "Dublin Core Metadata
1542	              Element Set, Version 1.1:  Reference Description", July
1543	              1999, http://dublincore.org/documents/dces/.

1545	   [DNS]      P.V. Mockapetris, "Domain Names - Concepts and
1546	              Facilities", RFC 1034, November 1987.

1548	   [DOI]      International DOI Foundation, "The Digital Object
1549	              Identifier (DOI) System", February 2001,
1550	              http://dx.doi.org/10.1000/203.

1552	   [EMHDRS]   D. Crocker, "Standard for the format of ARPA Internet text
1553	              messages", RFC 822, August 1982.

1555	   [ERC]      J. Kunze, "Electronic Resource Citations", work in
1556	              progress.

1558	   [HKMP]     J. Kunze, "HTTP Key Mapping Protocol", work in progress.

1560	   [HTTP]     R. Fielding, et al, "Hypertext Transfer Protocol --
1561	              HTTP/1.1", RFC 2616, June 1999.

1563	   [MD5]      R. Rivest, "The MD5 Message-Digest Algorithm", RFC 1321,
1564	              April 1992.

1566	   [NAPTR]    M. Mealling, R. Daniel, "The Naming Authority Pointer
1567	              (NAPTR) DNS Resource Record", RFC 2915, September 2000.

1569	   [NLMPerm]  M. Byrnes, "Defining NLM's Commitment to the Permanence of
1570	              Electronic Information", ARL 212:8-9, October 2000,
1571	              http://www.arl.org/newsltr/212/nlm.html

1573	   [PURL]     K. Shafer, et al, "Introduction to Persistent Uniform
1574	              Resource Locators", 1996,
1575	              http://purl.oclc.org/OCLC/PURL/INET96

1577	   [REG]      J. Kunze, "Resource Metadata Vocabulary", work in
1578	              progress.

1580	   [URI]      T. Berners-Lee, et al, "Uniform Resource Identifiers
1581	              (URI):  Generic Syntax", RFC 2396, August 1998.

1583	   [URNBIB]   C. Lynch, et al, "Using Existing Bibliographic Identifiers
1584	              as Uniform Resource Names", RFC 2288, February 1998.

1586	   [URNSYN]   R. Moats, "URN Syntax", RFC 2141, May 1997.

1588	   [URNNID]   L. Daigle, et al, "URN Namespace Definition Mechanisms",
1589	              RFC 2611, June 1999.

1591	   [TELNET]   J. Postel, J.K. Reynolds, "Telnet Protocol Specification",
1592	              RFC 854, May 1983.

1594	12.  Appendix:  An NLM Prototype ARK Service

1596	   The US National Library of Medicine (NLM) has an experimental,
1597	   prototype ARK service under development.  It is being made available
1598	   for purposes of demonstrating various aspects of the ARK system, but
1599	   is subject to temporary or permanent withdrawal (without notice)
1600	   depending upon the circumstances of the small research group
1601	   responsible for making it available.  It is described at:

1603	       http://ark.nlm.nih.gov/

1605	   Comments and feedback may be addressed to rodgers@nlm.nih.gov.

1607	13.  Appendix:  Current ARK Name Authority Table

1609	   This appendix contains a copy of the Name Authority Table (a file) at
1610	   the time of writing.  It may be loaded into a local filesystem (e.g.,
1611	   /etc/natab) for use in mapping NAAs (Name Assigning Authorities) to
1612	   NMAHs (Name Mapping Authority Hostports).  It contains Perl code that
1613	   can be copied into a standalone script that processes the table (as a
1614	   file).  Because this is still a proposed file, none of the values in
1615	   it are real.

1617	   #
1618	   # Name Assigning Authority / Name Mapping Authority Lookup Table
1619	   #     Last change:   22 February 2001
1620	   #     Reload from:   http://ark.nlm.nih.gov/etc/natab
1621	   #     Mirrored at:   http://www.ckm.ucsf.edu/people/jak/home/etc/natab
1622	   #                    http://....../etc/natab
1623	   #     To register:   mailto:jak@ckm.ucsf.edu?Subject=naareg
1624	   #     Process with:  Perl script at end of this file (optional)
1625	   #
1626	   # Each NAA appears at the beginning of a line with the NAA Number
1627	   # first, a colon, and an ARK or URL to a statement of naming policy
1628	   # (see http://ark.nlm.nih.gov/naapolicyeg.html for an example).
1629	   # All the NMA hostports that service an NAA are listed, one per
1630	   # line, indented, after the corresponding NAA line.
1631	   #
1632	   #   US Library of Congress
1633	   12025:  http://www.loc.gov/xxx/naapolicy.html
1634	           foobar.zaf.org
1635	           sneezy.dopey.com
1636	   #
1637	   #   US National Library of Medicine
1638	   12026:  http://www.nlm.nih.gov/xxx/naapolicy.html
1639	           lhc.nlm.nih.gov:8080
1640	           foobar.zaf.org
1641	           sneezy.dopey.com
1642	   #
1643	   #   US National Agricultural Library
1644	   12027:  http://www.nal.gov/xxx/naapolicy.html
1645	           foobar.zaf.gov:80
1646	   #
1647	   #--- end of data ---
1648	   # The enclosed Perl script takes an NAA as argument and outputs
1649	   # the NMAs in this file listed under any matching NAA.
1650	   #
1651	   # my $naa = shift;
1652	   # while (<>) {
1653	   #     next if (! /^$naa:/);
1654	   #     while (<>) {
1655	   #         last if (! /^[#\s]./);
1656	   #         print "$1\n" if (/^\s+(\S+)/);
1657	   #     }
1658	   # }
1659	   # end of file
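
   For readers who prefer Python, the lookup performed by the embedded
   Perl script can be approximated as below.  This is a rough
   equivalent rather than a line-for-line port: the name lookup_nmahs
   is hypothetical, the table is assumed to be passed in as text with
   NAA lines starting in the first column, and comment lines inside an
   entry are skipped rather than ending it.

```python
def lookup_nmahs(table_text, naa):
    """Return the NMA hostports listed under the given NAA number."""
    nmahs = []
    in_entry = False
    for line in table_text.splitlines():
        if in_entry:
            if line[:1].isspace() and line.strip():
                nmahs.append(line.split()[0])   # indented line: one hostport
                continue
            if line.startswith("#"):
                continue                        # comment inside the entry
            in_entry = False                    # any other line ends the entry
        if line.startswith(naa + ":"):
            in_entry = True
    return nmahs


sample = """\
#   US Library of Congress
12025:  http://www.loc.gov/xxx/naapolicy.html
        foobar.zaf.org
        sneezy.dopey.com
#
#   US National Library of Medicine
12026:  http://www.nlm.nih.gov/xxx/naapolicy.html
        lhc.nlm.nih.gov:8080
        foobar.zaf.org
        sneezy.dopey.com
#
"""
hosts = lookup_nmahs(sample, "12026")
```

   As with the rest of the table, the hostports in the sample are
   placeholders and not real services.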

1661	14.  Copyright Notice

1663	   Copyright (C) The Internet Society (2001).  All Rights Reserved.

1665	   This document and translations of it may be copied and furnished to
1666	   others, and derivative works that comment on or otherwise explain it
1667	   or assist in its implementation may be prepared, copied, published
1668	   and distributed, in whole or in part, without restriction of any
1669	   kind, provided that the above copyright notice and this paragraph are
1670	   included on all such copies and derivative works.  However, this
1671	   document itself may not be modified in any way, such as by removing
1672	   the copyright notice or references to the Internet Society or other
1673	   Internet organizations, except as needed for the  purpose of
1674	   developing Internet standards in which case the procedures for
1675	   copyrights defined in the Internet Standards process must be
1676	   followed, or as required to translate it into languages other than
1677	   English.

1679	   The limited permissions granted above are perpetual and will not be
1680	   revoked by the Internet Society or its successors or assigns.

1682	   This document and the information contained herein is provided on an
1683	   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
1684	   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
1685	   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
1686	   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
1687	   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

1689	   The IETF invites any interested party to bring to its attention any
1690	   copyrights, patents or patent applications, or other proprietary
1691	   rights which may cover technology that may be required to practice
1692	   this standard.  Please address the information to the IETF Executive
1693	   Director.

1695	Expires 8 September 2001
1696	                           Table of Contents

1698	Status of this Document ...........................................    1
1699	Abstract ..........................................................    1
1700	1.  Introduction ..................................................    3
1701	1.1.  Three Reasons to Use ARKs ...................................    3
1702	1.2.  Organizing Support for ARKs .................................    4
1703	1.3.  A Definition of Identifier ..................................    5
1704	2.  ARK Anatomy ...................................................    6
1705	2.1.  The Name Mapping Authority Hostport (NMAH) ..................    6
1706	2.2.  The Name Assigning Authority Number (NAAN) ..................    7
1707	2.3.  The Name Part ...............................................    8
1708	2.4.  Lexical Equivalence .........................................    8
1709	2.5.  Naming Considerations .......................................    9
1710	3.  Assigners of ARKs .............................................   11
1711	4.  Finding a Name Mapping Authority ..............................   11
1712	4.1.  Looking Up NMAHs in a Globally Accessible File ..............   13
1713	4.2.  Looking up NMAHs Distributed via DNS ........................   15
1714	5.  Generic ARK Service Definition ................................   16
1715	5.1.  Generic ARK Access Service (access, location) ...............   17
1716	5.2.  Generic Policy Service (permanence, naming, etc.) ...........   17
1717	5.3.  Generic Description Service .................................   18
1718	6.  Overview of the HTTP Key Mapping Protocol (HKMP) ..............   18
1719	7.  Overview of Electronic Resource Citations (ERCs) ..............   21
1720	7.1.  ERC Syntax ..................................................   23
1721	7.2.  ERC Stories .................................................   24
1722	7.3.  The ERC Anchoring Story .....................................   25
1723	7.4.  ERC Elements ................................................   26
1724	7.5.  ERC Element Values ..........................................   29
1725	7.6.  ERC Element Encoding and Dates ..............................   31
1726	7.7.  ERC Stub Records and Internal Support .......................   33
1727	8.  Advice to Web Clients .........................................   33
1728	9.  Security Considerations .......................................   34
1729	10.  Authors' Addresses ...........................................   34
1730	11.  References ...................................................   35
1731	12.  Appendix:  An NLM Prototype ARK Service ......................   36
1732	13.  Appendix:  Current ARK Name Authority Table ..................   36
1733	14.  Copyright Notice .............................................   38