idnits 2.17.1 

draft-kunze-ark-09.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.5 on line 2122.

  ** Found boilerplate matching RFC 3978, Section 5.4, paragraph 1 (on line
     2114), which is fine, but *also* found old RFC 2026, Section 10.4C,
     paragraph 1 text on line 36.

  ** The document seems to lack an RFC 3978 Section 5.1 IPR Disclosure
     Acknowledgement -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** This document has an original RFC 3978 Section 5.4 Copyright Line,
     instead of the newer IETF Trust Copyright according to RFC 4748.

  ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead
     of the newer disclaimer which includes the IETF Trust according to RFC
     4748.

  ** The document seems to lack an RFC 3979 Section 5, para. 1 IPR Disclosure
     Acknowledgement. 

  ** The document seems to lack an RFC 3979 Section 5, para. 2 IPR Disclosure
     Acknowledgement. 

  ** The document seems to lack an RFC 3979 Section 5, para. 3 IPR Disclosure
     Invitation. 


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 45
     longer pages, the longest (page 2) being 63 lines

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 46 pages


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 4 instances of too long lines in the document, the longest one
     being 5 characters in excess of 72.

  ** The abstract seems to contain references ([Qualifier]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.

  == There are 8 instances of lines with non-RFC2606-compliant FQDNs in the
     document.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == Line 771 has weird spacing: '...eful to  remem...'

  == Line 940 has weird spacing: '... regexp  repla...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (19 February 2005) is 6998 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'Qualifier' is mentioned on line 401, but not defined

  == Unused Reference: 'MD5' is defined on line 1944, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ANVL'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ARK'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'DCORE'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'DERC'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'DOI'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ERC'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Handle'

  ** Obsolete normative reference: RFC 2616 (ref. 'HTTP') (Obsoleted by RFC
     7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235)

  ** Downref: Normative reference to an Informational RFC: RFC 1321 (ref.
     'MD5')

  ** Obsolete normative reference: RFC 2915 (ref. 'NAPTR') (Obsoleted by RFC
     3401, RFC 3402, RFC 3403, RFC 3404)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'NLMPerm'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'NOID'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'PURL'

  ** Obsolete normative reference: RFC  822 (Obsoleted by RFC 2822)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'TEMPER'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'THUMP'

  ** Obsolete normative reference: RFC 2396 (ref. 'URI') (Obsoleted by RFC
     3986)

  ** Downref: Normative reference to an Informational RFC: RFC 2288 (ref.
     'URNBIB')

  ** Obsolete normative reference: RFC 2141 (ref. 'URNSYN') (Obsoleted by RFC
     8141)

  ** Obsolete normative reference: RFC 2611 (ref. 'URNNID') (Obsoleted by RFC
     3406)


     Summary: 22 errors (**), 0 flaws (~~), 9 warnings (==), 16 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet-Draft: draft-kunze-ark-09.txt                          J. Kunze
3	ARK Identifier Scheme                    University of California (UCOP)
4	Expires 19 August 2005                                  R. P. C. Rodgers
5	                                         US National Library of Medicine
6	                                                        19 February 2005

8	                  The ARK Persistent Identifier Scheme

10	Status of this Document

12	   By submitting this Internet-Draft, each author represents that any
13	   applicable patent or other IPR claims of which he or she is aware
14	   have been disclosed, or will be disclosed, and any of which he or she
15	   becomes aware will be disclosed, in accordance with RFC 3668.

17	   Internet-Drafts are working documents of the Internet Engineering
18	   Task Force (IETF), its areas, and its working groups.  Note that
19	   other groups may also distribute working documents as Internet-
20	   Drafts.

22	   Internet-Drafts are draft documents valid for a maximum of six months
23	   and may be updated, replaced, or obsoleted by other documents at any
24	   time.  It is inappropriate to use Internet-Drafts as reference
25	   material or to cite them other than as ``work in progress.''

27	   The list of current Internet-Drafts can be accessed at
28	   http://www.ietf.org/ietf/1id-abstracts.txt

30	   The list of Internet-Draft Shadow Directories can be accessed at
31	   http://www.ietf.org/shadow.html.

33	   Distribution of this document is unlimited.  Please send comments to
34	   jak@ucop.edu.

36	   Copyright (C) The Internet Society (2005).  All Rights Reserved.

38	Abstract

40	   The ARK (Archival Resource Key) naming scheme is designed to
41	   facilitate the high-quality and persistent identification of
42	   information objects. A founding principle of the ARK is that
43	   persistence is purely a matter of service and is neither inherent in
44	   an object nor conferred on it by a particular naming syntax. The best
45	   that an identifier can do is to lead users to the services that
46	   support persistence. The term ARK itself refers both to the scheme
47	   and to any single identifier that conforms to it.  An ARK has five
48	   components:

50	              [http://NMAH/]ark:/NAAN/Name[Qualifier]

52	   an optional and mutable Name Mapping Authority Hostport, the "ark:"
53	   label, the Name Assigning Authority Number (NAAN), the assigned Name,
54	   and an optional and possibly mutable Qualifier supported by the NMA.
55	   The NAAN and Name together form the immutable persistent identifier
56	   for the object.  An ARK is a special kind of URL that connects users
57	   to three things: the named object, its metadata, and the provider's
58	   promise about its persistence. When entered into the location field
59	   of a Web browser, the ARK leads the user to the named object. That
60	   same ARK, followed by a single question mark ('?'), returns a brief
61	   metadata record that is both human- and machine-readable. When the
62	   ARK is followed by dual question marks ('??'), the returned metadata
63	   contains a commitment statement from the current provider.  Tools
64	   exist for minting, binding, and resolving ARKs.

66	1.  Introduction

68	   This document describes a scheme for the high-quality naming of
69	   information resources.  The scheme, called the Archival Resource Key
70	   (ARK), is well suited to long-term access and identification of any
71	   information resources that accommodate reasonably regular electronic
72	   description.  This includes digital documents, databases, software,
73	   and websites, as well as physical objects (books, bones, statues,
74	   etc.) and intangible objects (chemicals, diseases, vocabulary terms,
75	   performances).  Hereafter the term "object" refers to an information
76	   resource.  The term ARK itself refers both to the scheme and to any
77	   single identifier that conforms to it.  A reasonably concise and
78	   accessible overview and rationale for the scheme is available at
79	   [ARK].

81	   Schemes for persistent identification of network-accessible objects
82	   are not new.  In the early 1990's, the design of the Uniform Resource
83	   Name [URNSYN] responded to the observed failure rate of URLs by
84	   articulating an indirect, non-hostname-based naming scheme and the
85	   need for responsible name management.  Meanwhile, promoters of the
86	   Digital Object Identifier [DOI] succeeded in building a community of
87	   providers around a mature software system [Handle] that supports name
88	   management.  The Persistent Uniform Resource Locator [PURL] is
89	   another scheme that has the unique advantage of working with
90	   unmodified web browsers.  ARKs represent an approach that attempts to
91	   build on the strengths and to avoid the weaknesses of the other
92	   schemes.

94	   A founding principle of the ARK is that persistence is purely a
95	   matter of service.  Persistence is neither inherent in an object nor
96	   conferred on it by a particular naming syntax.  Nor is the technique
97	   of name indirection - upon which URNs, Handles, DOIs, and PURLs are
98	   founded - of central importance.  Name indirection is an ancient and
99	   well-understood practice; new mechanisms for it keep appearing and
100	   distracting practitioner attention, with the Domain Name System [DNS]
101	   being a particularly dazzling and elegant example.  What is often
102	   forgotten is that maintenance of an indirection table is the
103	   overwhelming and unavoidable cost to the organization providing
104	   persistence, and that cost is equivalent across naming schemes.  That
105	   indirection has always been a native part of the web while being so
106	   lightly utilized for the persistence of web-based objects is an
107	   indication of how unsuited most organizations are to the task of
108	   table maintenance and to the overall challenge of digital permanence.

110	   Persistence is achieved through a provider's successful stewardship
111	   of objects and their identifiers.  The highest level of persistence
112	   will be reinforced by a provider's robust contingency, redundancy,
113	   and succession strategies.  It is further safeguarded to the extent
114	   that a provider's mission is shielded from marketplace and political
115	   instabilities.  These are by far the major challenges confronting
116	   persistence providers, and no identifier scheme has any direct impact
117	   on them.

119	   Given the limited ability of any naming scheme to positively
120	   contribute to the considerable undertaking of digital permanence, it
121	   is legitimate to ask whether a given scheme might itself actually
122	   become a liability as the provider carries objects and infrastructure
123	   into the technologically evolving future.  It is in response to this
124	   question that the ARK scheme tries to be simple, transparent, and
125	   free of proprietary components, vendor relationships, and special-
126	   purpose global infrastructure.

128	1.1.  Three Reasons to Use ARKs

130	   The first requirement of an ARK is to give users a link from an
131	   object to a promise of stewardship for it.  That promise is a multi-
132	   faceted covenant that binds the word of an identified service
133	   provider to a specific set of responsibilities.  No one can tell if
134	   successful stewardship will take place because no one can predict the
135	   future.  Reasonable conjecture, however, may be based on past
136	   performance.  There must be a way to tie a promise of persistence to
137	   a provider's demonstrated or perceived ability - its reputation - in
138	   that arena.  Provider reputations would then rise and fall as
139	   promises are observed variously to be kept and broken.  This is
140	   perhaps the best way we have for gauging the strength of any
141	   persistence promise.  Note that over time, current providers have
142	   nothing to do with the intentions of the original assigners of names.

144	   The second requirement of an ARK is to give users a link from an
145	   object to a description of it.  The problem with a naked identifier
146	   is that without a description real identification is incomplete.
147	   Identifiers common today are relatively opaque, though some contain
148	   ad hoc clues that reflect brief life cycle periods such as the
149	   address of a short stay in a filesystem hierarchy.  Possession of
150	   both an identifier and an object is some improvement, but positive
151	   identification may still be uncertain since the object itself might
152	   not include a matching identifier or might not carry evidence obvious
153	   enough to reveal its identity without significant research.  In
154	   either case, what is called for is a record bearing witness to the
155	   identifier's association with the object, as supported by a recorded
156	   set of object characteristics.  This descriptive record is partly an
157	   identification "receipt" with which users and archivists can verify
158	   an object's identity after brief inspection and a plausible match
159	   with recorded characteristics such as title and size.

161	   The final requirement of an ARK is to give users a link to the object
162	   itself (or to a copy) if at all possible.  Persistent access is the
163	   central duty of an ARK.  Persistent identification plays a vital
164	   supporting role but, strictly speaking, it can be construed as no
165	   more than a record attesting to the original assignment of a never-
166	   reassigned identifier.  Object access may not be feasible for various
167	   reasons, such as catastrophic loss of the object, a licensing
168	   agreement that keeps an archive "dark" for a period of years, or when
169	   an object's own lack of tangible existence confuses normal concepts
170	   of access (e.g., a vocabulary term might be accessed through its
171	   definition).  In such cases the ARK's identification role assumes a
172	   much higher profile.  But attempts to simplify the persistence
173	   problem by decoupling access from identification and concentrating
174	   exclusively on the latter are of questionable utility.  A perfect
175	   system for assigning forever unique identifiers might be created, but
176	   if it did so without reducing access failure rates, no one would be
177	   interested.  The central issue - which may be summed up as the "HTTP
178	   404 Not Found" problem - would not have been addressed.

180	1.2.  Organizing Support for ARKs

182	   An organization and the user community it serves can often be seen to
183	   struggle with two different areas of persistent identification: the
184	   Our Stuff problem and the Their Stuff problem.  In the Our Stuff
185	   problem, we in the organization want our own objects to acquire
186	   persistent names.  Since we possess or control these objects, our
187	   organization tackles the Our Stuff problem directly.  Whether or not
188	   the objects are named by ARKs, our organization is the responsible
189	   party, so it can plan for, maintain, and make commitments about the
190	   objects.

192	   In the Their Stuff problem, we in the organization want others'
193	   objects to acquire persistent names.  These are objects that we do
194	   not own or control, but some of which are critically important to us.
195	   But because they are beyond our influence as far as support is
196	   concerned, creating and maintaining persistent identifiers for Their
197	   Stuff is not especially purposeful or feasible for us to do.  There
198	   is little that we can do about someone else's stuff except encourage
199	   them to find or become providers of persistence services.

201	   Co-location of persistent access and identification services is
202	   natural.  Any organization that undertakes ongoing support of true
203	   persistent identification (which includes description) is well-served
204	   if it controls, owns, or otherwise has clear internal access to the
205	   identified objects, and this gives it an advantage if it wishes also
206	   to support persistent access to outsiders.  Conversely, persistent
207	   access to outsiders requires orderly internal collection management
208	   procedures that include monitoring, acquisition, verification, and
209	   change control over objects, which in turn requires object
210	   identifiers persistent enough to support auditable record keeping
211	   practices.

213	   Although organizing ARK services under one roof thus tends to make
214	   sense, object hosting can successfully be separated from name
215	   mapping.  An example is when a name mapping authority centrally
216	   provides uniform resolution services via a protocol gateway on behalf
217	   of organizations that host objects behind a variety of access
218	   protocols.  It is also reasonable to build value-added description
219	   services that rely on the underlying services of a set of mapping
220	   authorities.

222	   Supporting ARKs is not for every organization.  By requiring
223	   specific, revealed commitments to preservation, to object access, and
224	   to description, the bar for providing ARK services is higher than for
225	   some other identifier schemes.  On the other hand, it would be hard
226	   to grant credence to a persistence promise from an organization that
227	   could not muster the minimum ARK services.  Not that there isn't a
228	   business model for an ARK-like, description-only service built on top
229	   of another organization's full complement of ARK services.  For
230	   example, there might be competition at the description level for
231	   abstracting and indexing a body of scientific literature archived in
232	   a combination of open and fee-based repositories.  The description-
233	   only service would have no direct commitment to the objects, but
234	   would act as intermediary, forwarding commitment statements from
235	   object hosting to requestors.

237	1.3.  Definition of Identifier

239	   An identifier is not a string of character data - an identifier is an
240	   association between a string of data and an object.  This abstraction
241	   is necessary because without it a string is just data.  It's nonsense
242	   to talk about a string's breaking, or about its being strong,
243	   maintained, and authentic.  But as a representative of an
244	   association, a string can do, metaphorically, the things that we
245	   expect of it.

247	   Without regard to whether an object is physical, digital, or
248	   conceptual, to identify it is to claim an association between it and
249	   a representative string, such as "Jane" or "ISBN 0596000278".  What
250	   gives a claim credibility is a set of verifiable assertions, or
251	   metadata, about the object, such as age, height, title, or number of
252	   pages.  In other words, the association is made manifest by a record
253	   (e.g., a cataloging or other metadata record) that vouches for it.

255	   In the complete absence of any testimony (metadata) regarding an
256	   association, a would-be identifier string is a meaningless sequence
257	   of characters.  To keep an externally visible but otherwise internal
258	   string from being perceived as an identifier by outsiders, for
259	   example, it suffices for an organization not to disclose the nature
260	   of its association.  For our immediate purpose, actual existence of
261	   an association record is more important than its authenticity or
262	   verifiability, which are outside the scope of this specification.

264	   It is a gift to the identification process if an object carries its
265	   own name as an inseparable part of itself, such as an identifier
266	   imprinted on the first page of a document or embedded in a data
267	   structure element of a digital document header.  In cases where the
268	   object is large, unwieldy, or unavailable (such as when licensing
269	   restrictions are in effect), a metadata record that includes the
270	   identifier string will usually suffice.  That record becomes a
271	   conveniently manipulable object surrogate, acting as both an
272	   association "receipt" and "declaration".

274	   Note that our definition of identifier extends the one in use for
275	   Uniform Resource Identifiers [URI].  The present document still
276	   sometimes (ab)uses the terms "ARK" and "identifier" as shorthand for
277	   the string part of an identifier, but the context should make the
278	   meaning clear.

280	2.  ARK Anatomy

282	   An ARK is represented by a sequence of characters (a string) that
283	   contains the label, "ark:", optionally preceded by the beginning part
284	   of a URL.  Here is a diagrammed example.

286	         http://foobar.zaf.org/ark:/12025/654xz321/s3/f8.05v.tiff
287	         \___________________/ \__/ \___/ \______/ \____________/
288	           (replaceable)        |     |      |       Qualifier
289	                |         ARK Label   |      |    (NMA-supported)
290	                |                     |      |
291	      Name Mapping Authority          |    Name (NAA-assigned)
292	         Hostport (NMAH)              |
293	                           Name Assigning Authority Number (NAAN)

295	   The ARK syntax can be summarized,

297	                    [http://NMAH/]ark:/NAAN/Name[Qualifier]

299	   where the NMAH and Qualifier parts are in brackets to indicate that
300	   they are optional.
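
   The syntax summary above can be sketched as a small parser.  This is
   an illustration only, not a normative grammar: the pattern assumes
   that the NMAH is introduced by "http://", that the NAAN is 5 or 9
   digits, and that the Name ends at the first `/' or `.' (the reserved
   Qualifier characters).  The names ARK_PATTERN, Ark, and parse_ark are
   hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Illustrative pattern only (an assumption, not a formal ARK grammar).
ARK_PATTERN = re.compile(
    r"^(?:http://(?P<nmah>[^/]+)/)?"      # optional, replaceable NMAH
    r"ark:/(?P<naan>[0-9]{5}|[0-9]{9})"   # "ark:" label, then the NAAN
    r"/(?P<name>[^/.]+)"                  # NAA-assigned Name
    r"(?P<qualifier>[/.].*)?$"            # optional Qualifier
)


class Ark(NamedTuple):
    nmah: Optional[str]
    naan: str
    name: str
    qualifier: Optional[str]


def parse_ark(text: str) -> Ark:
    """Split an ARK string into its NMAH, NAAN, Name, and Qualifier parts."""
    m = ARK_PATTERN.match(text)
    if m is None:
        raise ValueError(f"not an ARK: {text!r}")
    return Ark(m["nmah"], m["naan"], m["name"], m["qualifier"])
```

   Applied to the diagrammed example, parse_ark separates the hostport
   "foobar.zaf.org" from the NAAN "12025", the Name "654xz321", and the
   Qualifier "/s3/f8.05v.tiff".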

302	2.1.  The Name Mapping Authority Hostport (NMAH)

304	   Before the "ark:" label may appear an optional Name Mapping Authority
305	   Hostport (NMAH) that is a temporary address where ARK service
306	   requests may be sent.  It consists of "http://" (or any service
307	   specification valid for a URL) followed by an Internet hostname or
308	   hostport combination having the same format and semantics as the
309	   hostport part of a URL.  The most important thing about the NMAH is
310	   that it is "identity inert" from the point of view of object
311	   identification.  In other words, ARKs that differ only in the
312	   optional NMAH part identify the same object.  Thus, for example, the
313	   following three ARKs are synonyms for just one information object:

315	                      http://loc.gov/ark:/12025/654xz321
316	                  http://rutgers.edu/ark:/12025/654xz321
317	                                     ark:/12025/654xz321
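
   The "identity inert" property can be illustrated by comparing ARKs
   after discarding everything before the "ark:" label.  This is a
   minimal sketch; the function names core_identity and same_object are
   hypothetical, and locating the label with a plain substring search
   is a simplifying assumption.

```python
def core_identity(ark: str) -> str:
    """Return the ARK from the "ark:/" label onward, dropping any NMAH."""
    start = ark.find("ark:/")
    if start == -1:
        raise ValueError(f"no ark: label in {ark!r}")
    return ark[start:]


def same_object(a: str, b: str) -> bool:
    """True if two ARKs differ at most in their optional NMAH part."""
    return core_identity(a) == core_identity(b)
```

   Under this comparison the three ARKs listed above are treated as
   synonyms, while ARKs with different NAANs or Names are not.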

319	   Strictly speaking, in the realm of digital objects, these ARKs may
320	   lead over time to somewhat different or diverging instances of the
321	   originally named object.  In an ideal world, divergence of persistent
322	   objects is not desirable, but it is widely believed that digital
323	   preservation efforts will inevitably lead to alterations in some
324	   original objects (e.g., a format migration in order to preserve the
325	   ability to display a document).  If any of those objects are held
326	   redundantly in more than one organization (a common preservation
327	   strategy), chances are small that all holding organizations will
328	   perform the same precise transformations and all maintain the same
329	   object metadata.  More significant divergence would be expected when
330	   the holding organizations serve different audiences or compete with
331	   each other.

333	   The NMAH part makes an ARK into an actionable URL.  As with many
334	   internet parameters, it is helpful to approach the NMAH being liberal
335	   in what you accept and conservative in what you propose.  From the
336	   recipient's point of view, the NMAH part should be treated as
337	   temporary, disposable, and replaceable.  From the NMA's point of
338	   view, it should be chosen with the greatest concern for longevity.  A
339	   carefully chosen NMAH should be at least as permanent as the
340	   providing organization's own hostname.  In the case of a national or
341	   university library, for example, there is no reason why the NMAH
342	   should not be considerably more permanent than soft-funded proxy
343	   hostnames such as hdl.handle.net, dx.doi.org, and purl.org.  In
344	   general and over time, however, it is not unexpected for an NMAH
345	   eventually to stop working and require replacement with the NMAH of a
346	   currently active service provider.

348	   This replacement relies on a mapping authority "resolver" discovery
349	   process, of which two alternate methods are outlined in a later
350	   section.  The ARK, URN, Handle, and DOI schemes all use a resolver
351	   discovery model that sooner or later requires matching the original
352	   assigning authority with a current provider servicing that
353	   authority's named objects; once found, the resolver at that provider
354	   performs what amounts to a redirect to a place where the object is
355	   currently held.  All the schemes rely on the ongoing functionality of
356	   currently mainstream technologies such as the Domain Name System
357	   [DNS] and web browsers.  The Handle and DOI schemes in addition
358	   require that the Handle protocol layer and global server grid be
359	   available at all times.

361	   The practice of prepending "http://" and an NMAH to an ARK is a way
362	   of creating an actionable identifier by a method that is itself
363	   temporary.  Assuming that infrastructure supporting [HTTP]
364	   information retrieval will no longer be available one day, ARKs will
365	   then have to be converted into new kinds of actionable identifiers.

367	   By that time, if ARKs see widespread use, web browsers would
368	   presumably evolve to perform this (currently simple) transformation
369	   automatically.
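
   The transformation described above, prepending "http://" and an NMAH
   (or swapping a defunct NMAH for a current one), might be sketched as
   follows.  The function name with_nmah is hypothetical, and the
   sketch assumes the replacement NMAH is already known; discovering it
   is a separate process.

```python
def with_nmah(ark: str, nmah: str) -> str:
    """Return an actionable URL for `ark` served from the hostport `nmah`.

    Any NMAH already present in `ark` is discarded, since the NMAH is
    meant to be temporary and replaceable.
    """
    start = ark.find("ark:/")
    if start == -1:
        raise ValueError(f"no ark: label in {ark!r}")
    return f"http://{nmah}/{ark[start:]}"
```

   For example, with_nmah("ark:/12025/654xz321", "rutgers.edu") yields
   the second of the three synonymous ARKs shown in this section.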

371	2.2.  The ARK Label Part - ark:

373	   The label part distinguishes an ARK from an ordinary identifier.  In
374	   a URL found in the wild, the string, "ark:/", indicates that the URL
375	   stands a reasonable chance of being an ARK.  If the context warrants,
376	   verification that it actually is an ARK can be done by testing it for
377	   existence of the three ARK services.

379	   Since nothing about an identifier syntax directly affects
380	   persistence, the "ark:" label (like "urn:", "doi:", and "hdl:")
381	   cannot tell you whether the identifier is persistent or whether the
382	   object is available.  It does tell you that the original Name
383	   Assigning Authority (NAA) had some sort of hopes for it, but it
384	   doesn't tell you whether that NAA is still in existence, or whether a
385	   decade ago it ceased to have any responsibility for providing
386	   persistence, or whether it ever had any responsibility beyond naming.

388	   Only a current provider can say for certain what sort of commitment
389	   it intends, and the ARK label suggests that you can query the NMAH
390	   directly to find out exactly what kind of persistence is promised.
391	   Even if what is promised is impersistence (i.e., a short-term
392	   identifier), saying so is valuable information to the recipient.
393	   Thus an ARK is a high-functioning identifier in the sense that it
394	   provides access to the object, the metadata, and a commitment
395	   statement, even if the commitment is explicitly very weak.
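
   As a rough illustration of the three ARK services, the request
   strings for one actionable ARK can be formed as below, following the
   '?' and '??' conventions given in the Abstract.  The function name
   service_requests and the dictionary keys are hypothetical; the
   sketch only builds strings and makes no network requests.

```python
from typing import Dict


def service_requests(ark_url: str) -> Dict[str, str]:
    """Build the three request strings for an actionable ARK.

    The bare ARK leads to the object, a single trailing '?' asks for a
    brief metadata record, and '??' asks for metadata that includes the
    provider's commitment statement.
    """
    return {
        "object": ark_url,
        "metadata": ark_url + "?",
        "commitment": ark_url + "??",
    }
```

   A client checking whether a URL found in the wild really is an ARK
   could try each of these requests against the NMAH in turn.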

397	2.3.  The Name Assigning Authority Number (NAAN)

399	   Recalling that the general form of the ARK is,

401	                    [http://NMAH/]ark:/NAAN/Name[Qualifier]

403	   the part of the ARK directly following the "ark:" is the Name
404	   Assigning Authority Number (NAAN) enclosed in `/' (slash) characters.
405	   This part is always required, as it identifies the organization that
406	   originally assigned the Name of the object.  It is used to discover a
407	   currently valid NMAH and to provide top-level partitioning of the
408	   space of all ARKs.  NAANs are registered in a manner similar to URN
409	   Namespaces, but they are pure numbers consisting of 5 digits or 9
410	   digits.  Thus, the first 100,000 registered NAAs fit compactly into
411	   the 5 digits, and if growth warrants, the next billion fit into the 9
412	   digits.  In either case the fixed odd number of digits helps
413	   reduce the chances of finding a NAAN out of context and confusing it
414	   with nearby quantities such as 4-digit dates.
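
   The NAAN format and its capacity can be checked with a few lines of
   code.  This is a sketch based only on the description above; the
   names NAAN_PATTERN and is_valid_naan are hypothetical, and the check
   says nothing about whether a given NAAN is actually registered.

```python
import re

# A NAAN is a pure number of exactly 5 or exactly 9 digits.
NAAN_PATTERN = re.compile(r"[0-9]{5}|[0-9]{9}")


def is_valid_naan(text: str) -> bool:
    """True if `text` has the 5-digit or 9-digit NAAN form."""
    return NAAN_PATTERN.fullmatch(text) is not None


# Capacity of each form: 10**5 five-digit and 10**9 nine-digit values.
FIVE_DIGIT_CAPACITY = 10 ** 5
NINE_DIGIT_CAPACITY = 10 ** 9
```

   A 4-digit date such as "2005" fails the check, which is the kind of
   confusion the fixed odd digit counts are meant to reduce.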

416	2.4.  The Name Part

418	   The part of the ARK just after the NAAN is the Name assigned by the
419	   NAA, and it is also required.  Semantic opaqueness in the Name part
420	   is strongly encouraged in order to reduce an ARK's vulnerability to
421	   era- and language-specific change.  Identifier strings containing
422	   linguistic fragments can create support difficulties down the road.
423	   No matter how appropriate or even meaningless they are today, such
424	   fragments may one day create confusion, give offense, or infringe on
425	   a trademark as the semantic environment around us and our communities
426	   evolves.

428	   Names that look more or less like numbers avoid common problems that
429	   defeat persistence and international acceptance.  The use of digits
430	   is highly recommended.  Mixing in non-vowel alphabetic characters a
431	   couple at a time is a relatively safe and easy way to achieve a
432	   denser namespace (more possible names for a given length of the name
433	   string).  Such names have a chance of aging and traveling well.
434	   Tools exist that mint, bind, and resolve opaque identifiers, with or
435	   without check characters [NOID].  More on naming considerations is
436	   given in a subsequent section.

438	2.5.  The Qualifier Part

440	   The part of the ARK following the NAA-assigned Name is an optional
441	   Qualifier.  It is a string that extends the base ARK in order to
442	   create a kind of service entry point into the object named by the
443	   NAA.  At the discretion of the providing NMA, such a service entry
444	   point permits an ARK to support access to individual hierarchical
445	   components and subcomponents of an object, and to variants (versions,
446	   languages, formats) of components.  A Qualifier may be invented by
447	   the NAA or by any NMA servicing the object.

449	   In form, the Qualifier is a ComponentPath, or a VariantPath, or a
450	   ComponentPath followed by a VariantPath.  A VariantPath is introduced
451	   and subdivided by the reserved character `.', and a ComponentPath is
452	   introduced and subdivided by the reserved character `/'.  In this
453	   example,

455	         http://foobar.zaf.org/ark:/12025/654xz321/s3/f8.05v.tiff

457	   the string "/s3/f8" is a ComponentPath and the string ".05v.tiff" is
458	   a VariantPath.  The ARK Qualifier is a formalization of some current
459	   mainstream URL syntax conventions, but in ARKs the formalization
460	   specifically reserves meanings that permit recipients to make strong
461	   inferences about logical subobject containment and equivalence solely
462	   from the form of the received identifiers and without having to
463	   inspect metadata records in order to discover such relationships.
464	   NMAs are free not to disclose any of these relationships merely by
465	   avoiding the reserved characters above.  Hierarchical components and
466	   variants are discussed further in the next two sections.
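
As an illustration of the Qualifier form, the following Python sketch
separates the text after the NAAN into Name, ComponentPath, and
VariantPath.  The function name split_qualifier is hypothetical, and
the sketch assumes that the Name itself contains neither reserved
character and that the input is already normalized.

```python
def split_qualifier(name_and_qualifier):
    """Split 'Name[Qualifier]' into (name, component_path, variant_path).

    Assumption: the Name ends at the first '/' or '.'; everything from
    the first '.' onward is treated as the VariantPath.
    """
    cut = len(name_and_qualifier)
    for i, ch in enumerate(name_and_qualifier):
        if ch in "/.":
            cut = i
            break
    name = name_and_qualifier[:cut]
    qualifier = name_and_qualifier[cut:]
    dot = qualifier.find(".")
    if dot == -1:
        return name, qualifier, ""
    return name, qualifier[:dot], qualifier[dot:]


if __name__ == "__main__":
    print(split_qualifier("654xz321/s3/f8.05v.tiff"))
    # → ('654xz321', '/s3/f8', '.05v.tiff')
```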

468	   The Qualifier, if present, differs from the Name in several important
469	   respects.  First, a Qualifier may have been assigned either by the
470	   NAA or later by the NMA.  The assignment of a Qualifier by an NMA
471	   effectively amounts to an act of publishing a service entry point
472	   within the conceptual object originally named by the NAA.  For our
473	   purposes, an ARK extended with a Qualifier assigned by an NMA will be
474	   called an NMA-qualified ARK.

476	   Second, a Qualifier assignment on the part of an NMA is made in
477	   fulfillment of its service obligations and may reflect changing
478	   service expectations and technology requirements.  NMA-qualified ARKs
479	   could therefore be transient, even if the base, unqualified ARK is
480	   persistent.  For example, it would be reasonable for an NMA to
481	   support access to an image object through an actionable ARK that is
482	   considered persistent even if the experience of that access changes
483	   as linking, labeling, and presentation conventions evolve and as
484	   format and security standards are updated.  For an image "thumbnail",
485	   that NMA could also support an NMA-qualified ARK that is considered
486	   impersistent because the thumbnail will be replaced with higher
487	   resolution images as network bandwidth and CPU speeds increase.  At
488	   the same time, for an originally scanned, high-resolution master, the
489	   NMA could publish an NMA-qualified ARK that is itself considered
490	   persistent.  Of course, the NMA must be able to return its separate
491	   commitments to unqualified, NAA-assigned ARKs, to NMA-qualified ARKs,
492	   and to any NAA-qualified ARKs that it supports.

494	   A third difference between a Qualifier and a Name concerns the
495	   semantic opaqueness constraint.  When an NMA-qualified ARK is to be
496	   used as a transient service entry point into a persistent object, the
497	   priority given to semantic opaqueness observed by the NAA in the Name
498	   part may be relaxed by the NMA in the Qualifier part.  If service
499	   priorities in the Qualifier take precedence over persistence, short-
500	   term usability considerations may recommend somewhat semantically
501	   laden Qualifier strings.

503	   Finally, not only is the set of Qualifiers supported by an NMA
504	   mutable, but different NMAs may support different Qualifier sets for
505	   the same NAA-identified object.  In this regard the NMAs act
506	   independently of each other and of the NAA.

508	   The next two sections describe how ARK syntax may be used to declare,
509	   or to avoid declaring, certain kinds of relatedness among qualified
510	   ARKs.

512	2.5.1.  ARKs that Reveal Object Hierarchy

514	   An NAA or NMA may choose to reveal the presence of a hierarchical
515	   relationship between objects using the `/' (slash) character after
516	   the Name part of an ARK.  Some authorities will choose not to
517	   disclose this information, while others will go ahead and disclose so
518	   that manipulators of large sets of ARKs can infer object
519	   relationships by simple identifier inspection; for example, this
520	   makes it possible for a system to present a collapsed view of a large
521	   search result set.

523	   If the ARK contains an internal slash after the NAAN, the piece to
524	   its left indicates a containing object.  For example, publishing an
525	   ARK of the form,

527	                         ark:/12025/654/xz/321

529	   is equivalent to publishing three ARKs,

531	                         ark:/12025/654/xz/321
532	                         ark:/12025/654/xz
533	                         ark:/12025/654

535	   together with a declaration that the first object is contained in the
536	   second object, and that the second object is contained in the third.
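
The containment inference can be sketched in Python as follows.  The
function name containing_arks is hypothetical, and the sketch assumes
a normalized ARK with no NMAH and no variant suffixes.

```python
def containing_arks(ark):
    """List the ARK followed by each containing object it implies."""
    prefix = "ark:/"
    if not ark.startswith(prefix):
        raise ValueError("expected an ARK beginning with 'ark:/'")
    naan, _, rest = ark[len(prefix):].partition("/")
    parts = rest.split("/")
    # Drop one trailing component at a time; each shorter ARK names
    # the object that contains the one before it.
    return [prefix + naan + "/" + "/".join(parts[:n])
            for n in range(len(parts), 0, -1)]


if __name__ == "__main__":
    for a in containing_arks("ark:/12025/654/xz/321"):
        print(a)
```

An ARK without internal slashes after the NAAN, such as
ark:/12025/654_xz_321, yields only itself.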

538	   Revealing the presence of hierarchy is completely up to the assigning
539	   authority.  It is hard enough to commit to one object's name, let
540	   alone to three objects' names and to a specific, ongoing relatedness
541	   among them.  Thus, regardless of whether hierarchy was present
542	   initially, the assigning authority, by not using slashes, reveals no
543	   shared inferences about hierarchical or other inter-relatedness in
544	   the following ARKs:

546	                         ark:/12025/654_xz_321
547	                         ark:/12025/654_xz
548	                         ark:/12025/654xz321
549	                         ark:/12025/654xz
550	                         ark:/12025/654

552	   Note that slashes around the ARK's NAAN (/12025/ in these examples)
553	   are not part of the ARK's Name and therefore do not indicate the
554	   existence of some sort of NAAN super object containing all objects in
555	   its namespace.  A slash must have at least one non-structural
556	   character (one that is neither a slash nor a period) on both sides in
557	   order for it to separate recognizable structural components.  So
558	   initial or final slashes may be removed, and double slashes may be
559	   converted into single slashes.

561	2.5.2.  ARKs that Reveal Object Variants

563	   An NAA or NMA may choose to reveal the possible presence of variant
564	   objects or object components using the `.' (period) character after
565	   the Name part of an ARK.  Some authorities will choose not to
566	   disclose this information, while others will go ahead and disclose so
567	   that manipulators of large sets of ARKs can infer object
568	   relationships by simple identifier inspection; for example, this
569	   makes it possible for a system to present a collapsed view of a large
570	   search result set.

572	   If the ARK contains an internal period after the Name, the piece to
573	   its left is a base name and the piece to its right, up to the end of
574	   the ARK or to the next period, is a suffix.  A Name may have more
575	   than one suffix, for example,

577	                         ark:/12025/654.24
578	                         ark:/12025/xz4/654.24
579	                         ark:/12025/654.20v.78g.f55

581	   There are two main rules.  First, if two ARKs share the same base
582	   name but have different suffixes, the corresponding objects were
583	   considered variants of each other (different formats, languages,
584	   versions, etc.) by the assigning authority.  Thus, the following ARKs
585	   are variants of each other:

587	                         ark:/12025/654.20v.78g.f55
588	                         ark:/12025/654.321xz
589	                         ark:/12025/654.44

591	   Second, publishing an ARK with a suffix implies the existence of at
592	   least one variant identified by the ARK without its suffix.  The ARK
593	   otherwise permits no further assumptions about what variants might
594	   exist.  So publishing the ARK,

596	                         ark:/12025/654.20v.78g.f55

598	   is equivalent to publishing the four ARKs,

600	                         ark:/12025/654.20v.78g.f55
601	                         ark:/12025/654.20v.78g
602	                         ark:/12025/654.20v
603	                         ark:/12025/654
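
Both variant rules can be sketched in a few lines of Python.  The
function names base_name and implied_variants are hypothetical, and
the sketch assumes a normalized ARK whose suffixes follow the last
slash.

```python
def base_name(ark):
    """Return the ARK with every suffix removed."""
    head, slash, last = ark.rpartition("/")
    return head + slash + last.split(".")[0]


def implied_variants(ark):
    """List the ARK and each shorter form implied by its suffixes."""
    head, slash, last = ark.rpartition("/")
    pieces = last.split(".")
    return [head + slash + ".".join(pieces[:n])
            for n in range(len(pieces), 0, -1)]


if __name__ == "__main__":
    # First rule: same base name, different suffixes.
    print(base_name("ark:/12025/654.321xz") == base_name("ark:/12025/654.44"))
    # Second rule: one published ARK implies the shorter forms.
    for a in implied_variants("ark:/12025/654.20v.78g.f55"):
        print(a)
```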

605	   Revealing the possibility of variants is completely up to the
606	   assigning authority.  It is hard enough to commit to one object's
607	   name, let alone to multiple variants' names and to a specific,
608	   ongoing relatedness among them.  The assigning authority is the sole
609	   arbiter of what constitutes a variant within its namespace, and
610	   whether to reveal that kind of relatedness by using periods within
611	   its names.

613	   A period must have at least one non-structural character (one that is
614	   neither a slash nor a period) on both sides in order for it to
615	   separate recognizable structural components.  So initial or final
616	   periods may be removed, and double periods may be converted into
617	   single periods.  Multiple suffixes should be arranged in sorted order
618	   (pure ASCII collating sequence) at the end of an ARK.

620	2.6.  Character Repertoires

622	   The Name and Qualifier parts are strings of visible ASCII characters
623	   and should be less than 128 bytes in length.  The length restriction
624	   keeps the ARK short enough to append ordinary ARK request strings
625	   without running into transport restrictions (e.g., within HTTP GET
626	   requests).  Characters may be letters, digits, or any of these seven
627	   characters:

629	         =   #   *   +   @   _   $

631	   The following characters may also be used, but their meanings are
632	   reserved:

634	         %   -   .   /

636	   The characters `/' and `.' are ignored if either appears as the last
637	   character of an ARK.  If used internally, they allow a name assigning
638	   authority to reveal object hierarchy and object variants as
639	   previously described.

641	   Hyphens are considered to be insignificant and are always ignored in
642	   ARKs.  A `-' (hyphen) may appear in an ARK for readability, or it may
643	   have crept in during the formatting and wrapping of text, but it must
644	   be ignored in lexical comparisons.  As in a telephone number, hyphens
645	   have no meaning in an ARK.  It is always safe for an NMA that
646	   receives an ARK to remove any hyphens found in it.  As a result, like
647	   the NMAH, hyphens are "identity inert" in comparing ARKs for
648	   equivalence.  For example, the following ARKs are equivalent for
649	   purposes of comparison and ARK service access:

651	                                 ark:/12025/65-4-xz-321
652	                 ark:sneezy.dopey.com/12025/654--xz32-1
653	                                 ark:/12025/654xz321

655	   The `%' character is reserved for %-encoding all other octets that
656	   would appear in the ARK string, in the same manner as for URIs [URI].
657	   A %-encoded octet consists of a `%' followed by two hex digits; for
658	   example, "%7d" stands in for `}'.  Lower case hex digits are
659	   preferred to reduce the chances of false acronym recognition; thus it
660	   is better to use "%acT" instead of "%ACT".  The character `%' itself
661	   must be represented using "%25".  As with URNs, %-encoding permits
662	   ARKs to support legacy namespaces (e.g., ISBN, ISSN, SICI) that have
663	   less restricted character repertoires [URNBIB].
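
A minimal Python sketch of %-encoding with lower case hex digits
follows.  The function name ark_encode is hypothetical.  The sketch
assumes the input is encoded as UTF-8 before octets are examined, and
it passes the reserved characters `-', `.', and `/' through unchanged,
leaving it to the caller to encode them when their reserved meanings
are to be concealed.

```python
import string

# Letters, digits, the unreserved punctuation listed above, and the
# reserved characters that keep their meaning when left unencoded.
ALLOWED = set(string.ascii_letters + string.digits + "=#*+@_$" + "-./")


def ark_encode(text):
    """%-encode every octet outside the ARK repertoire, including '%'."""
    out = []
    for byte in text.encode("utf-8"):   # assumption: UTF-8 octets
        ch = chr(byte)
        if ch in ALLOWED:
            out.append(ch)
        else:
            out.append("%%%02x" % byte)  # lower case hex digits
    return "".join(out)


if __name__ == "__main__":
    print(ark_encode("a}b"))    # → a%7db
    print(ark_encode("100%"))   # → 100%25
```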

665	2.7.  Normalization and Lexical Equivalence

667	   To determine if two or more ARKs identify the same object, the ARKs
668	   are compared for lexical equivalence after first being normalized.
669	   Since ARK strings may appear in various forms (e.g., having different
670	   NMAHs), normalizing them minimizes the chances that comparing two ARK
671	   strings for equality will fail unless they actually identify
672	   different objects.  In a specified-host ARK (one having an NMAH), the
673	   NMAH never participates in such comparisons.

675	   Normalization of an ARK for the purpose of octet-by-octet equality
676	   comparison with another ARK consists of four steps.  First, any upper
677	   case letters in the "ark:" label and the two characters following a
678	   `%' are converted to lower case.  The case of all other letters in
679	   the ARK string must be preserved.  Second, any NMAH part is removed
680	   (everything from an initial "http://" up to the next slash) and all
681	   hyphens are removed.

683	   Third, structural characters (slash and period) are normalized.
684	   Initial and final occurrences are removed, and two structural
685	   characters in a row (e.g., // or ./) are replaced by the first
686	   character, iterating until each occurrence has at least one non-
687	   structural character on both sides.  Also, if there are any
688	   components with a period on the left and a slash on the right, either
689	   the component and the preceding period must be moved to the end of
690	   the Name part or the ARK must be thrown out as malformed.

692	   The fourth and final step is to arrange the suffixes in ASCII
693	   collating sequence (that is, to sort them) and to remove duplicate
694	   suffixes, if any.  It is also permissible to throw out ARKs for which
695	   the suffixes are not sorted.

697	   The resulting ARK string is now normalized.  Comparisons between
698	   normalized ARKs are case-sensitive, meaning that upper case letters
699	   are considered different from their lower case counterparts.
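
The four steps can be sketched in Python as shown here.  The function
name normalize_ark is hypothetical.  The sketch makes three
assumptions that go beyond the text: it emits the result in the
canonical form "ark:/NAAN/...", it takes the option of rejecting an
ARK that has a slash to the right of a period rather than moving the
component, and it sorts suffixes rather than rejecting unsorted ones.

```python
import re


def normalize_ark(ark):
    """Return a normalized ARK string suitable for octet comparison."""
    # Step 2 (first half): remove any NMAH, i.e. "http://" up to the
    # next slash.
    ark = re.sub(r"^http://[^/]*/", "", ark, flags=re.IGNORECASE)
    # Step 1: the "ark:" label is matched in any case and re-emitted in
    # lower case; hex digits after '%' are lowered, other case is kept.
    m = re.match(r"ark:/?", ark, flags=re.IGNORECASE)
    if m is None:
        raise ValueError("missing ark: label")
    body = ark[m.end():]
    body = re.sub(r"%([0-9A-Fa-f]{2})",
                  lambda h: "%" + h.group(1).lower(), body)
    # Step 2 (second half): hyphens are identity inert.
    body = body.replace("-", "")
    # Step 3: collapse runs of structural characters to the first one,
    # then drop initial and final occurrences.
    while re.search(r"[/.]{2}", body):
        body = re.sub(r"([/.])[/.]", r"\1", body)
    body = body.strip("/.")
    naan, _, rest = body.partition("/")
    base, *suffixes = rest.split(".")
    if any("/" in s for s in suffixes):
        raise ValueError("malformed ARK: slash follows a suffix")
    # Step 4: sort suffixes in ASCII order and remove duplicates.
    suffixes = sorted(set(suffixes))
    return "ark:/" + naan + "/" + ".".join([base] + suffixes)


if __name__ == "__main__":
    a = normalize_ark("http://foobar.zaf.org/ark:/12025/65-4-xz-321")
    b = normalize_ark("ARK:/12025/654xz321/")
    print(a, a == b)
```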

701	   To keep ARK string variation to a minimum, no reserved ARK characters
702	   should be %-encoded unless it is deliberately to conceal their
703	   reserved meanings.  No non-reserved ARK characters should ever be
704	   %-encoded.  Finally, no %-encoded character should ever appear in an
705	   ARK in its decoded form.

707	2.8.  Naming Considerations

709	   The ARK has different goals from the URI, so it has different
710	   character set requirements.  Because linguistic constructs imperil
711	   persistence, for ARKs non-ASCII character support is unimportant.
712	   ARKs and URIs share goals of transcribability and transportability
713	   within web documents, so characters are required to be visible, non-
714	   conflicting with HTML/XML syntax, and not subject to tampering during
715	   transmission across common transport gateways.  Add the goal of
716	   making an undelimited ARK recognizable in running prose, as in
717	   ark:/12025/=@_22*$, and certain punctuation characters (e.g., comma,
718	   period) end up being excluded from the ARK lest the end of a phrase
719	   or sentence be mistaken for part of the ARK.

721	   A valuable technique for provision of persistent objects is to try to
722	   arrange for the complete identifier to appear on, with, or near its
723	   retrieved object.  An object encountered at a moment in time when its
724	   discovery context has long since disappeared could then easily be
725	   traced back to its metadata, to alternate versions, to updates, etc.
726	   This has seen reasonable success, for example, in book publishing and
727	   software distribution.

729	   If persistence is the goal, a deliberate local strategy for
730	   systematic name assignment is crucial.  Names must be chosen with
731	   great care.  Poorly chosen and managed names will devastate any
732	   persistence strategy, and they do not discriminate based on naming
733	   scheme.  Whether a mistakenly re-assigned identifier is a URN, DOI,
734	   PURL, URL, or ARK, the damage - failed access and confusion - is not
735	   mitigated more in one scheme than in another.  Conversely, in-house
736	   efforts to manage names responsibly will go much further towards
737	   safeguarding persistence than any choice of naming scheme or name
738	   resolution technology.

740	   Hostnames appearing in any identifier meant to be persistent must be
741	   chosen with extra care.  The tendency in hostname selection has
742	   traditionally been to choose a token with recognizable attributes,
743	   such as a corporate brand, but that tendency wreaks havoc with
744	   persistence that is supposed to outlive brands, corporations, subject
745	   classifications, and natural language semantics (e.g., what did the
746	   three letters "gay" mean in 1958, 1978, and 1998?).  Today's
747	   recognized and correct attributes are tomorrow's stale or incorrect
748	   attributes.  In making hostnames (any names, actually) long-term
749	   persistent, it helps to eliminate recognizable attributes to the
750	   extent possible.  This affects selection of any name based on URLs,
751	   including PURLs and the explicitly disposable NMAHs.  There is no
752	   excuse for a provider that manages its internal names impeccably not
753	   to exercise the same care in choosing what could be an exceptionally
754	   durable hostname, especially if it would form the prefix for all the
755	   provider's URL-based external names.  Registering an opaque hostname
756	   in the ".org" or ".net" domain would not be a bad start.

758	   Dubious persistence speculation does not make selecting naming
759	   strategies any easier.  For example, despite rumors to the contrary,
760	   there are really no obvious reasons why the organizations registering
761	   DNS names, URN Namespaces, and DOI publisher IDs should have among
762	   them one that is intrinsically more fallible than the next.
763	   Moreover, it is a misconception that the demise of DNS and of HTTP
764	   need adversely affect the persistence of URLs.  At such a time,
765	   certainly URLs from the present day might not then be actionable by
766	   our present-day mechanisms, but resolution systems for future non-
767	   actionable URLs are no harder to imagine than resolution systems for
768	   present-day non-actionable URNs and DOIs.  There is no more stable a
769	   namespace than one that is dead and frozen, and that would then
770	   characterize the space of names bearing the "http://" prefix.  It is
771	   useful to remember that just because hostnames have been carelessly
772	   chosen in their brief history does not mean that they are unsuitable
773	   in NMAHs (and URLs) intended for use in situations demanding the
774	   highest level of persistence available in the Internet environment.
775	   A well-planned name assignment strategy is everything.

777	3.  Assigners of ARKs

779	   A Name Assigning Authority (NAA) is an organization that creates (or
780	   delegates creation of) long-term associations between identifiers and
781	   information objects.  Examples of NAAs include national libraries,
782	   national archives, and publishers.  An NAA may arrange with an
783	   external organization for identifier assignment.  The US Library of
784	   Congress, for example, allows OCLC (the Online Computer Library
785	   Center, a major world cataloger of books) to create associations
786	   between Library of Congress call numbers (LCCNs) and the books that
787	   OCLC processes.  A cataloging record is generated that testifies to
788	   each association, and the identifier is included by the publisher,
789	   for example, in the front matter of a book.

791	   An NAA does not so much create an identifier as create an
792	   association.  The NAA first draws an unused identifier string from
793	   its namespace, which is the set of all identifiers under its control.
794	   It then records the assignment of the identifier to an information
795	   object having sundry witnessed characteristics, such as a particular
796	   author and modification date.  A namespace is usually reserved for an
797	   NAA by agreement with recognized community organizations (such as
798	   IANA and ISO) that all names containing a particular string be under
799	   its control.  In the ARK an NAA is represented by the Name Assigning
800	   Authority Number (NAAN).

802	   The ARK namespace reserved for an NAA is the set of names bearing its
803	   particular NAAN.  For example, all strings beginning with
804	   "ark:/12025/" are under control of the NAA registered under 12025,
805	   which might be the National Library of Finland.  Because each NAA has
806	   a different NAAN, names from one namespace cannot conflict with those
807	   from another.  Each NAA is free to assign names from its namespace
808	   (or delegate assignment) according to its own policies.  These
809	   policies must be documented in a manner similar to the declarations
810	   required for URN Namespace registration [URNNID].

812	   For now, registration of ARK NAAs is in a bootstrapping phase.  To
813	   register, please read about the mapping authority discovery file in
814	   the next section and send email to ark@cdlib.org.

816	4.  Finding a Name Mapping Authority

818	   In order to derive an actionable identifier (these days, a URL) from
819	   an ARK, a hostport (hostname or hostname plus port combination) for a
820	   working Name Mapping Authority (NMA) must be found.  An NMA is a
821	   service that is able to respond to the three basic ARK service
822	   requests.  Relying on registration and client-side discovery, NMAs
823	   make known which NAAs' identifiers they are willing to service.

825	   Upon encountering an ARK, a user (or client software) looks inside it
826	   for the optional NMAH part (the hostport of the NMA's ARK service).
827	   If it contains an NMAH that is working, this NMAH discovery step may
828	   be skipped; the NMAH effectively uses the beginning of an ARK to
829	   cache the results of a prior mapping authority discovery process.  If
830	   a new NMAH needs to be found, the client looks inside the ARK again
831	   for the NAAN (Name Assigning Authority Number).  Querying a global
832	   database, it then uses the NAAN to look up all current NMAHs that
833	   service ARKs issued by the identified NAA.  The global database is
834	   key, and two specific methods for querying it are given in this
835	   section.

837	   In the interests of long-term persistence, however, ARK mechanisms
838	   are first defined in high-level, protocol-independent terms so that
839	   mechanisms may evolve and be replaced over time without compromising
840	   fundamental service objectives.  Either or both specific methods
841	   given here may eventually be supplanted by better methods since, by
842	   design, the ARK scheme does not depend on a particular method, but
843	   only on having some method to locate an active NMAH.

845	   At the time of issuance, at least one NMAH for an ARK should be
846	   prepared to service it.  That NMA may or may not be administered by
847	   the Name Assigning Authority (NAA) that created it.  Consider the
848	   following hypothetical example of providing long-term access to a
849	   cancer research journal.  The publisher wishes to turn a profit and
850	   the National Library of Medicine wishes to preserve the scholarly
851	   record.  An agreement might be struck whereby the publisher would act
852	   as the NAA and the national library would archive the journal issue
853	   when it appears, but without providing direct access for the first
854	   six months.  During the first six months of peak commercial
855	   viability, the publisher would retain exclusive delivery rights and
856	   would charge access fees.  Again, by agreement, both the library and
857	   the publisher would act as NMAs, but during that initial period the
858	   library would redirect requests for issues less than six months old
859	   to the publisher.  At the end of the waiting period, the library
860	   would then begin servicing requests for issues older than six months
861	   by tapping directly into its own archives.  Meanwhile, the publisher
862	   might routinely redirect incoming requests for older issues to the
863	   library.  Long-term access is thereby preserved, and so is the
864	   commercial incentive to publish content.

866	   Although it will be common for an NAA also to run an NMA service, it
867	   is never a requirement.  Over time NAAs and NMAs will come and go.
868	   One NMA will succeed another, and there might be many NMAs serving
869	   the same ARKs simultaneously (e.g., as mirrors or as competitors).
870	   There might also be asymmetric but coordinated NMAs as in the
871	   library-publisher example above.

873	4.1.  Looking Up NMAHs in a Globally Accessible File

875	   This subsection describes a way to look up NMAHs using a simple name
876	   authority table represented as a plain text file.  For efficient
877	   access the file may be stored in a local filesystem, but it needs to
878	   be reloaded periodically to incorporate updates.  It is not expected
879	   that the size of the file or frequency of update should impose an
880	   undue maintenance or searching burden any time soon, for even
881	   primitive linear search of a file with ten thousand NAAs is a
882	   subsecond operation on modern server machines.  The proposed file
883	   strategy is similar to the /etc/hosts file strategy that supported
884	   Internet host address lookup for a period of years before the advent
885	   of DNS.

887	   The name authority table file is updated on an ongoing basis and is
888	   available for copying over the internet from the California Digital
889	   Library at http://www.cdlib.org/inside/diglib/ark/natab and from a
890	   number of mirror sites.  The file contains comment lines (lines that
891	   begin with `#') explaining the format and giving the file's
892	   modification time, reloading address, and NAA registration
893	   instructions.  There is even a Perl script that processes the file
894	   embedded in the file's comments.  As of February 2005, the
895	   registered Name Assigning Authorities are:

897	        12025           National Library of Medicine
898	        12026           Library of Congress
899	        12027           National Agricultural Library
900	        13030           California Digital Library
901	        13038           World Intellectual Property Organization
902	        20775           University of California San Diego
903	        29114           University of California San Francisco
904	        28722           University of California Berkeley
905	        15230           Rutgers University Libraries
906	        13960           Internet Archive
907	        64269           Digital Curation Centre
908	        62624           New York University Libraries
909	        67531           University of North Texas Libraries
910	        27927           Ithaka Electronic-Archiving Initiative

912	   A snapshot of the name authority table file appears in an appendix.

914	4.2.  Looking up NMAHs Distributed via DNS

916	   This subsection introduces a method for looking up NMAHs that is
917	   based on the method for discovering URN resolvers described in
918	   [NAPTR].  It relies on querying the DNS system already installed in
919	   the background infrastructure of most networked computers.  A query
920	   is submitted to DNS asking for a list of resolvers that match a given
921	   NAAN.  DNS distributes the query to the particular DNS servers that
922	   can best provide the answer, unless the answer can be found more
923	   quickly in a local DNS cache as a side-effect of a recent query.

925	   Responses come back inside Name Authority Pointer (NAPTR) records.
926	   The normal result is one or more candidate NMAHs.

928	   In its full generality the [NAPTR] algorithm ambitiously accommodates
929	   a complex set of preferences, orderings, protocols, mapping services,
930	   regular expression rewriting rules, and DNS record types.  This
931	   subsection proposes a drastic simplification of it for the special
932	   case of ARK mapping authority discovery.  The simplified algorithm is
933	   called Maptr.  It uses only one DNS record type (NAPTR) and restricts
934	   most of its field values to constants.  The following hypothetical
935	   excerpt from a DNS data file for the NAAN known as 12026 shows three
936	   example NAPTR records ready to use with the Maptr algorithm.

938	       12026.ark.arpa.
939	       ;; US Library of Congress
940	       ;;       order pref flags service regexp  replacement
941	        IN NAPTR  0     0   "h"  "ark"   "USLC"  lhc.nlm.nih.gov:8080
942	        IN NAPTR  0     0   "h"  "ark"   "USLC"  foobar.zaf.org
943	        IN NAPTR  0     0   "h"  "ark"   "USLC"  sneezy.dopey.com

945	   All the fields are held constant for Maptr except for the "flags",
946	   "regexp", and "replacement" fields.  The "service" field contains the
947	   constant value "ark" so that NAPTR records participating in the Maptr
948	   algorithm will not be confused with other NAPTR records.  The "order"
949	   and "pref" fields are held to 0 (zero) and otherwise ignored for now;
950	   the algorithm may evolve to use these fields for ranking decisions
951	   when usage patterns and local administrative needs are better
952	   understood.

954	   When a Maptr query returns a record with a flags field of "h" (for
955	   hostport, a Maptr extension to the NAPTR flags), the replacement
956	   field contains the NMAH (hostport) of an ARK service provider.  When
957	   a query returns a record with a flags field of "" (the empty string),
958	   the client needs to submit a new query containing the domain name
959	   found in the replacement field.  This second sort of record exploits
960	   the distributed nature of DNS by redirecting the query to another
961	   domain name.  It looks like this.

963	       12345.ark.arpa.
964	       ;; Digital Library Consortium
965	       ;;       order pref flags service regexp replacement
966	        IN NAPTR  0     0    ""  "ark"     ""   dlc.spct.org.

968	   Here is the Maptr algorithm for ARK mapping authority discovery.  In
969	   it replace <NAAN> with the NAAN from the ARK for which an NMAH is
970	   sought.

972	        (1) Initialize the DNS query:  type=NAPTR,
973	        query=<NAAN>.ark.arpa.

975	        (2) Submit the query to DNS and retrieve (NAPTR) records,
976	        discarding any record that does not have "ark" for the service
977	        field.

979	        (3) All remaining records with a flags field of "h" contain
980	        candidate NMAHs in their replacement fields.  Set them aside, if
981	        any.

983	        (4) Any record with an empty flags field ("") has a replacement
984	        field containing a new domain name to which a subsequent query
985	        should be redirected.  For each such record, set
986	        query=<replacement> then go to step (2).  When all such records
987	        have been recursively exhausted, go to step (5).

989	        (5) All redirected queries have been resolved and a set of
990	        candidate NMAHs has been accumulated from step (3).  If there
991	        are zero NMAHs, exit - no mapping authority was found.  If there
992	        are one or more NMAHs, choose one using any criteria you wish,
993	        then exit.

995	   A Perl script that implements this algorithm is included here.

997	     #!/depot/bin/perl

999	     use Net::DNS;                 # include simple DNS package
1000	     my $qtype = "NAPTR";               # initialize query type
1001	     my $naa = shift;              # get NAAN script argument
1002	     my $mad = new Net::DNS::Resolver;  # mapping authority discovery

1004	     &maptr("$naa.ark.arpa");      # call maptr - that's it

1006	     sub maptr {                   # recursive maptr algorithm
1007	          my $dname = shift;       # domain name as argument
1008	          my $rr;                  # current NAPTR record; fields
1009	                                   # are read via Net::DNS accessors
1010	          my $query = $mad->query($dname, $qtype);
1011	          return                   # non-productive query
1012	               if (! $query || ! $query->answer);
1013	          foreach $rr ($query->answer) {
1014	               next           # skip records of wrong type
1015	                    if ($rr->type ne $qtype);
1016	               next           # skip services other than "ark"
1017	                    if ($rr->service ne "ark");
1018	               if ($rr->flags eq "") {
1019	                    &maptr($rr->replacement);     # recurse
1020	               } elsif ($rr->flags eq "h") {
1021	                    print $rr->replacement, "\n"; # candidate NMAH
1022	               }
1023	          }
1024	     }
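
   For readers who want to experiment without live DNS, the steps of
   the algorithm can also be sketched in Python against an in-memory
   table.  This is an illustrative sketch only and not part of the
   Maptr definition:  the Naptr tuple, the lookup callback, the ZONE
   table, the example.org host names, and the guard against redirect
   loops are all assumptions introduced for the example.

```python
from typing import Callable, Iterable, List, NamedTuple, Set


class Naptr(NamedTuple):
    """The three NAPTR fields that the Maptr steps inspect."""
    flags: str
    service: str
    replacement: str


def maptr(naan: str,
          lookup: Callable[[str], Iterable[Naptr]]) -> List[str]:
    """Return candidate NMAHs for a NAAN, following steps (1)-(5)."""
    candidates: List[str] = []
    seen: Set[str] = set()             # assumption: avoid redirect loops
    pending = [f"{naan}.ark.arpa."]    # step (1): initial query name
    while pending:
        name = pending.pop()
        if name in seen:
            continue
        seen.add(name)
        for rr in lookup(name):        # step (2): fetch NAPTR records
            if rr.service != "ark":    # discard non-"ark" services
                continue
            if rr.flags == "h":        # step (3): candidate NMAH
                candidates.append(rr.replacement)
            elif rr.flags == "":       # step (4): redirected query
                pending.append(rr.replacement)
    return candidates                  # step (5): caller picks one


# Hypothetical zone data; only the first record comes from the text.
ZONE = {
    "12345.ark.arpa.": [Naptr("", "ark", "dlc.spct.org.")],
    "dlc.spct.org.": [
        Naptr("h", "ark", "ark.example.org:8080"),
        Naptr("h", "other", "ignored.example.org"),
    ],
}


def zone_lookup(name: str) -> List[Naptr]:
    """Stand-in for a DNS NAPTR query."""
    return ZONE.get(name, [])
```

   With this hypothetical table, maptr("12345", zone_lookup) follows
   the redirect to dlc.spct.org. and returns the one candidate whose
   service field is "ark"; an unknown NAAN returns an empty list.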

1026	   The global database thus distributed via DNS and the Maptr algorithm
1027	   can easily be seen to mirror the contents of the Name Authority Table
1028	   file described in the previous section.

1030	5.  Generic ARK Service Definition

1032	   An ARK request's output is delivered information; examples include
1033	   the object itself, a policy declaration (e.g., a promise of support),
1034	   a descriptive metadata record, or an error message.  The experience
1035	   of object delivery is expected to be an evolving mix of information
1036	   that reflects changing service expectations and technology
1037	   requirements; contemporary examples include such things as an object
1038	   summary and component links formatted for human consumption.  ARK
1039	   services must be couched in high-level, protocol-independent terms if
1040	   persistence is to outlive today's networking infrastructural
1041	   assumptions.  The high-level ARK service definitions listed below are
1042	   followed in the next section by a concrete method (one of many
1043	   possible methods) for delivering these services with today's
1044	   technology.

1046	5.1.  Generic ARK Access Service (access, location)

1048	   Returns (a copy of) the object or a redirect to the same, although a
1049	   sensible object proxy may be substituted.  Examples of sensible
1050	   substitutes include,

1052	     - a table of contents instead of a large complex document,
1053	     - a home page instead of an entire web site hierarchy,
1054	     - a rights clearance challenge before accessing protected data,
1055	     - directions for access to an offline object (e.g., a book),
1056	     - a description of an intangible object (a disease, an event), or
1057	     - an applet acting as "player" for a large multimedia object.

1059	   May also return a discriminated list of alternate object locators.
1060	   If access is denied, returns an explanation of the object's current
1061	   (perhaps permanent) inaccessibility.

1063	5.2.  Generic Policy Service (permanence, naming, etc.)

1065	   Returns declarations of policy and support commitments for given
1066	   ARKs.  Declarations are returned in either a structured metadata
1067	   format or a human readable text format; sometimes one format may
1068	   serve both purposes.  Policy subareas may be addressed in separate
1069	   requests, but the following areas should be covered:  object
1070	   permanence, object naming, object fragment addressing, and
1071	   operational service support.

1073	   The permanence declaration for an object is a rating defined with
1074	   respect to an identified permanence provider (guarantor), which will
1075	   be the NMA.  It may include the following aspects.

1077	        (a) "object availability" - whether and how access to the object
1078	        is supported (e.g., online 24x7, or offline only),

1080	        (b) "identifier validity" - under what conditions the identifier
1081	        will be or has been re-assigned,

1083	        (c) "content invariance" - under what conditions the content of
1084	        the object is subject to change, and

1086	        (d) "change history" - access to corrections, migrations, and
1087	        revisions, whether through links to the changed objects
1088	        themselves or through a document summarizing the change history.

1090	   One approach to a permanence rating framework, conceived
1091	   independently from ARKs, is given in [NLMPerm].  Under ongoing
1092	   development and limited deployment at the US National Library of
1093	   Medicine, it identifies the following "permanence levels":

1095	        Not Guaranteed: No commitment has been made to retain this
1096	        resource.  It could become unavailable at any time.  Its
1097	        identifier could be changed.

1099	        Permanent: Dynamic Content: A commitment has been made to keep
1100	        this resource permanently available.  Its identifier will always
1101	        provide access to the resource.  Its content could be revised or
1102	        replaced.

1104	        Permanent: Stable Content: A commitment has been made to keep
1105	        this resource permanently available.  Its identifier will always
1106	        provide access to the resource.  Its content is subject only to
1107	        minor corrections or additions.

1109	        Permanent: Unchanging Content: A commitment has been made to
1110	        keep this resource permanently available.  Its identifier will
1111	        always provide access to the resource.  Its content will not
1112	        change.

1114	   Naming policy for an object includes an historical description of the
1115	   NAA's (and its successor NAA's) policies regarding differentiation of
1116	   objects.  Since it is the NMA that responds to requests for policy
1117	   statements, it is useful for the NMA to be able to produce or
1118	   summarize these historical NAA documents.  Naming policy may include
1119	   the following aspects.

1121	        (i) "similarity" - (or "unity") the limit, defined by the NAA,
1122	        to the level of dissimilarity beyond which two similar objects
1123	        warrant separate identifiers but before which they share one
1124	        single identifier, and

1126	        (ii) "granularity" - the limit, defined by the NAA, to the level
1127	        of object subdivision beyond which sub-objects do not warrant
1128	        separately assigned identifiers but before which sub-objects are
1129	        assigned separate identifiers.

1131	   Subnaming policy for an object describes the qualifiers that the NMA,
1132	   in fulfilling its ongoing and evolving service obligations, allows as
1133	   extensions to an NAA-assigned ARK.  To the conceptual object that the
1134	   NAA named with an ARK, the NMA may add component access points and
1135	   derivatives (e.g., format migrations in aid of preservation) in order
1136	   to provide both basic and value-added services.

1138	   Addressing policy for an object includes a description of how, during
1139	   access, object components (e.g., paragraphs, sections) or views
1140	   (e.g., image conversions) may or may not be "addressed", in other
1141	   words, how the NMA permits arguments or parameters to modify the
1142	   object delivered as the result of an ARK request.  If supported,
1143	   these sorts of operations would provide things like byte-ranged
1144	   fragment delivery and open-ended format conversions, or any set of
1145	   possible transformations that would be too numerous to list or to
1146	   identify with separately assigned ARKs.

1148	   Operational service support policy includes a description of general
1149	   operational aspects of the NMA service, such as after-hours staffing
1150	   and trouble reporting procedures.

1152	5.3.  Generic Description Service

1154	   Returns a description of the object.  Descriptions are returned in
1155	   either a structured metadata format or a human readable text format;
1156	   sometimes one format may serve both purposes.  A description must at
1157	   a minimum answer the who, what, when, and where questions concerning
1158	   an expression of the object.  Standalone descriptions should be
1159	   accompanied by the modification date and source of the description
1160	   itself.  May also return discriminated lists of ARKs that are related
1161	   to the given ARK.

1163	6.  Overview of the Tiny HTTP URL Mapping Protocol (THUMP)

1165	   The Tiny HTTP URL Mapping Protocol (THUMP) is a way of taking a key
1166	   (a kind of identifier) and asking such questions as, what information
1167	   does this identify and how permanent is it?  [THUMP] is in fact one
1168	   specific method under development for delivering ARK services.  The
1169	   protocol runs over HTTP to exploit the web browser's current pre-
1170	   eminence as user interface to the Internet.  THUMP is designed so
1171	   that a person can enter ARK requests directly into the location field
1172	   of current browser interfaces.  Because it runs over HTTP, THUMP can
1173	   be simulated and tested within keyboard-based [TELNET] sessions.

1175	   The asker (a person or client program) starts with an identifier,
1176	   such as an ARK or a URL.  The identifier reveals to the asker (or
1177	   allows the asker to infer) the Internet host name and port number of
1178	   a server system that responds to questions.  Here, this is just the
1179	   NMAH that is obtained by inspection and possibly lookup based on the
1180	   ARK's NAAN.  The asker then sets up an HTTP session with the server
1181	   system, sends a question via a THUMP request (contained within an
1182	   HTTP request), receives an answer via a THUMP response (contained
1183	   within an HTTP response), and closes the session.  That concludes the
1184	   connected portion of the protocol.

1186	   A THUMP request is a string of characters beginning with a `?'
1187	   (question mark) that is appended to the identifier string.  The
1188	   resulting string is sent as an argument to HTTP's GET command.
1189	   Request strings too long for GET may be sent using HTTP's POST
1190	   command.  The three most common requests correspond to three
1191	   degenerate special cases that keep the user's learning and typing
1192	   burden low.  First, a simple key with no request at all is the same
1193	   as an ordinary access request.  Thus a plain ARK entered into a
1194	   browser's location field behaves much like a plain URL, and returns
1195	   access to the primary identified object, for instance, an HTML
1196	   document.

1198	   The second special case is a minimal ARK description request string
1199	   consisting of just "?".  For example, entering the string,

1201	             ark.nlm.nih.gov/12025/psbbantu?

1203	   into the browser's location field directly precipitates a request for
1204	   a metadata record describing the object identified by
1205	   ark:/12025/psbbantu.  The browser, unaware of THUMP, prepares and
1206	   sends an HTTP GET request in the same manner as for a URL.  THUMP is
1207	   designed so that the response (indicated by the returned HTTP content
1208	   type) is normally displayed, whether the output is structured for
1209	   machine processing (text/plain) or formatted for human consumption
1210	   (text/html).

1212	   In the following example THUMP session, each line has been annotated
1213	   to include a line number and whether it was the client or server that
1214	   sent it.  Without going into much depth, the session has four pieces
1215	   separated from each other by blank lines:  the client's piece (lines
1216	   1-3), the server's HTTP/THUMP response headers (4-7), and the body of
1217	   the server's response, itself in two pieces (8-11 and 12-17).  The
1218	   first and last lines (1 and 17) correspond to the client's steps to
1219	   start the TCP session and the server's steps to end it, respectively.

1221	      1  C: [opens session]
1222	         C: GET http://ark.nlm.nih.gov/ark:/12025/psbbantu? HTTP/1.1
1223	         C:
1224	         S: HTTP/1.1 200 OK
1225	      5  S: Content-Type: text/plain
1226	         S: THUMP-Status: 0.1 200 OK
1227	         S:
1228	         S: |set: NLM | 12025/psbbantu? | 20030731
1229	         S:         | http://ark.nlm.nih.gov/ark:/12025/psbbantu?
1230	     10  S: here: 1 | 1 | 1
1231	         S:
1232	         S: erc:
1233	         S: who:    Lederberg, Joshua
1234	         S: what:   Studies of Human Families for Genetic Linkage
1235	     15  S: when:   1974
1236	         S: where:  http://profiles.nlm.nih.gov/BB/A/N/T/U/_/bbantu.pdf
1237	         S: [closes session]

1239	   The first two server response lines (4-5) above are typical of HTTP.
1240	   The next line (6) is peculiar to THUMP, and indicates the THUMP
1241	   version and a normal return status.  The balance of the response
1242	   consists of a record set header (lines 8-10) and a single metadata
1243	   record (12-16) that comprises the ARK description service response.
1244	   The record set header identifies (8-9) who created the set, what its
1245	   title is, when it was created, and where an automated process can
1246	   access the set; it ends in a line (10) whose respective sub-elements
1247	   indicate that here in this communication the recipient can expect to
1248	   find 1 record, starting at the record numbered 1, from a set
1249	   consisting of a total of 1 record (i.e., here is the entire set,
1250	   consisting of exactly one record).
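
   As a rough illustration of how a client might read the two
   THUMP-specific lines just described, the following Python sketch
   splits a THUMP-Status value and a "here:" line into their parts.
   The function names and the returned tuple layouts are assumptions
   made for this example; [THUMP] remains the authority on the actual
   syntax.

```python
from typing import Tuple


def parse_thump_status(value: str) -> Tuple[str, int, str]:
    """Split a THUMP-Status value such as '0.1 200 OK'.

    Returns (version, status code, reason text).
    """
    version, code, reason = value.split(None, 2)
    return version, int(code), reason


def parse_here(line: str) -> Tuple[int, int, int]:
    """Parse a line such as 'here: 1 | 1 | 1'.

    Returns (records in this response, number of the first record,
    total records in the set), in the order the text describes them.
    """
    label, _, rest = line.partition(":")
    if label.strip() != "here":
        raise ValueError("not a 'here' line")
    count, start, total = (int(part) for part in rest.split("|"))
    return count, start, total
```

   Applied to the session above, the status value yields version
   "0.1" with code 200, and the "here:" line yields one record,
   starting at record 1, out of a set of one.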

1252	   The returned record (12-16) is in the format of an Electronic
1253	   Resource Citation [ERC], which is discussed in more detail in the
1254	   next section.  For now, note that it contains four elements that
1255	   answer the top priority questions regarding an expression of the
1256	   object:  who played a major role in expressing it, what the
1257	   expression was called, when it was created, and where the expression
1258	   may be found.  This quartet of elements comes up again and again in
1259	   ERCs.

1261	   The third degenerate special case of an ARK request (and no other
1262	   cases will be described in this document) is the string "??",
1263	   corresponding to a minimal permanence policy request.  It can be seen
1264	   in use appended to an ARK (on line 2) in the example session that
1265	   follows.

1267	      1  C: [opens session]
1268	         C: GET http://ark.nlm.nih.gov/ark:/12025/psbbantu?? HTTP/1.1
1269	         C:
1270	         S: HTTP/1.1 200 OK
1271	      5  S: Content-Type: text/plain
1272	         S: THUMP-Status: 0.1 200 OK
1273	         S:
1274	         S: |set: NLM | 12025/psbbantu?? | 20030731
1275	         S:         | http://ark.nlm.nih.gov/ark:/12025/psbbantu??
1276	     10  S: here: 1 | 1 | 1
1277	         S:
1278	         S: erc:
1279	         S: who:    Lederberg, Joshua
1280	         S: what:   Studies of Human Families for Genetic Linkage
1281	     15  S: when:   1974
1282	         S: where:  http://profiles.nlm.nih.gov/BB/A/N/T/U/_/bbantu.pdf
1283	         S: erc-support:
1284	         S: who:    USNLM
1285	         S: what:   Permanent, Unchanging Content
1286	     20  S: when:   20010421
1287	         S: where:  http://ark.nlm.nih.gov/yy22948
1288	         S: [closes session]

1290	   Again, a single metadata record (lines 12-21) is returned, but it
1291	   consists of two segments.  The first segment (12-16) gives the same
1292	   basic citation information as in the previous example.  It is
1293	   returned in order to establish context for the persistence
1294	   declaration in the second segment (17-21).

1296	   Each segment in an ERC tells a different story relating to the
1297	   object, so although the same four questions (elements) appear in
1298	   each, the answers depend on the segment's story type.  While the
1299	   first segment tells the story of an expression of the object, the
1300	   second segment tells the story of the support commitment made to it:
1301	   who made the commitment, what the nature of the commitment was, when
1302	   it was made, and where a fuller explanation of the commitment may be
1303	   found.
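
   The three degenerate request forms can be summarized in a short
   Python sketch.  It is only an illustration under stated
   assumptions:  the function names and the labels "access",
   "description", and "policy" are invented here, and the URL layout
   simply copies the GET lines of the example sessions.

```python
def request_kind(request: str) -> str:
    """Classify the request string appended to an identifier."""
    kinds = {
        "": "access",        # plain key: ordinary access request
        "?": "description",  # minimal description request
        "??": "policy",      # minimal permanence policy request
    }
    return kinds.get(request, "other")


def thump_url(nmah: str, ark: str, request: str = "") -> str:
    """Build the URL sent with HTTP GET for an ARK request.

    nmah is the host (and optional port) of the mapping authority;
    ark is the identifier, for example 'ark:/12025/psbbantu'.
    """
    if request and not request.startswith("?"):
        raise ValueError("a THUMP request begins with '?'")
    return f"http://{nmah}/{ark}{request}"
```

   For instance, thump_url("ark.nlm.nih.gov", "ark:/12025/psbbantu",
   "??") reproduces the URL on line 2 of the second example session.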

1305	7.  Overview of Electronic Resource Citations (ERCs)

1307	   An Electronic Resource Citation (or ERC, pronounced e-r-c) [ERC] is a
1308	   simple, compact, and printable record designed to hold data
1309	   associated with an information resource.  By design, the ERC is a
1310	   metadata format that balances the needs for expressive power, very
1311	   simple machine processing, and direct human manipulation.

1313	   A founding principle of the ERC is that direct human contact with
1314	   metadata will be a necessary and sufficient condition for the near
1315	   term rapid development of metadata standards, systems, and services.
1316	   Thus the machine-processable ERC format must only minimally strain
1317	   people's ability to read, understand, change, and transmit ERCs
1318	   without their relying on intermediation with specialized software
1319	   tools.  The basic ERC needs to be succinct, transparent, and
1320	   trivially parseable by software.

1322	   In the current Internet, it is natural to seriously consider using
1323	   XML as an exchange format because of predictions that it will obviate
1324	   many ad hoc formats and programs, and unify much of the world's
1325	   information under one reliable data structuring discipline that is
1326	   easy to generate, verify, parse, and render.  It appears, however,
1327	   that XML is still only catching on after years of standards work and
1328	   implementation experience.  The reasons for it are unclear, but for
1329	   now very simple XML interpretation is still out of reach.  Another
1330	   important caution is that XML structures are hard on the eyeballs,
1331	   taking up an amount of display (and page) space that significantly
1332	   exceeds that of traditional formats.  Until these conflicts with ERC
1333	   principle are resolved, XML is not a first choice for representing
1334	   ERCs.  Borrowing instead from the data structuring format that
1335	   underlies the successful spread of email and web services, the first
1336	   ERC format uses [ANVL], which is based on email and HTTP headers
1337	   [RFC822].  There is a naturalness to ANVL's label-colon-value format
1338	   (seen in the previous section) that barely needs explanation to a
1339	   person beginning to enter ERC metadata.

1341	   Besides simplicity of ERC system implementation and data entry
1342	   mechanics, ERC semantics (what the record and its constituent parts
1343	   mean) must also be easy to explain.  ERC semantics are based on a
1344	   reformulation and extension of the Dublin Core [DCORE] hypothesis,
1345	   which suggests that the fifteen Dublin Core metadata elements have a
1346	   key role to play in cross-domain resource description.  The ERC
1347	   design recognizes that the Dublin Core's primary contribution is the
1348	   international, interdisciplinary consensus that identified fifteen
1349	   semantic buckets (element categories), regardless of how they are
1350	   labeled.  The ERC then adds a definition for a record and some
1351	   minimal compliance rules.  In pursuing the limits of simplicity, the
1352	   ERC design combines and relabels some Dublin Core buckets to isolate
1353	   a tiny kernel (subset) of four elements for basic cross-domain
1354	   resource description.

1356	   For the cross-domain kernel, the ERC uses the four basic elements -
1357	   who, what, when, and where - to pretend that every object in the
1358	   universe can have a uniform minimal description.  Each has a name or
1359	   other identifier, a location, some responsible person or party, and a
1360	   date.  It doesn't matter what type of object it is, or whether one
1361	   plans to read it, interact with it, smoke it, wear it, or navigate
1362	   it.  Of course, this approach is flawed because uniformity of
1363	   description for some object types requires more semantic contortion
1364	   and sacrifice than for others.  That is why at the beginning of this
1365	   document, the ARK was said to be suited to objects that accommodate
1366	   reasonably regular electronic description.

1368	   While insisting on uniformity at the most basic level provides
1369	   powerful cross-domain leverage, the semantic sacrifice is great for
1370	   many applications.  So the ERC also permits a semantically rich and
1371	   nuanced description to co-exist in a record along with a basic
1372	   description.  In that way both sophisticated and naive recipients of
1373	   the record can extract the level of meaning from it that best suits
1374	   their needs and abilities.  Key to unlocking the richer description
1375	   is a controlled vocabulary of ERC record types (not explained in this
1376	   document) that permit knowledgeable recipients to apply defined sets
1377	   of additional assumptions to the record.

1379	7.1.  ERC Syntax

1381	   An ERC record is a sequence of metadata elements ending in a blank
1382	   line.  An element consists of a label, a colon, and an optional
1383	   value.  Here is an example of a record with five elements.

1385	          erc:
1386	          who: Gibbon, Edward
1387	          what: The Decline and Fall of the Roman Empire
1388	          when: 1781
1389	          where: http://www.ccel.org/g/gibbon/decline/

1391	   A long value may be folded (continued) onto the next line by
1392	   inserting a newline and indenting the next line.  A value can be thus
1393	   folded across multiple lines.  Here are two example elements, each
1394	   folded across four lines.

1396	          who/created: University of California, San Francisco, AIDS
1397	               Program at San Francisco General Hospital | University
1398	               of California, San Francisco, Center for AIDS Prevention
1399	               Studies
1400	          what/Topic:
1401	                Heart Attack | Heart Failure
1402	               | Heart
1403	                                Diseases

1405	   An element value folded across several lines is treated as if the
1406	   lines were joined together on one long line.  For example, the second
1407	   element from the previous example is considered equivalent to

1409	          what/Topic: Heart Attack | Heart Failure | Heart Diseases

1411	   An element value may contain multiple values, each one separated from
1412	   the next by a `|' (pipe) character.  The element from the previous
1413	   example contains three values.

1415	   For annotation purposes, any line beginning with a `#' (hash)
1416	   character is treated as if it were not present; this is a "comment"
1417	   line (a feature not available in email or HTTP headers).  For
1418	   example, the following element is spread across four lines and
1419	   contains two values:

1421	          what/Topic:
1422	               Heart Attack
1423	          #    | Heart Failure  -- hold off until next review cycle
1424	               | Heart Diseases
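
   The syntax rules above (label, colon, optional value, folding,
   `|'-separated values, and `#' comment lines) are simple enough to
   sketch in a few lines of Python.  The sketch below is a reading of
   those rules, not a reference parser:  it assumes that labels start
   in the first column, that folded lines are joined with a single
   space, and that the function name parse_record is free to choose.

```python
from typing import List, Tuple


def parse_record(text: str) -> List[Tuple[str, List[str]]]:
    """Parse one ERC record into (label, list of values) pairs."""
    raw: List[Tuple[str, str]] = []
    for line in text.splitlines():
        if line.startswith("#"):       # comment line: as if absent
            continue
        if not line.strip():           # blank line ends the record
            break
        if line[0] in " \t":           # indented: folded continuation
            if not raw:
                raise ValueError("continuation before any element")
            label, value = raw[-1]
            raw[-1] = (label, f"{value} {line.strip()}".strip())
        else:                          # new element: label ':' value
            label, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"missing colon: {line!r}")
            raw.append((label.strip(), value.strip()))
    # Split each unfolded value on the pipe character.
    return [
        (label, [v.strip() for v in value.split("|")] if value else [])
        for label, value in raw
    ]
```

   Given the commented example above (with the record's own
   indentation removed), this sketch returns a single "what/Topic"
   element holding the two values "Heart Attack" and "Heart
   Diseases".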

1426	7.2.  ERC Stories

1428	   An ERC record is organized into one or more distinct segments, where
1429	   each segment tells a story about a different aspect of the
1430	   information resource.  A segment boundary occurs whenever a segment
1431	   label (an element beginning with "erc") is encountered.  The basic
1432	   label "erc:" introduces the story of an object's expression (e.g.,
1433	   its publication, installation, or performance).  The label
1434	   "erc-about:" introduces the story of an object's content (what it is
1435	   about) and "erc-support:" introduces the story of a support
1436	   commitment made to it.  A story segment that concerns the ERC itself
1437	   is introduced by the label "erc-from:".  It is an important segment
1438	   that tells the story of the ERC's provenance.  Elements beginning
1439	   with "erc" are reserved for segment labels and their associated story
1440	   types.  From an earlier example, here is an ERC with two segments.

1442	         erc:
1443	         who:    Lederberg, Joshua
1444	         what:   Studies of Human Families for Genetic Linkage
1445	         when:   1974
1446	         where:  http://profiles.nlm.nih.gov/BB/A/N/T/U/_/bbantu.pdf
1447	         erc-support:
1448	         who:    NIH/NLM/LHNCBC
1449	         what:   Permanent, Unchanging Content
1450	         # Note to ops staff:  date needs verification.
1451	         when:   2001 04 21
1452	         where:  http://ark.nlm.nih.gov/yy22948

1454	   Segment stories are told according to journalistic tradition.  While
1455	   any number of pertinent elements may appear in a segment, priority is
1456	   placed on answering the questions who, what, when, and where at the
1457	   beginning of each segment so that readers can make the most important
1458	   selection or rejection decisions as soon as possible.  To make things
1459	   simple, the listed ordering of the questions is maintained in each
1460	   segment (as it happens most people who have been exposed to this
1461	   story telling technique are already familiar with the above
1462	   ordering).

1464	   The four questions are answered by using corresponding element
1465	   labels.  The four element labels can be re-used in each story
1466	   segment, but their meaning changes depending on the segment (the
1467	   story type) in which they appear.  In the example above, "who" is
1468	   first used to name a document's author and subsequently used to name
1469	   the permanence guarantor (provider).  Similarly, "when" first lists
1470	   the date of object creation and in the next segment lists the date of
1471	   a commitment decision.  Four labels appearing across three segments
1472	   effectively map to twelve semantically distinct elements.  Distinct
1473	   element meanings are mapped to Dublin Core elements in a later
1474	   section.
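
   To make the segment rule concrete, here is a small Python sketch
   that groups already-parsed elements into story segments, starting
   a new segment at each label beginning with "erc".  The function
   name, the (label, value) pair representation, and the choice to
   reject elements that precede the first segment label are
   assumptions of this example.

```python
from typing import List, Tuple

Element = Tuple[str, str]
Segment = Tuple[str, List[Element]]


def split_segments(elements: List[Element]) -> List[Segment]:
    """Group (label, value) pairs into (segment label, elements)."""
    segments: List[Segment] = []
    for label, value in elements:
        if label.startswith("erc"):        # segment boundary
            segments.append((label, []))
        elif segments:
            segments[-1][1].append((label, value))
        else:
            raise ValueError("element appears before a segment label")
    return segments


# The two-segment ERC shown above, as (label, value) pairs.
SAMPLE = [
    ("erc", ""),
    ("who", "Lederberg, Joshua"),
    ("what", "Studies of Human Families for Genetic Linkage"),
    ("when", "1974"),
    ("where", "http://profiles.nlm.nih.gov/BB/A/N/T/U/_/bbantu.pdf"),
    ("erc-support", ""),
    ("who", "NIH/NLM/LHNCBC"),
    ("what", "Permanent, Unchanging Content"),
    ("when", "2001 04 21"),
    ("where", "http://ark.nlm.nih.gov/yy22948"),
]
```

   Calling split_segments(SAMPLE) gives two segments, "erc" and
   "erc-support", each with its own "who" element, which shows how
   the same label carries a different meaning in each story.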

1476	7.3.  The ERC Anchoring Story

1478	   Each ERC contains an anchoring story.  It is usually the first
1479	   segment labeled "erc:" and it concerns an "anchoring" expression of
1480	   the object.  An "anchoring" expression is the one that a provider
1481	   deemed the most suitable basic referent given the audience and
1482	   application for which it produced the ERC.  If it sounds like the
1483	   provider has great latitude in choosing its anchoring expression, it
1484	   is because it does.  A typical anchoring story in an ERC for a born-
1485	   digital document would be the story of the document's release on a
1486	   web site; such a document would then be the anchoring expression.

1488	   An anchoring story need not be the central descriptive goal of an ERC
1489	   record.  For example, a museum provider may create an ERC for a
1490	   digitized photograph of a painting but choose to anchor it in the
1491	   story of the original painting instead of the story of the electronic
1492	   likeness; although the ERC may through other segments prove to be
1493	   centrally concerned with describing the electronic likeness, the
1494	   provider may have chosen this particular anchoring story in order to
1495	   make the ERC visible in a way that is most natural to patrons (who
1496	   would find the Mona Lisa under da Vinci sooner than they would find
1497	   it under the name of the person who snapped the photograph or scanned
1498	   the image).  In another example, a provider that creates an ERC for a
1499	   dramatic play as an abstract work has the task of describing a piece
1500	   of intangible intellectual property.  To anchor this abstract object
1501	   in the concrete world, if only through a derivative expression, it
1502	   makes sense for the provider to choose a suitable printed edition of
1503	   the play as the anchoring object expression (to describe in the
1504	   anchoring story) of the ERC.

1506	   The anchoring story has special rules designed to keep ERC processing
1507	   simple and predictable.  Each of the four basic elements (who, what,
1508	   when, and where) must be present, unless a best effort to supply it
1509	   fails.  In the event of failure, the element still appears but a
1510	   special value (described later) is used to explain the missing value.
1511	   While the requirement that each of the four elements be present only
1512	   applies to the anchoring story segment, as usual these elements
1513	   appear at the beginning of the segment and may only be used in the
1514	   prescribed order.  A minimal ERC would normally consist of just an
1515	   anchoring story and the element quartet, as illustrated in the next
1516	   example.

1518	         erc:
1519	         who:   National Research Council
1520	         what:  The Digital Dilemma
1521	         when:  2000
1522	         where: http://books.nap.edu/html/digital%5Fdilemma

1524	   A minimal ERC can be abbreviated so that it resembles a traditional
1525	   compact bibliographic citation that is nonetheless completely machine
1526	   processable.  The required elements and ordering make it possible to
1527	   eliminate the element labels, as shown here.

1529	         erc: National Research Council | The Digital Dilemma | 2000
1530	                | http://books.nap.edu/html/digital%5Fdilemma
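
   Because the element order is fixed, expanding the abbreviated form
   back into labeled elements is mechanical.  The Python sketch below
   illustrates one way to do it; the function name is hypothetical,
   and the input is assumed to be the already unfolded value of the
   "erc:" element.

```python
from typing import List, Tuple

QUARTET = ("who", "what", "when", "where")   # prescribed order


def expand_abbreviated(value: str) -> List[Tuple[str, str]]:
    """Expand 'who | what | when | where' into labeled elements."""
    parts = [part.strip() for part in value.split("|")]
    if len(parts) != len(QUARTET):
        raise ValueError("expected exactly four values")
    return list(zip(QUARTET, parts))
```

   Applied to the abbreviated citation above, the sketch returns the
   same four elements as the labeled minimal ERC that precedes it.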

1532	7.4.  ERC Elements

1534	   As mentioned, the four basic ERC elements (who, what, when, and
1535	   where) take on different specific meanings depending on the story
1536	   segment in which they are used.  By appearing in each segment, albeit
1537	   in different guises, the four elements serve as a valuable mnemonic
1538	   device - a kind of checklist - for constructing minimal story
1539	   segments from scratch.  Again, it is only in the anchoring segment
1540	   that all four elements are mandatory.

1542	   Here are some mappings between ERC elements and Dublin Core [DCORE]
1543	   elements.

1545	          Segment     ERC Element     Equivalent Dublin Core Element
1546	         ---------    -----------     ------------------------------
1547	            erc          who          Creator/Contributor/Publisher
1548	            erc          what                Title
1549	            erc          when                Date
1550	            erc          where               Identifier
1551	         erc-about       who                  <none>
1552	         erc-about       what                Subject
1553	         erc-about       when                Coverage (temporal)
1554	         erc-about       where               Coverage (spatial)

1556	   The basic element labels may also be qualified to add nuances to the
1557	   semantic categories that they identify.  Elements are qualified by
1558	   appending a `/' (slash) and a qualifier term.  Often qualifier terms
1559	   appear as the past tense form of a verb because it makes re-using
1560	   qualifiers among elements easier.

1562	         who/published:  ...
1563	         when/published: ...
1564	         where/published: ...

1566	   Using past tense verbs for qualifiers also reminds providers and
1567	   recipients that element values contain transient assertions that may
1568	   have been true once, but that tend to become less true over time.
1569	   Recipients that don't understand the meaning of a qualifier can fall
1570	   back onto the semantic category (bucket) designated by the
1571	   unqualified element label.  Inevitably recipients (people and
1572	   software) will have diverse abilities in understanding elements and
1573	   qualifiers.

1575	   Any number of other elements and qualifiers may be used in
1576	   conjunction with the quartet of basic segment questions.  The only
1577	   semantic requirement is that they pertain to the segment's story.
1578	   Also, it is only the four basic elements that change meaning
1579	   depending on their segment context.  All other elements have meaning
1580	   independent of the segment in which they appear.  If an element label
1581	   stripped of its qualifier is still not recognized by the recipient, a
1582	   second fall back position is to ignore it and rely on the four basic
1583	   elements.
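
   The two fallback positions described above might be sketched as
   follows.  This is a minimal illustration, not a prescribed
   algorithm: the function names, the return values, and the idea of
   passing in sets of labels and qualifiers that a recipient
   understands are all assumptions made for the example.

```python
# Sketch of a recipient's fallback behaviour for element labels.
# Assumption: the recipient keeps its own sets of understood labels
# and understood qualifiers; neither set is defined by this document.

def split_label(label):
    """Split "when/published" into ("when", "published")."""
    base, _, qualifier = label.partition("/")
    return base, (qualifier or None)


def interpret(label, known_labels, known_qualifiers):
    base, qualifier = split_label(label)
    if base not in known_labels:
        # Second fallback: ignore the element entirely.
        return ("ignored", None, None)
    if qualifier is not None and qualifier not in known_qualifiers:
        # First fallback: use the unqualified semantic category.
        return ("unqualified", base, None)
    return ("understood", base, qualifier)


BASIC = {"who", "what", "when", "where"}
```

   For instance, a recipient that knows only the four basic elements
   and the qualifier "published" would treat "when/Reviewed" as a plain
   "when" and would ignore "IDcode".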

1585	   Elements may be Canonical, Provisional, or Local.  Canonical
1586	   elements are officially recognized via a registry as part of the
1587	   metadata vernacular.  All elements, qualifiers, and segment labels
1588	   used in this document up until now belong to that vernacular.
1589	   Provisional elements are also officially recognized via the registry,
1590	   but have only been proposed for inclusion in the vernacular.  To be
1591	   promoted to the vernacular, a provisional element passes through a
1592	   vetting process during which its documentation must be in order and
1593	   its community acceptance demonstrated.  Local elements are any
1594	   elements not officially recognized in the registry.  The registry
1595	   [DERC] is a work in progress.

1597	   Local elements are immediately distinguishable from Canonical or
1598	   Provisional elements because all terms that begin with an upper case
1599	   letter are reserved for spontaneous local use.  No term beginning
1600	   with an upper case letter will ever be assigned Canonical or
1601	   Provisional status, so it should be safe to use such terms for local
1602	   purposes.  Any recipient of external ERCs containing such terms will
1603	   understand them to be part of the originating provider's local
1604	   metadata dialect.  Here's an example ERC with three segments, one
1605	   local element, and two local qualifiers.  The segment boundaries have
1606	   been emphasized by comment lines (which, as before, are ignored by
1607	   processors).

1609	         erc:
1610	         who: Bullock, TH | Achimowicz, JZ | Duckrow, RB
1611	                 | Spencer, SS | Iragui-Madoz, VJ
1612	         what: Bicoherence of intracranial EEG in sleep,
1613	                 wakefulness and seizures
1614	         when: 1997 12 00
1615	         where: http://cogprints.soton.ac.uk/%{
1616	                 documents/disk0/00/00/01/22/index.html %}
1617	         in: EEG Clin Neurophysiol | 1997 12 00 | v103, i6, p661-678
1618	         IDcode: cog00000122
1619	         # ---- new segment ----
1620	         erc-about:
1621	         what/Subcategory: Bispectrum | Nonlinearity | Epilepsy
1622	                 | Cooperativity | Subdural | Hippocampus | Higher moment
1623	         # ---- new segment ----
1624	         erc-from:
1625	         who: NIH/NLM/NCBI
1626	         what: pm9546494
1627	         when/Reviewed: 1998 04 18 021600
1628	         where: http://ark.nlm.nih.gov/12025/pm9546494?

1630	   The local element "IDcode" immediately precedes the "erc-about"
1631	   segment, which itself contains an element with the local qualifier
1632	   "Subcategory".  The second to last element also carries the local
1633	   qualifier "Reviewed".  Finally, what might be a provisional element
1634	   "in" appears near the end of the first segment.  It might have been
1635	   proposed as a way to complete a citation for an object originally
1636	   appearing inside another object (such as an article appearing in a
1637	   journal or an encyclopedia).
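
   The upper case convention makes local terms easy to detect
   mechanically.  The following sketch applies it to the labels of the
   example record; the helper names are assumptions introduced here,
   and the list of labels was copied by hand from the example rather
   than produced by an ERC parser.

```python
# Sketch: a term (element label or qualifier) that begins with an
# upper case letter is treated as local.

def is_local(term):
    return term[:1].isupper()


def local_terms(label):
    """Return the local parts of a possibly qualified label."""
    return [part for part in label.split("/") if is_local(part)]


# Element labels from the example record above, in order.
EXAMPLE_LABELS = [
    "who", "what", "when", "where", "in", "IDcode",
    "what/Subcategory",
    "who", "what", "when/Reviewed", "where",
]

found = [term for label in EXAMPLE_LABELS for term in local_terms(label)]
```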

1639	7.5.  ERC Element Values

1641	   ERC element values tend to be straightforward strings.  If the
1642	   provider intends something special for an element, it will so
1643	   indicate with markers at the beginning of its value string.  The
1644	   markers are designed to be uncommon enough that they would not likely
1645	   occur in normal data except by deliberate intent.  Markers can only
1646	   occur near the beginning of a string, and once any octet of non-
1647	   marker data has been encountered, no further marker processing is
1648	   done for the element value.  In the absence of markers the string is
1649	   considered pure data; this has been the case with all the examples
1650	   seen thus far.  The fullest form of an element value with all three
1651	   optional markers in place looks like this.

1653	         VALUE =    [markup_flags]    (:ccode)    ,    DATA

1655	   In processing, the first non-whitespace character of an ERC element
1656	   value is examined.  An initial `[' is reserved to introduce a
1657	   bracketed set of markup flags (not described in this document) that
1658	   ends with `]'.  If ERC data is machine-generated, each value string
1659	   may be preceded by "[]" to prevent any of its data from being
1660	   mistaken for markup flags.  Once past the optional markup, the
1661	   remaining value may optionally begin with a controlled code.  A
1662	   controlled code always has the form "(:ccode)", for example,

1664	         who: (:unkn) Anonymous
1665	         what: (:791) Bee Stings

1667	   Any string after such a code is taken to be an uncontrolled (e.g.,
1668	   natural language) equivalent.  The code "unkn" indicates a
1669	   conventional explanation for a missing value (stating that the value
1670	   is unknown).  The remainder of the string makes an equivalent
1671	   statement in a form that the provider deemed most suitable to its
1672	   (probably human) audience.  The code "791" could be a fixed numeric
1673	   topic identifier within an unspecified topic vocabulary.  Any code
1674	   may be ignored by those that do not understand it.

1676	   There are several codes to explain different ways in which a required
1677	   element's value may go missing.

1679	         (:unac)   temporarily inaccessible
1680	         (:unal)   unallowed, suppressed intentionally
1681	         (:unap)   not applicable, makes no sense
1682	         (:unas)   value unassigned (e.g., Untitled)
1683	         (:unav)   value unavailable indefinitely
1684	         (:unkn)   unknown (e.g., Anonymous, Inconnue)
1685	         (:etal)   too numerous to list (et alia)
1686	         (:none)   never had a value, never will
1687	         (:null)   explicitly empty
1688	         (:tba)    to be assigned or announced later

1690	   Once past an optional controlled code, the remaining string value is
1691	   subjected to one final test.  If the next non-whitespace
1692	   character is a `,' (comma), it indicates that the string value is
1693	   "sort-friendly".  This means that the value is (a) laid out with an
1694	   inverted word order useful for sorting items having comparably laid
1695	   out element values (items might be the containing ERC records) and
1696	   (b) that the value may contain other commas that indicate inversion
1697	   points should it become necessary to recover the value in natural
1698	   word order.  Typically, this feature is used to express Western-style
1699	   personal names in family-name-given-name order.  It can also be used
1700	   wherever natural word order might make sorting tricky, such as when
1701	   data contains titles or corporate names.  Here are some example
1702	   elements.

1704	         who:   ,  van Gogh, Vincent
1705	         who:,Howell, III, PhD, 1922-1987, Thurston
1706	         who:, Acme Rocket Factory, Inc., The
1707	         who:, Mao Tse Tung
1708	         who:, McCartney, Paul, Sir,
1709	         what:, Health and Human Services, United States Government
1710	                 Department of, The,

1712	   There are rules to use in recovering a copy of the value in natural
1713	   word order, if desired.  The above example strings have the following
1714	   natural word order values, respectively.

1716	         Vincent van Gogh
1717	         Thurston Howell, III, PhD, 1922-1987
1718	         The Acme Rocket Factory, Inc.
1719	         Mao Tse Tung
1720	         Sir Paul McCartney
1721	         The United States Government Department of Health and Human Services
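
   The order in which the three optional markers are examined can be
   sketched as a small parser.  It is an illustration under stated
   assumptions: the function name and the returned dictionary are
   invented for the example, the contents of the markup flags are left
   uninterpreted (they are not described in this document), an
   unterminated `[' is simply treated as data, and no attempt is made
   to recover natural word order, since those rules are not given
   here.

```python
import re

# Sketch of VALUE = [markup_flags] (:ccode) , DATA
_CCODE = re.compile(r"\(:([^)\s]+)\)")


def parse_value(value):
    rest = value.lstrip()

    # 1. Optional bracketed markup flags.
    flags = None
    if rest.startswith("["):
        end = rest.find("]")
        if end != -1:  # assumption: ignore an unterminated bracket
            flags = rest[1:end]
            rest = rest[end + 1:].lstrip()

    # 2. Optional controlled code of the form (:ccode).
    ccode = None
    match = _CCODE.match(rest)
    if match:
        ccode = match.group(1)
        rest = rest[match.end():].lstrip()

    # 3. Optional leading comma marking a sort-friendly value.
    sort_friendly = rest.startswith(",")
    if sort_friendly:
        rest = rest[1:].lstrip()

    return {
        "flags": flags,
        "ccode": ccode,
        "sort_friendly": sort_friendly,
        "data": rest,
    }
```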

1723	7.6.  ERC Element Encoding and Dates

1725	   Some characters that need to appear in ERC element values might
1726	   conflict with special characters used for structuring ERCs, so there
1727	   needs to be a way to include them as literal characters that are
1728	   protected from special interpretation.  This is accomplished through
1729	   an encoding mechanism that resembles the %-encoding familiar to [URI]
1730	   handlers.

1732	   The ERC encoding mechanism also uses `%', but instead of taking two
1733	   following hexadecimal digits, it takes one non-alphanumeric character
1734	   or two alphabetic characters that cannot be mistaken for hex digits.
1735	   It is designed not to be confused with normal web-style %-encoding.
1736	   In particular it can be decoded without risking unintended decoding
1737	   of normal %-encoded data (which would introduce errors).  Here are
1738	   the one-character (non-alphanumeric) ERC encoding extensions.

1740	         ERC       Purpose
1741	         ---     ------------------------------------------------
1742	         %!      decodes to the element separator `|'
1743	         %%      decodes to a percent sign `%'
1744	         %.      decodes to a comma `,'
1745	         %_      a non-character used as a syntax shim
1746	         %{      a non-character that begins an expansion block
1747	         %}      a non-character that ends an expansion block

1749	   One particularly useful construct in ERC element values is the pair
1750	   of special encoding markers ("%{" and "%}") that indicates an
1751	   "expansion" block.  Whatever string of characters they enclose will
1752	   be treated as if none of the contained whitespace (SPACEs, TABs,
1753	   Newlines) were present.  This comes in handy for writing long, multi-
1754	   part URLs in a readable way.  For example, the value in

1756	         where: http://foo.bar.org/node%{
1757	                    ? db = foo
1758	                    & start = 1
1759	                    & end = 5
1760	                    & buf = 2
1761	                    & query = foo + bar + zaf
1762	                %}

1764	   is decoded into an equivalent element, but with a correct and intact
1765	   URL:

1767	     where:
1768	      http://foo.bar.org/node?db=foo&start=1&end=5&buf=2&query=foo+bar+zaf
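
   A decoder for the one-character encodings and for expansion blocks
   might look like the sketch below.  It is not a reference
   implementation: the function name is an assumption, the
   two-letter encodings are not handled because they are not listed in
   this section, and the sketch assumes that any splitting of a value
   on the element separator `|' has already taken place.

```python
# Sketch of ERC one-character decoding, scanning left to right.
SIMPLE = {
    "!": "|",   # element separator
    "%": "%",   # percent sign
    ".": ",",   # comma
    "_": "",    # non-character syntax shim
}


def erc_decode(value):
    out = []
    in_block = False          # True between %{ and %}
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "%" and i + 1 < len(value):
            nxt = value[i + 1]
            if nxt == "{":
                in_block = True
                i += 2
                continue
            if nxt == "}":
                in_block = False
                i += 2
                continue
            if nxt in SIMPLE:
                out.append(SIMPLE[nxt])
                i += 2
                continue
        if in_block and ch.isspace():
            i += 1            # whitespace inside a block is dropped
            continue
        out.append(ch)        # includes web-style sequences like %20
        i += 1
    return "".join(out)
```

   Because a `%' followed by a hex digit is copied through unchanged,
   ordinary %-encoded data in a URL is left intact by this sketch.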

1770	   In a parting word about ERC element values, a commonly recurring
1771	   value type is a date, possibly followed by a time.  ERC dates use the
1772	   [TEMPER] format, taking on one of the following forms:

1774	         1999                (four digit year)
1775	         2000 12 29          (year, month, day)
1776	         2000 12 29 235955   (year, month, day, hour, minute, second)

1778	   In dates, all internal whitespace is squeezed out to achieve a
1779	   normalized form suitable for lexical comparison and sorting.  This
1780	   means that the following dates

1782	         2000 12 29 235955           (recommended for readability)
1783	         2000 12 29 23 59 55
1784	         20001229 23 59 55
1785	         20001229235955              (normalized date and time)

1787	   are all equivalent.  The first form is recommended for readability.
1788	   The last form (shortest and easiest to compute with) is the
1789	   normalized form.  Hyphens and commas are reserved to create date
1790	   ranges and lists, for example,

1792	         1996-2000                   (a range of five years)
1793	         1952, 1957, 1969            (a list of three years)
1794	         1952, 1958-1967, 1985       (a mixed list of dates and ranges)
1795	         20001229-20001231           (a range of three days)
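
   Normalization and the splitting of lists and ranges can be sketched
   in a few lines.  The function names are assumptions made for this
   example, and no validation of the date digits is attempted; the
   full rules belong to [TEMPER].

```python
# Sketch: normalize an ERC date and split lists and ranges.

def normalize_date(text):
    """Squeeze out all whitespace, e.g. "2000 12 29" -> "20001229"."""
    return "".join(text.split())


def parse_date_list(text):
    """Return (start, end) pairs; a single date has start == end."""
    items = []
    for part in normalize_date(text).split(","):
        start, sep, end = part.partition("-")
        items.append((start, end if sep else start))
    return items
```

   Since normalized values are plain digit strings, they can be
   compared and sorted lexically.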

1797	7.7.  ERC Stub Records and Internal Support

1799	   The ERC design introduces the concept of a "stub" record, which is an
1800	   incomplete ERC record intended to be supplemented with additional
1801	   elements before being released as a standalone ERC record.  A stub
1802	   ERC record has no minimum required elements.  It is just a group of
1803	   elements that does not begin with "erc:" but otherwise conforms to
1804	   the ERC record syntax.

1806	   ERC stubs may be useful in supporting internal procedures using the
1807	   ERC syntax.  Often they rely on the convenience and accuracy of
1808	   automatically supplied elements, even the basic ones.  To be ready
1809	   for external use, however, an ERC stub must be transformed into a
1810	   complete ERC record having the usual required elements.  An ERC stub
1811	   record can be convenient for metadata embedded in a document, where
1812	   elements such as location, modification date, and size - which one
1813	   would not omit from an externalized record - are omitted simply
1814	   because they are much better supplied by a computation.  A separate
1815	   local administrative procedure, not defined for ERCs in general,
1816	   would effect the promotion of stubs into complete records.
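
   A local procedure for promoting stubs would first need to tell a
   stub from a complete record and to find out which basic elements
   are still missing.  The sketch below shows one way this might be
   done.  It rests on several assumptions that are not defined in this
   section: labels start in the first column, indented lines are
   continuations, lines starting with `#' are comments, a qualified
   label such as "when/published" counts as its basic element, and the
   anchoring segment ends at the next "erc-" segment label.

```python
# Sketch: detect a stub and list missing basic elements.
REQUIRED = ("who", "what", "when", "where")


def element_labels(record_text):
    labels = []
    for line in record_text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue                  # blank line or comment
        if line[0].isspace():
            continue                  # continuation of a value
        label, sep, _ = line.partition(":")
        if sep:
            labels.append(label.strip())
    return labels


def is_stub(record_text):
    labels = element_labels(record_text)
    return not labels or labels[0] != "erc"


def missing_basic_elements(record_text):
    present = set()
    for label in element_labels(record_text):
        if label.startswith("erc-"):
            break                     # a later segment begins
        present.add(label.split("/")[0])
    return [name for name in REQUIRED if name not in present]
```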

1818	   While the ERC is a general-purpose container for exchange of resource
1819	   descriptions, it does not dictate how records must be internally
1820	   stored, laid out, or assembled by data providers or recipients.
1821	   Arbitrary internal descriptive frameworks can support ERCs simply by
1822	   mapping (e.g., on demand) local records to the ERC container format
1823	   and making them available for export.  Therefore, to support ERCs
1824	   there is no need for a data provider to convert internal data to be
1825	   stored in an ERC format.  On the other hand, any provider (such as
1826	   one just getting started in the business of resource description) may
1827	   choose to store and manipulate local data natively in the ERC format.

1829	8.  Advice to Web Clients

1831	   This section offers some advice to web client software developers.
1832	   It is speculative by nature, as it tries to anticipate a series of
1833	   events that might lead to native web browser support for ARKs.

1835	   ARKs are envisaged to appear wherever durable object references are
1836	   planned.  Library cataloging records, literature citations, and
1837	   bibliographies are important examples.  In many of these places URLs
1838	   (Uniform Resource Locators) currently stand in, and URNs, DOIs, and
1839	   PURLs have been proposed as alternatives.

1841	   The strings representing ARKs are also envisaged to appear in some of
1842	   the places where URLs currently appear:  in hypertext links (where
1843	   they are not normally shown to users) and in rendered text (displayed
1844	   or printed).  Internet search engines, for example, tend to include
1845	   both actionable and manifest links when listing each item found.  A
1846	   normal HTML link for which the URL is not displayed looks like this.

1848	          <a href = "http://foo.bar.org/index.htm"> Click Here </a>

1850	   The same link with an ARK instead of a URL:

1852	          <a href = "ark:/14697/b12345x"> Click Here </a>

1854	   Web browsers would in general require a small modification to
1855	   recognize and convert this ARK, via mapping authority discovery, to
1856	   the URL form.

1858	          <a href = "http://a.b.org/ark:/14697/b12345x"> Click Here </a>

1860	   A browser that knows how to make that conversion could also
1861	   automatically detect and replace a non-working NMAH.
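
   The conversion step itself is simple once an NMAH is known.  The
   sketch below assumes that mapping authority discovery has already
   produced a hostport; the function name is an assumption introduced
   here, and the "http://" scheme follows the example above.

```python
# Sketch: turn an ARK (with or without a leading NMAH) into a URL
# that uses the given NMAH.

def ark_to_url(ark, nmah):
    start = ark.find("ark:/")
    if start == -1:
        raise ValueError("not an ARK: %r" % ark)
    # Anything before "ark:/" (such as a non-working NMAH) is dropped.
    return "http://" + nmah + "/" + ark[start:]
```

   The same function covers replacement of a non-working NMAH, since
   any hostport already present in front of the "ark:" label is
   discarded before the new one is attached.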

1863	   An NAA will typically make known the associations it creates by
1864	   publishing them in catalogs, actively advertising them, or simply
1865	   leaving them on web sites for visitors (e.g., users, indexing
1866	   spiders) to stumble across in browsing.

1868	9.  Security Considerations

1870	   The ARK naming scheme poses no direct risk to computers and networks.
1871	   Implementors of ARK services need to be aware of security issues when
1872	   querying networks and filesystems for Name Mapping Authority
1873	   services, and the concomitant risks from spoofing and obtaining
1874	   incorrect information.  These risks are no greater for ARK mapping
1875	   authority discovery than for other kinds of service discovery.  For
1876	   example, recipients of ARKs with a specified hostport (NMAH) should
1877	   treat it like a URL and be aware that the identified ARK service may
1878	   no longer be operational.

1880	   Apart from mapping authority discovery, ARK clients and servers
1881	   subject themselves to all the risks that accompany normal operation
1882	   of the protocols underlying mapping services (e.g., HTTP, Z39.50).
1883	   As a specialization of such protocols, an ARK service may limit
1884	   exposure to the usual risks.  Indeed, ARK services may enhance a kind
1885	   of security by helping users identify long-term reliable references
1886	   to information objects.

1888	10.  Authors' Addresses

1890	   John A. Kunze
1891	   California Digital Library
1892	   University of California, Office of the President
1893	   415 20th St, 4th Floor
1894	   Oakland, CA  94612-3550, USA

1896	   Fax:   +1 510-893-5212
1897	   EMail: jak@ucop.edu

1899	   R. P. C. Rodgers
1900	   US National Library of Medicine
1901	   8600 Rockville Pike, Bldg. 38A
1902	   Bethesda, MD  20894, USA

1904	   Fax:   +1 301-496-0673
1905	   EMail: rodgers@nlm.nih.gov

1907	11.  References

1909	   [ANVL]     J. Kunze, B. Kahle, et al, "A Name-Value Language", work
1910	              in progress,
1911	              http://www.cdlib.org/inside/diglib/ark/anvlspec.pdf

1913	   [ARK]      J. Kunze, "Towards Electronic Persistence Using ARK
1914	              Identifiers", Proceedings of the 3rd ECDL Workshop on Web
1915	              Archives, August 2003, (PDF)
1916	              http://bibnum.bnf.fr/ecdl/2003/proceedings.php?f=kunze

1918	   [DCORE]    Dublin Core Metadata Initiative, "Dublin Core Metadata
1919	              Element Set, Version 1.1:  Reference Description", July
1920	              1999, http://dublincore.org/documents/dces/.

1922	   [DERC]     J. Kunze, "Dictionary of the ERC", work in progress within
1923	              the Dublin Core Metadata Initiative's Kernel Working
1924	              Group, http://dublincore.org/groups/kernel/

1926	   [DNS]      P.V. Mockapetris, "Domain Names - Concepts and
1927	              Facilities", RFC 1034, November 1987.

1929	   [DOI]      International DOI Foundation, "The Digital Object
1930	              Identifier (DOI) System", February 2001,
1931	              http://dx.doi.org/10.1000/203.

1933	   [ERC]      J. Kunze, "A Metadata Kernel for Electronic Permanence",
1934	              Journal of Digital Information, Vol 2, Issue 2, January
1935	              2002, ISSN 1368-7506, (PDF)
1936	              http://jodi.ecs.soton.ac.uk/Articles/v02/i02/Kunze/

1938	   [Handle]   L. Lannom, "Handle System Overview", ICSTI Forum, No. 30,
1939	              April 1999, http://www.icsti.org/forum/30/#lannom

1941	   [HTTP]     R. Fielding, et al, "Hypertext Transfer Protocol --
1942	              HTTP/1.1", RFC 2616, June 1999.

1944	   [MD5]      R. Rivest, "The MD5 Message-Digest Algorithm", RFC 1321,
1945	              April 1992.

1947	   [NAPTR]    M. Mealling, R. Daniel, "The Naming Authority Pointer
1948	              (NAPTR) DNS Resource Record", RFC 2915, September 2000.

1950	   [NLMPerm]  M. Byrnes, "Defining NLM's Commitment to the Permanence of
1951	              Electronic Information", ARL 212:8-9, October 2000,
1952	              http://www.arl.org/newsltr/212/nlm.html

1954	   [NOID]     J. Kunze, "Nice Opaque Identifiers", February 2005,
1955	              http://www.cdlib.org/inside/diglib/ark/noid.pdf

1957	   [PURL]     K. Shafer, et al, "Introduction to Persistent Uniform
1958	              Resource Locators", 1996,
1959	              http://purl.oclc.org/OCLC/PURL/INET96

1961	   [RFC822]   D. Crocker, "Standard for the format of ARPA Internet text
1962	              messages", RFC 822, August 1982.

1964	   [TELNET]   J. Postel, J.K. Reynolds, "Telnet Protocol Specification",
1965	              RFC 854, May 1983.

1967	   [TEMPER]   J. Kunze, "Temporal Enumerated Ranges", work in progress,
1968	              http://www.cdlib.org/inside/diglib/ark/temperspec.pdf

1970	   [THUMP]    J. Kunze, "The HTTP URL Mapping Protocol", work in
1971	              progress.

1973	   [URI]      T. Berners-Lee, et al, "Uniform Resource Identifiers
1974	              (URI): Generic Syntax", RFC 2396, August 1998.

1976	   [URNBIB]   C. Lynch, et al, "Using Existing Bibliographic Identifiers
1977	              as Uniform Resource Names", RFC 2288, February 1998.

1979	   [URNSYN]   R. Moats, "URN Syntax", RFC 2141, May 1997.

1981	   [URNNID]   L. Daigle, et al, "URN Namespace Definition Mechanisms",
1982	              RFC 2611, June 1999.

1984	12.  Appendix:  ARK Implementations

1986	   Currently, the primary implementation activity is at the California
1987	   Digital Library (CDL),

1989	         http://ark.cdlib.org/

1991	   housed at the University of California Office of the President, where
1992	   over 200,000 ARKs have been assigned to objects that the CDL owns or
1993	   controls.  Some experimentation in ARKs is taking place at JSTOR, the
1994	   Digital Curation Centre, WIPO and at the University of California's
1995	   San Diego, San Francisco, and Berkeley campuses.

1997	   The US National Library of Medicine (NLM) also has an experimental,
1998	   prototype ARK service under development.  It is being made available
1999	   for purposes of demonstrating various aspects of the ARK system, but
2000	   is subject to temporary or permanent withdrawal (without notice)
2001	   depending upon the circumstances of the small research group
2002	   responsible for making it available.  It is described at:

2004	         http://ark.nlm.nih.gov/

2006	   Comments and feedback may be addressed to rodgers@nlm.nih.gov.

2008	13.  Appendix:  Current ARK Name Authority Table

2010	   This appendix contains a copy of the Name Authority Table (a file) at
2011	   the time of writing.  It may be loaded into a local filesystem (e.g.,
2012	   /etc/natab) for use in mapping NAAs (Name Assigning Authorities) to
2013	   NMAHs (Name Mapping Authority Hostports).  It contains Perl code that
2014	   can be copied into a standalone script that processes the table (as a
2015	   file).  Because this is still a proposed file, none of the values in
2016	   it are real.

2018	     #
2019	     # Name Assigning Authority / Name Mapping Authority Lookup Table
2020	     #       Last change:   2004 12 14
2021	     #       Reload from:   http://ark.nlm.nih.gov/etc/natab
2022	     #       Mirrored at:   http://www.cdlib.org/inside/diglib/ark/natab
2023	     #       To register:   mailto:ark@cdlib.org?Subject=naareg
2024	     #       Process with:  Perl script at end of this file (optional)
2025	     #
2026	     # Each NAA appears at the beginning of a line with the NAA Number
2027	     # first, a colon, and an ARK or URL to a statement of naming policy
2028	     # (see http://ark.cdlib.org for an example).
2029	     # All the NMA hostports that service an NAA are listed, one per
2030	     # line, indented, after the corresponding NAA line.
2031	     #
2032	     #       National Library of Medicine
2033	     12025:  http://www.nlm.nih.gov/xxx/naapolicy.html
2034	             ark.nlm.nih.gov USNLM
2035	             foobar.zaf.org UCSF
2036	             sneezy.dopey.com BIREME
2037	     #
2038	     #       Library of Congress
2039	     12026:  http://www.loc.gov/xxx/naapolicy.html
2040	             foobar.zaf.org USLC
2041	             sneezy.dopey.com USLC
2042	     #
2043	     #       National Agriculture Library
2044	     12027:  http://www.nal.gov/xxx/naapolicy.html
2045	             foobar.zaf.gov:80 USNAL
2046	     #
2047	     #       California Digital Library
2048	     13030:  http://www.cdlib.org/inside/diglib/ark/
2049	             ark.cdlib.org CDL
2050	     #
2051	     #       World Intellectual Property Organization
2052	     13038:  http://www.wipo.int/xxx/naapolicy.html
2053	             www.wipo.int WIPO
2054	     #
2055	     #       University of California San Diego
2056	     20775:  http://library.ucsd.edu/xxx/naapolicy.html
2057	             ucsd.edu UCSD
2058	     #
2059	     #       University of California San Francisco
2060	     29114:  http://library.ucsf.edu/xxx/naapolicy.html
2061	             ucsf.edu UCSF
2062	     #
2063	     #       University of California Berkeley
2064	     28722:  http://library.berkeley.edu/xxx/naapolicy.html
2065	             berkeley.edu UCB
2066	     #
2067	     #       Rutgers University Libraries
2068	     15230:  http://rci.rutgers.edu/xxx/naapolicy.html
2069	             rutgers.edu RUL
2070	     #
2071	     #       Internet Archive
2072	     13960:  http://www.archive.org/xxx/naapolicy.html
2073	             archive.org IA
2074	     #
2075	     #       Digital Curation Centre
2076	     64269:  http://www.dcc.ac.uk/xxx/naapolicy.html
2077	             dcc.ac.uk DCC
2078	     #
2079	     #       New York University Libraries
2080	     62624:  http://library.nyu.edu/xxx/naapolicy.html
2081	             nyu.edu NYUL
2082	     #
2083	     #       University of North Texas Libraries
2084	     67531:  http://www.library.unt.edu/xxx/naapolicy.html
2085	             unt.edu UNTL
2086	     #
2087	     #       Ithaka Electronic-Archiving Initiative
2088	     27927:  http://www.ithaka.org/xxx/naapolicy.html
2089	             ithaka.org ITHAKA
2090	     #
2091	     #--- end of data ---
2092	     # The following Perl script takes an NAA as argument and outputs
2093	     # the NMAs in this file listed under any matching NAA.
2094	     #
2095	     # my $naa = shift;
2096	     # while (<>) {
2097	     #       next if (! /^$naa:/);
2098	     #       while (<>) {
2099	     #               last if (! /^[#\s]./);
2100	     #               print "$1\n" if (/^\s+(\S+)/);
2101	     #       }
2102	     # }
2103	     #
2104	     # Create a g/t/nroff-safe version of this table with the UNIX command,
2105	     #
2106	     #       expand natab | sed 's/\\/\\\e/g' > natab.roff
2107	     #
2108	     # end of file
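
   For readers who prefer Python to the Perl script embedded in the
   table, the same lookup might be sketched as follows.  It is an
   approximation rather than a line-for-line port: the function name
   and the SAMPLE_TABLE excerpt are assumptions made for the example,
   and, like the table itself, the hostports shown are not real.

```python
# Sketch: list the NMA hostports recorded under a given NAA.
# Assumes NAA lines start in the first column and hostport lines are
# indented, as described in the table's own comments.

def lookup_nmahs(table_text, naa):
    hosts = []
    in_entry = False
    for line in table_text.splitlines():
        if line.startswith(naa + ":"):
            in_entry = True
            continue
        if not in_entry:
            continue
        if line.startswith("#"):
            continue                      # comment inside the entry
        if line[:1].isspace() and line.strip():
            hosts.append(line.split()[0])  # first token is the hostport
            continue
        in_entry = False                  # next NAA line or blank line
    return hosts


SAMPLE_TABLE = """\
#       National Library of Medicine
12025:  http://www.nlm.nih.gov/xxx/naapolicy.html
        ark.nlm.nih.gov USNLM
        foobar.zaf.org UCSF
        sneezy.dopey.com BIREME
#
#       Library of Congress
12026:  http://www.loc.gov/xxx/naapolicy.html
        foobar.zaf.org USLC
        sneezy.dopey.com USLC
#
#--- end of data ---
"""
```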

2110	14.  Copyright Notice

2112	   Copyright (C) The Internet Society (2005).  This document is subject
2113	   to the rights, licenses and restrictions contained in BCP 78, and
2114	   except as set forth therein, the authors retain all their rights.

2116	   This document and the information contained herein are provided on an
2117	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
2118	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
2119	   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
2120	   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
2121	   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
2122	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

2124	Expires 19 August 2005
2125	                           Table of Contents

2127	Status of this Document
2128	Abstract
2129	1.  Introduction
2130	1.1.  Three Reasons to Use ARKs
2131	1.2.  Organizing Support for ARKs
2132	1.3.  Definition of Identifier
2133	2.  ARK Anatomy
2134	2.1.  The Name Mapping Authority Hostport (NMAH)
2135	2.2.  The ARK Label Part - ark:
2136	2.3.  The Name Assigning Authority Number (NAAN)
2137	2.4.  The Name Part
2138	2.5.  The Qualifier Part
2139	2.5.1.  ARKs that Reveal Object Hierarchy
2140	2.5.2.  ARKs that Reveal Object Variants
2141	2.6.  Character Repertoires
2142	2.7.  Normalization and Lexical Equivalence
2143	2.8.  Naming Considerations
2144	3.  Assigners of ARKs
2145	4.  Finding a Name Mapping Authority
2146	4.1.  Looking Up NMAHs in a Globally Accessible File
2147	4.2.  Looking up NMAHs Distributed via DNS
2148	5.  Generic ARK Service Definition
2149	5.1.  Generic ARK Access Service (access, location)
2150	5.2.  Generic Policy Service (permanence, naming, etc.)
2151	5.3.  Generic Description Service
2152	6.  Overview of the Tiny HTTP URL Mapping Protocol (THUMP)
2153	7.  Overview of Electronic Resource Citations (ERCs)
2154	7.1.  ERC Syntax
2155	7.2.  ERC Stories
2156	7.3.  The ERC Anchoring Story
2157	7.4.  ERC Elements
2158	7.5.  ERC Element Values
2159	7.6.  ERC Element Encoding and Dates
2160	7.7.  ERC Stub Records and Internal Support
2161	8.  Advice to Web Clients
2162	9.  Security Considerations
2163	10.  Authors' Addresses
2164	11.  References
2165	12.  Appendix:  ARK Implementations
2166	13.  Appendix:  Current ARK Name Authority Table
2167	14.  Copyright Notice