idnits 2.17.1 

draft-kunze-ark-22.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** The abstract seems to contain references ([Qualifier]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.

  == There are 2 instances of lines with non-RFC2606-compliant FQDNs in the
     document.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 1776 has weird spacing: '... regexp  repla...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (June 22, 2019) is 1763 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Missing Reference: 'Qualifier' is mentioned on line 513, but not defined

  ** Obsolete normative reference: RFC 2141 (Obsoleted by RFC 8141)

  ** Obsolete normative reference: RFC 2611 (Obsoleted by RFC 3406)

  ** Obsolete normative reference: RFC 2616 (Obsoleted by RFC 7230, RFC 7231,
     RFC 7232, RFC 7233, RFC 7234, RFC 7235)

  ** Obsolete normative reference: RFC 2822 (Obsoleted by RFC 5322)

  ** Obsolete normative reference: RFC 2915 (Obsoleted by RFC 3401, RFC 3402,
     RFC 3403, RFC 3404)


     Summary: 8 errors (**), 0 flaws (~~), 4 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                           J. Kunze
3	Internet-Draft                                California Digital Library
4	Intended status: Informational                                 E. Bermes
5	Expires: December 24, 2019              Bibliotheque nationale de France
6	                                                           June 22, 2019

8	                       The ARK Identifier Scheme
9	                           draft-kunze-ark-22

11	Abstract

13	   The ARK (Archival Resource Key) naming scheme is designed to
14	   facilitate the high-quality and persistent identification of
15	   information objects.  A founding principle of the ARK is that
16	   persistence is purely a matter of service and is neither inherent in
17	   an object nor conferred on it by a particular naming syntax.  The
18	   best that an identifier can do is to lead users to the services that
19	   support robust reference.  The term ARK itself refers both to the
20	   scheme and to any single identifier that conforms to it.  An ARK has
21	   five components:

23	   [http://NMAH/]ark:[/]NAAN/Name[Qualifier]

25	   an optional and mutable Name Mapping Authority Hostport (usually a
26	   hostname), the "ark:" label, the Name Assigning Authority Number
27	   (NAAN), the assigned Name, and an optional and possibly mutable
28	   Qualifier supported by the NMA.  The NAAN and Name together form the
29	   immutable persistent identifier for the object independent of the URL
30	   hostname.  An ARK is a special kind of URL that connects users to
31	   three things: the named object, its metadata, and the provider's
32	   promise about its persistence.  When entered into the location field
33	   of a Web browser, the ARK leads the user to the named object.  That
34	   same ARK, inflected by appending a single question mark (`?'),
35	   returns a brief metadata record that is both human- and machine-
36	   readable.  When the ARK is inflected by appending dual question marks
37	   (`??'), the returned metadata contains a commitment statement from
38	   the current provider.  Tools exist for minting, binding, and
39	   resolving ARKs.

41	Status of This Memo

43	   This Internet-Draft is submitted in full conformance with the
44	   provisions of BCP 78 and BCP 79.

46	   Internet-Drafts are working documents of the Internet Engineering
47	   Task Force (IETF).  Note that other groups may also distribute
48	   working documents as Internet-Drafts.  The list of current Internet-
49	   Drafts is at https://datatracker.ietf.org/drafts/current/.

51	   Internet-Drafts are draft documents valid for a maximum of six months
52	   and may be updated, replaced, or obsoleted by other documents at any
53	   time.  It is inappropriate to use Internet-Drafts as reference
54	   material or to cite them other than as "work in progress."

56	   This Internet-Draft will expire on December 24, 2019.

58	Copyright Notice

60	   Copyright (c) 2019 IETF Trust and the persons identified as the
61	   document authors.  All rights reserved.

63	   This document is subject to BCP 78 and the IETF Trust's Legal
64	   Provisions Relating to IETF Documents
65	   (https://trustee.ietf.org/license-info) in effect on the date of
66	   publication of this document.  Please review these documents
67	   carefully, as they describe your rights and restrictions with respect
68	   to this document.

70	Table of Contents

72	   1.  Introduction
73	     1.1.  Reasons to Use ARKs
74	     1.2.  Three Requirements of ARKs
75	     1.3.  Organizing Support for ARKs:  Our Stuff vs. Their Stuff
76	     1.4.  Definition of Identifier
77	   2.  ARK Anatomy
78	     2.1.  The Name Mapping Authority Hostport (NMAH)
79	     2.2.  The ARK Label Part (ark:)
80	     2.3.  The Name Assigning Authority Number (NAAN)
81	     2.4.  The Name Part
82	     2.5.  The Qualifier Part
83	       2.5.1.  ARKs that Reveal Object Hierarchy
84	       2.5.2.  ARKs that Reveal Object Variants
85	     2.6.  Character Repertoires
86	     2.7.  Normalization and Lexical Equivalence
87	   3.  Naming Considerations
88	     3.1.  ARKS Embedded in Language
89	     3.2.  Objects Should Wear Their Identifiers
90	     3.3.  Names are Political, not Technological
91	     3.4.  Choosing a Hostname or NMA
92	     3.5.  Assigners of ARKs
93	     3.6.  NAAN Namespace Management
94	     3.7.  Sub-Object Naming
95	   4.  Finding a Name Mapping Authority
96	     4.1.  Looking Up NMAHs in a Globally Accessible File
97	   5.  Generic ARK Service Definition
98	     5.1.  Generic ARK Access Service (access, location)
99	       5.1.1.  Generic Policy Service (permanence, naming, etc.)
100	       5.1.2.  Generic Description Service
101	     5.2.  Overview of The HTTP URL Mapping Protocol (THUMP)
102	     5.3.  The Electronic Resource Citation (ERC)
103	     5.4.  Advice to Web Clients
104	     5.5.  Security Considerations
105	   6.  References
106	   Appendix A.  ARK Maintenance Agency: arks.org
107	   Appendix B.  Looking up NMAHs Distributed via DNS
108	   Authors' Addresses

110	1.  Introduction

112	   [ Note about this transitional draft.  The ARKsInTheOpen.org
113	   Technical Working Group (https://wiki.duraspace.org/display/ARKs/
114	   Technical+Working+Group) is in the process of revising the ARK spec
115	   via a series of Internet-Drafts.  No breaking changes from the 2008
116	   spec are envisaged.  Some minor changes are being deferred to later
117	   in order to make it easier to review more important changes; some of
118	   those small changes would result in "noisy diffs" since they are
119	   global in scope, for example, converting all instances of http:// and
120	   NMAH to https:// and NMA, respectively.  ]

122	   This document describes a scheme for the high-quality naming of
123	   information resources.  The scheme, called the Archival Resource Key
124	   (ARK), is well suited to long-term access and identification of any
125	   information resources that accommodate reasonably regular electronic
126	   description.  This includes digital documents, databases, software,
127	   and websites, as well as physical objects (books, bones, statues,
128	   etc.) and intangible objects (chemicals, diseases, vocabulary terms,
129	   performances).  Hereafter the term "object" refers to an information
130	   resource.  The term ARK itself refers both to the scheme and to any
131	   single identifier that conforms to it.  A reasonably concise and
132	   accessible overview and rationale for the scheme is available at
133	   [ARK].

135	   Schemes for persistent identification of network-accessible objects
136	   are not new.  In the early 1990s, the design of the Uniform Resource
137	   Name [RFC2141] responded to the observed failure rate of URLs by
138	   articulating an indirect, non-hostname-based naming scheme and the
139	   need for responsible name management.  Meanwhile, promoters of the
140	   Digital Object Identifier [DOI] succeeded in building a community of
141	   providers around a mature software system [Handle] that supports name
142	   management.  The Persistent Uniform Resource Locator [PURL] was
143	   another scheme that had the advantage of working with unmodified web
144	   browsers.  ARKs represent an approach that attempts to build on the
145	   strengths and to avoid the weaknesses of these schemes.

147	   A founding principle of the ARK is that persistence is purely a
148	   matter of service.  Persistence is neither inherent in an object nor
149	   conferred on it by a particular naming syntax.  Nor is the technique
150	   of name indirection -- upon which URNs, Handles, DOIs, and PURLs are
151	   founded -- of central importance.  Name indirection is an ancient and
152	   well-understood practice; new mechanisms for it keep appearing and
153	   distracting practitioner attention, with the Domain Name System (DNS)
154	   [RFC1034] being a particularly dazzling and elegant example.  What is
155	   often forgotten is that maintenance of an indirection table is an
156	   unavoidable cost to the organization providing persistence, and that
157	   cost is equivalent across naming schemes.  That indirection has
158	   always been a native part of the web while being so lightly utilized
159	   for the persistence of web-based objects indicates how unsuited most
160	   organizations will probably be to the task of table maintenance and
161	   to the much more fundamental challenge of keeping the objects
162	   themselves viable.

164	   Persistence is achieved through a provider's successful stewardship
165	   of objects and their identifiers.  The highest level of persistence
166	   will be reinforced by a provider's robust contingency, redundancy,
167	   and succession strategies.  It is further safeguarded to the extent
168	   that a provider's mission is shielded from funding and political
169	   instabilities.  These are by far the major challenges confronting
170	   persistence providers, and no identifier scheme has any direct impact
171	   on them.  In fact, some schemes may actually be liabilities for
172	   persistence because they create short- and long-term dependencies for
173	   every object access on complex, special-purpose infrastructures,
174	   parts of which are proprietary and all of which increase the carry-
175	   forward burden for the preservation community.  It is for this reason
176	   that the ARK scheme relies only on educated name assignment and light
177	   use of general-purpose infrastructures that are maintained mostly by
178	   the internet community at large (the DNS, web servers, and web
179	   browsers).

181	1.1.  Reasons to Use ARKs

183	   If no persistent identifier scheme contributes directly to
184	   persistence, why not just use URLs?  A particular URL may be as
185	   durable an identifier as it is possible to have, but nothing
186	   distinguishes it from an ordinary URL to the recipient who is
187	   wondering if it is suitable for long-term reference.  An ARK embedded
188	   in a URL provides some of the necessary conditions for credible
189	   persistence, inviting access to not one, but to three things: to the
190	   object, to its metadata, and to a nuanced statement of commitment
191	   from the provider in question (the NMA, described below) regarding
192	   the object.  Existence of the two extra services can be probed
193	   automatically by appending `?' and `??' to the ARK.
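
   As a rough illustration of the probing described above, the
   following Python sketch builds the three request strings for one
   ARK.  The function name ark_service_urls and the dictionary keys
   are assumptions made for this example; the text only says that `?'
   and `??' are appended to the ARK.

```python
def ark_service_urls(ark_url: str) -> dict:
    """Return the strings that address the three ARK services.

    Hypothetical helper: the key names are invented for this sketch.
    """
    return {
        "object": ark_url,             # the named object itself
        "metadata": ark_url + "?",     # brief metadata record
        "commitment": ark_url + "??",  # provider's commitment statement
    }


urls = ark_service_urls("http://example.org/ark:12025/654xz321")
```

   A client could request each string in turn and treat a failure of
   the `?' or `??' form as a hint that the URL may not be a fully
   supported ARK.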

195	   The form of the ARK also supports the natural separation of naming
196	   authorities into the original name assigning authority and the
197	   diverse multiple name mapping (or servicing) authorities that in
198	   succession and in parallel will take over custodial responsibilities
199	   from the original assigner (assuming the assigner ever held that
200	   responsibility) for the large majority of a long-term object's
201	   archival lifetime.  The name mapping authority, indicated by the
202	   hostname part of the URL that contains the ARK, serves to launch the
203	   ARK into cyberspace.  Should it ever fail (and there is no reason why
204	   a well-chosen hostname for a 100-year-old cultural memory institution
205	   shouldn't last as long as the DNS), that host name is considered
206	   disposable and replaceable.  Again, the form of the ARK helps
207	   because it defines exactly how to recover the core immutable object
208	   identity, and simple algorithms (one based on the URN model) or even
209	   by-hand internet query can be used for locating another mapping
210	   authority.

212	   There are tools to assist in generating ARKs and other identifiers,
213	   such as [NOID] and "uuidgen", both of which rely for uniqueness on
214	   human-maintained registries.  This document also contains some
215	   guidelines and considerations for managing namespaces and choosing
216	   hostnames with persistence in mind.

218	1.2.  Three Requirements of ARKs

220	   The first requirement of an ARK is to give users a link from an
221	   object to a promise of stewardship for it.  That promise is a multi-
222	   faceted covenant that binds the word of an identified service
223	   provider to a specific set of responsibilities.  It is critical for
224	   the promise to come from a current provider and almost irrelevant,
225	   over a long period of time, what the original assigner's intentions
226	   were.  No one can tell if successful stewardship will take place
227	   because no one can predict the future.  Reasonable conjecture,
228	   however, may be based on past performance.  There must be a way to
229	   tie a promise of persistence to a provider's demonstrated or
230	   perceived ability -- its reputation -- in that arena.  Provider
231	   reputations would then rise and fall as promises are observed
232	   variously to be kept and broken.  This is perhaps the best way we
233	   have for gauging the strength of any persistence promise.

235	   The second requirement of an ARK is to give users a link from an
236	   object to a description of it.  The problem with a naked identifier
237	   is that without a description real identification is incomplete.
238	   Identifiers common today are relatively opaque, though some contain
239	   ad hoc clues reflecting assertions that were briefly true, such as
240	   where in a filesystem hierarchy an object lived during a short stay.
241	   Possession of both an identifier and an object is some improvement,
242	   but positive identification may still be uncertain since the object
243	   itself might not include a matching identifier or might not carry
244	   evidence obvious enough to reveal its identity without significant
245	   research.  In either case, what is called for is a record bearing
246	   witness to the identifier's association with the object, as supported
247	   by a recorded set of object characteristics.  This descriptive record
248	   is partly an identification "receipt" with which users and archivists
249	   can verify an object's identity after brief inspection and a
250	   plausible match with recorded characteristics such as title and size.

252	   The final requirement of an ARK is to give users a link to the object
253	   itself (or to a copy) if at all possible.  Persistent access is the
254	   central duty of an ARK.  Persistent identification plays a vital
255	   supporting role but, strictly speaking, it can be construed as no
256	   more than a record attesting to the original assignment of a never-
257	   reassigned identifier.  Object access may not be feasible for various
258	   reasons, such as a transient service outage, a catastrophic loss, a
259	   licensing agreement that keeps an archive "dark" for a period of
260	   years, or when an object's own lack of tangible existence confuses
261	   normal concepts of access (e.g., a vocabulary term might be
262	   "accessed" through its definition).  In such cases the ARK's
263	   identification role assumes a much higher profile.  But attempts to
264	   simplify the persistence problem by decoupling access from
265	   identification and concentrating exclusively on the latter are of
266	   questionable utility.  A perfect system for assigning forever unique
267	   identifiers might be created, but if it did so without reducing
268	   access failure rates, no one would be interested.  The central issue
269	   -- which may be summed up as the "HTTP 404 Not Found" problem --
270	   would not have been addressed.

272	1.3.  Organizing Support for ARKs: Our Stuff vs. Their Stuff

274	   An organization and the user community it serves can often be seen to
275	   struggle with two different areas of persistent identification: the
276	   Our Stuff problem and the Their Stuff problem.  In the Our Stuff
277	   problem, we in the organization want our own objects to acquire
278	   persistent names.  Since we possess or control these objects, our
279	   organization tackles the Our Stuff problem directly.  Whether or not
280	   the objects are named by ARKs, our organization is the responsible
281	   party, so it can plan for, maintain, and make commitments about the
282	   objects.

284	   In the Their Stuff problem, we in the organization want others'
285	   objects to acquire persistent names.  These are objects that we do
286	   not own or control, but some of which are critically important to us.
287	   But because they are beyond our influence as far as support is
288	   concerned, creating and maintaining persistent identifiers for Their
289	   Stuff is not especially purposeful or feasible for us to engage in.
290	   There is little that we can do about someone else's stuff except
291	   encourage their uptake or adoption of persistence services.

293	   Co-location of persistent access and identification services is
294	   natural.  Any organization that undertakes ongoing support of true
295	   persistent identification (which includes description) is well-served
296	   if it controls, owns, or otherwise has clear internal access to the
297	   identified objects, and this gives it an advantage if it wishes also
298	   to support persistent access to outsiders.  Conversely, persistent
299	   access to outsiders requires orderly internal collection management
300	   procedures that include monitoring, acquisition, verification, and
301	   change control over objects, which in turn requires object
302	   identifiers persistent enough to support auditable record keeping
303	   practices.

305	   Although organizing ARK services under one roof thus tends to make
306	   sense, object hosting can successfully be separated from name
307	   mapping.  An example is when a name mapping authority centrally
308	   provides uniform resolution services via a protocol gateway on behalf
309	   of organizations that host objects behind a variety of access
310	   protocols.  It is also reasonable to build value-added description
311	   services that rely on the underlying services of a set of mapping
312	   authorities.

314	   Supporting ARKs is not for every organization.  By requiring
315	   specific, revealed commitments to preservation, to object access, and
316	   to description, the bar for providing ARK services is higher than for
317	   some other identifier schemes.  On the other hand, it would be hard
318	   to grant credence to a persistence promise from an organization that
319	   could not muster the minimum ARK services.  Not that there isn't a
320	   business model for an ARK-like, description-only service built on top
321	   of another organization's full complement of ARK services.  For
322	   example, there might be competition at the description level for
323	   abstracting and indexing a body of scientific literature archived in
324	   a combination of open and fee-based repositories.  The description-
325	   only service would have no direct commitment to the objects, but
326	   would act as an intermediary, forwarding commitment statements from
327	   object hosting services to requestors.

329	1.4.  Definition of Identifier

331	   An identifier is not a string of character data -- an identifier is
332	   an association between a string of data and an object.  This
333	   abstraction is necessary because without it a string is just data.
334	   It's nonsense to talk about a string's breaking, or about its being
335	   strong, maintained, and authentic.  But as a representative of an
336	   association, a string can do, metaphorically, the things that we
337	   expect of it.

339	   Without regard to whether an object is physical, digital, or
340	   conceptual, to identify it is to claim an association between it and
341	   a representative string, such as "Jane" or "ISBN 0596000278".  What
342	   gives a claim credibility is a set of verifiable assertions, or
343	   metadata, about the object, such as age, height, title, or number of
344	   pages.  In other words, the association is made manifest by a record
345	   (e.g., a cataloging or other metadata record) that vouches for it.

347	   In the complete absence of any testimony (metadata) regarding an
348	   association, a would-be identifier string is a meaningless sequence
349	   of characters.  To keep an externally visible but otherwise internal
350	   string from being perceived as an identifier by outsiders, for
351	   example, it suffices for an organization not to disclose the nature
352	   of its association.  For our immediate purpose, actual existence of
353	   an association record is more important than its authenticity or
354	   verifiability, which are outside the scope of this specification.

356	   It is a gift to the identification process if an object carries its
357	   own name as an inseparable part of itself, such as an identifier
358	   imprinted on the first page of a document or embedded in a data
359	   structure element of a digital document header.  In cases where the
360	   object is large, unwieldy, or unavailable (such as when licensing
361	   restrictions are in effect), a metadata record that includes the
362	   identifier string will usually suffice.  That record becomes a
363	   conveniently manipulable object surrogate, acting as both an
364	   association "receipt" and "declaration".

366	   Note that our definition of identifier extends the one in use for
367	   Uniform Resource Identifiers [RFC3986].  The present document still
368	   sometimes (ab)uses the terms "ARK" and "identifier" as shorthand for
369	   the string part of an identifier, but the context should make the
370	   meaning clear.

372	2.  ARK Anatomy

374	   An ARK is represented by a sequence of characters (a string) that
375	   contains the label, "ark:", optionally preceded by the beginning part
376	   of a URL.  Here is a diagrammed example.

378	   ARK ANATOMY                 Core Immutable Identity
379	   ===========             _______________|_______________
380	                          /                               \
381	        Resolver Service   Base Object Name    Qualifier
382	        _________|_______  ________|_______  ______|______
383	       /                 \/                \/             \
384	       http://example.org/ark:12025/654xz321/s3/f8.05v.tiff
385	              \_________/ \__/\___/ \______/\____/\_______/
386	                  |        |    |      |      |       |
387	                  |      Label  |      |  Sub-parts  Variants
388	                  |             |      |
389	    Name Mapping Authority      |   Assigned Name
390	       Hostport (NMAH)          |
391	                        Name Assigning Authority Number (NAAN)

393	   The ARK syntax can be summarized as follows:

395	                  [http://NMAH/]ark:[/]NAAN/Name[Qualifier]

397	   where the NMAH, '/', and Qualifier parts are in brackets to indicate
398	   that they are optional.  The Base Object Name is the substring
399	   comprising the "ark:" label, the NAAN and the assigned Name.  The
400	   Resolver Service is replaceable and makes the ARK actionable for a
401	   period of time.  Without the Resolver Service part, what remains is
402	   the Core Immutable Identity (the "persistible") part of the ARK.
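
   The summarized syntax can be sketched as a small parser.  This is
   a minimal sketch, not a reference implementation: the names
   ArkParts, parse_ark, and base_object_name are hypothetical, and the
   rule that the Name ends at the first '/' or '.' is an assumption
   chosen to agree with the diagrammed example.

```python
import re
from typing import NamedTuple, Optional


class ArkParts(NamedTuple):
    nmah: Optional[str]  # Name Mapping Authority Hostport, None if absent
    naan: str            # Name Assigning Authority Number
    name: str            # assigned Name
    qualifier: str       # empty string when absent


# Assumption of this sketch: the Name ends at the first '/' or '.'.
_ARK_RE = re.compile(
    r"^(?:https?://(?P<nmah>[^/]+)/)?"  # optional Resolver Service
    r"ark:/?"                           # label, new or old form
    r"(?P<naan>[^/]+)/"
    r"(?P<name>[^/.]+)"
    r"(?P<qualifier>.*)$"
)


def parse_ark(text: str) -> ArkParts:
    """Split an ARK string into its components."""
    m = _ARK_RE.match(text)
    if m is None:
        raise ValueError(f"not an ARK: {text!r}")
    return ArkParts(m["nmah"], m["naan"], m["name"], m["qualifier"])


def base_object_name(parts: ArkParts) -> str:
    """Label, NAAN, and Name, rendered here with the "ark:" label."""
    return f"ark:{parts.naan}/{parts.name}"


parts = parse_ark("http://example.org/ark:12025/654xz321/s3/f8.05v.tiff")
```

   For the diagrammed example, parts.nmah is "example.org", parts.naan
   is "12025", parts.name is "654xz321", and parts.qualifier is
   "/s3/f8.05v.tiff".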

404	2.1.  The Name Mapping Authority Hostport (NMAH)

406	   Before the "ark:" label may appear an optional Name Mapping Authority
407	   Hostport (NMAH) that is a temporary address where ARK service
408	   requests may be sent.  Preceded by a URI-type protocol designation
409	   such as "https://", it specifies a Resolver Service.  The NMAH itself
410	   is an Internet hostname or hostport combination having the same
411	   format and semantics as the hostport part of a URL.  The most
412	   important thing about the NMAH is that it is "identity inert" from
413	   the point of view of object identification.  In other words, ARKs
414	   that differ only in the optional NMAH part identify the same object.
415	   Thus, for example, the following three ARKs are synonyms for just one
416	   information object:

418	                    http://loc.gov/ark:12025/654xz321
419	                http://rutgers.edu/ark:12025/654xz321
420	                                   ark:12025/654xz321
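
   One way a recipient might test for this kind of synonymy is to
   discard the NMAH before comparing.  The sketch below assumes a
   hypothetical helper, strip_nmah, and handles only an "http://" or
   "https://" prefix.

```python
import re


def strip_nmah(ark: str) -> str:
    """Remove an optional leading "http://NMAH/" or "https://NMAH/"."""
    return re.sub(r"^https?://[^/]+/(?=ark:)", "", ark)


synonyms = [
    "http://loc.gov/ark:12025/654xz321",
    "http://rutgers.edu/ark:12025/654xz321",
    "ark:12025/654xz321",
]

# All three strings collapse to the same NMAH-free form.
distinct = {strip_nmah(a) for a in synonyms}
```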

422	   Strictly speaking, in the realm of digital objects, these ARKs may
423	   lead over time to somewhat different or diverging instances of the
424	   originally named object.  In an ideal world, divergence of persistent
425	   objects is not desirable, but it is widely believed that digital
426	   preservation efforts will inevitably lead to alterations in some
427	   original objects (e.g., a format migration in order to preserve the
428	   ability to display a document).  If any of those objects are held
429	   redundantly in more than one organization (a common preservation
430	   strategy), chances are small that all holding organizations will
431	   perform the same precise transformations and all maintain the same
432	   object metadata.  More significant divergence would be expected when
433	   the holding organizations serve different audiences or compete with
434	   each other.

436	   The NMAH part makes an ARK into an actionable URL.  As with many
437	   internet parameters, it is helpful to approach the NMAH being liberal
438	   in what you accept and conservative in what you propose.  From the
439	   recipient's point of view, the NMAH part should be treated as
440	   temporary, disposable, and replaceable.  From the NMA's point of
441	   view, it should be chosen with the greatest concern for longevity.  A
442	   carefully chosen NMAH should be at least as permanent as the
443	   providing organization's own hostname.  In the case of a national or
444	   university library, for example, there is no reason why the NMAH
445	   should not be considerably more permanent than soft-funded proxy
446	   hostnames such as hdl.handle.net, dx.doi.org, and purl.org.  In
447	   general and over time, however, it is not unexpected for an NMAH
448	   eventually to stop working and require replacement with the NMAH of a
449	   currently active service provider.

451	   This replacement relies on a mapping authority "resolver" discovery
452	   process, of which two alternate methods are outlined in a later
453	   section.  The ARK, URN, Handle, and DOI schemes all use a resolver
454	   discovery model that sooner or later requires matching the original
455	   assigning authority with a current provider servicing that
456	   authority's named objects; once found, the resolver at that provider
457	   performs what amounts to a redirect to a place where the object is
458	   currently held.  All the schemes rely on the ongoing functionality of
459	   currently mainstream technologies such as the Domain Name System
460	   [RFC1034] and web browsers.  The Handle and DOI schemes in addition
461	   require that the Handle protocol layer and global server grid be
462	   available at all times.

464	   The practice of prepending "http://" and an NMAH to an ARK is a way
465	   of creating an actionable identifier by a method that is itself
466	   temporary.  Assuming that infrastructure supporting [RFC2616]
467	   information retrieval will no longer be available one day, ARKs will
468	   then have to be converted into new kinds of actionable identifiers.
469	   By that time, if ARKs see widespread use, web browsers would
470	   presumably evolve to perform this (currently simple) transformation
471	   automatically.
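
   The transformation mentioned above might look like the following
   sketch.  The function name make_actionable and its scheme parameter
   are assumptions for illustration; the sketch also replaces an NMAH
   that is already present, as a recipient would do when an old NMAH
   has stopped working.

```python
import re


def make_actionable(ark: str, nmah: str, scheme: str = "http") -> str:
    """Prepend scheme and NMAH, replacing any resolver already present."""
    core = re.sub(r"^https?://[^/]+/", "", ark)
    if not core.startswith("ark:"):
        raise ValueError(f"no ark: label found in {ark!r}")
    return f"{scheme}://{nmah}/{core}"


fresh = make_actionable("ark:12025/654xz321", "example.org")
moved = make_actionable("http://loc.gov/ark:12025/654xz321", "rutgers.edu")
```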

473	2.2.  The ARK Label Part (ark:)

475	   The label part distinguishes an ARK from an ordinary identifier.
476	   There is a new form of the label, "ark:", and an old form, "ark:/",
477	   both of which must be recognized in perpetuity.  Implementations
478	   should generate new ARKs in the new form (without the "/") and
479	   resolvers must always treat received ARKs as equivalent if they
480	   differ only in regard to new form versus old form labels.  Thus these
481	   two ARKs are equivalent:

483	                             ark:/12025/654xz321
484	                              ark:12025/654xz321

486	   In a URL found in the wild, the label indicates that the URL stands a
487	   reasonable chance of being an ARK.  If the context warrants,
488	   verification that it actually is an ARK can be done by testing it for
489	   existence of the three ARK services.

491	   Since nothing about an identifier syntax directly affects
492	   persistence, the "ark:" label (like "urn:", "doi:", and "hdl:")
493	   cannot tell you whether the identifier is persistent or whether the
494	   object is available.  It does tell you that the original Name
495	   Assigning Authority (NAA) had some sort of hopes for it, but it
496	   doesn't tell you whether that NAA is still in existence, or whether a
497	   decade ago it ceased to have any responsibility for providing
498	   persistence, or whether it ever had any responsibility beyond naming.

500	   Only a current provider can say for certain what sort of commitment
501	   it intends, and the ARK label suggests that you can query the NMAH
502	   directly to find out exactly what kind of persistence is promised.
503	   Even if what is promised is impersistence (i.e., a short-term
504	   identifier), saying so is valuable information to the recipient.
505	   Thus an ARK is a high-functioning identifier in the sense that it
506	   provides access to the object, the metadata, and a commitment
507	   statement, even if the commitment is explicitly very weak.

509	2.3.  The Name Assigning Authority Number (NAAN)

511	   Recalling that the general form of the ARK is,

513	                  [http://NMAH/]ark:[/]NAAN/Name[Qualifier]

515	   the part of the ARK directly following the "ark:" (or older "ark:/")
516	   label is the Name Assigning Authority Number (NAAN), up to but not
517	   including the next `/' (slash) character.  This part is always
518	   required, as it identifies the organization that originally assigned
519	   the Name of the object.  It is used to discover a currently valid
520	   NMAH and to provide top-level partitioning of the space of all ARKs.

522	   An organization may request a NAAN from the ARK Maintenance Agency
523	   [ARKagency] (described in Appendix A) by filling out the form at
524	   [NAANrequest].  NAANs are opaque strings of one or more characters
525	   drawn from this set,

527	       0123456789bcdfghjkmnpqrstvwxz

529	   which consists of digits and consonants, minus the letter 'l'.
530	   Restricting NAANs to this set serves two goals.  It reduces the
531	   chances that words -- past, present, and future -- will appear in
532	   NAANs and carry unintended semantics.  It also helps usability by not
533	   mixing commonly confused characters ('0' and 'O', '1' and 'l') and by
534	   being compatible with strong transcription error detection (e.g., the
535	   [NOID] check digit algorithm).  Since 2001, every assigned NAAN has
536	   consisted of exactly five digits, and no immediate change in that
537	   practice is foreseen.
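
As an illustration only, the character-set rule above can be checked with
a short Python sketch.  The name is_valid_naan is hypothetical, and the
check is purely syntactic: it says nothing about whether a NAAN has
actually been registered.

```python
import re

# Digits plus consonants, minus 'l', as listed in the NAAN character set above.
NAAN_RE = re.compile(r"[0-9bcdfghjkmnpqrstvwxz]+")


def is_valid_naan(naan: str) -> bool:
    """Return True if naan is one or more characters drawn from the NAAN set."""
    return NAAN_RE.fullmatch(naan) is not None
```

Under this sketch, "12025" is accepted, while "12o25" (a vowel) and
"1l025" (the letter 'l') are rejected.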

539	   The NAAN designates a top-level ARK namespace.  Once registered for a
540	   namespace, a NAAN is never re-registered.  It is possible, however,
541	   for there to be a succession of organizations that manage an ARK
542	   namespace.

544	2.4.  The Name Part

546	   The part of the ARK just after the NAAN is the Name assigned by the
547	   NAA, and it is also required.  Semantic opaqueness in the Name part
548	   is strongly encouraged in order to reduce an ARK's vulnerability to
549	   era- and language-specific change.  Identifier strings containing
550	   linguistic fragments can create support difficulties down the road.
551	   No matter how appropriate or even meaningless they are today, such
552	   fragments may one day create confusion, give offense, or infringe on
553	   a trademark as the semantic environment around us and our communities
554	   evolves.

556	   Names that look more or less like numbers avoid common problems that
557	   defeat persistence and international acceptance.  The use of digits
558	   is highly recommended.  Mixing in non-vowel alphabetic characters a
559	   couple at a time is a relatively safe and easy way to achieve a
560	   denser namespace (more possible names for a given length of the name
561	   string).  Such names have a chance of aging and traveling well.
562	   Tools exist that mint, bind, and resolve opaque identifiers, with or
563	   without check characters [NOID].  More on naming considerations is
564	   given in a subsequent section.

566	2.5.  The Qualifier Part

568	   The part of the ARK following the NAA-assigned Name is an optional
569	   Qualifier.  It is a string that extends the base ARK in order to
570	   create a kind of service entry point into the object named by the
571	   NAA.  At the discretion of the providing NMA, such a service entry
572	   point permits an ARK to support access to individual hierarchical
573	   components and subcomponents of an object, and to variants (versions,
574	   languages, formats) of components.  A Qualifier may be invented by
575	   the NAA or by any NMA servicing the object.

577	   In form, the Qualifier is a ComponentPath, or a VariantPath, or a
578	   ComponentPath followed by a VariantPath.  A VariantPath is introduced
579	   and subdivided by the reserved character `.', and a ComponentPath is
580	   introduced and subdivided by the reserved character `/'.  In this
581	   example,

583	       http://example.org/ark:12025/654xz321/s3/f8.05v.tiff

585	   the string "/s3/f8" is a ComponentPath and the string ".05v.tiff" is
586	   a VariantPath.  The ARK Qualifier is a formalization of some
587	   currently mainstream URL syntax conventions.  This formalization
588	   specifically reserves meanings that permit recipients to make strong
589	   inferences about logical sub-object containment and equivalence based
590	   only on the form of the received identifiers; there is great
591	   efficiency in not having to inspect metadata records to discover such
592	   relationships.  NMAs are free not to disclose any of these
593	   relationships merely by avoiding the reserved characters above.
594	   Hierarchical components and variants are discussed further in the
595	   next two sections.
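
The split of a Qualifier into ComponentPath and VariantPath can be
sketched in Python as follows.  This is a minimal sketch, not a reference
parser: the function name split_ark and the regular expression are
assumptions made for illustration, the input is assumed to be well formed,
and the Name is assumed to end at the first `/' or `.' after the NAAN.

```python
import re

# Optional NMAH, the "ark:" or "ark:/" label, the NAAN, the Name, and
# whatever follows the Name (the Qualifier, possibly empty).
ARK_RE = re.compile(r"^(?:https?://[^/]+/)?ark:/?([^/]+)/([^/.]+)(.*)$")


def split_ark(ark: str):
    """Return (naan, name, component_path, variant_path) for an ARK."""
    m = ARK_RE.match(ark)
    if m is None:
        raise ValueError("not recognizable as an ARK: %r" % ark)
    naan, name, qualifier = m.groups()
    # The VariantPath is introduced by the first period in the Qualifier;
    # anything before it is the ComponentPath.
    dot = qualifier.find(".")
    if dot == -1:
        return naan, name, qualifier, ""
    return naan, name, qualifier[:dot], qualifier[dot:]
```

Applied to the example above, the sketch yields the NAAN "12025", the Name
"654xz321", the ComponentPath "/s3/f8", and the VariantPath ".05v.tiff".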

597	   The Qualifier, if present, differs from the Name in several important
598	   respects.  First, a Qualifier may have been assigned either by the
599	   NAA or later by the NMA.  The assignment of a Qualifier by an NMA
600	   effectively amounts to an act of publishing a service entry point
601	   within the conceptual object originally named by the NAA.  For our
602	   purposes, an ARK extended with a Qualifier assigned by an NMA will be
603	   called an NMA-qualified ARK.

605	   Second, a Qualifier assignment on the part of an NMA is made in
606	   fulfillment of its service obligations and may reflect changing
607	   service expectations and technology requirements.  NMA-qualified ARKs
608	   could therefore be transient, even if the base, unqualified ARK is
609	   persistent.  For example, it would be reasonable for an NMA to
610	   support access to an image object through an actionable ARK that is
611	   considered persistent even if the experience of that access changes
612	   as linking, labeling, and presentation conventions evolve and as
613	   format and security standards are updated.  For an image "thumbnail",
614	   that NMA could also support an NMA-qualified ARK that is considered
615	   impersistent because the thumbnail will be replaced with higher
616	   resolution images as network bandwidth and CPU speeds increase.  At
617	   the same time, for an originally scanned, high-resolution master, the
618	   NMA could publish an NMA-qualified ARK that is itself considered
619	   persistent.  Of course, the NMA must be able to return its separate
620	   commitments to unqualified, NAA-assigned ARKs, to NMA-qualified ARKs,
621	   and to any NAA-qualified ARKs that it supports.

623	   A third difference between a Qualifier and a Name concerns the
624	   semantic opaqueness constraint.  When an NMA-qualified ARK is to be
625	   used as a transient service entry point into a persistent object, the
626	   priority given to semantic opaqueness observed by the NAA in the Name
627	   part may be relaxed by the NMA in the Qualifier part.  If service
628	   priorities in the Qualifier take precedence over persistence, short-
629	   term usability considerations may recommend somewhat semantically
630	   laden Qualifier strings.

632	   Finally, not only is the set of Qualifiers supported by an NMA
633	   mutable, but different NMAs may support different Qualifier sets for
634	   the same NAA-identified object.  In this regard the NMAs act
635	   independently of each other and of the NAA.

637	   The next two sections describe how ARK syntax may be used to declare,
638	   or to avoid declaring, certain kinds of relatedness among qualified
639	   ARKs.

641	2.5.1.  ARKs that Reveal Object Hierarchy

643	   An NAA or NMA may choose to reveal the presence of a hierarchical
644	   relationship between objects using the `/' (slash) character after
645	   the Name part of an ARK.  Some authorities will choose not to
646	   disclose this information, while others will go ahead and disclose so
647	   that manipulators of large sets of ARKs can infer object
648	   relationships by simple identifier inspection; for example, this
649	   makes it possible for a system to present a collapsed view of a large
650	   search result set.

652	   If the ARK contains an internal slash after the NAAN, the piece to
653	   its left indicates a containing object.  For example, publishing an
654	   ARK of the form,

656	                       ark:12025/654/xz/321

658	   is equivalent to publishing three ARKs,
659	                       ark:12025/654/xz/321
660	                       ark:12025/654/xz
661	                       ark:12025/654

663	   together with a declaration that the first object is contained in the
664	   second object, and that the second object is contained in the third.
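
The inference described above can be sketched in Python.  The function
name implied_containers is hypothetical, and the sketch assumes an
already normalized ARK in the new label form, with no NMAH and no variant
suffixes.

```python
def implied_containers(ark: str) -> list:
    """List the ARK followed by each containing object, innermost first."""
    # Everything up to the first slash is the label plus NAAN, which is
    # not part of the hierarchy.
    label_naan, slash, rest = ark.partition("/")
    parts = rest.split("/")
    return [label_naan + slash + "/".join(parts[:i])
            for i in range(len(parts), 0, -1)]
```

For ark:12025/654/xz/321 this returns the three ARKs listed above, in the
same order.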

666	   Revealing the presence of hierarchy is completely up to the assigner
667	   (NMA or NAA).  It is hard enough to commit to one object's name, let
668	   alone to three objects' names and to a specific, ongoing relatedness
669	   among them.  Thus, regardless of whether hierarchy was present
670	   initially, the assigner, by not using slashes, reveals no shared
671	   inferences about hierarchical or other inter-relatedness in the
672	   following ARKs:

674	                       ark:12025/654_xz_321
675	                       ark:12025/654_xz
676	                       ark:12025/654xz321
677	                       ark:12025/654xz
678	                       ark:12025/654

680	   Note that slashes around the ARK's NAAN (/12025/ in these examples)
681	   are not part of the ARK's Name and therefore do not indicate the
682	   existence of some sort of NAAN super object containing all objects in
683	   its namespace.  A slash must have at least one non-structural
684	   character (one that is neither a slash nor a period) on both sides in
685	   order for it to separate recognizable structural components.  So
686	   initial or final slashes may be removed, and double slashes may be
687	   converted into single slashes.

689	2.5.2.  ARKs that Reveal Object Variants

691	   An NAA or NMA may choose to reveal the possible presence of variant
692	   objects or object components using the `.' (period) character after
693	   the Name part of an ARK.  Some authorities will choose not to
694	   disclose this information, while others will go ahead and disclose so
695	   that manipulators of large sets of ARKs can infer object
696	   relationships by simple identifier inspection; for example, this
697	   makes it possible for a system to present a collapsed view of a large
698	   search result set.

700	   If the ARK contains an internal period after Name, the piece to its
701	   left is a root name and the piece to its right, up to the end of the
702	   ARK or to the next period, is a suffix.  A Name may have more than
703	   one suffix, for example,
704	                       ark:12025/654.24
705	                       ark:12025/xz4/654.24
706	                       ark:12025/654.20v.78g.f55

708	   There are two main rules.  First, if two ARKs share the same root
709	   name but have different suffixes, the corresponding objects were
710	   considered variants of each other (different formats, languages,
711	   versions, etc.) by the assigner (NMA or NAA).  Thus, the following
712	   ARKs are variants of each other:

714	                       ark:12025/654.20v.78g.f55
715	                       ark:12025/654.321xz
716	                       ark:12025/654.44

718	   Second, publishing an ARK with a suffix implies the existence of at
719	   least one variant identified by the ARK without its suffix.  The ARK
720	   otherwise permits no further assumptions about what variants might
721	   exist.  So publishing the ARK,

723	                       ark:12025/654.20v.78g.f55

725	   is equivalent to publishing the four ARKs,

727	                       ark:12025/654.20v.78g.f55
728	                       ark:12025/654.20v.78g
729	                       ark:12025/654.20v
730	                       ark:12025/654
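
The second rule can be sketched in Python in the same way.  The function
name implied_variants is hypothetical, and the sketch assumes a
normalized ARK whose periods all occur in the final slash-delimited
component.

```python
def implied_variants(ark: str) -> list:
    """List the ARK followed by each ARK obtained by dropping a trailing suffix."""
    head, slash, last = ark.rpartition("/")
    root, *suffixes = last.split(".")
    return [head + slash + ".".join([root] + suffixes[:i])
            for i in range(len(suffixes), -1, -1)]
```

For ark:12025/654.20v.78g.f55 this returns the four ARKs listed above, in
the same order.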

732	   Revealing the possibility of variants is completely up to the
733	   assigner.  It is hard enough to commit to one object's name, let
734	   alone to multiple variants' names and to a specific, ongoing
735	   relatedness among them.  The assigner is the sole arbiter of what
736	   constitutes a variant within its namespace, and whether to reveal
737	   that kind of relatedness by using periods within its names.

739	   A period must have at least one non-structural character (one that is
740	   neither a slash nor a period) on both sides in order for it to
741	   separate recognizable structural components.  So initial or final
742	   periods may be removed, and adjacent periods may be converted into a
743	   single period.  Multiple suffixes should be arranged in sorted order
744	   (pure ASCII collating sequence) at the end of an ARK.

746	2.6.  Character Repertoires

748	   The Name and Qualifier parts are strings of visible ASCII characters.
749	   For received ARKs, implementations must support a minimum length of
750	   255 octets for the string composed of the Base ARK plus Qualifier.
751	   Implementations generating strings exceeding this length should
752	   understand that receiving implementations may not be able to index
753	   such ARKs properly.  Characters may be letters, digits, or any of
754	   these seven characters:

756	       =   ~   *   +   @   _   $

758	   The following characters may also be used, but their meanings are
759	   reserved:

761	       %   -   .   /

763	   The characters `/' and `.' are ignored if either appears as the last
764	   character of an ARK.  If used internally, they allow a name assigner
765	   to reveal object hierarchy and object variants as previously
766	   described.

768	   Hyphens are considered to be insignificant and are always ignored in
769	   ARKs.  A `-' (hyphen) may appear in an ARK for readability, or it may
770	   have crept in during the formatting and wrapping of text, but it must
771	   be ignored in lexical comparisons.  As in a telephone number, hyphens
772	   have no meaning in an ARK.  It is always safe for an NMA that
773	   receives an ARK to remove any hyphens found in it.  As a result, like
774	   the NMAH, hyphens are "identity inert" in comparing ARKs for
775	   equivalence.  For example, the following ARKs are equivalent for
776	   purposes of comparison and ARK service access:

778	                               ark:12025/65-4-xz-321
779	       http://sneezy.dopey.com/ark:12025/654--xz32-1
780	                               ark:12025/654xz321

782	   The `%' character is reserved for %-encoding all other octets that
783	   would appear in the ARK string, in the same manner as for URIs
784	   [RFC3986].  A %-encoded octet consists of a `%' followed by two hex
785	   digits; for example, "%7d" stands in for `}'.  Lower case hex digits
786	   are preferred to reduce the chances of false acronym recognition;
787	   thus it is better to use "%acT" instead of "%ACT".  The character `%'
788	   itself must be represented using "%25".  As with URNs, %-encoding
789	   permits ARKs to support legacy namespaces (e.g., ISBN, ISSN, SICI)
790	   that have less restricted character repertoires [RFC2288].
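
A minimal Python sketch of %-encoding for a Name or Qualifier string
follows.  The function name ark_encode is hypothetical.  Two choices here
are assumptions rather than requirements of this document: non-ASCII
input is encoded as UTF-8 octets, and the reserved characters `-', `.',
and `/' are passed through unencoded.

```python
import string

# Letters, digits, the seven unreserved punctuation characters, and the
# reserved characters that are passed through ('%' is always encoded).
SAFE = set(string.ascii_letters + string.digits + "=~*+@_$" + "-./")


def ark_encode(raw: str) -> str:
    """%-encode every octet outside the ARK repertoire, using lower case hex."""
    out = []
    for octet in raw.encode("utf-8"):
        ch = chr(octet)
        out.append(ch if ch in SAFE else "%%%02x" % octet)
    return "".join(out)
```

With this sketch, `}' becomes "%7d" and a literal `%' becomes "%25".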

792	2.7.  Normalization and Lexical Equivalence

794	   To determine if two or more ARKs identify the same object, the ARKs
795	   are compared for lexical equivalence after first being normalized.
796	   Since ARK strings may appear in various forms (e.g., having different
797	   NMAHs), normalizing them minimizes the chances that comparing two ARK
798	   strings for equality will fail unless they actually identify
799	   different objects.  In a specified-host ARK (one having an NMAH), the
800	   NMAH never participates in such comparisons.  Normalization described
801	   here serves to define lexical equivalence but does not restrict how
802	   implementors normalize ARKs locally for storage.

804	   Normalization of a received ARK for the purpose of octet-by-octet
805	   equality comparison with another ARK consists of the following steps.

807	   1.  The NMAH part (e.g., everything from an initial "http://" up to
808	       the next slash), if present, is removed.

810	   2.  Any URI query string is removed (everything from the first
811	       literal '?' to the end of the string).

813	   3.  The first case-insensitive match on "ark:/" or "ark:" is
814	       converted to "ark:" (replacing any upper case letters and
815	       removing any terminal '/').

817	   4.  In the string that remains, the two characters following every
818	       occurrence of `%' are converted to lower case.  The case of all
819	       other letters in the ARK string must be preserved.

821	   5.  All hyphens are removed.

823	   6.  If normalization is being done as part of a resolution step, and
824	       if the end of the remaining string matches a known inflection,
825	       the inflection is noted and removed.

827	   7.  Structural characters (slash and period) are normalized: initial
828	       and final occurrences are removed, and two structural characters
829	       in a row (e.g., // or ./) are replaced by the first character,
830	       iterating until each occurrence has at least one non-structural
831	       character on either side.

833	   8.  If there are any components with a period on the left and a slash
834	       on the right, either the component and the preceding period must
835	       be moved to the end of the Name part or the ARK must be thrown
836	       out as malformed.

838	   9.  The final step is to arrange the suffixes in ASCII collating
839	       sequence (that is, to sort them) and to remove duplicate
840	       suffixes, if any.  It is also permissible to throw out ARKs for
841	       which the suffixes are not sorted.

843	   The resulting ARK string is now normalized.  Comparisons between
844	   normalized ARKs are case-sensitive, meaning that upper case letters
845	   are considered different from their lower case counterparts.
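
The steps above can be sketched in Python as follows.  This is an
illustrative sketch under stated assumptions, not a reference
implementation: the function name normalize_ark is hypothetical; an
"https://" prefix is treated like "http://"; step 4 is applied only when
the two characters after `%' are hex digits; step 6 is skipped because
inflections apply only during resolution; and in step 8 the sketch takes
the option of rejecting the ARK as malformed.

```python
import re


def normalize_ark(ark: str) -> str:
    """Normalize an ARK string for octet-by-octet comparison."""
    # Step 1: remove the NMAH, if present (https accepted as an assumption).
    s = re.sub(r"^https?://[^/]*/", "", ark, flags=re.IGNORECASE)
    # Step 2: remove any query string.
    s = s.split("?", 1)[0]
    # Step 3: reduce the label, in either form and any case, to "ark:".
    label = re.match(r"ark:/?", s, flags=re.IGNORECASE)
    if label is None:
        raise ValueError("no ARK label found")
    body = s[label.end():]
    # Step 4: lower-case the hex digits of %-encoded octets only.
    body = re.sub(r"%([0-9A-Fa-f]{2})",
                  lambda m: "%" + m.group(1).lower(), body)
    # Step 5: remove all hyphens.
    body = body.replace("-", "")
    # Step 6 (inflections) is intentionally not handled in this sketch.
    # Step 7: collapse each run of structural characters to its first
    # character, then drop initial and final structural characters.
    body = re.sub(r"[/.]+", lambda m: m.group(0)[0], body).strip("/.")
    # Step 8: a component with a period on its left and a slash on its
    # right is rejected as malformed.
    if re.search(r"\.[^/.]+/", body):
        raise ValueError("suffix appears before a slash")
    # Step 9: sort the suffixes and remove duplicates.
    root, dot, tail = body.partition(".")
    if dot:
        body = root + "." + ".".join(sorted(set(tail.split("."))))
    return "ark:" + body
```

Two ARKs are then lexically equivalent exactly when their normalized
strings compare equal; for instance, the three hyphenated examples in
Section 2.6 all normalize to ark:12025/654xz321 under this sketch.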

847	   To keep ARK string variation to a minimum, no reserved ARK characters
848	   should be %-encoded unless it is deliberately to conceal their
849	   reserved meanings.  No non-reserved ARK characters should ever be
850	   %-encoded.  Finally, no %-encoded character should ever appear in an
851	   ARK in its decoded form.

853	3.  Naming Considerations

855	   The most important threats faced by persistence providers include
856	   such things as funding loss, natural disaster, political and social
857	   upheaval, processing faults, and errors in human oversight.  There is
858	   nothing that an identifier scheme can do about such things.  Still, a
859	   few observed identifier failures and inconveniences can be traced
860	   back to naming practices that we now know to be less than optimal for
861	   persistence.

863	3.1.  ARKs Embedded in Language

865	   The ARK has different goals from the URI, so it has different
866	   character set requirements.  Because linguistic constructs imperil
867	   persistence, for ARKs non-ASCII character support is unimportant.
868	   ARKs and URIs share goals of transcribability and transportability
869	   within web documents, so characters are required to be visible, non-
870	   conflicting with HTML/XML syntax, and not subject to tampering during
871	   transmission across common transport gateways.  Add the goal of
872	   making an undelimited ARK recognizable in running prose, as in
873	   ark:12025/=@_22*$, and certain punctuation characters (e.g., comma,
874	   period) end up being excluded from the ARK lest the end of a phrase
875	   or sentence be mistaken for part of the ARK.

877	   This consideration has more direct effect on ARK usability in a
878	   natural language context than it has on ARK persistence.  The same is
879	   true of the rule preventing hyphens from having lexical significance.
880	   It is fine to publish ARKs with hyphens in them (such as the
881	   output of UUID/GUID generators), but the uniform treatment of hyphens
882	   as insignificant reduces the possibility of users transcribing
883	   identifiers that will have been broken through unpredictable
884	   hyphenation by word processors.  Any measure that reduces user
885	   irritation with an identifier will increase its chances of survival.

887	3.2.  Objects Should Wear Their Identifiers

889	   A valuable technique for provision of persistent objects is to try to
890	   arrange for the complete identifier to appear on, with, or near its
891	   retrieved object.  An object encountered at a moment in time when its
892	   discovery context has long since disappeared could then easily be
893	   traced back to its metadata, to alternate versions, to updates, etc.
894	   This has seen reasonable success, for example, in book publishing and
895	   software distribution.  An identifier string only has meaning when
896	   its association is known, and this is a very sure, simple, and low-tech
897	   method of reminding everyone exactly what that association is.

899	3.3.  Names are Political, not Technological

901	   If persistence is the goal, a deliberate local strategy for
902	   systematic name assignment is crucial.  Names must be chosen with
903	   great care.  Poorly chosen and managed names will devastate any
904	   persistence strategy, and they do not discriminate by identifier
905	   scheme.  Whether a mistakenly re-assigned name is a URN, DOI, PURL,
906	   URL, or ARK, the damage -- failed access and confusion -- is not
907	   mitigated more in one scheme than in another.  Conversely, in-house
908	   efforts to manage names responsibly will go much further towards
909	   safeguarding persistence than any choice of naming scheme or name
910	   resolution technology.

912	   Branding (e.g., at the corporate or departmental level) is important
913	   for funding and visibility, but substrings representing brands and
914	   organizational names should be given a wide berth except when
915	   absolutely necessary in the hostname (the identity-inert) part of the
916	   ARK.  These substrings are not only unstable because organizations
917	   change frequently, but they are also dangerous because successor
918	   organizations often have political or legal reasons to actively
919	   suppress predecessor names and brands.  Any measure that reduces the
920	   chances of future political or legal pressure on an identifier will
921	   decrease the chances that our descendants will be obliged to
922	   deliberately break it.

924	3.4.  Choosing a Hostname or NMA

926	   Hostnames appearing in any identifier meant to be persistent must be
927	   chosen with extra care.  The tendency in hostname selection has
928	   traditionally been to choose a token with recognizable attributes,
929	   such as a corporate brand, but that tendency wreaks havoc with
930	   persistence that is supposed to outlive brands, corporations, subject
931	   classifications, and natural language semantics (e.g., what did the
932	   three letters "gay" mean in 1958, 1978, and 1998?).  Today's
933	   recognized and correct attributes are tomorrow's stale or incorrect
934	   attributes.  In making hostnames (any names, actually) long-term
935	   persistent, it helps to eliminate recognizable attributes to the
936	   extent possible.  This affects selection of any name based on URLs,
937	   including PURLs and the explicitly disposable NMAHs.

939	   There is no excuse for a provider that manages its internal names
940	   impeccably not to exercise the same care in choosing what could be an
941	   exceptionally durable hostname, especially if it would form the
942	   prefix for all the provider's URL-based external names.  Registering
943	   an opaque hostname in the ".org" or ".net" domain would not be a bad
944	   start.  Another way is to publish your ARKs with an organizational
945	   domain name that will be mapped by DNS to an appropriate NMA host.
946	   This makes for shorter names with less branding vulnerability.

948	   It is a mistake to think that hostnames are inherently unstable.  If
949	   you require brand visibility, that may be a fact of life.  But things
950	   are easier if yours is the brand of a long-lived cultural memory
951	   institution such as a national or university library or archive.
952	   Well-chosen hostnames from organizations that are sheltered from the
953	   direct effects of a volatile marketplace can easily provide longer-
954	   lived global resolvers than the domain names explicitly or implicitly
955	   used as starting points for global resolution by indirection-based
956	   persistent identifier schemes.  For example, it is hard to imagine
957	   circumstances under which the Library of Congress' domain name would
958	   disappear sooner than, say, "handle.net".

960	   For smaller libraries, archives, and preservation organizations,
961	   there is a natural concern about whether they will be able to keep
962	   their web servers and domain names in the face of uncertain funding.
963	   One option is to form or join a consortium [N2T] of like-minded
964	   organizations with the purpose of providing mutual preservation
965	   support.  The first goal of such a consortium would be to perpetually
966	   rent a hostname on which to establish a web server that simply
967	   redirects incoming member organization requests to the appropriate
968	   member server; using ARKs, for example, a 150-member consortium could
969	   run a very small server (24x7) that contained nothing more than 150
970	   rewrite rules in its configuration file.  Even more helpful would be
971	   additional consortial support for a member organization that was
972	   unable to continue providing services and needed to find a successor
973	   archival organization.  This would be a low-cost, low-tech way to
974	   publish ARKs (or URLs) under highly persistent hostnames.
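
What such a consortium server does can be sketched in a few lines of
Python.  Everything named here is hypothetical: the member table, the
example.org hostnames, and the function redirect_target are illustrative
stand-ins for the rewrite rules of a real web server configuration.

```python
import re

# Hypothetical member table: one entry per member, keyed by NAAN.
MEMBERS = {
    "12025": "http://member-a.example.org/",
    "28722": "http://member-b.example.org/",
}

# Request path carrying an ARK in either label form; group 2 is the NAAN.
ARK_PATH_RE = re.compile(r"^/?(ark:/?([^/]+)/.*)$")


def redirect_target(path: str):
    """Return the member URL to redirect to, or None if no member matches."""
    m = ARK_PATH_RE.match(path)
    if m is None:
        return None
    base = MEMBERS.get(m.group(2))
    if base is None:
        return None
    return base + m.group(1)
```

The table plays the role of the rewrite rules: adding a member, or
pointing a NAAN at a successor organization, is a one-line change.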

976	   There are no obvious reasons why the organizations registering DNS
977	   names, URN Namespaces, and DOI publisher IDs should have among them
978	   one that is intrinsically more fallible than the next.  Moreover, it
979	   is a misconception that the demise of DNS and of HTTP need adversely
980	   affect the persistence of URLs.  At such a time, certainly URLs from
981	   the present day might not then be actionable by our present-day
982	   mechanisms, but resolution systems for future non-actionable URLs are
983	   no harder to imagine than resolution systems for present-day non-
984	   actionable URNs and DOIs.  There is no more stable a namespace than
985	   one that is dead and frozen, and that would then characterize the
986	   space of names bearing the "http://" prefix.  It is useful to
987	   remember that just because hostnames have been carelessly chosen in
988	   their brief history does not mean that they are unsuitable in NMAHs
989	   (and URLs) intended for use in situations demanding the highest level
990	   of persistence available in the Internet environment.  A well-planned
991	   name assignment strategy is everything.

993	3.5.  Assigners of ARKs

995	   A Name Assigning Authority (NAA) is an organization that creates (or
996	   delegates creation of) long-term associations between identifiers and
997	   information objects.  Examples of NAAs include national libraries,
998	   national archives, and publishers.  An NAA may arrange with an
999	   external organization for identifier assignment.  The US Library of
1000	   Congress, for example, allows OCLC (the Online Computer Library
1001	   Center, a major world cataloger of books) to create associations
1002	   between Library of Congress Control Numbers (LCCNs) and the books that
1003	   OCLC processes.  A cataloging record is generated that testifies to
1004	   each association, and the identifier is included by the publisher,
1005	   for example, in the front matter of a book.

1007	   An NAA does not so much create an identifier as create an
1008	   association.  The NAA first draws an unused identifier string from
1009	   its namespace, which is the set of all identifiers under its control.
1010	   It then records the assignment of the identifier to an information
1011	   object having sundry witnessed characteristics, such as a particular
1012	   author and modification date.  A namespace is usually reserved for an
1013	   NAA by agreement with recognized community organizations (such as
1014	   IANA and ISO) that all names containing a particular string be under
1015	   its control.  In the ARK an NAA is represented by the Name Assigning
1016	   Authority Number (NAAN).

1018	   The ARK namespace reserved for an NAA is the set of names bearing its
1019	   particular NAAN.  For example, all strings beginning with
1020	   "ark:12025/" are under control of the NAA registered under 12025,
1021	   which might be the National Library of Finland.  Because each NAA has
1022	   a different NAAN, names from one namespace cannot conflict with those
1023	   from another.  Each NAA is free to assign names from its namespace
1024	   (or delegate assignment) according to its own policies.  These
1025	   policies must be documented in a manner similar to the declarations
1026	   required for URN Namespace registration [RFC2611].

1028	   Organizations can request or update a NAAN by filling out a form
1029	   [NAANrequest].

1031	3.6.  NAAN Namespace Management

1033	   Every NAA must have a namespace management strategy.  A time-honored
1034	   technique is to hierarchically partition a namespace into
1035	   subnamespaces using prefixes that guarantee non-collision of names in
1036	   different partitions.  This practice is strongly encouraged for all
1037	   NAAs, especially when subnamespace management will be delegated to
1038	   other departments, units, or projects within an organization.  For
1039	   example, with a NAAN that is assigned to a university and managed by
1040	   its main library, care should be taken to reserve semantically opaque
1041	   prefixes that will set aside large parts of the unused namespace for
1042	   future assignments.  Prefix-based partition management is an
1043	   important responsibility of the NAA.

1045	   This sort of delegation by prefix is well-used in the formation of
1046	   DNS names and ISBN identifiers.  An important difference is that in
1047	   the former, the hierarchy is deliberately exposed and in the latter
1048	   it is hidden.  Rather than using lexical boundary markers such as the
1049	   period (`.') found in domain names, the ISBN uses a publisher prefix
1050	   but doesn't disclose where the prefix ends and the publisher's
1051	   assigned name begins.  This practice of non-disclosure, borrowed from
1052	   the ISBN and ISSN schemes, is encouraged in assigning ARKs, because
1053	   it reduces the visibility of an assertion that is probably not
1054	   important now and may become a vulnerability later.

1056	   Reasonable prefixes for assigned names usually consist of consonants
1057	   and digits and are 1-5 characters in length.  For example, the
1058	   constant prefix "x9t" might be delegated to a book digitization
1059	   project that creates identifiers such as

1061	           http://444.berkeley.edu/ark:28722/x9t38rk45c

1063	   If longevity is the goal, it is important to keep the prefixes free
1064	   of recognizable semantics; for example, using an acronym representing
1065	   a project or a department is discouraged.  At the same time, you may
1066	   wish to set aside a subnamespace for testing purposes under a prefix
1067	   such as "fk..." that can serve as a visual clue and reminder to
1068	   maintenance staff that this "fake" identifier was never published.

1070	   There are other measures one can take to avoid user confusion,
1071	   transcription errors, and the appearance of accidental semantics when
1072	   creating identifiers.  If you are generating identifiers
1073	   automatically, pure numeric identifiers are likely to be
1074	   semantically opaque enough, but it's probably useful to avoid leading
1075	   zeroes because some users mistakenly treat them as optional, thinking
1076	   (arithmetically) that they don't contribute to the "value" of the
1077	   identifier.

1079	   If you need lots of identifiers and you don't want them to get too
1080	   long, you can mix digits with consonants (but avoid vowels since they
1081	   might accidentally spell words) to get more identifiers without
1082	   increasing the string length.  In this case you may not want more
1083	   than two letters in a row, because that limit reduces the chance of
1084	   generating acronyms.  Generator tools such as [NOID] provide support
1085	   for these sorts of identifiers, and can also add a computed check
1086	   character as a guarantee against the most common transcription
1087	   errors.
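
   The naming advice above can be illustrated with a minimal Python
   sketch.  It is not the [NOID] algorithm: the alphabet, the function
   names (has_long_letter_run, mint_name), and the default length are
   assumptions made for illustration only.  The sketch draws characters
   from digits and consonants, rejects candidates with more than two
   letters in a row, and prepends an opaque prefix such as the "x9t"
   used earlier.  It computes no check character.

```python
import random

DIGITS = "0123456789"
# Assumed alphabet for illustration: consonants only, omitting vowels, "y",
# and "l" (which is easily confused with the digit "1").
CONSONANTS = "bcdfghjkmnpqrstvwxz"
ALPHABET = DIGITS + CONSONANTS


def has_long_letter_run(name, max_run=2):
    """Return True if name contains more than max_run letters in a row."""
    run = 0
    for ch in name:
        run = run + 1 if ch.isalpha() else 0
        if run > max_run:
            return True
    return False


def mint_name(prefix, length=7, rng=None):
    """Return prefix plus `length` random characters without long letter runs."""
    if has_long_letter_run(prefix):
        raise ValueError("prefix already has a long letter run: %r" % prefix)
    if rng is None:
        rng = random.Random()
    while True:
        body = "".join(rng.choice(ALPHABET) for _ in range(length))
        # Check the whole name so the prefix/body boundary is covered too.
        if not has_long_letter_run(prefix + body):
            return prefix + body
```

   Random generation alone does not guarantee uniqueness, so a real
   minter would also need to record the names it has already issued.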

1089	3.7.  Sub-Object Naming

1091	   As mentioned previously, semantically opaque identifiers are very
1092	   useful for long-term naming of abstract objects; however, it may be
1093	   appropriate to extend these names with less opaque extensions that
1094	   reference contemporary service entry points (sub-objects) in support
1095	   of the object.  Sub-object extensions beginning with a digit or
1096	   underscore (`_') are reserved for the possibility of developing a
1097	   future registry of canonical service points (e.g., numeric references
1098	   to versions, formats, languages, etc.).

1100	4.  Finding a Name Mapping Authority

1102	   In order to derive an actionable identifier (these days, a URL) from
1103	   an ARK, a hostport (hostname or hostname plus port combination) for a
1104	   working Name Mapping Authority (NMA) must be found.  An NMA is a
1105	   service that is able to respond to the three basic ARK service
1106	   requests.  Relying on registration and client-side discovery, NMAs
1107	   make known which NAAs' identifiers they are willing to service.

1109	   Upon encountering an ARK, a user (or client software) looks inside it
1110	   for the optional NMAH part (the hostport of the NMA's ARK service).
1111	   If it contains an NMAH that is working, this NMAH discovery step may
1112	   be skipped; the NMAH effectively uses the beginning of an ARK to
1113	   cache the results of a prior mapping authority discovery process.  If
1114	   a new NMAH needs to be found, the client looks inside the ARK again
1115	   for the NAAN (Name Assigning Authority Number).  Querying a global
1116	   database, it then uses the NAAN to look up all current NMAHs that
1117	   service ARKs issued by the identified NAA.
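
   The inspection step described above might be sketched as follows.
   The regular expression is a deliberate simplification and not the
   normative ARK grammar, and the names ARK_PATTERN and split_ark are
   hypothetical.  The sketch accepts an optional leading "http://" or
   "https://" hostport, then "ark:" (or the older "ark:/"), the NAAN,
   and the assigned name.

```python
import re

# Simplified pattern, assumed for illustration only.
ARK_PATTERN = re.compile(
    r"^(?:https?://(?P<nmah>[^/]+)/)?ark:/?(?P<naan>[^/]+)/(?P<name>.+)$"
)


def split_ark(ark):
    """Return (nmah, naan, name); nmah is None when no hostport is present."""
    match = ARK_PATTERN.match(ark)
    if match is None:
        raise ValueError("not recognized as an ARK: %r" % ark)
    return match.group("nmah"), match.group("naan"), match.group("name")
```

   When the returned NMAH is None, or names a host that no longer
   responds, the NAAN is the key used for the subsequent lookup.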

1119	   The global database is key, and ideally the lookup would be automatic
1120	   and transparent to the user.  For this, the most promising method is
1121	   probably the Name-to-Thing (N2T) Resolver [N2T] at n2t.net.  It is a
1122	   proposed low-cost, highly reliable, consortially maintained NMAH that
1123	   simply exists to support actionable HTTP-based URLs for as long as
1124	   HTTP is used.  One of its big advantages over the other two methods
1125	   and the URN, Handle, DOI, and PURL methods, is that N2T addresses the
1126	   namespace splitting problem.  When objects maintained by one NMA are
1127	   inherited by more than one successor NMA, until now one of those
1128	   successors would be required to maintain forwarding tables on behalf
1129	   of the other successors.

1131	   There are two other ways to discover an NMAH, one of them described
1132	   in a subsection below.  Another way, described in an appendix, is
1133	   based on a simplification of the URN resolver discovery method,
1134	   itself very similar in principle to the resolver discovery method
1135	   used by Handles and DOIs.  None of these methods does more than what
1136	   can be done with a very small, consortially maintained web server
1137	   such as [N2T].

1139	   In the interests of long-term persistence, however, ARK mechanisms
1140	   are first defined in high-level, protocol-independent terms so that
1141	   mechanisms may evolve and be replaced over time without compromising
1142	   fundamental service objectives.  Either or both specific methods
1143	   given here may eventually be supplanted by better methods since, by
1144	   design, the ARK scheme does not depend on a particular method, but
1145	   only on having some method to locate an active NMAH.

1147	   At the time of issuance, at least one NMAH for an ARK should be
1148	   prepared to service it.  That NMA may or may not be administered by
1149	   the Name Assigning Authority (NAA) that created it.  Consider the
1150	   following hypothetical example of providing long-term access to a
1151	   cancer research journal.  The publisher wishes to turn a profit and
1152	   the National Library of Medicine wishes to preserve the scholarly
1153	   record.  An agreement might be struck whereby the publisher would act
1154	   as the NAA and the national library would archive the journal issue
1155	   when it appears, but without providing direct access for the first
1156	   six months.  During the first six months of peak commercial
1157	   viability, the publisher would retain exclusive delivery rights and
1158	   would charge access fees.  Again, by agreement, both the library and
1159	   the publisher would act as NMAs, but during that initial period the
1160	   library would redirect requests for issues less than six months old
1161	   to the publisher.  At the end of the waiting period, the library
1162	   would then begin servicing requests for issues older than six months
1163	   by tapping directly into its own archives.  Meanwhile, the publisher
1164	   might routinely redirect incoming requests for older issues to the
1165	   library.  Long-term access is thereby preserved, and so is the
1166	   commercial incentive to publish content.

1168	   Although it will be common for an NAA also to run an NMA service, it
1169	   is never a requirement.  Over time NAAs and NMAs will come and go.
1170	   One NMA will succeed another, and there might be many NMAs serving
1171	   the same ARKs simultaneously (e.g., as mirrors or as competitors).
1172	   There might also be asymmetric but coordinated NMAs as in the
1173	   library-publisher example above.

1175	4.1.  Looking Up NMAHs in a Globally Accessible File

1177	   This subsection describes a way to look up NMAHs using a simple name
1178	   authority table represented as a plain text file.  For efficient
1179	   access the file may be stored in a local filesystem, but it needs to
1180	   be reloaded periodically to incorporate updates.  It is not expected
1181	   that the size of the file or frequency of update should impose an
1182	   undue maintenance or searching burden any time soon, for even
1183	   primitive linear search of a file with ten-thousand NAAs is a
1184	   subsecond operation on modern server machines.  The proposed file
1185	   strategy is similar to the /etc/hosts file strategy that supported
1186	   Internet host address lookup for a period of years before the advent
1187	   of DNS.

1189	   The name authority table file is updated on an ongoing basis and is
1190	   available for copying over the internet from a number of mirror sites
1191	   [NAANregistry].  The file contains comment lines (lines that begin
1192	   with `#') explaining the format and giving the file's modification
1193	   time, reloading address, and NAA registration instructions.  There is
1194	   even a Perl script that processes the file embedded in the file's
1195	   comments.  The currently registered Name Assigning Authorities are:

1197	   12025   National Library of Medicine
1198	   12026   Library of Congress
1199	   12027   National Agricultural Library
1200	   13030   California Digital Library
1201	   13038   World Intellectual Property Organization
1202	   20775   University of California San Diego
1203	   29114   University of California San Francisco
1204	   28722   University of California Berkeley
1205	   21198   University of California Los Angeles
1206	   15230   Rutgers University
1207	   13960   Internet Archive
1208	   64269   Digital Curation Centre
1209	   62624   New York University
1210	   67531   University of North Texas
1211	   27927   Ithaka Electronic-Archiving Initiative
1212	   12148   Bibliotheque nationale de France
1213	              / National Library of France
1214	   78319   Google
1215	   88435   Princeton University
1216	   78428   University of Washington
1217	   89901   Archives of the Region of Vaestra Goetaland
1218	              and City of Gothenburg, Sweden
1219	   80444   Northwest Digital Archives
1220	   25593   Emory University
1221	   25031   University of Kansas
1222	   17101   Centre for Ecology & Hydrology, UK
1223	   65323   University of Calgary
1224	   61001   University of Chicago
1225	   52327   Bibliotheque et Archives Nationales du Quebec
1226	              / National Library and Archives of Quebec
1227	   39331   National Szechenyi Library / National Library of Hungary
1228	   26677   Library and Archives Canada / Bibliotheque et Archives Canada
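
   The file-based lookup described in this subsection might be sketched
   as follows.  The real file's format is explained in its own comments
   and is not reproduced here; this sketch assumes a hypothetical,
   simplified layout of one NAA per line ("NAAN hostport [hostport
   ...]"), with `#' starting a comment line.  The function names and the
   sample data are likewise invented for illustration.

```python
# Made-up sample in the assumed layout; it is not real registry data.
SAMPLE_LINES = [
    "# hypothetical excerpt, not the real registry format",
    "",
    "12025 ark.nlm.nih.gov n2t.net",
    "28722 n2t.net",
]


def load_nmah_table(lines):
    """Build a dict mapping each NAAN to its list of NMAH hostports."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        naan, *hostports = line.split()
        table.setdefault(naan, []).extend(hostports)
    return table


def lookup_nmahs(table, naan):
    """Return the NMAHs listed for a NAAN, or an empty list if none."""
    return table.get(naan, [])
```

   Loading the file into a dictionary is a convenience; as noted above,
   a plain linear scan would also be fast enough at this scale.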

1230	5.  Generic ARK Service Definition

1232	   An ARK request's output is delivered information; examples include
1233	   the object itself, a policy declaration (e.g., a promise of support),
1234	   a descriptive metadata record, or an error message.  The experience
1235	   of object delivery is expected to be an evolving mix of information
1236	   that reflects changing service expectations and technology
1237	   requirements; contemporary examples include such things as an object
1238	   summary and component links formatted for human consumption.  ARK
1239	   services must be couched in high-level, protocol-independent terms if
1240	   persistence is to outlive today's networking infrastructural
1241	   assumptions.  The high-level ARK service definitions listed below are
1242	   followed in the next section by a concrete method (one of many
1243	   possible methods) for delivering these services with today's
1244	   technology.

1246	5.1.  Generic ARK Access Service (access, location)

1248	   Returns (a copy of) the object or a redirect to the same, although a
1249	   sensible object proxy may be substituted.  Examples of sensible
1250	   substitutes include,

1252	   o  a table of contents instead of a large complex document,

1254	   o  a home page instead of an entire web site hierarchy,

1256	   o  a rights clearance challenge before accessing protected data,

1258	   o  directions for access to an offline object (e.g., a book),

1260	   o  a description of an intangible object (a disease, an event), or

1262	   o  an applet acting as "player" for a large multimedia object.

1264	   May also return a discriminated list of alternate object locators.
1265	   If access is denied, returns an explanation of the object's current
1266	   (perhaps permanent) inaccessibility.

1268	5.1.1.  Generic Policy Service (permanence, naming, etc.)

1270	   Returns declarations of policy and support commitments for given
1271	   ARKs.  Declarations are returned in either a structured metadata
1272	   format or a human readable text format; sometimes one format may
1273	   serve both purposes.  Policy subareas may be addressed in separate
1274	   requests, but the following areas should be covered: object
1275	   permanence, object naming, object fragment addressing, and
1276	   operational service support.

1278	   The permanence declaration for an object is a rating defined with
1279	   respect to an identified permanence provider (guarantor), which will
1280	   be the NMA.  It may include the following aspects.

1282	      (a) "object availability" -- whether and how access to the object
1283	      is supported (e.g., online 24x7, or offline only),

1285	      (b) "identifier validity" -- under what conditions the identifier
1286	      will be or has been re-assigned,

1288	      (c) "content invariance" -- under what conditions the content of
1289	      the object is subject to change, and

1291	      (d) "change history" -- access to corrections, migrations, and
1292	      revisions, whether through links to the changed objects themselves
1293	      or through a document summarizing the change history.

1295	   A recent approach to persistence statements, conceived independently
1296	   from ARKs, can be found at [PStatements], with ongoing work available
1297	   at Appendix A.  An older approach to a permanence rating framework is
1298	   given in [NLMPerm], which identified the following "permanence
1299	   levels":

1301	      Not Guaranteed: No commitment has been made to retain this
1302	      resource.  It could become unavailable at any time.  Its
1303	      identifier could be changed.

1305	      Permanent: Dynamic Content: A commitment has been made to keep
1306	      this resource permanently available.  Its identifier will always
1307	      provide access to the resource.  Its content could be revised or
1308	      replaced.

1310	      Permanent: Stable Content: A commitment has been made to keep this
1311	      resource permanently available.  Its identifier will always
1312	      provide access to the resource.  Its content is subject only to
1313	      minor corrections or additions.

1315	      Permanent: Unchanging Content: A commitment has been made to keep
1316	      this resource permanently available.  Its identifier will always
1317	      provide access to the resource.  Its content will not change.

1319	   Naming policy for an object includes an historical description of the
1320	   NAA's (and its successor NAA's) policies regarding differentiation of
1321	   objects.  Since it is the NMA that responds to requests for policy
1322	   statements, it is useful for the NMA to be able to produce or
1323	   summarize these historical NAA documents.  Naming policy may include
1324	   the following aspects.

1326	      (i) "similarity" -- (or "unity") the limit, defined by the NAA, to
1327	      the level of dissimilarity beyond which two similar objects
1328	      warrant separate identifiers but before which they share one
1329	      single identifier, and

1331	      (ii) "granularity" -- the limit, defined by the NAA, to the level
1332	      of object subdivision beyond which sub-objects do not warrant
1333	      separately assigned identifiers but before which sub-objects are
1334	      assigned separate identifiers.

1336	   Subnaming policy for an object describes the qualifiers that the NMA,
1337	   in fulfilling its ongoing and evolving service obligations, allows as
1338	   extensions to an NAA-assigned ARK.  To the conceptual object that the
1339	   NAA named with an ARK, the NMA may add component access points and
1340	   derivatives (e.g., format migrations in aid of preservation) in order
1341	   to provide both basic and value-added services.

1343	   Addressing policy for an object includes a description of how, during
1344	   access, object components (e.g., paragraphs, sections) or views
1345	   (e.g., image conversions) may or may not be "addressed", in other
1346	   words, how the NMA permits arguments or parameters to modify the
1347	   object delivered as the result of an ARK request.  If supported,
1348	   these sorts of operations would provide things like byte-ranged
1349	   fragment delivery and open-ended format conversions, or any set of
1350	   possible transformations that would be too numerous to list or to
1351	   identify with separately assigned ARKs.

1353	   Operational service support policy includes a description of general
1354	   operational aspects of the NMA service, such as after-hours staffing
1355	   and trouble reporting procedures.

1357	5.1.2.  Generic Description Service

1359	   Returns a description of the object.  Descriptions are returned in a
1360	   structured metadata format, human readable text format, or in one
1361	   format that serves both purposes (such as human-readable HTML with
1362	   embedded machine-readable metadata).  A description must at a minimum
1363	   answer the who, what, when, and where questions concerning an
1364	   expression of the object.  Standalone descriptions should be
1365	   accompanied by the modification date and source of the description
1366	   itself.  May also return discriminated lists of ARKs that are related
1367	   to the given ARK.

1369	5.2.  Overview of The HTTP URL Mapping Protocol (THUMP)

1371	   The HTTP URL Mapping Protocol (THUMP) is a way of taking a key (any
1372	   identifier) and asking such questions as, what information does this
1373	   identify and how permanent is it?  [THUMP] is in fact one specific
1374	   method under development for delivering ARK services.  The protocol
1375	   runs over HTTP to exploit the web browser's current pre-eminence as
1376	   user interface to the Internet.  THUMP is designed so that a person
1377	   can enter ARK requests directly into the location field of current
1378	   browser interfaces.  Because it runs over HTTP, THUMP can be
1379	   simulated and tested via keyboard-based interactions [RFC0854].

1381	   The asker (a person or client program) starts with an identifier,
1382	   such as an ARK or a URL.  The identifier reveals to the asker (or
1383	   allows the asker to infer) the Internet host name and port number of
1384	   a server system that responds to questions.  Here, this is just the
1385	   NMAH that is obtained by inspection and possibly lookup based on the
1386	   ARK's NAAN.  The asker then sets up an HTTP session with the server
1387	   system, sends a question via a THUMP request (contained within an
1388	   HTTP request), receives an answer via a THUMP response (contained
1389	   within an HTTP response), and closes the session.  That concludes the
1390	   connected portion of the protocol.

1392	   A THUMP request is a string of characters beginning with a `?'
1393	   (question mark) that is appended to the identifier string.  The
1394	   resulting string is sent as an argument to HTTP's GET command.
1395	   Request strings too long for GET may be sent using HTTP's POST
1396	   command.  The three most common requests correspond to three
1397	   degenerate special cases that keep the user's learning and typing
1398	   burden low.  First, a simple key with no request at all is the same
1399	   as an ordinary access request.  Thus a plain ARK entered into a
1400	   browser's location field behaves much like a plain URL, and returns
1401	   access to the primary identified object, for instance, an HTML
1402	   document.

1404	   The second special case is a minimal ARK description request string
1405	   consisting of just "?".  For example, entering the string,

1407	           ark.nlm.nih.gov/ark:12025/psbbantu?

1409	   into the browser's location field directly precipitates a request for
1410	   a metadata record describing the object identified by
1411	   ark:12025/psbbantu.  The browser, unaware of THUMP, prepares and
1412	   sends an HTTP GET request in the same manner as for a URL.  THUMP is
1413	   designed so that the response (indicated by the returned HTTP content
1414	   type) is normally displayed, whether the output is structured for
1415	   machine processing (text/plain) or formatted for human consumption
1416	   (text/html).
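
   Forming a request by appending a request string to the identifier
   might be sketched as below.  The function name thump_url is
   hypothetical, and the sketch covers only the three minimal request
   strings treated in this document: the empty string, "?", and "??"
   (the last of which is described later in this section).

```python
def thump_url(nmah, ark, request=""):
    """Append a THUMP request string to an ARK served by the given NMAH.

    request is "" (plain access), "?" (description), or "??" (permanence
    policy); anything else is rejected by this sketch.
    """
    if request not in ("", "?", "??"):
        raise ValueError("unsupported request string: %r" % request)
    return "http://%s/%s%s" % (nmah, ark, request)
```

   Some HTTP client libraries may normalize away a trailing `?' that
   carries no query string, so it is worth confirming that the request
   reaches the server as written.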

1418	   In the following example THUMP session, each line has been annotated
1419	   to include a line number and whether it was the client or server that
1420	   sent it.  Without going into much depth, the session has three pieces
1421	   separated from each other by blank lines: the client's piece (lines
1422	   1-3), the server's HTTP/THUMP response headers (4-7), and the body of
1423	   the server's response (8-13).  The first and last lines (1 and 13)
1424	   correspond to the client's steps to start the TCP session and the
1425	   server's steps to end it, respectively.

1427	    1  C: [opens session]
1428	       C: GET http://ark.nlm.nih.gov/ark:12025/psbbantu? HTTP/1.1
1429	       C:
1430	       S: HTTP/1.1 200 OK
1431	    5  S: Content-Type: text/plain
1432	       S: THUMP-Status: 0.6 200 OK
1433	       S:
1434	       S: erc:
1435	       S: who:    Lederberg, Joshua
1436	   10  S: what:   Studies of Human Families for Genetic Linkage
1437	       S: when:   1974
1438	       S: where:  http://profiles.nlm.nih.gov/BB/A/N/T/U/_/bbantu.pdf
1439	       S: [closes session]

1441	   The first two server response lines (4-5) above are typical of HTTP.
1442	   The next line (6) is peculiar to THUMP, and indicates the THUMP
1443	   version and a normal return status.

1445	   The balance of the response consists of a single metadata record
1446	   (8-12) that comprises the ARK description service response.  The
1447	   returned record is in the format of an Electronic Resource Citation
1448	   [ERC], which is discussed in overview in the next section.  For now,
1449	   note that it contains four elements that answer the top priority
1450	   questions regarding an expression of the object: who played a major
1451	   role in expressing it, what the expression was called, when it was
1452	   created, and where the expression may be found.  This quartet of
1453	   elements comes up again and again in ERCs.

1455	   The third degenerate special case of an ARK request (and no other
1456	   cases will be described in this document) is the string "??",
1457	   corresponding to a minimal permanence policy request.  It can be seen
1458	   in use appended to an ARK (on line 2) in the example session that
1459	   follows.

1461	    1  C: [opens session]
1462	       C: GET http://ark.nlm.nih.gov/ark:12025/psbbantu?? HTTP/1.1
1463	       C:
1464	       S: HTTP/1.1 200 OK
1465	    5  S: Content-Type: text/plain
1466	       S: THUMP-Status: 0.6 200 OK
1467	       S:
1468	       S: erc:
1469	       S: who:    Lederberg, Joshua
1470	   10  S: what:   Studies of Human Families for Genetic Linkage
1471	       S: when:   1974
1472	       S: where:  http://profiles.nlm.nih.gov/BB/A/N/T/U/_/bbantu.pdf
1473	       S: erc-support:
1474	       S: who:    USNLM
1475	   15  S: what:   Permanent, Unchanging Content
1476	       S: when:   20010421
1477	       S: where:  http://ark.nlm.nih.gov/yy22948
1478	       S: [closes session]

1480	   Each segment in an ERC tells a different story relating to the
1481	   object, so although the same four questions (elements) appear in
1482	   each, the answers depend on the segment's story type.  While the
1483	   first segment tells the story of an expression of the object, the
1484	   second segment tells the story of the support commitment made to it:
1485	   who made the commitment, what the nature of the commitment was, when
1486	   it was made, and where a fuller explanation of the commitment may be
1487	   found.
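
   A client might split such a response body into its segments along the
   lines of the following sketch.  It handles only the simple one-line
   "label: value" form shown in the sessions above, not the full [ANVL]
   syntax, and it assumes that a label with an empty value (such as
   "erc:" or "erc-support:") starts a new segment.  The function name
   parse_erc is hypothetical.

```python
def parse_erc(text):
    """Split a simple ERC record into (segment_label, elements) pairs.

    elements is a dict of label -> value for the lines that follow the
    segment label, up to the next segment label.
    """
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split at the first colon only, so URL values stay intact.
        label, sep, value = line.partition(":")
        if not sep:
            raise ValueError("expected 'label: value', got %r" % line)
        value = value.strip()
        if value == "":
            segments.append((label, {}))  # an empty value opens a segment
        elif not segments:
            raise ValueError("element before any segment label: %r" % line)
        else:
            segments[-1][1][label] = value
    return segments
```

   Applied to the body of the second session above, this would yield two
   segments, labeled "erc" and "erc-support", each holding the same four
   element labels with different answers.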

1489	5.3.  The Electronic Resource Citation (ERC)

1491	   An Electronic Resource Citation (or ERC, pronounced e-r-c) [ERC] is a
1492	   kind of object description that uses Dublin Core Kernel metadata
1493	   elements [DCKernel].  The ERC with Kernel elements provides a simple,
1494	   compact, and printable record for holding data associated with an
1495	   information resource.  As originally designed [Kernel], Kernel
1496	   metadata balances the needs for expressive power, very simple machine
1497	   processing, and direct human manipulation.

1499	   The previous section shows two limited examples of what is fully
1500	   described elsewhere [ERC].  The rest of this short section provides
1501	   some of the background and rationale for this record format.

1503	   A founding principle of Kernel metadata is that direct human contact
1504	   with metadata will be a necessary and sufficient condition for the
1505	   near term rapid development of metadata standards, systems, and
1506	   services.  Thus the machine-processable Kernel elements must only
1507	   minimally strain people's ability to read, understand, change, and
1508	   transmit ERCs without their relying on intermediation with
1509	   specialized software tools.  The basic ERC needs to be succinct,
1510	   transparent, and trivially parseable by software.

1512	   In the current Internet, it is natural to seriously consider using
1513	   XML as an exchange format because of predictions that it will obviate
1514	   many ad hoc formats and programs, and unify much of the world's
1515	   information under one reliable data structuring discipline that is
1516	   easy to generate, verify, parse, and render.  It appears, however,
1517	   that XML is still only catching on after years of standards work and
1518	   implementation experience.  The reasons for it are unclear, but for
1519	   now very simple XML interpretation is still out of reach.  Another
1520	   important caution is that XML structures are hard on the eyeballs,
1521	   taking up an amount of display (and page) space that significantly
1522	   exceeds that of traditional formats.  Until these conflicts with ERC
1523	   principle are resolved, XML is not a first choice for representing
1524	   ERCs.  Borrowing instead from the data structuring format that
1525	   underlies the successful spread of email and web services, the first
1526	   ERC format uses [ANVL], which is based on email and HTTP headers
1527	   [RFC2822].  There is a naturalness to ANVL's label-colon-value format
1528	   (seen in the previous section) that barely needs explanation to a
1529	   person beginning to enter ERC metadata.

1531	   Besides simplicity of ERC system implementation and data entry
1532	   mechanics, ERC semantics (what the record and its constituent parts
1533	   mean) must also be easy to explain.  ERC semantics are based on a
1534	   reformulation and extension of the Dublin Core [RFC5013] hypothesis,
1535	   which suggests that the fifteen Dublin Core metadata elements have a
1536	   key role to play in cross-domain resource description.  The ERC
1537	   design recognizes that the Dublin Core's primary contribution is the
1538	   international, interdisciplinary consensus that identified fifteen
1539	   semantic buckets (element categories), regardless of how they are
1540	   labeled.  The ERC then adds a definition for a record and some
1541	   minimal compliance rules.  In pursuing the limits of simplicity, the
1542	   ERC design combines and relabels some Dublin Core buckets to isolate
1543	   a tiny kernel (subset) of four elements for basic cross-domain
1544	   resource description.

1546	   For the cross-domain kernel, the ERC uses the four basic elements --
1547	   who, what, when, and where -- to pretend that every object in the
1548	   universe can have a uniform minimal description.  Each has a name or
1549	   other identifier, a location, some responsible person or party, and a
1550	   date.  It doesn't matter what type of object it is, or whether one
1551	   plans to read it, interact with it, smoke it, wear it, or navigate
1552	   it.  Of course, this approach is flawed because uniformity of
1553	   description for some object types requires more semantic contortion
1554	   and sacrifice than for others.  That is why at the beginning of this
1555	   document, the ARK was said to be suited to objects that accommodate
1556	   reasonably regular electronic description.

1558	   While insisting on uniformity at the most basic level provides
1559	   powerful cross-domain leverage, the semantic sacrifice is great for
1560	   many applications.  So the ERC also permits a semantically rich and
1561	   nuanced description to co-exist in a record along with a basic
1562	   description.  In that way both sophisticated and naive recipients of
1563	   the record can extract the level of meaning from it that best suits
1564	   their needs and abilities.  Key to unlocking the richer description
1565	   is a controlled vocabulary of ERC record types (not explained in this
1566	   document) that permit knowledgeable recipients to apply defined sets
1567	   of additional assumptions to the record.

1569	5.4.  Advice to Web Clients

1571	   ARKs are envisaged to appear wherever durable object references are
1572	   planned.  Library cataloging records, literature citations, and
1573	   bibliographies are important examples.  In many of these places URLs
1574	   (Uniform Resource Locators) are currently used, and inside some of
1575	   those URLs are embedded URNs, Handles, and DOIs.  Unfortunately,
1576	   there's no suggestion of a way to probe for extra services that would
1577	   build confidence in those identifiers; in other words, there's no way
1578	   to tell whether any of those identifiers is any better managed than
1579	   the average URL.

1581	   ARKs are also envisaged to appear in hypertext links (where they are
1582	   not normally shown to users) and in rendered text (displayed or
1583	   printed).  A normal HTML link for which the URL is not displayed
1584	   looks like this.

1586	   <a href = "http://example.org/index.htm"> Click Here </a>

1588	   A URL with an embedded ARK invites access (via `?' and `??') to extra
1589	   services:

1591	   <a href = "http://example.org/ark:14697/b12345x"> Click Here </a>

1593	   Using the [N2T] resolver to provide identifier-scheme-agnostic
1594	   protection against hostname instability, this ARK could be published
1595	   as:

1597	   <a href = "http://n2t.net/ark:14697/b12345x"> Click Here </a>
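
   Rewriting a URL that embeds an ARK into its resolver-based form might
   be sketched as follows.  The function name via_n2t and the simplified
   pattern are assumptions made for illustration; the sketch keeps
   everything from "ark:" onward and replaces only the hostname part.

```python
import re


def via_n2t(url):
    """Return the n2t.net form of a URL embedding an ARK, else url unchanged."""
    # Simplified pattern: "ark:" (or the older "ark:/"), a NAAN, and a name.
    match = re.search(r"ark:/?[^/]+/.+$", url)
    if match is None:
        return url
    return "http://n2t.net/" + match.group(0)
```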

1599	   An NAA will typically make known the associations it creates by
1600	   publishing them in catalogs, actively advertising them, or simply
1601	   leaving them on web sites for visitors (e.g., users, indexing
1602	   spiders) to stumble across in browsing.

1604	5.5.  Security Considerations

1606	   The ARK naming scheme poses no direct risk to computers and networks.
1607	   Implementors of ARK services need to be aware of security issues when
1608	   querying networks and filesystems for Name Mapping Authority
1609	   services, and the concomitant risks from spoofing and obtaining
1610	   incorrect information.  These risks are no greater for ARK mapping
1611	   authority discovery than for other kinds of service discovery.  For
1612	   example, recipients of ARKs with a specified hostport (NMAH) should
1613	   treat it like a URL and be aware that the identified ARK service may
1614	   no longer be operational.

1616	   Apart from mapping authority discovery, ARK clients and servers
1617	   subject themselves to all the risks that accompany normal operation
1618	   of the protocols underlying mapping services (e.g., HTTP, Z39.50).
1619	   As a specialization of such protocols, an ARK service may limit
1620	   exposure to the usual risks.  Indeed, ARK services may enhance a kind
1621	   of security by helping users identify long-term reliable references
1622	   to information objects.

1624	6.  References

1626	   [ANVL]     Kunze, J. and B. Kahle, "A Name-Value Language", 2008,
1627	              <http://www.cdlib.org/inside/diglib/ark/anvlspec.pdf>.

1629	   [ARK]      Kunze, J., "Towards Electronic Persistence Using ARK
1630	              Identifiers", IWAW/ECDL Annual Workshop Proceedings 3rd,
1631	              August 2003,
1632	              <http://bibnum.bnf.fr/ecdl/2003/proceedings.php?f=kunze>.

1634	   [ARKagency]
1635	              ARKs-in-the-Open, "ARK Maintenance Agency", 2019,
1636	              <https://arks.org>.

1638	   [DCKernel]
1639	              Dublin Core Metadata Initiative, "Kernel Metadata Working
1640	              Group", 2001-2008, <http://dublincore.org/groups/kernel/>.

1642	   [DOI]      International DOI Foundation, "The Digital Object Identifier
1643	              (DOI) System", February 2001, <http://dx.doi.org/10.1000/203>.

1645	   [ERC]      Kunze, J. and A. Turner, "Kernel Metadata and Electronic
1646	              Resource Citations", October 2007,
1647	              <http://www.cdlib.org/inside/diglib/ark/ercspec.html>.

1649	   [Handle]   Lannom, L., "Handle System Overview", ICSTI Forum No. 30,
1650	              April 1999, <http://www.icsti.org/forum/30/#lannom>.

1652	   [Kernel]   Kunze, J., "A Metadata Kernel for Electronic Permanence",
1653	              Journal of Digital Information Vol 2, Issue 2,
1654	              ISSN 1368-7506, January 2002,
1655	              <http://jodi.ecs.soton.ac.uk/Articles/v02/i02/Kunze/>.

1657	   [N2T]      California Digital Library, "Name-to-Thing Resolver",
1658	              August 2006, <http://n2t.net>.

1660	   [NAANregistry]
1661	              ARKs.org, "NAAN Registry", 2019,
1662	              <https://arks.org/e/pub/naan_registry.txt>.

1664	   [NAANrequest]
1665	              ARKs.org, "NAAN Request Form", 2018,
1666	              <https://n2t.net/e/naan_request>.

1668	   [NLMPerm]  Byrnes, M., "Defining NLM's Commitment to the Permanence
1669	              of Electronic Information", ARL 212:8-9, October 2000,
1670	              <http://www.arl.org/newsltr/212/nlm.html>.

1672	   [NOID]     Kunze, J., "Nice Opaque Identifiers", February 2005,
1673	              <http://www.cdlib.org/inside/diglib/ark/noid.pdf>.

1675	   [PStatements]
1676	              Kunze, J., "Persistence statements: describing digital
1677	              stickiness", October 2016,
1678	              <https://n2t.net/ark:/13030/c7833mx7t>.

1680	   [PURL]     Shafer, K., "Introduction to Persistent Uniform Resource
1681	              Locators", 1996, <http://purl.oclc.org/OCLC/PURL/INET96>.

1683	   [RFC0854]  Postel, J. and J. Reynolds, "Telnet Protocol
1684	              Specification", STD 8, RFC 854, DOI 10.17487/RFC0854, May
1685	              1983, <https://www.rfc-editor.org/info/rfc854>.

1687	   [RFC1034]  Mockapetris, P., "Domain names - concepts and facilities",
1688	              STD 13, RFC 1034, DOI 10.17487/RFC1034, November 1987,
1689	              <https://www.rfc-editor.org/info/rfc1034>.

1691	   [RFC2141]  Moats, R., "URN Syntax", RFC 2141, DOI 10.17487/RFC2141,
1692	              May 1997, <https://www.rfc-editor.org/info/rfc2141>.

1694	   [RFC2288]  Lynch, C., Preston, C., and R. Daniel, "Using Existing
1695	              Bibliographic Identifiers as Uniform Resource Names",
1696	              RFC 2288, DOI 10.17487/RFC2288, February 1998,
1697	              <https://www.rfc-editor.org/info/rfc2288>.

1699	   [RFC2611]  Daigle, L., van Gulik, D., Iannella, R., and P. Faltstrom,
1700	              "URN Namespace Definition Mechanisms", BCP 33, RFC 2611,
1701	              DOI 10.17487/RFC2611, June 1999,
1702	              <https://www.rfc-editor.org/info/rfc2611>.

1704	   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
1705	              Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
1706	              Transfer Protocol -- HTTP/1.1", RFC 2616,
1707	              DOI 10.17487/RFC2616, June 1999,
1708	              <https://www.rfc-editor.org/info/rfc2616>.

1710	   [RFC2822]  Resnick, P., Ed., "Internet Message Format", RFC 2822,
1711	              DOI 10.17487/RFC2822, April 2001,
1712	              <https://www.rfc-editor.org/info/rfc2822>.

1714	   [RFC2915]  Mealling, M. and R. Daniel, "The Naming Authority Pointer
1715	              (NAPTR) DNS Resource Record", RFC 2915,
1716	              DOI 10.17487/RFC2915, September 2000,
1717	              <https://www.rfc-editor.org/info/rfc2915>.

1719	   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
1720	              Resource Identifier (URI): Generic Syntax", STD 66,
1721	              RFC 3986, DOI 10.17487/RFC3986, January 2005,
1722	              <https://www.rfc-editor.org/info/rfc3986>.

1724	   [RFC5013]  Kunze, J. and T. Baker, "The Dublin Core Metadata Element
1725	              Set", RFC 5013, DOI 10.17487/RFC5013, August 2007,
1726	              <https://www.rfc-editor.org/info/rfc5013>.

1728	   [THUMP]    Gamiel, K. and J. Kunze, "The HTTP URL Mapping Protocol",
1729	              August 2007,
1730	              <http://www.cdlib.org/inside/diglib/thumpspec.pdf>.

1732	Appendix A.  ARK Maintenance Agency: arks.org

1734	   The ARK Maintenance Agency [ARKagency] at arks.org has several
1735	   functions.

1737	   o  To manage the registry of organizations that will be assigning
1738	      ARKs.  Organizations can request or update a NAAN by filling out a
1739	      form [NAANrequest].

1741	   o  To be a clearinghouse for information about ARKs, such as best
1742	      practices, introductory documentation, tutorials, community
1743	      forums, etc.  These supplemental resources help ARK implementors
1744	      in high-level applications across different sectors and
1745	      disciplines, and with a variety of metadata standards.

1747	   o  To be a locus of discussion about future versions of the ARK
1748	      specification.

1750	Appendix B.  Looking up NMAHs Distributed via DNS

1752	   This subsection introduces an older method for looking up NMAHs that
1753	   is based on the method for discovering URN resolvers described in
1754	   [RFC2915].  It relies on querying the DNS system already installed in
1755	   the background infrastructure of most networked computers.  A query
1756	   is submitted to DNS asking for a list of resolvers that match a given
1757	   NAAN.  DNS distributes the query to the particular DNS servers that
1758	   can best provide the answer, unless the answer can be found more
1759	   quickly in a local DNS cache as a side-effect of a recent query.
1760	   Responses come back inside Name Authority Pointer (NAPTR) records.
1761	   The normal result is one or more candidate NMAHs.

1763	   In its full generality the [RFC2915] algorithm ambitiously
1764	   accommodates a complex set of preferences, orderings, protocols,
1765	   mapping services, regular expression rewriting rules, and DNS record
1766	   types.  This subsection proposes a drastic simplification of it for
1767	   the special case of ARK mapping authority discovery.  The simplified
1768	   algorithm is called Maptr.  It uses only one DNS record type (NAPTR)
1769	   and restricts most of its field values to constants.  The following
1770	   hypothetical excerpt from a DNS data file for the NAAN known as 12026
1771	   shows three example NAPTR records ready to use with the Maptr
1772	   algorithm.

1774	     12026.ark.arpa.
1775	     ;; US Library of Congress
1776	     ;;       order pref flags service regexp  replacement
1777	      IN NAPTR  0     0   "h"  "ark"   "USLC"  lhc.nlm.nih.gov:8080
1778	      IN NAPTR  0     0   "h"  "ark"   "USLC"  foobar.zaf.org
1779	      IN NAPTR  0     0   "h"  "ark"   "USLC"  sneezy.dopey.com

1781	   All the fields are held constant for Maptr except for the "flags",
1782	   "regexp", and "replacement" fields.  The "service" field contains the
1783	   constant value "ark" so that NAPTR records participating in the Maptr
1784	   algorithm will not be confused with other NAPTR records.  The "order"
1785	   and "pref" fields are held to 0 (zero) and otherwise ignored for now;
1786	   the algorithm may evolve to use these fields for ranking decisions
1787	   when usage patterns and local administrative needs are better
1788	   understood.

1790	   When a Maptr query returns a record with a flags field of "h" (for
1791	   hostport, a Maptr extension to the NAPTR flags), the replacement
1792	   field contains the NMAH (hostport) of an ARK service provider.  When
1793	   a query returns a record with a flags field of "" (the empty string),
1794	   the client needs to submit a new query containing the domain name
1795	   found in the replacement field.  This second sort of record exploits
1796	   the distributed nature of DNS by redirecting the query to another
1797	   domain name.  It looks like this.

1799	     12345.ark.arpa.
1800	     ;; Digital Library Consortium
1801	     ;;       order pref flags service regexp replacement
1802	      IN NAPTR  0     0    ""  "ark"     ""   dlc.spct.org.
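
   To make the two record kinds concrete, here is a small Python sketch
   that parses NAPTR lines written in the layout shown above and
   classifies each one as a candidate NMAH, a redirect, or a record to
   ignore.  It is an illustration under stated assumptions, not part of
   Maptr: the names Naptr, parse_naptr, and classify are invented for
   this example, and the parser accepts only the simple "IN NAPTR ..."
   text form used in this appendix.  It uses shlex so that quoted
   fields, including the empty string "", are unquoted correctly; a
   plain split on whitespace would leave the quote characters in place.

```python
import shlex
from collections import namedtuple

# Hypothetical record type for this sketch; field names follow the
# column comments in the zone-file excerpts above.
Naptr = namedtuple("Naptr", "order pref flags service regexp replacement")


def parse_naptr(line):
    """Parse one 'IN NAPTR order pref flags service regexp replacement' line."""
    tokens = shlex.split(line)  # strips quotes, keeps "" as an empty token
    if len(tokens) != 8 or tokens[:2] != ["IN", "NAPTR"]:
        raise ValueError("not a NAPTR line in the expected layout: " + line)
    order, pref, flags, service, regexp, replacement = tokens[2:]
    return Naptr(int(order), int(pref), flags, service, regexp, replacement)


def classify(rec):
    """Return 'nmah', 'redirect', or 'ignore' for a parsed record."""
    if rec.service != "ark":
        return "ignore"        # not a Maptr record
    if rec.flags == "h":
        return "nmah"          # replacement holds a hostport
    if rec.flags == "":
        return "redirect"      # replacement holds a new domain name
    return "ignore"


if __name__ == "__main__":
    rec = parse_naptr(' IN NAPTR  0     0    ""  "ark"     ""   dlc.spct.org.')
    print(classify(rec), rec.replacement)
```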

1804	   Here is the Maptr algorithm for ARK mapping authority discovery.  In
1805	   it, replace <NAAN> with the NAAN from the ARK for which an NMAH is
1806	   sought.

1808	   1.  Initialize the DNS query: type=NAPTR, query=<NAAN>.ark.arpa.

1810	   2.  Submit the query to DNS and retrieve (NAPTR) records, discarding
1811	       any record that does not have "ark" for the service field.

1813	   3.  All remaining records with a flags field of "h" contain
1814	       candidate NMAHs in their replacement fields.  Set them aside, if
1815	       any.

1817	   4.  Any record with an empty flags field ("") has a replacement field
1818	       containing a new domain name to which a subsequent query should
1819	       be redirected.  For each such record, set query=<replacement>
1820	       then go to step (2).  When all such records have been recursively
1821	       exhausted, go to step (5).

1823	   5.  All redirected queries have been resolved and a set of candidate
1824	       NMAHs has been accumulated from step (3).  If there are zero
1825	       NMAHs, exit -- no mapping authority was found.  If there are one
1826	       or more NMAHs, choose one using any criteria you wish, then exit.

1828	   A Perl script that implements this algorithm is included here.

1830	   #!/depot/bin/perl

1832	   use Net::DNS;                           # include simple DNS package
1833	   my $qtype = "NAPTR";                    # initialize query type
1834	   my $naa = shift;                        # get NAAN script argument
1835	   my $mad = Net::DNS::Resolver->new;      # mapping authority discovery

1837	   &maptr("$naa.ark.arpa");                # call maptr - that's it

1839	   sub maptr {                             # recursive maptr algorithm
1840	           my $dname = shift;              # domain name as argument
1841	           my ($rr, $flags);
1842	                                           # query DNS for NAPTR records
1843	           my $query = $mad->query($dname, $qtype);
1844	           return                          # non-productive query
1845	                   if (! $query || ! $query->answer);
1846	           foreach $rr ($query->answer) {
1847	                   next                    # skip non-Maptr records
1848	                           if ($rr->type ne $qtype
1849	                                   || $rr->service ne "ark");
1850	                   $flags = $rr->flags;    # parsed, unquoted value
1851	                   if ($flags eq "") {
1852	                           &maptr($rr->replacement);   # recurse
1853	                   } elsif ($flags eq "h") {
1854	                           print $rr->replacement, "\n"; # candidate
1855	                   }
1856	           }
1857	   }
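
   The same traversal can be exercised without a live DNS.  The Python
   sketch below follows steps (1) through (5) against an in-memory
   table; it is a minimal illustration, not a reference implementation.
   The lookup callback, the DEMO_TABLE data, the max_depth limit, and
   the host name ark.example.org are all assumptions introduced for
   this example.  The guard against revisiting a domain name is also an
   addition of this sketch: the algorithm as stated does not say what
   to do when redirects form a loop.

```python
def maptr(naan, lookup, max_depth=8):
    """Return the candidate NMAHs found for a NAAN.

    lookup(name) is a caller-supplied function that returns a list of
    (flags, service, replacement) tuples for the NAPTR records at name.
    """
    candidates = []
    seen = set()               # sketch-only guard against redirect loops

    def visit(name, depth):
        if name in seen or depth > max_depth:
            return
        seen.add(name)
        for flags, service, replacement in lookup(name):
            if service != "ark":          # step 2: discard other services
                continue
            if flags == "h":              # step 3: candidate NMAH
                candidates.append(replacement)
            elif flags == "":             # step 4: redirect and recurse
                visit(replacement, depth + 1)

    visit(naan + ".ark.arpa.", 0)         # step 1: initial query name
    return candidates                     # step 5: caller picks one, if any


# Hypothetical data: one redirect, one NMAH, one unrelated record, and
# a redirect back to the start to show the loop guard.
DEMO_TABLE = {
    "12345.ark.arpa.": [("", "ark", "dlc.spct.org.")],
    "dlc.spct.org.": [
        ("h", "ark", "ark.example.org"),
        ("h", "other", "unrelated.example.org"),
        ("", "ark", "12345.ark.arpa."),
    ],
}


def demo_lookup(name):
    """Stand-in for a DNS NAPTR query, backed by DEMO_TABLE."""
    return DEMO_TABLE.get(name, [])


if __name__ == "__main__":
    print(maptr("12345", demo_lookup))
```

   Unlike the numbered steps, this sketch collects candidates and
   follows redirects in a single pass over each answer, so the order of
   the resulting list may differ even though the set of candidates is
   the same.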

1859	   The global database thus distributed via DNS and the Maptr algorithm
1860	   can easily be seen to mirror the contents of the Name Authority
1861	   Table file described in the previous section.

1863	Authors' Addresses

1865	   John A. Kunze
1866	   California Digital Library
1867	   415 20th St, 4th Floor
1868	   Oakland, CA  94612
1869	   USA

1871	   Email: jak@ucop.edu

1872	   Emmanuelle Bermes
1873	   Bibliotheque nationale de France
1874	   Quai Francois Mauriac
1875	   Paris, Cedex 13  75706
1876	   France

1878	   Email: emmanuelle.bermes@bnf.fr