idnits 2.17.1 

draft-kunze-ark-27.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** The abstract seems to contain references ([Qualifier]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.

  == There are 1 instance of lines with non-RFC2606-compliant FQDNs in the
     document.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 1794 has weird spacing: '... regexp  repla...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (February 21, 2021) is 1159 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Missing Reference: 'Qualifier' is mentioned on line 523, but not defined

  ** Obsolete normative reference: RFC 2141 (Obsoleted by RFC 8141)

  ** Obsolete normative reference: RFC 2611 (Obsoleted by RFC 3406)

  ** Obsolete normative reference: RFC 2616 (Obsoleted by RFC 7230, RFC 7231,
     RFC 7232, RFC 7233, RFC 7234, RFC 7235)

  ** Obsolete normative reference: RFC 2822 (Obsoleted by RFC 5322)

  ** Obsolete normative reference: RFC 2915 (Obsoleted by RFC 3401, RFC 3402,
     RFC 3403, RFC 3404)


     Summary: 8 errors (**), 0 flaws (~~), 4 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                           J. Kunze
3	Internet-Draft                                California Digital Library
4	Intended status: Informational                                 E. Bermes
5	Expires: August 25, 2021                Bibliotheque nationale de France
6	                                                       February 21, 2021

8	                       The ARK Identifier Scheme
9	                           draft-kunze-ark-27

11	Abstract

13	   The ARK (Archival Resource Key) naming scheme is designed to
14	   facilitate the high-quality and persistent identification of
15	   information objects.  A founding principle of the ARK is that
16	   persistence is purely a matter of service and is neither inherent in
17	   an object nor conferred on it by a particular naming syntax.  The
18	   best that an identifier can do is to lead users to the services that
19	   support robust reference.  The term ARK itself refers both to the
20	   scheme and to any single identifier that conforms to it.  An ARK has
21	   five components:

23	   [https://NMA/]ark:[/]NAAN/Name[Qualifier]

25	   an optional and mutable Name Mapping Authority (usually a hostname),
26	   the "ark:" label, the Name Assigning Authority Number (NAAN), the
27	   assigned Name, and an optional and possibly mutable Qualifier
28	   supported by the NMA.  The NAAN and Name together form the immutable
29	   persistent identifier for the object independent of the URL hostname.
30	   An ARK is a special kind of URL that connects users to three things:
31	   the named object, its metadata, and the provider's promise about its
32	   persistence.  When entered into the location field of a Web browser,
33	   the ARK leads the user to the named object.  That same ARK, inflected
34	   by appending `?info', returns a metadata record that is both human-
35	   and machine-readable.  The returned record contains core metadata and
36	   a commitment statement from the current provider.  Tools exist for
37	   minting, binding, and resolving ARKs.

39	Status of This Memo

41	   This Internet-Draft is submitted in full conformance with the
42	   provisions of BCP 78 and BCP 79.

44	   Internet-Drafts are working documents of the Internet Engineering
45	   Task Force (IETF).  Note that other groups may also distribute
46	   working documents as Internet-Drafts.  The list of current Internet-
47	   Drafts is at https://datatracker.ietf.org/drafts/current/.

49	   Internet-Drafts are draft documents valid for a maximum of six months
50	   and may be updated, replaced, or obsoleted by other documents at any
51	   time.  It is inappropriate to use Internet-Drafts as reference
52	   material or to cite them other than as "work in progress."

54	   This Internet-Draft will expire on August 25, 2021.

56	Copyright Notice

58	   Copyright (c) 2021 IETF Trust and the persons identified as the
59	   document authors.  All rights reserved.

61	   This document is subject to BCP 78 and the IETF Trust's Legal
62	   Provisions Relating to IETF Documents
63	   (https://trustee.ietf.org/license-info) in effect on the date of
64	   publication of this document.  Please review these documents
65	   carefully, as they describe your rights and restrictions with respect
66	   to this document.

68	Table of Contents

70	   1.  Introduction
71	     1.1.  Reasons to Use ARKs
72	     1.2.  Three Requirements of ARKs
73	     1.3.  Organizing Support for ARKs:  Our Stuff vs. Their Stuff
74	     1.4.  Definition of Identifier
75	   2.  ARK Anatomy
76	     2.1.  The Name Mapping Authority (NMA)
77	     2.2.  The ARK Label Part (ark:)
78	     2.3.  The Name Assigning Authority Number (NAAN)
79	     2.4.  The Name Part
80	       2.4.1.  Optional: Shoulder and Blade
81	     2.5.  The Qualifier Part
82	       2.5.1.  ARKs that Reveal Object Hierarchy
83	       2.5.2.  ARKs that Reveal Object Variants
84	     2.6.  Character Repertoires
85	     2.7.  Normalization and Lexical Equivalence
86	   3.  Naming Considerations
87	     3.1.  ARKS Embedded in Language
88	     3.2.  Objects Should Wear Their Identifiers
89	     3.3.  Names are Political, not Technological
90	     3.4.  Choosing a Hostname or NMA
91	     3.5.  Assigners of ARKs
92	     3.6.  NAAN Namespace Management
93	     3.7.  Sub-Object Naming
94	   4.  Finding a Name Mapping Authority
95	     4.1.  Looking Up NMAs in a Globally Accessible File
96	   5.  Generic ARK Service Definition
97	     5.1.  Generic ARK Access Service (access, location)
98	       5.1.1.  Generic Policy Service (permanence, naming, etc.)
99	       5.1.2.  Generic Description Service
100	     5.2.  Overview of The HTTP URL Mapping Protocol (THUMP)
101	     5.3.  The Electronic Resource Citation (ERC)
102	     5.4.  Advice to Web Clients
103	     5.5.  Security Considerations
104	   6.  References
105	   Appendix A.  ARK Maintenance Agency: arks.org
106	   Appendix B.  Looking up NMAs Distributed via DNS
107	   Authors' Addresses

109	1.  Introduction

111	   [ Note about this transitional draft.  The ARKsInTheOpen.org
112	   Technical Working Group (https://wiki.duraspace.org/display/ARKs/
113	   Technical+Working+Group) is in the process of revising the ARK spec
114	   via a series of Internet-Drafts.  This draft contains many minor but
115	   noisy changes (lots of diffs but not much real change).  While the
116	   spec is in transition, new implementors should follow
117	   https://datatracker.ietf.org/doc/html/draft-kunze-ark-18. ]

119	   This document describes a scheme for the high-quality naming of
120	   information resources.  The scheme, called the Archival Resource Key
121	   (ARK), is well suited to long-term access and identification of any
122	   information resources that accommodate reasonably regular electronic
123	   description.  This includes digital documents, databases, software,
124	   and websites, as well as physical objects (books, bones, statues,
125	   etc.) and intangible objects (chemicals, diseases, vocabulary terms,
126	   performances).  Hereafter the term "object" refers to an information
127	   resource.  The term ARK itself refers both to the scheme and to any
128	   single identifier that conforms to it.  A reasonably concise and
129	   accessible overview and rationale for the scheme is available at
130	   [ARK].

132	   Schemes for persistent identification of network-accessible objects
133	   are not new.  In the early 1990s, the design of the Uniform Resource
134	   Name [RFC2141] responded to the observed failure rate of URLs by
135	   articulating an indirect, non-hostname-based naming scheme and the
136	   need for responsible name management.  Meanwhile, promoters of the
137	   Digital Object Identifier [DOI] succeeded in building a community of
138	   providers around a mature software system [Handle] that supports name
139	   management.  The Persistent Uniform Resource Locator [PURL] was
140	   another scheme that had the advantage of working with unmodified web
141	   browsers.  ARKs represent an approach that attempts to build on the
142	   strengths and to avoid the weaknesses of these schemes.

144	   A founding principle of the ARK is that persistence is purely a
145	   matter of service.  Persistence is neither inherent in an object nor
146	   conferred on it by a particular naming syntax.  Nor is the technique
147	   of name indirection -- upon which URNs, Handles, DOIs, and PURLs are
148	   founded -- of central importance.  Name indirection is an ancient and
149	   well-understood practice; new mechanisms for it keep appearing and
150	   distracting practitioner attention, with the Domain Name System (DNS)
151	   [RFC1034] being a particularly dazzling and elegant example.  What is
152	   often forgotten is that maintenance of an indirection table is an
153	   unavoidable cost to the organization providing persistence, and that
154	   cost is equivalent across naming schemes.  That indirection has
155	   always been a native part of the web while being so lightly utilized
156	   for the persistence of web-based objects indicates how unsuited most
157	   organizations will probably be to the task of table maintenance and
158	   to the much more fundamental challenge of keeping the objects
159	   themselves viable.

161	   Persistence is achieved through a provider's successful stewardship
162	   of objects and their identifiers.  The highest level of persistence
163	   will be reinforced by a provider's robust contingency, redundancy,
164	   and succession strategies.  It is further safeguarded to the extent
165	   that a provider's mission is shielded from funding and political
166	   instabilities.  These are by far the major challenges confronting
167	   persistence providers, and no identifier scheme has any direct impact
168	   on them.  In fact, some schemes may actually be liabilities for
169	   persistence because they create short- and long-term dependencies for
170	   every object access on complex, special-purpose infrastructures,
171	   parts of which are proprietary and all of which increase the carry-
172	   forward burden for the preservation community.  It is for this reason
173	   that the ARK scheme relies only on educated name assignment and light
174	   use of general-purpose infrastructures that are maintained mostly by
175	   the internet community at large (the DNS, web servers, and web
176	   browsers).

178	1.1.  Reasons to Use ARKs

180	   If no persistent identifier scheme contributes directly to
181	   persistence, why not just use URLs?  A particular URL may be as
182	   durable an identifier as it is possible to have, but nothing
183	   distinguishes it from an ordinary URL to the recipient who is
184	   wondering if it is suitable for long-term reference.  An ARK embedded
185	   in a URL provides some of the necessary conditions for credible
186	   persistence, inviting access to not one, but to three things: to the
187	   object, to its metadata, and to a nuanced statement of commitment
188	   from the provider in question (the NMA, described below) regarding
189	   the object.  Existence of the extra service can be probed
190	   automatically by appending `?info' to the ARK.
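
As a minimal sketch of the probe described above, a client can form
the inflected ARK by appending `?info'.  The function name is
hypothetical, the ARK is the illustrative one used in this document's
anatomy diagram, and the sketch assumes the ARK carries no other
query string.  It only builds the string and sends no request.

```python
def info_inflection(ark_url: str) -> str:
    """Return the ARK with the `?info' inflection appended.

    Assumption: the input has no existing inflection or query string.
    """
    if "?" in ark_url:
        raise ValueError("expected an ARK without an inflection or query")
    return ark_url + "?info"


if __name__ == "__main__":
    print(info_inflection("https://example.org/ark:12345/x54xz321"))
```

Fetching the resulting URL and inspecting the response is left to
whatever HTTP client the application already uses.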

192	   The form of the ARK also supports the natural separation of naming
193	   authorities into the original name assigning authority and the
194	   diverse multiple name mapping (or servicing) authorities that in
195	   succession and in parallel will take over custodial responsibilities
196	   from the original assigner (assuming the assigner ever held that
197	   responsibility) for the large majority of a long-term object's
198	   archival lifetime.  The name mapping authority, indicated by the
199	   hostname part of the URL that contains the ARK, serves to launch the
200	   ARK into cyberspace.  Should it ever fail (and there is no reason why
201	   a well-chosen hostname for a 100-year-old cultural memory institution
202	   shouldn't last as long as the DNS), that hostname is considered
203	   disposable and replaceable.  Again, the form of the ARK helps
204	   because it defines exactly how to recover the core immutable object
205	   identity, and simple algorithms (one based on the URN model) or even
206	   by-hand internet query can be used for locating another mapping
207	   authority.

209	   There are tools to assist in generating ARKs and other identifiers,
210	   such as [NOID] and "uuidgen", both of which rely for uniqueness on
211	   human-maintained registries.  This document also contains some
212	   guidelines and considerations for managing namespaces and choosing
213	   hostnames with persistence in mind.

215	1.2.  Three Requirements of ARKs

217	   The first requirement of an ARK is to give users a link from an
218	   object to a promise of stewardship for it.  That promise is a multi-
219	   faceted covenant that binds the word of an identified service
220	   provider to a specific set of responsibilities.  It is critical for
221	   the promise to come from a current provider and almost irrelevant,
222	   over a long period of time, what the original assigner's intentions
223	   were.  No one can tell if successful stewardship will take place
224	   because no one can predict the future.  Reasonable conjecture,
225	   however, may be based on past performance.  There must be a way to
226	   tie a promise of persistence to a provider's demonstrated or
227	   perceived ability -- its reputation -- in that arena.  Provider
228	   reputations would then rise and fall as promises are observed
229	   variously to be kept and broken.  This is perhaps the best way we
230	   have for gauging the strength of any persistence promise.

232	   The second requirement of an ARK is to give users a link from an
233	   object to a description of it.  The problem with a naked identifier
234	   is that without a description real identification is incomplete.
235	   Identifiers common today are relatively opaque, though some contain
236	   ad hoc clues reflecting assertions that were briefly true, such as
237	   where in a filesystem hierarchy an object lived during a short stay.
238	   Possession of both an identifier and an object is some improvement,
239	   but positive identification may still be uncertain since the object
240	   itself might not include a matching identifier or might not carry
241	   evidence obvious enough to reveal its identity without significant
242	   research.  In either case, what is called for is a record bearing
243	   witness to the identifier's association with the object, as supported
244	   by a recorded set of object characteristics.  This descriptive record
245	   is partly an identification "receipt" with which users and archivists
246	   can verify an object's identity after brief inspection and a
247	   plausible match with recorded characteristics such as title and size.

249	   The final requirement of an ARK is to give users a link to the object
250	   itself (or to a copy) if at all possible.  Persistent identification
251	   plays a vital supporting role but, strictly speaking, it can be
252	   construed as no more than a record attesting to the original
253	   assignment of a never-reassigned identifier.  Object access may not
254	   be feasible for various reasons, such as a transient service outage,
255	   a catastrophic loss, a licensing agreement that keeps an archive
256	   "dark" for a period of years, or when an object's own lack of
257	   tangible existence confuses normal concepts of access (e.g., a
258	   vocabulary term might be "accessed" through its definition).  In such
259	   cases the ARK's identification role assumes a much higher profile.
260	   But attempts to simplify the persistence problem by decoupling access
261	   from identification and concentrating exclusively on the latter are
262	   of questionable utility.  A perfect system for assigning forever
263	   unique identifiers might be created, but if it did so without
264	   reducing access failure rates, no one would be interested.  The
265	   central issue -- which may be crudely summed up as the "HTTP 404 Not
266	   Found" problem -- would not have been addressed.

268	   The central duty of an ARK is a high-quality experience of access and
269	   identification.  This means supporting reliable access during the
270	   period described in its stewardship promise and, failing that,
271	   supporting reliable access to a record describing the thing the ARK
272	   is associated with.

274	   ARK resolvers must support the `?info' inflection for requesting
275	   metadata.  Older versions of this specification distinguished between
276	   two minimal inflections: `?' (brief metadata) and `??' (more
277	   metadata).  These older inflections are still reserved but, because
278	   they have proven hard to recognize in some environments, supporting
279	   them is optional.
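
One way a resolver might tell these inflections apart is sketched
below.  The function name and the returned labels ("info",
"legacy-more", "legacy-brief", "access") are hypothetical, and the
sketch assumes the inflection is the final suffix of the received
string.  It is an illustration, not the normative matching rule.

```python
from typing import Tuple


def classify_inflection(request: str) -> Tuple[str, str]:
    """Split a received ARK string into (ark, inflection label)."""
    # Required inflection: `?info' requests metadata.
    if request.endswith("?info"):
        return request[: -len("?info")], "info"
    # Reserved older inflections; supporting them is optional.
    # `??' must be tested before `?' because it also ends in `?'.
    if request.endswith("??"):
        return request[:-2], "legacy-more"
    if request.endswith("?"):
        return request[:-1], "legacy-brief"
    # No inflection: an ordinary request for the object itself.
    return request, "access"


if __name__ == "__main__":
    for r in ("ark:12345/x54xz321", "ark:12345/x54xz321?info",
              "ark:12345/x54xz321?", "ark:12345/x54xz321??"):
        print(classify_inflection(r))
```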

281	1.3.  Organizing Support for ARKs: Our Stuff vs. Their Stuff

283	   An organization and the user community it serves can often be seen to
284	   struggle with two different areas of persistent identification: the
285	   Our Stuff problem and the Their Stuff problem.  In the Our Stuff
286	   problem, we in the organization want our own objects to acquire
287	   persistent names.  Since we possess or control these objects, our
288	   organization tackles the Our Stuff problem directly.  Whether or not
289	   the objects are named by ARKs, our organization is the responsible
290	   party, so it can plan for, maintain, and make commitments about the
291	   objects.

293	   In the Their Stuff problem, we in the organization want others'
294	   objects to acquire persistent names.  These are objects that we do
295	   not own or control, but some of which are critically important to us.
296	   But because they are beyond our influence as far as support is
297	   concerned, creating and maintaining persistent identifiers for Their
298	   Stuff is not especially purposeful or feasible for us to engage in.
299	   There is little that we can do about someone else's stuff except
300	   encourage their uptake or adoption of persistence services.

302	   Co-location of persistent access and identification services is
303	   natural.  Any organization that undertakes ongoing support of true
304	   persistent identification (which includes description) is well-served
305	   if it controls, owns, or otherwise has clear internal access to the
306	   identified objects, and this gives it an advantage if it wishes also
307	   to support persistent access to outsiders.  Conversely, persistent
308	   access to outsiders requires orderly internal collection management
309	   procedures that include monitoring, acquisition, verification, and
310	   change control over objects, which in turn requires object
311	   identifiers persistent enough to support auditable record keeping
312	   practices.

314	   Although organizing ARK support under one roof thus tends to make
315	   sense, object hosting can successfully be separated from name
316	   mapping.  An example is when a name mapping authority centrally
317	   provides uniform resolution services via a protocol gateway on behalf
318	   of organizations that host objects behind a variety of access
319	   protocols.  It is also reasonable to build value-added description
320	   services that rely on the underlying services of a set of mapping
321	   authorities.

323	   Supporting ARKs is not for every organization.  By requiring
324	   specific, revealed commitments to preservation, to object access, and
325	   to description, the bar for providing ARK services is higher than for
326	   some other identifier schemes.  On the other hand, it would be hard
327	   to grant credence to a persistence promise from an organization that
328	   could not muster the minimum ARK services.  That said, there may be a
329	   business model for an ARK-like, description-only service built on top
330	   of another organization's full complement of ARK services.  For
331	   example, there might be competition at the description level for
332	   abstracting and indexing a body of scientific literature archived in
333	   a combination of open and fee-based repositories.  The description-
334	   only service would have no direct commitment to the objects, but
335	   would act as an intermediary, forwarding commitment statements from
336	   object hosting services to requestors.

338	1.4.  Definition of Identifier

340	   An identifier is not a string of character data -- an identifier is
341	   an association between a string of data and an object.  This
342	   abstraction is necessary because without it a string is just data.
343	   It's nonsense to talk about a string's breaking, or about its being
344	   strong, maintained, and authentic.  But as a representative of an
345	   association, a string can do, metaphorically, the things that we
346	   expect of it.

348	   Without regard to whether an object is physical, digital, or
349	   conceptual, to identify it is to claim an association between it and
350	   a representative string, such as "Jane" or "ISBN 0596000278".  What
351	   gives a claim credibility is a set of verifiable assertions, or
352	   metadata, about the object, such as age, height, title, or number of
353	   pages.  In other words, the association is made manifest by a record
354	   (e.g., a cataloging or other metadata record) that vouches for it.

356	   In the complete absence of any testimony (metadata) regarding an
357	   association, a would-be identifier string is a meaningless sequence
358	   of characters.  To keep an externally visible but otherwise internal
359	   string from being perceived as an identifier by outsiders, for
360	   example, it suffices for an organization not to disclose the nature
361	   of its association.  For our immediate purpose, actual existence of
362	   an association record is more important than its authenticity or
363	   verifiability, which are outside the scope of this specification.

365	   It is a gift to the identification process if an object carries its
366	   own name as an inseparable part of itself, such as an identifier
367	   imprinted on the first page of a document or embedded in a data
368	   structure element of a digital document header.  In cases where the
369	   object is large, unwieldy, or unavailable (such as when licensing
370	   restrictions are in effect), a metadata record that includes the
371	   identifier string will usually suffice.  That record becomes a
372	   conveniently manipulable object surrogate, acting as both an
373	   association "receipt" and "declaration".

375	   Note that our definition of identifier extends the one in use for
376	   Uniform Resource Identifiers [RFC3986].  The present document still
377	   sometimes (ab)uses the terms "ARK" and "identifier" as shorthand for
378	   the string part of an identifier, but the context should make the
379	   meaning clear.

381	2.  ARK Anatomy

383	   An ARK is represented by a sequence of characters (a string) that
384	   contains the label, "ark:", optionally preceded by the beginning part
385	   of a URL.  Here is a diagrammed example.

387	   ARK ANATOMY
388	   ===========

390	         Resolver Service   Base Object Name    Qualifiers
391	        __________________  ________________  _____________
392	       /                  \/                \/             \
393	       https://example.org/ark:12345/x54xz321/s3/f8.05v.tiff
394	               \_________/ \__/\___/\_/\____/\____/\_______/
395	                   |      Label  |   |  Blade   |       |
396	                   |             |   |          |       |
397	   Name Mapping Authority (NMA)  |  Shoulder  Sub-parts Variants
398	                                 |  \_______/
399	                                 | Assigned Base Name
400	                                 |
401	   Name Assigning Authority Number (NAAN)

403	   The ARK syntax can be summarized as follows:

405	                  [https://NMA/]ark:[/]NAAN/Name[Qualifier]

407	   where the NMA, '/', and Qualifier parts are in brackets to indicate
408	   that they are optional.  The Base Object Name is the substring
409	   comprising the "ark:" label, the NAAN and the assigned Name.  The
410	   Resolver Service is replaceable and makes the ARK actionable for a
411	   period of time.  Without the Resolver Service part, what remains is
412	   the Core Immutable Identity (the "persistible") part of the ARK.
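
The summarized syntax can be approximated with a regular expression,
as in the sketch below.  This is an illustration and not a normative
grammar.  It assumes, based on the diagram above, that the assigned
Name ends at the first "/" or "." and that everything after it is
the Qualifier.  The names ArkParts and parse_ark are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# [https://NMA/]ark:[/]NAAN/Name[Qualifier]
_ARK_RE = re.compile(
    r"^(?:https?://(?P<nma>[^/]+)/)?"   # optional Resolver Service
    r"ark:/?"                           # label, new or old form
    r"(?P<naan>[^/]+)/"                 # Name Assigning Authority Number
    r"(?P<name>[^/.]+)"                 # assigned Name (assumed to stop at / or .)
    r"(?P<qualifier>[/.].*)?$"          # optional Qualifier
)


class ArkParts(NamedTuple):
    nma: Optional[str]
    naan: str
    name: str
    qualifier: Optional[str]

    @property
    def base_object_name(self) -> str:
        """The "ark:" label, the NAAN, and the assigned Name."""
        return f"ark:{self.naan}/{self.name}"


def parse_ark(text: str) -> ArkParts:
    m = _ARK_RE.match(text)
    if m is None:
        raise ValueError(f"not recognized as an ARK: {text!r}")
    return ArkParts(m["nma"], m["naan"], m["name"], m["qualifier"])


if __name__ == "__main__":
    parts = parse_ark("https://example.org/ark:12345/x54xz321/s3/f8.05v.tiff")
    print(parts, parts.base_object_name)
```

Applied to the diagrammed example, the sketch separates the NMA
(example.org), the NAAN (12345), the Name (x54xz321), and the
Qualifier (/s3/f8.05v.tiff).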

414	2.1.  The Name Mapping Authority (NMA)

416	   Before the "ark:" label may appear an optional Name Mapping Authority
417	   (NMA) that is a temporary address where ARK service requests may be
418	   sent.  Preceded by a URI-type protocol designation such as
419	   "https://", it specifies a Resolver Service.  The NMA itself is an
420	   Internet hostname or host/port combination having the same format and
421	   semantics as the host/port part of a URL.  The most important thing
422	   about the NMA is that it is "identity inert" from the point of view
423	   of object identification.  In other words, ARKs that differ only in
424	   the optional NMA part identify the same object.  Thus, for example,
425	   the following three ARKs are synonyms for just one information
426	   object:

428	                    https://loc.gov/ark:12345/x54xz321
429	                https://rutgers.edu/ark:12345/x54xz321
430	                                    ark:12345/x54xz321
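
The "identity inert" property can be sketched as a comparison that
ignores the optional Resolver Service part.  The function names are
hypothetical, and the sketch deliberately does nothing beyond
stripping the protocol and NMA; it does not apply any other
normalization.

```python
import re

# Matches a leading "scheme://host[:port]/" prefix, if present.
_RESOLVER_PREFIX = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://[^/]+/")


def strip_nma(ark: str) -> str:
    """Drop the protocol and NMA, keeping the part that starts at "ark:"."""
    core = _RESOLVER_PREFIX.sub("", ark, count=1)
    if not core.startswith("ark:"):
        raise ValueError(f"no 'ark:' label found in {ark!r}")
    return core


def same_object(a: str, b: str) -> bool:
    """True if two ARKs differ only in the optional NMA part."""
    return strip_nma(a) == strip_nma(b)


if __name__ == "__main__":
    synonyms = [
        "https://loc.gov/ark:12345/x54xz321",
        "https://rutgers.edu/ark:12345/x54xz321",
        "ark:12345/x54xz321",
    ]
    print({strip_nma(s) for s in synonyms})
```

Under this sketch the three ARKs shown above all reduce to the same
string.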

432	   Strictly speaking, in the realm of digital objects, these ARKs may
433	   lead over time to somewhat different or diverging instances of the
434	   originally named object.  In an ideal world, divergence of persistent
435	   objects is not desirable, but it is widely believed that digital
436	   preservation efforts will inevitably lead to alterations in some
437	   original objects (e.g., a format migration in order to preserve the
438	   ability to display a document).  If any of those objects are held
439	   redundantly in more than one organization (a common preservation
440	   strategy), chances are small that all holding organizations will
441	   perform the same precise transformations and all maintain the same
442	   object metadata.  More significant divergence would be expected when
443	   the holding organizations serve different audiences or compete with
444	   each other.

446	   The NMA part makes an ARK into an actionable URL.  As with many
447	   internet parameters, it is helpful to approach the NMA being liberal
448	   in what you accept and conservative in what you propose.  From the
449	   recipient's point of view, the NMA part should be treated as
450	   temporary, disposable, and replaceable.  From the NMA's point of
451	   view, it should be chosen with the greatest concern for longevity.  A
452	   carefully chosen NMA should be at least as permanent as the providing
453	   organization's own hostname.  In the case of a national or university
454	   library, for example, there is no reason why the NMA should not be
455	   considerably more permanent than soft-funded proxy hostnames such as
456	   hdl.handle.net, dx.doi.org, and purl.org.  In general and over time,
457	   however, it is not unexpected for an NMA eventually to stop working
458	   and require replacement with the NMA of a currently active service
459	   provider.
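
Because the recipient treats the NMA as replaceable, swapping in the
NMA of a currently active provider is a simple string operation.  The
sketch below assumes the new NMA is already known (discovery is a
separate matter) and that "https" is the desired protocol.  The
function name replace_nma is hypothetical.

```python
import re

# Optional "scheme://host[:port]/" prefix, then the part starting at "ark:".
_ARK_IN_URL = re.compile(r"^(?:[A-Za-z][A-Za-z0-9+.-]*://[^/]+/)?(ark:.+)$")


def replace_nma(ark: str, new_nma: str, scheme: str = "https") -> str:
    """Return the ARK made actionable through a different NMA."""
    m = _ARK_IN_URL.match(ark)
    if m is None:
        raise ValueError(f"no 'ark:' label found in {ark!r}")
    # Everything from the "ark:" label onward is carried over unchanged.
    return f"{scheme}://{new_nma}/{m.group(1)}"


if __name__ == "__main__":
    print(replace_nma("https://loc.gov/ark:12345/x54xz321", "rutgers.edu"))
```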

461	   This replacement relies on a mapping authority "resolver" discovery
462	   process, of which two alternate methods are outlined in a later
463	   section.  The ARK, URN, Handle, and DOI schemes all use a resolver
464	   discovery model that sooner or later requires matching the original
465	   assigning authority with a current provider servicing that
466	   authority's named objects; once found, the resolver at that provider
467	   performs what amounts to a redirect to a place where the object is
468	   currently held.  All the schemes rely on the ongoing functionality of
469	   currently mainstream technologies such as the Domain Name System
470	   [RFC1034] and web browsers.  The Handle and DOI schemes in addition
471	   require that the Handle protocol layer and global server grid be
472	   available at all times.

474	   The practice of prepending "https://" and an NMA to an ARK is a way
475	   of creating an actionable identifier by a method that is itself
476	   temporary.  Assuming that infrastructure supporting [RFC2616]
477	   information retrieval will no longer be available one day, ARKs will
478	   then have to be converted into new kinds of actionable identifiers.
479	   By that time, if ARKs see widespread use, web browsers would
480	   presumably evolve to perform this (currently simple) transformation
481	   automatically.

483	2.2.  The ARK Label Part (ark:)

485	   The label part distinguishes an ARK from an ordinary identifier.
486	   There is a new form of the label, "ark:", and an old form, "ark:/",
487	   both of which must be recognized in perpetuity.  Implementations
488	   should generate new ARKs in the new form (without the "/") and
489	   resolvers must always treat received ARKs as equivalent if they
490	   differ only in regard to new form versus old form labels.  Thus these
491	   two ARKs are equivalent:

493	                             ark:/12345/x54xz321
494	                              ark:12345/x54xz321
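
   The label equivalence above can be sketched in Python.  This is an
   illustration only, not part of the specification.  The function
   names are hypothetical, and matching the label case-insensitively
   is an assumption borrowed from the normalization rules.

```python
import re

# The "ark:" label with an optional old-form "/" (hypothetical helper).
_LABEL = re.compile(r"ark:/?", re.IGNORECASE)


def to_new_label(ark: str) -> str:
    """Rewrite the first "ark:/" or "ark:" label to the new form "ark:"."""
    return _LABEL.sub("ark:", ark, count=1)


def same_apart_from_label(a: str, b: str) -> bool:
    """True if two ARKs differ at most in old-form versus new-form label."""
    return to_new_label(a) == to_new_label(b)
```

   A resolver built this way would accept either form on input while
   emitting only the new form.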

496	   In a URL found in the wild, the label indicates that the URL stands a
497	   reasonable chance of being an ARK.  If the context warrants,
498	   verification that it actually is an ARK can be done by testing it for
499	   existence of the three ARK services.

501	   Since nothing about an identifier syntax directly affects
502	   persistence, the "ark:" label (like "urn:", "doi:", and "hdl:")
503	   cannot tell you whether the identifier is persistent or whether the
504	   object is available.  It does tell you that the original Name
505	   Assigning Authority (NAA) had some sort of hopes for it, but it
506	   doesn't tell you whether that NAA is still in existence, or whether a
507	   decade ago it ceased to have any responsibility for providing
508	   persistence, or whether it ever had any responsibility beyond naming.

510	   Only a current provider can say for certain what sort of commitment
511	   it intends, and the ARK label suggests that you can query the NMA
512	   directly to find out exactly what kind of persistence is promised.
513	   Even if what is promised is impersistence (i.e., a short-term
514	   identifier), saying so is valuable information to the recipient.
515	   Thus an ARK is a high-functioning identifier in the sense that it
516	   provides access to the object, the metadata, and a commitment
517	   statement, even if the commitment is explicitly very weak.

519	2.3.  The Name Assigning Authority Number (NAAN)

521	   Recalling that the general form of the ARK is,

523	                  [https://NMA/]ark:[/]NAAN/Name[Qualifier]

525	   the part of the ARK directly following the "ark:" (or older "ark:/")
526	   label is the Name Assigning Authority Number (NAAN), up to but not
527	   including the next `/' (slash) character.  This part is always
528	   required, as it identifies the organization that
529	   originally assigned the Name of the object.  Typically the
530	   organization is an institution, a department, a laboratory, or any
531	   group that conducts a stable, policy-driven name assigning effort.
532	   It is used to discover a currently valid NMA and to provide top-level
533	   partitioning of the space of all ARKs.
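
   As a rough illustration of the general form, the following Python
   sketch splits a string into its NMA, its NAAN, and the remaining
   Name[Qualifier].  The names ArkParts and split_ark are hypothetical,
   and accepting "http://" as well as "https://" is an assumption.

```python
import re
from typing import NamedTuple, Optional


class ArkParts(NamedTuple):
    nma: Optional[str]  # hostname, or None when the ARK carries no NMA
    naan: str           # Name Assigning Authority Number
    rest: str           # Name plus any Qualifier, left unsplit


_GENERAL_FORM = re.compile(
    r"^(?:https?://(?P<nma>[^/]+)/)?"  # optional [https://NMA/]
    r"ark:/?"                          # new-form or old-form label
    r"(?P<naan>[^/]+)/"                # NAAN, up to the next slash
    r"(?P<rest>.+)$",                  # Name[Qualifier]
    re.IGNORECASE,
)


def split_ark(s: str) -> ArkParts:
    """Split a string in the general ARK form (hypothetical helper)."""
    m = _GENERAL_FORM.match(s)
    if m is None:
        raise ValueError(f"not in the general ARK form: {s!r}")
    return ArkParts(m.group("nma"), m.group("naan"), m.group("rest"))
```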

535	   An organization may request a NAAN from the ARK Maintenance Agency
536	   [ARKagency] (described in Appendix A) by filling out the form at
537	   [NAANrequest].  NAANs are opaque strings of one or more "betanumeric"
538	   characters, specifically,

540	       0123456789bcdfghjkmnpqrstvwxz

542	   which consists of digits and consonants, minus the letter 'l'.
543	   Restricting NAANs to betanumerics (alphanumerics without vowels or
544	   'l') serves two goals.  It reduces the chances that words -- past,
545	   present, and future -- will appear in NAANs and carry unintended
546	   semantics.  It also helps usability by not mixing commonly confused
547	   characters ('0' and 'O', '1' and 'l') and by being compatible with
548	   strong transcription error detection (e.g., the [NOID] check digit
549	   algorithm).  Since 2001, every assigned NAAN has consisted of exactly
550	   five digits.
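
   A minimal Python sketch of a betanumeric test follows.  The names
   BETANUMERIC and is_betanumeric are hypothetical; the character set
   is copied from the list above.

```python
# The betanumeric repertoire: digits plus consonants, without 'l' or 'y'.
BETANUMERIC = frozenset("0123456789bcdfghjkmnpqrstvwxz")


def is_betanumeric(s: str) -> bool:
    """True if s is one or more betanumeric characters."""
    return len(s) > 0 and all(c in BETANUMERIC for c in s)
```

   The check is deliberately case-sensitive, so upper case letters are
   rejected.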

552	   The NAAN designates a top-level ARK namespace.  Once registered for a
553	   namespace, a NAAN is never re-registered.  It is possible, however,
554	   for there to be a succession of organizations that manage an ARK
555	   namespace.

557	2.4.  The Name Part

559	   The part of the ARK just after the NAAN is the Name assigned by the
560	   NAA, and it is also required.  Semantic opaqueness in the Name part
561	   is strongly encouraged in order to reduce an ARK's vulnerability to
562	   era- and language-specific change.  Identifier strings containing
563	   linguistic fragments can create support difficulties down the road.
564	   No matter how appropriate or even meaningless they are today, such
565	   fragments may one day create confusion, give offense, or infringe on
566	   a trademark as the semantic environment around us and our communities
567	   evolves.

569	   Names that look more or less like numbers avoid common problems that
570	   defeat persistence and international acceptance.  The use of digits
571	   is highly recommended.  Mixing in non-vowel alphabetic characters
572	   (e.g., betanumerics) a couple at a time is a relatively safe and easy
573	   way to achieve a denser namespace (more possible names for a given
574	   length of the name string).  Such names have a chance of aging and
575	   traveling well.  The absence of recognizable words makes typos harder
576	   to detect in opaque strings, so a common mitigation is to add a check
577	   character.  Tools exist that mint, bind, and resolve opaque
578	   identifiers, with or without check characters [NOID].  More on naming
579	   considerations is given in a subsequent section.

581	2.4.1.  Optional: Shoulder and Blade

583	   Just as the ARK namespace is subdivided by NAANs reserved for NAAs,
584	   each NAAN is a namespace that can be subdivided into "shoulders",
585	   where each shoulder is reserved for an internal department or unit.
586	   Like the NAAN, which is a string of characters that follows the
587	   "ark:" label, a shoulder is a string of characters (starting with a
588	   "/") that extends the NAAN.  The base object name assigned by the NAA
589	   consists of the NAAN, the shoulder, and a final string known as the
590	   "blade".  (The shoulder plus blade terminology mirrors locksmith
591	   jargon describing the information-bearing parts of a key.)

593	   The blade string is chosen by the NAA such that the string created by
594	   concatenating the NAAN plus shoulder plus blade becomes the unique
595	   base object name.  Otherwise the blade may come from any source; for
596	   example, it might come from a counter, a timestamp, a [NOID] minter,
597	   a legacy 100-year-old accession number, etc.  If there is a check
598	   digit, it is expected to appear at the end of the blade and to be
599	   computed over the base object name, which is generally the most
600	   important part of an ARK to make opaque.  In particular, check digits
601	   are not expected to cover qualifiers, which often name subobjects of
602	   a persistent object that are less stable and less opaquely named than
603	   the parent object (for example, ten years hence, the object's
604	   thumbnail image will be of a higher resolution and the OCR text file
605	   will be re-derived with improved algorithms).

607	   It is important not to use any delimiter between the shoulder string
608	   and blade string, especially not a "/" since it declares an object
609	   boundary (see the section on ARKs that reveal object hierarchy).
610	   This little bit of discretion shields organizations from end users
611	   making inferences about expected levels of support based on
612	   recognizable shoulders.  To help in-house ARK administrators reliably
613	   know where the shoulder ends, it is recommended to use the "first-
614	   digit convention" so that shoulders are "primordinal".  A primordinal
615	   shoulder is a sequence of one or more betanumeric characters ending
616	   in a digit.  This means that the shoulder is all consonant letters
617	   (often just one) after the NAAN and "/" up to and including the first
618	   digit encountered after the NAAN.  One property of primordinal
619	   shoulders is that there is an infinite number of them possible under
620	   any NAAN.
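
   The first-digit convention can be sketched in Python as follows.
   The function name split_shoulder is hypothetical, and the input is
   assumed to be the string that follows the NAAN and its "/".

```python
import re

# Zero or more betanumeric letters, then the first digit (the shoulder),
# followed by whatever remains (the blade).
_PRIMORDINAL = re.compile(r"^([bcdfghjkmnpqrstvwxz]*[0-9])(.*)$")


def split_shoulder(name: str):
    """Return (shoulder, blade), or None if no primordinal shoulder."""
    m = _PRIMORDINAL.match(name)
    return (m.group(1), m.group(2)) if m else None
```

   Because the shoulder always ends at the first digit, no delimiter
   is needed between shoulder and blade.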

622	   To help manage each namespace into the future, NAAs are encouraged to
623	   create shoulders, even if there is only one to start with.  There
624	   are four NAANs (99999, 12345, 99152, 99166, XXX describe these) that
625	   are shared across organizations.  Creating a shoulder on one of
626	   them requires a registration process (XXX).

628	2.5.  The Qualifier Part

630	   The part of the ARK following the NAA-assigned Name is an optional
631	   Qualifier.  It is a string that extends the base ARK in order to
632	   create a kind of service entry point into the object named by the
633	   NAA.  At the discretion of the providing NMA, such a service entry
634	   point permits an ARK to support access to individual hierarchical
635	   components and subcomponents of an object, and to variants (versions,
636	   languages, formats) of components.  A Qualifier may be invented by
637	   the NAA or by any NMA servicing the object.

639	   In form, the Qualifier is a ComponentPath, or a VariantPath, or a
640	   ComponentPath followed by a VariantPath.  A VariantPath is introduced
641	   and subdivided by the reserved character `.', and a ComponentPath is
642	   introduced and subdivided by the reserved character `/'.  In this
643	   example,

645	       https://example.org/ark:12345/x54xz321/s3/f8.05v.tiff

647	   the string "/s3/f8" is a ComponentPath and the string ".05v.tiff" is
648	   a VariantPath.  The ARK Qualifier is a formalization of some
649	   currently mainstream URL syntax conventions.  This formalization
650	   specifically reserves meanings that permit recipients to make strong
651	   inferences about logical sub-object containment and equivalence based
652	   only on the form of the received identifiers; there is great
653	   efficiency in not having to inspect metadata records to discover such
654	   relationships.  NMAs are free not to disclose any of these
655	   relationships merely by avoiding the reserved characters above.
656	   Hierarchical components and variants are discussed further in the
657	   next two sections.
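
   A rough Python sketch of this split follows.  It assumes its input
   is the string after "NAAN/" and that the Name ends at the first
   reserved `/' or `.'; the function name split_qualifier is
   hypothetical.  Input with a period before a slash is rejected here.

```python
import re

_PARTS = re.compile(
    r"^([^/.]+)"          # Name: up to the first structural character
    r"((?:/[^/.]+)*)"     # ComponentPath: zero or more "/component"
    r"((?:\.[^/.]+)*)$"   # VariantPath: zero or more ".suffix"
)


def split_qualifier(name_and_qualifier: str):
    """Return (name, component_path, variant_path)."""
    m = _PARTS.match(name_and_qualifier)
    if m is None:
        raise ValueError("unexpected Name or Qualifier structure")
    return m.groups()
```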

659	   The Qualifier, if present, differs from the Name in several important
660	   respects.  First, a Qualifier may have been assigned either by the
661	   NAA or later by the NMA.  The assignment of a Qualifier by an NMA
662	   effectively amounts to an act of publishing a service entry point
663	   within the conceptual object originally named by the NAA.  For our
664	   purposes, an ARK extended with a Qualifier assigned by an NMA will be
665	   called an NMA-qualified ARK.

667	   Second, a Qualifier assignment on the part of an NMA is made in
668	   fulfillment of its service obligations and may reflect changing
669	   service expectations and technology requirements.  NMA-qualified ARKs
670	   could therefore be transient, even if the base, unqualified ARK is
671	   persistent.  For example, it would be reasonable for an NMA to
672	   support access to an image object through an actionable ARK that is
673	   considered persistent even if the experience of that access changes
674	   as linking, labeling, and presentation conventions evolve and as
675	   format and security standards are updated.  For an image "thumbnail",
676	   that NMA could also support an NMA-qualified ARK that is considered
677	   impersistent because the thumbnail will be replaced with higher
678	   resolution images as network bandwidth and CPU speeds increase.  At
679	   the same time, for an originally scanned, high-resolution master, the
680	   NMA could publish an NMA-qualified ARK that is itself considered
681	   persistent.  Of course, the NMA must be able to return its separate
682	   commitments to unqualified, NAA-assigned ARKs, to NMA-qualified ARKs,
683	   and to any NAA-qualified ARKs that it supports.

685	   A third difference between a Qualifier and a Name concerns the
686	   semantic opaqueness constraint.  When an NMA-qualified ARK is to be
687	   used as a transient service entry point into a persistent object, the
688	   priority given to semantic opaqueness observed by the NAA in the Name
689	   part may be relaxed by the NMA in the Qualifier part.  If service
690	   priorities in the Qualifier take precedence over persistence, short-
691	   term usability considerations may recommend somewhat semantically
692	   laden Qualifier strings.

694	   Finally, not only is the set of Qualifiers supported by an NMA
695	   mutable, but different NMAs may support different Qualifier sets for
696	   the same NAA-identified object.  In this regard the NMAs act
697	   independently of each other and of the NAA.

699	   The next two sections describe how ARK syntax may be used to declare,
700	   or to avoid declaring, certain kinds of relatedness among qualified
701	   ARKs.

703	2.5.1.  ARKs that Reveal Object Hierarchy

705	   An NAA or NMA may choose to reveal the presence of a hierarchical
706	   relationship between objects using the `/' (slash) character after
707	   the Name part of an ARK.  Some authorities will choose not to
708	   disclose this information, while others will go ahead and disclose so
709	   that manipulators of large sets of ARKs can infer object
710	   relationships by simple identifier inspection; for example, this
711	   makes it possible for a system to present a collapsed view of a large
712	   search result set.

714	   If the ARK contains an internal slash after the NAAN, the piece to
715	   its left indicates a containing object.  For example, publishing an
716	   ARK of the form,

717	                       ark:12345/x54/xz/321

719	   is equivalent to publishing three ARKs,

721	                       ark:12345/x54/xz/321
722	                       ark:12345/x54/xz
723	                       ark:12345/x54

725	   together with a declaration that the first object is contained in the
726	   second object, and that the second object is contained in the third.
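
   The implied containment chain can be sketched in Python.  The
   function name containing_arks is hypothetical, and the sketch
   assumes a new-form label, no NMA, and no variant suffixes.

```python
def containing_arks(ark: str):
    """List an ARK, then each containing object's ARK, innermost first."""
    # Assumes the new-form label, e.g. "ark:12345/x54/xz/321".
    label_naan, _, rest = ark.partition("/")
    parts = rest.split("/")
    return [
        label_naan + "/" + "/".join(parts[:i])
        for i in range(len(parts), 0, -1)
    ]
```

   Each element of the result is contained in the element after it.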

728	   Revealing the presence of hierarchy is completely up to the assigner
729	   (NMA or NAA).  It is hard enough to commit to one object's name, let
730	   alone to three objects' names and to a specific, ongoing relatedness
731	   among them.  Thus, regardless of whether hierarchy was present
732	   initially, the assigner, by not using slashes, reveals no shared
733	   inferences about hierarchical or other inter-relatedness in the
734	   following ARKs:

736	                       ark:12345/x54_xz_321
737	                       ark:12345/x54_xz
738	                       ark:12345/x54xz321
739	                       ark:12345/x54xz
740	                       ark:12345/x54

742	   Note that slashes around the ARK's NAAN (/12345/ in these examples)
743	   are not part of the ARK's Name and therefore do not indicate the
744	   existence of some sort of NAAN super object containing all objects in
745	   its namespace.  A slash must have at least one non-structural
746	   character (one that is neither a slash nor a period) on both sides in
747	   order for it to separate recognizable structural components.  So
748	   initial or final slashes may be removed, and double slashes may be
749	   converted into single slashes.

751	2.5.2.  ARKs that Reveal Object Variants

753	   An NAA or NMA may choose to reveal the possible presence of variant
754	   objects or object components using the `.' (period) character after
755	   the Name part of an ARK.  Some authorities will choose not to
756	   disclose this information, while others will go ahead and disclose so
757	   that manipulators of large sets of ARKs can infer object
758	   relationships by simple identifier inspection; for example, this
759	   makes it possible for a system to present a collapsed view of a large
760	   search result set.

762	   If the ARK contains an internal period after the Name, the piece to
763	   its left is a root name and the piece to its right, up to the end of
764	   the ARK or to the next period, is a suffix.  A Name may have more
765	   than one suffix, for example,

767	                       ark:12345/x54.24
768	                       ark:12345/x4z/x54.24
769	                       ark:12345/x54.20v.78g.f55

771	   There are two main rules.  First, if two ARKs share the same root
772	   name but have different suffixes, the corresponding objects were
773	   considered variants of each other (different formats, languages,
774	   versions, etc.) by the assigner (NMA or NAA).  Thus, the following
775	   ARKs are variants of each other:

777	                       ark:12345/x54.20v.78g.f55
778	                       ark:12345/x54.321xz
779	                       ark:12345/x54.44

781	   Second, publishing an ARK with a suffix implies the existence of at
782	   least one variant identified by the ARK without its suffix.  The ARK
783	   otherwise permits no further assumptions about what variants might
784	   exist.  So publishing the ARK,

786	                       ark:12345/x54.20v.78g.f55

788	   is equivalent to publishing the four ARKs,

790	                       ark:12345/x54.20v.78g.f55
791	                       ark:12345/x54.20v.78g
792	                       ark:12345/x54.20v
793	                       ark:12345/x54
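
   The second rule can be sketched in Python.  The function name
   implied_variants is hypothetical, and the sketch assumes a new-form
   label with no NMA; suffixes are taken from the last component only.

```python
def implied_variants(ark: str):
    """List an ARK, then the ARKs implied by dropping suffixes in turn."""
    # Assumes the new-form label, e.g. "ark:12345/x54.20v.78g.f55".
    label_naan, _, rest = ark.partition("/")
    head, slash, last = rest.rpartition("/")
    root, *suffixes = last.split(".")
    prefix = label_naan + "/" + head + slash
    return [
        prefix + ".".join([root] + suffixes[:i])
        for i in range(len(suffixes), -1, -1)
    ]
```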

795	   Revealing the possibility of variants is completely up to the
796	   assigner.  It is hard enough to commit to one object's name, let
797	   alone to multiple variants' names and to a specific, ongoing
798	   relatedness among them.  The assigner is the sole arbiter of what
799	   constitutes a variant within its namespace, and whether to reveal
800	   that kind of relatedness by using periods within its names.

802	   A period must have at least one non-structural character (one that is
803	   neither a slash nor a period) on both sides in order for it to
804	   separate recognizable structural components.  So initial or final
805	   periods may be removed, and adjacent periods may be converted into a
806	   single period.  Multiple suffixes should be arranged in sorted order
807	   (pure ASCII collating sequence) at the end of an ARK.
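
   A small Python sketch can check the recommended suffix ordering.
   The function name suffixes_sorted is hypothetical.  Python compares
   strings by code point, which matches the ASCII collating sequence
   for the ASCII characters permitted in ARKs.

```python
def suffixes_sorted(ark: str) -> bool:
    """True if the suffixes of the last component are in ASCII order."""
    last = ark.rsplit("/", 1)[-1]     # final component, with suffixes
    suffixes = last.split(".")[1:]    # drop the root name
    return suffixes == sorted(suffixes)
```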

809	2.6.  Character Repertoires

811	   The Name and Qualifier parts are strings of visible ASCII characters.
812	   For received ARKs, implementations must support a minimum length of
813	   255 octets for the string composed of the Base ARK plus Qualifier.
814	   Implementations generating strings exceeding this length should
815	   understand that receiving implementations may not be able to index
816	   such ARKs properly.  Characters may be letters, digits, or any of
817	   these seven characters:

819	       =   ~   *   +   @   _   $

821	   The following characters may also be used, but their meanings are
822	   reserved:

824	       %   -   .   /

826	   The characters `/' and `.' are ignored if either appears as the last
827	   character of an ARK.  If used internally, they allow a name assigner
828	   to reveal object hierarchy and object variants as previously
829	   described.

831	   Hyphens are considered to be insignificant and are always ignored in
832	   ARKs.  A `-' (hyphen) may appear in an ARK for readability, or it may
833	   have crept in during the formatting and wrapping of text, but it must
834	   be ignored in lexical comparisons.  As in a telephone number, hyphens
835	   have no meaning in an ARK.  It is always safe for an NMA that
836	   receives an ARK to remove any hyphens found in it.  As a result, like
837	   the NMA, hyphens are "identity inert" in comparing ARKs for
838	   equivalence.  For example, the following ARKs are equivalent for
839	   purposes of comparison and ARK service access:

841	                               ark:12345/x5-4-xz-321
842	      https://sneezy.dopey.com/ark:12345/x54--xz32-1
843	                               ark:12345/x54xz321

845	   The `%' character is reserved for %-encoding all other octets that
846	   would appear in the ARK string, in the same manner as for URIs
847	   [RFC3986].  A %-encoded octet consists of a `%' followed by two hex
848	   digits; for example, "%7d" stands in for `}'.  Lower case hex digits
849	   are preferred to reduce the chances of false acronym recognition;
850	   thus it is better to use "%acT" instead of "%ACT".  The character `%'
851	   itself must be represented using "%25".  As with URNs, %-encoding
852	   permits ARKs to support legacy namespaces (e.g., ISBN, ISSN, SICI)
853	   that have less restricted character repertoires [RFC2288].
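
   The %-encoding rule can be sketched in Python as follows.  The
   function name ark_percent_encode is hypothetical.  The sketch
   encodes every character outside the letters, digits, and seven
   unreserved characters, so it also conceals the reserved characters;
   encoding non-ASCII input as UTF-8 octets is an assumption.

```python
import string

# Letters, digits, and the seven characters allowed without reservation.
_PLAIN = frozenset(string.ascii_letters + string.digits + "=~*+@_$")


def ark_percent_encode(raw: str) -> str:
    """%-encode everything outside _PLAIN, using lower case hex digits."""
    out = []
    for ch in raw:
        if ch in _PLAIN:
            out.append(ch)
        else:
            # One "%xx" per octet; UTF-8 is an assumption for non-ASCII.
            out.extend(f"%{b:02x}" for b in ch.encode("utf-8"))
    return "".join(out)
```

   Encoding a legacy identifier this way keeps its hyphens significant,
   since a "%2d" survives where a literal hyphen would be ignored.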

855	2.7.  Normalization and Lexical Equivalence

857	   To determine if two or more ARKs identify the same object, the ARKs
858	   are compared for lexical equivalence after first being normalized.
859	   Since ARK strings may appear in various forms (e.g., having different
860	   NMAs), normalizing them minimizes the chances that comparing two ARK
861	   strings for equality will fail unless they actually identify
862	   different objects.  In a specified-host ARK (one having an NMA), the
863	   NMA never participates in such comparisons.  Normalization described
864	   here serves to define lexical equivalence but does not restrict how
865	   implementors normalize ARKs locally for storage.

867	   Normalization of a received ARK for the purpose of octet-by-octet
868	   equality comparison with another ARK consists of the following steps.

870	   1.  The NMA part (e.g., everything from an initial "https://" up to
871	       the next slash), if present, is removed.

873	   2.  Any URI query string is removed (everything from the first
874	       literal '?' to the end of the string).

876	   3.  The first case-insensitive match on "ark:/" or "ark:" is
877	       converted to "ark:" (replacing any upper case letters and
878	       removing any terminal '/').

880	   4.  In the string that remains, the two characters following every
881	       occurrence of `%' are converted to lower case.  The case of all
882	       other letters in the ARK string must be preserved.

884	   5.  All hyphens are removed.

886	   6.  If normalization is being done as part of a resolution step, and
887	       if the end of the remaining string matches a known inflection,
888	       the inflection is noted and removed.

890	   7.  Structural characters (slash and period) are normalized: initial
891	       and final occurrences are removed, and two structural characters
892	       in a row (e.g., // or ./) are replaced by the first character,
893	       iterating until each occurrence has at least one non-structural
894	       character on both sides.

896	   8.  If there are any components with a period on the left and a slash
897	       on the right, either the component and the preceding period must
898	       be moved to the end of the Name part or the ARK must be thrown
899	       out as malformed.

901	   The resulting ARK string is now normalized.  Comparisons between
902	   normalized ARKs are case-sensitive, meaning that upper case letters
903	   are considered different from their lower case counterparts.
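
   The normalization steps can be sketched in Python.  This is one
   possible reading, not a reference implementation.  The function
   name normalize_ark is hypothetical, step 6 is skipped because the
   set of known inflections is not given here, and step 8 takes the
   "thrown out as malformed" option.

```python
import re

_STRUCT_RUN = re.compile(r"([/.])[/.]+")    # structural characters in a row
_DOT_THEN_SLASH = re.compile(r"\.[^/.]+/")  # "." on the left, "/" on the right


def normalize_ark(s: str) -> str:
    """Normalize an ARK for octet-by-octet comparison (a sketch)."""
    # Step 1: remove the NMA part, if present.
    s = re.sub(r"^https?://[^/]*/", "", s, flags=re.IGNORECASE)
    # Step 2: remove any URI query string.
    s = s.split("?", 1)[0]
    # Step 3: the first "ark:/" or "ark:", in any case, becomes "ark:".
    s = re.sub(r"ark:/?", "ark:", s, count=1, flags=re.IGNORECASE)
    if not s.startswith("ark:"):
        raise ValueError("no ARK label at the start of the string")
    # Step 4: lower-case the two characters after every "%".
    s = re.sub(r"%(.{1,2})", lambda m: "%" + m.group(1).lower(), s)
    # Step 5: remove all hyphens.
    s = s.replace("-", "")
    # Step 6 (inflections) is not attempted in this sketch.
    # Step 7: collapse runs of "/" and "." to the first character,
    # then drop initial and final structural characters.
    body = _STRUCT_RUN.sub(r"\1", s[4:]).strip("/.")
    # Step 8: reject a component with "." before it and "/" after it.
    if _DOT_THEN_SLASH.search(body):
        raise ValueError("malformed ARK: suffix precedes a component")
    return "ark:" + body
```

   Two ARKs are then lexically equivalent when their normalized forms
   compare equal with an ordinary case-sensitive string comparison.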

905	   To keep ARK string variation to a minimum, no reserved ARK characters
906	   should be %-encoded unless it is deliberately to conceal their
907	   reserved meanings.  No non-reserved ARK characters should ever be
908	   %-encoded.  Finally, no %-encoded character should ever appear in an
909	   ARK in its decoded form.

911	3.  Naming Considerations

913	   The most important threats faced by persistence providers include
914	   such things as funding loss, natural disaster, political and social
915	   upheaval, processing faults, and errors in human oversight.  There is
916	   nothing that an identifier scheme can do about such things.  Still, a
917	   few observed identifier failures and inconveniences can be traced
918	   back to naming practices that we now know to be less than optimal for
919	   persistence.

921	3.1.  ARKs Embedded in Language

923	   The ARK has different goals from the URI, so it has different
924	   character set requirements.  Because linguistic constructs imperil
925	   persistence, for ARKs non-ASCII character support is unimportant.
926	   ARKs and URIs share goals of transcribability and transportability
927	   within web documents, so characters are required to be visible, non-
928	   conflicting with HTML/XML syntax, and not subject to tampering during
929	   transmission across common transport gateways.  Add the goal of
930	   making an undelimited ARK recognizable in running prose, as in
931	   ark:12345/=@_22*$, and certain punctuation characters (e.g., comma,
932	   period) end up being excluded from the ARK lest the end of a phrase
933	   or sentence be mistaken for part of the ARK.

935	   This consideration has more direct effect on ARK usability in a
936	   natural language context than it has on ARK persistence.  The same is
937	   true of the rule preventing hyphens from having lexical significance.
938	   It is fine to publish ARKs with hyphens in them (e.g., the
939	   output of UUID/GUID generators), but the uniform treatment of hyphens
940	   as insignificant reduces the possibility of users transcribing
941	   identifiers that will have been broken through unpredictable
942	   hyphenation by word processors.  Any measure that reduces user
943	   irritation with an identifier will increase its chances of survival.

945	3.2.  Objects Should Wear Their Identifiers

947	   A valuable technique for provision of persistent objects is to try to
948	   arrange for the complete identifier to appear on, with, or near its
949	   retrieved object.  An object encountered at a moment in time when its
950	   discovery context has long since disappeared could then easily be
951	   traced back to its metadata, to alternate versions, to updates, etc.
952	   This has seen reasonable success, for example, in book publishing and
953	   software distribution.  An identifier string only has meaning when
954	   its association is known, and this is a sure, simple, and low-tech
955	   method of reminding everyone exactly what that association is.

957	3.3.  Names are Political, not Technological

959	   If persistence is the goal, a deliberate local strategy for
960	   systematic name assignment is crucial.  Names must be chosen with
961	   great care.  Poorly chosen and managed names will devastate any
962	   persistence strategy, and they do not discriminate by identifier
963	   scheme.  Whether a mistakenly re-assigned name is a URN, DOI, PURL,
964	   URL, or ARK, the damage -- failed access and confusion -- is not
965	   mitigated more in one scheme than in another.  Conversely, in-house
966	   efforts to manage names responsibly will go much further towards
967	   safeguarding persistence than any choice of naming scheme or name
968	   resolution technology.

970	   Branding (e.g., at the corporate or departmental level) is important
971	   for funding and visibility, but substrings representing brands and
972	   organizational names should be given a wide berth except when
973	   absolutely necessary in the hostname (the identity-inert part) of the
974	   ARK.  These substrings are not only unstable because organizations
975	   change frequently, but they are also dangerous because successor
976	   organizations often have political or legal reasons to actively
977	   suppress predecessor names and brands.  Any measure that reduces the
978	   chances of future political or legal pressure on an identifier will
979	   decrease the chances that our descendants will be obliged to
980	   deliberately break it.

982	3.4.  Choosing a Hostname or NMA

984	   Hostnames appearing in any identifier meant to be persistent must be
985	   chosen with extra care.  The tendency in hostname selection has
986	   traditionally been to choose a token with recognizable attributes,
987	   such as a corporate brand, but that tendency wreaks havoc with
988	   persistence that is supposed to outlive brands, corporations, subject
989	   classifications, and natural language semantics (e.g., what did the
990	   three letters "gay" mean in 1958, 1978, and 1998?).  Today's
991	   recognized and correct attributes are tomorrow's stale or incorrect
992	   attributes.  In making hostnames (any names, actually) long-term
993	   persistent, it helps to eliminate recognizable attributes to the
994	   extent possible.  This affects selection of any name based on URLs,
995	   including PURLs and the explicitly disposable NMAs.

997	   There is no excuse for a provider that manages its internal names
998	   impeccably not to exercise the same care in choosing what could be an
999	   exceptionally durable hostname, especially if it would form the
1000	   prefix for all the provider's URL-based external names.  Registering
1001	   an opaque hostname in the ".org" or ".net" domain would not be a bad
1002	   start.  Another way is to publish your ARKs with an organizational
1003	   domain name that will be mapped by DNS to an appropriate NMA host.
1004	   This makes for shorter names with less branding vulnerability.

1006	   It is a mistake to think that hostnames are inherently unstable.  If
1007	   you require brand visibility, that may be a fact of life.  But things
1008	   are easier if yours is the brand of a long-lived cultural memory
1009	   institution such as a national or university library or archive.
1010	   Well-chosen hostnames from organizations that are sheltered from the
1011	   direct effects of a volatile marketplace can easily provide longer-
1012	   lived global resolvers than the domain names explicitly or implicitly
1013	   used as starting points for global resolution by indirection-based
1014	   persistent identifier schemes.  For example, it is hard to imagine
1015	   circumstances under which the Library of Congress' domain name would
1016	   disappear sooner than, say, "handle.net".

1018	   For smaller libraries, archives, and preservation organizations,
1019	   there is a natural concern about whether they will be able to keep
1020	   their web servers and domain names in the face of uncertain funding.
1021	   One option is to form or join a consortium [N2T] of like-minded
1022	   organizations with the purpose of providing mutual preservation
1023	   support.  The first goal of such a consortium would be to perpetually
1024	   rent a hostname on which to establish a web server that simply
1025	   redirects incoming member organization requests to the appropriate
1026	   member server; using ARKs, for example, a 150-member consortium could
1027	   run a very small server (24x7) that contained nothing more than 150
1028	   rewrite rules in its configuration file.  Even more helpful would be
1029	   additional consortial support for a member organization that was
1030	   unable to continue providing services and needed to find a successor
1031	   archival organization.  This would be a low-cost, low-tech way to
1032	   publish ARKs (or URLs) under highly persistent hostnames.
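
   The redirect-only server described above can be sketched in Python
   as a lookup table with one entry per member.  The table contents,
   the hostnames, and the function name redirect_target are all
   hypothetical; a real consortium would more likely express the same
   mapping as rewrite rules in its web server configuration.

```python
import re

# Hypothetical member table: NAAN -> base URL of the member's server.
MEMBERS = {
    "12345": "https://library.example.org/",
    "99999": "https://archive.example.net/",
}

_REQUEST = re.compile(r"^/ark:/?([^/]+)/(.+)$", re.IGNORECASE)


def redirect_target(path: str):
    """Map a request path to a member URL, or None if no rule applies."""
    m = _REQUEST.match(path)
    if m is None:
        return None
    base = MEMBERS.get(m.group(1))
    if base is None:
        return None
    return f"{base}ark:{m.group(1)}/{m.group(2)}"
```

   When a member hands its collection to a successor, only its table
   entry changes; the published ARKs stay the same.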

1034	   There are no obvious reasons why the organizations registering DNS
1035	   names, URN Namespaces, and DOI publisher IDs should have among them
1036	   one that is intrinsically more fallible than the next.  Moreover, it
1037	   is a misconception that the demise of DNS and of HTTP need adversely
1038	   affect the persistence of URLs.  At such a time, certainly URLs from
1039	   the present day might not then be actionable by our present-day
1040	   mechanisms, but resolution systems for future non-actionable URLs are
1041	   no harder to imagine than resolution systems for present-day non-
1042	   actionable URNs and DOIs.  There is no more stable a namespace than
1043	   one that is dead and frozen, and that would then characterize the
1044	   space of names bearing the "http://" or "https://" prefix.  It is
1045	   useful to remember that just because hostnames have been carelessly
1046	   chosen in their brief history does not mean that they are unsuitable
1047	   in NMAs (and URLs) intended for use in situations demanding the
1048	   highest level of persistence available in the Internet environment.
1049	   A well-planned name assignment strategy is everything.

1051	3.5.  Assigners of ARKs

1053	   A Name Assigning Authority (NAA) is an organization that creates (or
1054	   delegates creation of) long-term associations between identifiers and
1055	   information objects.  Examples of NAAs include national libraries,
1056	   national archives, and publishers.  An NAA may arrange with an
1057	   external organization for identifier assignment.  The US Library of
1058	   Congress, for example, allows OCLC (the Online Computer Library
1059	   Center, a major world cataloger of books) to create associations
1060	   between Library of Congress Control Numbers (LCCNs) and the books that
1061	   OCLC processes.  A cataloging record is generated that testifies to
1062	   each association, and the identifier is included by the publisher,
1063	   for example, in the front matter of a book.

1065	   An NAA does not so much create an identifier as create an
1066	   association.  The NAA first draws an unused identifier string from
1067	   its namespace, which is the set of all identifiers under its control.
1068	   It then records the assignment of the identifier to an information
1069	   object having sundry witnessed characteristics, such as a particular
1070	   author and modification date.  A namespace is usually reserved for an
1071	   NAA by agreement with recognized community organizations (such as
1072	   IANA and ISO) that all names containing a particular string be under
1073	   its control.  In an ARK, an NAA is represented by its Name Assigning
1074	   Authority Number (NAAN).

1076	   The ARK namespace reserved for an NAA is the set of names bearing its
1077	   particular NAAN.  For example, all strings beginning with
1078	   "ark:12345/" are under control of the NAA registered under 12345,
1079	   which might be the National Library of Finland.  Because each NAA has
1080	   a different NAAN, names from one namespace cannot conflict with those
1081	   from another.  Each NAA is free to assign names from its namespace
1082	   (or delegate assignment) according to its own policies.  These
1083	   policies must be documented in a manner similar to the declarations
1084	   required for URN Namespace registration [RFC2611].

1086	   Organizations can request or update a NAAN by filling out a form
1087	   [NAANrequest].

1089	3.6.  NAAN Namespace Management

1091	   Every NAA must have a namespace management strategy.  A time-honored
1092	   technique is to hierarchically partition a namespace into
1093	   subnamespaces using prefixes that guarantee non-collision of names in
1094	   different partitions.  This practice is strongly encouraged for all
1095	   NAAs, especially when subnamespace management will be delegated to
1096	   other departments, units, or projects within an organization.  For
1097	   example, with a NAAN that is assigned to a university and managed by
1098	   its main library, care should be taken to reserve semantically opaque
1099	   prefixes that will set aside large parts of the unused namespace for
1100	   future assignments.  Prefix-based partition management is an
1101	   important responsibility of the NAA.

1103	   This sort of delegation by prefix is well-used in the formation of
1104	   DNS names and ISBN identifiers.  An important difference is that in
1105	   the former, the hierarchy is deliberately exposed and in the latter
1106	   it is hidden.  Rather than using lexical boundary markers such as the
1107	   period (`.') found in domain names, the ISBN uses a publisher prefix
1108	   but doesn't disclose where the prefix ends and the publisher's
1109	   assigned name begins.  This practice of non-disclosure, borrowed from
1110	   the ISBN and ISSN schemes, is encouraged in assigning ARKs, because
1111	   it reduces the visibility of an assertion that is probably not
1112	   important now and may become a vulnerability later.

1114	   Reasonable prefixes for assigned names usually consist of consonants
1115	   and digits and are 1-5 characters in length.  For example, the
1116	   constant prefix "x9t" might be delegated to a book digitization
1117	   project that creates identifiers such as

1119	           https://444.berkeley.edu/ark:28722/x9t38rk45c

1121	   If longevity is the goal, it is important to keep the prefixes free
1122	   of recognizable semantics; for example, using an acronym representing
1123	   a project or a department is discouraged.  At the same time, you may
1124	   wish to set aside a subnamespace for testing purposes under a prefix
1125	   such as "fk..." that can serve as a visual clue and reminder to
1126	   maintenance staff that this "fake" identifier was never published.

1128	   There are other measures one can take to avoid user confusion,
1129	   transcription errors, and the appearance of accidental semantics when
1130	   creating identifiers.  If you are generating identifiers
1131	   automatically, pure numeric identifiers are likely to be
1132	   semantically opaque enough, but it's probably useful to avoid leading
1133	   zeroes because some users mistakenly treat them as optional, thinking
1134	   (arithmetically) that they don't contribute to the "value" of the
1135	   identifier.

1137	   If you need lots of identifiers and you don't want them to get too
1138	   long, you can mix digits with consonants (but avoid vowels since they
1139	   might accidentally spell words) to get more identifiers without
1140	   increasing the string length.  In this case you may not want more
1141	   than two letters in a row, because that limit reduces the chance of
1142	   generating acronyms.  Generator tools such as [NOID] provide support
1143	   for these sorts of identifiers, and can also add a computed check
1144	   character as a guarantee against the most common transcription
1145	   errors.  If used, it is recommended that the check character be
1146	   appended to the original Base Object Name string (i.e., minus the check
1147	   character), that original string having been the basis for computing
1148	   the check character.
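
	   The advice above can be sketched in Python as follows.  This is a
	   minimal illustration under stated assumptions, not the algorithm
	   of [NOID] or of any other tool: the 29-symbol alphabet, the
	   position-weighted check sum, and all function names are chosen
	   here purely for the example.

```python
import random

DIGITS = "0123456789"
CONSONANTS = "bcdfghjkmnpqrstvwxz"   # no vowels; 'l' also omitted (assumption)
ALPHABET = DIGITS + CONSONANTS       # 29 symbols, a prime number


def mint(prefix, length, rng):
    """Append `length` random symbols to `prefix`, never adding a third
    consecutive letter."""
    name = prefix
    while len(name) < len(prefix) + length:
        ch = rng.choice(ALPHABET)
        if (ch in CONSONANTS and len(name) >= 2
                and name[-1] in CONSONANTS and name[-2] in CONSONANTS):
            continue  # would make three letters in a row; draw again
        name += ch
    return name


def check_char(base):
    """Position-weighted sum of symbol values, reduced modulo 29.
    Symbols outside the alphabet contribute nothing."""
    total = sum(pos * ALPHABET.index(ch)
                for pos, ch in enumerate(base, start=1) if ch in ALPHABET)
    return ALPHABET[total % len(ALPHABET)]


def with_check(base):
    """Append the check character to the original string."""
    return base + check_char(base)


def verify(name):
    """Recompute the check character from everything before it."""
    return len(name) > 1 and check_char(name[:-1]) == name[-1]
```

	   Because the modulus is prime, this particular sum detects any
	   single-character substitution within the alphabet and any swap of
	   two adjacent, differing characters, provided the string is shorter
	   than 29 characters.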

1150	3.7.  Sub-Object Naming

1152	   As mentioned previously, semantically opaque identifiers are very
1153	   useful for long-term naming of abstract objects; however, it may be
1154	   appropriate to extend these names with less opaque extensions that
1155	   reference contemporary service entry points (sub-objects) in support
1156	   of the object.  Sub-object extensions beginning with a digit or
1157	   underscore (`_') are reserved for the possibility of developing a
1158	   future registry of canonical service points (e.g., numeric references
1159	   to versions, formats, and languages).

1161	4.  Finding a Name Mapping Authority

1163	   In order to derive an actionable identifier (these days, a URL) from
1164	   an ARK, a hostname (or hostname plus port combination) for a working
1165	   Name Mapping Authority (NMA) must be found.  An NMA is a service that
1166	   is able to respond to basic ARK service requests.  Relying on
1167	   registration and client-side discovery, NMAs make known which NAAs'
1168	   identifiers they are willing to service.

1170	   Upon encountering an ARK, a user (or client software) looks inside it
1171	   for the optional NMA part (the host part of the NMA's ARK service).
1172	   If it contains an NMA that is working, this NMA discovery step may be
1173	   skipped; the NMA effectively uses the beginning of an ARK to cache
1174	   the results of a prior mapping authority discovery process.  If a new
1175	   NMA needs to be found, the client looks inside the ARK again for the
1176	   NAAN (Name Assigning Authority Number).  Querying a global database,
1177	   it then uses the NAAN to look up all current NMAs that service ARKs
1178	   issued by the identified NAA.
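
	   The discovery steps just described might be sketched as follows.
	   The regular expression is a simplification of ARK syntax, and the
	   two callbacks (is_working and lookup_by_naan) are hypothetical
	   stand-ins for a liveness probe and for the global database query.

```python
import re

ARK_RE = re.compile(
    r"^(?:(?:https?://)?(?P<nma>[^/\s]+)/)?"  # optional NMA host (and port)
    r"ark:/?(?P<naan>[^/]+)/(?P<name>.+)$"    # NAAN, then the assigned name
)


def split_ark(ark):
    """Split an ARK into (nma, naan, name); nma is None when absent."""
    match = ARK_RE.match(ark.strip())
    if match is None:
        raise ValueError(f"not an ARK: {ark!r}")
    return match.group("nma"), match.group("naan"), match.group("name")


def discover_nmas(ark, is_working, lookup_by_naan):
    """Use the embedded NMA if it works; otherwise look up NMAs by NAAN."""
    nma, naan, _name = split_ark(ark)
    if nma is not None and is_working(nma):
        return [nma]            # discovery step skipped
    return list(lookup_by_naan(naan))
```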

1180	   The global database is key, and ideally the lookup would be automatic
1181	   and transparent to the user.  For this, the most promising method is
1182	   probably the Name-to-Thing (N2T) Resolver [N2T] at n2t.net.  It is a
1183	   proposed low-cost, highly reliable, consortially maintained NMA that
1184	   simply exists to support actionable HTTP-based URLs for as long as
1185	   HTTP is used.  One of its big advantages over the two other discovery
1186	   methods mentioned below, and over the URN, Handle, DOI, and PURL
1187	   methods, is that N2T addresses the namespace splitting problem.  When
1188	   objects maintained by one NMA are inherited by more than one
1189	   successor NMA, until now one of those successors would be required to
1190	   maintain forwarding tables on behalf of the other successors.

1192	   There are two other ways to discover an NMA, one of them described in
1193	   a subsection below.  Another way, described in an appendix, is based
1194	   on a simplification of the URN resolver discovery method, itself very
1195	   similar in principle to the resolver discovery method used by Handles
1196	   and DOIs.  None of these methods does more than what can be done with
1197	   a very small, consortially maintained web server such as [N2T].

1199	   In the interests of long-term persistence, however, ARK mechanisms
1200	   are first defined in high-level, protocol-independent terms so that
1201	   mechanisms may evolve and be replaced over time without compromising
1202	   fundamental service objectives.  Either or both specific methods
1203	   given here may eventually be supplanted by better methods since, by
1204	   design, the ARK scheme does not depend on a particular method, but
1205	   only on having some method to locate an active NMA.

1207	   At the time of issuance, at least one NMA for an ARK should be
1208	   prepared to service it.  That NMA may or may not be administered by
1209	   the Name Assigning Authority (NAA) that created it.  Consider the
1210	   following hypothetical example of providing long-term access to a
1211	   cancer research journal.  The publisher wishes to turn a profit and
1212	   the National Library of Medicine wishes to preserve the scholarly
1213	   record.  An agreement might be struck whereby the publisher would act
1214	   as the NAA and the national library would archive the journal issue
1215	   when it appears, but without providing direct access for the first
1216	   six months.  During the first six months of peak commercial
1217	   viability, the publisher would retain exclusive delivery rights and
1218	   would charge access fees.  Again, by agreement, both the library and
1219	   the publisher would act as NMAs, but during that initial period the
1220	   library would redirect requests for issues less than six months old
1221	   to the publisher.  At the end of the waiting period, the library
1222	   would then begin servicing requests for issues older than six months
1223	   by tapping directly into its own archives.  Meanwhile, the publisher
1224	   might routinely redirect incoming requests for older issues to the
1225	   library.  Long-term access is thereby preserved, and so is the
1226	   commercial incentive to publish content.

1228	   Although it will be common for an NAA also to run an NMA service, it
1229	   is never a requirement.  Over time NAAs and NMAs will come and go.
1230	   One NMA will succeed another, and there might be many NMAs serving
1231	   the same ARKs simultaneously (e.g., as mirrors or as competitors).

1233	   There might also be asymmetric but coordinated NMAs as in the
1234	   library-publisher example above.

1236	4.1.  Looking Up NMAs in a Globally Accessible File

1238	   This subsection describes a way to look up NMAs using a simple name
1239	   authority table represented as a plain text file.  For efficient
1240	   access the file may be stored in a local filesystem, but it needs to
1241	   be reloaded periodically to incorporate updates.  It is not expected
1242	   that the size of the file or frequency of update should impose an
1243	   undue maintenance or searching burden any time soon, for even
1244	   primitive linear search of a file with ten thousand NAAs is a
1245	   subsecond operation on modern server machines.  The proposed file
1246	   strategy is similar to the /etc/hosts file strategy that supported
1247	   Internet host address lookup for a period of years before the advent
1248	   of DNS.

1250	   The name authority table file is updated on an ongoing basis and is
1251	   available for copying over the Internet from a number of mirror sites
1252	   [NAANregistry].  The file contains comment lines (lines that begin
1253	   with `#') explaining the format and giving the file's modification
1254	   time, reloading address, and NAA registration instructions.
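
	   A lookup against such a file might be sketched as follows.  The
	   record layout used here (a NAAN followed by whitespace-separated
	   NMA hostnames, one record per line) is an assumption made only for
	   this example; the actual format is the one explained in the file's
	   own comment lines.  The sample entries are invented.

```python
# Hypothetical table contents; the real file documents its own format.
SAMPLE_TABLE = """\
# name authority table (layout invented for this sketch)
12345   nma.example.org  mirror.example.net
99999   archive.example.net
"""


def load_table(lines):
    """Read records into a dict, skipping blank lines and '#' comments."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        naan, *hosts = line.split()
        table[naan] = hosts
    return table


def lookup_nmas(table, naan):
    """Return the NMA hostnames registered for a NAAN (possibly none)."""
    return table.get(naan, [])
```

	   A locally stored copy would be reloaded periodically by calling
	   load_table again on the refreshed file.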

1256	5.  Generic ARK Service Definition

1258	   An ARK request's output is delivered information; examples include
1259	   the object itself, a policy declaration (e.g., a promise of support),
1260	   a descriptive metadata record, or an error message.  The experience
1261	   of object delivery is expected to be an evolving mix of information
1262	   that reflects changing service expectations and technology
1263	   requirements; contemporary examples include such things as an object
1264	   summary and component links formatted for human consumption.  ARK
1265	   services must be couched in high-level, protocol-independent terms if
1266	   persistence is to outlive today's networking infrastructural
1267	   assumptions.  The high-level ARK service definitions listed below are
1268	   followed in the next section by a concrete method (one of many
1269	   possible methods) for delivering these services with today's
1270	   technology.  Note that some services may be invoked in one operation,
1271	   such as when an '?info' inflection returns both a description and a
1272	   permanence declaration for an object.

1274	5.1.  Generic ARK Access Service (access, location)

1276	   Returns (a copy of) the object or a redirect to the same, although a
1277	   sensible object proxy may be substituted.  Examples of sensible
1278	   substitutes include:

1280	   o  a table of contents instead of a large complex document,
1281	   o  a home page instead of an entire web site hierarchy,

1283	   o  a rights clearance challenge before accessing protected data,

1285	   o  directions for access to an offline object (e.g., a book),

1287	   o  a description of an intangible object (a disease, an event), or

1289	   o  an applet acting as "player" for a large multimedia object.

1291	   May also return a discriminated list of alternate object locators.
1292	   If access is denied, returns an explanation of the object's current
1293	   (perhaps permanent) inaccessibility.

1295	5.1.1.  Generic Policy Service (permanence, naming, etc.)

1297	   Returns declarations of policy and support commitments for given
1298	   ARKs.  Declarations are returned in either a structured metadata
1299	   format or a human readable text format; sometimes one format may
1300	   serve both purposes.  Policy subareas may be addressed in separate
1301	   requests, but the following areas should be covered: object
1302	   permanence, object naming, object fragment addressing, and
1303	   operational service support.

1305	   The permanence declaration for an object is a rating defined with
1306	   respect to an identified permanence provider (guarantor), which will
1307	   be the NMA.  It may include the following aspects.

1309	      (a) "object availability" -- whether and how access to the object
1310	      is supported (e.g., online 24x7, or offline only),

1312	      (b) "identifier validity" -- under what conditions the identifier
1313	      will be or has been re-assigned,

1315	      (c) "content invariance" -- under what conditions the content of
1316	      the object is subject to change, and

1318	      (d) "change history" -- access to corrections, migrations, and
1319	      revisions, whether through links to the changed objects themselves
1320	      or through a document summarizing the change history.

1322	   A recent approach to persistence statements, conceived independently
1323	   from ARKs, can be found at [PStatements], with ongoing work available
1324	   at [ARKagency].  An older approach to a permanence rating framework
1325	   is given in [NLMPerm], which identified the following "permanence
1326	   levels":

1328	      Not Guaranteed: No commitment has been made to retain this
1329	      resource.  It could become unavailable at any time.  Its
1330	      identifier could be changed.

1332	      Permanent: Dynamic Content: A commitment has been made to keep
1333	      this resource permanently available.  Its identifier will always
1334	      provide access to the resource.  Its content could be revised or
1335	      replaced.

1337	      Permanent: Stable Content: A commitment has been made to keep this
1338	      resource permanently available.  Its identifier will always
1339	      provide access to the resource.  Its content is subject only to
1340	      minor corrections or additions.

1342	      Permanent: Unchanging Content: A commitment has been made to keep
1343	      this resource permanently available.  Its identifier will always
1344	      provide access to the resource.  Its content will not change.

1346	   Naming policy for an object includes an historical description of the
1347	   NAA's (and its successor NAA's) policies regarding differentiation of
1348	   objects.  Since it is the NMA that responds to requests for policy
1349	   statements, it is useful for the NMA to be able to produce or
1350	   summarize these historical NAA documents.  Naming policy may include
1351	   the following aspects.

1353	      (i) "similarity" -- (or "unity") the limit, defined by the NAA, to
1354	      the level of dissimilarity beyond which two similar objects
1355	      warrant separate identifiers but before which they share one
1356	      single identifier, and

1358	      (ii) "granularity" -- the limit, defined by the NAA, to the level
1359	      of object subdivision beyond which sub-objects do not warrant
1360	      separately assigned identifiers but before which sub-objects are
1361	      assigned separate identifiers.

1363	   Subnaming policy for an object describes the qualifiers that the NMA,
1364	   in fulfilling its ongoing and evolving service obligations, allows as
1365	   extensions to an NAA-assigned ARK.  To the conceptual object that the
1366	   NAA named with an ARK, the NMA may add component access points and
1367	   derivatives (e.g., format migrations in aid of preservation) in order
1368	   to provide both basic and value-added services.

1370	   Addressing policy for an object includes a description of how, during
1371	   access, object components (e.g., paragraphs, sections) or views
1372	   (e.g., image conversions) may or may not be "addressed", in other
1373	   words, how the NMA permits arguments or parameters to modify the
1374	   object delivered as the result of an ARK request.  If supported,
1375	   these sorts of operations would provide things like byte-ranged
1376	   fragment delivery and open-ended format conversions, or any set of
1377	   possible transformations that would be too numerous to list or to
1378	   identify with separately assigned ARKs.

1380	   Operational service support policy includes a description of general
1381	   operational aspects of the NMA service, such as after-hours staffing
1382	   and trouble reporting procedures.

1384	5.1.2.  Generic Description Service

1386	   Returns a description of the object.  Descriptions are returned in a
1387	   structured metadata format, a human-readable text format, or in one
1388	   format that serves both purposes (such as human-readable HTML with
1389	   embedded machine-readable metadata, or perhaps YAML).  A description
1390	   must at a minimum answer the who, what, when, and where questions
1391	   ("where" being the long-term identifier as opposed to a transient
1392	   redirect target) concerning an expression of the object.  Standalone
1393	   descriptions should be accompanied by the modification date and
1394	   source of the description itself.  May also return discriminated
1395	   lists of ARKs that are related to the given ARK.

1397	5.2.  Overview of The HTTP URL Mapping Protocol (THUMP)

1399	   The HTTP URL Mapping Protocol (THUMP) is a way of taking a key (any
1400	   identifier) and asking such questions as, what information does this
1401	   identify and how permanent is it?  [THUMP] is in fact one specific
1402	   method under development for delivering ARK services.  The protocol
1403	   runs over HTTP to exploit the web browser's current pre-eminence as
1404	   user interface to the Internet.  THUMP is designed so that a person
1405	   can enter ARK requests directly into the location field of current
1406	   browser interfaces.  Because it runs over HTTP, THUMP can be
1407	   simulated and tested via keyboard-based interactions [RFC0854].

1409	   The asker (a person or client program) starts with an identifier,
1410	   such as an ARK or a URL.  The identifier reveals to the asker (or
1411	   allows the asker to infer) the Internet host name and port number of
1412	   a server system that responds to questions.  Here, this is just the
1413	   NMA that is obtained by inspection and possibly lookup based on the
1414	   ARK's NAAN.  The asker then sets up an HTTP session with the server
1415	   system, sends a question via a THUMP request (contained within an
1416	   HTTP request), receives an answer via a THUMP response (contained
1417	   within an HTTP response), and closes the session.  That concludes the
1418	   connected portion of the protocol.

1420	   A THUMP request is a string of characters beginning with a `?'
1421	   (question mark) that is appended to the identifier string.  The
1422	   resulting string is sent as an argument to HTTP's GET command.
1423	   Request strings too long for GET may be sent using HTTP's POST
1424	   command.  The two most common requests correspond to two degenerate
1425	   special cases.  First, a simple key with no request at all is the
1426	   same as an ordinary access request.  Thus a plain ARK entered into a
1427	   browser's location field behaves much like a plain URL, and returns
1428	   access to the primary identified object, for instance, an HTML
1429	   document.

1431	   The second special case is a minimal ARK description request string
1432	   consisting of just "?info".  For example, entering the string,

1434	           n2t.net/ark:67531/metadc107835?info

1436	   into the browser's location field directly precipitates a request for
1437	   a metadata record describing the object identified by ark:67531/
1438	   metadc107835.  The browser, unaware of THUMP, prepares and sends an
1439	   HTTP GET request in the same manner as for a URL.  THUMP is designed
1440	   so that the response (indicated by the returned HTTP content type) is
1441	   normally displayed, whether the output is structured for machine
1442	   processing (text/plain) or formatted for human consumption (text/
1443	   html).  In addition to '?info', this specification reserves both '?'
1444	   and '??' (originally older forms) for future use.
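
	   Forming the request string is simple enough to sketch in a few
	   lines.  The function name is hypothetical, and defaulting to the
	   "https" scheme when none is given is an assumption of this
	   example rather than a rule of THUMP.

```python
def thump_url(identifier, request=""):
    """Build the URL for a THUMP request on an identifier.

    An empty request yields the plain identifier (ordinary access);
    otherwise '?' plus the request is appended, as in '?info'.
    """
    url = identifier if "://" in identifier else f"https://{identifier}"
    return f"{url}?{request}" if request else url
```

	   The resulting string can be handed to any ordinary HTTP client as
	   the target of a GET request.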

1446	   The following example THUMP session assumes metadata being returned
1447	   by a resolver (as server) to a browser client.  Each line has been
1448	   annotated to include a line number and whether it was the client or
1449	   server that sent it.  Without going into much depth, the session has
1450	   three pieces separated from each other by blank lines: the client's
1451	   piece (lines 1-3), the server's HTTP/THUMP response headers (4-7),
1452	   and the body of the server's response (8-17).  The first and last
1453	   lines (1 and 18) correspond to the client's steps to start the TCP
1454	   session and the server's steps to end it, respectively.

1456	    1  C: [opens session]
1457	       C: GET https://n2t.net/ark:67531/metadc107835?info HTTP/1.1
1458	       C:
1459	       S: HTTP/1.1 200 OK
1460	    5  S: Content-Type: text/plain
1461	       S: THUMP-Status: 0.6 200 OK
1462	       S:
1463	       S: erc:
1464	       S: who:   Austin, Larry
1465	   10  S: what:  A Study of Rhythm in Bach's Orgelbuechlein
1466	       S: when:  1952
1467	       S: where: https://digital.library.unt.edu/ark:/67531/metadc107835
1468	       S: erc-support:
1469	       S: who:   University of North Texas Libraries
1470	   15  S: what:  Permanent: Stable Content:
1471	       S: when:  20081203
1472	       S: where: https://digital.library.unt.edu/ark:/67531/
1473	       S: [closes session]

1475	   The first two server response lines (4-5) above are typical of HTTP.
1476	   The next line (6) is peculiar to THUMP, and indicates the THUMP
1477	   version and a normal return status.

1479	   The balance of the response consists of a single metadata record
1480	   (8-17) that comprises the ARK description service response.  The
1481	   returned record is in the format of an Electronic Resource Citation
1482	   [ERC], which is discussed in overview in the next section.  For now,
1483	   note that it contains four elements that answer the top priority
1484	   questions regarding an expression of the object: who played a major
1485	   role in expressing it, what the expression was called, when it was
1486	   created, and where the expression may be found (note that "where" is
1487	   preferably a persistent, citable identifier rather than an unstable
1488	   URL sometimes mistakenly referred to as a "location").  This quartet
1489	   of elements comes up again and again in ERCs.  Lines 13-17 contain a
1490	   minimal persistence statement.

1492	   Each segment in an ERC tells a different story relating to the
1493	   object, so although the same four questions (elements) appear in
1494	   each, the answers depend on the segment's story type.  While the
1495	   first segment tells the story of an expression of the object, the
1496	   second segment tells the story of the support commitment made to it:
1497	   who made the commitment, what the nature of the commitment was, when
1498	   it was made, and where a fuller explanation of the commitment may be
1499	   found.
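
	   To show how little machinery is needed to read such a record, the
	   body of the example response can be split into its two segments
	   as sketched below.  The rule used to recognize a segment header (a
	   label beginning with "erc" that has an empty value) is an
	   assumption made for this example, and features of the full format
	   such as continuation lines are ignored.

```python
RESPONSE_BODY = """\
erc:
who:   Austin, Larry
what:  A Study of Rhythm in Bach's Orgelbuechlein
when:  1952
where: https://digital.library.unt.edu/ark:/67531/metadc107835
erc-support:
who:   University of North Texas Libraries
what:  Permanent: Stable Content:
when:  20081203
where: https://digital.library.unt.edu/ark:/67531/
"""


def parse_erc(text):
    """Return a list of (segment_label, {element: value}) pairs."""
    segments = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only; values may contain colons.
        label, _, value = line.partition(":")
        label, value = label.strip(), value.strip()
        if label.startswith("erc") and not value:
            segments.append((label, {}))      # start of a new segment
        elif segments:
            segments[-1][1][label] = value    # element of current segment
    return segments
```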

1501	5.3.  The Electronic Resource Citation (ERC)

1503	   An Electronic Resource Citation (or ERC, pronounced e-r-c) [ERC] is a
1504	   kind of object description that uses Dublin Core Kernel metadata
1505	   elements [DCKernel].  The ERC with Kernel elements provides a simple,
1506	   compact, and printable record for holding data associated with an
1507	   information resource.  As originally designed [Kernel], Kernel
1508	   metadata balances the needs for expressive power, very simple machine
1509	   processing, and direct human manipulation.  The ERC sense of
1510	   "citation" is not limited to the traditional referencing of a result
1511	   or information fixed in time on a printed page, but to a more general
1512	   kind of reference, both backward, to digital material that cannot be
1513	   known to be fixed in time (true of virtually all online information),
1514	   and forward, to material that is all the more valuable for improving
1515	   or evolving over time.

1517	   The previous section shows two limited examples of what is fully
1518	   described elsewhere [ERC].  The rest of this short section provides
1519	   some of the background and rationale for this record format.

1521	   A founding principle of Kernel metadata is that direct human contact
1522	   with metadata will be a necessary and sufficient condition for the
1523	   near term rapid development of metadata standards, systems, and
1524	   services.  Thus the machine-processable Kernel elements must only
1525	   minimally strain people's ability to read, understand, change, and
1526	   transmit ERCs without their relying on intermediation with
1527	   specialized software tools.  The basic ERC needs to be succinct,
1528	   transparent, and trivially parseable by software.

1530	   Borrowing from the data structuring format that underlies the
1531	   successful spread of email and web services, the ERC format uses
1532	   [ANVL], which is based on email and HTTP headers [RFC2822].  There is
1533	   a naturalness to ANVL's label-colon-value format (seen in the
1534	   previous section) that barely needs explanation to a person beginning
1535	   to enter ERC metadata.

1537	   While ANVL elements are expected at the top level and don't
1538	   themselves support hierarchy, the value of an ANVL element may be an
1539	   arbitrary encoded hierarchy of JSON or XML.  Typically, the name of
1540	   such an ANVL element ends in "json" or "xml", for example, "json" or
1541	   "geojson".  Care should be taken to escape structural characters that
1542	   appear in element names and values, specifically, line terminators
1543	   (both newlines ("\n") and carriage returns ("\r")) and, in element
1544	   names, colons (":").
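
	   One way to perform such escaping is sketched below.  The use of
	   percent-encoding is an assumption of this example, not a statement
	   of what [ANVL] requires, and the function names are hypothetical;
	   implementers should follow whatever escaping convention [ANVL]
	   itself defines.

```python
from urllib.parse import unquote


def _escape(text, specials):
    """Percent-encode only the listed characters (assumed convention)."""
    return "".join(f"%{ord(c):02X}" if c in specials else c for c in text)


def escape_name(name):
    # In names: colons and line terminators, plus '%' so decoding is exact.
    return _escape(name, "%:\r\n")


def escape_value(value):
    # In values: line terminators, plus '%' so decoding is exact.
    return _escape(value, "%\r\n")


def anvl_line(name, value):
    """Format one label-colon-value line with structural characters escaped."""
    return f"{escape_name(name)}: {escape_value(value)}"


def unescape(text):
    """Reverse the escaping applied above."""
    return unquote(text)
```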

1546	   Besides simplicity of ERC system implementation and data entry
1547	   mechanics, ERC semantics (what the record and its constituent parts
1548	   mean) must also be easy to explain.  ERC semantics are based on a
1549	   reformulation and extension of the Dublin Core [RFC5013] hypothesis,
1550	   which suggests that the fifteen Dublin Core metadata elements have a
1551	   key role to play in cross-domain resource description.  The ERC
1552	   design recognizes that the Dublin Core's primary contribution is the
1553	   international, interdisciplinary consensus that identified fifteen
1554	   semantic buckets (element categories), regardless of how they are
1555	   labeled.  The ERC then adds a definition for a record and some
1556	   minimal compliance rules.  In pursuing the limits of simplicity, the
1557	   ERC design combines and relabels some Dublin Core buckets to isolate
1558	   a tiny kernel (subset) of four elements for basic cross-domain
1559	   resource description.

1561	   For the cross-domain kernel, the ERC uses the four basic elements --
1562	   who, what, when, and where -- to pretend that every object in the
1563	   universe can have a uniform minimal description.  Each has a name or
1564	   other identifier, a locator (a means to access it), some responsible
1565	   person or party, and a date.  It doesn't matter what type of object
1566	   it is, or whether one plans to read it, interact with it, smoke it,
1567	   wear it, or navigate it.  Of course, this approach is flawed because
1568	   uniformity of description for some object types requires more
1569	   semantic contortion and sacrifice than for others.  That is why at
1570	   the beginning of this document, the ARK was said to be suited to
1571	   objects that accommodate reasonably regular electronic description.

1573	   While insisting on uniformity at the most basic level provides
1574	   powerful cross-domain leverage, the semantic sacrifice is great for
1575	   many applications.  So the ERC also permits a semantically rich and
1576	   nuanced description to co-exist in a record along with a basic
1577	   description.  In that way both sophisticated and naive recipients of
1578	   the record can extract the level of meaning from it that best suits
1579	   their needs and abilities.  Key to unlocking the richer description
1580	   is a controlled vocabulary of ERC record types (not explained in this
1581	   document) that permit knowledgeable recipients to apply defined sets
1582	   of additional assumptions to the record.

1584	5.4.  Advice to Web Clients

1586	   ARKs are envisaged to appear wherever durable object references are
1587	   planned.  Library cataloging records, literature citations, and
1588	   bibliographies are important examples.  In many of these places URLs
1589	   (Uniform Resource Locators) are currently used, and inside some of
1590	   those URLs are embedded URNs, Handles, and DOIs.  Unfortunately,
1591	   there's no suggestion of a way to probe for extra services that would
1592	   build confidence in those identifiers; in other words, there's no way
1593	   to tell whether any of those identifiers is any better managed than
1594	   the average URL.

1596	   ARKs are also envisaged to appear in hypertext links (where they are
1597	   not normally shown to users) and in rendered text (displayed or
1598	   printed).  A normal HTML link for which the URL is not displayed
1599	   looks like this.

1601	   <a href = "https://example.org/index.htm"> Click Here </a>

1603	   A URL with an embedded ARK invites access (via `?info') to extra
1604	   services:

1606	   <a href = "https://example.org/ark:14697/b12345x"> Click Here </a>

1608	   Using the [N2T] resolver to provide identifier-scheme-agnostic
1609	   protection against hostname instability, this ARK could be published
1610	   as:

1612	   <a href = "https://n2t.net/ark:14697/b12345x"> Click Here </a>
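
	   The two published forms above differ only in hostname, so the
	   rewriting can be sketched mechanically.  The Python fragment below
	   is a minimal illustration, not part of any ARK tool: the function
	   names extract_ark, via_n2t, and info_request are hypothetical, and
	   the sketch assumes the ARK begins at the first "ark:" in the URL
	   path.

```python
from urllib.parse import urlsplit

N2T_BASE = "https://n2t.net/"  # the resolver named in the text


def extract_ark(url: str) -> str:
    """Return the 'ark:...' portion embedded in a URL path."""
    path = urlsplit(url).path
    start = path.find("ark:")
    if start < 0:
        raise ValueError(f"no embedded ARK in {url!r}")
    return path[start:]


def via_n2t(url: str) -> str:
    """Republish an ARK-bearing URL under the N2T resolver hostname."""
    return N2T_BASE + extract_ark(url)


def info_request(url: str) -> str:
    """Form the '?info' request that probes for extra services."""
    return urlsplit(url)._replace(query="info", fragment="").geturl()


local = "https://example.org/ark:14697/b12345x"
print(via_n2t(local))
print(info_request(local))
```

	   Because the ARK itself is unchanged, a client that stored only the
	   "ark:14697/b12345x" portion could rebuild either URL.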

1614	   An NAA will typically make known the associations it creates by
1615	   publishing them in catalogs, actively advertising them, or simply
1616	   leaving them on web sites for visitors (e.g., users, indexing
1617	   spiders) to stumble across in browsing.

1619	5.5.  Security Considerations

1621	   The ARK naming scheme poses no direct risk to computers and networks.
1622	   Implementors of ARK services need to be aware of security issues when
1623	   querying networks and filesystems for Name Mapping Authority
1624	   services, and the concomitant risks from spoofing and obtaining
1625	   incorrect information.  These risks are no greater for ARK mapping
1626	   authority discovery than for other kinds of service discovery.  For
1627	   example, recipients of ARKs with a specified host (NMA) should treat
1628	   it like a URL and be aware that the identified ARK service may no
1629	   longer be operational.

1631	   Apart from mapping authority discovery, ARK clients and servers
1632	   subject themselves to all the risks that accompany normal operation
1633	   of the protocols underlying mapping services (e.g., HTTP, Z39.50).
1634	   As a specialization of such protocols, an ARK service may limit
1635	   exposure to the usual risks.  Indeed, ARK services may enhance a kind
1636	   of security by helping users identify long-term reliable references
1637	   to information objects.

1639	6.  References

1641	   [ANVL]     Kunze, J., Kahle, B., Masanes, J., and G. Mohr, "A Name-
1642	              Value Language", 2005,
1643	              <https://n2t.net/ark:/13030/c7x921j3h>.

1645	   [ARK]      Kunze, J., "Towards Electronic Persistence Using ARK
1646	              Identifiers", IWAW/ECDL Annual Workshop Proceedings 3rd,
1647	              August 2003, <https://n2t.net/ark:/13030/c7n00zt1z>.

1649	   [ARKagency]
1650	              Alliance, A., "ARK Maintenance Agency", 2021,
1651	              <https://arks.org>.

1653	   [DCKernel]
1654	              Initiative, D. C. M., "Kernel Metadata Working Group",
1655	              2001-2008, <https://dublincore.org/groups/kernel/>.

1657	   [DOI]      Foundation, I. D., "The Digital Object Identifier (DOI)
1658	              System", February 2001, <https://dx.doi.org/10.1000/203>.

1660	   [ERC]      Kunze, J. and A. Turner, "Kernel Metadata and Electronic
1661	              Resource Citations", October 2007,
1662	              <https://n2t.net/ark:/13030/c7sn0141m>.

1664	   [Handle]   Lannom, L., "Handle System Overview", ICSTI Forum No. 30,
1665	              April 1999, <https://eric.ed.gov/?id=ED450775>.

1667	   [Kernel]   Kunze, J., "A Metadata Kernel for Electronic Permanence",
1668	              Journal of Digital Information Vol 2, Issue 2,
1669	              ISSN 1368-7506, January 2002,
1670	              <https://n2t.net/ark:/13030/c7rr1pm49>.

1672	   [N2T]      Alliance, A., "Name-to-Thing Resolver", August 2006,
1673	              <https://n2t.net>.

1675	   [NAANregistry]
1676	              ARKs.org, "NAAN Registry", 2019,
1677	              <https://arks.org/e/pub/naan_registry.txt>.

1679	   [NAANrequest]
1680	              ARKs.org, "NAAN Request Form", 2018,
1681	              <https://n2t.net/e/naan_request>.

1683	   [NLMPerm]  Byrnes, M., "Permanence Levels and the Archives for NLM's
1684	              Permanent Web Documents", March 2005,
1685	              <https://www.nlm.nih.gov/pubs/techbull/ma05/
1686	              ma05_archive.html>.

1688	   [NOID]     Kunze, J., "Nice Opaque Identifiers", April 2006,
1689	              <https://metacpan.org/pod/distribution/Noid/noid>.

1691	   [PStatements]
1692	              Kunze, J., "Persistence statements: describing digital
1693	              stickiness", October 2016,
1694	              <https://n2t.net/ark:/13030/c7833mx7t>.

1696	   [PURL]     Shafer, K., "Introduction to Persistent Uniform Resource
1697	              Locators", 1996,
1698	              <https://www.internetsociety.org/inet96/proceedings/a4/
1699	              a4_1.htm>.

1701	   [RFC0854]  Postel, J. and J. Reynolds, "Telnet Protocol
1702	              Specification", STD 8, RFC 854, DOI 10.17487/RFC0854, May
1703	              1983, <https://www.rfc-editor.org/info/rfc854>.

1705	   [RFC1034]  Mockapetris, P., "Domain names - concepts and facilities",
1706	              STD 13, RFC 1034, DOI 10.17487/RFC1034, November 1987,
1707	              <https://www.rfc-editor.org/info/rfc1034>.

1709	   [RFC2141]  Moats, R., "URN Syntax", RFC 2141, DOI 10.17487/RFC2141,
1710	              May 1997, <https://www.rfc-editor.org/info/rfc2141>.

1712	   [RFC2288]  Lynch, C., Preston, C., and R. Daniel, "Using Existing
1713	              Bibliographic Identifiers as Uniform Resource Names",
1714	              RFC 2288, DOI 10.17487/RFC2288, February 1998,
1715	              <https://www.rfc-editor.org/info/rfc2288>.

1717	   [RFC2611]  Daigle, L., van Gulik, D., Iannella, R., and P. Faltstrom,
1718	              "URN Namespace Definition Mechanisms", BCP 33, RFC 2611,
1719	              DOI 10.17487/RFC2611, June 1999,
1720	              <https://www.rfc-editor.org/info/rfc2611>.

1722	   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
1723	              Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
1724	              Transfer Protocol -- HTTP/1.1", RFC 2616,
1725	              DOI 10.17487/RFC2616, June 1999,
1726	              <https://www.rfc-editor.org/info/rfc2616>.

1728	   [RFC2822]  Resnick, P., Ed., "Internet Message Format", RFC 2822,
1729	              DOI 10.17487/RFC2822, April 2001,
1730	              <https://www.rfc-editor.org/info/rfc2822>.

1732	   [RFC2915]  Mealling, M. and R. Daniel, "The Naming Authority Pointer
1733	              (NAPTR) DNS Resource Record", RFC 2915,
1734	              DOI 10.17487/RFC2915, September 2000,
1735	              <https://www.rfc-editor.org/info/rfc2915>.

1737	   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
1738	              Resource Identifier (URI): Generic Syntax", STD 66,
1739	              RFC 3986, DOI 10.17487/RFC3986, January 2005,
1740	              <https://www.rfc-editor.org/info/rfc3986>.

1742	   [RFC5013]  Kunze, J. and T. Baker, "The Dublin Core Metadata Element
1743	              Set", RFC 5013, DOI 10.17487/RFC5013, August 2007,
1744	              <https://www.rfc-editor.org/info/rfc5013>.

1746	   [THUMP]    Gamiel, K. and J. Kunze, "The HTTP URL Mapping Protocol",
1747	              August 2007, <https://www.ietf.org/archive/id/draft-kunze-
1748	              thump-03.txt>.

1750	Appendix A.  ARK Maintenance Agency: arks.org

1752	   The ARK Maintenance Agency [ARKagency] at arks.org has several
1753	   functions.

1755	   o  To manage the registry of organizations that will be assigning
1756	      ARKs.  Organizations can request or update a NAAN by filling out a
1757	      form [NAANrequest].

1759	   o  To be a clearinghouse for information about ARKs, such as best
1760	      practices, introductory documentation, tutorials, community
1761	      forums, etc.  These supplemental resources help ARK implementors in
1762	      high-level applications across different sectors and disciplines,
1763	      and with a variety of metadata standards.

1765	   o  To be a locus of discussion about future versions of the ARK
1766	      specification.

1768	Appendix B.  Looking up NMAs Distributed via DNS

1770	   This appendix introduces an older method for looking up NMAs that
1771	   is based on the method for discovering URN resolvers described in
1772	   [RFC2915].  It relies on querying the DNS already installed in
1773	   the background infrastructure of most networked computers.  A query
1774	   is submitted to DNS asking for a list of resolvers that match a given
1775	   NAAN.  DNS distributes the query to the particular DNS servers that
1776	   can best provide the answer, unless the answer can be found more
1777	   quickly in a local DNS cache as a side-effect of a recent query.
1778	   Responses come back inside Name Authority Pointer (NAPTR) records.
1779	   The normal result is one or more candidate NMAs.

1781	   In its full generality the [RFC2915] algorithm ambitiously
1782	   accommodates a complex set of preferences, orderings, protocols,
1783	   mapping services, regular expression rewriting rules, and DNS record
1784	   types.  This appendix proposes a drastic simplification of it for
1785	   the special case of ARK mapping authority discovery.  The simplified
1786	   algorithm is called Maptr.  It uses only one DNS record type (NAPTR)
1787	   and restricts most of its field values to constants.  The following
1788	   hypothetical excerpt from a DNS data file for the NAAN known as 12026
1789	   shows three example NAPTR records ready to use with the Maptr
1790	   algorithm.

1792	     12026.ark.arpa.
1793	     ;; US Library of Congress
1794	     ;;       order pref flags service regexp  replacement
1795	      IN NAPTR  0     0   "h"  "ark"   "USLC"  lhc.nlm.nih.gov:8080
1796	      IN NAPTR  0     0   "h"  "ark"   "USLC"  foobar.zaf.org
1797	      IN NAPTR  0     0   "h"  "ark"   "USLC"  sneezy.dopey.com

1799	   All the fields are held constant for Maptr except for the "flags",
1800	   "regexp", and "replacement" fields.  The "service" field contains the
1801	   constant value "ark" so that NAPTR records participating in the Maptr
1802	   algorithm will not be confused with other NAPTR records.  The "order"
1803	   and "pref" fields are held to 0 (zero) and otherwise ignored for now;
1804	   the algorithm may evolve to use these fields for ranking decisions
1805	   when usage patterns and local administrative needs are better
1806	   understood.

1808	   When a Maptr query returns a record with a flags field of "h" (for
1809	   host, a Maptr extension to the NAPTR flags), the replacement field
1810	   contains the NMA (host) of an ARK service provider.  When a query
1811	   returns a record with a flags field of "" (the empty string), the
1812	   client needs to submit a new query containing the domain name found
1813	   in the replacement field.  This second sort of record exploits the
1814	   distributed nature of DNS by redirecting the query to another domain
1815	   name.  It looks like this.

1817	     12345.ark.arpa.
1818	     ;; Digital Library Consortium
1819	     ;;       order pref flags service regexp replacement
1820	      IN NAPTR  0     0    ""  "ark"     ""   dlc.spct.org.

1822	   Here is the Maptr algorithm for ARK mapping authority discovery.  In
1823	   it, replace <NAAN> with the NAAN from the ARK for which an NMA is
1824	   sought.

1826	   1.  Initialize the DNS query: type=NAPTR, query=<NAAN>.ark.arpa.

1828	   2.  Submit the query to DNS and retrieve (NAPTR) records, discarding
1829	       any record that does not have "ark" for the service field.

1831	   3.  All remaining records with a flags field of "h" contain
1832	       candidate NMAs in their replacement fields.  Set them aside, if
1833	       any.

1835	   4.  Any record with an empty flags field ("") has a replacement field
1836	       containing a new domain name to which a subsequent query should
1837	       be redirected.  For each such record, set query=<replacement>
1838	       then go to step (2).  When all such records have been recursively
1839	       exhausted, go to step (5).

1841	   5.  All redirected queries have been resolved and a set of candidate
1842	       NMAs has been accumulated from step (3).  If there are zero
1843	       NMAs, exit -- no mapping authority was found.  If there are one
1844	       or more NMAs, choose one using any criteria you wish, then exit.

1846	   A Perl script that implements this algorithm is included here.

1848	   #!/usr/bin/perl

1850	   use Net::DNS;                           # include simple DNS package
1851	   my $qtype = "NAPTR";                    # initialize query type
1852	   my $naa = shift;                        # get NAAN script argument
1853	   my $mad = Net::DNS::Resolver->new;      # mapping authority discovery

1855	   &maptr("$naa.ark.arpa");                # call maptr - that's it

1857	   sub maptr {                             # recursive maptr algorithm
1858	           my $dname = shift;              # domain name as argument
1859	           my ($rr, $flags, $service, $replacement);
1860	           my $query = $mad->query($dname, $qtype);
1861	           return                          # non-productive query
1862	                   if (! $query || ! $query->answer);
1863	           foreach $rr ($query->answer) {
1864	                   next                    # skip records of wrong type
1865	                           if ($rr->type ne $qtype);
1866	                   ($flags, $service, $replacement) = ($rr->flags,
1867	                           $rr->service, $rr->replacement);
1868	                   next if ($service ne "ark");    # step 2: "ark" only
1869	                   if ($flags eq "") {
1870	                           &maptr($replacement);   # recurse
1871	                   } elsif ($flags eq "h") {
1872	                           print "$replacement\n"; # candidate NMA
1873	                   }
1874	           }
1875	   }

1876	   The global database thus distributed via DNS and the Maptr algorithm
1877	   can easily be seen to mirror the contents of the Name Authority
1878	   Table file described in the previous section.
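
	   The five Maptr steps can also be exercised without live DNS.  The
	   Python sketch below is illustrative only: it replaces DNS with an
	   in-memory table, so the names Naptr, ZONE, maptr, and discover are
	   hypothetical, the records under "dlc.spct.org." are assumed (the
	   text does not give them), and the loop guard is an addition that
	   the algorithm as stated does not include.

```python
from typing import Dict, List, NamedTuple, Optional, Set


class Naptr(NamedTuple):
    """One NAPTR record, with the six fields shown in the excerpts above."""
    order: int
    pref: int
    flags: str
    service: str
    regexp: str
    replacement: str


# Hypothetical stand-in for DNS: owner name -> NAPTR records.
ZONE: Dict[str, List[Naptr]] = {
    "12026.ark.arpa.": [
        Naptr(0, 0, "h", "ark", "USLC", "lhc.nlm.nih.gov:8080"),
        Naptr(0, 0, "h", "ark", "USLC", "foobar.zaf.org"),
        Naptr(0, 0, "h", "ark", "USLC", "sneezy.dopey.com"),
    ],
    "12345.ark.arpa.": [Naptr(0, 0, "", "ark", "", "dlc.spct.org.")],
    # Assumed contents of the redirect target (not given in the text).
    "dlc.spct.org.": [
        Naptr(0, 0, "h", "ark", "", "ark.example.org"),
        Naptr(0, 0, "h", "other", "", "ignored.example.org"),
    ],
}


def maptr(dname: str, zone: Dict[str, List[Naptr]],
          seen: Optional[Set[str]] = None) -> List[str]:
    """Return candidate NMAs for dname, following redirects (steps 2-4)."""
    seen = set() if seen is None else seen
    if dname in seen:  # loop guard: an addition, not one of the five steps
        return []
    seen.add(dname)
    # Step 2: keep only records whose service field is "ark".
    records = [r for r in zone.get(dname, []) if r.service == "ark"]
    # Step 3: records flagged "h" carry candidate NMAs.
    candidates = [r.replacement for r in records if r.flags == "h"]
    # Step 4: records with empty flags redirect the query.
    for r in records:
        if r.flags == "":
            candidates.extend(maptr(r.replacement, zone, seen))
    return candidates


def discover(naan: str, zone: Dict[str, List[Naptr]]) -> List[str]:
    """Step 1 builds the query name; the step 5 choice is left to the caller."""
    return maptr(f"{naan}.ark.arpa.", zone)


print(discover("12026", ZONE))
print(discover("12345", ZONE))
```

	   An empty list corresponds to the "no mapping authority was found"
	   outcome of step (5); any other result is the set of candidates from
	   which the caller chooses.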

1880	Authors' Addresses

1882	   John A. Kunze
1883	   California Digital Library
1884	   1111 Franklin Street
1885	   Oakland, CA  94607
1886	   USA

1888	   Email: jak@ucop.edu

1890	   Emmanuelle Bermes
1891	   Bibliotheque nationale de France
1892	   Quai Francois Mauriac
1893	   Paris  75706
1894	   France

1896	   Email: emmanuelle.bermes@bnf.fr