idnits 2.17.1 

draft-kunze-ark-08.txt:
-(62): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(289): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(352): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(399): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(419): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(442): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(444): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(450): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(452): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(806): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(840): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(844): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(1021): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(1056): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(1064): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(1077): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(1206): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(1230): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(1401): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(1403): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(1404): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(1414): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(1416): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(1419): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(1443): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(1444): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(1445): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(1469): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(1485): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(1502): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(1512): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(1561): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(1667): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(1673): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(1676): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(1677): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding
-(1797): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == There are 59 instances of lines with non-ascii characters in the
     document.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 41
     longer pages, the longest (page 2) being 63 lines

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 42 pages


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 11 instances of too long lines in the document, the longest
     one being 7 characters in excess of 72.

  == There are 15 instances of lines with non-RFC2606-compliant FQDNs in the
     document.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == Line 575 has weird spacing: '...eful to  remem...'

  == Line 795 has weird spacing: '... regexp  repla...'

  == Line 1898 has weird spacing: '...for the  purpo...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (31 July 2004) is 7202 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'MD5' is defined on line 1750, but no explicit
     reference was found in the text

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ARK'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'DCORE'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'DERC'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'DOI'

  ** Obsolete normative reference: RFC  822 (ref. 'EMHDRS') (Obsoleted by RFC
     2822)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ERC'

  ** Obsolete normative reference: RFC 2616 (ref. 'HTTP') (Obsoleted by RFC
     7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235)

  ** Downref: Normative reference to an Informational RFC: RFC 1321 (ref.
     'MD5')

  ** Obsolete normative reference: RFC 2915 (ref. 'NAPTR') (Obsoleted by RFC
     3401, RFC 3402, RFC 3403, RFC 3404)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'NLMPerm'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'PURL'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'THUMP'

  ** Obsolete normative reference: RFC 2396 (ref. 'URI') (Obsoleted by RFC
     3986)

  ** Downref: Normative reference to an Informational RFC: RFC 2288 (ref.
     'URNBIB')

  ** Obsolete normative reference: RFC 2141 (ref. 'URNSYN') (Obsoleted by RFC
     8141)

  ** Obsolete normative reference: RFC 2611 (ref. 'URNNID') (Obsoleted by RFC
     3406)


     Summary: 14 errors (**), 0 flaws (~~), 10 warnings (==), 11 comments
     (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet-Draft: draft-kunze-ark-08.txt                          J. Kunze
3	ARK Identifier Scheme                    University of California (UCOP)
4	Expires 31 January 2005                                 R. P. C. Rodgers
5	                                         US National Library of Medicine
6	                                                            31 July 2004

8	                  The ARK Persistent Identifier Scheme

10	      (http://www.ietf.org/internet-drafts/draft-kunze-ark-08.txt)

12	Status of this Document

14	   This document is an Internet-Draft and is in full conformance with
15	   all provisions of Section 10 of RFC2026.

17	   Internet-Drafts are working documents of the Internet Engineering
18	   Task Force (IETF), its areas, and its working groups.  Note that
19	   other groups may also distribute working documents as Internet-
20	   Drafts.

22	   Internet-Drafts are draft documents valid for a maximum of six months
23	   and may be updated, replaced, or obsoleted by other documents at any
24	   time.  It is inappropriate to use Internet-Drafts as reference
25	   material or to cite them other than as ``work in progress.''

27	   The list of current Internet-Drafts can be accessed at
28	   http://www.ietf.org/ietf/1id-abstracts.txt

30	   The list of Internet-Draft Shadow Directories can be accessed at
31	   http://www.ietf.org/shadow.html.

33	   Distribution of this document is unlimited.  Please send comments to
34	   jak@ucop.edu.

36	   Copyright (C) The Internet Society (2004).  All Rights Reserved.

38	Abstract

40	   The ARK (Archival Resource Key) is a scheme intended to facilitate
41	   the persistent naming and retrieval of information objects.  It
42	   comprises an identifier syntax and three services.  An ARK has four
43	   components:

45	                    [http://NMAH/]ark:/NAAN/Name

47	   an optional and mutable Name Mapping Authority Hostport part (NMAH,
48	   where "hostport" is a hostname followed optionally by a colon and
49	   port number), the "ark:" label, the Name Assigning Authority Number
50	   (NAAN), and the assigned Name.  The NAAN and Name together form the
51	   immutable persistent identifier for the object.

53	   An ARK request is an ARK with a service request and a question mark
54	   appended to it.  Use of an ARK request proceeds in two steps.  First,
55	   the NMAH, if not specified, is discovered based on the NAAN.  Two
56	   discovery methods are proposed:  one is file based, the other based
57	   on the DNS NAPTR record.  Second, the ARK request is submitted to the
58	   NMAH.  Three ARK services are defined, gaining access to:  (1) the
59	   object (or a sensible substitute), (2) a description of the object
60	   (metadata), and (3) a description of the commitment made by the NMA
61	   regarding the persistence of the object (policy).  These services are
62	   defined initially to use the HTTP protocol.  When the NMAH is
63	   specified, the ARK is a valid URL that can gain access to ARK
64	   services using an unmodified Web client.

66	1.  Introduction

68	   This document describes a scheme for the high-quality naming of
69	   information resources.  The scheme, called the Archival Resource Key
70	   (ARK), is well suited to long-term access and identification for any
71	   information resources that accommodate reasonably regular electronic
72	   description.  This includes digital documents, databases, software,
73	   and websites, as well as physical objects (such as books, bones, and
74	   statues) and intangible objects (chemicals, diseases, vocabulary
75	   terms, performances).  Hereafter the term "object" refers to an
76	   information resource.  The term ARK itself refers both to the scheme
77	   and to any single identifier that conforms to it.  A reasonably
78	   concise and accessible overview and rationale for the scheme is
79	   available at [ARK].

81	   Schemes for persistent identification of network-accessible objects
82	   are not new.  In the early 1990's, the design of the Uniform Resource
83	   Name [URNSYN] responded to the observed failure rate of URLs by
84	   articulating an indirect, non-hostname-based naming scheme and the
85	   need for responsible name management.  Meanwhile, promoters of the
86	   Digital Object Identifier [DOI] succeeded in building a community of
87	   providers around a mature software system that supports name
88	   management.  The Persistent Uniform Resource Locator [PURL] is a
89	   third scheme that has the unique advantage of working with unmodified
90	   web browsers.  The ARK scheme is a new approach.

92	   A founding principle of the ARK is that persistence is purely a
93	   matter of service.  Persistence is neither inherent in an object nor
94	   conferred on it by a particular naming syntax.  Rather, persistence
95	   is achieved through a provider's successful stewardship of objects
96	   and their identifiers.  The highest level of persistence will be
97	   reinforced by a provider's robust contingency, redundancy, and
98	   succession strategies.  It is further safeguarded to the extent that
99	   a provider's mission is shielded from marketplace and political
100	   instabilities.

102	1.1.  Three Reasons to Use ARKs

104	   The first requirement of an ARK is to give users a link from an
105	   object to a promise of stewardship for it.  That promise is a multi-
106	   faceted covenant that binds the word of an identified service
107	   provider to a specific set of responsibilities.  No one can tell if
108	   successful stewardship will take place because no one can predict the
109	   future.  Reasonable conjecture, however, may be based on past
110	   performance.  There must be a way to tie a promise of persistence to
111	   a provider's demonstrated or perceived ability -- its reputation --
112	   in that arena.  Provider reputations would then rise and fall as
113	   promises are observed variously to be kept and broken.  This is
114	   perhaps the best way we have for gauging the strength of any
115	   persistence promise.

117	   The second requirement of an ARK is to give users a link from an
118	   object to a description of it.  The problem with a naked identifier
119	   is that without a description real identification is incomplete.
120	   Identifiers common today are relatively opaque, though some contain
121	   ad hoc clues that reflect fleeting life cycle events such as the
122	   address of a short stay in a filesystem hierarchy.  Possession of
123	   both an identifier and an object is some improvement, but positive
124	   identification may still be elusive since the object itself might not
125	   include a matching identifier or might not carry evidence obvious
126	   enough to reveal its identity without significant research.  In
127	   either case, what is called for is a record bearing witness to the
128	   identifier's association with the object, as supported by a recorded
129	   set of object characteristics.  This descriptive record is partly an
130	   identification "receipt" with which users and archivists can verify
131	   an object's identity after brief inspection and a plausible match
132	   with recorded characteristics such as title and size.

134	   The final requirement of an ARK is to give users a link to the object
135	   itself (or to a copy) if at all possible.  Persistent access is the
136	   central duty of an ARK, with persistent identification playing a
137	   vital but supporting role.  Object access may not be feasible for
138	   various reasons, such as catastrophic loss of the object, a licensing
139	   agreement that keeps an archive "dark" for a period of years, or when
140	   an object's own lack of tangible existence precludes normal concepts
141	   of access (e.g., a vocabulary term might be accessed through its
142	   definition).  In such cases the ARK's identification role assumes a
143	   much higher profile.  But attempts to simplify the persistence
144	   problem by decoupling access from identification and concentrating
145	   exclusively on the latter are of questionable utility.  A perfect
146	   system for assigning forever unique identifiers might be created, but
147	   if it did so without reducing access failure rates, no one would be
148	   interested.  The central issue -- which may be summed up as the "HTTP
149	   404 Not Found" problem -- would not have been addressed.

151	1.2.  Organizing Support for ARKs

153	   An organization and the user community it serves can often be seen to
154	   struggle with two different areas of persistent identification: the
155	   Our Stuff problem and the Their Stuff problem.  In the Our Stuff
156	   problem, the organization wants its "own" objects to acquire
157	   persistent names.  It possesses or controls these objects, so the
158	   organization tackles the Our Stuff problem directly; being the
159	   responsible party, it can plan for, maintain, project, and make
160	   commitments about the objects.

162	   In the Their Stuff problem, the organization wants others' objects to
163	   acquire persistent names, in other words, objects that it does not
164	   own or control.  Some of these objects will be critically important
165	   to the organization but beyond its influence as far as persistence
166	   support is concerned.  As a result, creating and maintaining
167	   persistent identifiers for Their Stuff is difficult.

169	   Co-location of persistent access and identification services is
170	   natural.  Any organization that undertakes ongoing support of true
171	   persistent identification (which includes description) is well-served
172	   if it controls, owns, or otherwise has clear internal access to the
173	   identified objects, and this gives it an advantage if it wishes also
174	   to support persistent external access.  Conversely, persistent
175	   external access requires orderly internal collection management and
176	   all that that entails including monitoring, acquisition,
177	   verification, and change control over objects carrying identifiers
178	   persistent enough to support accountable record keeping practices;
179	   this covers the major prerequisite for external support of persistent
180	   identification.  Organizing ARK services under one roof thus tends to
181	   make sense.

183	   ARK support is not for everybody.  By requiring specific, revealed
184	   commitments to preservation, object access, and description, the bar
185	   for providing ARK services is high.  On the other hand, it would be
186	   hard to grant credence to a persistence promise from an organization
187	   that could not muster the minimum ARK services.  Not that there isn't
188	   a business model for an ARK-like, description-only service built on
189	   top of another organization's full complement of ARK services.  For
190	   example, there might be competition at the description level for
191	   abstracting and indexing a body of scientific literature archived in
192	   a combination of open and fee-based repositories.  Such a business
193	   would benefit more from persistence than it would directly support
194	   it.

196	1.3.  A Definition of Identifier

198	   Heretofore, persistence discussion has been hampered by a borrowed
199	   meaning for "identifier" that emerged as a side effect of defining
200	   the Uniform Resource Identifier in [URI]:

202	        (formerly)  An identifier is a sequence of characters with a
203	        restricted syntax ... that can act as a reference to something
204	        that has identity.

206	   The term works in context, but falters when employed for persistence.
207	   Troubling phrases arise, such as,

209	        "The goal is to create an identifier that does not break."

211	   As defined this kind of identifier "breaks" when it sustains damage
212	   to its character sequence, but really what breaks has to do with the
213	   identifier's reference role.  The following definition is proposed.

215	        (new definition)  An identifier is an association between a
216	        string (a sequence of characters) and an information resource.
217	        That association is made manifest by a record (e.g., a
218	        cataloging or other metadata record) that binds the identifier
219	        string to a set of identifying resource characteristics.

221	   The identifier (the association) must be vouched for by some sort of
222	   record.  In the complete absence of any testimony (e.g., metadata)
223	   regarding an association, a would-be identifier string is a
224	   meaningless sequence of characters.  To keep an externally visible
225	   but otherwise internal identifier string opaque to outsiders, for
226	   example, it suffices for an organization not to disclose the nature
227	   of its association.  For our immediate purpose, actual existence of
228	   an association record is more important than its authenticity.  If
229	   one is lucky an object carries its own identifier as part of itself
230	   (e.g., imprinted on the first page), but in processes such as
231	   resource discovery and retrieval the typical object is often unwieldy
232	   or unavailable (such as when licensing restrictions are in effect).
233	   A metadata record that includes the identifier string is the next
234	   best thing -- a conveniently manipulable surrogate that can act as
235	   both an association "receipt" and "declaration".

237	   It now makes sense to speak of preventing an identifier, as an
238	   association, from breaking.  Having said that, this document still
239	   (ab)uses the terms "ARK" and "identifier" as shorthands to refer to
240	   identifier strings, in other words, to sequences of characters.  Thus
241	   a discussion of ARK syntax refers to a string format, not an
242	   association format.  The context should make the meaning clear.

244	2.  ARK Anatomy

246	   An ARK is represented by a sequence of characters (a string) that
247	   contains the label, "ark:", optionally preceded by the beginning part
248	   of a URL.  Here is a diagrammed example.

250	               http://foobar.zaf.org/ark:/12025/654xz321
251	               \___________________/ \__/ \___/ \______/
252	                 (replaceable)        |     |      |
253	                      |         ARK Label   |    Name (assigned by the NAA)
254	                      |                     |
255	        Name Mapping Authority             Name Assigning Authority
256	               Hostport (NMAH)              Number (NAAN)

258	   The ARK syntax can be summarized,

260	                    [http://NMAH/]ark:/NAAN/Name

262	   where the NMAH part is in brackets to indicate that it is temporary,
263	   replaceable, and optional.
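
As a rough illustration of the summary syntax above, the following
Python sketch splits an ARK string into its NMAH, NAAN, and Name parts.
The regular expression and the names ARK_RE and split_ark are
assumptions made for this example and are not defined by the ARK scheme;
the NAAN is matched loosely as a run of digits.

```python
import re

# Hypothetical pattern for [scheme://NMAH/]ark:/NAAN/Name.
# The optional prefix is any URL-style service specification plus hostport.
ARK_RE = re.compile(
    r"^(?:[A-Za-z][A-Za-z0-9+.-]*://(?P<nmah>[^/]+)/)?"
    r"ark:/(?P<naan>[0-9]+)/(?P<name>.+)$"
)


def split_ark(ark):
    """Return (nmah, naan, name); nmah is None when no NMAH is present."""
    m = ARK_RE.match(ark)
    if m is None:
        raise ValueError("not an ARK: %r" % (ark,))
    return m.group("nmah"), m.group("naan"), m.group("name")


if __name__ == "__main__":
    print(split_ark("http://foobar.zaf.org/ark:/12025/654xz321"))
    print(split_ark("ark:/12025/654xz321"))
```

A string that matches this pattern is only a candidate ARK; the sketch
does not probe for the ARK services.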

265	2.1.  The Name Mapping Authority Hostport (NMAH)

267	   Before the "ark:" label may appear an optional Name Mapping Authority
268	   Hostport (NMAH) that is a temporary address where ARK service
269	   requests may be sent.  It consists of "http://" (or any service
270	   specification valid for a URL) followed by an Internet hostname or
271	   hostport combination having the same format and semantics as the
272	   hostport part of a URL.  The most important thing about the NMAH is
273	   that it is "identity inert" from the point of view of object
274	   identification.  In other words, ARKs that differ only in the
275	   optional NMAH part identify the same object.  Thus, for example, the
276	   following three ARKs are synonyms for but one information resource:

278	               http://foobar.zaf.org/ark:/12025/654xz321
279	             http://sneezy.dopey.com/ark:/12025/654xz321
280	                                     ark:/12025/654xz321
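
One way to picture the "identity inert" property is to discard the NMAH
before comparing.  The sketch below is illustrative only: the function
names core_ark and same_object are hypothetical, and the comparison
deliberately ignores every other normalization rule (such as the
treatment of hyphens in the Name).

```python
def core_ark(ark):
    """Drop any leading NMAH part, keeping 'ark:/NAAN/Name'."""
    if ark.startswith("ark:/"):
        return ark
    start = ark.find("/ark:/")
    if start < 0:
        raise ValueError("no ark: label found in %r" % (ark,))
    return ark[start + 1:]


def same_object(ark_a, ark_b):
    """True when two ARKs differ at most in their NMAH part."""
    return core_ark(ark_a) == core_ark(ark_b)


if __name__ == "__main__":
    synonyms = [
        "http://foobar.zaf.org/ark:/12025/654xz321",
        "http://sneezy.dopey.com/ark:/12025/654xz321",
        "ark:/12025/654xz321",
    ]
    print({core_ark(a) for a in synonyms})
```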

282	   The NMAH part makes an ARK into an actionable URL.  Conversely, any
283	   URL whose path component begins with "ark:/" stands a reasonable
284	   chance of being an ARK (only because such URLs are not common), but
285	   further verification is still required (such as probing the URL for
286	   the three ARK services).

288	   The NMAH part is temporary, disposable, and replaceable.  Over time
289	   the NMAH will likely stop working and have to be replaced with a
290	   currently active service provider.  This relies on a mapping authority
291	   discovery process, of which two alternate methods are outlined in a
292	   later section.  Meanwhile, a carefully chosen NMAH can be as durable
293	   as any Internet domain name, and so may last for a decade or longer.
294	   Users should be prepared, however, to refresh the NMAH because the
295	   one found in the URL form of the ARK may have stopped working.

297	   The above method for creating an actionable identifier from a basic
298	   ARK (prepending "http://" and an NMAH) is itself temporary.  Assuming
299	   that the reign of [HTTP] in information retrieval will end one day,
300	   ARKs will have to be converted into new kinds of actionable
301	   identifiers.  In any event, if ARKs see widespread use, web browsers
302	   would presumably evolve to perform this (currently simple)
303	   transformation automatically.

305	2.2.  The Name Assigning Authority Number (NAAN)

307	   The part of the ARK directly following the "ark:" is the Name
308	   Assigning Authority Number (NAAN) enclosed in `/' (slash) characters.
309	   This part is always required, as it identifies the organization that
310	   originally assigned the Name of the object.  It is used to discover a
311	   currently valid NMAH and to provide top-level partitioning of the
312	   space of all ARKs.  NAANs are registered in a manner similar to URN
313	   Namespaces, but they are pure numbers consisting of 5 digits or 9
314	   digits.  Thus, the first 100,000 registered NAAs fit compactly into
315	   the 5 digits, and if growth warrants, the next billion fit into the 9
316	   digit form.  In either case the fixed odd number of digits helps
317	   reduce the chances of finding a NAAN out of context and confusing it
318	   with nearby quantities such as 4-digit dates.
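
The digit-count rule lends itself to a simple check.  The following is
a minimal sketch, assuming that a NAAN is written with plain ASCII
digits; the name is_valid_naan is hypothetical, and the check says
nothing about whether a given NAAN has actually been registered.

```python
import re

# Exactly 5 or exactly 9 ASCII digits; [0-9] is used instead of \d so
# that non-ASCII digit characters are rejected.
NAAN_RE = re.compile(r"[0-9]{5}|[0-9]{9}")


def is_valid_naan(naan):
    """True when `naan` has the 5-digit or 9-digit form."""
    return NAAN_RE.fullmatch(naan) is not None


if __name__ == "__main__":
    for candidate in ("12025", "2004", "123456789", "123456"):
        print(candidate, is_valid_naan(candidate))
```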

320	2.3.  The Name Part

322	   The final part of the ARK is the Name assigned by the NAA, and it is
323	   also required.  The Name is a string of visible ASCII characters and
324	   should be less than 128 bytes in length.  The length restriction
325	   keeps the ARK short enough to append ordinary ARK request strings
326	   without running into transport restrictions within HTTP GET requests.
327	   Characters may be letters, digits, or any of these seven characters:

329	         =   @   $   _   *   +   #

331	   The following characters may also be used, but in limited ways:

333	         /   .   -   %

335	   The characters `/' and `.' are ignored if either appears as the last
336	   character of an ARK.  If used internally, they allow a name assigning
337	   authority to reveal object hierarchy and object variants as described
338	   in the next two sections.

340	   A `-' (hyphen) may appear in an ARK, but must be ignored in lexical
341	   comparisons.  The `%' character is reserved for %-encoding all other
342	   octets that would appear in the ARK string, in the same manner as for
343	   URIs [URI].  A %-encoded octet consists of a `%' followed by two hex
344	   digits; for example, "%7d" stands in for `}'.  Lower case hex digits
345	   are preferred to reduce the chances of false acronym recognition;
346	   thus it is better to use "%acT" instead of "%ACT".  The character `%'
347	   itself must be represented using "%25".  As with URNs, %-encoding
348	   permits ARKs to support legacy namespaces (e.g., ISBN, ISSN, SICI)
349	   that have less restricted character repertoires [URNBIB].
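
The character rules above can be sketched in Python as follows.  This is
an illustration under stated assumptions, not a reference
implementation: the names LITERAL, encode_name, and comparison_key are
hypothetical, the input to encode_name is assumed to be a legacy
identifier given as bytes, and the handling of trailing `/' and `.' is
left out.

```python
import string

# Characters that may appear literally in a Name: letters, digits, the
# symbols listed above, and the limited-use characters other than '%'.
LITERAL = set(string.ascii_letters + string.digits + "=@$_*+#" + "/.-")


def encode_name(raw):
    """%-encode every octet of `raw` (bytes) that may not appear literally."""
    out = []
    for octet in raw:
        ch = chr(octet)
        if ch in LITERAL:
            out.append(ch)
        else:
            # Lower-case hex digits, as the text prefers; '%' becomes %25.
            out.append("%%%02x" % octet)
    return "".join(out)


def comparison_key(name):
    """Form used for lexical comparison, in which hyphens are ignored."""
    return name.replace("-", "")


if __name__ == "__main__":
    print(encode_name(b"a}b"))
    print(comparison_key("654-xz-321") == comparison_key("654xz321"))
```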

351	   The creation of names that include linguistically based constructs
352	   (having recognizable meaning from natural language) is strongly
353	   discouraged if long-term persistence is a naming priority.  Such names
354	   do not age or travel well.  Names that look more or less like numbers
355	   avoid common problems that defeat persistence and international
356	   acceptance.  The use of digits is highly recommended.  Mixing in non-
357	   vowel alphabetic characters is a relatively safe and easy way to
358	   achieve more compact names, although any character repertoire can
359	   work if potentially troublesome names will be discarded during a
360	   screening process.  More on naming considerations is given in a later
361	   section.

363	2.3.1.  Names that Reveal Object Hierarchy

365	   A name assigning authority may choose to reveal the presence of a
366	   hierarchical relationship between objects using the `/' (slash)
367	   character in the Name part of an ARK.  If the Name contains an
368	   internal slash, the piece to its left indicates a containing object.
369	   For example, publishing an ARK of the form,

371	                         ark:/12025/654/xz/321

373	   is equivalent to publishing three ARKs,

375	                         ark:/12025/654/xz/321
376	                         ark:/12025/654/xz
377	                         ark:/12025/654

379	   together with a declaration that the first object is contained in the
380	   second object, and that the second object is contained in the third.
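
The equivalence just described can be sketched as a small Python
function.  The name implied_by_hierarchy is hypothetical, and the sketch
assumes an ARK with no NMAH part and a Name that is already free of
leading, trailing, or doubled slashes.

```python
def implied_by_hierarchy(ark):
    """List `ark` followed by the ARKs of its containing objects."""
    # "ark:/12025/654/xz/321" -> "ark:", "12025", "654/xz/321"
    label, naan, name = ark.split("/", 2)
    parts = name.split("/")
    return [
        "/".join([label, naan] + parts[:n])
        for n in range(len(parts), 0, -1)
    ]


if __name__ == "__main__":
    for implied in implied_by_hierarchy("ark:/12025/654/xz/321"):
        print(implied)
```

Each entry in the returned list names an object contained in the object
named by the entry after it.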

382	   Revealing the presence of hierarchy is completely up to the assigning
383	   authority.  It is hard enough to commit to one object's name, let
384	   alone to three objects' names and to a specific, ongoing relatedness
385	   among them.  Thus, regardless of whether hierarchy was present
386	   initially, the assigning authority, by not using slashes, reveals no
387	   shared inferences about hierarchical or other inter-relatedness in
388	   the following ARKs:

390	                         ark:/12025/654_xz_321
391	                         ark:/12025/654_xz
392	                         ark:/12025/654xz321
393	                         ark:/12025/654xz
394	                         ark:/12025/654

396	   Note that slashes around the ARK's NAAN (/12025/ in these examples)
397	   are not part of the ARK's Name and therefore do not indicate the
398	   existence of some sort of NAAN super object containing all objects in
399	   its namespace.  A slash must have at least one non-structural
400	   character (one that is neither a slash nor a period) on both sides
401	   in order for it to separate recognizable structural components.  So
402	   initial or final slashes may be removed, and double slashes may be
403	   converted into single slashes.
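
A minimal sketch of this slash cleanup, applied to the Name part only,
might look like the following.  The name normalize_slashes is
hypothetical; the sketch collapses runs of any length rather than just
pairs, and it does not handle slashes that are adjacent to periods.

```python
import re


def normalize_slashes(name):
    """Collapse repeated slashes and drop initial or final slashes."""
    return re.sub(r"/{2,}", "/", name).strip("/")


if __name__ == "__main__":
    print(normalize_slashes("/654//xz/321/"))
```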

405	2.3.2.  Names that Reveal Object Variants

407	   A name assigning authority may choose to reveal the possible presence
408	   of variant objects using the `.' (period) character in the Name part
409	   of an ARK.  If the Name contains an internal period, the piece to its
410	   left is a base name and the piece to its right, up to the end of the
411	   ARK or to the next period, is a suffix.  A Name may have more than
412	   one suffix, for example,

414	                         ark:/12025/654.24
415	                         ark:/12025/xz4/654.24
416	                         ark:/12025/654.f55.g78.v20

418	   There are two main rules.  First, if two ARKs share the same base
419	   name but have different suffixes, the corresponding objects were
420	   considered variants of each other (different formats, languages,
421	   versions, etc.) by the assigning authority.  Thus, the following ARKs
422	   are variants of each other:

424	                         ark:/12025/654.f55.g78.v20
425	                         ark:/12025/654.321xz
426	                         ark:/12025/654.44

428	   Second, publishing an ARK with a suffix implies the existence of at
429	   least one variant identified by the ARK without its suffix.  The ARK
430	   otherwise permits no further assumptions about what variants might
431	   exist.  So publishing the ARK,

433	                         ark:/12025/654.f55.g78.v20

435	   is equivalent to publishing the four ARKs,

437	                         ark:/12025/654.f55.g78.v20
438	                         ark:/12025/654.f55.g78
439	                         ark:/12025/654.f55
440	                         ark:/12025/654
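
The two rules above can be sketched informally in Python.  This is an
illustration only, not part of the specification; the function names
are hypothetical, and the sketch assumes an already normalized ARK
whose suffixes all follow the last slash.

```python
def implied_arks(ark):
    """Return the ARK plus every shorter ARK implied by dropping suffixes."""
    prefix, _, last = ark.rpartition("/")   # suffixes live in the last component
    parts = last.split(".")                 # [base, suffix1, suffix2, ...]
    return [prefix + "/" + ".".join(parts[:n])
            for n in range(len(parts), 0, -1)]


def base_name(ark):
    """Everything up to (not including) the first period after the last slash."""
    prefix, _, last = ark.rpartition("/")
    return prefix + "/" + last.split(".", 1)[0]


def are_variants(a, b):
    """First rule: same base name, different suffixes."""
    return a != b and base_name(a) == base_name(b)


for implied in implied_arks("ark:/12025/654.f55.g78.v20"):
    print(implied)
```

Run as written, the loop prints the four ARKs listed above, longest
first.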

442	   Revealing the possibility of variants is completely up to the
443	   assigning authority.  It is hard enough to commit to one object's
444	   name, let alone to multiple variants' names and to a specific, ongoing
445	   relatedness among them.  The assigning authority is the sole arbiter
446	   of what constitutes a variant within its namespace, and whether to
447	   reveal that kind of relatedness by using periods within its names.

449	   A period must have at least one non-structural character (one that is
450	   neither a slash nor a period) on both sides in order for it to
451	   separate recognizable structural components.  So initial or final
452	   periods may be removed, and double periods may be converted into
453	   single periods.  Multiple suffixes should be arranged in sorted order
454	   (pure ASCII collating sequence) at the end of an ARK.

456	2.3.3.  Hyphens are Ignored

458	   Hyphens are always ignored in ARKs.  Hyphens may be added to an ARK's
459	   Name part for readability, or during the formatting and wrapping of
460	   text lines, but (as in phone numbers) they are treated as if they
461	   were not present.  Thus, like the NMAH, hyphens are "identity inert"
462	   in comparing ARKs for equivalence.  For example, the following ARKs
463	   are equivalent for purposes of comparison and ARK service access:

465	                                    ark:/12025/65-4-xz-321
466	                    ark:sneezy.dopey.com/12025/654--xz32-1
467	                                    ark:/12025/654xz321

469	2.4.  Normalization and Lexical Equivalence

471	   To determine if two or more ARKs identify the same object, the ARKs
472	   are compared for lexical equivalence after first being normalized.
473	   Since ARK strings may appear in various forms (e.g., having different
474	   NMAHs), normalizing them minimizes the chances that comparing two ARK
475	   strings for equality will fail unless they actually identify
476	   different objects.  In a specified-host ARK (one having an NMAH), the
477	   NMAH never participates in such comparisons.

479	   Normalization of an ARK for the purpose of octet-by-octet equality
480	   comparison with another ARK consists of four steps.  First, any upper
481	   case letters in the "ark:" label and the two characters following a
482	   `%' are converted to lower case.  The case of all other letters in
483	   the ARK string must be preserved.  Second, any NMAH part is removed
484	   (everything from an initial "http://" up to the next slash) and all
485	   hyphens are removed.

487	   Third, structural characters (slash and period) are normalized.
488	   Initial and final occurrences are removed, and two structural
489	   characters in a row (e.g., // or ./) are replaced by the first
490	   character, iterating until each occurrence has at least one non-
491	   structural character on both sides.  In addition, if there are any
492	   components with a period on the left and a slash on the right, either
493	   the component and the preceding period must be moved to the end of
494	   the Name part or the ARK must be thrown out as malformed.

496	   The fourth and final step is to arrange the suffixes in ASCII
497	   collating sequence (that is, to sort them) and to remove duplicate
498	   suffixes, if any.  It is also permissible to throw out ARKs for which
499	   the suffixes are not sorted.

501	   The resulting ARK string is now normalized.  Comparisons between
502	   normalized ARKs are case-sensitive, meaning that upper case letters
503	   are considered different from their lower case counterparts.
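
The four normalization steps might be sketched in Python as follows.
This is a hedged illustration, not a reference implementation.  The
function name is hypothetical.  The sketch assumes an NMAH appears
either as an initial "http://host/" or between "ark:" and the first
slash, and it chooses to reject (rather than repair) a suffix that is
followed by a slash.

```python
import re


def normalize_ark(ark):
    """Normalize an ARK string for octet-by-octet comparison (sketch)."""
    # Step 2, first half: drop an NMAH written as an initial "http://host/".
    ark = re.sub(r"^http://[^/]*/", "", ark, flags=re.IGNORECASE)
    label, sep, rest = ark.partition(":")
    if not sep or label.lower() != "ark":       # step 1: "ark:" label in lower case
        raise ValueError("not an ARK: %r" % ark)
    # Step 1: lower-case the two characters following each '%'.
    rest = re.sub(r"%[0-9A-Fa-f]{2}", lambda m: m.group(0).lower(), rest)
    # Step 2, second half: hyphens are identity inert.
    rest = rest.replace("-", "")
    # Assumption: anything between "ark:" and the first slash is an NMAH.
    slash = rest.find("/")
    if slash < 0:
        raise ValueError("no NAAN found in %r" % ark)
    naan, _, name = rest[slash + 1:].partition("/")
    # Step 3: collapse runs of structural characters to the first character
    # of the run, then trim initial and final occurrences.
    name = re.sub(r"([/.])[/.]+", r"\1", name).strip("/.")
    if re.search(r"\.[^/.]+/", name):           # suffix followed by a slash
        raise ValueError("malformed ARK: %r" % ark)
    # Step 4: sort suffixes in ASCII order and remove duplicates.
    base, *suffixes = name.split(".")
    return "ark:/%s/%s" % (naan, ".".join([base] + sorted(set(suffixes))))


print(normalize_ark("ark:sneezy.dopey.com/12025/654--xz32-1"))
```

Two ARKs are then lexically equivalent when their normalized strings
compare equal, with case significant.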

505	   To keep ARK string variation to a minimum, no reserved ARK characters
506	   should be %-encoded unless it is deliberately to conceal their
507	   reserved meanings.  No non-reserved ARK characters should ever be
508	   %-encoded.  Finally, no %-encoded character should ever appear in an
509	   ARK in its decoded form.

511	2.5.  Naming Considerations

513	   The ARK has different goals from the URI, so it has different
514	   character set requirements.  Because linguistic constructs imperil
515	   persistence, for ARKs non-ASCII character support is unimportant.
516	   ARKs and URIs share goals of transcribability and transportability
517	   within web documents, so characters are required to be visible, non-
518	   conflicting with HTML/XML syntax, and not subject to tampering during
519	   transmission across common transport gateways.  Add the goal of
520	   making an undelimited ARK recognizable in running prose, as in
521	   ark:/12025/=@_22*$, and certain punctuation characters (e.g., comma,
522	   period) end up being excluded from the ARK lest the end of a phrase
523	   or sentence be mistaken for part of the ARK.

525	   A valuable technique for provision of persistent objects is to try to
526	   arrange for the complete identifier to appear on, with, or near its
527	   retrieved object.  An object encountered at a moment in time when its
528	   discovery context has long since disappeared could then easily be
529	   traced back to its metadata, to alternate versions, to updates, etc.
530	   This has seen reasonable success, for example, in book publishing and
531	   software distribution.

533	   If persistence is the goal, a deliberate local strategy for
534	   systematic name assignment is crucial.  Names must be chosen with
535	   great care.  Poorly chosen and managed names will devastate any
536	   persistence strategy, and they do not discriminate based on naming
537	   scheme.  Whether a mistakenly re-assigned identifier is a URN, DOI,
538	   PURL, URL, or ARK, the damage -- failed access and confusion -- is
539	   not mitigated more in one scheme than in another.  Conversely, in-
540	   house efforts to manage names responsibly will go much further
541	   towards safeguarding persistence than any choice of naming scheme or
542	   name resolution technology.

544	   Hostnames appearing in any identifier meant to be persistent must be
545	   chosen with extra care.  The tendency in hostname selection has
546	   traditionally been to choose a token with recognizable attributes,
547	   such as a corporate brand, but that tendency wreaks havoc with
548	   persistence that is supposed to outlive brands, corporations, subject
549	   classifications, and natural language semantics (e.g., what did the
550	   three letters "gay" mean in 1958, 1978, and 1998?).  Today's
551	   recognized and correct attributes are tomorrow's stale or incorrect
552	   attributes.  In making hostnames (any names, actually) long-term
553	   persistent, it helps to eliminate recognizable attributes to the
554	   extent possible.  This affects selection of any name based on URLs,
555	   including PURLs and the explicitly disposable NMAHs.  There is no
556	   excuse for a provider that manages its internal names impeccably not
557	   to exercise the same care in choosing what could be an exceptionally
558	   durable hostname, especially if it would form the prefix for all the
559	   provider's URL-based external names.  Registering an opaque hostname
560	   in the ".org" or ".net" domain would not be a bad start.

562	   Dubious persistence speculation does not make selecting naming
563	   strategies any easier.  For example, despite rumors to the contrary,
564	   there are really no obvious reasons why the organizations registering
565	   DNS names, URN Namespaces, and DOI publisher IDs should have among
566	   them one that is intrinsically more fallible than the next.
567	   Moreover, it is a misconception that the demise of DNS and of HTTP
568	   need adversely affect the persistence of URLs.  At such a time,
569	   certainly URLs from the present day might not then be actionable by
570	   our present-day mechanisms, but resolution systems for future non-
571	   actionable URLs are no harder to imagine than resolution systems for
572	   present-day non-actionable URNs and DOIs.  There is no more stable a
573	   namespace than one that is dead and frozen, and that would then
574	   characterize the space of names bearing the "http://" prefix.  It is
575	   useful to remember that just because hostnames have been carelessly
576	   chosen in their brief history does not mean that they are unsuitable
577	   in NMAHs (and URLs) intended for use in situations demanding the
578	   highest level of persistence available in the Internet environment.
579	   A well-planned name assignment strategy is everything.

581	3.  Assigners of ARKs

583	   A Name Assigning Authority (NAA) is an organization that creates (or
584	   delegates creation of) long-term associations between identifiers and
585	   information objects.  Examples of NAAs include national libraries,
586	   national archives, and publishers.  An NAA may arrange with an
587	   external organization for identifier assignment.  The US Library of
588	   Congress, for example, allows OCLC (the Online Computer Library
589	   Center, a major world cataloger of books) to create associations
590	   between Library of Congress Control Numbers (LCCNs) and the books
591	   that OCLC processes.  A cataloging record is generated that testifies
592	   to each association, and the identifier is included by the publisher,
593	   for example, in the front matter of a book.

595	   An NAA does not so much create an identifier as create an
596	   association.  The NAA first draws an unused identifier string from
597	   its namespace, which is the set of all identifiers under its control.
598	   It then records the assignment of the identifier to an information
599	   object having sundry witnessed characteristics, such as a particular
600	   author and modification date.  A namespace is usually reserved for an
601	   NAA by agreement with recognized community organizations (such as
602	   IANA and ISO) that all names containing a particular string be under
603	   its control.  In the ARK an NAA is represented by the Name Assigning
604	   Authority Number (NAAN).

606	   The ARK namespace reserved for an NAA is the set of names bearing its
607	   particular NAAN.  For example, all strings beginning with
608	   "ark:/12025/" are under control of the NAA registered under 12025,
609	   which might be the National Library of Finland.  Because each NAA has
610	   a different NAAN, names from one namespace cannot conflict with those
611	   from another.  Each NAA is free to assign names from its namespace
612	   (or delegate assignment) according to its own policies.  These
613	   policies must be documented in a manner similar to the declarations
614	   required for URN Namespace registration [URNNID].

616	   For now, registration of ARK NAAs is in a bootstrapping phase.  To
617	   register, please read about the mapping authority discovery file in
618	   the next section and send email to ark@cdlib.org.

620	4.  Finding a Name Mapping Authority

622	   In order to derive an actionable identifier (these days, a URL) from
623	   an ARK, a hostport (hostname or hostname plus port combination) for a
624	   working Name Mapping Authority (NMA) must be found.  An NMA is a
625	   service that is able to respond to the three basic ARK service
626	   requests.  Relying on registration and client-side discovery, NMAs
627	   make known which NAAs' identifiers they are willing to service.

629	   Upon encountering an ARK, a user (or client software) looks inside it
630	   for the optional NMAH part (the hostport of the NMA's ARK service).
631	   If it contains an NMAH that is working, this NMAH discovery step may
632	   be skipped; the NMAH effectively uses the beginning of an ARK to
633	   cache the results of a prior mapping authority discovery process.  If
634	   a new NMAH needs to be found, the client looks inside the ARK again for
635	   the NAAN (Name Assigning Authority Number).  Querying a global
636	   database, it then uses the NAAN to look up all current NMAHs that
637	   service ARKs issued by the identified NAA.  The global database is
638	   key, and two specific methods for querying it are given in this
639	   section.
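
The discovery decision just described might look like the following
Python sketch.  It is illustrative only.  The names discover_nmahs,
lookup, and is_working are hypothetical, and the pattern assumes an
ARK of the form [http://NMAH/]ark:/NAAN/Name.  The global database is
stood in for by whatever lookup callable the caller supplies.

```python
import re

# Assumed layout: optional "http://NMAH/", then "ark:/NAAN/Name".
ARK_RE = re.compile(
    r"^(?:http://(?P<nmah>[^/]+)/)?ark:/(?P<naan>[^/]+)/(?P<name>.*)$",
    re.IGNORECASE,
)


def discover_nmahs(ark, lookup, is_working=lambda hostport: True):
    """Return candidate NMAHs: the embedded one if usable, else a NAAN lookup."""
    match = ARK_RE.match(ark)
    if match is None:
        raise ValueError("not a recognizable ARK: %r" % ark)
    nmah = match.group("nmah")
    if nmah and is_working(nmah):
        return [nmah]                        # discovery step may be skipped
    return list(lookup(match.group("naan")))  # query the global database


# Hypothetical stand-in for the global database.
table = {"12025": ["ark.nlm.nih.gov", "foobar.zaf.org"]}
print(discover_nmahs("ark:/12025/654xz321", lambda naan: table.get(naan, [])))
```

Passing the lookup as a parameter keeps the sketch independent of
either specific query method.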

641	   In the interests of long-term persistence, however, ARK mechanisms
642	   are first defined in high-level, protocol-independent terms so that
643	   mechanisms may evolve and be replaced over time without compromising
644	   fundamental service objectives.  Either or both specific methods
645	   given here may eventually be supplanted by better methods since, by
646	   design, the ARK scheme does not depend on a particular method, but
647	   only on having some method to locate an active NMAH.

649	   At the time of issuance, at least one NMAH for an ARK should be
650	   prepared to service it.  That NMA may or may not be administered by
651	   the Name Assigning Authority (NAA) that created it.  Consider the
652	   following hypothetical example of providing long-term access to a
653	   cancer research journal.  The publisher wishes to turn a profit and
654	   the National Library of Medicine wishes to preserve the scholarly
655	   record.  An agreement might be struck whereby the publisher would act
656	   as the NAA and the national library would archive the journal issue
657	   when it appears, but without providing direct access for the first
658	   six months.  During the first six months of peak commercial
659	   viability, the publisher would retain exclusive delivery rights and
660	   would charge access fees.  Again, by agreement, both the library and
661	   the publisher would act as NMAs, but during that initial period the
662	   library would redirect requests for issues less than six months old
663	   to the publisher.  At the end of the waiting period, the library
664	   would then begin servicing requests for issues older than six months
665	   by tapping directly into its own archives.  Meanwhile, the publisher
666	   might routinely redirect incoming requests for older issues to the
667	   library.  Long-term access is thereby preserved, and so is the
668	   commercial incentive to publish content.

670	   There is never a requirement that an NAA also run an NMA service,
671	   although that seems a likely scenario.  Over time NAAs and NMAs
672	   would come and go.  One NMA would succeed another, and there might be
673	   many NMAs serving the same ARKs simultaneously (e.g., as mirrors or
674	   as competitors).  There might also be asymmetric but coordinated NMAs
675	   as in the library-publisher example above.

677	4.1.  Looking Up NMAHs in a Globally Accessible File

679	   This subsection describes a way to look up NMAHs using a simple text
680	   file.  For efficient access the file may be stored in a local
681	   filesystem, but it needs to be reloaded periodically to incorporate
682	   updates.  It is not expected that the size of the file or frequency
683	   of update should impose an undue maintenance or searching burden any
684	   time soon, for even primitive linear search of a file with ten-
685	   thousand NAAs is a subsecond operation on modern server machines.
686	   The proposed file strategy is similar to the /etc/hosts file strategy
687	   that supported Internet host address lookup for a period of years
688	   before the advent of the Domain Name System [DNS].

690	   A copy of the current file (at the time of writing) appears in an
691	   appendix and is available on the web.  A minimal version of the file
692	   appears below.  Comment lines (lines that begin with `#') explain the
693	   format and give the file's modification time, reloading address, and
694	   NAA registration instructions.  There is even a Perl script that
695	   processes the file embedded in the file's comments.  Because this is
696	   still a proposed file, none of the values in it are real.

698	         #
699	         # Name Assigning Authority / Name Mapping Authority Lookup Table
700	         #       Last change:   2 June 2004
701	         #       Reload from:   http://ark.nlm.nih.gov/etc/natab
702	         #       Mirrored at:   http://ark.cdlib.org/natab
703	         #       To register:   mailto:ark@cdlib.org?Subject=naareg
704	         #       Process with:  Perl script at end of this file (optional)
705	         #
706	         # Each NAA appears at the beginning of a line with the NAA Number
707	         # first, a colon, and an ARK or URL to a statement of naming policy
708	         # (see http://ark.cdlib.org for an example).
709	         # All the NMA hostports that service an NAA are listed, one per
710	         # line, indented, after the corresponding NAA line.
711	         #
712	         #       National Library of Medicine
713	         12025:  http://www.nlm.nih.gov/xxx/naapolicy.html
714	                 ark.nlm.nih.gov USNLM
715	                 foobar.zaf.org UCSF
716	                 sneezy.dopey.com BIREME
717	         #
718	         #       Library of Congress
719	         12026:  http://www.loc.gov/xxx/naapolicy.html
720	                 foobar.zaf.org USLC
721	                 sneezy.dopey.com USLC
722	         #
723	         #       National Agriculture Library
724	         12027:  http://www.nal.gov/xxx/naapolicy.html
725	                 foobar.zaf.gov:80 USNAL
726	         #
727	         #       California Digital Library
728	         13030:  http://www.cdlib.org/inside/diglib/ark/
729	                 ark.cdlib.org CDL
730	         #
731	         #       World Intellectual Property Organization
732	         13038:  http://www.wipo.int/xxx/naapolicy.html
733	                 www.wipo.int WIPO
734	         #
735	         #       University of California San Diego
736	         20775:  http://library.ucsd.edu/xxx/naapolicy.html
737	                 ucsd.edu UCSD
738	         #
739	         #       University of California San Francisco
740	         29114:  http://library.ucsf.edu/xxx/naapolicy.html
741	                 ucsf.edu UCSF
742	         #
743	         #       University of California Berkeley
744	         28722:  http://library.berkeley.edu/xxx/naapolicy.html
745	                 berkeley.edu UCB
746	         #
747	         #       Rutgers University Libraries
748	         15230:  http://rci.rutgers.edu/xxx/naapolicy.html
749	                 rutgers.edu RUL
750	         #
751	         #--- end of data ---
752	         # The following Perl script takes an NAA as argument and outputs
753	         # the NMAs in this file listed under any matching NAA.
754	         #
755	         # my $naa = shift;
756	         # while (<>) {
757	         #       next if (! /^$naa:/);
758	         #       while (<>) {
759	         #               last if (! /^[#\s]./);
760	         #               print "$1\n" if (/^\s+(\S+)/);
761	         #       }
762	         # }
763	         #
764	         # Create a g/t/nroff-safe version of this table with the UNIX command,
765	         #
766	         #       expand natab | sed 's/\\/\\\e/g' > natab.roff
767	         #
768	         # end of file
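
For comparison with the embedded Perl script, a file in this format
could also be read with a short Python routine.  This is a sketch
under stated assumptions: the function name parse_natab is
hypothetical, and the sample text assumes that, in the actual file,
NAA lines and comment lines start in the first column (the listing
above is indented only for presentation).

```python
def parse_natab(text):
    """Map each NAA Number to the list of NMA hostports listed under it."""
    table = {}
    current = None
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue                          # blank line or comment
        if line[0].isspace():                 # indented: an NMA hostport line
            if current is not None:
                table[current].append(line.split()[0])
        else:                                 # "NAAN:  policy-URL"
            current = line.partition(":")[0].strip()
            table[current] = []
    return table


SAMPLE = """\
#       National Library of Medicine
12025:  http://www.nlm.nih.gov/xxx/naapolicy.html
        ark.nlm.nih.gov USNLM
        foobar.zaf.org UCSF
        sneezy.dopey.com BIREME
#
#       National Agriculture Library
12027:  http://www.nal.gov/xxx/naapolicy.html
        foobar.zaf.gov:80 USNAL
"""

print(parse_natab(SAMPLE)["12025"])
```

Loading the whole table into a dictionary is a design choice of the
sketch; the file's own Perl script scans linearly instead.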

770	4.2.  Looking up NMAHs Distributed via DNS

772	   This subsection introduces a method for looking up NMAHs that is
773	   based on the method for discovering URN resolvers described in
774	   [NAPTR].  It relies on querying the DNS system already installed in
775	   the background infrastructure of most networked computers.  A query
776	   is submitted to DNS asking for a list of resolvers that match a given
777	   NAAN.  DNS distributes the query to the particular DNS servers that
778	   can best provide the answer, unless the answer can be found more
779	   quickly in a local DNS cache as a side-effect of a recent query.
780	   Responses come back inside Name Authority Pointer (NAPTR) records.
781	   The normal result is one or more candidate NMAHs.

783	   In its full generality the [NAPTR] algorithm ambitiously accommodates
784	   a complex set of preferences, orderings, protocols, mapping services,
785	   regular expression rewriting rules, and DNS record types.  This
786	   subsection proposes a drastic simplification of it for the special
787	   case of ARK mapping authority discovery.  The simplified algorithm is
788	   called Maptr.  It uses only one DNS record type (NAPTR) and restricts
789	   most of its field values to constants.  The following hypothetical
790	   excerpt from a DNS data file for the NAAN known as 12026 shows three
791	   example NAPTR records ready to use with the Maptr algorithm.

793	       12026.ark.arpa.
794	       ;; US Library of Congress
795	       ;;       order pref flags service regexp  replacement
796	        IN NAPTR  0     0   "h"  "ark"   "USLC"  lhc.nlm.nih.gov:8080
797	        IN NAPTR  0     0   "h"  "ark"   "USLC"  foobar.zaf.org
798	        IN NAPTR  0     0   "h"  "ark"   "USLC"  sneezy.dopey.com

800	   All the fields are held constant for Maptr except for the "flags",
801	   "regexp", and "replacement" fields.  The "service" field contains the
802	   constant value "ark" so that NAPTR records participating in the Maptr
803	   algorithm will not be confused with other NAPTR records.  The "order"
804	   and "pref" fields are held to 0 (zero) and otherwise ignored for now;
805	   the algorithm may evolve to use these fields for ranking decisions
806	   when usage patterns and local administrative needs are better
807	   understood.

809	   When a Maptr query returns a record with a flags field of "h" (for
810	   hostport, a Maptr extension to the NAPTR flags), the replacement
811	   field contains the NMAH (hostport) of an ARK service provider.  When
812	   a query returns a record with a flags field of "" (the empty string),
813	   the client needs to submit a new query containing the domain name
814	   found in the replacement field.  This second sort of record exploits
815	   the distributed nature of DNS by redirecting the query to another
816	   domain name.  It looks like this.

818	       12345.ark.arpa.
819	       ;; Digital Library Consortium
820	       ;;       order pref flags service regexp replacement
821	        IN NAPTR  0     0    ""  "ark"     ""   dlc.spct.org.

823	   Here is the Maptr algorithm for ARK mapping authority discovery.  In
824	   it replace <NAAN> with the NAAN from the ARK for which an NMAH is
825	   sought.

827	        (1) Initialize the DNS query:  type=NAPTR,
828	        query=<NAAN>.ark.arpa.

830	        (2) Submit the query to DNS and retrieve (NAPTR) records,
831	        discarding any record that does not have "ark" for the service
832	        field.

834	        (3) All remaining records with a flags field of "h" contain
835	        candidate NMAHs in their replacement fields.  Set them aside, if
836	        any.

838	        (4) Any record with an empty flags field ("") has a replacement
839	        field containing a new domain name to which a subsequent query
840	        should be redirected.  For each such record, set
841	        query=<replacement> then go to step (2).  When all such records
842	        have been recursively exhausted, go to step (5).

844	        (5) All redirected queries have been resolved and a set of
845	        candidate NMAHs has been accumulated from step (3).  If there are
846	        zero NMAHs, exit -- no mapping authority was found.  If there are
847	        one or more NMAHs, choose one using any criteria you wish, then
848	        exit.

850	   A Perl script that implements this algorithm is included here.

852	     #!/depot/bin/perl

854	     use Net::DNS;                 # include simple DNS package
855	     my $qtype = "NAPTR";               # initialize query type
856	     my $naa = shift;              # get NAAN script argument
857	     my $mad = new Net::DNS::Resolver;  # mapping authority discovery

859	     &maptr("$naa.ark.arpa");      # call maptr - that's it

861	     sub maptr {                   # recursive maptr algorithm
862	          my $dname = shift;       # domain name as argument
863	          my ($rr, $flags,
864	               $replacement);
865	          my $query = $mad->query($dname, $qtype);
866	          return                   # non-productive query
867	               if (! $query || ! $query->answer);
868	          foreach $rr ($query->answer) {
869	               next           # skip wrong type or service, step (2)
870	                    if ($rr->type ne $qtype || $rr->service ne "ark");
871	               ($flags, $replacement) =     # use NAPTR field accessors
872	                    ($rr->flags, $rr->replacement);
873	               if ($flags eq "") {
874	                    &maptr($replacement);    # recurse
875	               } elsif ($flags eq "h") {
876	                    print "$replacement\n";  # candidate NMAH
877	               }
878	          }
879	     }

881	   The global database thus distributed via DNS and the Maptr algorithm
882	   can easily be seen to mirror the contents of the Name Authority Table
883	   file described in the previous section.
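
The same five steps can also be sketched in Python without a DNS
library by passing in the query function.  This is an illustration
under stated assumptions, not the algorithm's definition: the names
maptr, query, and ZONE are hypothetical, records are modeled as plain
dictionaries, the records listed under dlc.spct.org. are invented for
the example, and the loop guard (seen, max_depth) is an addition that
the steps above do not require.

```python
def maptr(naan, query, max_depth=8):
    """Collect candidate NMAHs for a NAAN by following Maptr steps (1)-(5)."""
    candidates = []
    seen = set()

    def resolve(dname, depth):
        if dname in seen or depth > max_depth:
            return                             # added guard against loops
        seen.add(dname)
        for rec in query(dname):               # step (2): retrieve records
            if rec["service"] != "ark":
                continue                       # discard non-ARK records
            if rec["flags"] == "h":            # step (3): candidate NMAH
                candidates.append(rec["replacement"])
            elif rec["flags"] == "":           # step (4): redirected query
                resolve(rec["replacement"], depth + 1)

    resolve("%s.ark.arpa." % naan, 0)          # step (1): initial query name
    return candidates                          # step (5): caller chooses one


# Hypothetical zone data standing in for DNS.
ZONE = {
    "12026.ark.arpa.": [
        {"flags": "h", "service": "ark", "replacement": "lhc.nlm.nih.gov:8080"},
        {"flags": "h", "service": "ark", "replacement": "foobar.zaf.org"},
        {"flags": "h", "service": "ark", "replacement": "sneezy.dopey.com"},
    ],
    "12345.ark.arpa.": [
        {"flags": "", "service": "ark", "replacement": "dlc.spct.org."},
    ],
    "dlc.spct.org.": [
        {"flags": "h", "service": "ark", "replacement": "foobar.zaf.org"},
        {"flags": "h", "service": "other", "replacement": "ignored.example"},
    ],
}

print(maptr("12345", lambda dname: ZONE.get(dname, [])))
```

In a real client the query callable would wrap an actual NAPTR lookup;
that wiring is left out here.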

885	5.  Generic ARK Service Definition

887	   An ARK request's output is delivered information; examples include
888	   the object itself, a policy declaration (e.g., a promise of support),
889	   a descriptive metadata record, or an error message.  ARK services
890	   must be couched in high-level, protocol-independent terms if
891	   persistence is to outlive today's networking infrastructural
892	   assumptions.  The high-level ARK service definitions listed below are
893	   followed in the next section by a concrete method (one of many
894	   possible methods) for delivering these services with today's
895	   technology.

897	5.1.  Generic ARK Access Service (access, location)

899	   Returns (a copy of) the object or a redirect to the same, although a
900	   sensible object proxy may be substituted.  Examples of sensible
901	   substitutes include,

903	     - a table of contents instead of a large complex document,
904	     - a home page instead of an entire web site hierarchy,
905	     - a rights clearance challenge before accessing protected data,
906	     - directions for access to an offline object (e.g., a book),
907	     - a description of an intangible object (a disease, an event), or
908	     - an applet acting as "player" for a large multimedia object.

910	   May also return a discriminated list of alternate object locators.
911	   If access is denied, returns an explanation of the object's current
912	   (perhaps permanent) inaccessibility.

914	5.2.  Generic Policy Service (permanence, naming, etc.)

916	   Returns declarations of policy and support commitments for given
917	   ARKs.  Declarations are returned in either a structured metadata
918	   format or a human readable text format; sometimes one format may
919	   serve both purposes.  Policy subareas may be addressed in separate
920	   requests, but the following areas should be covered:  object
921	   permanence, object naming, object fragment addressing, and
922	   operational service support.

924	   The permanence declaration for an object is a rating defined with
925	   respect to an identified permanence provider (guarantor), and may
926	   include the following aspects.  One permanence rating framework is
927	   given in [NLMPerm].

929	        (a) "object availability" -- whether and how access to the
930	        object is supported (e.g., online 24x7, or offline only),

932	        (b) "identifier validity" -- under what conditions the
933	        identifier will be or has been re-assigned,

935	        (c) "content invariance" -- under what conditions the content of
936	        the object is subject to change, and

938	        (d) "change history" -- documentation, whether abbreviated or
939	        detailed, of any or all corrections, migrations, revisions, etc.

941	   Naming policy for an object includes an historical description of the
942	   NAA's (and its successor NAA's) policies regarding differentiation of
943	   objects.  It may include the following aspects.

945	        (e) "similarity" -- (or "unity") the limit, defined by the NAA,
946	        to the level of dissimilarity beyond which two similar objects
947	        warrant separate identifiers but before which they share one
948	        single identifier, and

950	        (f) "granularity" -- the limit, defined by the NAA, to the level
951	        of object subdivision beyond which sub-objects do not warrant
952	        separately assigned identifiers but before which sub-objects are
953	        assigned separate identifiers.

955	   Addressing policy for an object includes a description of how, during
956	   access, object components (e.g., paragraphs, sections) or views
957	   (e.g., image conversions) may or may not be "addressed", in other
958	   words, how the NMA permits arguments or parameters to modify the
959	   object delivered as the result of an ARK request.  If supported,
960	   these sorts of operations would provide things like byte-ranged
961	   fragment delivery and open-ended format conversions, or any set of
962	   possible transformations that would be too numerous to list or to
963	   identify with separately assigned ARKs.

965	   Operational service support policy includes a description of general
966	   operational aspects of the NMA service, such as after-hours staffing
967	   and trouble reporting procedures.

969	5.3.  Generic Description Service

971	   Returns a description of the object.  Descriptions are returned in
972	   either a structured metadata format or a human readable text format;
973	   sometimes one format may serve both purposes.  A description must at
974	   a minimum answer the who, what, when, and where questions concerning
975	   an expression of the object.  Standalone descriptions should be
976	   accompanied by the modification date and source of the description
977	   itself.  May also return discriminated lists of ARKs that are related
978	   to the given ARK.

980	6.  Overview of the Tiny HTTP URL Mapping Protocol (THUMP)

982	   The Tiny HTTP URL Mapping Protocol (THUMP) is a way of taking a key
983	   (a kind of identifier) and asking such questions as, what information
984	   does this identify and how permanent is it?  [THUMP] is in fact one
985	   specific method under development for delivering ARK services.  The
986	   protocol runs over HTTP to exploit the web browser's current pre-
987	   eminence as user interface to the Internet.  THUMP is designed so
988	   that a person can enter ARK requests directly into the location field
989	   of current browser interfaces.  Because it runs over HTTP, THUMP can
990	   be simulated and tested within keyboard-based [TELNET] sessions.

992	   The asker (a person or client program) starts with an identifier,
993	   such as an ARK or a URL.  The identifier reveals to the asker (or
994	   allows the asker to infer) the Internet host name and port number of
995	   a server system that responds to questions.  Here, this is just the
996	   NMAH that is obtained by inspection and possibly lookup based on the
997	   ARK's NAAN.  The asker then sets up an HTTP session with the server
998	   system, sends a question via a THUMP request (contained within an
999	   HTTP request), receives an answer via a THUMP response (contained
1000	   within an HTTP response), and closes the session.  That concludes the
1001	   connected portion of the protocol.

1003	   A THUMP request is a string of characters beginning with a `?'
1004	   (question mark) that is appended to the identifier string.  The
1005	   resulting string is sent as an argument to HTTP's GET command.
1006	   Request strings too long for GET may be sent using HTTP's POST
1007	   command.  The three most common requests correspond to three
1008	   degenerate special cases that keep the user's learning and typing
1009	   burden low.  First, a simple key with no request at all is the same
1010	   as an ordinary access request.  Thus a plain ARK entered into a
1011	   browser's location field behaves much like a plain URL, and returns
1012	   access to the primary identified object, for instance, an HTML
1013	   document.

1015	   The second special case is a minimal ARK description request string
1016	   consisting of just "?".  For example, entering the string,

1018	             ark.nlm.nih.gov/12025/psbbantu?

1020	   into the browser's location field directly precipitates a request for
1021	   a metadata record describing the object identified by ark:/12025/psbbantu.
1022	   The browser, unaware of THUMP, prepares and sends an HTTP GET
1023	   request in the same manner as for a URL.  THUMP is designed so that
1024	   the response (indicated by the returned HTTP content type) is normally
1025	   displayed, whether the output is structured for machine processing
1026	   (text/plain) or formatted for human consumption (text/html).

1028	   In the following example THUMP session, each line has been annotated
1029	   to include a line number and whether it was the client or server that
1030	   sent it.  Without going into much depth, the session has four pieces
1031	   separated from each other by blank lines:  the client's piece (lines
1032	   1-3), the server's HTTP/THUMP response headers (4-7), and the body of
1033	   the server's response in two parts (8-11 and 12-17).  The first and
1034	   last lines (1 and 17) correspond to the client's steps to start the
1035	   TCP session and the server's steps to end it, respectively.

1037	      1  C: [opens session]
1038	         C: GET http://ark.nlm.nih.gov/ark:/12025/psbbantu? HTTP/1.1
1039	         C:
1040	         S: HTTP/1.1 200 OK
1041	      5  S: Content-Type: text/plain
1042	         S: THUMP-Status: 0.1 200 OK
1043	         S:
1044	         S: |set: NLM | 12025/psbbantu? | 20030731
1045	         S:         | http://ark.nlm.nih.gov/ark:/12025/psbbantu?
1046	     10  S: here: 1 | 1 | 1
1047	         S:
1048	         S: erc:
1049	         S: who:    Lederberg, Joshua
1050	         S: what:   Studies of Human Families for Genetic Linkage
1051	     15  S: when:   1974
1052	         S: where:  http://profiles.nlm.nih.gov/BB/A/N/T/U/_/bbantu.pdf
1053	         S: [closes session]

1055	   The first two server response lines (4-5) above are typical of HTTP.
1056	   The next line (6) is peculiar to THUMP, and indicates the THUMP version
1057	   and a normal return status.  The balance of the response consists
1058	   of a record set header (lines 8-10) and a single metadata
1059	   record (12-16) that comprises the ARK description service response.
1060	   The record set header identifies (8-9) who created the set, what its
1061	   title is, when it was created, and where an automated process can
1062	   access the set; it ends in a line (10) whose respective sub-elements
1063	   indicate that here in this communication the recipient can expect to
1064	   find 1 record, starting at the record numbered 1, from a set consisting
1065	   of a total of 1 record (i.e., here is the entire set, consisting
1066	   of exactly one record).

1068	   The returned record (12-16) is in the format of an Electronic
1069	   Resource Citation [ERC], which is discussed in more detail in the
1070	   next section.  For now, note that it contains four elements that
1071	   answer the top priority questions regarding an expression of the
1072	   object:  who played a major role in expressing it, what the expression
1073	   was called, when it was created, and where the expression may be
1074	   found.  This quartet of elements comes up again and again in ERCs.

1076	   The third degenerate special case of an ARK request (and no other
1077	   cases will be described in this document) is the string "??",
1078	   corresponding to a minimal permanence policy request.  It can be seen in
1079	   use appended to an ARK (on line 2) in the example session that
1080	   follows.

1082	      1  C: [opens session]
1083	         C: GET http://ark.nlm.nih.gov/ark:/12025/psbbantu?? HTTP/1.1
1084	         C:
1085	         S: HTTP/1.1 200 OK
1086	      5  S: Content-Type: text/plain
1087	         S: THUMP-Status: 0.1 200 OK
1088	         S:
1089	         S: |set: NLM | 12025/psbbantu?? | 20030731
1090	         S:         | http://ark.nlm.nih.gov/ark:/12025/psbbantu??
1091	     10  S: here: 1 | 1 | 1
1092	         S:
1093	         S: erc:
1094	         S: who:    Lederberg, Joshua
1095	         S: what:   Studies of Human Families for Genetic Linkage
1096	     15  S: when:   1974
1097	         S: where:  http://profiles.nlm.nih.gov/BB/A/N/T/U/_/bbantu.pdf
1098	         S: erc-support:
1099	         S: who:    USNLM
1100	         S: what:   Permanent, Unchanging Content
1101	     20  S: when:   20010421
1102	         S: where:  http://ark.nlm.nih.gov/yy22948
1103	         S: [closes session]

1105	   Again, a single metadata record (lines 12-21) is returned, but it
1106	   consists of two segments.  The first segment (12-16) gives the same
1107	   basic citation information as in the previous example.  It is
1108	   returned in order to establish context for the persistence declaration
1109	   in the second segment (17-21).

1111	   Each segment in an ERC tells a different story relating to the
1112	   object, so although the same four questions (elements) appear in
1113	   each, the answers depend on the segment's story type.  While the
1114	   first segment tells the story of an expression of the object, the
1115	   second segment tells the story of the support commitment made to it:
1116	   who made the commitment, what the nature of the commitment was, when
1117	   it was made, and where a fuller explanation of the commitment may be
1118	   found.
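
   As a rough illustration of the three degenerate request forms, the
   following Python sketch builds the string that a client would pass
   to HTTP's GET command and reads the "here:" line of a record set
   header.  The names thump_url, parse_here, and REQUEST_SUFFIX are
   hypothetical, the URL layout simply mirrors the example sessions
   above, and no network connection is made.

```python
# Hypothetical helpers; these names are illustrative, not part of THUMP.
REQUEST_SUFFIX = {
    "access": "",         # plain key: ordinary access to the object
    "description": "?",   # minimal ARK description request
    "policy": "??",       # minimal permanence policy request
}


def thump_url(nmah, ark, kind="access"):
    """Build the URL for one of the three degenerate request forms."""
    if kind not in REQUEST_SUFFIX:
        raise ValueError("unknown request kind: " + kind)
    return "http://" + nmah + "/" + ark + REQUEST_SUFFIX[kind]


def parse_here(line):
    """Split a 'here:' line into (records here, first record, set total)."""
    label, _, value = line.partition(":")
    if label.strip() != "here":
        raise ValueError("not a 'here:' line")
    count, start, total = (int(part) for part in value.split("|"))
    return count, start, total


print(thump_url("ark.nlm.nih.gov", "ark:/12025/psbbantu", "policy"))
print(parse_here("here: 1 | 1 | 1"))
```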

1120	7.  Overview of Electronic Resource Citations (ERCs)

1122	   An Electronic Resource Citation (or ERC, pronounced e-r-c) [ERC] is a
1123	   simple, compact, and printable record designed to hold data
1124	   associated with an information resource.  By design, the ERC is a
1125	   metadata format that balances the needs for expressive power, very
1126	   simple machine processing, and direct human manipulation.

1128	   A founding principle of the ERC is that direct human contact with
1129	   metadata will be a necessary and sufficient condition for the near
1130	   term rapid development of metadata standards, systems, and services.
1131	   Thus the machine-processable ERC format must only minimally strain
1132	   people's ability to read, understand, change, and transmit ERCs
1133	   without their relying on intermediation with specialized software
1134	   tools.  The basic ERC needs to be succinct, transparent, and
1135	   trivially parseable by software.

1137	   In the current Internet, it is natural to seriously consider using
1138	   XML as an exchange format because of predictions that it will obviate
1139	   many ad hoc formats and programs, and unify much of the world's
1140	   information under one reliable data structuring discipline that is
1141	   easy to generate, verify, parse, and render.  It appears, however,
1142	   that XML is still only catching on after years of standards work and
1143	   implementation experience.  The reasons for it are unclear, but for
1144	   now very simple XML interpretation is still out of reach.  Another
1145	   important caution is that XML structures are hard on the eyeballs,
1146	   taking up an amount of display (and page) space that significantly
1147	   exceeds that of traditional formats.  Until these conflicts with ERC
1148	   principle are resolved, XML is not a first choice for representing
1149	   ERCs.  Borrowing instead from the data structuring format that
1150	   underlies the successful spread of email and web services, the first
1151	   ERC format is based on email and HTTP headers (RFC822) [EMHDRS].
1152	   There is a naturalness to its label-colon-value format (seen in the
1153	   previous section) that barely needs explanation to a person beginning
1154	   to enter ERC metadata.

1156	   Besides simplicity of ERC system implementation and data entry
1157	   mechanics, ERC semantics (what the record and its constituent parts
1158	   mean) must also be easy to explain.  ERC semantics are based on a
1159	   reformulation and extension of the Dublin Core [DCORE] hypothesis,
1160	   which suggests that the fifteen Dublin Core metadata elements have a
1161	   key role to play in cross-domain resource description.  The ERC
1162	   design recognizes that the Dublin Core's primary contribution is the
1163	   international, interdisciplinary consensus that identified fifteen
1164	   semantic buckets (element categories), regardless of how they are
1165	   labeled.  The ERC then adds a definition for a record and some
1166	   minimal compliance rules.  In pursuing the limits of simplicity, the
1167	   ERC design combines and relabels some Dublin Core buckets to isolate
1168	   a tiny kernel (subset) of four elements for basic cross-domain
1169	   resource description.

1171	   For the cross-domain kernel, the ERC uses the four basic elements --
1172	   who, what, when, and where -- to pretend that every object in the
1173	   universe can have a uniform minimal description.  Each has a name or
1174	   other identifier, a location, some responsible person or party, and a
1175	   date.  It doesn't matter what type of object it is, or whether one
1176	   plans to read it, interact with it, smoke it, wear it, or navigate
1177	   it.  Of course, this approach is flawed because uniformity of
1178	   description for some object types requires more semantic contortion
1179	   and sacrifice than for others.  That is why at the beginning of this
1180	   document, the ARK was said to be suited to objects that accommodate
1181	   reasonably regular electronic description.

1183	   While insisting on uniformity at the most basic level provides
1184	   powerful cross-domain leverage, the semantic sacrifice is great for
1185	   many applications.  So the ERC also permits a semantically rich and
1186	   nuanced description to co-exist in a record along with a basic
1187	   description.  In that way both sophisticated and naive recipients of
1188	   the record can extract the level of meaning from it that best suits
1189	   their needs and abilities.  Key to unlocking the richer description
1190	   is a controlled vocabulary of ERC record types (not explained in this
1191	   document) that permit knowledgeable recipients to apply defined sets
1192	   of additional assumptions to the record.

1194	7.1.  ERC Syntax

1196	   An ERC record is a sequence of metadata elements ending in a blank
1197	   line.  An element consists of a label, a colon, and an optional
1198	   value.  Here is an example of a record with five elements.

1200	          erc:
1201	          who: Gibbon, Edward
1202	          what: The Decline and Fall of the Roman Empire
1203	          when: 1781
1204	          where: http://www.ccel.org/g/gibbon/decline/

1206	   A long value may be folded (continued) onto the next line by inserting
1207	   a newline and indenting the next line.  A value can be thus
1208	   folded across multiple lines.  Here are two example elements, each
1209	   folded across four lines.

1211	          who/created: University of California, San Francisco, AIDS
1212	               Program at San Francisco General Hospital | University
1213	               of California, San Francisco, Center for AIDS Prevention
1214	               Studies
1215	          what/Topic:
1216	                Heart Attack | Heart Failure
1217	               | Heart
1218	                                Diseases

1220	   An element value folded across several lines is treated as if the
1221	   lines were joined together on one long line.  For example, the second
1222	   element from the previous example is considered equivalent to

1224	          what/Topic: Heart Attack | Heart Failure | Heart Diseases

1226	   An element value may contain multiple values, each one separated from
1227	   the next by a `|' (pipe) character.  The element from the previous
1228	   example contains three values.

1230	   For annotation purposes, any line beginning with a `#' (hash) character
1231	   is treated as if it were not present; this is a "comment" line (a
1232	   feature not available in email or HTTP headers).  For example, the
1233	   following element is spread across four lines and contains two
1234	   values:

1236	          what/Topic:
1237	               Heart Attack
1238	          #    | Heart Failure  -- hold off until next review cycle
1239	               | Heart Diseases
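
   The syntax rules of this section (comment lines, folding, the blank
   line that ends a record, and `|' separators) can be sketched in a
   few lines of Python.  This is only an illustration under stated
   assumptions: the function name parse_erc is hypothetical, element
   labels are assumed to start in the first column (the examples in
   this document are indented only for display), and folded lines are
   assumed to be joined with a single space.

```python
def parse_erc(text):
    """Return (label, values) pairs for the first ERC record in text."""
    elements = []
    for line in text.splitlines():
        if line.startswith("#"):
            continue                  # comment line: treated as absent
        if not line.strip():
            if elements:
                break                 # a blank line ends the record
            continue
        if line[0] in " \t" and elements:
            # Indented line: fold it onto the previous element's value.
            elements[-1][1] += " " + line.strip()
        else:
            label, _, value = line.partition(":")
            elements.append([label.strip(), value.strip()])
    # Split each joined value on the pipe character.
    return [(label, [v.strip() for v in value.split("|")] if value else [])
            for label, value in elements]


record = parse_erc(
    "what/Topic:\n"
    "     Heart Attack\n"
    "#    | Heart Failure  -- hold off until next review cycle\n"
    "     | Heart Diseases\n"
)
print(record)
```

   Splitting on the first colon only keeps values such as URLs intact,
   since any later colons remain part of the value.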

1241	7.2.  ERC Stories

1243	   An ERC record is organized into one or more distinct segments, where
1244	   each segment tells a story about a different aspect of the
1245	   information resource.  A segment boundary occurs whenever a segment
1246	   label (an element beginning with "erc") is encountered.  The basic
1247	   label "erc:" introduces the story of an object's expression (e.g.,
1248	   its publication, installation, or performance).  The label
1249	   "erc-about:" introduces the story of an object's content (what it is
1250	   about) and "erc-support:" introduces the story of a support
1251	   commitment made to it.  A story segment that concerns the ERC itself
1252	   is introduced by the label "erc-from:".  It is an important segment
1253	   that tells the story of the ERC's provenance.  Elements beginning
1254	   with "erc" are reserved for segment labels and their associated story
1255	   types.  From an earlier example, here is an ERC with two segments.

1257	         erc:
1258	         who:    Lederberg, Joshua
1259	         what:   Studies of Human Families for Genetic Linkage
1260	         when:   1974
1261	         where:  http://profiles.nlm.nih.gov/BB/A/N/T/U/_/bbantu.pdf
1262	         erc-support:
1263	         who:    NIH/NLM/LHNCBC
1264	         what:   Permanent, Unchanging Content
1265	         # Note to ops staff:  date needs verification.
1266	         when:   2001 04 21
1267	         where:  http://ark.nlm.nih.gov/yy22948

1269	   Segment stories are told according to journalistic tradition.  While
1270	   any number of pertinent elements may appear in a segment, priority is
1271	   placed on answering the questions who, what, when, and where at the
1272	   beginning of each segment so that readers can make the most important
1273	   selection or rejection decisions as soon as possible.  To make things
1274	   simple, the listed ordering of the questions is maintained in each
1275	   segment (as it happens most people who have been exposed to this
1276	   story telling technique are already familiar with the above
1277	   ordering).

1279	   The four questions are answered by using corresponding element
1280	   labels.  The four element labels can be re-used in each story segment,
1281	   but their meaning changes depending on the segment (the story
1282	   type) in which they appear.  In the example above, "who" is first
1283	   used to name a document's author and subsequently used to name the
1284	   permanence guarantor (provider).  Similarly, "when" first lists the
1285	   date of object creation and in the next segment lists the date of a
1286	   commitment decision.  Four labels appearing across three segments
1287	   effectively map to twelve semantically distinct elements.  Distinct
1288	   element meanings are mapped to Dublin Core elements in a later
1289	   section.
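
   To make the segment rule concrete, the Python sketch below groups a
   flat list of (label, value) pairs into story segments, starting a
   new segment at each label that begins with "erc".  The function
   name split_segments and the list-of-pairs input are assumptions
   made for illustration; comment lines are taken to have been removed
   already.

```python
def split_segments(elements):
    """Group (label, value) pairs into (segment label, pairs) segments."""
    segments = []
    for label, value in elements:
        if label.startswith("erc"):
            segments.append((label, []))      # segment boundary
        elif segments:
            segments[-1][1].append((label, value))
        else:
            raise ValueError("element appears before any segment label")
    return segments


elements = [
    ("erc", ""),
    ("who", "Lederberg, Joshua"),
    ("what", "Studies of Human Families for Genetic Linkage"),
    ("when", "1974"),
    ("where", "http://profiles.nlm.nih.gov/BB/A/N/T/U/_/bbantu.pdf"),
    ("erc-support", ""),
    ("who", "NIH/NLM/LHNCBC"),
    ("what", "Permanent, Unchanging Content"),
    ("when", "2001 04 21"),
    ("where", "http://ark.nlm.nih.gov/yy22948"),
]
segments = split_segments(elements)
for story, pairs in segments:
    # The same label "who" names a different role in each story.
    print(story, "->", dict(pairs)["who"])
```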

1291	7.3.  The ERC Anchoring Story

1293	   Each ERC contains an anchoring story.  It is usually the first
1294	   segment labeled "erc:" and it concerns an "anchoring" expression of
1295	   the object.  An "anchoring" expression is the one that a provider
1296	   deemed the most suitable basic referent given the audience and
1297	   application for which it produced the ERC.  If it sounds like the
1298	   provider has great latitude in choosing its anchoring expression, it
1299	   is because it does.  A typical anchoring story in an ERC for a born-
1300	   digital document would be the story of the document's release on a
1301	   web site; such a document would then be the anchoring expression.

1303	   An anchoring story need not be the central descriptive goal of an ERC
1304	   record.  For example, a museum provider may create an ERC for a
1305	   digitized photograph of a painting but choose to anchor it in the
1306	   story of the original painting instead of the story of the electronic
1307	   likeness; although the ERC may through other segments prove to be
1308	   centrally concerned with describing the electronic likeness, the
1309	   provider may have chosen this particular anchoring story in order to
1310	   make the ERC visible in a way that is most natural to patrons (who
1311	   would find the Mona Lisa under da Vinci sooner than they would find
1312	   it under the name of the person who snapped the photograph or scanned
1313	   the image).  In another example, a provider that creates an ERC for a
1314	   dramatic play as an abstract work has the task of describing a piece
1315	   of intangible intellectual property.  To anchor this abstract object
1316	   in the concrete world, if only through a derivative expression, it
1317	   makes sense for the provider to choose a suitable printed edition of
1318	   the play as the anchoring object expression (to describe in the
1319	   anchoring story) of the ERC.

1321	   The anchoring story has special rules designed to keep ERC processing
1322	   simple and predictable.  Each of the four basic elements (who, what,
1323	   when, and where) must be present, unless a best effort to supply it
1324	   fails.  In the event of failure, the element still appears but a
1325	   special value (described later) is used to explain the missing value.
1326	   While the requirement that each of the four elements be present only
1327	   applies to the anchoring story segment, as usual these elements
1328	   appear at the beginning of the segment and may only be used in the
1329	   prescribed order.  A minimal ERC would normally consist of just an
1330	   anchoring story and the element quartet, as illustrated in the next
1331	   example.

1333	         erc:
1334	         who:   National Research Council
1335	         what:  The Digital Dilemma
1336	         when:  2000
1337	         where: http://books.nap.edu/html/digital%5Fdilemma

1339	   A minimal ERC can be abbreviated so that it resembles a traditional
1340	   compact bibliographic citation that is nonetheless completely machine
1341	   processable.  The required elements and ordering make it possible to
1342	   eliminate the element labels, as shown here.

1344	         erc: National Research Council | The Digital Dilemma | 2000
1345	                | http://books.nap.edu/html/digital%5Fdilemma
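
   Because the anchoring story fixes both the presence and the order
   of the four elements, the abbreviated form can be expanded
   mechanically.  A minimal Python sketch follows; the name
   expand_abbreviated is hypothetical, and the input is assumed to be
   the already unfolded value that follows the "erc:" label.

```python
KERNEL = ("who", "what", "when", "where")   # prescribed element order


def expand_abbreviated(value):
    """Restore the element labels of an abbreviated minimal ERC."""
    parts = [part.strip() for part in value.split("|")]
    if len(parts) != len(KERNEL):
        raise ValueError("expected exactly four values")
    return list(zip(KERNEL, parts))


pairs = expand_abbreviated(
    "National Research Council | The Digital Dilemma | 2000 "
    "| http://books.nap.edu/html/digital%5Fdilemma"
)
for label, value in pairs:
    print(label + ": " + value)
```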

1347	7.4.  ERC Elements

1349	   As mentioned, the four basic ERC elements (who, what, when, and
1350	   where) take on different specific meanings depending on the story
1351	   segment in which they are used.  By appearing in each segment, albeit
1352	   in different guises, the four elements serve as a valuable mnemonic
1353	   device -- a kind of checklist -- for constructing minimal story
1354	   segments from scratch.  Again, it is only in the anchoring segment
1355	   that all four elements are mandatory.

1357	   Here are some mappings between ERC elements and Dublin Core [DCORE]
1358	   elements.

1360	          Segment     ERC Element     Equivalent Dublin Core Element
1361	         ---------    -----------     ------------------------------
1362	            erc          who          Creator/Contributor/Publisher
1363	            erc          what                Title
1364	            erc          when                Date
1365	            erc          where               Identifier
1366	         erc-about       who                  <none>
1367	         erc-about       what                Subject
1368	         erc-about       when                Coverage (temporal)
1369	         erc-about       where               Coverage (spatial)

1371	   The basic element labels may also be qualified to add nuances to the
1372	   semantic categories that they identify.  Elements are qualified by
1373	   appending a `/' (slash) and a qualifier term.  Often qualifier terms
1374	   appear as the past tense form of a verb because it makes re-using
1375	   qualifiers among elements easier.

1377	         who/published:  ...
1378	         when/published: ...
1379	         where/published: ...

1381	   Using past tense verbs for qualifiers also reminds providers and
1382	   recipients that element values contain transient assertions that may
1383	   have been true once, but that tend to become less true over time.
1384	   Recipients that don't understand the meaning of a qualifier can fall
1385	   back onto the semantic category (bucket) designated by the unqualified
1386	   element label.  Inevitably recipients (people and software) will
1387	   have diverse abilities in understanding elements and qualifiers.

1389	   Any number of other elements and qualifiers may be used in conjunction
1390	   with the quartet of basic segment questions.  The only semantic
1391	   requirement is that they pertain to the segment's story.  Also, it is
1392	   only the four basic elements that change meaning depending on their
1393	   segment context.  All other elements have meaning independent of the
1394	   segment in which they appear.  If an element label stripped of its
1395	   qualifier is still not recognized by the recipient, a second fall
1396	   back position is to ignore it and rely on the four basic elements.
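
   The two fall back positions can be sketched as a small Python
   function.  The name interpret and the set of labels that a
   recipient understands are hypothetical; the sketch only shows the
   order in which a recipient might try the qualified label, then the
   unqualified label, and finally give up on the element.

```python
def interpret(label, understood):
    """Return the label a recipient should act on, or None to ignore it."""
    if label in understood:
        return label                     # qualifier is understood
    base = label.split("/", 1)[0]
    if base in understood:
        return base                      # first fall back: the bucket
    return None                          # second fall back: ignore it


basic = {"who", "what", "when", "where"}
print(interpret("when/published", basic))
print(interpret("IDcode", basic))
```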

1398	   Elements may be either Canonical, Provisional, or Local.  Canonical
1399	   elements are officially recognized via a registry as part of the
1400	   metadata vernacular.  All elements, qualifiers, and segment labels
1401	   used in this document up until now belong to that vernacular.
1402	   Provisional elements are also officially recognized via the registry, but
1403	   have only been proposed for inclusion in the vernacular.  To be promoted
1404	   to the vernacular, a provisional element passes through a vetting
1405	   process during which its documentation must be in order and its
1406	   community acceptance demonstrated.  Local elements are any elements
1407	   not officially recognized in the registry.  The registry [DERC] is a
1408	   work in progress.

1410	   Local elements are immediately distinguishable from Canonical or
1411	   Provisional elements because all terms that begin with an upper case
1412	   letter are reserved for spontaneous local use.  No term beginning
1413	   with an upper case letter will ever be assigned Canonical or Provisional
1414	   status, so it should be safe to use such terms for local purposes.
1415	   Any recipient of external ERCs containing such terms will
1416	   understand them to be part of the originating provider's local metadata
1417	   dialect.  Here's an example ERC with three segments, one local
1418	   element, and two local qualifiers.  The segment boundaries have been
1419	   emphasized by comment lines (which, as before, are ignored by
1420	   processors).

1422	         erc:
1423	         who: Bullock, TH | Achimowicz, JZ | Duckrow, RB
1424	                 | Spencer, SS | Iragui-Madoz, VJ
1425	         what: Bicoherence of intracranial EEG in sleep,
1426	                 wakefulness and seizures
1427	         when: 1997 12 00
1428	         where: http://cogprints.soton.ac.uk/%{
1429	                 documents/disk0/00/00/01/22/index.html %}
1430	         in: EEG Clin Neurophysiol | 1997 12 00 | v103, i6, p661-678
1431	         IDcode: cog00000122
1432	         # ---- new segment ----
1433	         erc-about:
1434	         what/Subcategory: Bispectrum | Nonlinearity | Epilepsy
1435	                 | Cooperativity | Subdural | Hippocampus | Higher moment
1436	         # ---- new segment ----
1437	         erc-from:
1438	         who: NIH/NLM/NCBI
1439	         what: pm9546494
1440	         when/Reviewed: 1998 04 18 021600
1441	         where: http://ark.nlm.nih.gov/12025/pm9546494?

1443	   The local element "IDcode" immediately precedes the "erc-about" segment,
1444	   which itself contains an element with the local qualifier "Subcategory".
1445	   The second to last element also carries the local qualifier
1446	   "Reviewed".  Finally, what might be a provisional element "in"
1447	   appears near the end of the first segment.  It might have been proposed
1448	   as a way to complete a citation for an object originally
1449	   appearing inside another object (such as an article appearing in a
1450	   journal or an encyclopedia).
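
   Since local terms are recognizable from their first letter alone, a
   recipient needs no registry lookup to detect them.  The Python
   sketch below applies that test to both the element name and its
   qualifier; the function name is_local is hypothetical.

```python
def is_local(label):
    """True if the element name or its qualifier starts in upper case."""
    return any(term[:1].isupper() for term in label.split("/"))


for label in ("IDcode", "what/Subcategory", "when/Reviewed", "in", "who"):
    print(label, is_local(label))
```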

1452	7.5.  ERC Element Values

1454	   ERC element values tend to be straightforward strings.  If the
1455	   provider intends something special for an element, it will so
1456	   indicate with markers at the beginning of its value string.  The
1457	   markers are designed to be uncommon enough that they would not likely
1458	   occur in normal data except by deliberate intent.  Markers can only
1459	   occur near the beginning of a string, and once any octet of non-
1460	   marker data has been encountered, no further marker processing is
1461	   done for the element value.  In the absence of markers the string is
1462	   considered pure data; this has been the case with all the examples
1463	   seen thus far.  The fullest form of an element value with all three
1464	   optional markers in place looks like this.

1466	         VALUE =    [markup_flags]    (:ccode)    ,    DATA

1468	   In processing, the first non-whitespace character of an ERC element
1469	   value is examined.  An initial `[' is reserved to introduce a bracketed
1470	   set of markup flags (not described in this document) that ends
1471	   with `]'.  If ERC data is machine-generated, each value string may be
1472	   preceded by "[]" to prevent any of its data from being mistaken for
1473	   markup flags.  Once past the optional markup, the remaining value may
1474	   optionally begin with a controlled code.  A controlled code always
1475	   has the form "(:ccode)", for example,

1477	         who: (:unkn) Anonymous
1478	         what: (:791) Bee Stings

1480	   Any string after such a code is taken to be an uncontrolled (e.g.,
1481	   natural language) equivalent.  The code "unkn" indicates a conventional
1482	   explanation for a missing value (stating that the value is
1483	   unknown).  The remainder of the string makes an equivalent statement
1484	   in a form that the provider deemed most suitable to its (probably
1485	   human) audience.  The code "791" could be a fixed numeric topic
1486	   identifier within an unspecified topic vocabulary.  Any code may be
1487	   ignored by those that do not understand it.

1489	   There are several codes to explain different ways in which a required
1490	   element's value may go missing.

1492	         (:unkn)   unknown (e.g., Anonymous, Inconnue)
1493	         (:unav)   value unavailable indefinitely
1494	         (:unac)   temporarily inaccessible
1495	         (:unap)   not applicable, makes no sense
1496	         (:unas)   value unassigned (e.g., Untitled)
1497	         (:none)   never had a value, never will
1498	         (:null)   explicitly empty
1499	         (:unal)   unallowed, suppressed intentionally
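
   The marker layout shown in the VALUE line above lends itself to a
   single regular expression.  The following Python sketch is one
   possible reading, not a normative parser: the names VALUE_RE,
   MISSING, and parse_value are hypothetical, the content of the
   markup flags is left uninterpreted, and the optional comma marker
   is only detected here, not acted upon.

```python
import re

# Markers are recognized only at the start of the value, in this order.
VALUE_RE = re.compile(r"""
    \s* (?: \[ (?P<flags>[^\]]*) \] )?    # optional [markup_flags]
    \s* (?: \(: (?P<ccode>[^)]*) \) )?    # optional (:ccode)
    \s* (?P<sort>,)?                      # optional comma marker
    \s* (?P<data>.*)                      # everything else is data
""", re.VERBOSE | re.DOTALL)

# Codes that explain a missing value, taken from the list above.
MISSING = {"unkn", "unav", "unac", "unap", "unas", "none", "null", "unal"}


def parse_value(value):
    """Split an element value into its optional markers and its data."""
    m = VALUE_RE.match(value)
    return {
        "flags": m.group("flags"),
        "ccode": m.group("ccode"),
        "comma": m.group("sort") is not None,
        "data": m.group("data").strip(),
    }


parsed = parse_value("(:unkn) Anonymous")
print(parsed, parsed["ccode"] in MISSING)
```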

1501	   Once past an optional controlled code, the remaining string value is
1502	   subjected to one final test.  If the next non-whitespace character
1503	   is a `,' (comma), it indicates that the string value is
1504	   "sort-friendly".  This means that the value is (a) laid out with an
1505	   inverted word order useful for sorting items having comparably laid
1506	   out element values (items might be the containing ERC records) and
1507	   (b) that the value may contain other commas that indicate inversion
1508	   points should it become necessary to recover the value in natural
1509	   word order.  Typically, this feature is used to express Western-style
1510	   personal names in family-name-given-name order.  It can also be used
1511	   wherever natural word order might make sorting tricky, such as when
1512	   data contains titles or corporate names.  Here are some example
1513	   elements.

1515	         who:   ,  van Gogh, Vincent
1516	         who:,Howell, III, PhD, 1922-1987, Thurston
1517	         who:, Acme Rocket Factory, Inc., The
1518	         who:, Mao Tse Tung
1519	         who:, McCartney, Paul, Sir,
1520	         what:, Health and Human Services, United States Government
1521	                 Department of, The,

1523	   There are rules to use in recovering a copy of the value in natural
1524	   word order, if desired.  The above example strings have the following
1525	   natural word order values, respectively.

1527	         Vincent van Gogh
1528	         Thurston Howell, III, PhD, 1922-1987
1529	         The Acme Rocket Factory, Inc.
1530	         Mao Tse Tung
1531	         Sir Paul McCartney
1532	         The United States Government Department of Health and Human Services
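
   The recovery rules themselves are not given in this document.  The
   Python sketch below uses a rule inferred only from the six examples
   above, and so is an assumption rather than the defined algorithm:
   when the value ends with a comma, every comma is treated as an
   inversion point; otherwise only the last comma is.  The function
   name natural_order is hypothetical, and its input is the data that
   follows the initial comma marker.

```python
def natural_order(data):
    """Recover natural word order from sort-friendly data (inferred rule)."""
    data = " ".join(data.split())         # collapse folded whitespace
    if data.endswith(","):
        # Trailing comma: invert at every comma.
        parts = [part.strip() for part in data[:-1].split(",")]
        return " ".join(reversed(parts))
    head, comma, tail = data.rpartition(",")
    if not comma:
        return data                       # no inversion point at all
    # Otherwise invert at the last comma only.
    return tail.strip() + " " + head.strip()


for example in ("van Gogh, Vincent",
                "Acme Rocket Factory, Inc., The",
                "McCartney, Paul, Sir,"):
    print(natural_order(example))
```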

1534	7.6.  ERC Element Encoding and Dates

1536	   Some characters that need to appear in ERC element values might
1537	   conflict with special characters used for structuring ERCs, so there
1538	   needs to be a way to include them as literal characters that are
1539	   protected from special interpretation.  This is accomplished through
1540	   an encoding mechanism that resembles the %-encoding familiar to [URI]
1541	   handlers.

1543	   The ERC encoding mechanism also uses `%', but instead of taking two
1544	   following hexadecimal digits, it takes one non-alphanumeric character
1545	   or two alphabetic characters that cannot be mistaken for hex digits.
1546	   It is designed not to be confused with normal web-style %-encoding.
1547	   In particular it can be decoded without risking unintended decoding
1548	   of normal %-encoded data (which would introduce errors).  Here are
1549	   the one-character (non-alphanumeric) ERC encoding extensions.

1551	         ERC       Purpose
1552	         ---     ------------------------------------------------
1553	         %!      decodes to the element separator `|'
1554	         %%      decodes to a percent sign `%'
1555	         %.      decodes to a comma `,'
1556	         %_      a non-character used as syntax shim
1557	         %{      a non-character that begins an expansion block
1558	         %}      a non-character that ends an expansion block

1560	   One particularly useful construct in ERC element values is the pair
1561	   of special encoding markers ("%{" and "%}") that indicates an
1562	   "expansion" block.  Whatever string of characters they enclose will be
1563	   treated as if none of the contained whitespace (SPACEs, TABs,
1564	   Newlines) were present.  This comes in handy for writing long, multi-
1565	   part URLs in a readable way.  For example, the value in

1566	         where: http://foo.bar.org/node%{
1567	                    ? db = foo
1568	                    & start = 1
1569	                    & end = 5
1570	                    & buf = 2
1571	                    & query = foo + bar + zaf
1572	                %}

1574	   is decoded into an equivalent element, but with a correct and intact
1575	   URL:

1577	     where:
1578	      http://foo.bar.org/node?db=foo&start=1&end=5&buf=2&query=foo+bar+zaf
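
   As an illustration only, the one-character codes and the expansion
   block described above can be decoded in a single left-to-right scan.
   The Python sketch below is not part of the ERC definition.  The
   function name erc_decode is hypothetical, "%_" is assumed to decode
   to nothing, and the two-letter alphabetic codes (not listed in this
   section) are left untouched.

```python
# One-character ERC codes from the table above.  "%_" is described as a
# non-character, so it is assumed here to decode to the empty string.
ONE_CHAR = {"!": "|", "%": "%", ".": ",", "_": ""}


def erc_decode(value: str) -> str:
    """Decode one-character ERC codes and %{ ... %} expansion blocks."""
    out = []
    in_block = False  # True while between "%{" and "%}"
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "%" and i + 1 < len(value):
            nxt = value[i + 1]
            if nxt == "{":
                in_block = True
                i += 2
                continue
            if nxt == "}":
                in_block = False
                i += 2
                continue
            if nxt in ONE_CHAR:
                out.append(ONE_CHAR[nxt])
                i += 2
                continue
        if in_block and ch.isspace():
            i += 1  # whitespace inside an expansion block is dropped
            continue
        out.append(ch)  # ordinary character, or a "%" that is not an ERC code
        i += 1
    return "".join(out)


value = """http://foo.bar.org/node%{
    ? db = foo
    & start = 1
    & end = 5
    & buf = 2
    & query = foo + bar + zaf
%}"""
print(erc_decode(value))
```

   Because "%" followed by a digit or letter is not one of the codes
   handled here, a string such as "%20" passes through unchanged, which
   is the property that keeps ERC decoding from disturbing normal
   %-encoded data.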

1580	   A final note about ERC element values:  a commonly recurring value
1581	   type is a date, possibly followed by a time.  ERC dates take on one
1582	   of the following forms:

1584	         1999                (four digit year)
1585	         2000 12 29          (year, month, day)
1586	         2000 12 29 235955   (year, month, day, hour, minute, second)

1588	         21 Spring   31 1st quarter   25 Spring (so. hemisphere)
1589	         22 Summer   32 2nd quarter   26 Summer (so. hemisphere)
1590	         23 Fall     33 3rd quarter   27 Fall (so. hemisphere)
1591	         24 Winter   34 4th quarter   28 Winter (so. hemisphere)

1592	   In dates, all internal whitespace is squeezed out to achieve a
1593	   normalized form suitable for lexical comparison and sorting.  This
1594	   means that the following dates

1595	         2000 12 29 235955           (recommended for readability)
1596	         2000 12 29 23 59 55
1597	         20001229 23 59 55
1598	         20001229235955              (normalized date and time)

1600	   are all equivalent.  The first form is recommended for readability.
1601	   The last form (shortest and easiest to compute with) is the
1602	   normalized form.  Hyphens and commas are reserved to create date
1603	   ranges and lists, for example,

1605	         1996-2000                   (a range of five years)
1606	         1952, 1957, 1969            (a list of three years)
1607	         1952, 1958-1967, 1985       (a mixed list of dates and ranges)
1608	         20001229-20001231           (a range of three days)
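
   The normalization rule and the range and list conventions above can
   be sketched in a few lines of Python.  This is a minimal
   illustration under stated assumptions, not a full ERC date parser:
   the names normalize_date and parse_date_list are hypothetical, each
   list item is returned as a (start, end) pair with start equal to end
   for a single date, and no calendar validation is attempted.

```python
from typing import List, Tuple


def normalize_date(date: str) -> str:
    """Squeeze out all whitespace, giving the normalized form."""
    return "".join(date.split())


def parse_date_list(value: str) -> List[Tuple[str, str]]:
    """Split on commas (lists), then on a hyphen (ranges)."""
    items = []
    for part in value.split(","):
        start, hyphen, end = part.partition("-")
        start = normalize_date(start)
        # A single date is treated as a range of one (assumption).
        end = normalize_date(end) if hyphen else start
        items.append((start, end))
    return items


forms = [
    "2000 12 29 235955",
    "2000 12 29 23 59 55",
    "20001229 23 59 55",
    "20001229235955",
]
print({normalize_date(f) for f in forms})
print(parse_date_list("1952, 1958-1967, 1985"))
```

   Since the normalized strings are fixed-order digit sequences, dates
   of the same precision can be compared and sorted as plain strings.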

1610	7.7.  ERC Stub Records and Internal Support

1612	   The ERC design introduces the concept of a "stub" record, which is an
1613	   incomplete ERC record intended to be supplemented with additional
1614	   elements before being released as a standalone ERC record.  A stub
1615	   ERC record has no minimum required elements.  It is just a group of
1616	   elements that does not begin with "erc:" but otherwise conforms to
1617	   the ERC record syntax.

1619	   ERC stubs may be useful in supporting internal procedures using the
1620	   ERC syntax.  Often they rely on the convenience and accuracy of
1621	   automatically supplied elements, even the basic ones.  To be ready
1622	   for external use, however, an ERC stub must be transformed into a
1623	   complete ERC record having the usual required elements.  An ERC stub
1624	   record can be convenient for metadata embedded in a document, where
1625	   elements such as location, modification date, and size -- which one
1626	   would not omit from an externalized record -- are omitted simply
1627	   because they are much better supplied by a computation.  A separate
1628	   local administrative procedure, not defined for ERCs in general,
1629	   would effect the promotion of stubs into complete records.
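
   One way such a local promotion procedure might look is sketched
   below in Python.  Everything in it is an assumption made for
   illustration: the function name promote_stub, the "label: value"
   line layout, and the set of required labels (who, what, when,
   where) are not defined in this section, and a real procedure would
   follow local policy.

```python
# Labels assumed to be required in a complete record (an assumption
# made for this sketch, not stated in this section).
REQUIRED = ("who", "what", "when", "where")


def promote_stub(stub: str, computed: dict) -> str:
    """Build a complete ERC record from a stub plus supplied elements."""
    if stub.lstrip().startswith("erc:"):
        raise ValueError("input already begins with erc:")
    lines = [ln for ln in stub.splitlines() if ln.strip()]
    # Labels already present; indented continuation lines are ignored.
    present = {
        ln.split(":", 1)[0].strip()
        for ln in lines
        if not ln[0].isspace() and ":" in ln
    }
    # Add automatically supplied elements that the stub omitted.
    for label, value in computed.items():
        if label not in present:
            lines.append(f"{label}: {value}")
            present.add(label)
    missing = [label for label in REQUIRED if label not in present]
    if missing:
        raise ValueError("missing elements: " + ", ".join(missing))
    return "\n".join(["erc:"] + lines) + "\n"


stub = "who: Example Author\nwhat: Example Title\n"
record = promote_stub(stub, {"when": "2004", "where": "http://example.org/doc"})
print(record)
```

   The stub keeps whatever elements it already has; computed values
   are added only for labels the stub does not supply.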

1631	   While the ERC is a general-purpose container for exchange of resource
1632	   descriptions, it does not dictate how records must be internally
1633	   stored, laid out, or assembled by data providers or recipients.
1634	   Arbitrary internal descriptive frameworks can support ERCs simply by
1635	   mapping (e.g., on demand) local records to the ERC container format
1636	   and making them available for export.  Therefore, to support ERCs
1637	   there is no need for a data provider to convert internal data to be
1638	   stored in an ERC format.  On the other hand, any provider (such as
1639	   one just getting started in the business of resource description) may
1640	   choose to store and manipulate local data natively in the ERC format.

1642	8.  Advice to Web Clients

1644	   This section offers some advice to web client software developers.
1645	   It is hard to write about because it tries to anticipate a series of
1646	   events that might lead to native web browser support for ARKs.

1648	   ARKs are envisaged to appear wherever durable object references are
1649	   planned.  Library cataloging records, literature citations, and
1650	   bibliographies are important examples.  In many of these places URLs
1651	   (Uniform Resource Locators) currently stand in, and URNs, DOIs, and
1652	   PURLs have been proposed as alternatives.

1654	   The strings representing ARKs are also envisaged to appear in some of
1655	   the places where URLs currently appear:  in hypertext links (where
1656	   they are not normally shown to users) and in rendered text (displayed
1657	   or printed).  Internet search engines, for example, tend to include
1658	   both actionable and manifest links when listing each item found.  A
1659	   normal HTML link for which the URL is not displayed looks like this.

1661	          <a href = "http://foo.bar.org/index.htm"> Click Here </a>

1663	   The same link with an ARK instead of a URL:

1665	          <a href = "ark:/14697/b12345x"> Click Here </a>

1667	   Web browsers would in general require a small modification to
1668	   recognize and convert this ARK, via mapping authority discovery, to
1669	   the URL form.

1671	          <a href = "http://a.b.org/ark:/14697/b12345x"> Click Here </a>

1673	   A browser that knows how to make that conversion could also
1674	   automatically detect and replace a non-working NMAH.
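
   The conversion shown above, and the replacement of a non-working
   NMAH, amount to simple string operations once a working NMAH is
   known.  The Python sketch below illustrates this under stated
   assumptions: the function names ark_to_url and replace_nmah are
   hypothetical, the NMAH is taken as given (mapping authority
   discovery itself is not shown), and "c.d.org" is a made-up host.

```python
def ark_to_url(ark: str, nmah: str) -> str:
    """Prefix a bare ARK with an NMAH to form an actionable URL."""
    if not ark.startswith("ark:/"):
        raise ValueError("not an ARK: " + ark)
    return "http://" + nmah + "/" + ark


def replace_nmah(url: str, new_nmah: str) -> str:
    """Swap the hostport of an ARK-bearing URL for another NMAH."""
    start = url.find("ark:/")
    if start < 0:
        raise ValueError("no ARK found in: " + url)
    return ark_to_url(url[start:], new_nmah)


print(ark_to_url("ark:/14697/b12345x", "a.b.org"))
print(replace_nmah("http://a.b.org/ark:/14697/b12345x", "c.d.org"))
```

   Only the hostport changes between the two results; the part of the
   string beginning with "ark:" is carried over untouched.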

1676	   An NAA will typically make known the associations it creates by
1677	   publishing them in catalogs, actively advertising them, or simply
1678	   leaving them on web sites for visitors (e.g., users, indexing
1679	   spiders) to stumble across in browsing.

1681	9.  Security Considerations

1683	   The ARK naming scheme poses no direct risk to computers and networks.
1684	   Implementors of ARK services need to be aware of security issues when
1685	   querying networks and filesystems for Name Mapping Authority
1686	   services, and the concomitant risks from spoofing and obtaining
1687	   incorrect information.  These risks are no greater for ARK mapping
1688	   authority discovery than for other kinds of service discovery.  For
1689	   example, recipients of ARKs with a specified hostport (NMAH) should
1690	   treat it like a URL and be aware that the identified ARK service may
1691	   no longer be operational.

1693	   Apart from mapping authority discovery, ARK clients and servers
1694	   subject themselves to all the risks that accompany normal operation
1695	   of the protocols underlying mapping services (e.g., HTTP, Z39.50).
1696	   As a specialization of such protocols, an ARK service may limit
1697	   exposure to the usual risks.  Indeed, ARK services may enhance a kind
1698	   of security by helping users identify long-term reliable references
1699	   to information objects.

1701	10.  Authors' Addresses

1703	   John A. Kunze
1704	   California Digital Library
1705	   University of California, Office of the President
1706	   415 20th St, 4th Floor
1707	   Oakland, CA  94612-3550, USA

1709	   Fax:   +1 510-893-5212
1710	   EMail: jak@ucop.edu

1712	   R. P. C. Rodgers
1713	   US National Library of Medicine
1714	   8600 Rockville Pike, Bldg. 38A
1715	   Bethesda, MD  20894, USA
1716	   Fax:   +1 301-496-0673
1717	   EMail: rodgers@nlm.nih.gov

1719	11.  References

1721	   [ARK]      J. Kunze, "Towards Electronic Persistence Using ARK
1722	              Identifiers", Proceedings of the 3rd ECDL Workshop on Web
1723	              Archives, August 2003, (PDF)
1724	              http://bibnum.bnf.fr/ecdl/2003/proceedings.php?f=kunze

1726	   [DCORE]    Dublin Core Metadata Initiative, "Dublin Core Metadata
1727	              Element Set, Version 1.1:  Reference Description", July
1728	              1999, http://dublincore.org/documents/dces/.

1730	   [DERC]     J. Kunze, "Dictionary of the ERC", work in progress.

1732	   [DNS]      P.V. Mockapetris, "Domain Names - Concepts and
1733	              Facilities", RFC 1034, November 1987.

1735	   [DOI]      International DOI Foundation, "The Digital Object
1736	              Identifier (DOI) System", February 2001,
1737	              http://dx.doi.org/10.1000/203.

1739	   [EMHDRS]   D. Crocker, "Standard for the format of ARPA Internet text
1740	              messages", RFC 822, August 1982.

1742	   [ERC]      J. Kunze, "A Metadata Kernel for Electronic Permanence",
1743	              Journal of Digital Information, Vol 2, Issue 2, January
1744	              2002, ISSN 1368-7506, (PDF)
1745	              http://jodi.ecs.soton.ac.uk/Articles/v02/i02/Kunze/

1747	   [HTTP]     R. Fielding, et al, "Hypertext Transfer Protocol --
1748	              HTTP/1.1", RFC 2616, June 1999.

1750	   [MD5]      R. Rivest, "The MD5 Message-Digest Algorithm", RFC 1321,
1751	              April 1992.

1753	   [NAPTR]    M. Mealling, R. Daniel, "The Naming Authority Pointer
1754	              (NAPTR) DNS Resource Record", RFC 2915, September 2000.

1756	   [NLMPerm]  M. Byrnes, "Defining NLM's Commitment to the Permanence of
1757	              Electronic Information", ARL 212:8-9, October 2000,
1758	              http://www.arl.org/newsltr/212/nlm.html

1760	   [PURL]     K. Shafer, et al, "Introduction to Persistent Uniform
1761	              Resource Locators", 1996,
1762	              http://purl.oclc.org/OCLC/PURL/INET96

1764	   [TELNET]   J. Postel, J.K. Reynolds, "Telnet Protocol Specification",
1765	              RFC 854, May 1983.

1767	   [THUMP]    J. Kunze, "The HTTP URL Mapping Protocol", work in
1768	              progress.

1770	   [URI]      T. Berners-Lee, et al, "Uniform Resource Identifiers
1771	              (URI): Generic Syntax", RFC 2396, August 1998.

1773	   [URNBIB]   C. Lynch, et al, "Using Existing Bibliographic Identifiers
1774	              as Uniform Resource Names", RFC 2288, February 1998.

1776	   [URNSYN]   R. Moats, "URN Syntax", RFC 2141, May 1997.

1778	   [URNNID]   L. Daigle, et al, "URN Namespace Definition Mechanisms",
1779	              RFC 2611, June 1999.

1781	12.  Appendix:  ARK Implementations

1783	   Currently, the primary implementation activity is at the California
1784	   Digital Library (CDL),

1786	         http://ark.cdlib.org/

1788	   housed at the University of California Office of the President, where
1789	   over 150,000 ARKs have been assigned to objects that the CDL owns or
1790	   controls.  Some experimentation in ARKs is taking place at WIPO and
1791	   at the University of California San Diego.

1793	   The US National Library of Medicine (NLM) also has an experimental,
1794	   prototype ARK service under development.  It is being made available
1795	   for purposes of demonstrating various aspects of the ARK system, but
1796	   is subject to temporary or permanent withdrawal (without notice)
1797	   depending upon the circumstances of the small research group
1798	   responsible for making it available.  It is described at:

1800	         http://ark.nlm.nih.gov/

1802	   Comments and feedback may be addressed to rodgers@nlm.nih.gov.

1804	13.  Appendix:  Current ARK Name Authority Table

1806	   This appendix contains a copy of the Name Authority Table (a file) at
1807	   the time of writing.  It may be loaded into a local filesystem (e.g.,
1808	   /etc/natab) for use in mapping NAAs (Name Assigning Authorities) to
1809	   NMAHs (Name Mapping Authority Hostports).  It contains Perl code that
1810	   can be copied into a standalone script that processes the table (as a
1811	   file).  Because this is still a proposed file, none of the values in
1812	   it are real.

1814	     #
1815	     # Name Assigning Authority / Name Mapping Authority Lookup Table
1816	     #       Last change:   2 June 2004
1817	     #       Reload from:   http://ark.nlm.nih.gov/etc/natab
1818	     #       Mirrored at:   http://ark.cdlib.org/natab
1819	     #       To register:   mailto:ark@cdlib.org?Subject=naareg
1820	     #       Process with:  Perl script at end of this file (optional)
1821	     #
1822	     # Each NAA appears at the beginning of a line with the NAA Number
1823	     # first, a colon, and an ARK or URL to a statement of naming policy
1824	     # (see http://ark.cdlib.org for an example).
1825	     # All the NMA hostports that service an NAA are listed, one per
1826	     # line, indented, after the corresponding NAA line.
1827	     #
1828	     #       National Library of Medicine
1829	     12025:  http://www.nlm.nih.gov/xxx/naapolicy.html
1830	             ark.nlm.nih.gov USNLM
1831	             foobar.zaf.org UCSF
1832	             sneezy.dopey.com BIREME
1833	     #
1834	     #       Library of Congress
1835	     12026:  http://www.loc.gov/xxx/naapolicy.html
1836	             foobar.zaf.org USLC
1837	             sneezy.dopey.com USLC
1838	     #
1839	     #       National Agriculture Library
1840	     12027:  http://www.nal.gov/xxx/naapolicy.html
1841	             foobar.zaf.gov:80 USNAL
1842	     #
1843	     #       California Digital Library
1844	     13030:  http://www.cdlib.org/inside/diglib/ark/
1845	             ark.cdlib.org CDL
1846	     #
1847	     #       World Intellectual Property Organization
1848	     13038:  http://www.wipo.int/xxx/naapolicy.html
1849	             www.wipo.int WIPO
1850	     #
1851	     #       University of California San Diego
1852	     20775:  http://library.ucsd.edu/xxx/naapolicy.html
1853	             ucsd.edu UCSD
1854	     #
1855	     #       University of California San Francisco
1856	     29114:  http://library.ucsf.edu/xxx/naapolicy.html
1857	             ucsf.edu UCSF
1858	     #
1859	     #       University of California Berkeley
1860	     28722:  http://library.berkeley.edu/xxx/naapolicy.html
1861	             berkeley.edu UCB
1862	     #
1863	     #       Rutgers University Libraries
1864	     15230:  http://rci.rutgers.edu/xxx/naapolicy.html
1865	             rutgers.edu RUL
1866	     #
1867	     #--- end of data ---
1868	     # The following Perl script takes an NAA as argument and outputs
1869	     # the NMAs in this file listed under any matching NAA.
1870	     #
1871	     # my $naa = shift;
1872	     # while (<>) {
1873	     #       next if (! /^$naa:/);
1874	     #       while (<>) {
1875	     #               last if (! /^[#\s]./);
1876	     #               print "$1\n" if (/^\s+(\S+)/);
1877	     #       }
1878	     # }
1879	     #
1880	     # Create a g/t/nroff-safe version of this table with the UNIX command,
1881	     #
1882	     #       expand natab | sed 's/\\/\\\e/g' > natab.roff
1883	     #
1884	     # end of file
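
   For readers who prefer Python to the Perl script embedded in the
   table, a similar lookup can be sketched as follows.  This is an
   illustrative equivalent, not part of the proposed file.  It assumes
   the table is stored with each NAA line starting in the first column
   (without the indentation added for display in this document); the
   function name lookup_nmahs and the shortened sample table are
   hypothetical.

```python
from typing import List


def lookup_nmahs(natab_text: str, naa: str) -> List[str]:
    """Return the NMA hostports listed under any matching NAA line."""
    nmahs = []
    in_entry = False  # True while reading lines under a matching NAA
    for line in natab_text.splitlines():
        if line.startswith("#") or not line.strip():
            continue  # comment or blank line
        if not line[0].isspace():
            # An NAA line: "<number>:  <policy URL>"
            in_entry = line.startswith(naa + ":")
        elif in_entry:
            # An indented NMA line: "<hostport> <organization>"
            nmahs.append(line.split()[0])
    return nmahs


SAMPLE = """#
#       National Library of Medicine
12025:  http://www.nlm.nih.gov/xxx/naapolicy.html
        ark.nlm.nih.gov USNLM
        foobar.zaf.org UCSF
        sneezy.dopey.com BIREME
#
#       Library of Congress
12026:  http://www.loc.gov/xxx/naapolicy.html
        foobar.zaf.org USLC
        sneezy.dopey.com USLC
#--- end of data ---
"""
print(lookup_nmahs(SAMPLE, "12025"))
```

   Unlike the Perl script, this sketch compares the NAA as a literal
   string rather than as a regular expression.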

1886	14.  Copyright Notice

1888	   Copyright (C) The Internet Society (2004).  All Rights Reserved.

1890	   This document and translations of it may be copied and furnished to
1891	   others, and derivative works that comment on or otherwise explain it
1892	   or assist in its implementation may be prepared, copied, published
1893	   and distributed, in whole or in part, without restriction of any
1894	   kind, provided that the above copyright notice and this paragraph are
1895	   included on all such copies and derivative works.  However, this
1896	   document itself may not be modified in any way, such as by removing
1897	   the copyright notice or references to the Internet Society or other
1898	   Internet organizations, except as needed for the  purpose of
1899	   developing Internet standards in which case the procedures for
1900	   copyrights defined in the Internet Standards process must be
1901	   followed, or as required to translate it into languages other than
1902	   English.

1904	   The limited permissions granted above are perpetual and will not be
1905	   revoked by the Internet Society or its successors or assigns.

1907	   This document and the information contained herein is provided on an
1908	   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
1909	   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
1910	   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
1911	   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
1912	   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

1914	   The IETF invites any interested party to bring to its attention any
1915	   copyrights, patents or patent applications, or other proprietary
1916	   rights which may cover technology that may be required to practice
1917	   this standard.  Please address the information to the IETF Executive
1918	   Director.

1920	Expires 31 January 2005
1921	                           Table of Contents

1923	Status of this Document
1924	Abstract
1925	1.  Introduction
1926	1.1.  Three Reasons to Use ARKs
1927	1.2.  Organizing Support for ARKs
1928	1.3.  A Definition of Identifier
1929	2.  ARK Anatomy
1930	2.1.  The Name Mapping Authority Hostport (NMAH)
1931	2.2.  The Name Assigning Authority Number (NAAN)
1932	2.3.  The Name Part
1933	2.3.1.  Names that Reveal Object Hierarchy
1934	2.3.2.  Names that Reveal Object Variants
1935	2.3.3.  Hyphens are Ignored
1936	2.4.  Normalization and Lexical Equivalence
1937	2.5.  Naming Considerations
1938	3.  Assigners of ARKs
1939	4.  Finding a Name Mapping Authority
1940	4.1.  Looking Up NMAHs in a Globally Accessible File
1941	4.2.  Looking up NMAHs Distributed via DNS
1942	5.  Generic ARK Service Definition
1943	5.1.  Generic ARK Access Service (access, location)
1944	5.2.  Generic Policy Service (permanence, naming, etc.)
1945	5.3.  Generic Description Service
1946	6.  Overview of the Tiny HTTP URL Mapping Protocol (THUMP)
1947	7.  Overview of Electronic Resource Citations (ERCs)
1948	7.1.  ERC Syntax
1949	7.2.  ERC Stories
1950	7.3.  The ERC Anchoring Story
1951	7.4.  ERC Elements
1952	7.5.  ERC Element Values
1953	7.6.  ERC Element Encoding and Dates
1954	7.7.  ERC Stub Records and Internal Support
1955	8.  Advice to Web Clients
1956	9.  Security Considerations
1957	10.  Authors' Addresses
1958	11.  References
1959	12.  Appendix:  ARK Implementations
1960	13.  Appendix:  Current ARK Name Authority Table
1961	14.  Copyright Notice