idnits 2.17.1 draft-kunze-ark-09.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.5 on line 2122. ** Found boilerplate matching RFC 3978, Section 5.4, paragraph 1 (on line 2114), which is fine, but *also* found old RFC 2026, Section 10.4C, paragraph 1 text on line 36. ** The document seems to lack an RFC 3978 Section 5.1 IPR Disclosure Acknowledgement -- however, there's a paragraph with a matching beginning. Boilerplate error? ** This document has an original RFC 3978 Section 5.4 Copyright Line, instead of the newer IETF Trust Copyright according to RFC 4748. ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of the newer disclaimer which includes the IETF Trust according to RFC 4748. ** The document seems to lack an RFC 3979 Section 5, para. 1 IPR Disclosure Acknowledgement. ** The document seems to lack an RFC 3979 Section 5, para. 2 IPR Disclosure Acknowledgement. ** The document seems to lack an RFC 3979 Section 5, para. 3 IPR Disclosure Invitation. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document is more than 15 pages and seems to lack a Table of Contents. 
== No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 45 longer pages, the longest (page 2) being 63 lines == It seems as if not all pages are separated by form feeds - found 0 form feeds but 46 pages Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There are 4 instances of too long lines in the document, the longest one being 5 characters in excess of 72. ** The abstract seems to contain references ([Qualifier]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. == There are 8 instances of lines with non-RFC2606-compliant FQDNs in the document. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == Line 771 has weird spacing: '...eful to remem...' == Line 940 has weird spacing: '... regexp repla...' -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (19 February 2005) is 6998 days in the past. 
Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '<CODE BEGINS>' and '<CODE ENDS>' lines. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'Qualifier' is mentioned on line 401, but not defined == Unused Reference: 'MD5' is defined on line 1944, but no explicit reference was found in the text -- Possible downref: Non-RFC (?) normative reference: ref. 'ANVL' -- Possible downref: Non-RFC (?) normative reference: ref. 'ARK' -- Possible downref: Non-RFC (?) normative reference: ref. 'DCORE' -- Possible downref: Non-RFC (?) normative reference: ref. 'DERC' -- Possible downref: Non-RFC (?) normative reference: ref. 'DOI' -- Possible downref: Non-RFC (?) normative reference: ref. 'ERC' -- Possible downref: Non-RFC (?) normative reference: ref. 'Handle' ** Obsolete normative reference: RFC 2616 (ref. 'HTTP') (Obsoleted by RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235) ** Downref: Normative reference to an Informational RFC: RFC 1321 (ref. 'MD5') ** Obsolete normative reference: RFC 2915 (ref. 'NAPTR') (Obsoleted by RFC 3401, RFC 3402, RFC 3403, RFC 3404) -- Possible downref: Non-RFC (?) normative reference: ref. 'NLMPerm' -- Possible downref: Non-RFC (?) normative reference: ref. 'NOID' -- Possible downref: Non-RFC (?) normative reference: ref. 'PURL' ** Obsolete normative reference: RFC 822 (Obsoleted by RFC 2822) -- Possible downref: Non-RFC (?) normative reference: ref. 'TEMPER' -- Possible downref: Non-RFC (?) normative reference: ref. 'THUMP' ** Obsolete normative reference: RFC 2396 (ref. 'URI') (Obsoleted by RFC 3986) ** Downref: Normative reference to an Informational RFC: RFC 2288 (ref. 'URNBIB') ** Obsolete normative reference: RFC 2141 (ref. 
'URNSYN') (Obsoleted by RFC 8141) ** Obsolete normative reference: RFC 2611 (ref. 'URNNID') (Obsoleted by RFC 3406) Summary: 22 errors (**), 0 flaws (~~), 9 warnings (==), 16 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet-Draft: draft-kunze-ark-09.txt J. Kunze 3 ARK Identifier Scheme University of California (UCOP) 4 Expires 19 August 2005 R. P. C. Rodgers 5 US National Library of Medicine 6 19 February 2005 8 The ARK Persistent Identifier Scheme 10 Status of this Document 12 By submitting this Internet-Draft, each author represents that any 13 applicable patent or other IPR claims of which he or she is aware 14 have been disclosed, or will be disclosed, and any of which he or she 15 become aware will be disclosed, in accordance with RFC 3668. 17 Internet-Drafts are working documents of the Internet Engineering 18 Task Force (IETF), its areas, and its working groups. Note that 19 other groups may also distribute working documents as Internet- 20 Drafts. 22 Internet-Drafts are draft documents valid for a maximum of six months 23 and may be updated, replaced, or obsoleted by other documents at any 24 time. It is inappropriate to use Internet-Drafts as reference 25 material or to cite them other than as ``work in progress.'' 27 The list of current Internet-Drafts can be accessed at 28 http://www.ietf.org/ietf/1id-abstracts.txt 30 The list of Internet-Draft Shadow Directories can be accessed at 31 http://www.ietf.org/shadow.html. 33 Distribution of this document is unlimited. Please send comments to 34 jak@ucop.edu. 36 Copyright (C) The Internet Society (2005). All Rights Reserved. 38 Abstract 40 The ARK (Archival Resource Key) naming scheme is designed to 41 facilitate the high-quality and persistent identification of 42 information objects. 
A founding principle of the ARK is that 43 persistence is purely a matter of service and is neither inherent in 44 an object nor conferred on it by a particular naming syntax. The best 45 that an identifier can do is to lead users to the services that 46 support persistence. The term ARK itself refers both to the scheme 47 and to any single identifier that conforms to it. An ARK has five 48 components: 50 [http://NMAH/]ark:/NAAN/Name[Qualifier] 52 an optional and mutable Name Mapping Authority Hostport, the "ark:" 53 label, the Name Assigning Authority Number (NAAN), the assigned Name, 54 and an optional and possibly mutable Qualifier supported by the NMA. 55 The NAAN and Name together form the immutable persistent identifier 56 for the object. An ARK is a special kind of URL that connects users 57 to three things: the named object, its metadata, and the provider's 58 promise about its persistence. When entered into the location field 59 of a Web browser, the ARK leads the user to the named object. That 60 same ARK, followed by a single question mark ('?'), returns a brief 61 metadata record that is both human- and machine-readable. When the 62 ARK is followed by dual question marks ('??'), the returned metadata 63 contains a commitment statement from the current provider. Tools 64 exist for minting, binding, and resolving ARKs. 66 1. Introduction 68 This document describes a scheme for the high-quality naming of 69 information resources. The scheme, called the Archival Resource Key 70 (ARK), is well suited to long-term access and identification of any 71 information resources that accommodate reasonably regular electronic 72 description. This includes digital documents, databases, software, 73 and websites, as well as physical objects (books, bones, statues, 74 etc.) and intangible objects (chemicals, diseases, vocabulary terms, 75 performances). Hereafter the term "object" refers to an information 76 resource. 
The term ARK itself refers both to the scheme and to any 77 single identifier that conforms to it. A reasonably concise and 78 accessible overview and rationale for the scheme is available at 79 [ARK]. 81 Schemes for persistent identification of network-accessible objects 82 are not new. In the early 1990s, the design of the Uniform Resource 83 Name [URNSYN] responded to the observed failure rate of URLs by 84 articulating an indirect, non-hostname-based naming scheme and the 85 need for responsible name management. Meanwhile, promoters of the 86 Digital Object Identifier [DOI] succeeded in building a community of 87 providers around a mature software system [Handle] that supports name 88 management. The Persistent Uniform Resource Locator [PURL] is 89 another scheme, one that has the unique advantage of working with 90 unmodified web browsers. ARKs represent an approach that attempts to 91 build on the strengths and to avoid the weaknesses of the other 92 schemes. 94 A founding principle of the ARK is that persistence is purely a 95 matter of service. Persistence is neither inherent in an object nor 96 conferred on it by a particular naming syntax. Nor is the technique 97 of name indirection - upon which URNs, Handles, DOIs, and PURLs are 98 founded - of central importance. Name indirection is an ancient and 99 well-understood practice; new mechanisms for it keep appearing and 100 distracting practitioner attention, with the Domain Name System [DNS] 101 being a particularly dazzling and elegant example. What is often 102 forgotten is that maintenance of an indirection table is the 103 overwhelming and unavoidable cost to the organization providing 104 persistence, and that cost is equivalent across naming schemes. 
That 105 indirection has always been a native part of the web while being so 106 lightly utilized for the persistence of web-based objects is an 107 indication of how unsuited most organizations are to the task of 108 table maintenance and to the overall challenge of digital permanence. 110 Persistence is achieved through a provider's successful stewardship 111 of objects and their identifiers. The highest level of persistence 112 will be reinforced by a provider's robust contingency, redundancy, 113 and succession strategies. It is further safeguarded to the extent 114 that a provider's mission is shielded from marketplace and political 115 instabilities. These are by far the major challenges confronting 116 persistence providers, and no identifier scheme has any direct impact 117 on them. 119 Given the limited ability of any naming scheme to positively 120 contribute to the considerable undertaking of digital permanence, it 121 is legitimate to ask whether a given scheme might itself actually 122 become a liability as the provider carries objects and infrastructure 123 into the technologically evolving future. It is in response to this 124 question that the ARK scheme tries to be simple, transparent, and 125 free of proprietary components, vendor relationships, and special- 126 purpose global infrastructure. 128 1.1. Three Reasons to Use ARKs 130 The first requirement of an ARK is to give users a link from an 131 object to a promise of stewardship for it. That promise is a multi- 132 faceted covenant that binds the word of an identified service 133 provider to a specific set of responsibilities. No one can tell if 134 successful stewardship will take place because no one can predict the 135 future. Reasonable conjecture, however, may be based on past 136 performance. There must be a way to tie a promise of persistence to 137 a provider's demonstrated or perceived ability - its reputation - in 138 that arena. 
Provider reputations would then rise and fall as 139 promises are observed variously to be kept and broken. This is 140 perhaps the best way we have for gauging the strength of any 141 persistence promise. Note that over time, current providers have 142 nothing to do with the intentions of the original assigners of names. 144 The second requirement of an ARK is to give users a link from an 145 object to a description of it. The problem with a naked identifier 146 is that without a description real identification is incomplete. 147 Identifiers common today are relatively opaque, though some contain 148 ad hoc clues that reflect brief life cycle periods such as the 149 address of a short stay in a filesystem hierarchy. Possession of 150 both an identifier and an object is some improvement, but positive 151 identification may still be uncertain since the object itself might 152 not include a matching identifier or might not carry evidence obvious 153 enough to reveal its identity without significant research. In 154 either case, what is called for is a record bearing witness to the 155 identifier's association with the object, as supported by a recorded 156 set of object characteristics. This descriptive record is partly an 157 identification "receipt" with which users and archivists can verify 158 an object's identity after brief inspection and a plausible match 159 with recorded characteristics such as title and size. 161 The final requirement of an ARK is to give users a link to the object 162 itself (or to a copy) if at all possible. Persistent access is the 163 central duty of an ARK. Persistent identification plays a vital 164 supporting role but, strictly speaking, it can be construed as no 165 more than a record attesting to the original assignment of a never- 166 reassigned identifier. 
Object access may not be feasible for various 167 reasons, such as catastrophic loss of the object, a licensing 168 agreement that keeps an archive "dark" for a period of years, or when 169 an object's own lack of tangible existence confuses normal concepts 170 of access (e.g., a vocabulary term might be accessed through its 171 definition). In such cases the ARK's identification role assumes a 172 much higher profile. But attempts to simplify the persistence 173 problem by decoupling access from identification and concentrating 174 exclusively on the latter are of questionable utility. A perfect 175 system for assigning forever unique identifiers might be created, but 176 if it did so without reducing access failure rates, no one would be 177 interested. The central issue - which may be summed up as the "HTTP 178 404 Not Found" problem - would not have been addressed. 180 1.2. Organizing Support for ARKs 182 An organization and the user community it serves can often be seen to 183 struggle with two different areas of persistent identification: the 184 Our Stuff problem and the Their Stuff problem. In the Our Stuff 185 problem, we in the organization want our own objects to acquire 186 persistent names. Since we possess or control these objects, our 187 organization tackles the Our Stuff problem directly. Whether or not 188 the objects are named by ARKs, our organization is the responsible 189 party, so it can plan for, maintain, and make commitments about the 190 objects. 192 In the Their Stuff problem, we in the organization want others' 193 objects to acquire persistent names. These are objects that we do 194 not own or control, but some of which are critically important to us. 195 But because they are beyond our influence as far as support is 196 concerned, creating and maintaining persistent identifiers for Their 197 Stuff is not especially purposeful or feasible for us to do. 
There 198 is little that we can do about someone else's stuff except encourage 199 them to find or become providers of persistence services. 201 Co-location of persistent access and identification services is 202 natural. Any organization that undertakes ongoing support of true 203 persistent identification (which includes description) is well-served 204 if it controls, owns, or otherwise has clear internal access to the 205 identified objects, and this gives it an advantage if it wishes also 206 to support persistent access to outsiders. Conversely, persistent 207 access to outsiders requires orderly internal collection management 208 procedures that include monitoring, acquisition, verification, and 209 change control over objects, which in turn requires object 210 identifiers persistent enough to support auditable record keeping 211 practices. 213 Although organizing ARK services under one roof thus tends to make 214 sense, object hosting can successfully be separated from name 215 mapping. An example is when a name mapping authority centrally 216 provides uniform resolution services via a protocol gateway on behalf 217 of organizations that host objects behind a variety of access 218 protocols. It is also reasonable to build value-added description 219 services that rely on the underlying services of a set of mapping 220 authorities. 222 Supporting ARKs is not for every organization. By requiring 223 specific, revealed commitments to preservation, to object access, and 224 to description, the bar for providing ARK services is higher than for 225 some other identifier schemes. On the other hand, it would be hard 226 to grant credence to a persistence promise from an organization that 227 could not muster the minimum ARK services. This is not to say that there is no 228 business model for an ARK-like, description-only service built on top 229 of another organization's full complement of ARK services. 
For 230 example, there might be competition at the description level for 231 abstracting and indexing a body of scientific literature archived in 232 a combination of open and fee-based repositories. The description- 233 only service would have no direct commitment to the objects, but 234 would act as intermediary, forwarding commitment statements from 235 object hosting to requestors. 237 1.3. Definition of Identifier 239 An identifier is not a string of character data - an identifier is an 240 association between a string of data and an object. This abstraction 241 is necessary because without it a string is just data. It's nonsense 242 to talk about a string's breaking, or about its being strong, 243 maintained, and authentic. But as a representative of an 244 association, a string can do, metaphorically, the things that we 245 expect of it. 247 Without regard to whether an object is physical, digital, or 248 conceptual, to identify it is to claim an association between it and 249 a representative string, such as "Jane" or "ISBN 0596000278". What 250 gives a claim credibility is a set of verifiable assertions, or 251 metadata, about the object, such as age, height, title, or number of 252 pages. In other words, the association is made manifest by a record 253 (e.g., a cataloging or other metadata record) that vouches for it. 255 In the complete absence of any testimony (metadata) regarding an 256 association, a would-be identifier string is a meaningless sequence 257 of characters. To keep an externally visible but otherwise internal 258 string from being perceived as an identifier by outsiders, for 259 example, it suffices for an organization not to disclose the nature 260 of its association. For our immediate purpose, actual existence of 261 an association record is more important than its authenticity or 262 verifiability, which are outside the scope of this specification. 
264 It is a gift to the identification process if an object carries its 265 own name as an inseparable part of itself, such as an identifier 266 imprinted on the first page of a document or embedded in a data 267 structure element of a digital document header. In cases where the 268 object is large, unwieldy, or unavailable (such as when licensing 269 restrictions are in effect), a metadata record that includes the 270 identifier string will usually suffice. That record becomes a 271 conveniently manipulable object surrogate, acting as both an 272 association "receipt" and "declaration". 274 Note that our definition of identifier extends the one in use for 275 Uniform Resource Identifiers [URI]. The present document still 276 sometimes (ab)uses the terms "ARK" and "identifier" as shorthand for 277 the string part of an identifier, but the context should make the 278 meaning clear. 280 2. ARK Anatomy 282 An ARK is represented by a sequence of characters (a string) that 283 contains the label, "ark:", optionally preceded by the beginning part 284 of a URL. Here is a diagrammed example.
286     http://foobar.zaf.org/ark:/12025/654xz321/s3/f8.05v.tiff
287     \___________________/ \__/ \___/ \______/ \____________/
288        (replaceable)       |     |      |       Qualifier
289              |         ARK Label |      |    (NMA-supported)
290              |                   |      |
291     Name Mapping Authority       |   Name (NAA-assigned)
292        Hostport (NMAH)           |
293               Name Assigning Authority Number (NAAN)
295 The ARK syntax can be summarized,
297     [http://NMAH/]ark:/NAAN/Name[Qualifier]
299 where the NMAH and Qualifier parts are in brackets to indicate that 300 they are optional. 302 2.1. The Name Mapping Authority Hostport (NMAH) 304 Before the "ark:" label may appear an optional Name Mapping Authority 305 Hostport (NMAH) that is a temporary address where ARK service 306 requests may be sent. 
It consists of "http://" (or any service 307 specification valid for a URL) followed by an Internet hostname or 308 hostport combination having the same format and semantics as the 309 hostport part of a URL. The most important thing about the NMAH is 310 that it is "identity inert" from the point of view of object 311 identification. In other words, ARKs that differ only in the 312 optional NMAH part identify the same object. Thus, for example, the 313 following three ARKs are synonyms for just one information object: 315 http://loc.gov/ark:/12025/654xz321 316 http://rutgers.edu/ark:/12025/654xz321 317 ark:/12025/654xz321 319 Strictly speaking, in the realm of digital objects, these ARKs may 320 lead over time to somewhat different or diverging instances of the 321 originally named object. In an ideal world, divergence of persistent 322 objects is not desirable, but it is widely believed that digital 323 preservation efforts will inevitably lead to alterations in some 324 original objects (e.g., a format migration in order to preserve the 325 ability to display a document). If any of those objects are held 326 redundantly in more than one organization (a common preservation 327 strategy), chances are small that all holding organizations will 328 perform the same precise transformations and all maintain the same 329 object metadata. More significant divergence would be expected when 330 the holding organizations serve different audiences or compete with 331 each other. 333 The NMAH part makes an ARK into an actionable URL. As with many 334 internet parameters, it is helpful to approach the NMAH being liberal 335 in what you accept and conservative in what you propose. From the 336 recipient's point of view, the NMAH part should be treated as 337 temporary, disposable, and replaceable. From the NMA's point of 338 view, it should be chosen with the greatest concern for longevity. 
A 339 carefully chosen NMAH should be at least as permanent as the 340 providing organization's own hostname. In the case of a national or 341 university library, for example, there is no reason why the NMAH 342 should not be considerably more permanent than soft-funded proxy 343 hostnames such as hdl.handle.net, dx.doi.org, and purl.org. In 344 general and over time, however, it is not unexpected for an NMAH 345 eventually to stop working and require replacement with the NMAH of a 346 currently active service provider. 348 This replacement relies on a mapping authority "resolver" discovery 349 process, of which two alternate methods are outlined in a later 350 section. The ARK, URN, Handle, and DOI schemes all use a resolver 351 discovery model that sooner or later requires matching the original 352 assigning authority with a current provider servicing that 353 authority's named objects; once found, the resolver at that provider 354 performs what amounts to a redirect to a place where the object is 355 currently held. All the schemes rely on the ongoing functionality of 356 currently mainstream technologies such as the Domain Name System 357 [DNS] and web browsers. The Handle and DOI schemes in addition 358 require that the Handle protocol layer and global server grid be 359 available at all times. 361 The practice of prepending "http://" and an NMAH to an ARK is a way 362 of creating an actionable identifier by a method that is itself 363 temporary. Assuming that infrastructure supporting [HTTP] 364 information retrieval will no longer be available one day, ARKs will 365 then have to be converted into new kinds of actionable identifiers. 367 By that time, if ARKs see widespread use, web browsers would 368 presumably evolve to perform this (currently simple) transformation 369 automatically. 371 2.2. The ARK Label Part - ark: 373 The label part distinguishes an ARK from an ordinary identifier. 
In 374 a URL found in the wild, the string, "ark:/", indicates that the URL 375 stands a reasonable chance of being an ARK. If the context warrants, 376 verification that it actually is an ARK can be done by testing it for 377 existence of the three ARK services. 379 Since nothing about an identifier syntax directly affects 380 persistence, the "ark:" label (like "urn:", "doi:", and "hdl:") 381 cannot tell you whether the identifier is persistent or whether the 382 object is available. It does tell you that the original Name 383 Assigning Authority (NAA) had some sort of hopes for it, but it 384 doesn't tell you whether that NAA is still in existence, or whether a 385 decade ago it ceased to have any responsibility for providing 386 persistence, or whether it ever had any responsibility beyond naming. 388 Only a current provider can say for certain what sort of commitment 389 it intends, and the ARK label suggests that you can query the NMAH 390 directly to find out exactly what kind of persistence is promised. 391 Even if what is promised is impersistence (i.e., a short-term 392 identifier), saying so is valuable information to the recipient. 393 Thus an ARK is a high-functioning identifier in the sense that it 394 provides access to the object, the metadata, and a commitment 395 statement, even if the commitment is explicitly very weak. 397 2.3. The Name Assigning Authority Number (NAAN) 399 Recalling that the general form of the ARK is, 401 [http://NMAH/]ark:/NAAN/Name[Qualifier] 403 the part of the ARK directly following the "ark:" is the Name 404 Assigning Authority Number (NAAN) enclosed in `/' (slash) characters. 405 This part is always required, as it identifies the organization that 406 originally assigned the Name of the object. It is used to discover a 407 currently valid NMAH and to provide top-level partitioning of the 408 space of all ARKs. 
NAANs are registered in a manner similar to URN 409 Namespaces, but they are pure numbers consisting of 5 digits or 9 410 digits. Thus, the first 100,000 registered NAAs fit compactly into 411 the 5 digits, and if growth warrants, the next billion fit into the 9 412 digit form. In either case the fixed, odd number of digits helps 413 reduce the chances of finding a NAAN out of context and confusing it 414 with nearby quantities such as 4-digit dates. 416 2.4. The Name Part 418 The part of the ARK just after the NAAN is the Name assigned by the 419 NAA, and it is also required. Semantic opaqueness in the Name part 420 is strongly encouraged in order to reduce an ARK's vulnerability to 421 era- and language-specific change. Identifier strings containing 422 linguistic fragments can create support difficulties down the road. 423 No matter how appropriate or even meaningless they are today, such 424 fragments may one day create confusion, give offense, or infringe on 425 a trademark as the semantic environment around us and our communities 426 evolves. 428 Names that look more or less like numbers avoid common problems that 429 defeat persistence and international acceptance. The use of digits 430 is highly recommended. Mixing in non-vowel alphabetic characters a 431 couple at a time is a relatively safe and easy way to achieve a 432 denser namespace (more possible names for a given length of the name 433 string). Such names have a chance of aging and traveling well. 434 Tools exist that mint, bind, and resolve opaque identifiers, with or 435 without check characters [NOID]. More on naming considerations is 436 given in a subsequent section. 438 2.5. The Qualifier Part 440 The part of the ARK following the NAA-assigned Name is an optional 441 Qualifier. It is a string that extends the base ARK in order to 442 create a kind of service entry point into the object named by the 443 NAA. 
At the discretion of the providing NMA, such a service entry 444 point permits an ARK to support access to individual hierarchical 445 components and subcomponents of an object, and to variants (versions, 446 languages, formats) of components. A Qualifier may be invented by 447 the NAA or by any NMA servicing the object. 449 In form, the Qualifier is a ComponentPath, or a VariantPath, or a 450 ComponentPath followed by a VariantPath. A VariantPath is introduced 451 and subdivided by the reserved character `.', and a ComponentPath is 452 introduced and subdivided by the reserved character `/'. In this 453 example, 455 http://foobar.zaf.org/ark:/12025/654xz321/s3/f8.05v.tiff 457 the string "/s3/f8" is a ComponentPath and the string ".05v.tiff" is 458 a VariantPath. The ARK Qualifier is a formalization of some current 459 mainstream URL syntax conventions, but in ARKs the formalization 460 specifically reserves meanings that permit recipients to make strong 461 inferences about logical subobject containment and equivalence solely 462 from the form of the received identifiers and without having to 463 inspect metadata records in order to discover such relationships. 464 NMAs are free not to disclose any of these relationships merely by 465 avoiding the reserved characters above. Hierarchical components and 466 variants are discussed further in the next two sections. 468 The Qualifier, if present, differs from the Name in several important 469 respects. First, a Qualifier may have been assigned either by the 470 NAA or later by the NMA. The assignment of a Qualifier by an NMA 471 effectively amounts to an act of publishing a service entry point 472 within the conceptual object originally named by the NAA. For our 473 purposes, an ARK extended with a Qualifier assigned by an NMA will be 474 called an NMA-qualified ARK. 

   Second, a Qualifier assignment on the part of an NMA is made in
   fulfillment of its service obligations and may reflect changing
   service expectations and technology requirements. NMA-qualified ARKs
   could therefore be transient, even if the base, unqualified ARK is
   persistent. For example, it would be reasonable for an NMA to
   support access to an image object through an actionable ARK that is
   considered persistent even if the experience of that access changes
   as linking, labeling, and presentation conventions evolve and as
   format and security standards are updated. For an image "thumbnail",
   that NMA could also support an NMA-qualified ARK that is considered
   impersistent because the thumbnail will be replaced with higher
   resolution images as network bandwidth and CPU speeds increase. At
   the same time, for an originally scanned, high-resolution master, the
   NMA could publish an NMA-qualified ARK that is itself considered
   persistent. Of course, the NMA must be able to return its separate
   commitments to unqualified, NAA-assigned ARKs, to NMA-qualified ARKs,
   and to any NAA-qualified ARKs that it supports.

   A third difference between a Qualifier and a Name concerns the
   semantic opaqueness constraint. When an NMA-qualified ARK is to be
   used as a transient service entry point into a persistent object, the
   priority given to semantic opaqueness observed by the NAA in the Name
   part may be relaxed by the NMA in the Qualifier part. If service
   priorities in the Qualifier take precedence over persistence,
   short-term usability considerations may recommend somewhat
   semantically laden Qualifier strings.

   Finally, not only is the set of Qualifiers supported by an NMA
   mutable, but different NMAs may support different Qualifier sets for
   the same NAA-identified object. In this regard the NMAs act
   independently of each other and of the NAA.

   The next two sections describe how ARK syntax may be used to declare,
   or to avoid declaring, certain kinds of relatedness among qualified
   ARKs.

2.5.1.  ARKs that Reveal Object Hierarchy

   An NAA or NMA may choose to reveal the presence of a hierarchical
   relationship between objects using the `/' (slash) character after
   the Name part of an ARK. Some authorities will choose not to
   disclose this information, while others will go ahead and disclose so
   that manipulators of large sets of ARKs can infer object
   relationships by simple identifier inspection; for example, this
   makes it possible for a system to present a collapsed view of a large
   search result set.

   If the ARK contains an internal slash after the NAAN, the piece to
   its left indicates a containing object. For example, publishing an
   ARK of the form,

      ark:/12025/654/xz/321

   is equivalent to publishing three ARKs,

      ark:/12025/654/xz/321
      ark:/12025/654/xz
      ark:/12025/654

   together with a declaration that the first object is contained in the
   second object, and that the second object is contained in the third.

   Revealing the presence of hierarchy is completely up to the assigning
   authority. It is hard enough to commit to one object's name, let
   alone to three objects' names and to a specific, ongoing relatedness
   among them. Thus, regardless of whether hierarchy was present
   initially, the assigning authority, by not using slashes, reveals no
   shared inferences about hierarchical or other inter-relatedness in
   the following ARKs:

      ark:/12025/654_xz_321
      ark:/12025/654_xz
      ark:/12025/654xz321
      ark:/12025/654xz
      ark:/12025/654

   Note that slashes around the ARK's NAAN (/12025/ in these examples)
   are not part of the ARK's Name and therefore do not indicate the
   existence of some sort of NAAN super object containing all objects in
   its namespace.
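
   The containment rule above can be illustrated with a short Python
   sketch that lists the ARKs implied by publishing a single
   hierarchical ARK. The function name implied_containers is
   hypothetical, and the sketch assumes a bare ARK ("ark:/NAAN/...")
   that has no variant suffixes and no leading, trailing, or doubled
   slashes after the NAAN.

```python
def implied_containers(ark):
    """List an ARK followed by each containing object that it implies.

    Illustrative only: assumes a bare "ark:/NAAN/Name..." string with
    no variant suffixes and no empty path components.
    """
    marker = "ark:/"
    if not ark.startswith(marker):
        raise ValueError("expected a bare ARK, got %r" % ark)
    naan, sep, rest = ark[len(marker):].partition("/")
    if not sep or not rest:
        raise ValueError("ARK lacks a Name part: %r" % ark)

    # Slashes after the NAAN separate containers from their contents;
    # the slashes around the NAAN itself carry no such meaning.
    parts = rest.split("/")
    prefix = marker + naan + "/"
    return [prefix + "/".join(parts[:n])
            for n in range(len(parts), 0, -1)]


if __name__ == "__main__":
    for implied in implied_containers("ark:/12025/654/xz/321"):
        print(implied)
```

   For an ARK that uses no slashes after the NAAN, such as
   ark:/12025/654_xz_321, the sketch returns only the ARK itself, since
   no hierarchy has been revealed.
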
   A slash must have at least one non-structural
   character (one that is neither a slash nor a period) on both sides in
   order for it to separate recognizable structural components. So
   initial or final slashes may be removed, and double slashes may be
   converted into single slashes.

2.5.2.  ARKs that Reveal Object Variants

   An NAA or NMA may choose to reveal the possible presence of variant
   objects or object components using the `.' (period) character after
   the Name part of an ARK. Some authorities will choose not to
   disclose this information, while others will go ahead and disclose so
   that manipulators of large sets of ARKs can infer object
   relationships by simple identifier inspection; for example, this
   makes it possible for a system to present a collapsed view of a large
   search result set.

   If the ARK contains an internal period after the Name, the piece to
   its left is a base name and the piece to its right, up to the end of
   the ARK or to the next period, is a suffix. A Name may have more
   than one suffix, for example,

      ark:/12025/654.24
      ark:/12025/xz4/654.24
      ark:/12025/654.20v.78g.f55

   There are two main rules. First, if two ARKs share the same base
   name but have different suffixes, the corresponding objects were
   considered variants of each other (different formats, languages,
   versions, etc.) by the assigning authority. Thus, the following ARKs
   are variants of each other:

      ark:/12025/654.20v.78g.f55
      ark:/12025/654.321xz
      ark:/12025/654.44

   Second, publishing an ARK with a suffix implies the existence of at
   least one variant identified by the ARK without its suffix. The ARK
   otherwise permits no further assumptions about what variants might
   exist.
   So publishing the ARK,

      ark:/12025/654.20v.78g.f55

   is equivalent to publishing the four ARKs,

      ark:/12025/654.20v.78g.f55
      ark:/12025/654.20v.78g
      ark:/12025/654.20v
      ark:/12025/654

   Revealing the possibility of variants is completely up to the
   assigning authority. It is hard enough to commit to one object's
   name, let alone to multiple variants' names and to a specific,
   ongoing relatedness among them. The assigning authority is the sole
   arbiter of what constitutes a variant within its namespace, and
   whether to reveal that kind of relatedness by using periods within
   its names.

   A period must have at least one non-structural character (one that is
   neither a slash nor a period) on both sides in order for it to
   separate recognizable structural components. So initial or final
   periods may be removed, and double periods may be converted into
   single periods. Multiple suffixes should be arranged in sorted order
   (pure ASCII collating sequence) at the end of an ARK.

2.6.  Character Repertoires

   The Name and Qualifier parts are strings of visible ASCII characters
   and should be less than 128 bytes in length. The length restriction
   keeps the ARK short enough to append ordinary ARK request strings
   without running into transport restrictions (e.g., within HTTP GET
   requests). Characters may be letters, digits, or any of these seven
   characters:

      = # * + @ _ $

   The following characters may also be used, but their meanings are
   reserved:

      % - . /

   The characters `/' and `.' are ignored if either appears as the last
   character of an ARK. If used internally, they allow a name assigning
   authority to reveal object hierarchy and object variants as
   previously described.

   Hyphens are considered to be insignificant and are always ignored in
   ARKs.
   A `-' (hyphen) may appear in an ARK for readability, or it may
   have crept in during the formatting and wrapping of text, but it must
   be ignored in lexical comparisons. As in a telephone number, hyphens
   have no meaning in an ARK. It is always safe for an NMA that
   receives an ARK to remove any hyphens found in it. As a result, like
   the NMAH, hyphens are "identity inert" in comparing ARKs for
   equivalence. For example, the following ARKs are equivalent for
   purposes of comparison and ARK service access:

      ark:/12025/65-4-xz-321
      ark:sneezy.dopey.com/12025/654--xz32-1
      ark:/12025/654xz321

   The `%' character is reserved for %-encoding all other octets that
   would appear in the ARK string, in the same manner as for URIs [URI].
   A %-encoded octet consists of a `%' followed by two hex digits; for
   example, "%7d" stands in for `}'. Lower case hex digits are
   preferred to reduce the chances of false acronym recognition; thus it
   is better to use "%acT" instead of "%ACT". The character `%' itself
   must be represented using "%25". As with URNs, %-encoding permits
   ARKs to support legacy namespaces (e.g., ISBN, ISSN, SICI) that have
   less restricted character repertoires [URNBIB].

2.7.  Normalization and Lexical Equivalence

   To determine if two or more ARKs identify the same object, the ARKs
   are compared for lexical equivalence after first being normalized.
   Since ARK strings may appear in various forms (e.g., having different
   NMAHs), normalizing them minimizes the chances that comparing two ARK
   strings for equality will fail unless they actually identify
   different objects. In a specified-host ARK (one having an NMAH), the
   NMAH never participates in such comparisons.

   Normalization of an ARK for the purpose of octet-by-octet equality
   comparison with another ARK consists of four steps.
   First, any upper
   case letters in the "ark:" label and the two characters following a
   `%' are converted to lower case. The case of all other letters in
   the ARK string must be preserved. Second, any NMAH part is removed
   (everything from an initial "http://" up to the next slash) and all
   hyphens are removed.

   Third, structural characters (slash and period) are normalized.
   Initial and final occurrences are removed, and two structural
   characters in a row (e.g., // or ./) are replaced by the first
   character, iterating until each occurrence has at least one
   non-structural character on either side. Finally, if there are any
   components with a period on the left and a slash on the right, either
   the component and the preceding period must be moved to the end of
   the Name part or the ARK must be thrown out as malformed.

   The fourth and final step is to arrange the suffixes in ASCII
   collating sequence (that is, to sort them) and to remove duplicate
   suffixes, if any. It is also permissible to throw out ARKs for which
   the suffixes are not sorted.

   The resulting ARK string is now normalized. Comparisons between
   normalized ARKs are case-sensitive, meaning that upper case letters
   are considered different from their lower case counterparts.

   To keep ARK string variation to a minimum, no reserved ARK characters
   should be %-encoded unless it is deliberately to conceal their
   reserved meanings. No non-reserved ARK characters should ever be
   %-encoded. Finally, no %-encoded character should ever appear in an
   ARK in its decoded form.

2.8.  Naming Considerations

   The ARK has different goals from the URI, so it has different
   character set requirements. Because linguistic constructs imperil
   persistence, for ARKs non-ASCII character support is unimportant.
   ARKs and URIs share goals of transcribability and transportability
   within web documents, so characters are required to be visible,
   non-conflicting with HTML/XML syntax, and not subject to tampering
   during transmission across common transport gateways. Add the goal
   of making an undelimited ARK recognizable in running prose, as in
   ark:/12025/=@_22*$, and certain punctuation characters (e.g., comma,
   period) end up being excluded from the ARK lest the end of a phrase
   or sentence be mistaken for part of the ARK.

   A valuable technique for provision of persistent objects is to try to
   arrange for the complete identifier to appear on, with, or near its
   retrieved object. An object encountered at a moment in time when its
   discovery context has long since disappeared could then easily be
   traced back to its metadata, to alternate versions, to updates, etc.
   This has seen reasonable success, for example, in book publishing and
   software distribution.

   If persistence is the goal, a deliberate local strategy for
   systematic name assignment is crucial. Names must be chosen with
   great care. Poorly chosen and managed names will devastate any
   persistence strategy, and they do not discriminate based on naming
   scheme. Whether a mistakenly re-assigned identifier is a URN, DOI,
   PURL, URL, or ARK, the damage - failed access and confusion - is not
   mitigated more in one scheme than in another. Conversely, in-house
   efforts to manage names responsibly will go much further towards
   safeguarding persistence than any choice of naming scheme or name
   resolution technology.

   Hostnames appearing in any identifier meant to be persistent must be
   chosen with extra care.
   The tendency in hostname selection has
   traditionally been to choose a token with recognizable attributes,
   such as a corporate brand, but that tendency wreaks havoc with
   persistence that is supposed to outlive brands, corporations, subject
   classifications, and natural language semantics (e.g., what did the
   three letters "gay" mean in 1958, 1978, and 1998?). Today's
   recognized and correct attributes are tomorrow's stale or incorrect
   attributes. In making hostnames (any names, actually) long-term
   persistent, it helps to eliminate recognizable attributes to the
   extent possible. This affects selection of any name based on URLs,
   including PURLs and the explicitly disposable NMAHs. There is no
   excuse for a provider that manages its internal names impeccably not
   to exercise the same care in choosing what could be an exceptionally
   durable hostname, especially if it would form the prefix for all the
   provider's URL-based external names. Registering an opaque hostname
   in the ".org" or ".net" domain would not be a bad start.

   Dubious persistence speculation does not make selecting naming
   strategies any easier. For example, despite rumors to the contrary,
   there are really no obvious reasons why the organizations registering
   DNS names, URN Namespaces, and DOI publisher IDs should have among
   them one that is intrinsically more fallible than the next.
   Moreover, it is a misconception that the demise of DNS and of HTTP
   need adversely affect the persistence of URLs. At such a time,
   certainly URLs from the present day might not then be actionable by
   our present-day mechanisms, but resolution systems for future
   non-actionable URLs are no harder to imagine than resolution systems
   for present-day non-actionable URNs and DOIs. There is no more
   stable a namespace than one that is dead and frozen, and that would
   then characterize the space of names bearing the "http://" prefix.
   It is
   useful to remember that just because hostnames have been carelessly
   chosen in their brief history does not mean that they are unsuitable
   in NMAHs (and URLs) intended for use in situations demanding the
   highest level of persistence available in the Internet environment.
   A well-planned name assignment strategy is everything.

3.  Assigners of ARKs

   A Name Assigning Authority (NAA) is an organization that creates (or
   delegates creation of) long-term associations between identifiers and
   information objects. Examples of NAAs include national libraries,
   national archives, and publishers. An NAA may arrange with an
   external organization for identifier assignment. The US Library of
   Congress, for example, allows OCLC (the Online Computer Library
   Center, a major world cataloger of books) to create associations
   between Library of Congress Control Numbers (LCCNs) and the books
   that OCLC processes. A cataloging record is generated that testifies
   to each association, and the identifier is included by the publisher,
   for example, in the front matter of a book.

   An NAA does not so much create an identifier as create an
   association. The NAA first draws an unused identifier string from
   its namespace, which is the set of all identifiers under its control.
   It then records the assignment of the identifier to an information
   object having sundry witnessed characteristics, such as a particular
   author and modification date. A namespace is usually reserved for an
   NAA by agreement with recognized community organizations (such as
   IANA and ISO) that all names containing a particular string be under
   its control. In the ARK an NAA is represented by the Name Assigning
   Authority Number (NAAN).

   The ARK namespace reserved for an NAA is the set of names bearing its
   particular NAAN.
   For example, all strings beginning with
   "ark:/12025/" are under control of the NAA registered under 12025,
   which might be the National Library of Finland. Because each NAA has
   a different NAAN, names from one namespace cannot conflict with those
   from another. Each NAA is free to assign names from its namespace
   (or delegate assignment) according to its own policies. These
   policies must be documented in a manner similar to the declarations
   required for URN Namespace registration [URNNID].

   For now, registration of ARK NAAs is in a bootstrapping phase. To
   register, please read about the mapping authority discovery file in
   the next section and send email to ark@cdlib.org.

4.  Finding a Name Mapping Authority

   In order to derive an actionable identifier (these days, a URL) from
   an ARK, a hostport (hostname or hostname plus port combination) for a
   working Name Mapping Authority (NMA) must be found. An NMA is a
   service that is able to respond to the three basic ARK service
   requests. Relying on registration and client-side discovery, NMAs
   make known which NAAs' identifiers they are willing to service.

   Upon encountering an ARK, a user (or client software) looks inside it
   for the optional NMAH part (the hostport of the NMA's ARK service).
   If it contains an NMAH that is working, this NMAH discovery step may
   be skipped; the NMAH effectively uses the beginning of an ARK to
   cache the results of a prior mapping authority discovery process. If
   a new NMAH needs to be found, the client looks inside the ARK again
   for the NAAN (Name Assigning Authority Number). Querying a global
   database, it then uses the NAAN to look up all current NMAHs that
   service ARKs issued by the identified NAA. The global database is
   key, and two specific methods for querying it are given in this
   section.
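
   The discovery procedure just described can be sketched in Python as
   follows. This is only an illustration under stated assumptions: the
   names parse_authorities and find_nmahs are hypothetical, the "global
   database" is stood in for by an ordinary dictionary that maps a NAAN
   to a list of hostports, and the caller supplies the is_working
   function that decides whether an embedded NMAH still responds. Only
   the bare ("ark:/...") and URL-prefixed ("http://NMAH/ark:/...")
   forms are handled.

```python
def parse_authorities(ark):
    """Return (nmah, naan); nmah is "" when the ARK names no host."""
    marker = "ark:/"
    start = ark.find(marker)
    if start < 0:
        raise ValueError("no 'ark:/' label found in %r" % ark)
    prefix = ark[:start]
    nmah = ""
    if prefix.startswith("http://"):
        nmah = prefix[len("http://"):].strip("/")
    naan = ark[start + len(marker):].partition("/")[0]
    if not naan:
        raise ValueError("ARK lacks a NAAN: %r" % ark)
    return nmah, naan


def find_nmahs(ark, naan_table, is_working):
    """Return candidate NMAH hostports for an ARK.

    naan_table is an assumed stand-in for the global database: a dict
    mapping each NAAN to the hostports currently serving it.
    """
    nmah, naan = parse_authorities(ark)
    if nmah and is_working(nmah):
        return [nmah]          # embedded NMAH works: skip discovery
    return list(naan_table.get(naan, []))


if __name__ == "__main__":
    table = {"12025": ["foobar.zaf.org", "sneezy.dopey.com"]}
    ark = "http://foobar.zaf.org/ark:/12025/654xz321"
    print(find_nmahs(ark, table, lambda hostport: False))
```

   An empty result corresponds to the case in which no mapping
   authority can be found for the ARK's NAAN.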

   In the interests of long-term persistence, however, ARK mechanisms
   are first defined in high-level, protocol-independent terms so that
   mechanisms may evolve and be replaced over time without compromising
   fundamental service objectives. Either or both specific methods
   given here may eventually be supplanted by better methods since, by
   design, the ARK scheme does not depend on a particular method, but
   only on having some method to locate an active NMAH.

   At the time of issuance, at least one NMAH for an ARK should be
   prepared to service it. That NMA may or may not be administered by
   the Name Assigning Authority (NAA) that created it. Consider the
   following hypothetical example of providing long-term access to a
   cancer research journal. The publisher wishes to turn a profit and
   the National Library of Medicine wishes to preserve the scholarly
   record. An agreement might be struck whereby the publisher would act
   as the NAA and the national library would archive the journal issue
   when it appears, but without providing direct access for the first
   six months. During the first six months of peak commercial
   viability, the publisher would retain exclusive delivery rights and
   would charge access fees. Again, by agreement, both the library and
   the publisher would act as NMAs, but during that initial period the
   library would redirect requests for issues less than six months old
   to the publisher. At the end of the waiting period, the library
   would then begin servicing requests for issues older than six months
   by tapping directly into its own archives. Meanwhile, the publisher
   might routinely redirect incoming requests for older issues to the
   library. Long-term access is thereby preserved, and so is the
   commercial incentive to publish content.

   Although it will be common for an NAA also to run an NMA service, it
   is never a requirement.
   Over time NAAs and NMAs will come and go.
   One NMA will succeed another, and there might be many NMAs serving
   the same ARKs simultaneously (e.g., as mirrors or as competitors).
   There might also be asymmetric but coordinated NMAs as in the
   library-publisher example above.

4.1.  Looking Up NMAHs in a Globally Accessible File

   This subsection describes a way to look up NMAHs using a simple name
   authority table represented as a plain text file. For efficient
   access the file may be stored in a local filesystem, but it needs to
   be reloaded periodically to incorporate updates. It is not expected
   that the size of the file or frequency of update should impose an
   undue maintenance or searching burden any time soon, for even
   primitive linear search of a file with ten-thousand NAAs is a
   subsecond operation on modern server machines. The proposed file
   strategy is similar to the /etc/hosts file strategy that supported
   Internet host address lookup for a period of years before the advent
   of DNS.

   The name authority table file is updated on an ongoing basis and is
   available for copying over the internet from the California Digital
   Library at http://www.cdlib.org/inside/diglib/ark/natab and from a
   number of mirror sites. The file contains comment lines (lines that
   begin with `#') explaining the format and giving the file's
   modification time, reloading address, and NAA registration
   instructions. There is even a Perl script that processes the file
   embedded in the file's comments.
   As of February 2005, currently
   registered Name Assigning Authorities are:

      12025   National Library of Medicine
      12026   Library of Congress
      12027   National Agriculture Library
      13030   California Digital Library
      13038   World Intellectual Property Organization
      20775   University of California San Diego
      29114   University of California San Francisco
      28722   University of California Berkeley
      15230   Rutgers University Libraries
      13960   Internet Archive
      64269   Digital Curation Centre
      62624   New York University Libraries
      67531   University of North Texas Libraries
      27927   Ithaka Electronic-Archiving Initiative

   A snapshot of the name authority table file appears in an appendix.

4.2.  Looking up NMAHs Distributed via DNS

   This subsection introduces a method for looking up NMAHs that is
   based on the method for discovering URN resolvers described in
   [NAPTR]. It relies on querying the DNS system already installed in
   the background infrastructure of most networked computers. A query
   is submitted to DNS asking for a list of resolvers that match a given
   NAAN. DNS distributes the query to the particular DNS servers that
   can best provide the answer, unless the answer can be found more
   quickly in a local DNS cache as a side-effect of a recent query.

   Responses come back inside Name Authority Pointer (NAPTR) records.
   The normal result is one or more candidate NMAHs.

   In its full generality the [NAPTR] algorithm ambitiously accommodates
   a complex set of preferences, orderings, protocols, mapping services,
   regular expression rewriting rules, and DNS record types. This
   subsection proposes a drastic simplification of it for the special
   case of ARK mapping authority discovery. The simplified algorithm is
   called Maptr. It uses only one DNS record type (NAPTR) and restricts
   most of its field values to constants.
   The following hypothetical
   excerpt from a DNS data file for the NAAN known as 12026 shows three
   example NAPTR records ready to use with the Maptr algorithm.

      12026.ark.arpa.
      ;; US Library of Congress
      ;;       order pref flags service regexp replacement
      IN NAPTR 0     0    "h"   "ark"   "USLC" lhc.nlm.nih.gov:8080
      IN NAPTR 0     0    "h"   "ark"   "USLC" foobar.zaf.org
      IN NAPTR 0     0    "h"   "ark"   "USLC" sneezy.dopey.com

   All the fields are held constant for Maptr except for the "flags",
   "regexp", and "replacement" fields. The "service" field contains the
   constant value "ark" so that NAPTR records participating in the Maptr
   algorithm will not be confused with other NAPTR records. The "order"
   and "pref" fields are held to 0 (zero) and otherwise ignored for now;
   the algorithm may evolve to use these fields for ranking decisions
   when usage patterns and local administrative needs are better
   understood.

   When a Maptr query returns a record with a flags field of "h" (for
   hostport, a Maptr extension to the NAPTR flags), the replacement
   field contains the NMAH (hostport) of an ARK service provider. When
   a query returns a record with a flags field of "" (the empty string),
   the client needs to submit a new query containing the domain name
   found in the replacement field. This second sort of record exploits
   the distributed nature of DNS by redirecting the query to another
   domain name. It looks like this.

      12345.ark.arpa.
      ;; Digital Library Consortium
      ;;       order pref flags service regexp replacement
      IN NAPTR 0     0    ""    "ark"   ""     dlc.spct.org.

   Here is the Maptr algorithm for ARK mapping authority discovery. In
   it replace <naan> with the NAAN from the ARK for which an NMAH is
   sought.

   (1)  Initialize the DNS query: type=NAPTR,
        query=<naan>.ark.arpa.

   (2)  Submit the query to DNS and retrieve (NAPTR) records,
        discarding any record that does not have "ark" for the service
        field.

   (3)  All remaining records with a flags field of "h" contain
        candidate NMAHs in their replacement fields. Set them aside, if
        any.

   (4)  Any record with an empty flags field ("") has a replacement
        field containing a new domain name to which a subsequent query
        should be redirected. For each such record, set
        query=<replacement> then go to step (2). When all such records
        have been recursively exhausted, go to step (5).

   (5)  All redirected queries have been resolved and a set of
        candidate NMAHs has been accumulated from step (3). If there
        are zero NMAHs, exit - no mapping authority was found. If there
        is one or more NMAH, choose one using any criteria you wish,
        then exit.

   A Perl script that implements this algorithm is included here.

      #!/depot/bin/perl

      use strict;
      use warnings;
      use Net::DNS;                       # include simple DNS package

      my $qtype = "NAPTR";                # initialize query type
      my $naa = shift;                    # get NAAN script argument
      my $mad = Net::DNS::Resolver->new;  # mapping authority discovery

      maptr("$naa.ark.arpa");             # call maptr - that's it

      sub maptr {                         # recursive maptr algorithm
          my $dname = shift;              # domain name as argument
          my $query = $mad->query($dname, $qtype);
          return                          # non-productive query
              if (! $query || ! $query->answer);
          foreach my $rr ($query->answer) {
              next                        # skip records of wrong type
                  if ($rr->type ne $qtype);
              next                        # step (2): service must be "ark"
                  if (lc($rr->service) ne "ark");
              # Read the fields through the NAPTR accessors; splitting
              # the presentation string would leave the quotes attached.
              my $flags = lc($rr->flags);
              my $replacement = $rr->replacement;
              if ($flags eq "") {
                  maptr($replacement);    # recurse
              } elsif ($flags eq "h") {
                  print "$replacement\n"; # candidate NMAH
              }
          }
      }

   The global database thus distributed via DNS and the Maptr algorithm
   can easily be seen to mirror the contents of the Name Authority Table
   file described in the previous section.

5.  Generic ARK Service Definition

   An ARK request's output is delivered information; examples include
   the object itself, a policy declaration (e.g., a promise of support),
   a descriptive metadata record, or an error message. The experience
   of object delivery is expected to be an evolving mix of information
   that reflects changing service expectations and technology
   requirements; contemporary examples include such things as an object
   summary and component links formatted for human consumption. ARK
   services must be couched in high-level, protocol-independent terms if
   persistence is to outlive today's networking infrastructural
   assumptions. The high-level ARK service definitions listed below are
   followed in the next section by a concrete method (one of many
   possible methods) for delivering these services with today's
   technology.

5.1.  Generic ARK Access Service (access, location)

   Returns (a copy of) the object or a redirect to the same, although a
   sensible object proxy may be substituted. Examples of sensible
   substitutes include,

   - a table of contents instead of a large complex document,
   - a home page instead of an entire web site hierarchy,
   - a rights clearance challenge before accessing protected data,
   - directions for access to an offline object (e.g., a book),
   - a description of an intangible object (a disease, an event), or
   - an applet acting as "player" for a large multimedia object.

   May also return a discriminated list of alternate object locators.
   If access is denied, returns an explanation of the object's current
   (perhaps permanent) inaccessibility.

5.2.  Generic Policy Service (permanence, naming, etc.)

   Returns declarations of policy and support commitments for given
   ARKs. Declarations are returned in either a structured metadata
   format or a human readable text format; sometimes one format may
   serve both purposes.
   Policy subareas may be addressed in separate
   requests, but the following areas should be covered: object
   permanence, object naming, object fragment addressing, and
   operational service support.

   The permanence declaration for an object is a rating defined with
   respect to an identified permanence provider (guarantor), which will
   be the NMA. It may include the following aspects.

   (a)  "object availability" - whether and how access to the object
        is supported (e.g., online 24x7, or offline only),

   (b)  "identifier validity" - under what conditions the identifier
        will be or has been re-assigned,

   (c)  "content invariance" - under what conditions the content of
        the object is subject to change, and

   (d)  "change history" - access to corrections, migrations, and
        revisions, whether through links to the changed objects
        themselves or through a document summarizing the change history.

   One approach to a permanence rating framework, conceived
   independently from ARKs, is given in [NLMPerm]. Under ongoing
   development and limited deployment at the US National Library of
   Medicine, it identifies the following "permanence levels":

      Not Guaranteed: No commitment has been made to retain this
      resource. It could become unavailable at any time. Its
      identifier could be changed.

      Permanent: Dynamic Content: A commitment has been made to keep
      this resource permanently available. Its identifier will always
      provide access to the resource. Its content could be revised or
      replaced.

      Permanent: Stable Content: A commitment has been made to keep
      this resource permanently available. Its identifier will always
      provide access to the resource. Its content is subject only to
      minor corrections or additions.

      Permanent: Unchanging Content: A commitment has been made to
      keep this resource permanently available.
      Its identifier will always provide access to the resource. Its
      content will not change.

   Naming policy for an object includes an historical description of the
   NAA's (and its successor NAA's) policies regarding differentiation of
   objects. Since it is the NMA that responds to requests for policy
   statements, it is useful for the NMA to be able to produce or
   summarize these historical NAA documents. Naming policy may include
   the following aspects.

   (i) "similarity" - (or "unity") the limit, defined by the NAA,
       to the level of dissimilarity beyond which two similar objects
       warrant separate identifiers but before which they share one
       single identifier, and

   (ii) "granularity" - the limit, defined by the NAA, to the level
       of object subdivision beyond which sub-objects do not warrant
       separately assigned identifiers but before which sub-objects are
       assigned separate identifiers.

   Subnaming policy for an object describes the qualifiers that the NMA,
   in fulfilling its ongoing and evolving service obligations, allows as
   extensions to an NAA-assigned ARK. To the conceptual object that the
   NAA named with an ARK, the NMA may add component access points and
   derivatives (e.g., format migrations in aid of preservation) in order
   to provide both basic and value-added services.

   Addressing policy for an object includes a description of how, during
   access, object components (e.g., paragraphs, sections) or views
   (e.g., image conversions) may or may not be "addressed", in other
   words, how the NMA permits arguments or parameters to modify the
   object delivered as the result of an ARK request. If supported,
   these sorts of operations would provide things like byte-ranged
   fragment delivery and open-ended format conversions, or any set of
   possible transformations that would be too numerous to list or to
   identify with separately assigned ARKs.

   Operational service support policy includes a description of general
   operational aspects of the NMA service, such as after-hours staffing
   and trouble reporting procedures.

5.3. Generic Description Service

   Returns a description of the object. Descriptions are returned in
   either a structured metadata format or a human readable text format;
   sometimes one format may serve both purposes. A description must at
   a minimum answer the who, what, when, and where questions concerning
   an expression of the object. Standalone descriptions should be
   accompanied by the modification date and source of the description
   itself. May also return discriminated lists of ARKs that are related
   to the given ARK.

6. Overview of the Tiny HTTP URL Mapping Protocol (THUMP)

   The Tiny HTTP URL Mapping Protocol (THUMP) is a way of taking a key
   (a kind of identifier) and asking such questions as, what information
   does this identify and how permanent is it? [THUMP] is in fact one
   specific method under development for delivering ARK services. The
   protocol runs over HTTP to exploit the web browser's current
   pre-eminence as user interface to the Internet. THUMP is designed so
   that a person can enter ARK requests directly into the location field
   of current browser interfaces. Because it runs over HTTP, THUMP can
   be simulated and tested within keyboard-based [TELNET] sessions.

   The asker (a person or client program) starts with an identifier,
   such as an ARK or a URL. The identifier reveals to the asker (or
   allows the asker to infer) the Internet host name and port number of
   a server system that responds to questions. Here, this is just the
   NMAH that is obtained by inspection and possibly lookup based on the
   ARK's NAAN.
   The asker then sets up an HTTP session with the server
   system, sends a question via a THUMP request (contained within an
   HTTP request), receives an answer via a THUMP response (contained
   within an HTTP response), and closes the session. That concludes the
   connected portion of the protocol.

   A THUMP request is a string of characters beginning with a `?'
   (question mark) that is appended to the identifier string. The
   resulting string is sent as an argument to HTTP's GET command.
   Request strings too long for GET may be sent using HTTP's POST
   command. The three most common requests correspond to three
   degenerate special cases that keep the user's learning and typing
   burden low. First, a simple key with no request at all is the same
   as an ordinary access request. Thus a plain ARK entered into a
   browser's location field behaves much like a plain URL, and returns
   access to the primary identified object, for instance, an HTML
   document.

   The second special case is a minimal ARK description request string
   consisting of just "?". For example, entering the string,

      ark.nlm.nih.gov/12025/psbbantu?

   into the browser's location field directly precipitates a request for
   a metadata record describing the object identified by
   ark:/12025/psbbantu. The browser, unaware of THUMP, prepares and
   sends an HTTP GET request in the same manner as for a URL. THUMP is
   designed so that the response (indicated by the returned HTTP content
   type) is normally displayed, whether the output is structured for
   machine processing (text/plain) or formatted for human consumption
   (text/html).

   In the following example THUMP session, each line has been annotated
   to include a line number and whether it was the client or server that
   sent it.
   Without going into much depth, the session has three main pieces
   separated from each other by blank lines: the client's piece (lines
   1-3), the server's HTTP/THUMP response headers (4-7), and the body of
   the server's response (8-17), which a further blank line (11) divides
   into a record set header and a record. The first and last lines (1
   and 17) correspond to the client's steps to start the TCP session and
   the server's steps to end it, respectively.

    1  C: [opens session]
       C: GET http://ark.nlm.nih.gov/ark:/12025/psbbantu? HTTP/1.1
       C:
       S: HTTP/1.1 200 OK
    5  S: Content-Type: text/plain
       S: THUMP-Status: 0.1 200 OK
       S:
       S: |set: NLM | 12025/psbbantu? | 20030731
       S:     | http://ark.nlm.nih.gov/ark:/12025/psbbantu?
   10  S: here: 1 | 1 | 1
       S:
       S: erc:
       S: who: Lederberg, Joshua
       S: what: Studies of Human Families for Genetic Linkage
   15  S: when: 1974
       S: where: http://profiles.nlm.nih.gov/BB/A/N/T/U/_/bbantu.pdf
       S: [closes session]

   The first two server response lines (4-5) above are typical of HTTP.
   The next line (6) is peculiar to THUMP, and indicates the THUMP
   version and a normal return status. The balance of the response
   consists of a record set header (lines 8-10) and a single metadata
   record (12-16) that comprises the ARK description service response.
   The record set header identifies (8-9) who created the set, what its
   title is, when it was created, and where an automated process can
   access the set; it ends in a line (10) whose respective sub-elements
   indicate that here in this communication the recipient can expect to
   find 1 record, starting at the record numbered 1, from a set
   consisting of a total of 1 record (i.e., here is the entire set,
   consisting of exactly one record).

   The returned record (12-16) is in the format of an Electronic
   Resource Citation [ERC], which is discussed in more detail in the
   next section.
   For now, note that it contains four elements that
   answer the top priority questions regarding an expression of the
   object: who played a major role in expressing it, what the
   expression was called, when it was created, and where the expression
   may be found. This quartet of elements comes up again and again in
   ERCs.

   The third degenerate special case of an ARK request (and no other
   cases will be described in this document) is the string "??",
   corresponding to a minimal permanence policy request. It can be seen
   in use appended to an ARK (on line 2) in the example session that
   follows.

    1  C: [opens session]
       C: GET http://ark.nlm.nih.gov/ark:/12025/psbbantu?? HTTP/1.1
       C:
       S: HTTP/1.1 200 OK
    5  S: Content-Type: text/plain
       S: THUMP-Status: 0.1 200 OK
       S:
       S: |set: NLM | 12025/psbbantu?? | 20030731
       S:     | http://ark.nlm.nih.gov/ark:/12025/psbbantu??
   10  S: here: 1 | 1 | 1
       S:
       S: erc:
       S: who: Lederberg, Joshua
       S: what: Studies of Human Families for Genetic Linkage
   15  S: when: 1974
       S: where: http://profiles.nlm.nih.gov/BB/A/N/T/U/_/bbantu.pdf
       S: erc-support:
       S: who: USNLM
       S: what: Permanent, Unchanging Content
   20  S: when: 20010421
       S: where: http://ark.nlm.nih.gov/yy22948
       S: [closes session]

   Again, a single metadata record (lines 12-21) is returned, but it
   consists of two segments. The first segment (12-16) gives the same
   basic citation information as in the previous example. It is
   returned in order to establish context for the persistence
   declaration in the second segment (17-21).

   Each segment in an ERC tells a different story relating to the
   object, so although the same four questions (elements) appear in
   each, the answers depend on the segment's story type.
   While the first segment tells the story of an expression of the
   object, the second segment tells the story of the support commitment
   made to it: who made the commitment, what the nature of the
   commitment was, when it was made, and where a fuller explanation of
   the commitment may be found.

7. Overview of Electronic Resource Citations (ERCs)

   An Electronic Resource Citation (or ERC, pronounced e-r-c) [ERC] is a
   simple, compact, and printable record designed to hold data
   associated with an information resource. By design, the ERC is a
   metadata format that balances the needs for expressive power, very
   simple machine processing, and direct human manipulation.

   A founding principle of the ERC is that direct human contact with
   metadata will be a necessary and sufficient condition for the near
   term rapid development of metadata standards, systems, and services.
   Thus the machine-processable ERC format must only minimally strain
   people's ability to read, understand, change, and transmit ERCs
   without their relying on intermediation with specialized software
   tools. The basic ERC needs to be succinct, transparent, and
   trivially parseable by software.

   In the current Internet, it is natural to consider seriously using
   XML as an exchange format because of predictions that it will obviate
   many ad hoc formats and programs, and unify much of the world's
   information under one reliable data structuring discipline that is
   easy to generate, verify, parse, and render. It appears, however,
   that XML is still only catching on after years of standards work and
   implementation experience. The reasons for this are unclear, but for
   now very simple XML interpretation is still out of reach.
   Another important caution is that XML structures are hard on the
   eyeballs, taking up an amount of display (and page) space that
   significantly exceeds that of traditional formats. Until these
   conflicts with ERC principle are resolved, XML is not a first choice
   for representing ERCs. Borrowing instead from the data structuring
   format that underlies the successful spread of email and web
   services, the first ERC format uses [ANVL], which is based on email
   and HTTP headers [RFC822]. There is a naturalness to ANVL's
   label-colon-value format (seen in the previous section) that barely
   needs explanation to a person beginning to enter ERC metadata.

   Besides simplicity of ERC system implementation and data entry
   mechanics, ERC semantics (what the record and its constituent parts
   mean) must also be easy to explain. ERC semantics are based on a
   reformulation and extension of the Dublin Core [DCORE] hypothesis,
   which suggests that the fifteen Dublin Core metadata elements have a
   key role to play in cross-domain resource description. The ERC
   design recognizes that the Dublin Core's primary contribution is the
   international, interdisciplinary consensus that identified fifteen
   semantic buckets (element categories), regardless of how they are
   labeled. The ERC then adds a definition for a record and some
   minimal compliance rules. In pursuing the limits of simplicity, the
   ERC design combines and relabels some Dublin Core buckets to isolate
   a tiny kernel (subset) of four elements for basic cross-domain
   resource description.

   For the cross-domain kernel, the ERC uses the four basic elements -
   who, what, when, and where - to pretend that every object in the
   universe can have a uniform minimal description. Each has a name or
   other identifier, a location, some responsible person or party, and a
   date.
   It doesn't matter what type of object it is, or whether one
   plans to read it, interact with it, smoke it, wear it, or navigate
   it. Of course, this approach is flawed because uniformity of
   description for some object types requires more semantic contortion
   and sacrifice than for others. That is why at the beginning of this
   document, the ARK was said to be suited to objects that accommodate
   reasonably regular electronic description.

   While insisting on uniformity at the most basic level provides
   powerful cross-domain leverage, the semantic sacrifice is great for
   many applications. So the ERC also permits a semantically rich and
   nuanced description to co-exist in a record along with a basic
   description. In that way both sophisticated and naive recipients of
   the record can extract the level of meaning from it that best suits
   their needs and abilities. Key to unlocking the richer description
   is a controlled vocabulary of ERC record types (not explained in this
   document) that permit knowledgeable recipients to apply defined sets
   of additional assumptions to the record.

7.1. ERC Syntax

   An ERC record is a sequence of metadata elements ending in a blank
   line. An element consists of a label, a colon, and an optional
   value. Here is an example of a record with five elements.

      erc:
      who: Gibbon, Edward
      what: The Decline and Fall of the Roman Empire
      when: 1781
      where: http://www.ccel.org/g/gibbon/decline/

   A long value may be folded (continued) onto the next line by
   inserting a newline and indenting the next line. A value can be thus
   folded across multiple lines. Here are two example elements, each
   folded across four lines.

      who/created: University of California, San Francisco, AIDS
           Program at San Francisco General Hospital | University
           of California, San Francisco, Center for AIDS Prevention
           Studies
      what/Topic:
           Heart Attack | Heart Failure
              | Heart
            Diseases

   An element value folded across several lines is treated as if the
   lines were joined together on one long line. For example, the second
   element from the previous example is considered equivalent to

      what/Topic: Heart Attack | Heart Failure | Heart Diseases

   An element value may contain multiple values, each one separated from
   the next by a `|' (pipe) character. The element from the previous
   example contains three values.

   For annotation purposes, any line beginning with a `#' (hash)
   character is treated as if it were not present; this is a "comment"
   line (a feature not available in email or HTTP headers). For
   example, the following element is spread across four lines and
   contains two values:

      what/Topic:
           Heart Attack
      #    | Heart Failure -- hold off until next review cycle
           | Heart Diseases

7.2. ERC Stories

   An ERC record is organized into one or more distinct segments, where
   each segment tells a story about a different aspect of the
   information resource. A segment boundary occurs whenever a segment
   label (an element beginning with "erc") is encountered. The basic
   label "erc:" introduces the story of an object's expression (e.g.,
   its publication, installation, or performance). The label
   "erc-about:" introduces the story of an object's content (what it is
   about) and "erc-support:" introduces the story of a support
   commitment made to it. A story segment that concerns the ERC itself
   is introduced by the label "erc-from:". It is an important segment
   that tells the story of the ERC's provenance.
   Elements beginning with "erc" are reserved for segment labels and
   their associated story types. From an earlier example, here is an
   ERC with two segments.

      erc:
      who: Lederberg, Joshua
      what: Studies of Human Families for Genetic Linkage
      when: 1974
      where: http://profiles.nlm.nih.gov/BB/A/N/T/U/_/bbantu.pdf
      erc-support:
      who: NIH/NLM/LHNCBC
      what: Permanent, Unchanging Content
      # Note to ops staff: date needs verification.
      when: 2001 04 21
      where: http://ark.nlm.nih.gov/yy22948

   Segment stories are told according to journalistic tradition. While
   any number of pertinent elements may appear in a segment, priority is
   placed on answering the questions who, what, when, and where at the
   beginning of each segment so that readers can make the most important
   selection or rejection decisions as soon as possible. To make things
   simple, the listed ordering of the questions is maintained in each
   segment (as it happens, most people who have been exposed to this
   storytelling technique are already familiar with the above
   ordering).

   The four questions are answered by using corresponding element
   labels. The four element labels can be re-used in each story
   segment, but their meaning changes depending on the segment (the
   story type) in which they appear. In the example above, "who" is
   first used to name a document's author and subsequently used to name
   the permanence guarantor (provider). Similarly, "when" first lists
   the date of object creation and in the next segment lists the date of
   a commitment decision. Four labels appearing across three segments
   effectively map to twelve semantically distinct elements. Distinct
   element meanings are mapped to Dublin Core elements in a later
   section.

7.3. The ERC Anchoring Story

   Each ERC contains an anchoring story.
   It is usually the first segment labeled "erc:" and it concerns an
   "anchoring" expression of the object. An "anchoring" expression is
   the one that a provider deemed the most suitable basic referent given
   the audience and application for which it produced the ERC. If it
   sounds like the provider has great latitude in choosing its anchoring
   expression, it is because it does. A typical anchoring story in an
   ERC for a born-digital document would be the story of the document's
   release on a web site; such a document would then be the anchoring
   expression.

   An anchoring story need not be the central descriptive goal of an ERC
   record. For example, a museum provider may create an ERC for a
   digitized photograph of a painting but choose to anchor it in the
   story of the original painting instead of the story of the electronic
   likeness; although the ERC may through other segments prove to be
   centrally concerned with describing the electronic likeness, the
   provider may have chosen this particular anchoring story in order to
   make the ERC visible in a way that is most natural to patrons (who
   would find the Mona Lisa under da Vinci sooner than they would find
   it under the name of the person who snapped the photograph or scanned
   the image). In another example, a provider that creates an ERC for a
   dramatic play as an abstract work has the task of describing a piece
   of intangible intellectual property. To anchor this abstract object
   in the concrete world, if only through a derivative expression, it
   makes sense for the provider to choose a suitable printed edition of
   the play as the anchoring object expression (to describe in the
   anchoring story) of the ERC.

   The anchoring story has special rules designed to keep ERC processing
   simple and predictable.
   Each of the four basic elements (who, what, when, and where) must be
   present, unless a best effort to supply it fails. In the event of
   failure, the element still appears but a special value (described
   later) is used to explain the missing value. While the requirement
   that each of the four elements be present only applies to the
   anchoring story segment, as usual these elements appear at the
   beginning of the segment and may only be used in the prescribed
   order. A minimal ERC would normally consist of just an anchoring
   story and the element quartet, as illustrated in the next example.

      erc:
      who: National Research Council
      what: The Digital Dilemma
      when: 2000
      where: http://books.nap.edu/html/digital%5Fdilemma

   A minimal ERC can be abbreviated so that it resembles a traditional
   compact bibliographic citation that is nonetheless completely machine
   processable. The required elements and ordering make it possible to
   eliminate the element labels, as shown here.

      erc: National Research Council | The Digital Dilemma | 2000
           | http://books.nap.edu/html/digital%5Fdilemma

7.4. ERC Elements

   As mentioned, the four basic ERC elements (who, what, when, and
   where) take on different specific meanings depending on the story
   segment in which they are used. By appearing in each segment, albeit
   in different guises, the four elements serve as a valuable mnemonic
   device - a kind of checklist - for constructing minimal story
   segments from scratch. Again, it is only in the anchoring segment
   that all four elements are mandatory.

   Here are some mappings between ERC elements and Dublin Core [DCORE]
   elements.

      Segment    ERC Element  Equivalent Dublin Core Element
      ---------  -----------  ------------------------------
      erc        who          Creator/Contributor/Publisher
      erc        what         Title
      erc        when         Date
      erc        where        Identifier
      erc-about  who
      erc-about  what         Subject
      erc-about  when         Coverage (temporal)
      erc-about  where        Coverage (spatial)

   The basic element labels may also be qualified to add nuances to the
   semantic categories that they identify. Elements are qualified by
   appending a `/' (slash) and a qualifier term. Often qualifier terms
   appear as the past tense form of a verb because it makes re-using
   qualifiers among elements easier.

      who/published: ...
      when/published: ...
      where/published: ...

   Using past tense verbs for qualifiers also reminds providers and
   recipients that element values contain transient assertions that may
   have been true once, but that tend to become less true over time.
   Recipients that don't understand the meaning of a qualifier can fall
   back onto the semantic category (bucket) designated by the
   unqualified element label. Inevitably recipients (people and
   software) will have diverse abilities in understanding elements and
   qualifiers.

   Any number of other elements and qualifiers may be used in
   conjunction with the quartet of basic segment questions. The only
   semantic requirement is that they pertain to the segment's story.
   Also, it is only the four basic elements that change meaning
   depending on their segment context. All other elements have meaning
   independent of the segment in which they appear. If an element label
   stripped of its qualifier is still not recognized by the recipient, a
   second fallback position is to ignore it and rely on the four basic
   elements.

   Elements may be either Canonical, Provisional, or Local.
   Canonical elements are officially recognized via a registry as part
   of the metadata vernacular. All elements, qualifiers, and segment
   labels used in this document up until now belong to that vernacular.
   Provisional elements are also officially recognized via the registry,
   but have only been proposed for inclusion in the vernacular. To be
   promoted to the vernacular, a provisional element passes through a
   vetting process during which its documentation must be in order and
   its community acceptance demonstrated. Local elements are any
   elements not officially recognized in the registry. The registry
   [DERC] is a work in progress.

   Local elements are immediately distinguishable from Canonical or
   Provisional elements because all terms that begin with an upper case
   letter are reserved for spontaneous local use. No term beginning
   with an upper case letter will ever be assigned Canonical or
   Provisional status, so it should be safe to use such terms for local
   purposes. Any recipient of external ERCs containing such terms will
   understand them to be part of the originating provider's local
   metadata dialect. Here's an example ERC with three segments, one
   local element, and two local qualifiers. The segment boundaries have
   been emphasized by comment lines (which, as before, are ignored by
   processors).

      erc:
      who: Bullock, TH | Achimowicz, JZ | Duckrow, RB
           | Spencer, SS | Iragui-Madoz, VJ
      what: Bicoherence of intracranial EEG in sleep,
           wakefulness and seizures
      when: 1997 12 00
      where: http://cogprints.soton.ac.uk/%{
           documents/disk0/00/00/01/22/index.html %}
      in: EEG Clin Neurophysiol | 1997 12 00 | v103, i6, p661-678
      IDcode: cog00000122
      # ---- new segment ----
      erc-about:
      what/Subcategory: Bispectrum | Nonlinearity | Epilepsy
           | Cooperativity | Subdural | Hippocampus | Higher moment
      # ---- new segment ----
      erc-from:
      who: NIH/NLM/NCBI
      what: pm9546494
      when/Reviewed: 1998 04 18 021600
      where: http://ark.nlm.nih.gov/12025/pm9546494?

   The local element "IDcode" immediately precedes the "erc-about"
   segment, which itself contains an element with the local qualifier
   "Subcategory". The second to last element also carries the local
   qualifier "Reviewed". Finally, what might be a provisional element
   "in" appears near the end of the first segment. It might have been
   proposed as a way to complete a citation for an object originally
   appearing inside another object (such as an article appearing in a
   journal or an encyclopedia).

7.5. ERC Element Values

   ERC element values tend to be straightforward strings. If the
   provider intends something special for an element, it will so
   indicate with markers at the beginning of its value string. The
   markers are designed to be uncommon enough that they would not likely
   occur in normal data except by deliberate intent. Markers can only
   occur near the beginning of a string, and once any octet of
   non-marker data has been encountered, no further marker processing is
   done for the element value. In the absence of markers the string is
   considered pure data; this has been the case with all the examples
   seen thus far.
   The fullest form of an element value with all three optional markers
   in place looks like this.

      VALUE = [markup_flags] (:ccode) , DATA

   In processing, the first non-whitespace character of an ERC element
   value is examined. An initial `[' is reserved to introduce a
   bracketed set of markup flags (not described in this document) that
   ends with `]'. If ERC data is machine-generated, each value string
   may be preceded by "[]" to prevent any of its data from being
   mistaken for markup flags. Once past the optional markup, the
   remaining value may optionally begin with a controlled code. A
   controlled code always has the form "(:ccode)", for example,

      who: (:unkn) Anonymous
      what: (:791) Bee Stings

   Any string after such a code is taken to be an uncontrolled (e.g.,
   natural language) equivalent. The code "unkn" indicates a
   conventional explanation for a missing value (stating that the value
   is unknown). The remainder of the string makes an equivalent
   statement in a form that the provider deemed most suitable to its
   (probably human) audience. The code "791" could be a fixed numeric
   topic identifier within an unspecified topic vocabulary. Any code
   may be ignored by those that do not understand it.

   There are several codes to explain different ways in which a required
   element's value may go missing.

      (:unac)  temporarily inaccessible
      (:unal)  unallowed, suppressed intentionally
      (:unap)  not applicable, makes no sense
      (:unas)  value unassigned (e.g., Untitled)
      (:unav)  value unavailable indefinitely
      (:unkn)  unknown (e.g., Anonymous, Inconnue)
      (:etal)  too numerous to list (I).
      (:none)  never had a value, never will
      (:null)  explicitly empty
      (:tba)   to be assigned or announced later

   Once past an optional controlled code, the remaining string value is
   subjected to one final test.
   If the next non-whitespace character is a `,' (comma), it indicates
   that the string value is "sort-friendly". This means that the value
   is (a) laid out with an inverted word order useful for sorting items
   having comparably laid out element values (items might be the
   containing ERC records) and (b) that the value may contain other
   commas that indicate inversion points should it become necessary to
   recover the value in natural word order. Typically, this feature is
   used to express Western-style personal names in
   family-name-given-name order. It can also be used wherever natural
   word order might make sorting tricky, such as when data contains
   titles or corporate names. Here are some example elements.

      who: , van Gogh, Vincent
      who:,Howell, III, PhD, 1922-1987, Thurston
      who:, Acme Rocket Factory, Inc., The
      who:, Mao Tse Tung
      who:, McCartney, Paul, Sir,
      what:, Health and Human Services, United States Government
           Department of, The,

   There are rules to use in recovering a copy of the value in natural
   word order, if desired. The above example strings have the following
   natural word order values, respectively.

      Vincent van Gogh
      Thurston Howell, III, PhD, 1922-1987
      The Acme Rocket Factory, Inc.
      Mao Tse Tung
      Sir Paul McCartney
      The United States Government Department of Health and Human Services

7.6. ERC Element Encoding and Dates

   Some characters that need to appear in ERC element values might
   conflict with special characters used for structuring ERCs, so there
   needs to be a way to include them as literal characters that are
   protected from special interpretation. This is accomplished through
   an encoding mechanism that resembles the %-encoding familiar to [URI]
   handlers.
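
   As an aside on the value markers of Section 7.5, the order in which
   a processor examines them (markup flags, then a controlled code,
   then the sort-friendly comma) might be sketched in Python as
   follows. This is a minimal illustration under stated assumptions,
   not a reference implementation: the function name parse_value and
   the keys of the returned dictionary are hypothetical, the content of
   the markup flags is returned uninterpreted because this document
   does not define it, and no attempt is made to recover natural word
   order, since the recovery rules are not given here.

```python
import re

# Matches a controlled code of the form "(:ccode)" at the start of a string.
_CCODE = re.compile(r"\(:([^)\s]*)\)")

def parse_value(raw):
    """Split an ERC element value into optional markers and data.

    Hypothetical helper: returns a dict with the keys 'flags', 'ccode',
    'sort_friendly', and 'data'.
    """
    rest = raw.strip()
    flags = None
    ccode = None

    # 1. An initial '[' introduces bracketed markup flags ending in ']'.
    if rest.startswith("["):
        end = rest.find("]")
        if end != -1:
            flags = rest[1:end]          # left uninterpreted
            rest = rest[end + 1:].lstrip()

    # 2. The remaining value may begin with a controlled code.
    match = _CCODE.match(rest)
    if match:
        ccode = match.group(1)
        rest = rest[match.end():].lstrip()

    # 3. A comma as the next non-whitespace character marks the value
    #    as sort-friendly (inverted word order).
    sort_friendly = rest.startswith(",")
    if sort_friendly:
        rest = rest[1:].lstrip()

    return {"flags": flags, "ccode": ccode,
            "sort_friendly": sort_friendly, "data": rest}
```

   For instance, parse_value("(:unkn) Anonymous") reports the code
   "unkn" with the data "Anonymous", while a value with no markers at
   all comes back unchanged as pure data.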

   The ERC encoding mechanism also uses `%', but instead of taking two
   following hexadecimal digits, it takes one non-alphanumeric character
   or two alphabetic characters that cannot be mistaken for hex digits.
   It is designed not to be confused with normal web-style %-encoding.
   In particular it can be decoded without risking unintended decoding
   of normal %-encoded data (which would introduce errors). Here are
   the one-character (non-alphanumeric) ERC encoding extensions.

      ERC  Purpose
      ---  ------------------------------------------------
      %!   decodes to the element separator `|'
      %%   decodes to a percent sign `%'
      %.   decodes to a comma `,'
      %_   a non-character used as syntax shim
      %{   a non-character that begins an expansion block
      %}   a non-character that ends an expansion block

   One particularly useful construct in ERC element values is the pair
   of special encoding markers ("%{" and "%}") that indicates an
   "expansion" block. Whatever string of characters they enclose will
   be treated as if none of the contained whitespace (SPACEs, TABs,
   Newlines) were present. This comes in handy for writing long,
   multi-part URLs in a readable way. For example, the value in

      where: http://foo.bar.org/node%{
                ? db = foo
                & start = 1
                & end = 5
                & buf = 2
                & query = foo + bar + zaf
             %}

   is decoded into an equivalent element, but with a correct and intact
   URL:

      where:
      http://foo.bar.org/node?db=foo&start=1&end=5&buf=2&query=foo+bar+zaf

   In a parting word about ERC element values, a commonly recurring
   value type is a date, possibly followed by a time.
   ERC dates use the [TEMPER] format, taking on one of the following
   forms:

      1999                (four-digit year)
      2000 12 29          (year, month, day)
      2000 12 29 235955   (year, month, day, hour, minute, second)

   In dates, all internal whitespace is squeezed out to achieve a
   normalized form suitable for lexical comparison and sorting. This
   means that the following dates

      2000 12 29 235955     (recommended for readability)
      2000 12 29 23 59 55
      20001229 23 59 55
      20001229235955        (normalized date and time)

   are all equivalent. The first form is recommended for readability.
   The last form (shortest and easiest to compute with) is the
   normalized form. Hyphens and commas are reserved to create date
   ranges and lists, for example,

      1996-2000               (a range of five years)
      1952, 1957, 1969        (a list of three years)
      1952, 1958-1967, 1985   (a mixed list of dates and ranges)
      20001229-20001231       (a range of three days)

7.7. ERC Stub Records and Internal Support

   The ERC design introduces the concept of a "stub" record, which is an
   incomplete ERC record intended to be supplemented with additional
   elements before being released as a standalone ERC record. A stub
   ERC record has no minimum required elements. It is just a group of
   elements that does not begin with "erc:" but otherwise conforms to
   the ERC record syntax.

   ERC stubs may be useful in supporting internal procedures using the
   ERC syntax. Often they rely on the convenience and accuracy of
   automatically supplied elements, even the basic ones. To be ready
   for external use, however, an ERC stub must be transformed into a
   complete ERC record having the usual required elements.
   An ERC stub record can be convenient for metadata embedded in a
   document, where elements such as location, modification date, and
   size - which one would not omit from an externalized record - are
   omitted simply because they are much better supplied by a
   computation. A separate local administrative procedure, not defined
   for ERCs in general, would effect the promotion of stubs into
   complete records.

   While the ERC is a general-purpose container for exchange of resource
   descriptions, it does not dictate how records must be internally
   stored, laid out, or assembled by data providers or recipients.
   Arbitrary internal descriptive frameworks can support ERCs simply by
   mapping (e.g., on demand) local records to the ERC container format
   and making them available for export. Therefore, to support ERCs
   there is no need for a data provider to convert internal data to be
   stored in an ERC format. On the other hand, any provider (such as
   one just getting started in the business of resource description) may
   choose to store and manipulate local data natively in the ERC format.

8. Advice to Web Clients

   This section offers some advice to web client software developers.
   The advice is necessarily speculative because it tries to anticipate
   a series of events that might lead to native web browser support for
   ARKs.

   ARKs are envisaged to appear wherever durable object references are
   planned. Library cataloging records, literature citations, and
   bibliographies are important examples. In many of these places URLs
   (Uniform Resource Locators) currently stand in, and URNs, DOIs, and
   PURLs have been proposed as alternatives.

   The strings representing ARKs are also envisaged to appear in some of
   the places where URLs currently appear: in hypertext links (where
   they are not normally shown to users) and in rendered text (displayed
   or printed).
   Internet search engines, for example, tend to include both
   actionable and manifest links when listing each item found. A
   normal HTML link for which the URL is not displayed looks like this.

      Click Here

   The same link with an ARK instead of a URL:

      Click Here

   Web browsers would in general require a small modification to
   recognize and convert this ARK, via mapping authority discovery, to
   the URL form.

      Click Here

   A browser that knows how to make that conversion could also
   automatically detect and replace a non-working NMAH.

   An NAA will typically make known the associations it creates by
   publishing them in catalogs, actively advertising them, or simply
   leaving them on web sites for visitors (e.g., users, indexing
   spiders) to stumble across in browsing.

9. Security Considerations

   The ARK naming scheme poses no direct risk to computers and networks.
   Implementors of ARK services need to be aware of security issues when
   querying networks and filesystems for Name Mapping Authority
   services, and of the concomitant risks of spoofing and of obtaining
   incorrect information. These risks are no greater for ARK mapping
   authority discovery than for other kinds of service discovery. For
   example, recipients of ARKs with a specified hostport (NMAH) should
   treat it like a URL and be aware that the identified ARK service may
   no longer be operational.

   Apart from mapping authority discovery, ARK clients and servers
   subject themselves to all the risks that accompany normal operation
   of the protocols underlying mapping services (e.g., HTTP, Z39.50).
   As a specialization of such protocols, an ARK service may limit
   exposure to the usual risks. Indeed, ARK services may enhance a kind
   of security by helping users identify long-term reliable references
   to information objects.

10. Authors' Addresses

   John A. Kunze
   California Digital Library
   University of California, Office of the President
   415 20th St, 4th Floor
   Oakland, CA 94612-3550, USA

   Fax:   +1 510-893-5212
   EMail: jak@ucop.edu

   R. P. C. Rodgers
   US National Library of Medicine
   8600 Rockville Pike, Bldg. 38A
   Bethesda, MD 20894, USA

   Fax:   +1 301-496-0673
   EMail: rodgers@nlm.nih.gov

11. References

   [ANVL]    J. Kunze, B. Kahle, et al, "A Name-Value Language", work
             in progress,
             http://www.cdlib.org/inside/diglib/ark/anvlspec.pdf

   [ARK]     J. Kunze, "Towards Electronic Persistence Using ARK
             Identifiers", Proceedings of the 3rd ECDL Workshop on Web
             Archives, August 2003, (PDF)
             http://bibnum.bnf.fr/ecdl/2003/proceedings.php?f=kunze

   [DCORE]   Dublin Core Metadata Initiative, "Dublin Core Metadata
             Element Set, Version 1.1: Reference Description", July
             1999, http://dublincore.org/documents/dces/.

   [DERC]    J. Kunze, "Dictionary of the ERC", work in progress within
             the Dublin Core Metadata Initiative's Kernel Working
             Group, http://dublincore.org/groups/kernel/

   [DNS]     P.V. Mockapetris, "Domain Names - Concepts and
             Facilities", RFC 1034, November 1987.

   [DOI]     International DOI Foundation, "The Digital Object
             Identifier (DOI) System", February 2001,
             http://dx.doi.org/10.1000/203.

   [ERC]     J. Kunze, "A Metadata Kernel for Electronic Permanence",
             Journal of Digital Information, Vol 2, Issue 2, January
             2002, ISSN 1368-7506, (PDF)
             http://jodi.ecs.soton.ac.uk/Articles/v02/i02/Kunze/

   [Handle]  L. Lannom, "Handle System Overview", ICSTI Forum, No. 30,
             April 1999, http://www.icsti.org/forum/30/#lannom

   [HTTP]    R. Fielding, et al, "Hypertext Transfer Protocol --
             HTTP/1.1", RFC 2616, June 1999.

   [MD5]     R. Rivest, "The MD5 Message-Digest Algorithm", RFC 1321,
             April 1992.

   [NAPTR]   M. Mealling, R. Daniel, "The Naming Authority Pointer
             (NAPTR) DNS Resource Record", RFC 2915, September 2000.

   [NLMPerm] M. Byrnes, "Defining NLM's Commitment to the Permanence of
             Electronic Information", ARL 212:8-9, October 2000,
             http://www.arl.org/newsltr/212/nlm.html

   [NOID]    J. Kunze, "Nice Opaque Identifiers", February 2005,
             http://www.cdlib.org/inside/diglib/ark/noid.pdf

   [PURL]    K. Shafer, et al, "Introduction to Persistent Uniform
             Resource Locators", 1996,
             http://purl.oclc.org/OCLC/PURL/INET96

   [RFC822]  D. Crocker, "Standard for the format of ARPA Internet text
             messages", RFC 822, August 1982.

   [TELNET]  J. Postel, J.K. Reynolds, "Telnet Protocol Specification",
             RFC 854, May 1983.

   [TEMPER]  J. Kunze, "Temporal Enumerated Ranges", work in progress,
             http://www.cdlib.org/inside/diglib/ark/temperspec.pdf

   [THUMP]   J. Kunze, "The HTTP URL Mapping Protocol", work in
             progress.

   [URI]     T. Berners-Lee, et al, "Uniform Resource Identifiers
             (URI): Generic Syntax", RFC 2396, August 1998.

   [URNBIB]  C. Lynch, et al, "Using Existing Bibliographic Identifiers
             as Uniform Resource Names", RFC 2288, February 1998.

   [URNSYN]  R. Moats, "URN Syntax", RFC 2141, May 1997.

   [URNNID]  L. Daigle, et al, "URN Namespace Definition Mechanisms",
             RFC 2611, June 1999.

12. Appendix: ARK Implementations

   Currently, the primary implementation activity is at the California
   Digital Library (CDL),

      http://ark.cdlib.org/

   housed at the University of California Office of the President, where
   over 200,000 ARKs have been assigned to objects that the CDL owns or
   controls. Some experimentation in ARKs is taking place at JSTOR, the
   Digital Curation Centre, WIPO, and at the University of California's
   San Diego, San Francisco, and Berkeley campuses.

   The US National Library of Medicine (NLM) also has an experimental,
   prototype ARK service under development. It is being made available
   for purposes of demonstrating various aspects of the ARK system, but
   is subject to temporary or permanent withdrawal (without notice)
   depending upon the circumstances of the small research group
   responsible for making it available. It is described at:

      http://ark.nlm.nih.gov/

   Comments and feedback may be addressed to rodgers@nlm.nih.gov.

13. Appendix: Current ARK Name Authority Table

   This appendix contains a copy of the Name Authority Table (a file) at
   the time of writing. It may be loaded into a local filesystem (e.g.,
   /etc/natab) for use in mapping NAAs (Name Assigning Authorities) to
   NMAHs (Name Mapping Authority Hostports). It contains Perl code that
   can be copied into a standalone script that processes the table (as a
   file). Because this is still a proposed file, none of the values in
   it are real.

#
# Name Assigning Authority / Name Mapping Authority Lookup Table
#     Last change:  2004 12 14
#     Reload from:  http://ark.nlm.nih.gov/etc/natab
#     Mirrored at:  http://www.cdlib.org/inside/diglib/ark/natab
#     To register:  mailto:ark@cdlib.org?Subject=naareg
#     Process with: Perl script at end of this file (optional)
#
# Each NAA appears at the beginning of a line with the NAA Number
# first, a colon, and an ARK or URL to a statement of naming policy
# (see http://ark.cdlib.org for an example).
# All the NMA hostports that service an NAA are listed, one per
# line, indented, after the corresponding NAA line.
#
# National Library of Medicine
12025: http://www.nlm.nih.gov/xxx/naapolicy.html
        ark.nlm.nih.gov USNLM
        foobar.zaf.org UCSF
        sneezy.dopey.com BIREME
#
# Library of Congress
12026: http://www.loc.gov/xxx/naapolicy.html
        foobar.zaf.org USLC
        sneezy.dopey.com USLC
#
# National Agriculture Library
12027: http://www.nal.gov/xxx/naapolicy.html
        foobar.zaf.gov:80 USNAL
#
# California Digital Library
13030: http://www.cdlib.org/inside/diglib/ark/
        ark.cdlib.org CDL
#
# World Intellectual Property Organization
13038: http://www.wipo.int/xxx/naapolicy.html
        www.wipo.int WIPO
#
# University of California San Diego
20775: http://library.ucsd.edu/xxx/naapolicy.html
        ucsd.edu UCSD
#
# University of California San Francisco
29114: http://library.ucsf.edu/xxx/naapolicy.html
        ucsf.edu UCSF
#
# University of California Berkeley
28722: http://library.berkeley.edu/xxx/naapolicy.html
        berkeley.edu UCB
#
# Rutgers University Libraries
15230: http://rci.rutgers.edu/xxx/naapolicy.html
        rutgers.edu RUL
#
# Internet Archive
13960: http://www.archive.org/xxx/naapolicy.html
        archive.org IA
#
# Digital Curation Centre
64269: http://www.dcc.ac.uk/xxx/naapolicy.html
        dcc.ac.uk DCC
#
# New York University Libraries
62624: http://library.nyu.edu/xxx/naapolicy.html
        nyu.edu NYUL
#
# University of North Texas Libraries
67531: http://www.library.unt.edu/xxx/naapolicy.html
        unt.edu UNTL
#
# Ithaka Electronic-Archiving Initiative
27927: http://www.ithaka.org/xxx/naapolicy.html
        ithaka.org ITHAKA
#
#--- end of data ---
# The following Perl script takes an NAA as argument and outputs
# the NMAs in this file listed under any matching NAA.
#
# my $naa = shift;
# while (<>) {
#         next if (! /^$naa:/);
#         while (<>) {
#                 last if (! /^[#\s]./);
#                 print "$1\n" if (/^\s+(\S+)/);
#         }
# }
#
# Create a g/t/nroff-safe version of this table with the UNIX command,
#
#     expand natab | sed 's/\\/\\\e/g' > natab.roff
#
# end of file

14. Copyright Notice

   Copyright (C) The Internet Society (2005). This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

   Expires 19 August 2005

Table of Contents

   Status of this Document
   Abstract
   1. Introduction
      1.1. Three Reasons to Use ARKs
      1.2. Organizing Support for ARKs
      1.3. Definition of Identifier
   2. ARK Anatomy
      2.1. The Name Mapping Authority Hostport (NMAH)
      2.2. The ARK Label Part - ark:
      2.3. The Name Assigning Authority Number (NAAN)
      2.4. The Name Part
      2.5. The Qualifier Part
         2.5.1. ARKs that Reveal Object Hierarchy
         2.5.2. ARKs that Reveal Object Variants
      2.6. Character Repertoires
      2.7. Normalization and Lexical Equivalence
      2.8. Naming Considerations
   3. Assigners of ARKs
   4. Finding a Name Mapping Authority
      4.1. Looking Up NMAHs in a Globally Accessible File
      4.2. Looking up NMAHs Distributed via DNS
   5. Generic ARK Service Definition
      5.1. Generic ARK Access Service (access, location)
      5.2. Generic Policy Service (permanence, naming, etc.)
      5.3. Generic Description Service
   6. Overview of the Tiny HTTP URL Mapping Protocol (THUMP)
   7. Overview of Electronic Resource Citations (ERCs)
      7.1. ERC Syntax
      7.2. ERC Stories
      7.3. The ERC Anchoring Story
      7.4. ERC Elements
      7.5. ERC Element Values
      7.6. ERC Element Encoding and Dates
      7.7. ERC Stub Records and Internal Support
   8. Advice to Web Clients
   9. Security Considerations
   10. Authors' Addresses
   11. References
   12. Appendix: ARK Implementations
   13. Appendix: Current ARK Name Authority Table
   14. Copyright Notice