idnits 2.17.1 draft-kunze-ark-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document is more than 15 pages and seems to lack a Table of Contents. == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There are 4 instances of too long lines in the document, the longest one being 5 characters in excess of 72. ** The abstract seems to contain references ([NMAH]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. == There are 13 instances of lines with non-RFC2606-compliant FQDNs in the document. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == Line 439 has weird spacing: '...eful to remem...' == Line 1673 has weird spacing: '...for the purpo...' 
-- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.)
-- The document date (8 March 2001) is 8444 days in the past. Is this intentional?
-- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '<CODE BEGINS>' and '<CODE ENDS>' lines.
Checking references for intended status: Proposed Standard
----------------------------------------------------------------------------
(See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs)
-- Possible downref: Non-RFC (?) normative reference: ref. 'DCORE'
-- Possible downref: Non-RFC (?) normative reference: ref. 'DOI'
** Obsolete normative reference: RFC 822 (ref. 'EMHDRS') (Obsoleted by RFC 2822)
-- Possible downref: Non-RFC (?) normative reference: ref. 'ERC'
-- Possible downref: Non-RFC (?) normative reference: ref. 'HKMP'
** Obsolete normative reference: RFC 2616 (ref. 'HTTP') (Obsoleted by RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235)
** Downref: Normative reference to an Informational RFC: RFC 1321 (ref. 'MD5')
** Obsolete normative reference: RFC 2915 (ref. 'NAPTR') (Obsoleted by RFC 3401, RFC 3402, RFC 3403, RFC 3404)
-- Possible downref: Non-RFC (?) normative reference: ref. 'NLMPerm'
-- Possible downref: Non-RFC (?) normative reference: ref. 'PURL'
-- Possible downref: Non-RFC (?) normative reference: ref. 'REG'
** Obsolete normative reference: RFC 2396 (ref. 'URI') (Obsoleted by RFC 3986)
** Downref: Normative reference to an Informational RFC: RFC 2288 (ref.
'URNBIB') ** Obsolete normative reference: RFC 2141 (ref. 'URNSYN') (Obsoleted by RFC 8141) ** Obsolete normative reference: RFC 2611 (ref. 'URNNID') (Obsoleted by RFC 3406) Summary: 15 errors (**), 0 flaws (~~), 5 warnings (==), 10 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Internet-Draft: draft-kunze-ark-01.txt J. Kunze 2 ARK Identifier Scheme University of California (UCSF) 3 Expires 8 September 2001 R. P. C. Rodgers 4 US National Library of Medicine 5 8 March 2001 7 The ARK Persistent Identifier Scheme 9 (http://www.ckm.ucsf.edu/people/jak/home/ark-01.txt) 10 (http://www.ckm.ucsf.edu/people/jak/home/ark-01.ps) 12 Status of this Document 14 This document is an Internet-Draft and is in full conformance with 15 all provisions of Section 10 of RFC2026. 17 Internet-Drafts are working documents of the Internet Engineering 18 Task Force (IETF), its areas, and its working groups. Note that 19 other groups may also distribute working documents as Internet- 20 Drafts. 22 Internet-Drafts are draft documents valid for a maximum of six months 23 and may be updated, replaced, or obsoleted by other documents at any 24 time. It is inappropriate to use Internet-Drafts as reference 25 material or to cite them other than as ``work in progress.'' 27 The list of current Internet-Drafts can be accessed at 28 http://www.ietf.org/ietf/1id-abstracts.txt 30 The list of Internet-Draft Shadow Directories can be accessed at 31 http://www.ietf.org/shadow.html. 33 Distribution of this document is unlimited. Please send comments to 34 jak@ckm.ucsf.edu. 36 Copyright (C) The Internet Society (2001). All Rights Reserved. 38 Abstract 40 The ARK (Archival Resource Key) is a scheme intended to facilitate 41 the persistent naming and retrieval of information objects. It 42 comprises an identifier syntax and three services. 
An ARK has four 43 components: 45 ark:[NMAH]/NAAN/Name 47 the prefix "ark:", the (optional and mutable) Name Mapping Authority 48 Hostport (NMAH, where "hostport" is a hostname followed optionally by 49 a colon and port number), the Name Assigning Authority Number (NAAN), 50 and the assigned Name. The NAAN and Name together form the immutable 51 persistent identifier for the object. 53 An ARK request is an ARK to which is appended a service request 54 beginning with a question mark. Use of an ARK request proceeds in 55 two steps. First, the NMAH, if not specified, is discovered based on 56 the NAAN. Two methods for discovery are proposed: one is file 57 based, the other based on the DNS NAPTR record. Second, the ARK 58 request is submitted to the NMAH. Three ARK services are defined, 59 gaining access to: (1) the object (or a sensible substitute), (2) a 60 description of the object (metadata), and (3) a description of the 61 commitment made by the NMA regarding the persistence of the object 62 (policy). These services are defined initially to use the HTTP 63 protocol, given the World Wide Web's pre-eminence among Internet 64 information retrieval systems. When the NMAH is specified, the 65 "ark:" prefix may be replaced with "http://", to produce a valid URL 66 that can gain access to ARK services using an unmodified Web client. 68 1. Introduction 70 This document describes a scheme for the high-quality naming of 71 information resources. The scheme, called the Archival Resource Key 72 (ARK), is well suited to long-term access and identification for any 73 information resources that accommodate reasonably regular electronic 74 description. This includes digital documents, databases, software, 75 and websites, as well as physical objects (such as books, bones, and 76 statues) and intangible objects (chemicals, diseases, vocabulary 77 terms, performances). Hereafter the term "object" refers to an 78 information resource. 
The term ARK itself refers both to the scheme 79 and to any single identifier that conforms to it. 81 Schemes for persistent identification of network-accessible objects 82 are not new. In the early 1990's, the design of the Uniform Resource 83 Name [URNSYN] responded to the observed failure rate of URLs by 84 articulating an indirect, non-hostname-based naming scheme and the 85 need for responsible name management. Meanwhile, promoters of the 86 Digital Object Identifier [DOI] succeeded in building a community of 87 providers around a mature software system that supports name 88 management. The Persistent Uniform Resource Locator [PURL] was a 89 third scheme that has the unique advantage of working with unmodified 90 web browsers. The ARK scheme is a new approach. 92 A founding principle of the ARK is that persistence is purely a 93 matter of service. Persistence is neither inherent in an object nor 94 conferred on it by a particular naming syntax. Rather, persistence 95 is achieved through a provider's successful stewardship of objects 96 and their identifiers. The highest level of persistence will be 97 reinforced by a provider's robust contingency, redundancy, and 98 succession strategies. It is further safeguarded to the extent that 99 a provider's mission is shielded from marketplace and political 100 instabilities. 102 1.1. Three Reasons to Use ARKs 104 The first requirement of an ARK is to give users a link from an 105 object to a promise of stewardship for it. That promise is a multi- 106 faceted covenant that binds the word of an identified service 107 provider to a specific set of responsibilities. No one can tell if 108 successful stewardship will take place because no one can predict the 109 future. Reasonable conjecture, however, may be based on past 110 performance. There must be a way to tie a promise of persistence to 111 a provider's demonstrated or perceived ability -- its reputation -- 112 in that arena. 
Provider reputations would then rise and fall as 113 promises are observed variously to be kept and broken. This is 114 perhaps the best way we have for gauging the strength of any 115 persistence promise. 117 The second requirement of an ARK is to give users a link from an 118 object to a description of it. The problem with a naked identifier 119 is that without a description real identification is incomplete. 120 Identifiers common today are relatively opaque, though some contain 121 ad hoc clues that reflect fleeting life cycle events such as the 122 address of a short stay in a filesystem hierarchy. Possession of 123 both an identifier and an object is some improvement, but positive 124 identification may still be elusive since the object itself need not 125 include a matching identifier or be transparent enough to reveal its 126 identity without significant research. In either case, what is 127 called for is a record bearing witness to the identifier's 128 association with the object, as supported by a recorded set of object 129 characteristics. This descriptive record is partly an identification 130 "receipt" with which users and archivists can verify an object's 131 identity after brief inspection and a plausible match with recorded 132 characteristics such as title and size. Among the recorded 133 characteristics, a checksum (e.g., [MD5]) recorded at the time of 134 last handling may assist automated identification of digital objects 135 (although checksums will require recomputation periodically if 136 extremely persistent objects' bitstreams change as predicted due to 137 inevitable media migration). 139 The final requirement of an ARK is to give users a link to the object 140 itself (or to a copy) if at all possible. Persistent access is the 141 central duty of an ARK, with persistent identification playing a 142 vital but supporting role. 
Object access may not be feasible for 143 various reasons, such as catastrophic loss of the object, a licensing 144 agreement that keeps an archive "dark" for a period of years, or when 145 an object's own lack of tangible existence precludes normal concepts 146 of access (e.g., a vocabulary term might be accessed through its 147 definition). In such cases the ARK's identification role assumes a 148 much higher profile. But attempts to simplify the persistence 149 problem by decoupling access from identification and concentrating 150 exclusively on the latter are of questionable utility. A perfect 151 system for assigning forever unique identifiers might be created, but 152 if it did so without reducing access failure rates, no one would be 153 interested. The central issue -- which may be summed up as the "HTTP 154 404 Not Found" problem -- would not have been addressed. 156 1.2. Organizing Support for ARKs 158 Co-location of persistent access and identification services is 159 natural. Any organization undertaking persistent identification and 160 description is in an advantaged position to undertake persistent 161 access, and vice versa. The former task becomes all the easier if 162 the organization controls, owns, or otherwise has clear access to the 163 objects. Similarly, the latter cannot be managed without at least 164 internal support for the former, since collection management 165 activities such as monitoring, acquisition, verification, and change 166 control all require record keeping and accountability. Organizing 167 ARK services under one roof tends to make sense. 169 ARK support is not for everybody. By requiring specific, revealed 170 commitments to preservation, object access, and description, the bar 171 for providing ARK services is high. On the other hand, it would be 172 hard to grant credence to a persistence promise from an organization 173 that could not muster the minimum ARK services. 
Not that there isn't 174 a business model for an ARK-like, description-only service built on 175 top of another organization's full complement of ARK services. For 176 example, there might be competition at the description level for 177 abstracting and indexing a body of scientific literature archived in 178 a combination of open and fee-based repositories. Such a business 179 would benefit more from persistence than it would directly support 180 it. 182 1.3. A Definition of Identifier 184 Heretofore, persistence discussion has been hampered by a borrowed 185 meaning for "identifier" that emerged as a side effect of defining 186 the Uniform Resource Identifier in [URI]: 188 (formerly) An identifier is a sequence of characters with a 189 restricted syntax ... that can act as a reference to something 190 that has identity. 192 The term works in context, but falters when employed for persistence. 193 Troubling phrases arise, such as, 195 "The goal is to create an identifier that does not break." 197 As defined this kind of identifier "breaks" when it sustains damage 198 to its character sequence, but really what breaks has to do with the 199 identifier's reference role. The following definition is proposed. 201 (new definition) An identifier is an association between a 202 string (a sequence of characters) and an information resource. 203 That association is made manifest by a record (e.g., a 204 cataloging or other metadata record) that binds the identifier 205 string to a set of identifying resource characteristics. 207 The identifier (the association) must be vouched for by some sort of 208 record. In the complete absence of any testimony (e.g., metadata) 209 regarding an association, a would-be identifier string is a 210 meaningless sequence of characters. To keep an externally visible 211 but otherwise internal identifier string opaque to outsiders, for 212 example, it suffices for an organization not to disclose the nature 213 of its association. 
For our immediate purpose, actual existence of 214 an association record is more important than its authenticity. If 215 one is lucky an object carries its own identifier as part of itself 216 (e.g., imprinted on the first page), but in processes such as 217 resource discovery and retrieval the typical object is often unwieldy 218 or unavailable (such as when licensing restrictions are in effect). 219 A metadata record that includes the identifier string is the next 220 best thing -- a conveniently manipulable surrogate that can act as 221 both an association "receipt" and "declaration".
223 It now makes sense to speak of preventing an identifier, as an 224 association, from breaking. Having said that, this document still 225 (ab)uses the terms "ARK" and "identifier" as shorthands to refer to 226 identifier strings, in other words, to sequences of characters. Thus 227 a discussion of ARK syntax refers to a string format, not an 228 association format. The context should make the meaning clear.
230 2. ARK Anatomy
232 An ARK is represented by a sequence of characters (a string) that 233 begins with the prefix "ark:". Here is a diagrammed example.
235        ark:foobar.zaf.org/12025/654xz321
236        \__/\____________/ \___/ \______/
237          |   (optional)     |      |
238    ARK Prefix    |          |    Name (assigned by the NAA)
239                  |          |
240    Name Mapping Authority   Name Assigning Authority
241      Hostport (NMAH)        Number (NAAN)
243 The ARK syntax can be summarized,
245        ark:[NMAH]/NAAN/Name
247 where the NMAH is in brackets to indicate that it is optional.
249 2.1. The Name Mapping Authority Hostport (NMAH)
251 After the prefix may appear an optional Name Mapping Authority 252 Hostport (NMAH) that is a temporary address where ARK service 253 requests may be sent. It consists of an Internet hostname or 254 hostport combination having the same format and semantics as the 255 hostport part of a URL.
The most important thing about the NMAH is 256 that it is "identity inert" from the point of view of object 257 identification. In other words, ARKs that differ only in the 258 optional NMAH part identify the same object. Thus, for example, the 259 following three ARKs are synonyms for but one information resource:
261        ark:foobar.zaf.org/12025/654xz321
262        ark:sneezy.dopey.com/12025/654xz321
263        ark:/12025/654xz321
265 The NMAH makes it easy to derive an identifier that is actionable in 266 today's web browsers (i.e., a URL). This amounts to substituting 267 "http://" for the "ark:" prefix (although in most browsers simply 268 deleting "ark:" works). The first example ARK above thus becomes
270        http://foobar.zaf.org/12025/654xz321
272 The NMAH part is temporary, disposable, and replaceable. Over time 273 the NMAH will likely stop working and have to be replaced with a 274 currently active service provider. This relies on a mapping 275 authority discovery process, of which two alternate methods are 276 outlined in a later section. Meanwhile, a carefully chosen NMAH can 277 be as durable as any Internet domain name, and so may last for a 278 decade or longer. Users should be prepared, however, to refresh the 279 NMAH because the one found in an ARK may have stopped working.
281 The above method for creating an actionable identifier from an ARK 282 (replacing "ark:" with "http://") is also temporary. Assuming that 283 the reign of [HTTP] in information retrieval will end one day, ARKs 284 will have to be converted into new kinds of actionable identifiers. 285 In any event, if ARKs see widespread use, web browsers would 286 presumably evolve to perform this simple transformation 287 automatically.
289 2.2. The Name Assigning Authority Number (NAAN)
291 The next part of the ARK is the Name Assigning Authority Number 292 (NAAN) enclosed in `/' (slash) characters.
This part is always 293 required, as it identifies the organization that originally assigned 294 the Name of the object. It is used to discover a currently valid 295 NMAH and to provide top-level partitioning of the space of all ARKs. 296 NAANs are registered in a manner similar to URN Namespaces, but they 297 are pure numbers consisting of 5 digits or 9 digits. Thus, the first 298 100,000 registered NAAs fit compactly into the 5 digits, and if 299 growth warrants, the next billion fit into the 9 digit form. In 300 either case the fixed odd number of digits helps reduce the chances 301 of finding a NAAN out of context and confusing it with nearby 302 quantities such as 4-digit dates. 304 2.3. The Name Part 306 The final part of the ARK is the Name assigned by the NAA, and it is 307 also required. The Name is a string of visible ASCII characters and 308 should be less than 128 bytes in length. The length restriction 309 keeps the ARK short enough to append ordinary ARK request strings 310 without running into transport restrictions within HTTP GET requests. 311 Characters may be letters, digits, or any of these seven characters: 313 = @ $ _ * ' # 315 The characters `/', `+', and `?' are reserved and must not be used at 316 this time. A `-' (hyphen) may appear in an ARK, but must be ignored 317 in lexical comparisons. The `%' character is reserved for %-encoding 318 all other octets that would appear in the ARK string, in the same 319 manner as for URIs [URI]. A %-encoded octet consists of a `%' 320 followed by two hex digits; for example, "%7d" stands in for `}'. 321 Lower case hex digits are preferred to reduce the chances of false 322 acronym recognition; thus it is better to use "%acT" instead of 323 "%ACT". The character `%' itself must be represented using "%25". 324 As with URNs, %-encoding permits ARKs to support legacy namespaces 325 (e.g., ISBN, ISSN, SICI) that have less restricted character 326 repertoires [URNBIB]. 
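The character rules of Section 2.3 can be illustrated with a minimal Python sketch. It is not part of the ARK specification. The names NAME_SAFE, encode_name, and is_valid_name are hypothetical, and the sketch assumes that every octet outside the permitted repertoire (including the reserved `/', `+', and `?') is to be concealed by %-encoding with lower case hex digits.

```python
import re

# Characters that may appear unencoded in the Name part (Section 2.3):
# letters, digits, the seven listed punctuation characters, and the
# hyphen, which is permitted but ignored in lexical comparisons.
NAME_SAFE = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "=@$_*'#"
    "-"
)

# A Name is a run of safe characters and %-encoded octets.
# Upper case hex digits are accepted here; lower case is only preferred.
_NAME_RE = re.compile(r"(?:[A-Za-z0-9=@$_*'#-]|%[0-9A-Fa-f]{2})+")


def encode_name(raw: bytes) -> str:
    """%-encode every octet outside NAME_SAFE, using lower case hex.

    Assumption: reserved characters are always encoded, that is, the
    caller wants their reserved meanings concealed.
    """
    return "".join(
        chr(b) if chr(b) in NAME_SAFE else "%%%02x" % b for b in raw
    )


def is_valid_name(name: str) -> bool:
    """Check the character repertoire and the 128-byte length guideline."""
    return len(name) < 128 and _NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    print(encode_name(b"a}b"))        # the `}' octet becomes %7d
    print(is_valid_name("654xz321"))
    print(is_valid_name("654/xz"))    # `/' is reserved inside a Name
```

The length test treats the "should be less than 128 bytes" guideline as a hard limit, which is stricter than the text requires; an implementation might prefer to warn instead.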
328 The creation of names that include linguistically based constructs 329 (having recognizable meaning from natural language) is strongly 330 discouraged if long-term persistence is a naming priority. Such 331 names do not age or travel well. Names that look more or less like 332 numbers avoid common problems that defeat persistence and 333 international acceptance. The use of digits is highly recommended. 334 Mixing in non-vowel alphabetic characters is a relatively safe and 335 easy way to achieve more compact names, although any character 336 repertoire can work if potentially troublesome names will be 337 discarded during a screening process.
339 2.4. Lexical Equivalence
341 Hyphens are always ignored in ARKs. Hyphens may be added to an ARK's 342 Name part for readability, or during the formatting and wrapping of 343 text lines, but (as in phone numbers) they are treated as if they 344 were not present. Thus, like the NMAH, hyphens are "identity inert" 345 in comparing ARKs for equivalence. For example, the following ARKs 346 are equivalent for purposes of comparison and ARK service access:
348        ark:/12025/65-4-xz-321
349        ark:sneezy.dopey.com/12025/654--xz32-1
350        ark:/12025/654xz321
352 To determine if two or more ARKs identify the same object, the ARKs 353 are compared for lexical equivalence after first being normalized. 354 Since ARK strings may appear in various forms (e.g., having different 355 NMAHs), normalizing them minimizes the chances that comparing two ARK 356 strings for equality will fail unless they actually identify 357 different objects. In a specified-host ARK (one having an NMAH), the 358 NMAH never participates in such comparisons.
360 Normalization of ARKs for the purpose of octet-by-octet equality 361 comparison consists of two steps for each ARK. First, any upper case 362 letters in the "ark:" prefix and %-encoded hex digits are converted 363 to lower case. The case of all other letters in the ARK string must 364 be preserved.
Then, any NMAH is removed and all hyphens are removed. 365 The resulting ARK string is now normalized. Comparisons between 366 normalized ARKs are case-sensitive, meaning that upper case letters 367 are considered different from their lower case counterparts. 369 To keep ARK string variation to a minimum, no reserved ARK characters 370 should be %-encoded unless it is deliberately to conceal their 371 reserved meanings. No non-reserved ARK characters should ever be %- 372 encoded. Finally, no %-encoded character should ever appear in an 373 ARK in its decoded form. 375 2.5. Naming Considerations 377 The ARK has different goals from the URI, so it has different 378 character set requirements. Because linguistic constructs imperil 379 persistence, for ARKs non-ASCII character support is unimportant. 380 ARKs and URIs share goals of transcribability and transportability 381 within web documents, so characters are required to be visible, non- 382 conflicting with HTML/XML syntax, and not subject to tampering during 383 transmission across common transport gateways. Add the goal of 384 making an undelimited ARK recognizable in running prose, as in 385 ark:/12025/=@_22*$, and certain punctuation characters (e.g., comma, 386 period) end up being excluded from the ARK lest the end of a phrase 387 or sentence be mistaken as part of the ARK. 389 A valuable technique for provision of persistent objects is to try to 390 have the complete identifier appear on, with, or near its retrieved 391 object. An object encountered at a moment in time when its discovery 392 context has long since disappeared could then easily be traced back 393 to its metadata, to alternate versions, to updates, etc. This has 394 seen reasonable success, for example, in book publishing and software 395 distribution. 397 If persistence is the goal, a deliberate local strategy for 398 systematic name assignment is crucial. Names must be chosen with 399 great care. 
Poorly chosen and managed names will devastate any 400 persistence strategy, and they do not discriminate based on naming 401 scheme. Whether a mistakenly re-assigned identifier is a URN, DOI, 402 PURL, URL, or ARK, the damage -- failed access and confusion -- is 403 not mitigated more in one scheme than in another. Conversely, in- 404 house efforts to manage names responsibly will go much further 405 towards safeguarding persistence than any choice of naming scheme or 406 name resolution technology. 408 Hostnames appearing in any identifier meant to be persistent must be 409 chosen with extra care. The tendency in hostname selection has 410 traditionally been to choose a token with recognizable attributes, 411 such as a corporate brand, but that tendency wreaks havoc with 412 persistence that is to outlive brands, corporations, subject 413 classifications, and natural language semantics (e.g., what did the 414 three letters "gay" mean forty, twenty, and two years ago?). Today's 415 recognized and correct attributes are tomorrow's stale or incorrect 416 attributes. In making hostnames (any names, actually) long-term 417 persistent, it helps to eliminate recognizable identity to the extent 418 possible. This affects selection of any name based on URLs, 419 including PURLs and the explicitly disposable NMAHs. There is no 420 excuse for a provider that manages its internal names impeccably not 421 to exercise the same care in choosing what could be an exceptionally 422 durable hostname, especially if it would form the prefix for all the 423 provider's URL-based external names. Registering an opaque hostname 424 in the ".org" or ".net" domain would not be a bad start. 426 Dubious persistence speculation does not make selecting naming 427 strategies any easier. 
For example, despite rumors to the contrary, 428 there are really no obvious reasons why the organizations registering 429 DNS names, URN Namespaces, and DOI publisher IDs should have among 430 them one that is intrinsically more fallible than the next. 431 Moreover, it is a misconception that the demise of DNS and of HTTP 432 need adversely affect the persistence of URLs. At such a time, 433 certainly URLs from the present day would not then be actionable by 434 our present-day mechanisms, but resolution systems for future non- 435 actionable URLs are no harder to imagine than resolution systems for 436 present-day non-actionable URNs and DOIs. There is no more stable a 437 namespace than one that is dead and frozen, and that would then 438 characterize the space of names bearing the "http://" prefix. It is 439 useful to remember that just because hostnames have been carelessly 440 chosen in their brief history does not mean that they are unsuitable 441 in NMAHs (and URLs) intended for use in situations demanding the 442 highest level of persistence available in the Internet environment. 443 A well-planned name assignment strategy is everything. 445 3. Assigners of ARKs 447 A Name Assigning Authority (NAA) is an organization that creates (or 448 delegates creation of) long-term associations between identifiers and 449 information objects. Examples of NAAs include national libraries, 450 national archives, and publishers. An NAA may arrange with an 451 external organization for identifier assignment. The US Library of 452 Congress, for example, allows OCLC (the Online Computer Library 453 Center, a major world cataloger of books) to create associations 454 between Library of Congress call numbers (LCCNs) and the books that 455 OCLC processes. A cataloging record is generated that testifies to 456 each association, and the identifier is included by the publisher in 457 places like the front matter of a book. 
459 An NAA does not so much create an identifier as create an 460 association. The NAA first draws an identifier from its namespace, 461 which is the set of all identifiers under its control. It then 462 records the assignment of the identifier to an information object 463 having sundry witnessed characteristics, such as a particular author 464 and modification date. A namespace is usually reserved for an NAA by 465 agreement with recognized community organizations (such as IANA and 466 ISO) that all names containing a particular string be under its 467 control. In the ARK an NAA is represented by the Name Assigning 468 Authority Number (NAAN). 470 The ARK namespace reserved for an NAA is the set of names bearing its 471 particular NAAN. For example, all strings beginning with 472 "ark:/12025/" are under control of the NAA registered under 12025, 473 which might be the National Library of Finland. Because each NAA has 474 a different NAAN, names from one namespace cannot conflict with those 475 from another. Each NAA is free to assign names from its namespace 476 (or delegate assignment) according to its own policies. These 477 policies must be documented in a manner similar to the declarations 478 required for URN Namespace registration [URNNID]. 480 For now, registration of ARK NAAs is in a bootstrapping phase. To 481 register, please read about the mapping authority discovery file in 482 the next section and send email to jak@ckm.ucsf.edu. 484 4. Finding a Name Mapping Authority 486 In order to derive an actionable identifier from an ARK, a hostport 487 (hostname or hostname plus port combination) for a working Name 488 Mapping Authority (NMA) must be found. An NMA is a service that is 489 able to respond to the three basic ARK service requests. Relying on 490 registration and client-side discovery, NMAs make known which NAAs' 491 identifiers they are willing to service. 
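Before turning to discovery, it may help to see how a client might take an ARK apart. The following Python sketch is illustrative only and is not defined by this document. It splits an ARK into its NMAH, NAAN, and Name, applies the normalization steps of Section 2.4, and derives an "http://" URL as described in Section 2.1. The names Ark, parse_ark, normalize, and to_url are hypothetical, and the sketch assumes that any service request suffix (beginning with `?') has already been removed.

```python
import re
from typing import NamedTuple, Optional


class Ark(NamedTuple):
    nmah: Optional[str]  # optional, mutable hostport; None if absent
    naan: str            # 5 or 9 digits
    name: str            # Name assigned by the NAA


# ark:[NMAH]/NAAN/Name ; only the "ark:" prefix is matched
# case-insensitively, since the other parts use non-letter classes.
_ARK_RE = re.compile(r"ark:([^/]*)/(\d{5}|\d{9})/([^/?]+)", re.IGNORECASE)


def parse_ark(text: str) -> Ark:
    """Split an ARK string into its components."""
    m = _ARK_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError("not a well-formed ARK: %r" % text)
    return Ark(m.group(1) or None, m.group(2), m.group(3))


def normalize(text: str) -> str:
    """Normalize for octet-by-octet comparison (Section 2.4)."""
    parts = parse_ark(text)
    # Step 1: lower-case the hex digits of %-encoded octets; the prefix
    # is emitted in lower case below. All other letters keep their case.
    name = re.sub(r"%[0-9A-Fa-f]{2}", lambda m: m.group(0).lower(), parts.name)
    # Step 2: drop any NMAH and remove all hyphens.
    return "ark:/%s/%s" % (parts.naan, name.replace("-", ""))


def to_url(text: str, nmah: Optional[str] = None) -> str:
    """Replace "ark:" with "http://", using a supplied or embedded NMAH."""
    parts = parse_ark(text)
    host = nmah or parts.nmah
    if not host:
        raise ValueError("no NMAH available; discovery is needed first")
    return "http://%s/%s/%s" % (host, parts.naan, parts.name)


if __name__ == "__main__":
    print(normalize("ark:sneezy.dopey.com/12025/654--xz32-1"))
    print(to_url("ark:foobar.zaf.org/12025/654xz321"))
```

The optional nmah argument of to_url stands in for the result of whatever discovery method is used; when the ARK carries no NMAH and none is supplied, the sketch raises an error rather than guessing a host.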
493 Upon encountering an ARK, a user (or client software) looks inside it 494 for the optional NMAH part (the hostport of the NMA's ARK service). 495 If it contains an NMAH that is working, this NMAH discovery step may 496 be skipped; the NMAH effectively uses the beginning of an ARK to 497 cache the results of a prior mapping authority discovery process. If 498 a new NMAH needs to be found, the client looks inside the ARK again for 499 the NAAN (Name Assigning Authority Number). Querying a global 500 database, it then uses the NAAN to look up all current NMAHs that 501 service ARKs issued by the identified NAA. The global database is 502 key, and two specific methods for querying it are given in this 503 section. 505 In the interests of long-term persistence, however, ARK mechanisms 506 are first defined in high-level, protocol-independent terms so that 507 mechanisms may evolve and be replaced over time without compromising 508 fundamental service objectives. Either or both specific methods 509 given here may eventually be supplanted by better methods since, by 510 design, the ARK scheme does not depend on a particular method, but 511 only on having some method to locate an active NMAH. 513 At the time of issuance, at least one NMAH for an ARK should be 514 prepared to service it. That NMA may or may not be administered by 515 the Name Assigning Authority (NAA) that created it. Consider the 516 following hypothetical example of providing long-term access to a 517 cancer research journal. The publisher wishes to turn a profit and 518 the National Library of Medicine wishes to preserve the scholarly 519 record. An agreement might be struck whereby the publisher would act 520 as the NAA and the national library would archive the journal issue 521 when it appears, but without providing direct access for the first 522 six months.
During the first six months of peak commercial 523 viability, the publisher would retain exclusive delivery rights and 524 would charge access fees. Again, by agreement, both the library and 525 the publisher would act as NMAs, but during that initial period the 526 library would redirect requests for issues less than six months old 527 to the publisher. At the end of the waiting period, the library 528 would then begin servicing requests for issues older than six months 529 by tapping directly into its own archives. Meanwhile, the publisher 530 might routinely redirect incoming requests for older issues to the 531 library. Long-term access is thereby preserved, and so is the 532 commercial incentive to publish content. 534 There is never a requirement that an NAA also run an NMA service, 535 although it seems not an unlikely scenario. Over time NAAs and NMAs 536 would come and go. One NMA would succeed another, and there might be 537 many NMAs serving the same ARKs simultaneously (e.g., as mirrors or 538 as competitors). There might also be asymmetric but coordinated NMAs 539 as in the library-publisher example above. 541 4.1. Looking Up NMAHs in a Globally Accessible File 543 This subsection describes a way to look up NMAHs using a simple text 544 file. For efficient access the file may be stored in a local 545 filesystem, but it needs to be reloaded periodically to incorporate 546 updates. It is not expected that the size of the file or frequency 547 of update should impose an undue maintenance or searching burden any 548 time soon, for even primitive linear search of a file with ten- 549 thousand NAAs is a subsecond operation on modern server machines. 550 The proposed file strategy is similar to the /etc/hosts file strategy 551 that supported Internet host address lookup for a period of years 552 before the advent of the Domain Name System [DNS]. 554 A copy of the current file (at the time of writing) appears in an 555 appendix and is available on the web. 
A minimal version of the file 556 appears below. Comment lines (lines that begin with `#') explain the 557 format and give the file's modification time, reloading address, and 558 NAA registration instructions. There is even a Perl script that 559 processes the file embedded in the file's comments. Because this is 560 still a proposed file, none of the values in it are real. 562 # 563 # Name Assigning Authority / Name Mapping Authority Lookup Table 564 # Last change: 22 February 2001 565 # Reload from: http://ark.nlm.nih.gov/etc/natab 566 # Mirrored at: http://www.ckm.ucsf.edu/people/jak/home/etc/natab 567 # http://....../etc/natab 568 # To register: mailto:jak@ckm.ucsf.edu?Subject=naareg 569 # Process with: Perl script at end of this file (optional) 570 # 571 # Each NAA appears at the beginning of a line with the NAA Number 572 # first, a colon, and an ARK or URL to a statement of naming policy 573 # (see http://ark.nlm.nih.gov/naapolicyeg.html for an example). 574 # All the NMA hostports that service an NAA are listed, one per 575 # line, indented, after the corresponding NAA line. 576 # 577 # US Library of Congress 578 12025: http://www.loc.gov/xxx/naapolicy.html 579 foobar.zaf.org 580 sneezy.dopey.com 581 # 582 # US National Library of Medicine 583 12026: http://www.nlm.nih.gov/xxx/naapolicy.html 584 lhc.nlm.nih.gov:8080 585 foobar.zaf.org 586 sneezy.dopey.com 587 # 588 # US National Agriculture Library 589 12027: http://www.nal.gov/xxx/naapolicy.html 590 foobar.zaf.gov:80 591 # 592 #--- end of data --- 593 # The enclosed Perl script takes an NAA as argument and outputs 594 # the NMAs in this file listed under any matching NAA. 595 # 596 # my $naa = shift; 597 # while (<>) { 598 # next if (! /^$naa:/); 599 # while (<>) { 600 # last if (! /^[#\s]./); 601 # print "$1\n" if (/^\s+(\S+)/); 602 # } 603 # } 604 # end of file 606 4.2. 
Looking Up NMAHs Distributed via DNS 608 This subsection introduces a method for looking up NMAHs that is 609 based on the method for discovering URN resolvers described in 610 [NAPTR]. It relies on querying the DNS system already installed in 611 the background infrastructure of most networked computers. A query 612 is submitted to DNS asking for a list of resolvers that match a given 613 NAAN. DNS distributes the query to the particular DNS servers that 614 can best provide the answer, unless the answer can be found more 615 quickly in a local DNS cache as a side-effect of a recent query. 616 Responses come back inside Name Authority Pointer (NAPTR) records. 617 The normal result is one or more candidate NMAHs. 619 In its full generality the [NAPTR] algorithm ambitiously accommodates 620 a complex set of preferences, orderings, protocols, mapping services, 621 regular expression rewriting rules, and DNS record types. This 622 subsection proposes a drastic simplification of it for the special 623 case of ARK mapping authority discovery. The simplified algorithm is 624 called Maptr. It uses only one DNS record type (NAPTR) and restricts 625 most of its field values to constants. The following hypothetical 626 excerpt from a DNS data file for the NAAN known as 12026 shows three 627 example NAPTR records ready to use with the Maptr algorithm. 629 12026.ark.arpa. 630 ;; US National Library of Medicine 631 ;; order pref flags service regexp replacement 632 IN NAPTR 0 0 "h" "ark" "" lhc.nlm.nih.gov:8080 633 IN NAPTR 0 0 "h" "ark" "" foobar.zaf.org 634 IN NAPTR 0 0 "h" "ark" "" sneezy.dopey.com 636 All the fields are held constant for Maptr except for the "flags" and 637 "replacement" fields. The "service" field contains the constant 638 value "ark" so that NAPTR records participating in the Maptr 639 algorithm will not be confused with other NAPTR records.
The "order" 640 and "pref" fields are held to 0 (zero) and otherwise ignored for now; 641 the algorithm may evolve to use these fields for ranking decisions 642 when usage patterns and local administrative needs are better 643 understood. 645 When a Maptr query returns a record with a flags field of "h" (for 646 hostport, a Maptr extension to the NAPTR flags), the replacement 647 field contains the NMAH (hostport) of an ARK service provider. When 648 a query returns a record with a flags field of "" (the empty string), 649 the client needs to submit a new query containing the domain name 650 found in the replacement field. This second sort of record exploits 651 the distributed nature of DNS by redirecting the query to another 652 domain name. It looks like this. 654 12345.ark.arpa. 655 ;; Digital Library Consortium 656 ;; order pref flags service regexp replacement 657 IN NAPTR 0 0 "" "ark" "" dlc.spct.org. 659 Here is the Maptr algorithm for ARK mapping authority discovery. In 660 it replace <NAAN> with the NAAN from the ARK for which an NMAH is 661 sought. 663 (1) Initialize the DNS query: type=NAPTR, 664 query=<NAAN>.ark.arpa. 666 (2) Submit the query to DNS and retrieve (NAPTR) records, 667 discarding any record that does not have "ark" for the service 668 field. 670 (3) All remaining records with a flags field of "h" contain 671 candidate NMAHs in their replacement fields. Set them aside, if 672 any. 674 (4) Any record with an empty flags field ("") has a replacement 675 field containing a new domain name to which a subsequent query 676 should be redirected. For each such record, set 677 query=<replacement> then go to step (2). When all such records 678 have been recursively exhausted, go to step (5). 680 (5) All redirected queries have been resolved and a set of 681 candidate NMAHs has been accumulated from step (3). If there 682 are zero NMAHs, exit -- no mapping authority was found. If 683 there are one or more NMAHs, choose one using any criteria you 684 wish, then exit.
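The five steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: real DNS access is replaced by a caller-supplied lookup function, the Naptr tuple and the names maptr, SAMPLE_ZONE, and sample_lookup are hypothetical, and the records listed under dlc.spct.org. are invented for the example. The guard against revisiting a domain name is an addition that the algorithm text does not mention.

```python
from typing import Callable, List, NamedTuple


class Naptr(NamedTuple):
    """Only the NAPTR fields that Maptr inspects."""
    flags: str
    service: str
    replacement: str


def maptr(naan: str, lookup: Callable[[str], List[Naptr]]) -> List[str]:
    """Return all candidate NMAHs for a NAAN; an empty list means none found."""
    pending = [f"{naan}.ark.arpa."]          # step (1): initial query
    seen = set()                             # assumption: avoid redirect loops
    candidates: List[str] = []
    while pending:
        query = pending.pop()
        if query in seen:
            continue
        seen.add(query)
        for record in lookup(query):         # step (2): submit the query
            if record.service != "ark":
                continue                     # discard non-"ark" records
            if record.flags == "h":          # step (3): candidate NMAH
                if record.replacement not in candidates:
                    candidates.append(record.replacement)
            elif record.flags == "":         # step (4): redirected query
                pending.append(record.replacement)
    return candidates                        # step (5): caller chooses one


# Stand-in for DNS, built from the two excerpts above.
SAMPLE_ZONE = {
    "12026.ark.arpa.": [
        Naptr("h", "ark", "lhc.nlm.nih.gov:8080"),
        Naptr("h", "ark", "foobar.zaf.org"),
        Naptr("h", "ark", "sneezy.dopey.com"),
    ],
    "12345.ark.arpa.": [Naptr("", "ark", "dlc.spct.org.")],
    # Hypothetical records, invented here to show a redirect being followed.
    "dlc.spct.org.": [
        Naptr("h", "ark", "nma.example.org:8080"),
        Naptr("h", "http", "ignored.example.org"),
    ],
}


def sample_lookup(query: str) -> List[Naptr]:
    return SAMPLE_ZONE.get(query, [])
```

Calling maptr("12345", sample_lookup) follows the redirect to dlc.spct.org. and yields the single hypothetical hostport listed there; how a client then chooses among several candidates is left open, as in step (5).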
686 The global database thus distributed via DNS and the Maptr algorithm 687 can easily be seen to mirror the contents of the Name Authority Table 688 file described in the previous section. 690 5. Generic ARK Service Definition 692 An ARK request's output is delivered information; examples include 693 the object itself, a policy declaration (e.g., a promise of support), 694 a descriptive metadata record, or an error message. ARK services 695 must be couched in high-level, protocol-independent terms if 696 persistence is to outlive today's networking infrastructural 697 assumptions. The high-level ARK service definitions listed below are 698 followed in the next section by a concrete method (one of many 699 possible methods) for delivering these services with today's 700 technology. 702 5.1. Generic ARK Access Service (access, location) 704 Returns (a copy of) the object or a redirect to the same, although a 705 sensible object proxy may be substituted. Examples of sensible 706 substitutes include, 708 - a table of contents instead of a large complex document, 709 - a home page instead of an entire web site hierarchy, 710 - a rights clearance challenge before accessing protected data, 711 - directions for access to an offline object (e.g., a book), 712 - a description of an intangible object (a disease, an event), or 713 - an applet acting as "player" for a large multimedia object. 715 May also return a discriminated list of alternate object locators. 716 If access is denied, returns an explanation of the object's current 717 (perhaps permanent) inaccessibility. 719 5.2. Generic Policy Service (permanence, naming, etc.) 721 Returns declarations of policy and support commitments for given 722 ARKs. Declarations are returned in either a structured metadata 723 format or a human readable text format; sometimes one format may 724 serve both purposes. 
Policy subareas may be addressed in separate 725 requests, but the following areas should be covered: object 726 permanence, object naming, object fragment addressing, and 727 operational service support. 729 The permanence declaration for an object is a rating defined with 730 respect to an identified permanence provider (guarantor), and may 731 include the following aspects. One permanence rating framework is 732 given in [NLMPerm]. 734 (a) "object availability" -- whether and how access to the 735 object is supported (e.g., online 24x7, or offline only), 737 (b) "identifier validity" -- under what conditions the 738 identifier will be or has been re-assigned, 740 (c) "content invariance" -- under what conditions the content of 741 the object is subject to change, and 743 (d) "change history" -- documentation, whether abbreviated or 744 detailed, of any or all corrections, migrations, revisions, etc. 746 Naming policy for an object includes an historical description of the 747 NAA's (and its successor NAA's) policies regarding differentiation of 748 objects. It may include the following aspects. 750 (e) "similarity" -- (or "unity") the limit, defined by the NAA, 751 to the level of dissimilarity beyond which two similar objects 752 warrant separate identifiers but before which they share one 753 single identifier, and 755 (f) "granularity" -- the limit, defined by the NAA, to the level 756 of object subdivision beyond which sub-objects do not warrant 757 separately assigned identifiers but before which sub-objects are 758 assigned separate identifiers. 760 Addressing policy for an object includes a description of how, during 761 access, object components (e.g., paragraphs, sections) or views 762 (e.g., image conversions) may or may not be "addressed", in other 763 words, how the NMA permits arguments or parameters to modify the 764 object delivered as the result of an ARK request.
If supported, 765 these sorts of operations would provide things like byte-ranged 766 fragment delivery and open-ended format conversions, or any set of 767 possible transformations that would be too numerous to list or to 768 identify with separately assigned ARKs. 770 Operational service support policy includes a description of general 771 operational aspects of the NMA service, such as after-hours staffing 772 and trouble reporting procedures. 774 5.3. Generic Description Service 776 Returns a description of the object. Descriptions are returned in 777 either a structured metadata format or a human readable text format; 778 sometimes one format may serve both purposes. A description must at 779 a minimum answer the who, what, when, and where questions concerning 780 an expression of the object. Standalone descriptions should be 781 accompanied by the modification date and source of the description 782 itself. May also return discriminated lists of ARKs that are related 783 to the given ARK. 785 6. Overview of the HTTP Key Mapping Protocol (HKMP) 787 The HTTP Key Mapping Protocol (HKMP) is a way of taking a key (a kind 788 of identifier) and asking such questions as, what information does 789 this identify and how permanent is it? [HKMP] is in fact one 790 specific method under development for delivering ARK services. The 791 protocol runs over HTTP to exploit the web browser's current pre- 792 eminence as user interface to the Internet. HKMP is designed so that 793 a person can enter ARK requests directly into the location field of 794 current browser interfaces. Because it runs over HTTP, HKMP can be 795 simulated and tested within keyboard-based [TELNET] sessions. 797 The asker (a person or client program) starts with an identifier, 798 such as an ARK or a URL. The identifier reveals to the asker (or 799 allows the asker to infer) the Internet host name and port number of 800 a server system that responds to questions. 
Here, this is just the 801 NMAH that is obtained by inspection and possibly lookup based on the 802 ARK's NAAN. The asker then sets up an HTTP session with the server 803 system, sends a question via an HKMP request (contained within an 804 HTTP request), receives an answer via an HKMP response (contained 805 within an HTTP response), and closes the session. That concludes the 806 connected portion of the protocol. 808 An HKMP request is a string of characters beginning with a `?' 809 (question mark) that is appended to the identifier string. The 810 resulting string is sent as an argument to HTTP's GET command. 811 Request strings too long for GET may be sent using HTTP's POST 812 command. The three most common requests correspond to three 813 degenerate special cases that keep the user's learning and typing 814 burden low. First, a simple key with no request at all is the same 815 as an ordinary access request. Thus a plain ARK entered into a 816 browser's location field behaves much like a plain URL, and returns 817 access to the primary identified object, for instance, an HTML 818 document. 820 The second special case is a minimal ARK description request string 821 consisting of just "?". For example, entering the string, 823 ark.nlm.nih.gov/12025/psbbantu? 825 into the browser's location field directly precipitates a request for 826 a metadata record describing the object identified by 827 ark:/12025/psbbantu. The browser, unaware of HKMP, prepares and 828 sends an HTTP GET request in the same manner as for a URL. HKMP is 829 designed so that the response (indicated by the returned HTTP content 830 type) is normally displayed, whether the output is structured for 831 machine processing (text/plain) or formatted for human consumption 832 (text/html). 834 In the following example HKMP session, each line has been annotated 835 to include a line number and whether it was the client or server that 836 sent it. 
Without going into much depth, the session has three pieces 837 separated from each other by blank lines: the client's piece (lines 838 1-3), the server's HTTP/HKMP response headers (4-7), and the body of 839 the server's response (8-13). The first and last lines (1 and 13) 840 correspond to the client's steps to start the TCP session and the 841 server's steps to end it, respectively. 843 1 C: ... [opens session] 844 C: GET http://ark.nlm.nih.gov/12025/psbbantu? HTTP/1.1 845 C: 846 S: HTTP/1.1 200 OK 847 5 S: Content-Type: text/plain 848 S: HKMP-Status: 0.1 200 OK 849 S: 850 S: erc: 851 S: who: Lederberg, Joshua 852 10 S: what: Studies of Human Families for Genetic Linkage 853 S: when: 1974 854 S: where: http://profiles.nlm.nih.gov/BB/A/N/T/U/_/bbantu.pdf 855 S: ... [closes session] 857 The first two server response lines (4-5) above are typical of HTTP. 858 The next line (6) is peculiar to HKMP, and indicates the HKMP version 859 and a normal return status. The balance of the response (8-12) is 860 the single metadata record that comprises the ARK description service 861 response. The record is in the format of an Electronic Resource 862 Citation [ERC], which is discussed in more detail in the next 863 section. For now, note that it contains four elements that answer 864 the top priority questions regarding an expression of the object: 865 who played a major role in expressing it, what the expression was 866 called, when it was created, and where the expression may be found. 867 This quartet of elements comes up again and again in ERCs. 869 The third degenerate special case of an ARK request (and no other 870 cases will be described in this document) is the string "??", 871 corresponding to a minimal permanence policy request. It can be seen 872 in use appended to an ARK (on line 2) in the example session that 873 follows. 875 1 C: ... [opens session] 876 C: GET http://ark.nlm.nih.gov/12025/psbbantu??
HTTP/1.1 877 C: 878 S: HTTP/1.1 200 OK 879 5 S: Content-Type: text/plain 880 S: HKMP-Status: 0.1 200 OK 881 S: 882 S: erc: 883 S: who: Lederberg, Joshua 884 10 S: what: Studies of Human Families for Genetic Linkage 885 S: when: 1974 886 S: where: http://profiles.nlm.nih.gov/BB/A/N/T/U/_/bbantu.pdf 887 S: erc-support: 888 S: who: NIH/NLM/LHNCBC 889 15 S: what: Permanent, Unchanging Content 890 S: when: 2001 04 21 891 S: where: http://ark.nlm.nih.gov/yy22948 892 S: ... [closes session] 894 Again, a single metadata record (lines 8-17) is returned, but it 895 consists of two segments. The first segment (8-12) gives the same 896 basic citation information as in the previous example. It is 897 returned in order to establish context for the persistence 898 declaration in the second segment (13-17). 900 Each segment in an ERC tells a different story relating to the 901 object, so although the same four questions (elements) appear in 902 each, the answers depend on the segment's story type. While the 903 first segment tells the story of an expression of the object, the 904 second segment tells the story of the support commitment made to it: 905 who made the commitment, what the nature of the commitment was, when 906 it was made, and where a fuller explanation of the commitment may be 907 found. 909 7. Overview of Electronic Resource Citations (ERCs) 911 An Electronic Resource Citation (or ERC, pronounced e-r-c) [ERC] is a 912 simple, compact, and printable record designed to hold data 913 associated with an information resource. By design, the ERC is a 914 metadata format that balances the needs for expressive power, very 915 simple machine processing, and direct human manipulation. 917 A founding principle of the ERC is that direct human contact with 918 metadata will be a necessary and sufficient condition for the near 919 term rapid development of metadata standards, systems, and services. 
920 Thus the machine-processable ERC format must only minimally strain 921 people's ability to read, understand, change, and transmit ERCs 922 without their relying on intermediation by specialized software 923 tools. The basic ERC needs to be succinct, transparent, and 924 trivially parseable by software. 926 In the current Internet, it is natural to consider seriously using 927 XML as an exchange format because of predictions that it will obviate 928 many ad hoc formats and programs, and unify much of the world's 929 information under one reliable data structuring discipline that is 930 easy to generate, verify, parse, and render. It appears, however, 931 that XML is still only catching on after years of standards work and 932 implementation experience. The reasons for this are unclear, but for 933 now very simple XML interpretation is still out of reach. Another 934 important caution is that XML structures are hard on the eyeballs, 935 taking up an amount of display (and page) space that significantly 936 exceeds that of traditional formats. Until these conflicts with ERC 937 principles are resolved, XML is not a first choice for representing 938 ERCs. Borrowing instead from the data structuring format that 939 underlies the successful spread of email and web services, the first 940 ERC format is based on email and HTTP headers (RFC822) [EMHDRS]. 941 There is a naturalness to its label-colon-value format (seen in the 942 previous section) that barely needs explanation to a person beginning 943 to enter ERC metadata. 945 Besides simplicity of ERC system implementation and data entry 946 mechanics, ERC semantics (what the record and its constituent parts 947 mean) must also be easy to explain. ERC semantics are based on a 948 reformulation and extension of the Dublin Core [DCORE] hypothesis, 949 which suggests that the fifteen Dublin Core metadata elements have a 950 key role to play in cross-domain resource description.
The ERC 951 design recognizes that the Dublin Core's primary contribution is the 952 international, interdisciplinary consensus that identified fifteen 953 semantic buckets (element categories), regardless of how they are 954 labeled. The ERC then adds a definition for a record and some 955 minimal compliance rules. In pursuing the limits of simplicity, the 956 ERC design combines and relabels some Dublin Core buckets to isolate 957 a tiny kernel (subset) of four elements for basic cross-domain 958 resource description. 960 For the cross-domain kernel, the ERC uses the four basic elements -- 961 who, what, when, and where -- to pretend that every object in the 962 universe can have a uniform minimal description. Each has a name or 963 other identifier, a location, some responsible person or party, and a 964 date. It doesn't matter what type of object it is, or whether one 965 plans to read it, interact with it, smoke it, wear it, or navigate 966 it. Of course, this approach is flawed because uniformity of 967 description for some object types requires more semantic contortion 968 and sacrifice than for others. That is why at the beginning of this 969 document, the ARK was said to be suited to objects that accommodate 970 reasonably regular electronic description. 972 While insisting on uniformity at the most basic level provides 973 powerful cross-domain leverage, the semantic sacrifice is great for 974 many applications. So the ERC also permits a semantically rich and 975 nuanced description to co-exist in a record along with a basic 976 description. In that way both sophisticated and naive recipients of 977 the record can extract the level of meaning from it that best suits 978 their needs and abilities. Key to unlocking the richer description 979 is a controlled vocabulary of ERC record types (not explained in this 980 document) that permit knowledgeable recipients to apply defined sets 981 of additional assumptions to the record. 983 7.1. 
ERC Syntax 985 An ERC record is a sequence of metadata elements ending in a blank 986 line. An element consists of a label, a colon, and an optional 987 value. Here is an example of a record with five elements. 989 erc: 990 who: Gibbon, Edward 991 what: The Decline and Fall of the Roman Empire 992 when: 1781 993 where: http://www.ccel.org/g/gibbon/decline/ 995 A long value may be folded (continued) onto the next line by 996 inserting a newline and indenting the next line. A value can thus be 997 folded across multiple lines. Here are two example elements, each 998 folded across four lines. 1000 who/created: University of California, San Francisco, AIDS 1001 Program at San Francisco General Hospital | University 1002 of California, San Francisco, Center for AIDS Prevention 1003 Studies 1004 what/Topic: 1005 Heart Attack | Heart Failure 1006 | Heart 1007 Diseases 1009 An element value folded across several lines is treated as if the 1010 lines were joined together on one long line. For example, the second 1011 element from the previous example is considered equivalent to 1013 what/Topic: Heart Attack | Heart Failure | Heart Diseases 1015 An element value may contain multiple values, each one separated from 1016 the next by a `|' (pipe) character. The element from the previous 1017 example contains three values. 1019 For annotation purposes, any line beginning with a `#' (hash) 1020 character is treated as if it were not present; this is a "comment" 1021 line (a feature not available in email or HTTP headers). For 1022 example, the following element is spread across four lines and 1023 contains two values: 1025 what/Topic: 1026 Heart Attack 1027 # | Heart Failure -- hold off until next review cycle 1028 | Heart Diseases 1030 7.2. ERC Stories 1032 An ERC record is organized into one or more distinct segments, where 1033 each segment tells a story about a different aspect of the 1034 information resource.
A segment boundary occurs whenever a segment 1035 label (an element beginning with "erc") is encountered. The basic 1036 label "erc:" introduces the story of an object's expression (e.g., 1037 its publication, installation, or performance). The label "erc- 1038 about:" introduces the story of an object's content (what it is 1039 about) and "erc-support:" introduces the story of a support 1040 commitment made to it. A story segment that concerns the ERC itself 1041 is introduced by the label "erc-from:". It is an important segment 1042 that tells the story of the ERC's provenance. Elements beginning 1043 with "erc" are reserved for segment labels and their associated story 1044 types. From an earlier example, here is an ERC with two segments. 1046 erc: 1047 who: Lederberg, Joshua 1048 what: Studies of Human Families for Genetic Linkage 1049 when: 1974 1050 where: http://profiles.nlm.nih.gov/BB/A/N/T/U/_/bbantu.pdf 1051 erc-support: 1052 who: NIH/NLM/LHNCBC 1053 what: Permanent, Unchanging Content 1054 # Note to ops staff: date needs verification. 1055 when: 2001 04 21 1056 where: http://ark.nlm.nih.gov/yy22948 1058 Segment stories are told according to journalistic tradition. While 1059 any number of pertinent elements may appear in a segment, priority is 1060 placed on answering the questions who, what, when, and where at the 1061 beginning of each segment so that readers can make the most important 1062 selection or rejection decisions as soon as possible. To make things 1063 simple, the listed ordering of the questions is maintained in each 1064 segment (as it happens most people who have been exposed to this 1065 story telling technique are already familiar with the above 1066 ordering). 1068 The four questions are answered by using corresponding element 1069 labels. The four element labels can be re-used in each story 1070 segment, but their meaning changes depending on the segment (the 1071 story type) in which they appear. 
In the example above, "who" is 1072 first used to name a document's author and subsequently used to name 1073 the permanence guarantor (provider). Similarly, "when" first lists 1074 the date of object creation and in the next segment lists the date of 1075 a commitment decision. Four labels appearing across three segments 1076 effectively map to twelve semantically distinct elements. Distinct 1077 element meanings are mapped to Dublin Core elements in a later 1078 section. 1080 7.3. The ERC Anchoring Story 1082 Each ERC contains an anchoring story. It is usually found in the 1083 first segment labeled "erc:" and it concerns an "anchoring" 1084 expression of the object. An "anchoring" expression is the one that 1085 a provider deemed the most suitable basic referent given the audience 1086 and application for which it produced the ERC. If it sounds like the 1087 provider has great latitude in choosing its anchoring expression, it 1088 is because it does. A typical anchoring expression in an ERC for a 1089 born-digital document would be described in the story of the 1090 document's release on a web site. 1092 An anchoring story need not be the central descriptive goal of an ERC 1093 record. For example, a museum provider may create an ERC for a 1094 digitized photograph of a painting but choose to anchor it in the 1095 story of the original painting instead of the story of the electronic 1096 likeness; although the ERC may through other segments prove to be 1097 centrally concerned with describing the electronic likeness, the 1098 provider may have chosen this particular anchoring story in order to 1099 make the ERC visible in a way that is most natural to patrons (who 1100 would find the Mona Lisa under da Vinci sooner than they would find 1101 it under the name of the person who snapped the photograph or scanned 1102 the image). 
In another example, a provider that creates an ERC for a 1103 dramatic play as an abstract work has the task of describing a piece 1104 of intangible intellectual property. To anchor this abstract object 1105 in the concrete world, if only through a derivative expression, it 1106 makes sense for the provider to choose one printed edition of the 1107 play as the anchoring object expression (to describe in the anchoring 1108 story) of the ERC. 1110 The anchoring story has special rules designed to keep ERC processing 1111 simple and predictable. Each of the four basic elements (who, what, 1112 when, and where) must be present, unless a best effort to supply it 1113 fails. In the event of failure, the element still appears but a 1114 special value (described later) is used to explain the missing value. 1115 While the requirement that each of the four elements be present only 1116 applies to the anchoring story segment, as usual these elements 1117 appear at the beginning of the segment and may only be used in the 1118 prescribed order. A minimal ERC would normally consist of just an 1119 anchoring story and the element quartet, as illustrated in the next 1120 example. 1122 erc: 1123 who: National Research Council 1124 what: The Digital Dilemma 1125 when: 2000 1126 where: http://books.nap.edu/html/digital%5Fdilemma 1128 A minimal ERC can be abbreviated so that it resembles a traditional 1129 compact bibliographic citation that is nonetheless completely machine 1130 processable. The required elements and ordering make it possible to 1131 eliminate the element labels, as shown here. 1133 erc: National Research Council | The Digital Dilemma | 2000 1134 | http://books.nap.edu/html/digital%5Fdilemma 1136 Although machine readable, this abbreviated ERC format is neither 1137 required nor permitted at this time in HKMP responses. 1139 7.4.
ERC Elements 1141 As mentioned, the four basic ERC elements (who, what, when, and 1142 where) take on different specific meanings depending on the story 1143 segment in which they are used. By appearing in each segment, albeit 1144 in different guises, the four elements serve as a valuable mnemonic 1145 device -- a kind of checklist -- for constructing minimal story 1146 segments from scratch. Again, it is only in the anchoring segment 1147 that all four elements are mandatory. 1149 An ERC segment allows an unlimited number of elements with the same 1150 label. Multiple such elements are treated as if they were combined 1151 into one element with multiple values. For example, the three 1152 elements, 1154 who: Bullock, T.H. 1155 who: Achimowicz, J.Z. | Duckrow, R.B. | Spencer, S.S. 1156 who: Iragui-Madoz, V.J. 1158 are treated as if all the values were in one element with five 1159 values: 1161 who: Bullock, T.H. | Achimowicz, J.Z. | Duckrow, R.B. 1162 | Spencer, S.S. | Iragui-Madoz, V.J. 1164 Required ordering is preserved as long as all who's precede all 1165 what's, all what's precede all when's, and all when's precede all 1166 where's. 1168 Here are some mappings between ERC elements and Dublin Core [DCORE] 1169 elements. 1171 Segment ERC Element Equivalent Dublin Core Element 1172 --------- ----------- ------------------------------ 1173 erc who Creator/Contributor/Publisher 1174 erc what Title 1175 erc when Date 1176 erc where Identifier 1177 erc-about who 1178 erc-about what Subject 1179 erc-about when Coverage (temporal) 1180 erc-about where Coverage (spatial) 1182 The basic element labels may also be qualified to add nuances to the 1183 semantic categories that they identify. Elements are qualified by 1184 appending a `/' (slash) and a qualifier term. Often qualifier terms 1185 appear as the past tense form of a verb because it makes re-using 1186 qualifiers among elements easier.
   Such is the case for three out of the four qualifiers appearing in
   the seven elements below.

      who/created:    ...
      who/published:  ...
      who/modified:   ...
      when/valid:     ...
      when/modified:  ...
      when/published: ...
      when/created:   ...

   Using past tense verbs for qualifiers also reminds providers and
   recipients that element values contain transient assertions that
   may have been true once, but that tend to become less true over
   time.  Recipients that don't understand the meaning of a qualifier
   can fall back onto the semantic category (bucket) designated by the
   unqualified element label.  Inevitably recipients (people and
   software) will have diverse abilities in understanding elements and
   qualifiers.

   Any number of other elements and qualifiers may be used in
   conjunction with the quartet of basic segment questions.  The only
   semantic requirement is that they pertain to the segment's story.
   Also, it is only the four basic elements that change meaning
   depending on their segment context.  All other elements have
   meaning independent of the segment in which they appear.  If an
   element label stripped of its qualifier is still not recognized by
   the recipient, a second fall back position is to ignore it and rely
   on the four basic elements.

   Elements may be either Canonical, Provisional, or Local.  Canonical
   elements are officially recognized via a registry as part of the
   metadata vernacular.  All elements, qualifiers, and segment labels
   used in this document up until now belong to that vernacular.
   Provisional elements are also officially recognized via the
   registry, but have only been proposed for inclusion in the
   vernacular.  To be promoted to the vernacular, a provisional
   element passes through a vetting process during which its
   documentation must be in order and its community acceptance
   demonstrated.
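
   As a non-normative illustration, the multi-value merging, required
   ordering, and qualifier fall-back rules described earlier in this
   section can be sketched in Python.  The function names
   (merge_elements, in_required_order, resolve) and the notion of a
   recipient's set of known labels are assumptions made for this
   example and are not part of the ERC definition.  The sketch also
   assumes that a qualified label is ordered by its unqualified base
   label, which the text above does not state explicitly.

```python
# Hypothetical helper names; a sketch of the rules in this section,
# not a reference implementation of ERC processing.

BASIC = ("who", "what", "when", "where")


def split_label(label):
    """Split 'when/modified' into ('when', 'modified').

    An unqualified label such as 'who' yields ('who', None).
    """
    base, _, qualifier = label.partition("/")
    return base, (qualifier or None)


def merge_elements(pairs):
    """Combine elements that share a label into one multi-valued element.

    `pairs` is a list of (label, value) tuples in record order; values
    within one element are separated by '|'.
    """
    merged = {}
    for label, value in pairs:
        values = [v.strip() for v in value.split("|") if v.strip()]
        merged.setdefault(label, []).extend(values)
    return merged


def in_required_order(labels):
    """Check that all who's precede all what's, and so on.

    Assumption: qualified labels are ranked by their base label, and
    labels outside the basic quartet do not affect the ordering.
    """
    ranks = []
    for label in labels:
        base, _ = split_label(label)
        if base in BASIC:
            ranks.append(BASIC.index(base))
    return ranks == sorted(ranks)


def resolve(label, known_labels):
    """Apply the two fall-back positions for a label.

    Returns the label itself if recognized, else its unqualified base
    label if that is recognized, else None (the element is ignored).
    """
    if label in known_labels:
        return label
    base, _ = split_label(label)
    if base in known_labels:
        return base
    return None
```

   For instance, a recipient that knows only the four basic labels
   would treat "when/Reviewed" as a plain "when" element and would
   ignore an element labeled "IDcode".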
   Local elements are any elements not officially recognized in the
   registry.  The registry [REG] is a work in progress.

   Local elements can be immediately distinguished from Canonical or
   Provisional elements because all terms that begin with an upper
   case letter are reserved for spontaneous local use.  No term
   beginning with an upper case letter will ever be assigned Canonical
   or Provisional status, so it should be safe to use such terms for
   local purposes.  Any recipient of external ERCs containing such
   terms will understand them to be part of the originating provider's
   local metadata dialect.  Here's an example ERC with three segments,
   one local element, and two local qualifiers.  The segment
   boundaries have been emphasized by comment lines (which, as before,
   are ignored by processors).

      erc:
      who:    Bullock, TH | Achimowicz, JZ | Duckrow, RB
              | Spencer, SS | Iragui-Madoz, VJ
      what:   Bicoherence of intracranial EEG in sleep,
              wakefulness and seizures
      when:   1997 12 00
      where:  http://cogprints.soton.ac.uk/%{
                  documents/disk0/00/00/01/22/index.html %}
      in:     EEG Clin Neurophysiol | v103, i6, p661-678 | 1997 12 00
      IDcode: cog00000122
      # ---- new segment ----
      erc-about:
      what/Subcategory: Bispectrum | Nonlinearity | Epilepsy
              | Cooperativity | Subdural | Hippocampus | Higher moment
      # ---- new segment ----
      erc-from:
      who:    NIH/NLM/NCBI
      what:   pm9546494
      when/Reviewed: 1998 04 18 021600
      where:  http://ark.nlm.nih.gov/12025/pm9546494?

   The local element "IDcode" immediately precedes the "erc-about"
   segment, which itself contains an element with the local qualifier
   "Subcategory".  The second to last element also carries the local
   qualifier "Reviewed".  Finally, what might be a provisional element
   "in" appears near the end of the first segment.
   It might have been proposed as a way to complete a citation for an
   object originally appearing inside another object (such as an
   article appearing in a journal or an encyclopedia).

7.5.  ERC Element Values

   ERC element values tend to be straightforward strings.  If the
   provider intends something special for an element, it will so
   indicate with markers at the beginning of its value string.  The
   markers are designed to be uncommon enough that they would not
   likely occur in normal data except by deliberate intent.  Markers
   can only occur near the beginning of a string, and once any octet
   of non-marker data has been encountered, no further marker
   processing is done for the element value.  In the absence of
   markers the string is considered pure data; this has been the case
   with all the examples seen thus far.  The fullest form of an
   element value with all three optional markers in place looks like
   this.

      VALUE = [markup_flags] (:ccode) , DATA

   In processing, the first non-whitespace character of an ERC element
   value is examined.  An initial `[' is reserved to introduce a
   bracketed set of markup flags (not described in this document) that
   ends with `]'.  If ERC data is machine-generated, each value string
   may be preceded by "[]" to prevent any of its data from being
   mistaken for markup flags.  Once past the optional markup, the
   remaining value may optionally begin with a controlled code.  A
   controlled code always has the form "(:ccode)", for example,

      who:  (:unkn) Anonymous
      what: (:791) Bee Stings

   Any string after such a code is taken to be an uncontrolled (e.g.,
   natural language) equivalent.  The code "unkn" indicates a
   conventional explanation for a missing value (stating that the
   value is unknown).
   The remainder of the string makes an equivalent statement in a form
   that the provider deemed most suitable to its (probably human)
   audience.  The code "791" could be a fixed numeric topic identifier
   within an unspecified topic vocabulary.  Any code may be ignored by
   those that do not understand it.

   There are several codes to explain different ways in which a
   required element's value may go missing.

      (:unkn)   unknown (e.g., Anonymous)
      (:unav)   value unavailable indefinitely
      (:none)   never had a value, never will
      (:unac)   temporarily inaccessible
      (:unap)   not applicable (makes no sense)
      (:null)   explicitly empty
      (:igno)   element here only to satisfy syntax
      (:unas)   value unassigned (e.g., untitled painting)
      (:elwr)   value appears elsewhere in record
      (:remo)   withheld, suppressed intentionally

   Once past an optional controlled code, the remaining string value
   is subjected to one final test.  If the next non-whitespace
   character is a `,' (comma), it indicates that the string value is
   "sort-friendly".  This means that the value is (a) laid out with an
   inverted word order useful for sorting items having comparably laid
   out element values (items might be the containing ERC records) and
   (b) that the value may contain other commas that indicate inversion
   points should it become necessary to recover the value in natural
   word order.  Typically, this feature is used to express
   Western-style personal names in family-name-given-name order.  It
   can also be used wherever natural word order might make sorting
   tricky, such as when data contains titles or corporate names.  Here
   are some example elements.

      who:   , van Gogh, Vincent
      who:,Howell, III, PhD, 1922-1987, Thurston
      who:, Acme Rocket Factory, Inc., The
      who:, Mao Tse Tung
      who:, McCartney, Paul, Sir,
      what:, Health and Human Services, United States Government
             Department of, The,

   There are rules to use in recovering a copy of the value in natural
   word order, if desired.  The above example strings have the
   following natural word order values, respectively.

      Vincent van Gogh
      Thurston Howell, III, PhD, 1922-1987
      The Acme Rocket Factory, Inc.
      Mao Tse Tung
      Sir Paul McCartney
      The United States Government Department of Health and Human
         Services

7.6.  ERC Element Encoding and Dates

   Some characters that need to appear in ERC element values might
   conflict with special characters used for structuring ERCs, so
   there needs to be a way to include them as literal characters that
   are protected from special interpretation.  This is accomplished
   through an encoding mechanism that resembles the %-encoding
   familiar to [URI] handlers.

   The ERC encoding mechanism also uses `%', but instead of taking two
   following hexadecimal digits, it takes one non-alphanumeric
   character or two alphabetic characters that cannot be mistaken for
   hex digits.  It is designed not to be confused with normal
   web-style %-encoding.  In particular it can be decoded without
   risking unintended decoding of normal %-encoded data (which would
   introduce errors).  Here are the one-character (non-alphanumeric)
   ERC encoding extensions.

      ERC   Purpose
      ---   ------------------------------------------------
      %!    decodes to the element separator `|'
      %%    decodes to a percent sign `%'
      %.    decodes to a comma `,'
      %_    a non-character used as syntax shim
      %{    a non-character that begins a ws-squeezed block
      %}    a non-character that ends a ws-squeezed block

   One particularly useful construct in ERC element values is the pair
   of special encoding markers ("%{" and "%}") that indicates a
   "whitespace-squeezed" block.  Whatever string of characters they
   enclose will be treated as if none of the contained whitespace
   (SPACEs, TABs, Newlines) were present.  This comes in handy for
   writing long, multi-part URLs in a readable way.  For example, the
   value in

      where: http://foo.bar.org/node%{
                   ? db = foo
                   & start = 1
                   & end = 5
                   & buf = 2
                   & query = foo + bar + zaf
             %}

   is decoded into an equivalent element, but with a correct and
   intact URL:

   where:
   http://foo.bar.org/node?db=foo&start=1&end=5&buf=2&query=foo+bar+zaf

   In a parting word about ERC element values, a commonly recurring
   value type is a date, possibly followed by a time.  ERC dates take
   on one of the following forms:

      1999                (four-digit year)
      2000 12 29          (year, month, day)
      2000 12 29 235955   (year, month, day, hour, minute, second)

   In dates, all internal whitespace is squeezed out to achieve a
   normalized form suitable for lexical comparison and sorting.  This
   means that the following dates

      2000 12 29 235955      (recommended for readability)
      2000 12 29 23 59 55
      20 001 229 235 9 5 5
      20001229235955         (normalized date and time)

   are all equivalent.  The first form is recommended for readability.
   The last form (shortest and easiest to compute with) is the
   normalized form.  Hyphens and commas are reserved to create date
   ranges and lists, for example,

      1996-2000              (a range of five years)
      1952, 1957, 1969       (a list of three years)
      1952, 1958-1967, 1985  (a mixed list of dates and ranges)
      20001229-20001231      (a range of three days)

7.7.  ERC Stub Records and Internal Support

   The ERC design introduces the concept of a "stub" record, which is
   an incomplete ERC record intended to be supplemented with
   additional elements before being released as a standalone ERC
   record.  Stub ERC records have no minimum required elements.  They
   may be useful in supporting internal procedures using the ERC
   syntax.  Often they rely on the convenience and accuracy of
   automatically supplied elements, even the basic ones.  To be ready
   for external use, however, an ERC stub must be transformed into a
   complete ERC record having the usual required elements.  An ERC
   stub record can be convenient for metadata embedded in a document,
   where elements such as location, modification date, and size --
   which one would not omit from an externalized record -- are omitted
   simply because they are much better supplied by a computation.  A
   separate local administrative procedure, not defined for ERCs in
   general, would effect the promotion of stubs into complete records.

   While the ERC is a general-purpose container for exchange of
   resource descriptions, it does not dictate how records must be
   internally stored, laid out, or assembled by data providers or
   recipients.  Arbitrary internal descriptive frameworks can support
   ERCs simply by mapping (e.g., on demand) local records to the ERC
   container format and making them available for export.  Therefore,
   to support ERCs there is no need for a data provider to convert
   internal data to be stored in an ERC format.  On the other hand,
   any provider (such as one just getting started in the business of
   resource description) may choose to store and manipulate local data
   natively in the ERC format.

8.  Advice to Web Clients

   This section offers some advice to web client software developers.
   It is hard to write about because it tries to anticipate a series
   of events that might lead to native web browser support for ARKs.

   ARKs are envisaged to appear wherever durable object references are
   planned.  Library cataloging records, literature citations, and
   bibliographies are important examples.  In many of these places
   URLs (Uniform Resource Locators) currently stand in, and URNs,
   DOIs, and PURLs have been proposed as alternatives.

   The strings representing ARKs are also envisaged to appear in some
   of the places where URLs currently appear: in hypertext links
   (where they are not normally shown to users) and in rendered text
   (displayed or printed).  Internet search engines, for example, tend
   to include both actionable and manifest links when listing each
   item found.  A normal HTML link for which the URL is not displayed
   looks like this.

      Click Here

   The same link with an ARK instead of a URL:

      Click Here

   Web browsers would in general require a small modification to
   recognize and convert this ARK, via mapping authority discovery, to
   an equivalent URL.

      Click Here

   A simple expedient that works for now without browser modification
   is to use a specified-host ARK (one with an NMAH) but without the
   usual "ark:" prefix (a prefix of "http://" is normally assumed), as
   in,

      Click Here

   An NAA will typically make known the associations it creates by
   publishing them in catalogs, actively advertising them, or simply
   leaving them on web sites for visitors (e.g., users, indexing
   spiders) to stumble across in browsing.

9.  Security Considerations

   The ARK naming scheme poses no direct risk to computers and
   networks.
   Implementors of ARK services need to be aware of security issues
   when querying networks and filesystems for Name Mapping Authority
   services, and the concomitant risks from spoofing and obtaining
   incorrect information.  These risks are no greater for ARK mapping
   authority discovery than for other kinds of service discovery.  For
   example, recipients of ARKs with a specified hostport (NMAH) should
   treat it like a URL and be aware that the identified ARK service
   may no longer be operational.

   Apart from mapping authority discovery, ARK clients and servers
   subject themselves to all the risks that accompany normal operation
   of the protocols (e.g., HTTP, Z39.50) underlying mapping services.
   As specializations of such protocols, an ARK service may limit
   exposure to the usual risks.  Indeed, ARK services may enhance a
   kind of security by helping users identify long-term reliable
   references to information objects.

10.  Authors' Addresses

   John A. Kunze
   Center for Knowledge Management
   University of California, San Francisco
   530 Parnassus Ave, Box 0840
   San Francisco, CA  94143-0840, USA

   Fax:   +1 415-476-4653
   EMail: jak@ckm.ucsf.edu

   R. P. C. Rodgers
   US National Library of Medicine
   8600 Rockville Pike, Bldg. 38A
   Bethesda, MD  20894

   Fax:   +1 301-496-0673
   EMail: rodgers@nlm.nih.gov

11.  References

   [DCORE]    Dublin Core Metadata Initiative, "Dublin Core Metadata
              Element Set, Version 1.1: Reference Description", July
              1999, http://dublincore.org/documents/dces/.

   [DNS]      P.V. Mockapetris, "Domain Names - Concepts and
              Facilities", RFC 1034, November 1987.

   [DOI]      International DOI Foundation, "The Digital Object
              Identifier (DOI) System", February 2001,
              http://dx.doi.org/10.1000/203.

   [EMHDRS]   D. Crocker, "Standard for the format of ARPA Internet
              text messages", RFC 822, August 1982.

   [ERC]      J. Kunze, "Electronic Resource Citations", work in
              progress.

   [HKMP]     J. Kunze, "HTTP Key Mapping Protocol", work in progress.

   [HTTP]     R. Fielding, et al, "Hypertext Transfer Protocol --
              HTTP/1.1", RFC 2616, June 1999.

   [MD5]      R. Rivest, "The MD5 Message-Digest Algorithm", RFC 1321,
              April 1992.

   [NAPTR]    M. Mealling, R. Daniel, "The Naming Authority Pointer
              (NAPTR) DNS Resource Record", RFC 2915, September 2000.

   [NLMPerm]  M. Byrnes, "Defining NLM's Commitment to the Permanence
              of Electronic Information", ARL 212:8-9, October 2000,
              http://www.arl.org/newsltr/212/nlm.html.

   [PURL]     K. Shafer, et al, "Introduction to Persistent Uniform
              Resource Locators", 1996,
              http://purl.oclc.org/OCLC/PURL/INET96.

   [REG]      J. Kunze, "Resource Metadata Vocabulary", work in
              progress.

   [URI]      T. Berners-Lee, et al, "Uniform Resource Identifiers
              (URI): Generic Syntax", RFC 2396, August 1998.

   [URNBIB]   C. Lynch, et al, "Using Existing Bibliographic
              Identifiers as Uniform Resource Names", RFC 2288,
              February 1998.

   [URNSYN]   R. Moats, "URN Syntax", RFC 2141, May 1997.

   [URNNID]   L. Daigle, et al, "URN Namespace Definition Mechanisms",
              RFC 2611, June 1999.

   [TELNET]   J. Postel, J.K. Reynolds, "Telnet Protocol
              Specification", RFC 854, May 1983.

12.  Appendix: An NLM Prototype ARK Service

   The US National Library of Medicine (NLM) has an experimental,
   prototype ARK service under development.  It is being made
   available for purposes of demonstrating various aspects of the ARK
   system, but is subject to temporary or permanent withdrawal
   (without notice) depending upon the circumstances of the small
   research group responsible for making it available.  It is
   described at:

      http://ark.nlm.nih.gov/

   Comments and feedback may be addressed to rodgers@nlm.nih.gov.

13.  Appendix: Current ARK Name Authority Table

   This appendix contains a copy of the Name Authority Table (a file)
   at the time of writing.  It may be loaded into a local filesystem
   (e.g., /etc/natab) for use in mapping NAAs (Name Assigning
   Authorities) to NMAHs (Name Mapping Authority Hostports).  It
   contains Perl code that can be copied into a standalone script that
   processes the table (as a file).  Because this is still a proposed
   file, none of the values in it are real.

   #
   # Name Assigning Authority / Name Mapping Authority Lookup Table
   #   Last change:  22 February 2001
   #   Reload from:  http://ark.nlm.nih.gov/etc/natab
   #   Mirrored at:  http://www.ckm.ucsf.edu/people/jak/home/etc/natab
   #                 http://....../etc/natab
   #   To register:  mailto:jak@ckm.ucsf.edu?Subject=naareg
   #  Process with:  Perl script at end of this file (optional)
   #
   # Each NAA appears at the beginning of a line with the NAA Number
   # first, a colon, and an ARK or URL to a statement of naming policy
   # (see http://ark.nlm.nih.gov/naapolicyeg.html for an example).
   # All the NMA hostports that service an NAA are listed, one per
   # line, indented, after the corresponding NAA line.
   #
   # US Library of Congress
   12025: http://www.loc.gov/xxx/naapolicy.html
       foobar.zaf.org
       sneezy.dopey.com
   #
   # US National Library of Medicine
   12026: http://www.nlm.nih.gov/xxx/naapolicy.html
       lhc.nlm.nih.gov:8080
       foobar.zaf.org
       sneezy.dopey.com
   #
   # US National Agriculture Library
   12027: http://www.nal.gov/xxx/naapolicy.html
       foobar.zaf.gov:80
   #
   #--- end of data ---
   # The enclosed Perl script takes an NAA as argument and outputs
   # the NMAs in this file listed under any matching NAA.
   #
   # my $naa = shift;
   # while (<>) {
   #     next if (! /^$naa:/);
   #     while (<>) {
   #         last if (! /^[#\s]./);
   #         print "$1\n" if (/^\s+(\S+)/);
   #     }
   # }
   # end of file

14.  Copyright Notice

   Copyright (C) The Internet Society (2001).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain
   it or assist in its implementation may be prepared, copied,
   published and distributed, in whole or in part, without restriction
   of any kind, provided that the above copyright notice and this
   paragraph are included on all such copies and derivative works.
   However, this document itself may not be modified in any way, such
   as by removing the copyright notice or references to the Internet
   Society or other Internet organizations, except as needed for the
   purpose of developing Internet standards in which case the
   procedures for copyrights defined in the Internet Standards process
   must be followed, or as required to translate it into languages
   other than English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on
   an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights which may cover technology that may be required to practice
   this standard.  Please address the information to the IETF
   Executive Director.

Expires 8 September 2001

Table of Contents

   Status of this Document
   Abstract
   1.  Introduction
   1.1.  Three Reasons to Use ARKs
   1.2.  Organizing Support for ARKs
   1.3.  A Definition of Identifier
   2.  ARK Anatomy
   2.1.  The Name Mapping Authority Hostport (NMAH)
   2.2.  The Name Assigning Authority Number (NAAN)
   2.3.  The Name Part
   2.4.  Lexical Equivalence
   2.5.  Naming Considerations
   3.  Assigners of ARKs
   4.  Finding a Name Mapping Authority
   4.1.  Looking Up NMAHs in a Globally Accessible File
   4.2.  Looking up NMAHs Distributed via DNS
   5.  Generic ARK Service Definition
   5.1.  Generic ARK Access Service (access, location)
   5.2.  Generic Policy Service (permanence, naming, etc.)
   5.3.  Generic Description Service
   6.  Overview of the HTTP Key Mapping Protocol (HKMP)
   7.  Overview of Electronic Resource Citations (ERCs)
   7.1.  ERC Syntax
   7.2.  ERC Stories
   7.3.  The ERC Anchoring Story
   7.4.  ERC Elements
   7.5.  ERC Element Values
   7.6.  ERC Element Encoding and Dates
   7.7.  ERC Stub Records and Internal Support
   8.  Advice to Web Clients
   9.  Security Considerations
   10.  Authors' Addresses
   11.  References
   12.  Appendix: An NLM Prototype ARK Service
   13.  Appendix: Current ARK Name Authority Table
   14.  Copyright Notice