idnits 2.17.1 

draft-mallery-urn-pdi-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.

     Expected boilerplate is as follows today (2024-04-26) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:
        This Internet-Draft is submitted in full conformance with the provisions
        of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:
        Copyright (c) 2024 IETF Trust and the persons identified as the document
        authors.  All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:
        This document is subject to BCP 78 and the IETF Trust's Legal Provisions
        Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document.  Please review these documents
        carefully, as they describe your rights and restrictions with
        respect to this document.  Code Components extracted from this
        document must include Simplified BSD License text as described in
        Section 4.e of the Trust Legal Provisions and are provided
        without warranty as described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents. 

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories. 

  ** The document is more than 15 pages and seems to lack a Table of Contents.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 1296 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 25 instances of too long lines in the document, the longest
     one being 22 characters in excess of 72.

  ** The abstract seems to contain references ([21], [17], [12]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 223: '...   MUST be encoded according to the ch...'
     RFC 2119 keyword, line 243: '...ument issues, it MUST increment the ve...'
     RFC 2119 keyword, line 260: '...   <DOCUMENT-SERIES> MUST be a two digit ISO 3166 country code [10],...'
     RFC 2119 keyword, line 262: '...DOCUMENT-SERIES> SHOULD add a term to ...'
     RFC 2119 keyword, line 277: '...   MAY utilize existing organizational...'
     (69 more instances...)


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (November 10, 1997) is 9664 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: '5' is defined on line 1200, but no explicit reference
     was found in the text

  == Unused Reference: '13' is defined on line 1234, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 1738 (ref. '1') (Obsoleted by RFC
     4248, RFC 4266)

  -- Possible downref: Non-RFC (?) normative reference: ref. '2'

  ** Obsolete normative reference: RFC 1521 (ref. '3') (Obsoleted by RFC
     2045, RFC 2046, RFC 2047, RFC 2048, RFC 2049)

  -- Possible downref: Non-RFC (?) normative reference: ref. '4'

  -- Possible downref: Non-RFC (?) normative reference: ref. '5'

  -- Possible downref: Non-RFC (?) normative reference: ref. '6'

  ** Downref: Normative reference to an Historic RFC: RFC 2169 (ref. '7')

  ** Obsolete normative reference: RFC 2168 (ref. '8') (Obsoleted by RFC
     3401, RFC 3402, RFC 3403, RFC 3404)

  ** Obsolete normative reference: RFC 2068 (ref. '9') (Obsoleted by RFC 2616)

  ** Obsolete normative reference: RFC 2048 (ref. '10') (Obsoleted by RFC
     4288, RFC 4289)

  -- Possible downref: Non-RFC (?) normative reference: ref. '11'

  -- Possible downref: Non-RFC (?) normative reference: ref. '12'

  -- Possible downref: Non-RFC (?) normative reference: ref. '13'

  -- Possible downref: Non-RFC (?) normative reference: ref. '14'

  -- Possible downref: Non-RFC (?) normative reference: ref. '15'

  -- Possible downref: Non-RFC (?) normative reference: ref. '16'

  ** Obsolete normative reference: RFC 2141 (ref. '17') (Obsoleted by RFC
     8141)

  ** Obsolete normative reference: RFC 1700 (ref. '18') (Obsoleted by RFC
     3232)

  -- Possible downref: Non-RFC (?) normative reference: ref. '19'

  ** Downref: Normative reference to an Informational RFC: RFC 1737 (ref.
     '20')

  -- Possible downref: Non-RFC (?) normative reference: ref. '21'


     Summary: 22 errors (**), 0 flaws (~~), 4 warnings (==), 15 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet-Draft                                           J. C. Mallery
3	draft-mallery-urn-pdi-00.txt                             M.I.T.
4	Expires in six months                                    November 10, 1997

6	                  Persistent Document Identifiers
7	                  Filename: draft-mallery-urn-pdi-00.txt

9	Status of This Memo

11	      This document is an Internet-Draft.  Internet-Drafts are working
12	      documents of the Internet Engineering Task Force (IETF), its
13	      areas, and its working groups.  Note that other groups may also
14	      distribute working documents as Internet-Drafts.

16	      Internet-Drafts are draft documents valid for a maximum of six
17	      months and may be updated, replaced, or obsoleted by other
18	      documents at any time.  It is inappropriate to use Internet-
19	      Drafts as reference material or to cite them other than as ``work
20	      in progress.''

22	      To learn the current status of any Internet-Draft, please check
23	      the ``1id-abstracts.txt'' listing contained in the Internet-
24	      Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net
25	      (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East
26	      Coast), or ftp.isi.edu (US West Coast).

28	Abstract

30	   This document specifies the syntax and semantics of the Persistent
31	   Document Identifier (PDI) namespace within the URN framework
32	   defined by RFC 2141 [17]. PDIs provide a means to refer to digital
33	   objects and fragments that does not depend on their storage location
34	   or the protocol used to access them. Since 1994, several
35	   large-scale applications with these requirements have used PDIs
36	   [12] [21].

38	   PDIs are intended primarily as permanent identifiers for archival
39	   reference to long-lived documents.  PDIs have a fragment syntax to
40	   allow permanent references to parts of documents (within specific
41	   formats) as well as a citation syntax to allow references to
42	   appearances of such fragments in composite documents.

44	   PDIs are most useful for any document series that is distributed via
45	   multiple protocols, is available from multiple sources, migrates to
46	   new locations, needs fragment references, or participates in
47	   distributed assertion semantics related to collaboration or access
48	   control.

50	1. Namespace Syntax

52	1.2 Design Goals

54	   Persistent Document Identifiers provide a means to refer to digital
55	   objects and fragments that does not depend on their storage location
56	   or the protocol used to access them.  PDIs offer the following
57	   capabilities:

59	        * Multisourcing: The same resource can be stored in different
60	        locations yet retrieved by virtue of a shared identifier.

62	        * Multiple Protocols: Identifiers are not tied to specific
63	        transport protocols.

65	        * Persistence: PDIs persist across relocation of a digital
66	        object to different storage sites. The longevity of a PDI is
67	        not limited by the lifetime of a directory, a domain name, or
68	        even a transport protocol.

70	        * Organizational Delegation: PDIs define a hierarchical
71	        encoding of the issuing authority that allows delegation in a
72	        manner analogous to names in the Domain Name System but
73	        more akin to X.400.

75	        * Chronological Delegation: PDIs incorporate a time hierarchy
76	        that allows delegation of identifiers with different time
77	        ranges to different authorities or to different resolution
78	        regimes.

80	        * Fragment Syntax: PDIs offer an extensible syntax for
81	        referring to part of a resource. This evolutionary approach
82	        allows different schemes according to media type as well as
83	        multiple schemes per media type. Longevity of reference is
84	        sought by defining fragment schemes that are independent of
85	        machine representation. Referential consistency is guaranteed
86	        by monotonic commitment of versioned PDIs to immutable
87	        resource representations.

89	        * Citation Syntax: PDIs include a syntax for referring to
90	        appearances of document fragments as quoted in other composite
91	        documents. This makes fragment quotations first-class objects,
92	        about which assertions can be made.

94	        * User Friendly: PDIs carry a relatively simple syntax with
95	        some mnemonics so that, if need be, people can type them to
96	        access a resource.

98	   A guiding design principle for PDIs is to minimize the document
99	   semantics carried within the identifier.  Most semantics is better
100	   encoded by assertions about PDIs. Not only is overloading of the
101	   identifier avoided, but assertions can also be modified without
102	   recourse to changing the identifier.

104	1. Namespace Syntax

106	   Consistent with the URN syntax specification in RFC 2141 [17], each
107	   namespace must specify syntax related information that is specific to
108	   that namespace.  This section provides these specifications for the
109	   PDI namespace. The PDI grammar below uses the ABNF [6]. A URN using
110	   the Persistent Document Identifier namespace has the form:

112	        <URN> = "urn:" pdi              ; Encoding in URN syntax

114	1.1. Namespace Identifier (NID)

116	   The Namespace Identifier for this namespace is "pdi", which is case
117	   insensitive.

119	        <PDI> = "pdi" ":" nss           ; Persistent Document Identifier

121	1.2. Namespace Specific String (NSS)

123	   The Namespace Specific String for this namespace is:

125	        <NSS> = resource-identifier [(citation-specifier / fragment-specifier)]

127	1.2.1 Resource Identifier

129	        <RESOURCE-IDENTIFIER> = "//" document-series "/" iso-date "/" specifier

131	        <DOCUMENT-SERIES> = component *["." component] "." iso-country

133	        <COMPONENT> = alpha-hyphen-digits

135	        <ISO-COUNTRY> = 2*alpha                 ; See ISO Standard 3166 [10]

137	        <ISO-DATE> = year "/" month "/" day

139	        <YEAR> = 4*digit / wildcard

141	        <MONTH> = 2*digit / wildcard

143	        <DAY> = 2*digit / wildcard

145	        <SPECIFIER> = unique-id ["." format ["." version]] ;versions require formats

147	        <UNIQUE-ID>  = daily-serial-number / encapsulated-unique-id / digits /
148	                         wildcard

150	        <DAILY-SERIAL-NUMBER> = digits

152	        <ENCAPSULATED-UNIQUE-ID> = unique-id-chars

154	        <UNIQUE-ID-CHARS> = alpha / digit / other / "%" hex hex

156	        <FORMAT> =  media-type-token / wildcard

158	        <MEDIA-TYPE-TOKEN> = "text" / "html" / extension-token

160	        <EXTENSION-TOKEN> = alpha-hyphen

162	        <VERSION> = digits / wildcard

164	        <WILDCARD> = "*"
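
   The resource identifier grammar above can be sketched as a regular
   expression.  This is a minimal illustration, not a reference parser:
   the names parse_resource_id and ResourceId are hypothetical, an
   optional "urn:" prefix is accepted, and <COMPONENT>, <FORMAT>, and
   <UNIQUE-ID> are assumed to be sequences of one or more of the
   characters their rules name.  Fragment and citation specifiers are
   not handled.

```python
import re
from typing import NamedTuple, Optional

# Regex transcription of <RESOURCE-IDENTIFIER>. Matching is case-insensitive,
# and repetition of component/format characters is an assumption of this sketch.
_RESOURCE_ID = re.compile(
    r"(?:urn:)?pdi:"
    r"//(?P<series>[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2})"
    r"/(?P<year>[0-9]{4}|\*)"
    r"/(?P<month>[0-9]{2}|\*)"
    r"/(?P<day>[0-9]{2}|\*)"
    r"/(?P<unique_id>(?:[a-z0-9()\-:;$_!']|%[0-9a-f]{2})+|\*)"
    r"(?:\.(?P<format>[a-z-]+|\*)(?:\.(?P<version>[0-9]+|\*))?)?",
    re.IGNORECASE,
)


class ResourceId(NamedTuple):
    """Fields of a PDI resource identifier (hypothetical container)."""
    series: str
    year: str
    month: str
    day: str
    unique_id: str
    format: Optional[str]
    version: Optional[str]


def parse_resource_id(pdi: str) -> ResourceId:
    """Split a PDI without fragment or citation part into its fields."""
    match = _RESOURCE_ID.fullmatch(pdi)
    if match is None:
        raise ValueError(f"not a PDI resource identifier: {pdi!r}")
    return ResourceId(**match.groupdict())
```

   Because "." is a reserved character, it cannot occur unescaped inside
   <UNIQUE-ID>, so the first "." after the unique ID always introduces
   the format.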

166	1.2.2 Citation Specifier

168	        <CITATION-SPECIFIER> = "@" origin-position "=" pdi

170	        <ORIGIN-POSITION> = position

172	1.2.3 Fragment Specifier

174	        <FRAGMENT-SPECIFIER> = "#" [fragment-scheme "="] position [*("," position)]

176	        <FRAGMENT-SCHEME> = "char" / "elt" / "name" / "rect" / "msec" / "sec" /
177	                              "crop" / "byte" / ext-fragment-scheme

179	        <EXT-FRAGMENT-SCHEME> = alpha-hyphen

181	        <POSITION> = char-position / element-position / element-name /
182	                       2-dim-coordinate / frame-number / time / byte-position /
183	                       ext-position

185	        <EXT-POSITION> = position-specifier /
186	                           "(" position-specifier  *["," position-specifier] ")"

188	        <POSITION-SPECIFIER> = alpha-digits

190	1.2.4 Supporting Definitions

192	        <ALPHA> = %x41-5A / %x61-7A                     ; A-Z / a-z

194	        <ALPHA-DIGITS> = alphas / digits

196	        <ALPHA-HYPHEN-DIGITS> = alpha-hyphen / digits

198	        <ALPHA-HYPHEN> = alpha / "-"

200	        <ALPHA-HYPHENS> = *alpha-hyphen

202	        <ALPHAS> = *ALPHA

204	        <DIGIT> = %x30-39                               ; 0-9

206	        <DIGITS> = *DIGIT

208	        <URN-CHARS> = trans / "%" hex hex               ;RFC 2141

210	        <TRANS> =  alpha / digit / other / reserved

212	        <HEX> = digit / "A" / "B" / "C" / "D" / "E" / "F" /
213	                        "a" / "b" / "c" / "d" / "e" / "f"

215	        <OTHER> = "(" / ")" / "-" / ":" / ";" / "$" / "_" / "!" / "'"

217	        <RESERVED> = "%" / "." / "," / "/" / "#" / "*" / "@" /
218	                     "=" / "?" / "+"

220	1.2.5 Reserved Characters

222	   <RESERVED> are used as special characters in the PDI grammar. They
223	   MUST be encoded according to the character escaping method
224	   described in RFC 2141 [17].
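
   As an illustration of this rule, the sketch below percent-escapes
   the <RESERVED> characters in a string before it is placed in a PDI
   field.  The function name escape_reserved is hypothetical, the input
   is assumed to be ASCII, and lowercase hex digits are an arbitrary
   choice of this sketch.

```python
# The <RESERVED> set from the Supporting Definitions.
RESERVED = "%.,/#*@=?+"


def escape_reserved(text: str) -> str:
    """Percent-escape every <RESERVED> character in an ASCII string."""
    return "".join(
        f"%{ord(ch):02x}" if ch in RESERVED else ch
        for ch in text
    )
```

   Only the <RESERVED> set is escaped here; characters in <OTHER>, such
   as ":", pass through unchanged.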

226	2 Discussion

228	2.1 Minting PDIs

230	   PDIs are issued by the authority named in <DOCUMENT-SERIES>.
231	   <DOCUMENT-SERIES> is intended to look like a domain name for easy
232	   parsing but there is no requirement to serve the name via the Domain
233	   Name System (DNS) nor to assure that the name is not assigned for
234	   other purposes by DNS.

236	   The encoded date in <ISO-DATE> is the date when the identifier is
237	   minted. This date is based on Greenwich Mean Time. The encoded date
238	   bears no relationship to dates associated with the resource that the
239	   PDI denotes, even if there may be proximity between the time when the
240	   resource issues and the time when the PDI is minted.

242	   The PDI namespace is monotonic; PDIs cannot be retracted. If a new
243	   version of the same document issues, it MUST increment the version
244	   number for the previously issued PDI. This requirement assures that
245	   any machine representation (byte sequence) associated with formats
246	   of a versioned PDI never changes.

248	   Byte equivalence for all resource formats denoted by a specific PDI
249	   version ensures that digital signatures associated with a PDI check
250	   for any uncorrupted resource. More significantly, byte equivalence
251	   enables reliable, efficient fragment references for many media types.
252	   It eliminates the potentially difficult problem of rolling fragment
253	   references forward as a target resource is modified.

255	2.2 Issuing Authority

257	   The issuing authority controls the name in a document series. These
258	   names are hierarchical so that administration can be delegated within
259	   authority domains. Unlike domain names, the rightmost component of a
260	   <DOCUMENT-SERIES> MUST be a two-letter ISO 3166 country code [10],
261	   indicating the country in which the issuing organization resides.  In
262	   most cases, a <DOCUMENT-SERIES> SHOULD add a term to the issuing
263	   authority in order to differentiate the series from other document
264	   sets that the authority might issue. By specializing the document
265	   series below the issuing authority, identifiers reflect the chain of
266	   delegation.  Additionally, it becomes easier to obsolesce an entire
267	   document series, if that becomes necessary.

269	   For wide use of PDIs, an issuing authority will need to issue
270	   toplevel authority names to organizations wishing to mint PDIs in
271	   their own document series. Once a toplevel document series name has
272	   been obtained, an organization may issue PDIs itself or delegate
273	   subseries.

275	   A subseries is delegated by adding a name component to the left of
276	   <DOCUMENT-SERIES>. The accretion of components on a document series
277	   MAY utilize existing organizational names or acronyms whenever
278	   feasible in order to preserve mnemonics in the document series
279	   name. Additionally, dropping components from the left SHOULD lead
280	   to ever more general issuing authorities in terms of organizational
281	   scope.
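
   The relationship between a subseries and its enclosing series can be
   sketched as follows.  The function name enclosing_series and the
   "press" subseries used in the usage note are hypothetical.

```python
def enclosing_series(series: str) -> list[str]:
    """Return the enclosing document series, from most to least specific."""
    parts = series.split(".")
    # Stop while at least one component plus the country code remains,
    # since the grammar requires both in a <DOCUMENT-SERIES>.
    return [".".join(parts[i:]) for i in range(1, len(parts) - 1)]
```

   For a hypothetical series "press.oma.eop.gov.us", this yields
   "oma.eop.gov.us", "eop.gov.us", and "gov.us", in that order.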

283	   Delegation SHOULD follow de jure organizational structure. Issuing
284	   authority SHOULD NEVER be delegated outside the organization unless
285	   the external agent is acting directly on behalf of the document
286	   series owner. When organizational boundaries are crossed, a new
287	   document series toplevel SHOULD be acquired. Within an organization,
288	   issuing authority SHOULD be delegated to the level where
289	   responsibility for content resides. This facilitates contact with
290	   document originators. More importantly, it reduces administrative
291	   scope, and thus, encourages more uniform document management policies
292	   for a particular document series.

294	2.3 Hierarchical Date

296	   <ISO-DATE> of a PDI MUST be assigned when the identifier is minted.
297	   The calendar date MUST correspond to Greenwich Mean Time.

299	   Inclusion of the ISO date conveys the time when the identifier was
300	   minted.  Beyond making it easier to guarantee identifier uniqueness,
301	   hierarchicalization by date enables reference to ranges of
302	   identifiers issued within specific time intervals.

304	   Use of ISO dates also ensures that lexical sorts of identifiers
305	   produce a chronological ordering of PDIs, making various listings
306	   (e.g., directory lists) automatically appear in a meaningful
307	   order.
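
   A short illustration of this property, using identifiers from a
   single document series (the serial numbers for the 1994 and January
   1997 entries are made up for the example):

```python
pdis = [
    "pdi://oma.eop.gov.us/1997/09/01/1.text.1",
    "pdi://oma.eop.gov.us/1994/10/20/1.html.1",
    "pdi://oma.eop.gov.us/1997/01/15/1.text.1",
]

# A plain string sort orders the identifiers by minting date because
# year, month, and day are fixed-width and most significant first.
chronological = sorted(pdis)
```

   The ordering holds at the granularity of the date.  Within a single
   day, serial numbers are not zero-padded, so a plain string sort
   places serial 10 before serial 2.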

309	   Moreover, different administrative policies MAY be applied to any
310	   particular time interval.  For example, when responsibility for
311	   resolving PDIs shifts to a different administrative authority,
312	   intervals covered by the new policy are readily specifiable and
313	   conveyed. For example, different intervals may be delegated to
314	   different URN resolvers and these delegations recorded with
315	   relevant URN discovery systems.

317	   Operations may be applied to identifiers within an interval. For
318	   example, a browser can provide a directory list of all the
319	   documents in a year, a month, or on a day.
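
   Such a listing can be sketched as a prefix match on the date
   hierarchy.  The function name minted_within is hypothetical, and the
   year, month, and day are assumed to be passed as zero-padded strings
   exactly as they appear in the identifier.

```python
def minted_within(pdis, series, year, month=None, day=None):
    """Select PDIs of one series minted in a given year, month, or day."""
    prefix = f"pdi://{series}/{year}/"
    if month is not None:
        prefix += f"{month}/"
        # A day only narrows the interval when a month is also given.
        if day is not None:
            prefix += f"{day}/"
    return [p for p in pdis if p.startswith(prefix)]
```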

321	   More generally, assertions can be made about identifiers within an
322	   interval, such as where to find a resolver.

324	2.4 Daily Unique ID

326	   An application may use a mnemonic name or a serial number as the
327	   <UNIQUE-ID>. The only requirement is that <UNIQUE-ID> MUST be a
328	   unique sequence of <UNIQUE-ID-CHARS> for <ISO-DATE> and
329	   <DOCUMENT-SERIES>.

331	   If the unique ID is a <DAILY-SERIAL-NUMBER>, serial numbers SHOULD
332	   start from 1 and SHOULD be incremented by 1 as each new PDI is
333	   minted. When the calendar day is incremented at midnight GMT, the
334	   unique ID of the day SHOULD be reset to start at 1 on the new day.
335	   This prevents daily unique IDs from growing very large and
336	   enforces date semantics on the identifier.
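
   The daily reset can be sketched as a counter keyed by the GMT
   calendar date.  The class name DailyMinter is hypothetical, the
   counters live only in memory, and every new PDI is assumed to start
   at version 1; a real minting service would need persistent,
   concurrency-safe storage.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Optional


class DailyMinter:
    """Hand out <DAILY-SERIAL-NUMBER> values that restart at 1 each GMT day."""

    def __init__(self, series: str) -> None:
        self.series = series
        self._last = defaultdict(int)  # GMT date string -> last serial issued

    def mint(self, fmt: str, now: Optional[datetime] = None) -> str:
        """Mint a version 1 PDI; `now` must be a timezone-aware datetime."""
        now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
        date = now.strftime("%Y/%m/%d")
        # Each new date key starts from 0, so the first serial of a day is 1.
        self._last[date] += 1
        return f"pdi://{self.series}/{date}/{self._last[date]}.{fmt}.1"
```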

338	2.4.1 Encapsulation of Foreign Identifiers

340	   The specification of this field has been left open so that foreign
341	   document identifiers MAY be incorporated within a PDI as the daily
342	   unique ID. For our purposes, a foreign identifier is any identifier
343	   used by other naming or reference regimes.  Examples of foreign
344	   identifiers include serial numbers, invoice numbers, URIs, URLs, or
345	   other application-specific identifiers.

347	   When encapsulating a foreign identifier, <FORMAT> is required and
348	   MUST use a <MEDIA-TYPE-TOKEN> that identifies the media type of the
349	   resource and format of the encapsulated identifier. The media type
350	   token is required in order to allow unambiguous interpretation by
351	   applications aware of the identifier semantics. All other
352	   applications MUST treat the unique ID as opaque.

354	2.5 Format

356	   Format should use standard, controlled terms that indicate the
357	   media type [3] of the resource to which the identifier refers or,
358	   in the case of encapsulated identifiers, indicate the type of the
359	   encapsulated identifier. <FORMAT> is case insensitive.

361	   The standards for MIME content types [10] do not as yet provide a
362	   single controlled term per media type that can be used as a file
363	   extension or here as a PDI format. Below we provide a rule for
364	   constructing the <MEDIA-TYPE-TOKEN>. These tokens are created from
365	   the registered media types [10] by using the <MINOR-TYPE> if it is
366	   unique, or otherwise, concatenating the <MAJOR-TYPE> and
367	   <MINOR-TYPE>. These tokens are case insensitive and MUST encode any
368	   reserved characters (<RESERVED>) for PDIs.

370	        <CONTENT-TYPE> = major-type "/" minor-type
371	                           [* (";" parameter ["=" value])]

373	        <MEDIA-TYPE-TOKEN> = minor-type / (major-type "+" minor-type)

375	        <MAJOR-TYPE> = alpha-hyphen-digits

377	        <MINOR-TYPE> = alpha-hyphen-digits

379	   There are two media types for which <MEDIA-TYPE-TOKEN> is not
380	   <MINOR-TYPE>:

382	        Token           Content Type

384	        text            text/plain
385	        header          message/header          ;RFC 822 message headers
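
   The construction rule and the two exceptions above can be sketched
   as follows.  The function name media_type_token is hypothetical, and
   the set of registered media types is passed in by the caller because
   uniqueness of a minor type can only be judged against a registry.
   The "+" joining major and minor type is left unescaped, on the
   assumption that it acts as a grammar delimiter.

```python
# The two exceptions listed in the table above.
EXCEPTIONS = {"text/plain": "text", "message/header": "header"}


def media_type_token(content_type: str, registered: set[str]) -> str:
    """Derive a PDI format token from a "major/minor" content type."""
    content_type = content_type.lower()
    if content_type in EXCEPTIONS:
        return EXCEPTIONS[content_type]
    major, minor = content_type.split("/", 1)
    # Collect every registered type that shares this minor type.
    clashes = {t.lower() for t in registered
               if t.lower().split("/", 1)[1] == minor}
    clashes.add(content_type)
    return minor if len(clashes) == 1 else f"{major}+{minor}"
```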

387	   <FORMAT> is always required when:

389	      * A PDI is minted and assigned to a specific resource.
390	      * A foreign document ID is encapsulated in <UNIQUE-ID>.
391	      * References to resource fragments are made.
392	      * A client requests a resource in a specific format.

394	   The format indicates how to interpret encapsulated identifiers and
395	   MUST be supplied whenever foreign document identifiers are
396	   encapsulated. For example, if an HTTP URL was encapsulated, the
397	   PDI might look like:

399	   pdi://oma.eop.gov.us/1994/10/20/http%3a%2f%2fwww%2ewhitehouse%2egov%2f.html.1

401	   This PDI encapsulates the URL http://www.whitehouse.gov/ and denotes
402	   its content on October 20, 1994, when the site was unveiled.

404	   When a PDI contains fragment syntax, a format MUST be provided in
405	   order to convey the media type of the resource to which the
406	   fragment reference applies.

408	   A server may store any subset of formats for a resource. It may
409	   compute unstored formats on demand. A client can specify the desired
410	   format by using a PDI with the appropriate format field.

412	   If format is omitted, the identifier refers to the generic resource
413	   denoted by the PDI. Assertions about the generic resource apply to
414	   all the instantiations in the various media types indicated by the
415	   universe of formats in which the resource is available.

417	2.6 Version

419	   The PDI <VERSION> is an optional component indicating a specific
420	   version of a resource. <VERSION> is a positive integer. When
421	   <VERSION> is omitted, it defaults to version 1.

423	   Version numbers refer to the generic resource and not the specific
424	   format, but a resource cannot have a version without having at
425	   least one format. When a resource is changed in any format, version
426	   numbers for all formats MUST be updated. In general, when a
427	   resource changes significantly, applications SHOULD generate new
428	   PDIs. When changes are small or incremental, applications SHOULD
429	   increment the version. Any change in the byte count of a resource
430	   for a specific <FORMAT> is a change and the version SHOULD be
431	   incremented. Addition of a new <FORMAT> with the same semantics as
432	   an existing <FORMAT> for the PDI is not a change and does not
433	   require the version to be incremented.

435	   Consequently, if an HTML document issues under

437	                 pdi://oma.eop.gov.us/1997/09/01/1.html.1

439	   and the HTML is later converted to text, the PDI for the text
440	   version is

442	                 pdi://oma.eop.gov.us/1997/09/01/1.text.1

444	   However, if a spelling mistake is corrected later, whether or not
445	   it changes the byte count in any format, the version number is
446	   incremented:

448	                 pdi://oma.eop.gov.us/1997/09/01/1.text.2
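
   The versioning rule can be sketched as a comparison of stored
   representations.  The function name next_version and the mapping
   from format token to bytes are hypothetical.  The sketch cannot
   judge whether a newly added format carries the same semantics as
   the existing ones; that remains the caller's responsibility.

```python
def next_version(current: int, old: dict[str, bytes],
                 new: dict[str, bytes]) -> int:
    """Return the version number under which `new` should be published.

    `old` and `new` map format tokens to the bytes of each representation.
    """
    for fmt, data in old.items():
        # Any byte difference in an existing format is a change, even
        # when the byte count stays the same.
        if fmt in new and new[fmt] != data:
            return current + 1
    # Only new formats were added, so the version is unchanged.
    return current
```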

450	   An editing application MAY write internal versions of a document in
451	   progress and only commit to the versioned PDI at the point when
452	   editing is complete and the document is ready for release.

454	   Version numbers MUST be included when:

456	        * PDIs are minted and associated with specific resources.
457	        * PDIs contain a fragment reference.
458	        * PDIs contain a fragment citation.

460	   Inclusion of a <VERSION> in a fragment reference ensures that the
461	   fragment reference is resolved against a consistent machine
462	   representation of the resource.

464	3 Fragment Syntax

466	3.1 Motivation

468	   The PDI namespace provides an extensible syntax for referring to
469	   parts of resources. Fragment syntax must be extensible because:

471	        * There are too many existing media types.

473	        * Some media types require highly technical fragment syntax,
474	        (e.g., multidimensional points, multiresolution channels).

476	        * New media types are coming into existence all the time.

478	   The approach adopted here is to allow additional RFCs to extend
479	   fragment syntax by adding fragment specifiers as they are needed.

481	   The availability of a syntax for referring to resource fragments
482	   raises the problem of referring to citations of fragments by
483	   composite resources. The PDI namespace provides a fragment citation
484	   syntax to address this issue.

486	3.2 Philosophy

488	3.2.1 Media Representations

490	   A fragment syntax SHOULD differentiate the media representation from
491	   the machine representation. If fragment schemes for a particular
492	   media type use a media representation, they can be retargeted at new
493	   or different machine representations. Otherwise, fragment schemes may
494	   become unresolvable in the future when machine representations
495	   change. Consequently, although a byte fragment specifier is provided
496	   below, it SHOULD be used only for short-term purposes when
497	   alternatives are unavailable.

499	3.2.2  Immediate Fragments

501	   URNs require a fragment syntax because the alternative of interning
502	   every fragment PDI in a URN namespace does not scale. It requires
503	   the resolver to store potentially all possible permutations of the
504	   fragment specifier for every resource.  Immediate fragments require
505	   the fragment syntax to be part of the identifier.  With immediate
506	   fragments, resolvers need only store those fragment PDIs for which
507	   there are assertions beyond the binding to the resource subset.
508	   Additionally, immediate fragments enhance privacy by not storing
509	   all references to resource subsets. They also conserve storage and
510	   reduce computation on resolvers.

512	3.2.3 Fragment Conjunctions

514	   The fragment syntax does not support conjunctions of fragments
515	   because this introduces a source of ambiguity when assertions are
516	   made about PDIs.  Conjunctive fragments SHOULD be handled by creating
517	   a new PDI and asserting that it is the conjunction of some fragments.
518	   In this way, the set is explicitly represented and ambiguous
519	   references are excluded from the syntax.

521	3.2.4 Decoupling from Reference Mechanics

523	   Fragment reference could be accomplished by providing a program
524	   that, given a resource, returns the specified part.  This is not the
525	   approach advocated here. The fragment scheme MUST be a minimal set
526	   of parameters required for a program to extract the relevant part.
527	   Additionally, these parameters SHOULD be specified in the order of
528	   importance for extracting the referent. This increases the
529	   probability of finding a referent if an identifier is accidentally
530	   truncated. In general, new fragment specifiers SHOULD minimize the
531	   syntax of the invariants and parameters they require.

533	3.3 Fragment Scheme

535	   The <FRAGMENT-SCHEME> indicates the position syntax used in
536	   <POSITION>.  A default position scheme should be defined for each
537	   Content Type token used in PDIs. For example, text/plain uses
538	   character positions as the default.  The <FRAGMENT-SCHEME> MAY be
539	   omitted when it is the default position scheme for the content type
540	   indicated by <FORMAT>. In all other circumstances,
541	   <FRAGMENT-SCHEME> MUST be supplied in order to ensure unambiguous
542	   interpretation of position specifiers. Position schemes are case
543	   insensitive.

545	3.4 Fragment Specifiers

547	   The following position reference schemes have been defined:

549	3.4.1 Text Fragment Specifier

551	   Text fragments are defined for the MIME Content Type text/*.  Each
552	   text fragment is an interval bounded by two character positions in
553	   the resource. The fragment is the set of characters from <START>
554	   up to but excluding <END>. The first character position is
555	   0.  Character positions are relative to the canonical, CRLF-encoded
556	   text for the resource.  Therefore, all text/* resources MUST be
557	   CRLF-encoded to ensure correct fragment references.  The PDI
558	   <FORMAT> for text/plain is "text" and <CHAR-FRAGMENT-SPECIFIER> is
559	   the default position specifier for the media type.

561	        <CHAR-FRAGMENT-SPECIFIER> = "#" ["char" "="] start-char
562	                                        "," end-char

564	        <START-CHAR> = digits

566	        <END-CHAR> = digits

568	   Although widespread encodings for many alphabets use a single 8-bit
569	   byte (e.g., ISO-8859 [15]), other encodings (e.g., Unicode) employ
570	   multi-byte encodings. Consequently, a server MUST be aware of the
571	   character set used to encode a text resource.  For 8-bit character
572	   sets, char fragment resolution reduces to byte position.  However,
573	   multi-byte character sets require the server to perform appropriate
574	   translation from the stored data representation.

576	   The following PDI refers to the text starting at character 37 and
577	   continuing up to but excluding character 51.

579	            pdi://oma.eop.gov.us/1997/09/01/1.text.1#char=37,51

581	   Since the default fragment specifier for text is
582	   <CHAR-FRAGMENT-SCHEME>, the following PDI is equivalent:

584	            pdi://oma.eop.gov.us/1997/09/01/1.text.1#37,51

586	   When a text/plain content type uses a multi-byte character set,
587	   <FORMAT> MUST be the character set token as defined by the IANA
588	   Character Set Registry [18].
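
   The half-open interval and the CRLF normalization described in this
   section can be sketched in Python as follows.  This is an
   illustrative sketch rather than part of the specification: the
   function names are hypothetical, and the sketch assumes the resource
   has already been decoded into characters, so multi-byte translation
   is outside its scope.

```python
import re

# Matches "#char=37,51" and the defaulted form "#37,51" at the end of a PDI.
_CHAR_FRAGMENT = re.compile(r"#(?:char=)?(\d+),(\d+)$", re.IGNORECASE)


def to_crlf(text):
    """Normalize every line ending to CRLF, the canonical form for text/*."""
    return re.sub(r"\r\n|\r|\n", "\r\n", text)


def parse_char_fragment(pdi):
    """Return (start, end) from a PDI that ends in a char fragment specifier."""
    match = _CHAR_FRAGMENT.search(pdi)
    if match is None:
        raise ValueError("no char fragment specifier in %r" % pdi)
    start, end = int(match.group(1)), int(match.group(2))
    if start > end:
        raise ValueError("start position exceeds end position")
    return start, end


def extract_char_fragment(text, pdi):
    """Return the characters from start up to but excluding end."""
    start, end = parse_char_fragment(pdi)
    return to_crlf(text)[start:end]
```

   Because positions are counted on the CRLF-encoded text, a resource
   stored with bare LF line endings has to be normalized before the
   slice is taken, which is what to_crlf does here.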

590	3.4.2 HTML Fragment Specifier

592	   Fragments may be specified for the MIME Content Type text/html using
593	   character fragment specifiers. The PDI <FORMAT> for text/html is
594	   "html".  The default position specifier for text/html is "char"
595	   because it simplifies serving fragments.

597	   Although character references are simple and effective for HTML
598	   document fragments, it is often more convenient to use HTML
599	   elements to delimit an interval within a document.  Specific HTML
600	   elements can be identified using the name parameter value or the
601	   position of the tag in the document. In either case, the fragment
602	   consists of all text and HTML tags from <START-ELEMENT> to and
603	   including <END-ELEMENT>.  References to HTML containers are
604	   facilitated by use of a closed interval, but it can be awkward for
605	   tags that are not explicitly closed, especially if they are
606	   implicitly closed (e.g., <p>). Tag positions are counted from the
607	   start of the resource, with the first being assigned 0. An
608	   <ELEMENT-NAME> refers to the first element whose name parameter
609	   value is equal to <ELEMENT-NAME>, which must be encoded according
610	   to URN syntax [17], but decoded for case-sensitive equality testing.

612	        <HTML-FRAGMENT-SPECIFIER> = "#" start-element "," end-element

614	        <HTML-FRAGMENT-SCHEME> = char-fragment-scheme /
615	                                  element-fragment-scheme /
616	                                  named-fragment-scheme

618	        <ELEMENT-FRAGMENT-SCHEME> = "elt"

620	        <START-ELEMENT> = element-position / element-name

622	        <END-ELEMENT> = element-position / element-name

624	        <ELEMENT-POSITION> = digits

626	        <NAMED-FRAGMENT-SCHEME> = "name"

628	        <ELEMENT-NAME> = urn-chars

630	   Char, elt, and name position references MUST use the same position
631	   scheme for <START-ELEMENT> and <END-ELEMENT> in an HTML fragment
632	   reference.

634	   HTML fragments may depend on surrounding context that is not part
635	   of the fragment. HTML rendition without this containing context may
636	   produce different effects or incorrect HTML. Responsibility for
637	   assuring legal and felicitous HTML must reside with the user or
638	   application creating the fragment reference because document
639	   authors cannot be expected to anticipate all possible citations.
640	   Therefore, the user or application creating the fragment citation
641	   MUST NOT create illegal HTML fragments.

643	   When fragments require context, the user or application MAY create
644	   an intermediate document that uses fragment references to extract
645	   both the relevant context and the target fragment.  This
646	   intermediate document SHOULD be legal HTML capable of standing
647	   alone.

649	3.4.3 SGML & XML Fragment Specifier

651	   The element and char fragment schemes can be applied to the more
652	   general Standard Generalized Markup Language (SGML) [14] and
653	   Extensible Markup Language (XML) [4] markup languages, of which
654	   HTML is a subset.  The BNF below gives the fragment specification for
655	   SGML and any subsets, such as XML.

657	        <SGML-FRAGMENT-SPECIFIER> = "#" sgml-start-element ","
658	                                        sgml-end-element

660	        <SGML-FRAGMENT-SCHEME> = element-fragment-scheme /
661	                                 char-fragment-scheme

663	        <SGML-START-ELEMENT> = element-position

665	        <SGML-END-ELEMENT> = element-position

667	   The default fragment specifier for SGML and SGML subsets is "char".
668	   The following content tokens are defined:

670	        text/sgml               sgml
671	        text/xml                xml

673	   The context caveats for HTML fragments should be extended pari passu
674	   to SGML and XML fragments.

676	3.4.5 Image Fragment Specifier

678	   Image media types use a variety of encoding schemes and some
679	   include multiple frames. Fragment reference for image/* uses a two
680	   dimensional Cartesian coordinate system with the origin (0, 0) being
681	   in the upper left-hand corner. The scale of the coordinate system is
682	   the pixel-level scale of the containing image. References to
683	   subrectangles are made by specifying for the image fragment the
684	   <START-COORDINATE> as the upper-leftmost point and <END-COORDINATE>
685	   as the lower-rightmost point. These x and y coordinates are in the
686	   coordinate system of the containing image. When multiple frames are
687	   present in an image, the reference frame is specified by providing
688	   <FRAME>, which is 0 based and defaults to 0 when omitted.
689	   <RECTANGLE-FRAGMENT-SPECIFIER> is the default fragment specifier for
690	   the media types image/*.

692	        <RECTANGLE-FRAGMENT-SPECIFIER> = "#" ["rect" "="] start-coordinate
693	                                             "," end-coordinate ["," frame]

695	        <FRAME> = digits

697	        <START-COORDINATE> = 2-dim-coordinate

699	        <END-COORDINATE> = 2-dim-coordinate

701	        <2-DIM-COORDINATE> = "(" x-coordinate "," y-coordinate ")"

703	        <X-COORDINATE> = digits

705	        <Y-COORDINATE> = digits

707	   The example below refers to an image fragment whose origin is x=5,
708	   y=10 and extends to x=25, y=30. This yields the maximal rectangle
709	   including the coordinates (5,10), (24,10), (24,29), (5,29). Note that
710	   the interval is half-open, so the fragment does not include the point
711	   denoted by <END-COORDINATE>.

713	        pdi://images.satellite.nasa.gov.us/1997/09/30/1234.gif#(5,10),(25,30)

715	   Since frame is unspecified, it defaults to zero and this PDI is equivalent to

717	        pdi://images.satellite.nasa.gov.us/1997/09/30/1234.gif#(5,10),(25,30),0

719	   The next PDI refers to the third frame in an animated GIF. Because
720	   frame indices are zero-based, the third frame is denoted by index
721	   2.

723	        pdi://images.satellite.nasa.gov.us/1997/09/30/1234.gif#(5,10),(25,30),2
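
   The rectangle semantics above can be sketched in Python as follows.
   This is a hedged illustration, not a normative algorithm: the
   function names are hypothetical, and an image is assumed to be
   already decoded into a list of frames, each a list of pixel rows.

```python
import re

# Matches "#(5,10),(25,30)", "#rect=(5,10),(25,30)" and an optional ",frame".
_RECT_FRAGMENT = re.compile(
    r"#(?:rect=)?\((\d+),(\d+)\),\((\d+),(\d+)\)(?:,(\d+))?$",
    re.IGNORECASE,
)


def parse_rect_fragment(pdi):
    """Return ((x0, y0), (x1, y1), frame); frame defaults to 0."""
    match = _RECT_FRAGMENT.search(pdi)
    if match is None:
        raise ValueError("no rectangle fragment specifier in %r" % pdi)
    x0, y0, x1, y1 = (int(match.group(i)) for i in range(1, 5))
    frame = int(match.group(5)) if match.group(5) is not None else 0
    return (x0, y0), (x1, y1), frame


def crop_image(frames, pdi):
    """Return the pixel rows of the fragment, excluding the end coordinate.

    `frames` is an assumed in-memory form: frames[f][y][x] is one pixel.
    """
    (x0, y0), (x1, y1), frame = parse_rect_fragment(pdi)
    return [row[x0:x1] for row in frames[frame][y0:y1]]
```

   Python slices are themselves half-open, so the exclusion of
   <END-COORDINATE> falls out of the slicing without any adjustment.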

725	3.4.4 Audio Fragment Specifier

727	   Audio media types use various encoding schemes (including variable
728	   quality) that make byte ranges problematic for fragment references.
729	   Start and end times provide a coordinate scheme that can be resolved
730	   for any audio media type. A fragment reference includes data from and
731	   including <START-TIME> up to and excluding <END-TIME>. The position
732	   scheme for the temporal reference gives the time units. Two time
733	   position schemes are defined: "msec" is milliseconds and "sec" is
734	   seconds.  Temporal position schemes MUST NOT be intermixed. The
735	   default time position scheme for audio/* is "sec".
736	   <TIME-FRAGMENT-SPECIFIER> is the default fragment specifier for
737	   audio/*.

739	        <TIME-FRAGMENT-SPECIFIER> = "#" time-position-scheme "="
740	                                        start-time "," end-time

742	        <TIME-POSITION-SCHEME> = "msec" / "sec" /
743	                                      ext-time-position-scheme

745	        <START-TIME> = time

747	        <END-TIME> = time

749	        <TIME> = digits

751	        <EXT-TIME-POSITION-SCHEME> = alpha-hyphen-digits

753	   The example below denotes the audio clip extending from second 23
754	   up to but not including second 57.

756	        pdi://audio.npr.org.us/1997/09/30/1234.au#sec=23,57
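
   A resolver will often want both time schemes reduced to a single
   unit.  The following Python sketch shows one way to do so; the
   function name is hypothetical, only the two schemes defined above
   are handled, and extension time position schemes are rejected.

```python
import re

# Only the two schemes defined in this section; extensions are not handled.
_TIME_FRAGMENT = re.compile(r"#(msec|sec)=(\d+),(\d+)$", re.IGNORECASE)
_MSEC_PER_UNIT = {"msec": 1, "sec": 1000}


def time_fragment_in_msec(pdi):
    """Return the half-open interval (start, end) in milliseconds."""
    match = _TIME_FRAGMENT.search(pdi)
    if match is None:
        raise ValueError("no supported time fragment specifier in %r" % pdi)
    scale = _MSEC_PER_UNIT[match.group(1).lower()]
    start, end = int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError("start time exceeds end time")
    return start * scale, end * scale
```

   Since one scheme name governs both times, the rule that temporal
   position schemes are not intermixed holds by construction.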

758	3.4.5 Video Fragment Specifier

760	   Video media types combine difficulties similar to those presented by
761	   audio and image media types.  A simple syntax for video should allow
762	   fragment references to video by start and end times. Because some
763	   applications may wish to crop the image, an optional x-y coordinate
764	   framework is also supported.

766	   The video fragment specifier uses a required time component and an
767	   optional pair of coordinates to denote cropping. A video fragment
768	   reference includes data from and including <START-TIME> up to and
769	   excluding <END-TIME> in time units given by <TIME-POSITION-SCHEME>.

771	   When cropping is desired, a fragment may include the optional
772	   <START-COORDINATE> and <END-COORDINATE>.

774	        <VIDEO-FRAGMENT-SPECIFIER> = "#" "crop" "=" time-position-scheme
775	                                         "," start-time "," end-time
776	                                         ["," start-coordinate "," end-coordinate]

778	   The following example refers to seconds 23 up to 51 of the video clip
779	   1234.mpeg.

781	        pdi://video.cnn.co.us/1997/09/30/1234.mpeg.1#sec,23,51

783	   The next PDI uses the crop fragment scheme and is equivalent to the
784	   preceding one because no cropping is specified.

786	        pdi://video.cnn.co.us/1997/09/30/1234.mpeg.1#crop=sec,23,51

788	   However, the crop scheme is easily able to specify a cropping, such
789	   as the one below.

791	        pdi://video.cnn.co.us/1997/09/30/1234.mpeg.1#crop=sec,23,51,(10,10),(20,20)

793	   Here, only the rectangle from origin (10,10) to (20,20), the
794	   lower-right point, is included in the fragment.

796	   The video fragment scheme presupposes a pixel-based imaging model.
797	   For other models, such as line-oriented analog video, specialized
798	   fragment schemes may be appropriate when cropping is desired.  Note
799	   that the time fragment scheme works for analog models. For this
800	   reason, the time position scheme <TIME-FRAGMENT-SPECIFIER> is the
801	   default fragment specifier for video/*.
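
   The "crop" form of the video fragment specifier can be parsed along
   the following lines.  This Python sketch is illustrative only: the
   function name and the returned dictionary are hypothetical, and only
   the "msec" and "sec" schemes and the explicit "crop=" prefix are
   recognized.

```python
import re

# "#crop=sec,23,51" with an optional ",(x0,y0),(x1,y1)" cropping rectangle.
_CROP_FRAGMENT = re.compile(
    r"#crop=(msec|sec),(\d+),(\d+)"
    r"(?:,\((\d+),(\d+)\),\((\d+),(\d+)\))?$",
    re.IGNORECASE,
)


def parse_video_fragment(pdi):
    """Return the time interval and optional crop rectangle of a video PDI."""
    match = _CROP_FRAGMENT.search(pdi)
    if match is None:
        raise ValueError("no crop fragment specifier in %r" % pdi)
    crop = None
    if match.group(4) is not None:
        x0, y0, x1, y1 = (int(match.group(i)) for i in range(4, 8))
        crop = ((x0, y0), (x1, y1))
    return {
        "unit": match.group(1).lower(),
        "start": int(match.group(2)),
        "end": int(match.group(3)),
        "crop": crop,  # None means the full picture area is used
    }
```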

803	3.4.6 Octet Fragment Specifier

805	   In general, long-lived fragment specifiers seek to avoid schemes
806	   that depend on the underlying data representation because of low
807	   generality and high probability of future failure when the data
808	   representation is made obsolete by future developments.  The byte
809	   fragment scheme is provided as a short-term solution that SHOULD be
810	   superseded by defining a new fragment specifier. In any event, byte
811	   fragments provide a fallback scheme for use until a new extension
812	   can be introduced.

814	   The content token for <BYTE-FRAGMENT-SPECIFIER> is "byte". The
815	   fragment includes <START-BYTE> and all intervening bytes up to
816	   <END-BYTE> which is excluded from the fragment reference.

818	        <BYTE-FRAGMENT-SPECIFIER> = "#" "byte" "=" start-byte "," end-byte

820	        <START-BYTE> = digits

822	        <END-BYTE> = digits

824	   The fragment PDI below refers to the byte sequence starting with byte
825	   23 and continuing up to but excluding byte 57.

827	        pdi://documentation.adobe.co.us/1997/09/30/1234.pdf#byte=23,57

829	   These byte indices are compatible with byte ranges used in HTTP 1.1
830	   [9].
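
   When a byte fragment is mapped onto an HTTP/1.1 byte range, note that
   the last byte position of an HTTP range is inclusive, whereas
   <END-BYTE> is excluded from the fragment.  The Python sketch below
   shows the adjustment; the function name is hypothetical.

```python
import re

_BYTE_FRAGMENT = re.compile(r"#byte=(\d+),(\d+)$", re.IGNORECASE)


def byte_fragment_to_range(pdi):
    """Return an HTTP Range header value covering the byte fragment."""
    match = _BYTE_FRAGMENT.search(pdi)
    if match is None:
        raise ValueError("no byte fragment specifier in %r" % pdi)
    start, end = int(match.group(1)), int(match.group(2))
    if end <= start:
        # An empty fragment has no inclusive HTTP range equivalent.
        raise ValueError("byte fragment is empty or inverted")
    # <END-BYTE> is exclusive; the HTTP last-byte-pos is inclusive.
    return "bytes=%d-%d" % (start, end - 1)
```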

832	3.5 Fragment Citation

834	3.5.1 Motivation

836	   Once fragments can be specified, documents can move from a
837	   cut-and-paste model to an inline reference model, where the
838	   fragment is served from an origin site.  Composite documents are
839	   those that combine fragments using inline references. One reason
840	   for preferring inline fragment references to cut-and-paste is that
841	   the etiology of the fragment is preserved. Thus, meta-data such as
842	   digital signatures can be carried forward and a document consumer
843	   can easily follow references back to sources to examine original
844	   contexts.

846	   An example might be citing some sentences from a Presidential policy
847	   speech in a decision memorandum. Authenticity can be checked
848	   because the original document and its digital signature are
849	   available.  But what if the citation was out of context?  Someone
850	   else could use the citation syntax to refer to the citation of a
851	   fragment by the composite document and assert that the fragment
852	   citation was out of context as it built an argument against the
853	   logic of the decision memorandum.

855	3.5.2 Discussion

857	   Reference of a fragment as cited in a composite document is
858	   accomplished by using a position specifier to give the
859	   <ORIGIN-POSITION> in the composite document. The cited fragment
860	   begins at the origin position and continues for the full extent of
861	   the fragment. Thus, <ORIGIN-POSITION> provides the alignment in the
862	   citing document while the dimensions or extent are carried by the
863	   fragment reference.  By keeping knowledge of the dimensions of the
864	   fragment out of the coordinate framework of the composite document,
865	   any position scheme can use the same generic citation syntax.

867	   The origin position is defined as the point closest to the start of
868	   the document. In a media type structured as a single sequence of
869	   characters, the origin position is character 0.  In a
870	   three-dimensional Cartesian coordinate system, the origin position
871	   is 0,0,0. When each content type token is made available for
872	   fragment citation, the origin position MUST be defined.

874	   The following identifier

876	   pdi://oma.eop.gov.us/1997/11/03/4.text.1@103=pdi://oma.eop.gov.us/1997/09/01/1.text.1#37,51

878	   refers to the citation of the fragment

880	        pdi://oma.eop.gov.us/1997/09/01/1.text.1#37,51

882	   by the document,

884	        pdi://oma.eop.gov.us/1997/11/03/4.text.1

886	   The citing document places characters 37 up to but excluding 51 of
887	   the cited document at its character 103. If the cited PDI had no
888	   fragment specifier, the entire document would appear at position 103.
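
   The citation syntax can be taken apart as in the following Python
   sketch.  It is illustrative only: the function names are
   hypothetical, the origin position is assumed to be a character
   position, and the cited PDI is assumed to carry a char fragment.

```python
import re

_CHAR_FRAGMENT = re.compile(r"#(?:char=)?(\d+),(\d+)$", re.IGNORECASE)


def parse_citation(identifier):
    """Split 'citing@origin=cited' into (citing PDI, origin, cited PDI)."""
    citing, at_sign, rest = identifier.partition("@")
    origin, equals, cited = rest.partition("=")
    if not at_sign or not equals or not origin.isdigit():
        raise ValueError("not a fragment citation: %r" % identifier)
    return citing, int(origin), cited


def cited_extent(identifier):
    """Return the half-open interval the fragment occupies in the citing text.

    The origin gives the alignment; the extent comes from the fragment.
    """
    _, origin, cited = parse_citation(identifier)
    match = _CHAR_FRAGMENT.search(cited)
    if match is None:
        raise ValueError("cited PDI has no char fragment: %r" % cited)
    length = int(match.group(2)) - int(match.group(1))
    return origin, origin + length
```

   Splitting on the first "=" after the "@" keeps any "=" inside the
   cited fragment specifier (for example "#char=37,51") intact.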

890	3.6 Operations on PDIs

892	   The three classes of operations on PDIs have slightly different
893	   characteristics.

895	3.6.1 Minting

897	   When PDIs are minted, they MUST carry a format and a version number
898	   and they MUST NOT contain any wildcards. This ensures that the
899	   identifiers associated with digital resources are fully specified
900	   and convey both the media type and the version number. The presence
901	   of a version number makes it possible to check for a higher version
902	   of the resource.  Together, the format and version number enable
903	   fragment citation.

905	3.6.2 Binding

907	   Once minted, a PDI can be bound directly to a resource or
908	   indirectly via a Uniform Resource Identifier (URI) [2], which may
909	   often be a URL. Binding to a resource commits the PDI to a specific
910	   sequence of bytes. Therefore, a URI that is indirectly bound to a
911	   PDI MUST NOT change. If the URI changes, the indirect binding MUST
912	   be broken, and the original machine representation associated
913	   directly with the PDI.

915	3.6.3 Resolution

917	   PDIs may be resolved to URLs or other locators using recent URN
918	   resolution standards, such as THTTP described by RFC 2169 [7], and
919	   these resolvers may be discovered using the DNS extensions for
920	   Naming Authority PoinTeR (NAPTR) described by RFC 2168 [8].  As
921	   these experimental resolution standards evolve, or new ones are
922	   introduced, PDI resolution can track any new standards precisely
923	   because a URN namespace is independent of the method used to resolve
924	   it.

926	   When requesting a PDI from a URN resolver, the omission of a
927	   version number is a request for the resolver to return the
928	   highest version available for the PDI. When defaulting a PDI to the
929	   highest version, the resolver MUST indicate to the client the fully
930	   qualified PDI associated with the media object.  For THTTP, when a
931	   server returns an entity, the server MUST include a
932	   Content-Location header [9] containing the fully-qualified PDI.  This
933	   allows the client to associate the entity body with the versioned
934	   PDI that specifically identifies it.
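
   Version defaulting can be sketched in Python as follows.  The sketch
   rests on assumptions not fixed by this section: the function names
   are hypothetical, the resolver's knowledge is modeled as a set of
   fully-qualified PDI strings, and a PDI is taken to be versioned when
   its final path segment has the form unique-id.format.version, as in
   the examples in this document.

```python
def is_versioned(pdi):
    """True if the last path segment ends in '.<digits>' after a format."""
    parts = pdi.rsplit("/", 1)[-1].split(".")
    return len(parts) >= 3 and parts[-1].isdigit()


def resolve(requested, known):
    """Return (fully-qualified PDI, headers), or (None, {}) if unknown."""
    if is_versioned(requested):
        found = requested if requested in known else None
    else:
        prefix = requested + "."
        versions = [
            int(pdi[len(prefix):])
            for pdi in known
            if pdi.startswith(prefix) and pdi[len(prefix):].isdigit()
        ]
        # Compare versions numerically so that 10 ranks above 2.
        found = prefix + str(max(versions)) if versions else None
    if found is None:
        return None, {}
    # Tell the client which versioned PDI the entity belongs to.
    return found, {"Content-Location": found}
```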

936	   When submitting a PDI to a URN resolver, wildcards may be used
937	   to obtain information for sets of PDIs. However, the set of returned
938	   PDIs MAY be complete only with regard to the specific knowledge about
939	   a document series available to a resolver at the time.

941	3.6.4 Lexical Equivalence

943	   PDIs can be lexically compared for equivalence after they are
944	   converted to canonical form using the following procedure:

946	        1. Unescape all escaped characters that are within <URN-CHARS> but which
947	        are not <RESERVED-CHARACTERS>.

949	        2. Downcase the two <HEX> characters following the escape character "%".

951	        3. Downcase all PDI components except <UNIQUE-ID> and <POSITION>.

953	        4. If it is an encapsulated identifier, canonicalize
954	        <UNIQUE-ID> according to the rules for the foreign identifier.

956	        5. If <POSITION> is case-insensitive, as indicated by <FRAGMENT-SCHEME>,
957	        downcase position.

959	   Wildcards carry a semantic interpretation that is not relevant for
960	   lexical equivalence. Two PDIs are lexically equal if and only if any
961	   wildcards appear in exactly the same positions in both.
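
   Steps 1 through 3 of the canonicalization procedure can be sketched
   in Python as follows.  The sketch makes several assumptions that this
   section does not settle: the unreserved and reserved character sets
   are taken from the URN syntax [17]; the component layout
   (series/date/unique-id.format.version#scheme=position) is inferred
   from the examples in this document; <UNIQUE-ID> is assumed to contain
   no "."; and steps 4 and 5 and citation identifiers are not handled.
   All function names are hypothetical.

```python
import re
import string

_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")
# Assumed from the URN syntax: letters, digits and the "other" characters.
# The reserved characters "%", "/", "?" and "#" are deliberately absent.
_UNRESERVED = set(string.ascii_letters + string.digits + "()+,-.:=@;$_!*'")


def _fix_escape(match):
    """Step 1: unescape unreserved characters. Step 2: downcase the hex."""
    char = chr(int(match.group(1), 16))
    return char if char in _UNRESERVED else "%" + match.group(1).lower()


def canonicalize(pdi):
    """Return an assumed canonical form of a PDI (steps 1 to 3 only)."""
    pdi = _ESCAPE.sub(_fix_escape, pdi)
    body, hash_sign, fragment = pdi.partition("#")
    head, slash, name = body.rpartition("/")
    unique_id, dot, suffix = name.partition(".")
    # Step 3: downcase everything except <UNIQUE-ID> and <POSITION>.
    result = head.lower() + slash + unique_id + dot + suffix.lower()
    if hash_sign:
        scheme, equals, position = fragment.rpartition("=")
        result += "#" + scheme.lower() + equals + position
    return result


def lexically_equal(first, second):
    """Compare two PDIs after canonicalization."""
    return canonicalize(first) == canonicalize(second)
```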

963	3.6.5 Assertions

965	   With the advent of URN standards, networked assertion infrastructures
966	   can now associate assertions (meta-data) with the unique identifier
967	   of a resource rather than replicating that information with every
968	   instance of a resource and creating a variety of synchronization
969	   problems.  On this view, each URN namespace may also become an
970	   assertion domain with specific semantics.

972	   A document series thus becomes an address space in which each PDI
973	   serves as a pointer. Assertions about PDIs are assertions about the
974	   digital objects or fragments to which they refer. The ability to make
975	   and retrieve assertions about PDIs provides a means to associate
976	   meta-data with digital objects.  For example, collaboration systems
977	   may use a typed link semantics to structure resources in meaningful
978	   ways.  Alternatively, security systems may assert digital signatures
979	   or other trust information about PDIs. For example, a digital
980	   signature may be attached to some mobile code to check its integrity
981	   regardless of its proximate source. Access control systems might even
982	   assert differential access to various components of a single digital
983	   object.

985	   Several groups are developing standards for associating meta-data
986	   with resources [11] [16]. These approaches are currently evolving and
987	   as yet lack a suitable persistent identifier model.  URNs [17] [19]
988	   [20] and specifically PDIs offer a suitable persistent identifier for
989	   use with assertion schemes.  The ability to refer to tokens and
990	   relations as first-class objects will dramatically simplify equality
991	   testing and significantly enhance the power and flexibility of the
992	   emerging standards for wide-area assertion infrastructures.

994	3.7 Registering New Fragment Specification Schemes

996	   [TBD: This section will provide rules for registering new fragment
997	   specification schemes with IANA.]

999	3.8 Interpreting PDIs as Uniform Resource Locators

1001	   Current URN resolver discovery [8] and URN resolution [7] standards
1002	   are experimental and lack wide deployment.  As these standards
1003	   evolve and become more widespread, an interim resolution strategy
1004	   using existing servers and clients for HTTP may prove useful.  This
1005	   section defines use of PDIs as URLs [1] with standard HTTP servers.

1007	   PDIs MAY be interpreted as Uniform Resource Locators (URL) and MAY
1008	   be used in contexts where URLs are appropriate, such as with HTTP
1009	   servers. When interpreting PDIs as URLs, PDIs do not carry the URN
1010	   prefix.

1012	                <URL> = pdi             ; Encoding in URL syntax

1014	   In all other ways, the syntax and semantics of PDIs remain exactly
1015	   the same as under the URN namespace interpretation. However,
1016	   resolver discovery and resolution under the URL interpretation are
1017	   somewhat different.

1019	3.8.1 Resolver Discovery

1021	   While an appropriate resolver for a PDI document series SHOULD be
1022	   found using current standards [8], an HTTP resolver MAY also be
1023	   discovered heuristically by interpreting the document series as a
1024	   domain name and looking up an IP address associated with the name.
1025	   If a PDI-aware HTTP server is operating at the IP address and it
1026	   successfully answers HTTP requests on the document series, then
1027	   assume that a resolver for the document series has been located.

1029	   A client MAY also use new resolver discovery standards as they
1030	   emerge or out of band methods to locate a resolver for a document
1031	   series.

1033	   Since the heuristic discovery method provides no indication
1034	   concerning the completeness or authority of the resolver, server
1035	   operators SHOULD ensure that any server providing PDI resolution
1036	   has complete knowledge of the document series, whether served
1037	   locally or proxied from remote servers.  Thus, if a server can
1038	   resolve one PDI in a document series, the server operator
1039	   guarantees the assumption that the server has sufficient knowledge
1040	   to resolve any PDI in the series. An HTTP 1.1 or higher client MAY
1041	   check whether an HTTP server resolves a PDI document series by
1042	   issuing an HTTP request using the OPTIONS method on the PDI.

1044	3.8.2 Resolution Methods

1046	   Since the HTTP standard [9] provides for use of any URI [2] in HTTP
1047	   requests, no immediate extensions to the HTTP standard are required
1048	   for PDI resolution.  Requests for PDIs are just like requests for
1049	   URLs, except that there are no relative PDIs; the scheme and
1050	   document series MUST always be provided.  Fully-specified PDIs
1051	   allow the server to distinguish PDIs from ordinary URLs using the
1052	   HTTP scheme. Servers SHOULD invoke augmented parsing for PDIs, for
1053	   example, as necessary for fragment resolution.

1055	        <HTTP-REQUEST-LINE> = <method> " " <pdi> " " <http-version>
1056	                              <crlf>

1058	   HTTP provides a series of methods on URIs. Below we define the
1059	   operations for each method in HTTP 1.1 with respect to PDIs.

1061	        GET resolves the PDI and returns the resource.

1063	        HEAD returns the meta-data of the PDI as headers.

1065	        OPTIONS returns the HTTP operations supported for the PDI.

1067	        TRACE returns information on the path to the origin server
1068	        through proxies.

1070	   Non-idempotent HTTP methods interact with PDI semantics and require
1071	   special handling.

1073	        PUT associates a resource with a PDI. If the PDI already
1074	        exists, the server MUST increment the PDI version number and
1075	        store the resource under this new version. If the PDI does not
1076	        exist, the server SHOULD return 404 "Not Found". If the client
1077	        wishes to assign a new PDI, the PUT request MUST indicate the
1078	        document series by providing a partial PDI:

1080	        <DOCUMENT-SERIES-IDENTIFIER> = "pdi://" document-series "/"

1082	        When a server receives a <document-series-identifier> as the
1083	        URI in a PUT method, it MUST assign a unique PDI within the
1084	        document series using the current <ISO-DATE>, a daily
1085	        <UNIQUE-ID>, an appropriate <FORMAT>, and a <VERSION> equal to
1086	        1.

1088	        DELETE is not defined for PDIs as they are long-lived,
1089	        persistent identifiers. An HTTP server MUST return 405 "Method
1090	        Not Allowed".

1092	   When an HTTP method is applied to an unknown PDI, the server MUST
1093	   return 404 "Not Found".

1095	   For all HTTP methods, whenever the server is unable to apply the
1096	   method to an existing resource or to assign a new PDI, it MUST
1097	   return 405 "Method Not Allowed".
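
   The method semantics above can be illustrated with a toy in-memory
   resolver in Python.  Everything beyond the status codes named in
   this section is an assumption made for the sketch: the class and
   method names are hypothetical, the daily <UNIQUE-ID> is a simple
   counter, the format defaults to "text", and 201 is used for a
   successful PUT although this section does not name a success status.

```python
import datetime


class PdiStore:
    """Toy in-memory resolver for a single document series (hypothetical)."""

    def __init__(self, series):
        self.prefix = "pdi://" + series + "/"
        self.resources = {}    # fully-qualified PDI -> bytes
        self.daily_count = {}  # "yyyy/mm/dd" -> last unique id issued

    def handle(self, method, pdi, body=b"", fmt="text", today=None):
        """Return a (status, pdi, body) tuple for one request."""
        if method == "PUT":
            return self._put(pdi, body, fmt, today or datetime.date.today())
        if method in ("GET", "HEAD"):
            if pdi not in self.resources:
                return 404, None, b""
            entity = self.resources[pdi] if method == "GET" else b""
            return 200, pdi, entity
        # DELETE, and any method this sketch does not implement.
        return 405, None, b""

    def _put(self, pdi, body, fmt, today):
        if pdi == self.prefix:
            # Document series identifier: mint a new PDI at version 1.
            date = today.strftime("%Y/%m/%d")
            unique_id = self.daily_count.get(date, 0) + 1
            self.daily_count[date] = unique_id
            new_pdi = "%s%s/%d.%s.1" % (self.prefix, date, unique_id, fmt)
        elif pdi in self.resources:
            # Existing PDI: store under the next version number.
            base = pdi.rpartition(".")[0]
            highest = max(
                int(known.rpartition(".")[2])
                for known in self.resources
                if known.rpartition(".")[0] == base
            )
            new_pdi = "%s.%d" % (base, highest + 1)
        else:
            return 404, None, b""
        self.resources[new_pdi] = body
        return 201, new_pdi, b""  # 201 is an assumption of this sketch
```

   Because a PUT to an existing PDI never overwrites stored bytes, the
   binding between each versioned PDI and its byte sequence is
   preserved.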

1099	3.8.3 Proxying HTTP Resolution

1101	   When a server is proxying PDI operations to an upstream HTTP
1102	   server, it MUST pass through all requests and responses.  The
1103	   server contributes its knowledge of an origin server that can
1104	   resolve the PDI.  A server unable or not wishing to proxy PDI
1105	   operations but aware of a server capable of handling the operations
1106	   SHOULD redirect the client to the PDI-capable server.

1108	3.8.4 Proxying THTTP Resolution

1110	   When the upstream server implements URN resolution, the proxy
1111	   SHOULD perform protocol translation. For THTTP [7], the server
1112	   SHOULD perform the following request translations.

1114	                Request         Proxy Request

1116	                GET <PDI>       GET "/uri-res/N2R?urn:" <PDI>

1118	                HEAD <PDI>      GET "/uri-res/N2C?urn:" <PDI>

1120	   Responses from the THTTP resolver SHOULD be passed through.  For
1121	   HTTP methods other than GET and HEAD, the server MUST return a 405
1122	   "Method Not Allowed".
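
   The request translation table above amounts to a small lookup, as
   the following Python sketch shows.  The function name and the
   returned tuple are hypothetical; the paths are those given in the
   table.

```python
# THTTP service used for each HTTP method, following the table above.
_THTTP_SERVICE = {"GET": "N2R", "HEAD": "N2C"}


def translate_to_thttp(method, pdi):
    """Return the (method, request URI) to send upstream, or None.

    None means the method has no translation and the proxy answers 405.
    """
    service = _THTTP_SERVICE.get(method)
    if service is None:
        return None
    # Both GET and HEAD become a GET on the THTTP resolver.
    return "GET", "/uri-res/%s?urn:%s" % (service, pdi)
```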

1124	4. History

1126	   Persistent Document Identifiers (PDI) were developed in early 1994 in
1127	   order to provide persistent, location-independent identifiers for
1128	   electronic publications. PDIs were deployed in Fall, 1994 when the
1129	   second White House Electronic Publications System [21] was brought
1130	   online.  Every document published by the system since January 20,
1131	   1993 now carries a PDI.  An identifier that was independent of
1132	   protocol and transport proved extremely useful in managing this
1133	   document set and resolving delivery failures. The publications server
1134	   currently issues PDIs and resolves them using THTTP URN resolution
1135	   [7].

1137	   The White House publications are an excellent example of documents
1138	   that are monotonic (not subject to revision), widely mirrored, and
1139	   subject to eventual relocation.  Documents are never revised after
1140	   they issue.  Instead, they may be superseded by corrected versions,
1141	   but corrections are limited to transcription errors.  Many sites
1142	   around the world archive and redistribute the documents.  These
1143	   include major online services, libraries, advocacy groups, and government
1144	   entities.  As of March 1996, it was estimated that about one million
1145	   people around the world read at least some part of a document during
1146	   the course of a week.  At the end of an administration, the White
1147	   House ceases to serve a former President's documents and they must be
1148	   relocated to the National Archives, and normally, to his Presidential
1149	   library.  In sum, the White House documents are mirrored in many
1150	   locations from which they may be obtained and the primary document
1151	   repository must move after a period of time.

1153	   Persistent Document Identifiers were also used in an advanced
1154	   experiment in large-scale, asynchronous collaboration during the Vice
1155	   President's Open Meeting on Government Reinvention in December 1994
1156	   [12].  In the Open Meeting, PDIs not only identified resources but they
1157	   also associated a collaboration semantics with resources. A variety
1158	   of meta-data and annotations were attached to resources via PDIs.
1159	   More generally, PDIs were used to build the persistent semantic
1160	   network around resources that allowed arbitrary assertions using
1161	   first-class links, each with their own PDIs. Both the document
1162	   database and the semantic network were mirrored at another site using
1163	   PDIs to align all structures.

1165	5. Acknowledgments

1167	   This specification was improved by comments from Andrew J.
1168	   Blumberg, Mitchell N. Charity, Ron Daniel Jr., Henrik Frystyk
1169	   Nielsen, Jerry Saltzar, Karen R. Sollins, and Christopher R. Vincent.
1170	   This specification was revised and extended from an earlier draft
1171	   dated December, 1994.

1173	   This specification describes research done at the Artificial
1174	   Intelligence Laboratory of the Massachusetts Institute of
1175	   Technology.  Support for the M.I.T. Artificial Intelligence
1176	   Laboratory's artificial intelligence research is provided in part
1177	   by the Defense Advanced Research Projects Agency of the Department
1178	   of Defense under contract numbers MDA972-93-1-003N7 and
1179	   F30602-97-2-0239.

1181	6. References

1183	   [1] T. Berners-Lee, L. Masinter, M. McCahill, "Uniform Resource
1184	   Locators (URL)", RFC 1738, December, 1994.
1185	   http://ds.internic.net/rfc/rfc1738.txt

1187	   [2] T. Berners-Lee, R. Fielding, L. Masinter, "Uniform Resource
1188	   Identifiers (URI): Generic Syntax and Semantics", Internet Draft
1189	   (work in progress), November 5, 1997.

1191	   [3] N. Borenstein, N. Freed, "MIME (Multipurpose Internet Mail
1192	   Extensions) Part One: Mechanisms for Specifying and Describing the
1193	   Format of Internet Message Bodies", RFC 1521, September 1993.
1194	   http://ds.internic.net/rfc/rfc1521.txt

1196	   [4] T. Bray, J. Paoli, C. M. Sperberg-McQueen, "Extensible Markup
1197	   Language (XML)," World Wide Web Consortium, August, 1997,
1198	   http://www.w3.org/TR/WD-xml

1200	   [5] T. Bray, S. DeRose, "Extensible Markup Language (XML): Part
1201	   2. Linking," World Wide Web Consortium, July, 1997.
1202	   http://www.w3.org/TR/WD-xml-link

1204	   [6] D. Crocker, P. Overell, "Augmented BNF for Syntax
1205	   Specifications: ABNF," Internet Draft (work in progress), January
1206	   1997.

1208	   [7] R. Daniel, "A Trivial Convention for using HTTP in URN
1209	   Resolution," RFC 2169, June, 1997.
1210	   http://ds.internic.net/rfc/rfc2169.txt

1212	   [8] R. Daniel, M. Mealling, "Resolution of Uniform Resource
1213	   Identifiers using the Domain Name System," RFC 2168, June, 1997.
1214	   http://ds.internic.net/rfc/rfc2168.txt

1216	   [9] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, T. Berners-Lee,
1217	   "Hypertext Transfer Protocol -- HTTP/1.1," RFC 2068, January, 1997.
1218	   http://ds.internic.net/rfc/rfc2068.txt

1220	   [10] N. Freed, J. Klensin, J. Postel, "Multipurpose Internet Mail
1221	   Extensions (MIME) Part Four: Registration Procedures", RFC 2048,
1222	   November, 1996.  http://ds.internic.net/rfc/rfc2048.txt
1223	   Registered media types can be found at:
1224	   ftp://ftp.isi.edu/in-notes/iana/assignments/media-types/media-types

1226	   [11] R. V. Guha, T. Bray, "Meta Content Framework Using XML," W3C
1227	   Technical Note NOTE-MCF-XML, June, 1997.

1229	   [12] R. Hurwitz, J. C. Mallery, "The Open Meeting: A Web-Based
1230	   System for Conferencing and Collaboration," Proceedings of The
1231	   Fourth International Conference on The World-Wide Web, December 12,
1232	   1995. Also in Web Journal, Winter, 1996, 1(1): 17-36.

1234	   [13] ISO 3166:1988 (E/F) - Codes for the representation of names of
1235	   countries - The International Organization for Standardization, 3rd
1236	   edition, 1988-08-15. ISO 3166 country codes can be found at:
1237	   ftp://ftp.isi.edu/in-notes/iana/assignments/country-codes

1239	   [14] ISO 8879 Information Processing -- Text and Office Systems --
1240	   Standard Generalized Markup Language (SGML), ISO 8879:1986.  For
1241	   the list of SGML entities, consult
1242	   ftp://ftp.ifi.uio.no/pub/SGML/ENTITIES/.

1244	   [15] ISO-8859. International Standard -- Information Processing --
1245	     8-bit Single Byte Coded Graphic Character Sets --
1246	     Part 1: Latin alphabet No. 1, ISO 8859-1:1987.
1247	     Part 2: Latin alphabet No. 2, ISO 8859-2, 1987.
1248	     Part 3: Latin alphabet No. 3, ISO 8859-3, 1988.
1249	     Part 4: Latin alphabet No. 4, ISO 8859-4, 1988.
1250	     Part 5: Latin/Cyrillic alphabet, ISO 8859-5, 1988.
1251	     Part 6: Latin/Arabic alphabet, ISO 8859-6, 1987.
1252	     Part 7: Latin/Greek alphabet, ISO 8859-7, 1987.
1253	     Part 8: Latin/Hebrew alphabet, ISO 8859-8, 1988.
1254	     Part 9: Latin alphabet No. 5, ISO 8859-9, 1990.

1256	   [16] O. Lassila, R. R. Swick, "Resource Description Framework
1257	   (RDF): Model and Syntax," World Wide Web Consortium, Technical Note
1258	   RDF-Syntax (work in progress), August, 1997.

1260	   [17] R. Moats, "URN Syntax," RFC 2141, May 5, 1997.
1261	   http://ds.internic.net/rfc/rfc2141.txt

1263	   [18] J. Reynolds, J. Postel, "Assigned Numbers," STD 2, RFC 1700,
1264	   October, 1994.

1266	   [19] K. Sollins, "Architectural Principles of Uniform Resource Name
1267	   Resolution," Internet Draft (work in progress), July, 1997.

1269	   [20] K. Sollins, L. Masinter, "Functional Requirements for Uniform
1270	   Resource Names," RFC 1737, December, 1994.
1271	   http://ds.internic.net/rfc/rfc1737.txt

1273	   [21] White House Electronic Publications System,
1274	   http://www.pub.whitehouse.gov

1276	7. Author's Address

1278	   John C. Mallery
1279	   Artificial Intelligence Laboratory
1280	   Massachusetts Institute of Technology
1281	   545 Technology Square, NE43-797
1282	   Cambridge, MA 02139 USA
1283	   Email: JCMa@ai.mit.edu
1284	   Phone: 617-253-5966