Internet-Draft                                           J. C. Mallery
draft-mallery-urn-pdi-00.txt                                    M.I.T.
Expires in six months                                November 10, 1997

                   Persistent Document Identifiers
                Filename: draft-mallery-urn-pdi-00.txt

Status of This Memo

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its
   areas, and its working groups.  Note that other groups may also
   distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.
   It is inappropriate to use Internet-Drafts as reference material or
   to cite them other than as ``work in progress.''

   To learn the current status of any Internet-Draft, please check
   the ``1id-abstracts.txt'' listing contained in the Internet-Drafts
   Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net
   (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East
   Coast), or ftp.isi.edu (US West Coast).

Abstract

   This document specifies the syntax and semantics of the Persistent
   Document Identifier (PDI) namespace within the URN framework
   defined by RFC 2141 [17].  PDIs provide a means to refer to digital
   objects and fragments that does not depend on their storage
   location or on the protocol used to access them.  Since 1994,
   several large-scale applications with these requirements have used
   PDIs [12] [21].

   PDIs are intended primarily as permanent identifiers for archival
   reference to long-lived documents.  PDIs have a fragment syntax to
   allow permanent references to parts of documents (within specific
   formats) as well as a citation syntax to allow references to
   appearances of such fragments in composite documents.

   PDIs are most useful for any document series that is distributed
   via multiple protocols, is available from multiple sources,
   migrates to new locations, needs fragment references, or
   participates in distributed assertion semantics related to
   collaboration or access control.

Design Goals

   Persistent Document Identifiers provide a means to refer to digital
   objects and fragments that does not depend on their storage
   location or on the protocol used to access them.  PDIs offer the
   following capabilities:

      * Multisourcing: The same resource can be stored in different
        locations yet retrieved by virtue of a shared identifier.

      * Multiple Protocols: Identifiers are not tied to specific
        transport protocols.

      * Persistence: PDIs persist across relocation of a digital
        object to different storage sites.  The longevity of a PDI is
        not limited by the lifetime of a directory, a domain name, or
        even a transport protocol.

      * Organizational Delegation: PDIs define a hierarchical
        encoding of the issuing authority that allows delegation in a
        manner analogous to names in the Domain Name System, but more
        akin to X.400.

      * Chronological Delegation: PDIs incorporate a time hierarchy
        that allows identifiers with different time ranges to be
        delegated to different authorities or to different resolution
        regimes.

      * Fragment Syntax: PDIs offer an extensible syntax for
        referring to part of a resource.  This evolutionary approach
        allows different schemes according to media type as well as
        multiple schemes per media type.  Longevity of reference is
        sought by defining fragment schemes that are independent of
        machine representation.  Referential consistency is guaranteed
        by monotonic commitment of versioned PDIs to immutable
        resource representations.

      * Citation Syntax: PDIs include a syntax for referring to
        appearances of document fragments as quoted in other
        composite documents.  This makes fragment quotations
        first-class objects about which assertions can be made.

      * User Friendly: PDIs have a relatively simple syntax with some
        mnemonics so that, if need be, people can type them to access
        a resource.

   A guiding design principle for PDIs is to minimize the document
   semantics carried within the identifier.  Most semantics are better
   encoded as assertions about PDIs.  This not only avoids overloading
   the identifier, but also allows assertions to be modified without
   changing the identifier.

1. Namespace Syntax

   Consistent with the URN syntax specification in RFC 2141 [17], each
   namespace must specify syntax-related information that is specific
   to that namespace.
   This section provides these specifications for the PDI namespace.
   The PDI grammar below uses ABNF [6].  A URN using the Persistent
   Document Identifier namespace has the form:

                 = "urn:" pdi  ; Encoding in URN syntax

1.1. Namespace Identifier (NID)

   The Namespace Identifier for this namespace is "pdi", which is case
   insensitive.

   <PDI>         = "pdi" ":" nss  ; Persistent Document Identifier

1.2. Namespace Specific String (NSS)

   The Namespace Specific String for this namespace is:

   <NSS>         = resource-identifier
                   [(citation-specifier / fragment-specifier)]

1.2.1 Resource Identifier

   <RESOURCE-IDENTIFIER>    = "//" document-series "/" iso-date "/"
                              specifier

   <DOCUMENT-SERIES>        = component *("." component) "." iso-country

   <COMPONENT>              = alpha-hyphen-digits

   <ISO-COUNTRY>            = 2*alpha  ; See ISO Standard 3166 [10]

   <ISO-DATE>               = year "/" month "/" day

   <YEAR>                   = 4*digit / wildcard

   <MONTH>                  = 2*digit / wildcard

   <DAY>                    = 2*digit / wildcard

   <SPECIFIER>              = unique-id ["." format ["." version]]
                              ; versions require formats

   <UNIQUE-ID>              = daily-serial-number /
                              encapsulated-unique-id / digits / wildcard

   <DAILY-SERIAL-NUMBER>    = digits

   <ENCAPSULATED-UNIQUE-ID> = unique-id-chars

   <UNIQUE-ID-CHARS>        = alpha / digit / other / "%" hex hex

   <FORMAT>                 = media-type-token / wildcard

   <MEDIA-TYPE-TOKEN>       = "text" / "html" / extension-token

   <EXTENSION-TOKEN>        = alpha-hyphen

   <VERSION>                = digits / wildcard

   <WILDCARD>               = "*"

1.2.2 Citation Specifier

   <CITATION-SPECIFIER>     = "@" origin-position "=" pdi

   <ORIGIN-POSITION>        = position

1.2.3 Fragment Specifier

   <FRAGMENT-SPECIFIER>     = "#" [fragment-scheme "="] position
                              [*("," position)]

   <FRAGMENT-SCHEME>        = "char" / "elt" / "name" / "rect" /
                              "msec" / "sec" / "crop" / "byte" /
                              ext-fragment-scheme

   <EXT-FRAGMENT-SCHEME>    = alpha-hyphen

   <POSITION>               = char-position / element-position /
                              element-name / 2-dim-coordinate /
                              frame-number / time / byte-position /
                              ext-position

   <EXT-POSITION>           = position-specifier /
                              "(" position-specifier
                              *("," position-specifier) ")"

   <POSITION-SPECIFIER>     = alphadigits

1.2.4 Supporting Definitions

   <ALPHA>                  = %x41-5A / %x61-7A  ; A-Z / a-z

   <ALPHADIGITS>            = alphas / digits

   <ALPHA-HYPHEN-DIGITS>    = alpha-hyphen / digits

   <ALPHA-HYPHEN>           = alpha / "-"

   <ALPHA-HYPHENS>          = *alpha-hyphen

   <ALPHAS>                 = *ALPHA

   <DIGIT>                  = %x30-39  ; 0-9

   <DIGITS>                 = *DIGIT

   <URN-CHARS>              = trans / "%" hex hex  ; RFC 2141

   <TRANS>                  = alpha / digit / other / reserved

   <HEX>                    = digit / "A" / "B" / "C" / "D" / "E" /
                              "F" / "a" / "b" / "c" / "d" / "e" / "f"

   <OTHER>                  = "(" / ")" / "-" / ":" / ";" / "$" /
                              "_" / "!" / "'"

   <RESERVED>               = "%" / "." / "," / "/" / "#" / "*" /
                              "@" / "=" / "?" / "+"

1.2.5 Reserved Characters

   <RESERVED> characters are used as special characters in the PDI
   grammar.  They MUST be encoded according to the character escaping
   method described in RFC 2141 [17].

2 Discussion

2.1 Minting PDIs

   PDIs are issued by the authority named in <DOCUMENT-SERIES>.
   <DOCUMENT-SERIES> is intended to look like a domain name for easy
   parsing, but there is no requirement to serve the name via the
   Domain Name System (DNS), nor to ensure that DNS does not assign
   the name for other purposes.

   The encoded date in <ISO-DATE> is the date when the identifier is
   minted.  This date is based on Greenwich Mean Time.  The encoded
   date bears no relationship to dates associated with the resource
   that the PDI denotes, even though the time when the resource issues
   and the time when the PDI is minted may be close.

   The PDI namespace is monotonic; PDIs cannot be retracted.  If a new
   version of the same document issues, its PDI MUST be the previously
   issued PDI with the version number incremented.  This requirement
   ensures that the machine representation (byte sequence) associated
   with each format of a versioned PDI never changes.

   Byte equivalence for all resource formats denoted by a specific PDI
   version ensures that a digital signature associated with a PDI
   verifies for any uncorrupted copy of the resource.  More
   significantly, byte equivalence enables reliable, efficient
   fragment references for many media types.  It eliminates the
   potentially difficult problem of rolling fragment references
   forward as a target resource is modified.

2.2 Issuing Authority

   The issuing authority controls the name in a document series.
   These names are hierarchical so that administration can be
   delegated within authority domains.  Unlike domain names, the
   rightmost component of a <DOCUMENT-SERIES> MUST be a two-letter ISO
   3166 country code [10], indicating the country in which the issuing
   organization resides.  In most cases, a <DOCUMENT-SERIES> SHOULD add
   a term to the issuing authority in order to differentiate the
   series from other document sets that the authority might issue.  By
   specializing the document series below the issuing authority,
   identifiers reflect the chain of delegation.  Additionally, it
   becomes easier to obsolesce an entire document series, if that
   becomes necessary.

   For wide use of PDIs, an issuing authority will need to issue
   top-level authority names to organizations wishing to mint PDIs in
   their own document series.  Once a top-level document series name
   has been obtained, an organization may issue PDIs itself or
   delegate subseries.

   A subseries is delegated by adding a name component to the left of
   <DOCUMENT-SERIES>.  The accretion of components on a document
   series MAY utilize existing organizational names or acronyms
   whenever feasible in order to preserve mnemonics in the document
   series name.  Additionally, dropping components from the left
   SHOULD lead to ever more general issuing authorities in terms of
   organizational scope.

   Delegation SHOULD follow de jure organizational structure.  Issuing
   authority SHOULD NEVER be delegated outside the organization unless
   the external agent is acting directly on behalf of the document
   series owner.  When organizational boundaries are crossed, a new
   top-level document series SHOULD be acquired.  Within an
   organization, issuing authority SHOULD be delegated to the level
   where responsibility for content resides.  This facilitates contact
   with document originators.
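
   The purely syntactic side of these delegation rules can be
   illustrated with a short Python sketch.  It is not part of this
   specification: the function names are hypothetical, and the code
   checks only what the text states about the form of a
   <DOCUMENT-SERIES> (dot-separated components ending in a two-letter
   country code).  It cannot check organizational boundaries.

```python
from __future__ import annotations


def issuing_authorities(document_series: str) -> list[str]:
    """List a document series followed by ever more general authorities.

    Hypothetical helper: drops one component at a time from the left,
    stopping before the bare two-letter ISO 3166 country code.
    """
    components = document_series.split(".")
    if len(components) < 2:
        raise ValueError("need at least one component plus a country code")
    country = components[-1]
    if len(country) != 2 or not country.isalpha():
        raise ValueError(f"{country!r} is not a two-letter country code")
    return [".".join(components[i:]) for i in range(len(components) - 1)]


def is_delegated_from(subseries: str, parent: str) -> bool:
    """True if subseries adds one or more components to the left of parent."""
    return subseries.endswith("." + parent)
```

   For example, issuing_authorities("oma.eop.gov.us") returns
   ["oma.eop.gov.us", "eop.gov.us", "gov.us"], each entry a broader
   authority than the one before it.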

   More importantly, delegating to that level reduces administrative
   scope and thus encourages more uniform document management policies
   for a particular document series.

2.3 Hierarchical Date

   The <ISO-DATE> of a PDI MUST be assigned when the identifier is
   minted.  The calendar date MUST correspond to Greenwich Mean Time.

   Inclusion of the ISO date conveys the time when the identifier was
   minted.  Beyond making it easier to guarantee identifier
   uniqueness, organizing identifiers hierarchically by date enables
   reference to ranges of identifiers issued within specific time
   intervals.

   Use of ISO dates also ensures that lexical sorts of identifiers
   within a document series produce a chronological ordering of PDIs,
   making various listings (e.g., directory lists) automatically
   appear in a meaningful order.

   Moreover, different administrative policies MAY be applied to any
   particular time interval.  For example, when responsibility for
   resolving PDIs shifts to a different administrative authority, the
   intervals covered by the new policy are readily specified and
   conveyed: different intervals may be delegated to different URN
   resolvers, and these delegations recorded with the relevant URN
   discovery systems.

   Operations may be applied to identifiers within an interval.  For
   example, a browser can provide a directory list of all the
   documents in a year, a month, or on a day.

   More generally, assertions can be made about identifiers within an
   interval, such as where to find a resolver.

2.4 Daily Unique ID

   An application may use a mnemonic name or a serial number as the
   <UNIQUE-ID>.  The only requirement is that the <UNIQUE-ID> MUST be
   a unique sequence of characters for its <DOCUMENT-SERIES> and
   <ISO-DATE>.

   If the unique ID is a <DAILY-SERIAL-NUMBER>, serial numbers SHOULD
   start from 1 and SHOULD be incremented by 1 as each new PDI is
   minted.  When the calendar day is incremented at midnight GMT, the
   unique ID of the day SHOULD be reset to start at 1 on the new day.
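
   The daily serial-number rule can be sketched as a small in-memory
   minter.  This is an illustration only: the class and method names
   are hypothetical, UTC stands in for GMT, and a real minter would
   also need to persist its counter so that a restart never reissues
   a serial number for the same day.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Optional


class DailySerialMinter:
    """Hypothetical minter for PDIs whose unique ID is a daily serial number.

    Serial numbers start at 1, grow by 1 per PDI, and restart at 1 when
    the UTC calendar date (used here for GMT) changes.
    """

    def __init__(self, document_series: str) -> None:
        self.document_series = document_series
        self._day: Optional[str] = None  # ISO date of the last PDI minted
        self._next_serial = 1

    def mint(self, media_type_token: str, now: Optional[datetime] = None) -> str:
        if now is None:
            now = datetime.now(timezone.utc)
        # Convert to UTC and render as year/month/day for <ISO-DATE>.
        day = now.astimezone(timezone.utc).strftime("%Y/%m/%d")
        if day != self._day:  # a new GMT day: restart the serial at 1
            self._day = day
            self._next_serial = 1
        serial = self._next_serial
        self._next_serial += 1
        # Newly minted PDIs carry a format and an explicit version, here 1.
        return f"pdi://{self.document_series}/{day}/{serial}.{media_type_token}.1"
```

   Two PDIs minted at 23:59 GMT on September 1, 1997 receive serials
   1 and 2; the first PDI minted after midnight receives serial 1
   again under 1997/09/02.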

   Resetting the counter each day keeps daily unique IDs from growing
   very large and enforces date semantics on the identifier.

2.4.1 Encapsulation of Foreign Identifiers

   The specification of this field has been left open so that foreign
   document identifiers MAY be incorporated within a PDI as the daily
   unique ID.  For our purposes, a foreign identifier is any
   identifier used by other naming or reference regimes.  Examples of
   foreign identifiers include serial numbers, invoice numbers, URIs,
   URLs, and other application-specific identifiers.

   When encapsulating a foreign identifier, <FORMAT> is required and
   MUST use a <MEDIA-TYPE-TOKEN> that identifies the media type of the
   resource and the format of the encapsulated identifier.  The media
   type token is required in order to allow unambiguous interpretation
   by applications aware of the identifier semantics.  All other
   applications MUST treat the unique ID as opaque.

2.5 Format

   Format should use standard, controlled terms that indicate the
   media type [3] of the resource to which the identifier refers or,
   in the case of encapsulated identifiers, indicate the type of the
   encapsulated identifier.  <FORMAT> is case insensitive.

   The standards for MIME content types [10] do not as yet provide a
   single controlled term per media type that can be used as a file
   extension or, here, as a PDI format.  Below we provide a rule for
   constructing the <MEDIA-TYPE-TOKEN>.  These tokens are created from
   the registered media types [10] by using the <MINOR-TYPE> if it is
   unique, or otherwise by concatenating the <MAJOR-TYPE> and
   <MINOR-TYPE>.  These tokens are case insensitive and MUST encode
   any reserved characters (<RESERVED>) for PDIs.
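
   As a rough illustration of that construction rule, the Python
   sketch below derives tokens from a list of registered content
   types.  The registry contents used with it are hypothetical (the
   real token set depends on the media-type registry at the time), the
   two exceptions are those listed in this section, and it assumes
   that "+" is written literally as the separator, as in the grammar,
   while reserved characters inside the type names are %-escaped in
   the lower-case style of this draft's examples.

```python
from __future__ import annotations

from collections import Counter

# Characters in <RESERVED>; they must be %-escaped inside a token.
RESERVED = set("%.,/#*@=?+")

# The two exceptions listed in the table in this section.
EXCEPTIONS = {"text/plain": "text", "message/header": "header"}


def escape_reserved(part: str) -> str:
    """%-escape reserved characters, using lower-case hex as in this draft."""
    return "".join(f"%{ord(c):02x}" if c in RESERVED else c for c in part)


def media_type_tokens(registered: list[str]) -> dict[str, str]:
    """Map registered "major/minor" content types to PDI format tokens."""
    # Drop parameters such as "; charset=...", normalize case, de-duplicate.
    types = list(dict.fromkeys(t.split(";", 1)[0].strip().lower() for t in registered))
    minor_counts = Counter(t.split("/", 1)[1] for t in types)
    tokens: dict[str, str] = {}
    for t in types:
        major, minor = t.split("/", 1)
        if t in EXCEPTIONS:
            tokens[t] = EXCEPTIONS[t]
        elif minor_counts[minor] == 1:  # minor type is unique: use it alone
            tokens[t] = escape_reserved(minor)
        else:  # otherwise join major and minor with "+"
            tokens[t] = escape_reserved(major) + "+" + escape_reserved(minor)
    return tokens
```

   With audio/mpeg and video/mpeg both in the list, neither minor type
   is unique, so the tokens become "audio+mpeg" and "video+mpeg",
   while image/gif simply becomes "gif".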

                      = major-type "/" minor-type
                        [*(";" parameter ["=" value])]

   <MEDIA-TYPE-TOKEN> = minor-type / (major-type "+" minor-type)

   <MAJOR-TYPE>       = alpha-hyphen-digits

   <MINOR-TYPE>       = alpha-hyphen-digits

   There are two media types for which <MEDIA-TYPE-TOKEN> is not
   <MINOR-TYPE>:

      Token     Content Type

      text      text/plain
      header    message/header  ; RFC 822 message headers

   <FORMAT> is always required when:

      * A PDI is minted and assigned to a specific resource.
      * A foreign document ID is encapsulated in <UNIQUE-ID>.
      * References to resource fragments are made.
      * A client requests a resource in a specific format.

   The format indicates how to interpret encapsulated identifiers and
   MUST be supplied whenever foreign document identifiers are
   encapsulated.  For example, if an HTTP URL were encapsulated, the
   PDI might look like:

      pdi://oma.eop.gov.us/1994/10/20/http%3a%2f%2fwww%2ewhitehouse%2egov%2f.html.1

   This PDI encapsulates the URL http://www.whitehouse.gov/ and
   denotes its content on October 20, 1994, when the site was
   unveiled.

   When a PDI contains fragment syntax, a format MUST be provided in
   order to convey the media type of the resource to which the
   fragment reference applies.

   A server may store any subset of formats for a resource.  It may
   compute unstored formats on demand.  A client can specify the
   desired format by using a PDI with the appropriate format field.

   If the format is omitted, the identifier refers to the generic
   resource denoted by the PDI.  Assertions about the generic resource
   apply to all of its instantiations in the various media types in
   which the resource is available.

2.6 Version

   The PDI <VERSION> is an optional component indicating a specific
   version of a resource.  <VERSION> is a positive integer (greater
   than 0).  When <VERSION> is omitted, it defaults to version 1.

   Version numbers refer to the generic resource and not to a specific
   format, but a resource cannot have a version without having at
   least one format.
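
   The resource-identifier grammar of Section 1.2.1, together with the
   format and version defaults just described, can be sketched as a
   small Python parser.  It is a non-normative sketch: the names
   ParsedPDI and parse_pdi are hypothetical, it handles only concrete
   identifiers (no "*" wildcards), it leaves any "#" fragment or "@"
   citation unparsed, and it decodes the unique ID with the standard
   library's urllib.parse.unquote.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional
from urllib.parse import unquote

# "//" document-series "/" year "/" month "/" day "/" specifier, then an
# optional "#" fragment or "@" citation kept as raw text.
_PDI_RE = re.compile(
    r"^(?:urn:)?pdi://"
    r"(?P<series>[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2})/"
    r"(?P<year>\d{4,})/(?P<month>\d{2,})/(?P<day>\d{2,})/"
    r"(?P<specifier>[^#@/]+)"
    r"(?P<suffix>[#@].*)?$",
    re.IGNORECASE,
)


@dataclass
class ParsedPDI:
    document_series: str
    year: str
    month: str
    day: str
    unique_id: str               # %-decoded unique ID
    format_token: Optional[str]  # None: the generic resource
    version: int                 # defaults to 1 when omitted
    suffix: Optional[str]        # unparsed "#..." fragment or "@..." citation


def parse_pdi(text: str) -> ParsedPDI:
    m = _PDI_RE.match(text)
    if m is None:
        raise ValueError(f"not a PDI resource identifier: {text!r}")
    # "." is reserved, so any "." left in the specifier is a separator.
    parts = m.group("specifier").split(".")
    if len(parts) > 3:
        raise ValueError("specifier is unique-id[.format[.version]]")
    fmt = parts[1].lower() if len(parts) > 1 else None
    version = int(parts[2]) if len(parts) == 3 else 1
    if version < 1:
        raise ValueError("version must be a positive integer")
    return ParsedPDI(
        document_series=m.group("series"),
        year=m.group("year"),
        month=m.group("month"),
        day=m.group("day"),
        unique_id=unquote(parts[0]),
        format_token=fmt,
        version=version,
        suffix=m.group("suffix"),
    )
```

   Applied to the encapsulated-URL example in Section 2.5, the sketch
   recovers the unique ID "http://www.whitehouse.gov/", the format
   "html", and version 1.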

   When a resource is changed in any format, version numbers for all
   formats MUST be updated.  In general, when a resource changes
   significantly, applications SHOULD generate new PDIs.  When changes
   are small or incremental, applications SHOULD increment the
   version.  Any change in the byte count of a resource for a specific
   <FORMAT> is a change, and the version SHOULD be incremented.  Adding
   a new <FORMAT> with the same semantics as an existing <FORMAT> for
   the PDI is not a change and does not require the version to be
   incremented.

   Consequently, if an HTML document issues under

      pdi://oma.eop.gov.us/1997/09/01/1.html.1

   and later the HTML is converted to text, the PDI for the text
   version is

      pdi://oma.eop.gov.us/1997/09/01/1.text.1

   However, if a spelling mistake is later corrected, whether or not
   the correction changes the byte count in any format, the version
   number is incremented:

      pdi://oma.eop.gov.us/1997/09/01/1.text.2

   An editing application MAY write internal versions of a document in
   progress and commit to the versioned PDI only when editing is
   complete and the document is ready for release.

   Version numbers MUST be included when:

      * PDIs are minted and associated with specific resources.
      * PDIs contain a fragment reference.
      * PDIs contain a fragment citation.

   Including a <VERSION> in a fragment reference ensures that the
   fragment reference is resolved against a consistent machine
   representation of the resource.

3 Fragment Syntax

3.1 Motivation

   The PDI namespace provides an extensible syntax for referring to
   parts of resources.  Fragment syntax must be extensible because:

      * There are too many existing media types.

      * Some media types require highly technical fragment syntax
        (e.g., multidimensional points, multiresolution channels).

      * New media types are coming into existence all the time.

   The approach adopted here is to allow additional RFCs to extend
   fragment syntax by adding fragment specifiers as they are needed.

   The availability of a syntax for referring to resource fragments
   raises the problem of referring to citations of fragments by
   composite resources.  The PDI namespace provides a fragment
   citation syntax to address this issue.

3.2 Philosophy

3.2.1 Media Representations

   A fragment syntax SHOULD differentiate the media representation
   from the machine representation.  If fragment schemes for a
   particular media type use a media representation, they can be
   retargeted at new or different machine representations.  Otherwise,
   fragment schemes may become unresolvable in the future when machine
   representations change.  Consequently, although a byte fragment
   specifier is provided below, it SHOULD be used only for short-term
   purposes when alternatives are unavailable.

3.2.2 Immediate Fragments

   URNs require a fragment syntax because the alternative of interning
   every fragment PDI in a URN namespace does not scale: it requires
   the resolver to store potentially all possible permutations of the
   fragment specifier for every resource.  Immediate fragments make
   the fragment syntax part of the identifier.  With immediate
   fragments, resolvers need only store those fragment PDIs for which
   there are assertions beyond the binding to the resource subset.
   Additionally, immediate fragments enhance privacy by not storing
   all references to resource subsets.  They also conserve storage and
   reduce computation on resolvers.

3.2.3 Fragment Conjunctions

   The fragment syntax does not support conjunctions of fragments
   because they introduce a source of ambiguity when assertions are
   made about PDIs.  Conjunctive fragments SHOULD be handled by
   creating a new PDI and asserting that it is the conjunction of some
   fragments.
   In this way, the set is explicitly represented and ambiguous
   references are excluded from the syntax.

3.2.4 Decoupling from Reference Mechanics

   Fragment reference could be accomplished by providing a program
   that, given a resource, returns the specified part.  This is not
   the approach advocated here.  The fragment scheme MUST be a minimal
   set of parameters required for a program to extract the relevant
   part.  Additionally, these parameters SHOULD be specified in order
   of importance for extracting the referent.  This increases the
   probability of finding a referent if an identifier is accidentally
   truncated.  In general, new fragment specifiers SHOULD minimize the
   syntax of the invariants and parameters they require.

3.3 Fragment Scheme

   The <FRAGMENT-SCHEME> indicates the position syntax used in
   <POSITION>.  A default position scheme should be defined for each
   Content Type token used in PDIs.  For example, text/plain uses
   character positions as the default.  The <FRAGMENT-SCHEME> MAY be
   omitted when it is the default position scheme for the content type
   indicated by <FORMAT>.  In all other circumstances, the
   <FRAGMENT-SCHEME> MUST be supplied in order to ensure unambiguous
   interpretation of position specifiers.  Position schemes are case
   insensitive.

3.4 Fragment Specifiers

   The following position reference schemes have been defined:

3.4.1 Text Fragment Specifier

   Text fragments are defined for the MIME Content Type text/*.  Each
   text fragment is an interval bounded by two character positions in
   the resource.  The fragment is the set of characters from
   <START-CHAR> up to but excluding <END-CHAR>.  The first character
   position is 0.  Character positions are relative to the canonical,
   CRLF-encoded text of the resource.  Therefore, all text/* resources
   MUST be CRLF encoded to ensure correct fragment references.  The
   PDI media type token for text/plain is "text", and "char" is the
   default position scheme for the media type.
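
   The interval rules in the preceding paragraph can be sketched as
   follows.  This is a minimal Python illustration, assuming the
   server has already decoded the resource from its character set
   into a string, so that positions count characters rather than
   bytes; the function names are hypothetical.

```python
from __future__ import annotations

import re


def canonical_crlf(text: str) -> str:
    """Rewrite CR, LF, or CRLF line breaks as CRLF."""
    return re.sub(r"\r\n|\r|\n", "\r\n", text)


def resolve_char_fragment(text: str, fragment: str) -> str:
    """Return the text selected by "#char=start,end" or "#start,end".

    `text` must already be decoded from its character set, so positions
    count characters rather than bytes.  The interval is half-open.
    """
    body = fragment[1:] if fragment.startswith("#") else fragment
    scheme, sep, positions = body.partition("=")
    if not sep:  # scheme omitted: "char" is the default for text
        scheme, positions = "char", body
    if scheme.lower() != "char":
        raise ValueError(f"not a char fragment: {fragment!r}")
    start_text, end_text = positions.split(",")
    start, end = int(start_text), int(end_text)
    canonical = canonical_crlf(text)
    if not 0 <= start <= end <= len(canonical):
        raise ValueError("character positions fall outside the resource")
    return canonical[start:end]
```

   Because the LF in "Hello\nworld" becomes CRLF, "world" starts at
   position 7 rather than 6.  In "naïve café", "#6,10" selects "café"
   even though the UTF-8 encoding of that string is 12 bytes long,
   which is why a server must know the character set.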

                  = "#" ["char" "="] start-char "," end-char

   <START-CHAR>   = digits

   <END-CHAR>     = digits

   Although widespread encodings for many alphabets use a single 8-bit
   byte per character (e.g., ISO-8859 [15]), other encodings (e.g.,
   Unicode) employ multi-byte encodings.  Consequently, a server MUST
   be aware of the character set used to encode a text resource.  For
   8-bit character sets, char fragment resolution reduces to byte
   position.  However, multi-byte character sets require the server to
   perform appropriate translation from the stored data
   representation.

   The following PDI refers to the text starting at character 37 and
   continuing up to but excluding character 51:

      pdi://oma.eop.gov.us/1997/09/01/1.text.1#char=37,51

   Since "char" is the default fragment scheme for text, the following
   PDI is equivalent:

      pdi://oma.eop.gov.us/1997/09/01/1.text.1#37,51

   When a text/plain content type uses a multi-byte character set,
   MUST be the character set token as defined by the IANA Character
   Set Registry [18].

3.4.2 HTML Fragment Specifier

   Fragments may be specified for the MIME Content Type text/html
   using character fragment specifiers.  The PDI media type token for
   text/html is "html".  The default position scheme for text/html is
   "char" because it simplifies serving fragments.

   Although character references are simple and effective for HTML
   document fragments, it is often more convenient to use HTML
   elements to delimit an interval within a document.  Specific HTML
   elements can be identified using the name parameter value or the
   position of the tag in the document.  In either case, the fragment
   consists of all text and HTML tags from <START-ELEMENT> to and
   including <END-ELEMENT>.  The closed interval facilitates
   references to HTML containers, but it can be awkward for tags that
   are not explicitly closed, especially if they are implicitly closed
   (e.g., <P>).  Tag positions are counted from the start of the
   resource, with the first tag assigned 0.  An <ELEMENT-NAME> refers
   to the first element whose name parameter value equals it; the name
   must be encoded according to URN syntax [17], but decoded for
   case-sensitive equality testing.

                             = "#" start-element "," end-element

                             = char-fragment-scheme /
                               element-fragment-scheme /
                               named-fragment-scheme

   <ELEMENT-FRAGMENT-SCHEME> = "elt"

   <START-ELEMENT>           = element-position / element-name

   <END-ELEMENT>             = element-position / element-name

   <ELEMENT-POSITION>        = digits

   <NAMED-FRAGMENT-SCHEME>   = "name"

   <ELEMENT-NAME>            = urn-chars

   Char, elt, and name position references MUST use the same position
   scheme for <START-ELEMENT> and <END-ELEMENT> in an HTML fragment
   reference.

   HTML fragments may depend on surrounding context that is not part
   of the fragment.  Rendering HTML without this containing context
   may produce different effects or incorrect HTML.  Responsibility
   for ensuring legal and felicitous HTML must reside with the user or
   application creating the fragment reference, because document
   authors cannot be expected to anticipate all possible citations.
   Therefore, the user or application creating the fragment citation
   MUST NOT create illegal HTML fragments.

   When fragments require context, the user or application MAY create
   an intermediate document that uses fragment references to extract
   both the relevant context and the target fragment.  This
   intermediate document SHOULD be legal HTML capable of standing
   alone.

3.4.3 SGML & XML Fragment Specifier

   The element and char fragment schemes can also be applied to the
   more general Standard Generalized Markup Language (SGML) [14], of
   which HTML is an application, and to the Extensible Markup Language
   (XML) [4], a subset of SGML.  The BNF below gives the fragment
   specification for SGML and any of its subsets, such as XML.

                        = "#" sgml-start-element "," sgml-end-element

                        = element-fragment-scheme / char-fragment-scheme

   <SGML-START-ELEMENT> = element-position

   <SGML-END-ELEMENT>   = element-position

   The default fragment scheme for SGML and SGML subsets is "char".
   The following content tokens are defined:

      text/sgml    sgml
      text/xml     xml

   The context caveats for HTML fragments extend pari passu to SGML
   and XML fragments.

3.4.4 Image Fragment Specifier

   Image media types use a variety of encoding schemes, and some
   include multiple frames.  Fragment reference for image/* uses a
   two-dimensional Cartesian coordinate system with the origin (0, 0)
   in the upper left-hand corner.  The scale of the coordinate system
   is the pixel scale of the containing image.  A subrectangle is
   referenced by specifying <START-COORDINATE> as its upper-leftmost
   point and <END-COORDINATE> as its lower-right bound; both are in the
   coordinate system of the containing image.  When multiple frames
   are present in an image, the reference frame is specified by
   providing <FRAME>, which is zero-based and defaults to 0 when
   omitted.  The "rect" scheme is the default fragment scheme for the
   media types image/*.

                        = "#" ["rect" "="] start-coordinate
                          "," end-coordinate ["," frame]

   <FRAME>              = digits

   <START-COORDINATE>   = 2-dim-coordinate

   <END-COORDINATE>     = 2-dim-coordinate

   <2-DIM-COORDINATE>   = "(" x-coordinate "," y-coordinate ")"

   <X-COORDINATE>       = digits

   <Y-COORDINATE>       = digits

   The example below refers to an image fragment whose origin is x=5,
   y=10 and which extends to x=25, y=30.  This yields the maximal
   rectangle whose corners are (5,10), (24,10), (24,29), and (5,29).
   Note that, in the zero-based coordinate system, the region does not
   include the point denoted by <END-COORDINATE>.
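
   The coordinate rules above can be sketched in Python.  The names
   ImageFragment, parse_image_fragment, and corner_pixels are
   hypothetical; the sketch accepts both the "#rect=" form and the
   default form without a scheme, and treats an empty region as an
   error, which is an assumption rather than something this section
   states.

```python
from __future__ import annotations

import re
from typing import NamedTuple


class ImageFragment(NamedTuple):
    x0: int     # <START-COORDINATE>, included
    y0: int
    x1: int     # <END-COORDINATE>, excluded
    y1: int
    frame: int  # zero-based; 0 when omitted


_IMAGE_RE = re.compile(
    r"^#(?:rect=)?\((\d+),(\d+)\),\((\d+),(\d+)\)(?:,(\d+))?$", re.IGNORECASE
)


def parse_image_fragment(fragment: str) -> ImageFragment:
    m = _IMAGE_RE.match(fragment)
    if m is None:
        raise ValueError(f"not an image fragment: {fragment!r}")
    x0, y0, x1, y1 = (int(g) for g in m.group(1, 2, 3, 4))
    frame = int(m.group(5)) if m.group(5) is not None else 0
    if x1 <= x0 or y1 <= y0:  # assumption: an empty region is an error
        raise ValueError("end coordinate must lie below and right of start")
    return ImageFragment(x0, y0, x1, y1, frame)


def corner_pixels(f: ImageFragment) -> list[tuple[int, int]]:
    """Corner pixels of the region, clockwise from the upper left."""
    return [(f.x0, f.y0), (f.x1 - 1, f.y0), (f.x1 - 1, f.y1 - 1), (f.x0, f.y1 - 1)]
```

   For "#(5,10),(25,30)" the corner pixels are (5,10), (24,10),
   (24,29), and (5,29), matching the example in the text, and the
   region is 20 pixels wide and 20 pixels high.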

      pdi://images.satellite.nasa.gov.us/1997/09/30/1234.gif#(5,10),(25,30)

   Since the frame is unspecified, it defaults to zero, and this PDI
   is equivalent to

      pdi://images.satellite.nasa.gov.us/1997/09/30/1234.gif#(5,10),(25,30),0

   The next PDI refers to the third frame in an animated GIF.  Because
   frame numbers are zero-based (which simplifies array references),
   the third frame has index 2.

      pdi://images.satellite.nasa.gov.us/1997/09/30/1234.gif#(5,10),(25,30),2

3.4.5 Audio Fragment Specifier

   Audio media types use various encoding schemes (including variable
   quality) that make byte ranges problematic for fragment references.
   Start and end times provide a coordinate scheme that can be
   resolved for any audio media type.  A fragment reference includes
   data from and including <START-TIME> up to and excluding
   <END-TIME>.  The position scheme for the temporal reference gives
   the time units.  Two time position schemes are defined: "msec" is
   milliseconds and "sec" is seconds.  Temporal position schemes MUST
   NOT be intermixed.  The default time position scheme for audio/* is
   "sec", and time-based reference is the default fragment scheme for
   audio/*.

                          = "#" time-position-scheme "="
                            start-time "," end-time

   <TIME-POSITION-SCHEME> = "msec" / "sec" /
                            ext-time-position-scheme

   <START-TIME>           = time

   <END-TIME>             = time
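
   A server resolving such a reference against decoded audio might
   map the time interval onto sample indices.  The Python sketch below
   is a hypothetical illustration: it follows the grammar in requiring
   an explicit time position scheme, assumes whole-number times (this
   section does not define <TIME>), and takes the sample rate as a
   parameter that the PDI itself does not carry.

```python
from __future__ import annotations

import re

# Units per second for the two time position schemes defined above.
_UNITS_PER_SECOND = {"sec": 1, "msec": 1000}
_TIME_RE = re.compile(r"^#(msec|sec)=(\d+),(\d+)$", re.IGNORECASE)


def time_fragment_to_samples(fragment: str, sample_rate: int) -> range:
    """Map "#sec=a,b" or "#msec=a,b" to the half-open range of sample indices.

    Assumes whole-number times; the start sample is included and the
    sample at the end time is excluded.
    """
    m = _TIME_RE.match(fragment)
    if m is None:
        raise ValueError(f"not a time fragment: {fragment!r}")
    units = _UNITS_PER_SECOND[m.group(1).lower()]
    start, end = int(m.group(2)), int(m.group(3))
    if end < start:
        raise ValueError("end time precedes start time")
    # Integer arithmetic avoids floating-point rounding at boundaries.
    return range(start * sample_rate // units, end * sample_rate // units)
```

   At 8000 samples per second, "#sec=2,5" and "#msec=2000,5000" both
   select samples 16000 up to but excluding 40000, so the two schemes
   agree on the same interval.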