Network Working Group                                           E. Wilde
Internet-Draft                                               UC Berkeley
Updates: 2046 (if approved)                                    M. Duerst
Intended status: Standards Track                Aoyama Gakuin University
Expires: January 7, 2008                                    July 6, 2007


         URI Fragment Identifiers for the text/plain Media Type
                      draft-wilde-text-fragment-07

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 7, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   This memo defines URI fragment identifiers for text/plain MIME
   entities. These fragment identifiers make it possible to refer to
   parts of a text/plain MIME entity, either identified by character
   position or range, or by line position or range. Fragment
   identifiers may also contain hash information to make them more
   robust.

Table of Contents

   1.  Introduction
     1.1.  What is text/plain?
     1.2.  What is a URI Fragment Identifier?
     1.3.  Why text/plain Fragment Identifiers?
     1.4.  Incremental Deployment
     1.5.  Notation Used in this Memo
   2.  Fragment Identification Methods
     2.1.  Fragment Identification Principles
       2.1.1.  Positions and Ranges
       2.1.2.  Characters and Lines
     2.2.  Combining the Principles
       2.2.1.  Character Position
       2.2.2.  Character Range
       2.2.3.  Line Position
       2.2.4.  Line Range
     2.3.  Fragment Identifier Robustness
   3.  Fragment Identification Syntax
     3.1.  Hash Sums
   4.  Fragment Identifier Processing
     4.1.  Handling of Line Endings in text/plain MIME Entities
     4.2.  Handling of Position Values
     4.3.  Handling of Hash Sums
     4.4.  Syntax Errors in Fragment Identifiers
   5.  Examples
   6.  IANA Considerations
   7.  Security Considerations
   8.  Change Log
     8.1.  From -06 to -07 (addressing IETF Last Call Comments)
     8.2.  From -05 to -06
     8.3.  From -04 to -05
     8.4.  From -03 to -04
     8.5.  From -02 to -03
     8.6.  From -01 to -02
     8.7.  From -00 to -01
   9.  References
     9.1.  Normative References
     9.2.  Non-Normative References
   Appendix A.  Acknowledgements
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

   This memo updates the text/plain MIME type defined in RFC 2046 [1] by
   defining URI fragment identifiers for text/plain MIME entities. This
   makes it possible to refer to parts of a text/plain MIME entity.
   Such parts can be identified by either character position or range,
   or by line position or range. Hash information can be added to a
   fragment identifier to make it more robust, enabling applications to
   detect changes of the entity.

   This section gives an introduction to the general concepts of
   text/plain MIME entities and URI fragment identifiers, and discusses
   the need for fragment identifiers for text/plain and deployment
   issues. Section 2 discusses the principles and methods on which this
   memo is based. Section 3 defines the syntax, and Section 4 discusses
   processing of text/plain fragment identifiers. Section 5 shows some
   examples.

1.1.  What is text/plain?

   Internet Media Types (often referred to as "MIME types") as defined
   in RFC 2045 [2] and RFC 2046 [1] are used to identify different types
   and sub-types of media. RFC 2046 [1] and RFC 3676 [3] specify the
   text/plain media type, which is used for simple, unformatted text.
   Quoting from RFC 2046 [1]: "Plain text does not provide for or allow
   formatting commands, font attribute specifications, processing
   instructions, interpretation directives, or content markup. Plain
   text is seen simply as a linear sequence of characters, possibly
   interrupted by line breaks or page breaks."

   The text/plain media type does not restrict the character encoding;
   any character encoding may be used. In the absence of an explicit
   character encoding declaration, US-ASCII [10] is assumed as the
   default character encoding.
   This variability of the character encoding makes it impossible to
   count characters in a text/plain MIME entity without taking the
   character encoding into account, because there are many character
   encodings using more than one octet per character.

   The biggest advantage of text/plain MIME entities is their ease of
   use and their portability among different platforms. As long as they
   use popular character encodings (such as US-ASCII or UTF-8 [11]),
   they can be displayed and processed on virtually every computer
   system. The only remaining interoperability issue is the
   representation of line endings, which is discussed in Section 4.1.

1.2.  What is a URI Fragment Identifier?

   URIs are the identification mechanism for resources on the Web. The
   URI syntax specified in RFC 3986 [4] optionally includes a so-called
   "fragment identifier", separated by a number sign ('#'). The
   fragment identifier consists of additional reference information to
   be interpreted by the user agent after the retrieval action has been
   successfully completed. The semantics of a fragment identifier is a
   property of the data resulting from a retrieval action, regardless of
   the type of URI used in the reference. Therefore, the format and
   interpretation of fragment identifiers is dependent on the media type
   of the retrieval result.

   The most popular fragment identifier is defined for text/html
   (defined in RFC 2854 [12]), and makes it possible to refer to a
   specific element (identified by the value of a 'name' or 'id'
   attribute) of an HTML document. This makes it possible to reference
   a specific part of a Web page, rather than a Web page as a whole.

1.3.  Why text/plain Fragment Identifiers?

   Referring to specific parts of a resource can be very useful, because
   it enables users and applications to create more specific references.
   Users can create references to the part they really are interested in
   or want to talk about, rather than always pointing to a complete
   resource. Even though it is suggested that fragment identification
   methods are specified in a media type's MIME registration (see [13]),
   many media types do not have fragment identification methods
   associated with them.

   Fragment identifiers are only useful if supported by the client,
   because they are only interpreted by the client. Therefore, a new
   fragment identification method will require some time to be adopted
   by clients, and older clients will not support it. However, because
   the URI still works even if the fragment identifier is not supported
   (the resource is retrieved, but the fragment identifier is not
   interpreted), rapid adoption is not highly critical to ensure the
   success of a new fragment identification method.

   Fragment identifiers for text/plain as defined in this memo make it
   possible to refer to specific parts of a text/plain MIME entity,
   using concepts of positions and ranges, which may be applied to
   characters and lines. Thus, text/plain fragment identifiers enable
   users to exchange information more specifically, thereby reducing
   time and effort that is necessary to manually search for the relevant
   part of a text/plain MIME entity.

   The text/plain format does not support the embedding of links, so in
   most environments, text/plain resources can only serve as targets for
   links, and not as sources. However, when combining the text/plain
   fragment identifiers specified in this memo with out-of-line linking
   mechanisms such as XLink [14], it becomes possible to "bind" link
   resources to text/plain resources and thereby "embed" links into
   text/plain resources.
   Thus, the text/plain fragment identifiers specified in this memo
   open a path for text/plain files to become bidirectionally navigable
   resources in hypermedia systems such as the Web.

1.4.  Incremental Deployment

   As long as text/plain fragment identifiers are not supported
   universally, it is important to consider the implications of
   incremental deployment. Clients (for example, Web browsers) not
   supporting the text/plain fragment identifier described in this memo
   will work with URI references to text/plain MIME entities, but they
   will fail to locate the sub-resource identified by the fragment
   identifier. This is a reasonable fallback behavior, and in general
   users should take into account the possibility that a program
   interpreting a given URI will fail to interpret the fragment
   identifier part. Since fragment identifier evaluation is local to
   the client (and happens after retrieving the MIME entity), there is
   no reliable way for a server to determine whether a requesting client
   is using a URI containing a fragment identifier.

1.5.  Notation Used in this Memo

   The capitalized key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
   "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in RFC
   2119 [5].

2.  Fragment Identification Methods

   The identification of fragments of text/plain MIME entities can be
   based on different foundations. Since it is not possible to insert
   explicit, invisible identifiers into a text/plain MIME entity (as for
   example used in HTML documents, implemented through dedicated
   attributes), fragment identification has to rely on certain inherent
   properties of the MIME entity.
   This memo specifies fragment identification using four different
   methods, which are character positions and ranges, and line positions
   and ranges, augmented by a hash sum mechanism for improving the
   robustness of fragment identifiers.

   When interpreting character or line numbers, implementations MUST
   take the character encoding of the MIME entity into account, because
   character count and octet count may differ for the character encoding
   being used. For example, a MIME entity using UTF-16 encoding (as
   specified in RFC 2781 [15]) uses two octets per character in most
   cases, and sometimes four octets per character. It can also have a
   leading BOM (Byte-Order Mark), which does not count as a character
   and thus also affects the mapping from a simple octet count to a
   character count.

2.1.  Fragment Identification Principles

   Fragment identification can be done by combining two orthogonal
   principles, which are positions and ranges, and characters and lines.
   This section describes the principles themselves, while Section 2.2
   describes the combination of the principles.

2.1.1.  Positions and Ranges

   A position does not identify an actual fragment of the MIME entity,
   but a position inside the MIME entity, which can be regarded as a
   fragment of length zero. The use case for positions is to provide
   pointers for applications which may use them to implement
   functionalities such as "insert some text here", which needs a
   position rather than a fragment. Positions are counted from zero,
   position zero being before the first character or line of a
   text/plain MIME entity. Thus a text/plain MIME entity having one
   character has two positions, one before the first character (position
   0), and one after the first character (position 1).
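
   As a non-normative illustration of the position concept, the
   following Python sketch enumerates the positions of an already
   decoded string and uses one of them as an insertion point. The
   function names positions and insert_at are hypothetical and are not
   defined by this memo.

```python
def positions(text: str) -> list[int]:
    """Return every position of `text`: 0 (before the first character)
    through len(text) (after the last character)."""
    return list(range(len(text) + 1))


def insert_at(text: str, position: int, addition: str) -> str:
    """Insert `addition` at a position. A position is a fragment of
    length zero, so no existing character is replaced."""
    return text[:position] + addition + text[position:]


# An entity with one character has two positions, 0 and 1.
one_char_positions = positions("a")
```

   Under these assumptions, insert_at("ac", 1, "b") places the new text
   between the two existing characters.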

   Since positions are fragments of length zero, applications SHOULD use
   other methods than highlighting to indicate positions, the most
   obvious way being the positioning of a cursor (if the application
   supports the concept of a cursor).

   Ranges, on the other hand, identify fragments of a MIME entity that
   have a length that may be greater than zero. As a general principle,
   ranges specify both a lower and an upper bound. The start or the end
   of a range specification may be omitted, defaulting to the first or
   last position of the MIME entity, respectively. The end of a range
   must have a value greater than or equal to the start. A range with
   identical start and end is legal, and identifies a range of length
   zero, which is equivalent to a position.

   Applications that support a concept such as highlighting SHOULD use
   such a concept to indicate fragments of lengths greater than zero to
   the user.

   For positions and ranges it is implicitly assumed that if a number is
   greater than the actual number of elements in the MIME entity, then
   it is referring to the last element of the MIME entity (see Section 4
   for details).

2.1.2.  Characters and Lines

   The concept of positions and ranges can be applied to characters or
   lines. In both cases, positions indicate points between these
   entities, while ranges identify zero or more of these entities by
   indicating positions.

   Character positions are numbered starting with zero (ignoring initial
   BOM marks or similar concepts that are not part of the actual textual
   content of a text/plain MIME entity), and counting each character
   separately, with the exception of line endings, which are always
   counted as one character (see Section 4.1 for details).
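
   The counting rule above can be sketched in Python as follows. This
   is a minimal, non-normative sketch: it assumes the entity has already
   been decoded into a string, and the set of line-ending conventions it
   recognizes (CR+LF, CR+NEL, CR, LF, NEL) is an assumption of the
   sketch. The names normalize and char_length are hypothetical.

```python
import re

# Line-ending conventions recognized by this sketch (an assumption).
# Two-character sequences come first so they are matched as one ending.
_LINE_ENDING = re.compile("\r\n|\r\x85|\r|\n|\x85")


def normalize(text: str) -> str:
    """Drop a leading BOM and replace every line ending by a single LF,
    so that each line ending counts as exactly one character."""
    if text.startswith("\ufeff"):
        text = text[1:]
    return _LINE_ENDING.sub("\n", text)


def char_length(text: str) -> int:
    """Number of characters, counted after normalization."""
    return len(normalize(text))
```

   With these definitions, a character position n refers to index n of
   the normalized string rather than to an octet offset in the encoded
   entity.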

   Line positions are numbered starting with zero (with line position
   zero always being identical with character position zero), with
   Section 4.1 describing how line endings are identified. Fragments
   identified by lines include the line endings, so applications
   identifying line-based fragments MUST include the line endings in the
   fragment identification they are using (e.g., the highlighted
   selection). If a MIME entity does not contain any line endings, then
   it consists of a single (the first) line.

2.2.  Combining the Principles

   In the following sections, the principles described in the preceding
   section (positions/ranges and characters/lines) are combined,
   resulting in four use cases. The schemes mentioned below refer to
   the fragment identifier syntax, described in detail in Section 3.

2.2.1.  Character Position

   To identify a character position (i.e., a fragment of length zero
   between two characters), the 'char' scheme followed by a single
   number is used. This method identifies a position between two
   characters (or before the first or after the last character), rather
   than identifying a fragment consisting of a number of characters.
   Character position counting starts with 0, so the character position
   before the first character of a text/plain MIME entity has the
   character position 0, and a MIME entity containing n distinct
   characters has n+1 distinct character positions, the last one having
   the character position n.

2.2.2.  Character Range

   To identify a fragment of one or more characters (a character range),
   the 'char' scheme followed by a range specification is used. A
   character range is a consecutive region of the MIME entity that
   extends from the starting character position of the range to the
   ending character position of the range.

2.2.3.  Line Position

   To identify a line position (i.e., a fragment of length zero between
   two lines), the 'line' scheme followed by a single number is used.
   This method identifies a position between two lines (or before the
   first or after the last line), rather than identifying a fragment
   consisting of a number of lines. Line position counting starts with
   0, so the line position before the first line of a text/plain MIME
   entity has the line position 0, and a MIME entity containing n
   distinct lines has n+1 distinct line positions, the last one having
   the line position n.

2.2.4.  Line Range

   To identify a fragment of one or more lines (a line range), the
   'line' scheme followed by a range specification is used. A line
   range is a consecutive region of the MIME entity that extends from
   the starting line position of the range to the ending line position
   of the range.

2.3.  Fragment Identifier Robustness

   It is easily possible that a modification of the referenced resource
   will break a fragment identifier. If applications want to create
   more robust fragment identifiers, they may do so by adding hash sums
   to fragment identifiers. These hash sums are used to detect changes
   in the resource. Applications can then warn users about the
   possibility that a fragment identifier might have been broken by a
   modification of the resource.

   Since fragment identifiers are interpreted by clients, hash sums are
   defined on MIME entities rather than on the resource itself, and as
   such are specific to a certain representation of the resource, in
   case of text/plain resources the character encoding of the MIME
   entity.
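
   The dependence of a hash sum on the character encoding can be seen in
   a short, non-normative Python sketch. The helper md5_hex and the
   sample text are hypothetical; hashlib is the Python standard library
   module providing MD5.

```python
import hashlib


def md5_hex(text: str, charset: str) -> str:
    """MD5 fingerprint, in hexadecimal, of `text` as encoded in
    `charset`. MD5 operates on octets, so the encoding matters."""
    return hashlib.md5(text.encode(charset)).hexdigest()


sample = "Gr\u00fc\u00dfe"  # contains two non-ASCII characters

# Same characters, two representations, two different fingerprints.
utf8_sum = md5_hex(sample, "utf-8")
latin1_sum = md5_hex(sample, "iso-8859-1")
sums_differ = utf8_sum != latin1_sum
```

   A fingerprint computed over one representation therefore cannot be
   compared directly with an entity retrieved in another encoding.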

   Hash sums may specify the character encoding that has been used when
   creating the hash sums, and if such a specification is present,
   clients MUST check whether the character encoding specified for the
   hash sum and the character encoding of the retrieved MIME entity are
   equal, and clients MUST NOT check the hash sum if these values
   differ. However, clients MAY choose to transcode the retrieved MIME
   entity in the case of differing character encodings, and after doing
   so, check the hash sum. Please note that this method is inherently
   unreliable, because certain characters or character sequences may
   have been lost or normalized due to restrictions in one of the
   character encodings used.

3.  Fragment Identification Syntax

   The syntax for the text/plain fragment identifiers is
   straightforward. The syntax defines three schemes, 'char', 'line',
   and hash (which can either be 'length' or 'md5'). The 'char' and
   'line' schemes can be used in two different variants, either the
   position variant (with a single number), or the range variant (with
   two comma-separated numbers). The hash scheme can either use the
   'length' or the 'md5' scheme to specify a hash value. 'length' in
   this case serves as a very weak but easy to calculate hash function.

   The following syntax definition uses ABNF as defined in RFC 4234 [6],
   including the rules DIGIT and HEXDIG. The mime-charset rule is
   defined in RFC 2978 [7].

   NOTE: In the descriptions that follow, specified text values MUST be
   used exactly as given, using exactly the indicated lower-case
   letters. In this respect, the ABNF usage differs from [6].

   text-fragment = text-scheme 0*( ";" hash-scheme)
   text-scheme   = ( char-scheme / line-scheme )
   char-scheme   = "char=" ( position / range )
   line-scheme   = "line=" ( position / range )
   hash-scheme   = ( length-scheme / md5-scheme ) [ "," mime-charset ]
   position      = number
   range         = (position "," [ position ]) / ("," position )
   number        = 0*( DIGIT )
   length-scheme = "length=" number
   md5-scheme    = "md5=" md5-value
   md5-value     = 32HEXDIG

3.1.  Hash Sums

   A hash sum can either specify a MIME entity's length, or its MD5
   fingerprint. In both cases, it can optionally specify the character
   encoding which had been used when calculating the hash sum, so that
   clients interpreting the fragment identifier may check whether they
   are using the same character encoding for their calculations. For
   lengths, the character encoding can be necessary because it can
   influence the character count. As an example, Unicode includes
   precomposed characters for writing Vietnamese, but in the
   windows-1258 encoding, also used for writing Vietnamese, some
   characters have to be encoded with separate diacritics, which means
   that two characters will be counted. Applying Unicode terminology,
   this means that the length of a text/plain MIME entity is computed
   based on its "code points". For MD5 fingerprints, the character
   encoding is necessary because the MD5 algorithm works on the binary
   representation of the text/plain resource.

   The length of a text/plain MIME entity is calculated by using the
   principles defined in Section 2.1.2. The MD5 fingerprint of a
   text/plain MIME entity is calculated by using the algorithm presented
   in [8], encoding the result in 32 hexadecimal digits (using uppercase
   or lowercase letters) as a representation of the 128 bits which are
   the result of the MD5 algorithm.
   Calculation of hash sums is done after stripping any potential
   content-encodings or content-transfer-encodings of the transport
   mechanism.

4.  Fragment Identifier Processing

   Applications implementing support for the mechanism described in this
   memo MUST behave as described in the following sections.

4.1.  Handling of Line Endings in text/plain MIME Entities

   In Internet messages, line endings in text/plain MIME entities are
   represented by CR+LF character sequences (see RFC 2046 [1] and RFC
   3676 [3]). However, some protocols (such as HTTP) in addition allow
   other conventions for line endings. Also, some operating systems
   store text/plain entities locally with different line endings (in
   most cases, Unix uses LF, MacOS traditionally used CR, and Windows
   uses CR+LF).

   Independent of the number of bytes or characters used to represent a
   line ending, each line ending MUST be counted as one single
   character. Implementations interpreting text/plain fragment
   identifiers MUST take into account the line ending conventions of the
   protocols and other contexts that they work in.

   As an example, an implementation working in the context of a Web
   browser supporting http: URIs has to support the various line ending
   conventions permitted by HTTP. As another example, an implementation
   used on local files (e.g. with the file: URI scheme) has to support
   the conventions used for local storage. All implementations SHOULD
   support the Internet-wide CR+LF line ending convention, and MAY
   support additional conventions not related to the protocols or
   systems they work with.

   Implementers should be aware of the fact that line endings in plain
   text entities can be represented by other characters or character
   sequences than CR+LF. Besides the abovementioned CR and LF, there
   are also NEL and CR+NEL.
   In general, the encoding of line endings can also depend on the
   character encoding of the MIME entity, and implementations have to
   take this into account where necessary.

4.2.  Handling of Position Values

   If any position value (as a position or as part of a range) is
   greater than the length of the actual MIME entity, then it identifies
   the last character position or line position of the MIME entity. If
   the first position value in a range is not present, then the range
   extends from the start of the MIME entity. If the second position
   value in a range is not present, then the range extends to the end of
   the MIME entity. If a range scheme's positions are not properly
   ordered (i.e., the first number is greater than the second), then the
   fragment identifier MUST be ignored.

4.3.  Handling of Hash Sums

   Clients are not required to implement the handling of hash sums, so
   they MAY choose to ignore hash sum information altogether. However,
   if they do implement hash sum handling, the following applies:

   If a fragment identifier contains a hash sum, and a client retrieves
   a MIME entity and detects that the hash sum has changed (observing
   the character encoding specification as described in Section 3.1, if
   present), then the client SHOULD NOT interpret the text/plain
   fragment identifier. A client MAY signal this situation to the user.

4.4.  Syntax Errors in Fragment Identifiers

   If a fragment identifier contains a syntax error (i.e., does not
   conform to the syntax specified in Section 3), then it MUST be
   ignored by clients. Clients MUST NOT make any attempt to correct or
   guess fragment identifiers. Syntax errors MAY be reported by
   clients.

5.  Examples

   The following examples show some usages for the fragment identifiers
   defined in this memo.
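
   As a non-normative aid for reading the examples, the position
   handling of Section 4.2 and the error handling of Section 4.4 can be
   sketched in Python. This is a sketch under stated assumptions, not a
   reference implementation: the name resolve and the parameter
   unit_count (the number of characters or lines in the entity, computed
   by the caller) are hypothetical; the charset after a hash sum is only
   checked loosely; hash sums themselves are not verified; and a lone
   position with no digits is treated as an error, which is stricter
   than the ABNF in Section 3.

```python
import re

# Follows the ABNF of Section 3. The mime-charset part is matched
# loosely (any run of characters other than ";" and ","), which is a
# simplification made by this sketch.
_TEXT_FRAGMENT = re.compile(
    r"(?P<scheme>char|line)="
    r"(?P<first>[0-9]*)(?P<comma>,(?P<second>[0-9]*))?"
    r"(?:;(?:length=[0-9]*|md5=[0-9A-Fa-f]{32})(?:,[^;,]+)?)*"
)


def resolve(fragment: str, unit_count: int):
    """Return (scheme, start, end) as positions clamped to the entity,
    or None if the fragment identifier is to be ignored."""
    match = _TEXT_FRAGMENT.fullmatch(fragment)
    if match is None:
        return None  # syntax error: ignore, never guess
    scheme = match.group("scheme")
    first = match.group("first")
    second = match.group("second")
    if match.group("comma") is None:
        if first == "":
            return None  # assumption of this sketch, see lead-in
        start = end = min(int(first), unit_count)
        return scheme, start, end
    if first and second and int(first) > int(second):
        return None  # positions not properly ordered
    start = min(int(first), unit_count) if first else 0
    end = min(int(second), unit_count) if second else unit_count
    return scheme, start, end
```

   Under these assumptions, resolve("line=10,20", 15) yields the range
   from line position 10 to line position 15, because the upper bound is
   clamped to the last line position of a 15-line entity.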

   http://example.com/text.txt#char=100

   This URI identifies the position after the 100th character of the
   text.txt MIME entity. It should be noted that it is not clear which
   octet(s) of the MIME entity this will be without retrieving the MIME
   entity and thus knowing which character encoding it is using (in case
   of HTTP, this information will be given in the Content-Type header of
   the response). If the MIME entity has fewer than 100 characters, the
   URI identifies the position after the MIME entity's last character.

   ftp://example.com/text.txt#line=10,20

   This URI identifies lines 11 to 20 of the text.txt MIME entity. If
   the MIME entity has fewer than 11 lines, it identifies the position
   after the last line. If the MIME entity has fewer than 20 but at
   least 11 lines, it identifies the range from line 11 to the last line
   of the MIME entity.

   ftp://example.com/text.txt#line=,1

   This URI identifies the first line.

   ftp://example.com/text.txt#line=10,20;length=9876,UTF-8

   As in the second example, this URI identifies lines 11 to 20 of the
   text.txt MIME entity. The additional length hash sum specifies that
   the MIME entity has a length of 9876 characters when encoded in
   UTF-8. If the client supports the length hash sum scheme, it may
   test the retrieved MIME entity for its length, but only if the
   retrieved MIME entity uses the UTF-8 encoding or has been locally
   transcoded into this encoding.

6.  IANA Considerations

   Note to RFC Editor: Please change this section to read as follows
   after the IANA action has been completed: "IANA has added a reference
   to this specification in the Text/Plain Media Type registration."

   IANA is requested to update the registration of the MIME Media type
   text/plain at http://www.iana.org/assignments/media-types/text/ with
   the fragment identifier defined in this memo by adding a reference to
   this memo (with the appropriate RFC number once it is known).

7.  Security Considerations

   The fact that software implementing fragment identifiers for plain
   text and software not implementing them differs in behavior, and the
   fact that different software may show fragments to users in different
   ways, can lead to misunderstandings on the part of users. Such
   misunderstandings might be exploited in a way similar to spoofing or
   phishing, although concrete examples of how this might be done are
   not currently known.

   Implementers and users of fragment identifiers for plain text should
   also be aware of the security considerations in RFC 3986 [4] and RFC
   3987 [9].

8.  Change Log

   Note to RFC Editor: Please remove this section before publication.

8.1.  From -06 to -07 (addressing IETF Last Call Comments)

   o  Completely removed regular expressions to simplify
      implementations.

   o  Removed the possibility to combine multiple schemes. As a result,
      fragments will always consist of consecutive characters.

   o  Changed "MacOS uses CR" to "MacOS traditionally used CR".

   o  Changed 'number' syntax rule from "number = 1*( DIGIT )" to
      "number = 0*( DIGIT )" to take into account examples such as
      "#line=,1".

   o  Added a sentence explaining that lengths are a weak but cheaply
      calculated hash function.

   o  Moved UTF-8 reference to non-normative.

   o  Moved ABNF from %xdd.dd... back to direct literals, stating that
      they are case-sensitive (see RFC 3862 for an example of this).

   o  Changed StringWithEscapedSemicolon to mime-charset, and said that
      it must not be quoted.

601    o  In "Clients SHOULD NOT make any attempt to correct or guess
602       fragment identifiers.", changed "SHOULD NOT" to "MUST NOT".

604    o  Removed some redundant normative text in Examples section.

606    o  Added "Calculation of hash sums is done after stripping any
607       potential content-encodings or content-transfer-encodings." to
608       section on hash sums.

610    o  Wording improvements and updates to Acknowledgements.

612    o  Changed abstract for more clarity.

614 8.2. From -05 to -06

616    o  Clarified that this is intended as an update of the text/plain
617       MIME type registration, in newly added IANA consideration section
618       and elsewhere.

620    o  Added normative reference to UTF-8 (STD63/RFC3629).

622    o  Fixed section about non-ASCII characters in regular expressions to
623       be more accurate re. IRIs.

625    o  Fixed some text about decomposition and Unicode.

627    o  Clarified that UTF-16 can also use 4 octets per character.

629    o  Changed ABNF to make sure schemes are case-sensitive (string
630       literals in ABNF are case-insensitive).

632    o  Used HEXDIG from RFC 4234, made clear DIGIT and HEXDIG are from
633       that spec.

635    o  Specified order of decoding the various escapings.

637    o  Moved section on line endings to the back, and changed
638       requirements to be more in line with practice.

640    o  Added IANA Consideration section.

642    o  Expanded Security Consideration section.

644    o  Removed quote from RFC 3986, because the quoted text does not
645       actually exist there anymore; changed text appropriately.

647    o  Reorganized section two to get rid of one section level.

649    o  Added overview in introduction, and some glue text here and there.

651    o  Changed to more IETF-like wording in some instances (e.g. intro to
652       this section; removing "Compliant software MUST follow this
653       specification." at the start of the Introduction,...).

655    o  Removed 'where to send comments' section.

657    o  Fixed wording in some cases, tried to make shorter sentences and
658       eliminate parenthesized expressions.

660    o  Removed acknowledgement for xml2rfc; we are nevertheless very
661       grateful for this work!

663 8.3. From -04 to -05

665    o  Added some explanatory text to the last paragraph of Section 2.3.

667    o  Added a paragraph about the importance of having fragment
668       identification capabilities for out-of-line linking methods such
669       as XLink to Section 1.3.

671    o  Added explanation of why the charset is important for length hash
672       sums to Section 3.1.

674    o  Added text that makes hash sum handling optional and allows
675       clients to interpret fragment identifiers even if the hash sum did
676       not match (changed MUST NOT to SHOULD NOT) to Section 4.3.

678    o  Added example using a length hash sum in Section 5.

680    o  RFC 2234 (ABNF) has been obsoleted by [6].

682    o  Removed the "Open Issues" section for preparation of final draft
683       before submission as RFC.

685 8.4. From -03 to -04

687    o  URIs are now defined by RFC 3986 [4], so the text and the
688       references have been updated. In particular, RFC 3986 defines a
689       fragment identifier to be part of the URI, whereas in the
690       obsoleted RFC 2396 URI specification, it was not part of a URI as
691       such, but of a "URI reference".

693    o  IRIs are now defined by RFC 3987 [9], so the text and the
694       references have been updated.

696    o  Changed IPR clause from RFC 3667 to RFC 3978 (updated version of
697       RFC 3667).

699 8.5. From -02 to -03

701    o  Replaced most occurrences of 'resource' with 'MIME entity',
702       because the result of dereferencing a URI is not the resource
703       itself, but some MIME entity (in our case of type text/plain)
704       representing it. Thanks to Sandro Hawke for pointing this out.

706    o  Moved "Open Issues" to the very back of the document.

708    o  Added Section 4 to define the processing model for fragment
709       identifiers (moved Section 4.2 from Section 3 to Section 4).

711    o  Added hash scheme to make fragment identifiers more robust
712       (Section 2.3).

714    o  Changed IPR clause from RFC 2026 to RFC 3667 (updated version of
715       RFC 2026).

717 8.6. From -01 to -02

719    o  Fundamental change in semantics: counts turn into positions
720       (between characters or lines), so in order to identify a character
721       or line, ranges must be used (which now use positions to specify
722       the upper and lower bounds of the range).

724    o  Made the first value of a range optional as well, so that line=,5
725       is also legal, identifying everything from the start of the MIME
726       entity to the 5th line.

728    o  Changed the syntax from parenthesis-style to a more traditional
729       style using equals-signs.

731 8.7. From -00 to -01

733    o  Made the second count value of ranges optional, so that something
734       like line(10,) is legal and properly defined.

736    o  Added non-normative reference to Internet draft about non-ASCII
737       characters in search strings.

739    o  Added Section 1.4 about incremental deployment.

741    o  Added more elaborate examples.

743    o  Added text about regex buffer overflow problems in Section 7.

745    o  Added Section 4.1 about line endings in text/plain resources.

747    o  Added "Open Issues" to collect open issues regarding this memo
748       (will be deleted in final RFC text).

750 9. References

752 9.1. Normative References

754    [1]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
755         Extensions (MIME) Part Two: Media Types", RFC 2046,
756         November 1996.

758    [2]  Freed, N. and N. Borenstein, "Multipurpose Internet Mail
759         Extensions (MIME) Part One: Format of Internet Message Bodies",
760         RFC 2045, November 1996.

762    [3]  Gellens, R., "The Text/Plain Format and DelSp Parameters",
763         RFC 3676, February 2004.

765    [4]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
766         Resource Identifier (URI): Generic Syntax", RFC 3986,
767         January 2005.

769    [5]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
770         Levels", RFC 2119, March 1997.

772    [6]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
773         Specifications: ABNF", RFC 4234, October 2005.

775    [7]  Freed, N. and J. Postel, "IANA Charset Registration
776         Procedures", BCP 19, RFC 2978, October 2000.

778    [8]  Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321,
779         April 1992.

781    [9]  Duerst, M. and M. Suignard, "Internationalized Resource
782         Identifiers (IRI)", RFC 3987, January 2005.

784 9.2. Non-Normative References

786    [10]  ANSI X3.4-1986, "Coded Character Set - 7-Bit American National
787          Standard Code for Information Interchange", 1986.

790    [11]  Yergeau, F., "UTF-8, a transformation format of ISO 10646",
791          STD 63, RFC 3629, November 2003.

793    [12]  Connolly, D. and L. Masinter, "The 'text/html' Media Type",
794          RFC 2854, June 2000.

796    [13]  Freed, N. and J. Klensin, "Media Type Specifications and
797          Registration Procedures", RFC 4288, December 2005.

799    [14]  DeRose, S., Maler, E., and D. Orchard, "XML Linking Language
800          (XLink) Version 1.0", W3C Recommendation REC-xlink-20010627,
801          June 2001.

803    [15]  Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO 10646",
804          RFC 2781, February 2000.

806 Appendix A. Acknowledgements

808    Thanks for comments and suggestions provided by Marcel Baschnagel,
809    Stephane Bortzmeyer, Tim Bray, John Cowan, Spencer Dawkins, Benja
810    Fallenstein, Ted Hardie, Sandro Hawke, Jeffrey Hutzelman, Graham
811    Klyne, Dan Kohn, Henrik Levkowetz, Mark Nottingham, and Conrad
812    Parker.

814 Authors' Addresses

816    Erik Wilde
817    UC Berkeley
818    School of Information, 311 South Hall
819    Berkeley, CA 94720-4600
820    U.S.A.

822    Phone: +1-510-6432253
823    Email: dret@berkeley.edu
824    URI:   http://dret.net/netdret/

826    Martin Duerst (Note: Please write "Duerst" with u-umlaut wherever
827    possible, for example as "Dürst" in XML and HTML.)
828    Aoyama Gakuin University
829    5-10-1 Fuchinobe
830    Sagamihara, Kanagawa 229-8558
831    Japan

833    Phone: +81 42 759 6329
834    Fax:   +81 42 759 6495
835    Email: mailto:duerst@it.aoyama.ac.jp
836    URI:   http://www.sw.it.aoyama.ac.jp/D%C3%BCrst/

838 Full Copyright Statement

840    Copyright (C) The IETF Trust (2007).

842    This document is subject to the rights, licenses and restrictions
843    contained in BCP 78, and except as set forth therein, the authors
844    retain all their rights.

846    This document and the information contained herein are provided on an
847    "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
848    OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
849    THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
850    OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
851    THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
852    WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

854 Intellectual Property

856    The IETF takes no position regarding the validity or scope of any
857    Intellectual Property Rights or other rights that might be claimed to
858    pertain to the implementation or use of the technology described in
859    this document or the extent to which any license under such rights
860    might or might not be available; nor does it represent that it has
861    made any independent effort to identify any such rights. Information
862    on the procedures with respect to rights in RFC documents can be
863    found in BCP 78 and BCP 79.

865    Copies of IPR disclosures made to the IETF Secretariat and any
866    assurances of licenses to be made available, or the result of an
867    attempt made to obtain a general license or permission for the use of
868    such proprietary rights by implementers or users of this
869    specification can be obtained from the IETF on-line IPR repository at
870    http://www.ietf.org/ipr.

872    The IETF invites any interested party to bring to its attention any
873    copyrights, patents or patent applications, or other proprietary
874    rights that may cover technology that may be required to implement
875    this standard. Please address the information to the IETF at
876    ietf-ipr@ietf.org.

878 Acknowledgment

880    Funding for the RFC Editor function is provided by the IETF
881    Administrative Support Activity (IASA).