idnits 2.17.1 

draft-wilde-text-fragment-07.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 16.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 852.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 863.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 870.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 876.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  -- The draft header indicates that this document updates RFC2046, but the
     abstract doesn't seem to mention this, which it should.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords. 

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).
     (Using the creation date from RFC2046, updated by this document, for
     RFC5378 checks: 1995-04-14)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (July 6, 2007) is 6139 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 4234 (ref. '6') (Obsoleted by RFC 5234)

  -- Possible downref: Non-RFC (?) normative reference: ref. '7'

  ** Downref: Normative reference to an Informational RFC: RFC 1321 (ref. '8')

  -- Duplicate reference: RFC3629, mentioned in '11', was also mentioned in
     '10'.

  -- Obsolete informational reference (is this intentional?): RFC 4288 (ref.
     '13') (Obsoleted by RFC 6838)


     Summary: 3 errors (**), 0 flaws (~~), 2 warnings (==), 11 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                           E. Wilde
3	Internet-Draft                                               UC Berkeley
4	Updates: 2046 (if approved)                                    M. Duerst
5	Intended status: Standards Track                Aoyama Gakuin University
6	Expires: January 7, 2008                                    July 6, 2007

8	         URI Fragment Identifiers for the text/plain Media Type
9	                      draft-wilde-text-fragment-07

11	Status of this Memo

13	   By submitting this Internet-Draft, each author represents that any
14	   applicable patent or other IPR claims of which he or she is aware
15	   have been or will be disclosed, and any of which he or she becomes
16	   aware will be disclosed, in accordance with Section 6 of BCP 79.

18	   Internet-Drafts are working documents of the Internet Engineering
19	   Task Force (IETF), its areas, and its working groups.  Note that
20	   other groups may also distribute working documents as Internet-
21	   Drafts.

23	   Internet-Drafts are draft documents valid for a maximum of six months
24	   and may be updated, replaced, or obsoleted by other documents at any
25	   time.  It is inappropriate to use Internet-Drafts as reference
26	   material or to cite them other than as "work in progress."

28	   The list of current Internet-Drafts can be accessed at
29	   http://www.ietf.org/ietf/1id-abstracts.txt.

31	   The list of Internet-Draft Shadow Directories can be accessed at
32	   http://www.ietf.org/shadow.html.

34	   This Internet-Draft will expire on January 7, 2008.

36	Copyright Notice

38	   Copyright (C) The IETF Trust (2007).

40	Abstract

42	   This memo defines URI fragment identifiers for text/plain MIME
43	   entities.  These fragment identifiers make it possible to refer to
44	   parts of a text/plain MIME entity, either identified by character
45	   position or range, or by line position or range.  Fragment
46	   identifiers may also contain hash information to make them more
47	   robust.

49	Table of Contents

51	   1.  Introduction
52	     1.1.  What is text/plain?
53	     1.2.  What is a URI Fragment Identifier?
54	     1.3.  Why text/plain Fragment Identifiers?
55	     1.4.  Incremental Deployment
56	     1.5.  Notation Used in this Memo
57	   2.  Fragment Identification Methods
58	     2.1.  Fragment Identification Principles
59	       2.1.1.  Positions and Ranges
60	       2.1.2.  Characters and Lines
61	     2.2.  Combining the Principles
62	       2.2.1.  Character Position
63	       2.2.2.  Character Range
64	       2.2.3.  Line Position
65	       2.2.4.  Line Range
66	     2.3.  Fragment Identifier Robustness
67	   3.  Fragment Identification Syntax
68	     3.1.  Hash Sums
69	   4.  Fragment Identifier Processing
70	     4.1.  Handling of Line Endings in text/plain MIME Entities
71	     4.2.  Handling of Position Values
72	     4.3.  Handling of Hash Sums
73	     4.4.  Syntax Errors in Fragment Identifiers
74	   5.  Examples
75	   6.  IANA Considerations
76	   7.  Security Considerations
77	   8.  Change Log
78	     8.1.  From -06 to -07 (addressing IETF Last Call Comments)
79	     8.2.  From -05 to -06
80	     8.3.  From -04 to -05
81	     8.4.  From -03 to -04
82	     8.5.  From -02 to -03
83	     8.6.  From -01 to -02
84	     8.7.  From -00 to -01
85	   9.  References
86	     9.1.  Normative References
87	     9.2.  Non-Normative References
88	   Appendix A.  Acknowledgements
89	   Authors' Addresses
90	   Intellectual Property and Copyright Statements

92	1.  Introduction

94	   This memo updates the text/plain MIME type defined in RFC 2046 [1] by
95	   defining URI fragment identifiers for text/plain MIME entities.  This
96	   makes it possible to refer to parts of a text/plain MIME entity.
97	   Such parts can be identified by either character position or range,
98	   or by line position or range.  Hash information can be added to a
99	   fragment identifier to make it more robust, enabling applications to
100	   detect changes of the entity.

102	   This section gives an introduction to the general concepts of text/
103	   plain MIME entities and URI fragment identifiers, and discusses the
104	   need for fragment identifiers for text/plain and deployment issues.
105	   Section 2 discusses the principles and methods on which this memo is
106	   based.  Section 3 defines the syntax, and Section 4 discusses
107	   processing of text/plain fragment identifiers.  Section 5 shows some
108	   examples.

110	1.1.  What is text/plain?

112	   Internet Media Types (often referred to as "MIME types") as defined
113	   in RFC 2045 [2] and RFC 2046 [1] are used to identify different types
114	   and sub-types of media.  RFC 2046 [1] and RFC 3676 [3] specify the
115	   text/plain media type, which is used for simple, unformatted text.
116	   Quoting from RFC 2046 [1]: "Plain text does not provide for or allow
117	   formatting commands, font attribute specifications, processing
118	   instructions, interpretation directives, or content markup.  Plain
119	   text is seen simply as a linear sequence of characters, possibly
120	   interrupted by line breaks or page breaks."

122	   The text/plain media type does not restrict the character encoding;
123	   any character encoding may be used.  In the absence of an explicit
124	   character encoding declaration, US-ASCII [10] is assumed as the
125	   default character encoding.  This variability of the character
126	   encoding makes it impossible to count characters in a text/plain MIME
127	   entity without taking the character encoding into account, because
128	   there are many character encodings using more than one octet per
129	   character.

131	   The biggest advantage of text/plain MIME entities is their ease of
132	   use and their portability among different platforms.  As long as they
133	   use popular character encodings (such as US-ASCII or UTF-8 [11]),
134	   they can be displayed and processed on virtually every computer
135	   system.  The only remaining interoperability issue is the
136	   representation of line endings, which is discussed in Section 4.1.

138	1.2.  What is a URI Fragment Identifier?

140	   URIs are the identification mechanism for resources on the Web. The
141	   URI syntax specified in RFC 3986 [4] optionally includes a so-called
142	   "fragment identifier", separated by a number sign ('#').  The
143	   fragment identifier consists of additional reference information to
144	   be interpreted by the user agent after the retrieval action has been
145	   successfully completed.  The semantics of a fragment identifier is a
146	   property of the data resulting from a retrieval action, regardless of
147	   the type of URI used in the reference.  Therefore, the format and
148	   interpretation of fragment identifiers depend on the media type
149	   of the retrieval result.

151	   The most popular fragment identifier is defined for text/html
152	   (defined in RFC 2854 [12]), and makes it possible to refer to a
153	   specific element (identified by the value of a 'name' or 'id'
154	   attribute) of an HTML document.  This makes it possible to reference
155	   a specific part of a Web page, rather than a Web page as a whole.

157	1.3.  Why text/plain Fragment Identifiers?

159	   Referring to specific parts of a resource can be very useful, because
160	   it enables users and applications to create more specific references.
161	   Users can create references to the part they really are interested in
162	   or want to talk about, rather than always pointing to a complete
163	   resource.  Even though it is suggested that fragment identification
164	   methods are specified in a media type's MIME registration (see [13]),
165	   many media types do not have fragment identification methods
166	   associated with them.

168	   Fragment identifiers are only useful if supported by the client,
169	   because they are only interpreted by the client.  Therefore, a new
170	   fragment identification method will require some time to be adopted
171	   by clients, and older clients will not support it.  However, because
172	   the URI still works even if the fragment identifier is not supported
173	   (the resource is retrieved, but the fragment identifier is not
174	   interpreted), rapid adoption is not highly critical to ensure the
175	   success of a new fragment identification method.

177	   Fragment identifiers for text/plain as defined in this memo make it
178	   possible to refer to specific parts of a text/plain MIME entity,
179	   using concepts of positions and ranges, which may be applied to
180	   characters and lines.  Thus, text/plain fragment identifiers enable
181	   users to exchange information more specifically, thereby reducing
182	   time and effort that is necessary to manually search for the relevant
183	   part of a text/plain MIME entity.

185	   The text/plain format does not support the embedding of links, so in
186	   most environments, text/plain resources can only serve as targets for
187	   links, and not as sources.  However, when combining the text/plain
188	   fragment identifiers specified in this memo with out-of-line linking
189	   mechanisms such as XLink [14], it becomes possible to "bind" link
190	   resources to text/plain resources and thereby "embed" links into
191	   text/plain resources.  Thus, the text/plain fragment identifiers
192	   specified in this memo open a path for text/plain files to become
193	   bidirectionally navigable resources in hypermedia systems such as the
194	   Web.

196	1.4.  Incremental Deployment

198	   As long as text/plain fragment identifiers are not supported
199	   universally, it is important to consider the implications of
200	   incremental deployment.  Clients (for example, Web browsers) not
201	   supporting the text/plain fragment identifier described in this memo
202	   will work with URI references to text/plain MIME entities, but they
203	   will fail to locate the sub-resource identified by the fragment
204	   identifier.  This is a reasonable fallback behavior, and in general
205	   users should take into account the possibility that a program
206	   interpreting a given URI will fail to interpret the fragment
207	   identifier part.  Since fragment identifier evaluation is local to
208	   the client (and happens after retrieving the MIME entity), there is
209	   no reliable way for a server to determine whether a requesting client
210	   is using a URI containing a fragment identifier.

212	1.5.  Notation Used in this Memo

214	   The capitalized key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
215	   "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
216	   "OPTIONAL" in this document are to be interpreted as described in RFC
217	   2119 [5].

219	2.  Fragment Identification Methods

221	   The identification of fragments of text/plain MIME entities can be
222	   based on different foundations.  Since it is not possible to insert
223	   explicit, invisible identifiers into a text/plain MIME entity (as for
224	   example used in HTML documents, implemented through dedicated
225	   attributes), fragment identification has to rely on certain inherent
226	   properties of the MIME entity.  This memo specifies fragment
227	   identification using four different methods, which are character
228	   positions and ranges, and line positions and ranges, augmented by a
229	   hash sum mechanism for improving the robustness of fragment
230	   identifiers.

232	   When interpreting character or line numbers, implementations MUST
233	   take the character encoding of the MIME entity into account, because
234	   character count and octet count may differ for the character encoding
235	   being used.  For example, a MIME entity using UTF-16 encoding (as
236	   specified in RFC 2781 [15]) uses two octets per character in most
237	   cases, and sometimes four octets per character.  It can also have a
238	   leading BOM (Byte-Order Mark), which does not count as a character
239	   and thus also affects the mapping from a simple octet count to a
240	   character count.
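
As an illustration only, the following Python sketch shows how octet
count and character count diverge.  The function name decode_entity and
the sample string are assumptions made for this example.  Python
strings count Unicode code points, which is the unit assumed here.

```python
import codecs

def decode_entity(octets: bytes, charset: str = "us-ascii") -> str:
    """Decode the octets of a text/plain entity, dropping a leading BOM."""
    text = octets.decode(charset)
    # A U+FEFF that survives decoding at the very start is a byte-order
    # mark, not textual content, so it is not counted as a character.
    return text[1:] if text.startswith("\ufeff") else text

# Hypothetical sample: 7 characters, the last one outside the BMP.
sample = "na\u00efve \U0001d11e"
utf16 = codecs.BOM_UTF16_LE + sample.encode("utf-16-le")
utf8 = sample.encode("utf-8")

# Octet count versus character count for the same text.
print(len(utf16), len(decode_entity(utf16, "utf-16")))
print(len(utf8), len(decode_entity(utf8, "utf-8")))
```

Both encodings yield the same seven characters, although the UTF-16
form occupies 18 octets (including the BOM) and the UTF-8 form 11.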

242	2.1.  Fragment Identification Principles

244	   Fragment identification can be done by combining two orthogonal
245	   principles, which are positions and ranges, and characters and lines.
246	   This section describes the principles themselves, while Section 2.2
247	   describes the combination of the principles.

249	2.1.1.  Positions and Ranges

251	   A position does not identify an actual fragment of the MIME entity,
252	   but a position inside the MIME entity, which can be regarded as a
253	   fragment of length zero.  The use case for positions is to provide
254	   pointers for applications which may use them to implement
255	   functionalities such as "insert some text here", which needs a
256	   position rather than a fragment.  Positions are counted from zero,
257	   position zero being before the first character or line of a text/
258	   plain MIME entity.  Thus a text/plain MIME entity having one
259	   character has two positions, one before the first character (position
260	   0), and one after the first character (position 1).
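
The "insert some text here" use case can be sketched in Python as
follows.  The helper names insert_at and position_count are
hypothetical, and the entity is assumed to be already decoded into a
string of characters.

```python
def position_count(text: str) -> int:
    """An entity of n characters has n + 1 positions (0 through n)."""
    return len(text) + 1

def insert_at(text: str, position: int, new_text: str) -> str:
    """Insert new_text at a zero-based position between characters."""
    if not 0 <= position <= len(text):
        raise ValueError("position lies outside the entity")
    # Position p sits after the first p characters of the entity.
    return text[:position] + new_text + text[position:]

print(position_count("a"))     # positions 0 and 1
print(insert_at("a", 0, "X"))  # before the first character
print(insert_at("a", 1, "X"))  # after the first character
```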

262	   Since positions are fragments of length zero, applications SHOULD use
263	   other methods than highlighting to indicate positions, the most
264	   obvious way being the positioning of a cursor (if the application
265	   supports the concept of a cursor).

267	   Ranges, on the other hand, identify fragments of a MIME entity that
268	   have a length that may be greater than zero.  As a general principle
269	   for ranges, they specify both a lower and an upper bound.  The start
270	   or the end of a range specification may be omitted, defaulting to the
271	   first or the last position of the MIME entity, respectively.  The end
272	   of a range must have a value greater than or equal to the start.  A
273	   range with identical start and end is legal, and identifies a range
274	   of length zero, which is equivalent to a position.

276	   Applications that support a concept such as highlighting SHOULD use
277	   such a concept to indicate fragments of lengths greater than zero to
278	   the user.

280	   For positions and ranges it is implicitly assumed that if a number is
281	   greater than the actual number of elements in the MIME entity, then
282	   it is referring to the last element of the MIME entity (see Section 4
283	   for details).

285	2.1.2.  Characters and Lines

287	   The concept of positions and ranges can be applied to characters or
288	   lines.  In both cases, positions indicate points between these
289	   entities, while ranges identify zero or more of these entities by
290	   indicating positions.

292	   Character positions are numbered starting with zero (ignoring initial
293	   BOMs or similar concepts that are not part of the actual textual
294	   content of a text/plain MIME entity), and counting each character
295	   separately, with the exception of line endings, which are always
296	   counted as one character (see Section 4.1 for details).

298	   Line positions are numbered starting with zero (with line position
299	   zero always being identical with character position zero), with
300	   Section 4.1 describing how line endings are identified.  Fragments
301	   identified by lines include the line endings, so applications
302	   identifying line-based fragments MUST include the line endings in the
303	   fragment identification they are using (e.g., the highlighted
304	   selection).  If a MIME entity does not contain any line endings, then
305	   it consists of a single (the first) line.
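
A minimal Python sketch of line-based fragments follows.  It assumes
that line endings have already been reduced to a single "\n" character
each, and that an empty entity is treated as one empty line (the memo
does not spell out that case).  The names split_lines and line_range
are hypothetical.

```python
import re
from typing import List

def split_lines(text: str) -> List[str]:
    """Split text into lines, keeping each line ending with its line."""
    # A line is either everything up to and including "\n", or the
    # unterminated remainder at the end of the entity.
    lines = re.findall(r"[^\n]*\n|[^\n]+", text)
    return lines or [""]  # no line ending at all: a single (first) line

def line_range(text: str, start: int, end: int) -> str:
    """Return the fragment between line positions start and end."""
    return "".join(split_lines(text)[start:end])

entity = "one\ntwo\nthree"
print(split_lines(entity))
print(repr(line_range(entity, 1, 2)))  # second line, ending included
```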

307	2.2.  Combining the Principles

309	   In the following sections, the principles described in the preceding
310	   section (positions/ranges and characters/lines) are combined,
311	   resulting in four use cases.  The schemes mentioned below refer to
312	   the fragment identifier syntax, described in detail in Section 3.

314	2.2.1.  Character Position

316	   To identify a character position (i.e., a fragment of length zero
317	   between two characters), the 'char' scheme followed by a single
318	   number is used.  This method identifies a position between two
319	   characters (or before the first or after the last character), rather
320	   than identifying a fragment consisting of a number of characters.
321	   Character position counting starts with 0, so the character position
322	   before the first character of a text/plain MIME entity has the
323	   character position 0, and a MIME entity containing n distinct
324	   characters has n+1 distinct character positions, the last one having
325	   the character position n.

327	2.2.2.  Character Range

329	   To identify a fragment of one or more characters (a character range),
330	   the 'char' scheme followed by a range specification is used.  A
331	   character range is a consecutive region of the MIME entity that
332	   extends from the starting character position of the range to the
333	   ending character position of the range.

335	2.2.3.  Line Position

337	   To identify a line position (i.e., a fragment of length zero between
338	   two lines), the 'line' scheme followed by a single number is used.
339	   This method identifies a position between two lines (or before the
340	   first or after the last line), rather than identifying a fragment
341	   consisting of a number of lines.  Line position counting starts with
342	   0, so the line position before the first line of a text/plain MIME
343	   entity has the line position 0, and a MIME entity containing n
344	   distinct lines has n+1 distinct line positions, the last one having
345	   the line position n.

347	2.2.4.  Line Range

349	   To identify a fragment of one or more lines (a line range), the
350	   'line' scheme followed by a range specification is used.  A line
351	   range is a consecutive region of the MIME entity that extends from
352	   the starting line position of the range to the ending line position
353	   of the range.

355	2.3.  Fragment Identifier Robustness

357	   It is easily possible that a modification of the referenced resource
358	   will break a fragment identifier.  If applications want to create
359	   more robust fragment identifiers, they may do so by adding hash sums
360	   to fragment identifiers.  These hash sums are used to detect changes
361	   in the resource.  Applications can then warn users about the
362	   possibility that a fragment identifier might have been broken by a
363	   modification of the resource.

365	   Since fragment identifiers are interpreted by clients, hash sums are
366	   defined on MIME entities rather than on the resource itself, and as
367	   such are specific to a certain representation of the resource, in
368	   case of text/plain resources the character encoding of the MIME
369	   entity.

371	   Hash sums may specify the character encoding that has been used when
372	   creating the hash sums, and if such a specification is present,
373	   clients MUST check whether the character encoding specified for the
374	   hash sum and the character encoding of the retrieved MIME entity are
375	   equal, and clients MUST NOT check the hash sum if these values
376	   differ.  However, clients MAY choose to transcode the retrieved MIME
377	   entity in the case of differing character encodings, and after doing
378	   so, check the hash sum.  Please note that this method is inherently
379	   unreliable, because certain characters or character sequences may
380	   have been lost or normalized due to restrictions in one of the
381	   character encodings used.
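
The character encoding rule above might be sketched in Python as
follows.  The function name may_check_hash is hypothetical, and
comparing names through Python's codec registry (codecs.lookup) is an
assumption of this sketch, used so that aliases such as "UTF-8" and
"utf8" compare as equal.  Transcoding is left out.

```python
import codecs
from typing import Optional

def may_check_hash(hash_charset: Optional[str], entity_charset: str) -> bool:
    """Decide whether a hash sum may be checked against the entity."""
    if hash_charset is None:
        # No character encoding was given with the hash sum.
        return True
    try:
        # Compare canonical codec names rather than raw labels.
        return (codecs.lookup(hash_charset).name
                == codecs.lookup(entity_charset).name)
    except LookupError:
        # Unknown label: equality cannot be established, so do not check.
        return False

print(may_check_hash("UTF-8", "utf8"))
print(may_check_hash("UTF-8", "iso-8859-1"))
```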

383	3.  Fragment Identification Syntax

385	   The syntax for the text/plain fragment identifiers is
386	   straightforward.  The syntax defines three schemes, 'char', 'line',
387	   and hash (which can either be 'length' or 'md5').  The 'char' and
388	   'line' schemes can be used in two different variants, either the
389	   position variant (with a single number), or the range variant (with
390	   two comma-separated numbers).  The hash scheme can either use the
391	   'length' or the 'md5' scheme to specify a hash value. 'length' in
392	   this case serves as a very weak but easy to calculate hash function.

394	   The following syntax definition uses ABNF as defined in RFC 4234 [6],
395	   including the rules DIGIT and HEXDIG.  The mime-charset rule is
396	   defined in RFC 2978 [7].

398	   NOTE:  In the descriptions that follow, specified text values MUST be
399	      used exactly as given, using exactly the indicated lower-case
400	      letters.  In this respect, the ABNF usage differs from [6].

402	   text-fragment =  text-scheme 0*( ";" hash-scheme)
403	   text-scheme   =  ( char-scheme / line-scheme )
404	   char-scheme   =  "char=" ( position / range )
405	   line-scheme   =  "line=" ( position / range )
406	   hash-scheme   =  ( length-scheme / md5-scheme ) [ "," mime-charset ]
407	   position      =  number
408	   range         =  (position "," [ position ]) / ("," position )
409	   number        =  1*( DIGIT )
410	   length-scheme =  "length=" number
411	   md5-scheme    =  "md5=" md5-value
412	   md5-value     =  32HEXDIG
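
The grammar can be approximated by a regular expression, as in the
following Python sketch.  It assumes that every number consists of at
least one digit, and the character class used for mime-charset is an
approximation of the rule in RFC 2978.  All Python names are
hypothetical.  A fragment identifier that does not match yields None.

```python
import re
from typing import List, NamedTuple, Optional

class HashCheck(NamedTuple):
    scheme: str              # "length" or "md5"
    value: str
    charset: Optional[str]

class TextFragment(NamedTuple):
    scheme: str              # "char" or "line"
    start: Optional[int]     # None when omitted from a range
    end: Optional[int]       # None when omitted, or for a position
    is_range: bool
    checks: List[HashCheck]

_NUM = r"[0-9]+"
_CHARSET = r"[A-Za-z0-9!#$%&'+\-^_`{}~]+"  # approximation of mime-charset
_HASH = r"(?:length=" + _NUM + r"|md5=[0-9A-Fa-f]{32})(?:," + _CHARSET + r")?"
_FRAGMENT = re.compile(
    r"(?P<scheme>char|line)="
    r"(?:(?P<start>" + _NUM + r")(?P<comma>,(?P<end>" + _NUM + r")?)?"
    r"|,(?P<tail>" + _NUM + r"))"
    r"(?P<hashes>(?:;" + _HASH + r")*)"
)

def parse_text_fragment(fragment: str) -> Optional[TextFragment]:
    """Parse a fragment identifier; return None on any syntax error."""
    m = _FRAGMENT.fullmatch(fragment)  # matching is case-sensitive
    if m is None:
        return None
    checks = []
    for part in filter(None, m["hashes"].split(";")):
        name, _, rest = part.partition("=")
        value, _, charset = rest.partition(",")
        checks.append(HashCheck(name, value, charset or None))
    if m["tail"] is not None:  # range with omitted start, e.g. "line=,1"
        return TextFragment(m["scheme"], None, int(m["tail"]), True, checks)
    end = int(m["end"]) if m["end"] is not None else None
    return TextFragment(m["scheme"], int(m["start"]), end,
                        m["comma"] is not None, checks)

print(parse_text_fragment("line=10,20;length=9876,UTF-8"))
print(parse_text_fragment("char="))
```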

414	3.1.  Hash Sums

416	   A hash sum can either specify a MIME entity's length, or its MD5
417	   fingerprint.  In both cases, it can optionally specify the character
418	   encoding that was used when calculating the hash sum, so that
419	   clients interpreting the fragment identifier may check whether they
420	   are using the same character encoding for their calculations.  For
421	   lengths, the character encoding can be necessary because it can
422	   influence the character count.  As an example, Unicode includes
423	   precomposed characters for writing Vietnamese, but in the windows-
424	   1258 encoding, also used for writing Vietnamese, some characters have
425	   to be encoded with separate diacritics, which means that two
426	   characters will be counted.  Applying Unicode terminology, this means
427	   that the length of a text/plain MIME entity is computed based on its
428	   "code points".  For MD5 fingerprints, the character encoding is
429	   necessary because the MD5 algorithm works on the binary
430	   representation of the text/plain resource.

432	   The length of a text/plain MIME entity is calculated by using the
433	   principles defined in Section 2.1.2.  The MD5 fingerprint of a text/
434	   plain MIME entity is calculated by using the algorithm presented in
435	   [8], encoding the result in 32 hexadecimal digits (using uppercase or
436	   lowercase letters) as a representation of the 128 bits which are the
437	   result of the MD5 algorithm.  Calculation of hash sums is done after
438	   stripping any potential content-encodings or content-transfer-
439	   encodings of the transport mechanism.
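
As a sketch, both hash sums can be computed in Python as shown below.
The function names are hypothetical.  The length counts code points of
the decoded text, and the 128-bit MD5 result is rendered as 32
hexadecimal digits, matching the md5-value rule in Section 3.  The
octets passed to entity_md5 are assumed to have had any content-
encoding or content-transfer-encoding removed already.

```python
import hashlib

def entity_length(text: str) -> int:
    """Length in characters (code points) of an already decoded entity."""
    # Assumes each line ending was already reduced to one character.
    return len(text)

def entity_md5(octets: bytes) -> str:
    """MD5 fingerprint of the entity's octets as hexadecimal digits."""
    return hashlib.md5(octets).hexdigest()

# The same Vietnamese letter, precomposed and with a separate diacritic.
precomposed = "\u1ebf"       # one code point
decomposed = "\u00ea\u0301"  # two code points
print(entity_length(precomposed), entity_length(decomposed))
print(entity_md5("abc".encode("us-ascii")))
```

The two spellings of the same letter give different lengths, which is
why a length hash sum may need to name the character encoding.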

441	4.  Fragment Identifier Processing

443	   Applications implementing support for the mechanism described in this
444	   memo MUST behave as described in the following sections.

446	4.1.  Handling of Line Endings in text/plain MIME Entities

448	   In Internet messages, line endings in text/plain MIME entities are
449	   represented by CR+LF character sequences (see RFC 2046 [1] and RFC
450	   3676 [3]).  However, some protocols (such as HTTP) in addition allow
451	   other conventions for line endings.  Also, some operating systems
452	   store text/plain entities locally with different line endings (in
453	   most cases, Unix uses LF, MacOS traditionally used CR, and Windows
454	   uses CR+LF).

456	   Independent of the number of bytes or characters used to represent a
457	   line ending, each line ending MUST be counted as one single
458	   character.  Implementations interpreting text/plain fragment
459	   identifiers MUST take into account the line ending conventions of the
460	   protocols and other contexts that they work in.

462	   As an example, an implementation working in the context of a Web
463	   browser supporting http: URIs has to support the various line ending
464	   conventions permitted by HTTP.  As another example, an implementation
465	   used on local files (e.g. with the file: URI scheme) has to support
466	   the conventions used for local storage.  All implementations SHOULD
467	   support the Internet-wide CR+LF line ending convention, and MAY
468	   support additional conventions not related to the protocols or
469	   systems they work with.

471	   Implementers should be aware of the fact that line endings in plain
472	   text entities can be represented by other characters or character
473	   sequences than CR+LF.  Besides the above-mentioned CR and LF, there
474	   are also NEL and CR+NEL.  In general, the encoding of line endings
475	   can also depend on the character encoding of the MIME entity, and
476	   implementations have to take this into account where necessary.
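
One way to count every line ending as a single character is to replace
each recognized line ending with one LF before counting, as in this
Python sketch.  The function name normalize_line_endings is
hypothetical, and accepting all five conventions at once is an
assumption of this example; an implementation may need a narrower set
for its context.

```python
import re

# Two-character sequences come first so that CR+LF and CR+NEL are not
# split into two separate line endings.
_LINE_ENDING = re.compile("\r\n|\r\u0085|\r|\n|\u0085")

def normalize_line_endings(text: str) -> str:
    """Replace each line ending with one LF so that it counts once."""
    return _LINE_ENDING.sub("\n", text)

mixed = "a\r\nb\rc\nd\u0085e\r\u0085f"
normalized = normalize_line_endings(mixed)
print(repr(normalized), len(normalized))
```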

478	4.2.  Handling of Position Values

480	   If any position value (as a position or as part of a range) is
481	   greater than the length of the actual MIME entity, then it identifies
482	   the last character position or line position of the MIME entity.  If
483	   the first position value in a range is not present, then the range
484	   extends from the start of the MIME entity.  If the second position
485	   value in a range is not present, then the range extends to the end of
486	   the MIME entity.  If a range scheme's positions are not properly
487	   ordered (i.e., the first number is greater than the second), then
488	   the fragment identifier MUST be ignored.
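
These rules might be sketched in Python as follows.  The function name
resolve_positions is hypothetical, and last_position stands for the
last character or line position of the entity.  The sketch treats a
range as misordered when its first number is greater than its second,
and performs that check before clamping; both choices are assumptions
of this example.

```python
from typing import Optional, Tuple

def resolve_positions(start: Optional[int], end: Optional[int],
                      is_range: bool,
                      last_position: int) -> Optional[Tuple[int, int]]:
    """Return (start, end) positions, or None to ignore the fragment."""
    if not is_range:
        # A single position is a fragment of length zero.
        position = min(start, last_position)
        return (position, position)
    low = 0 if start is None else start          # omitted start
    high = last_position if end is None else end  # omitted end
    if low > high:
        return None  # misordered range: the fragment is ignored
    # Values beyond the entity identify its last position.
    return (min(low, last_position), min(high, last_position))

print(resolve_positions(100, None, False, 42))
print(resolve_positions(10, 20, True, 15))
print(resolve_positions(20, 10, True, 50))
```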

490	4.3.  Handling of Hash Sums

492	   Clients are not required to implement the handling of hash sums, so
493	   they MAY choose to ignore hash sum information altogether.  However,
494	   if they do implement hash sum handling, the following applies:

496	   If a fragment identifier contains a hash sum, and a client retrieves
497	   a MIME entity and detects that the hash sum has changed (observing
498	   the character encoding specification as described in Section 3.1, if
499	   present), then the client SHOULD NOT interpret the text/plain
500	   fragment identifier.  A client MAY signal this situation to the user.
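
A client that implements hash sums could decide whether to interpret
the fragment identifier along the lines of this Python sketch.  The
function name fragment_usable and its parameters are hypothetical.  The
actual length and MD5 value are assumed to have been computed from the
retrieved entity beforehand, and checks whose character encoding does
not match the entity are assumed to have been dropped from the list.

```python
from typing import Iterable, Tuple

def fragment_usable(checks: Iterable[Tuple[str, str]],
                    actual_length: int, actual_md5: str) -> bool:
    """Return False if any hash sum shows that the entity has changed."""
    for scheme, expected in checks:
        if scheme == "length" and int(expected) != actual_length:
            return False
        # Hexadecimal digits may use either letter case.
        if scheme == "md5" and expected.lower() != actual_md5.lower():
            return False
    return True

digest = "900150983cd24fb0d6963f7d28e17f72"  # MD5 of the octets "abc"
print(fragment_usable([("length", "3"), ("md5", digest.upper())], 3, digest))
print(fragment_usable([("length", "9876")], 9875, digest))
```

When the function returns False, the client would leave the fragment
identifier uninterpreted and could tell the user why.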

502	4.4.  Syntax Errors in Fragment Identifiers

504	   If a fragment identifier contains a syntax error (i.e., does not
505	   conform to the syntax specified in Section 3), then it MUST be
506	   ignored by clients.  Clients MUST NOT make any attempt to correct or
507	   guess fragment identifiers.  Syntax errors MAY be reported by
508	   clients.

510	5.  Examples

512	   The following examples show some usages for the fragment identifiers
513	   defined in this memo.

515	   http://example.com/text.txt#char=100

517	   This URI identifies the position after the 100th character of the
518	   text.txt MIME entity.  Which octet(s) of the MIME entity this
519	   corresponds to cannot be determined without retrieving the MIME
520	   entity and thus knowing which character encoding it uses (in the
521	   case of HTTP, this information is given in the Content-Type header
522	   of the response).  If the MIME entity has fewer than 100 characters, the
523	   URI identifies the position after the MIME entity's last character.
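
The dependence of the octet offset on the character encoding can be
seen in a short sketch.  The function name octet_offset and the
sample text are illustrative assumptions, and characters are taken
to be Unicode code points.

```python
def octet_offset(text: str, char_position: int, encoding: str) -> int:
    """Octet offset of a character position once `text` is encoded."""
    # Positions past the end identify the position after the last character.
    char_position = min(char_position, len(text))
    # Encode only the characters before the position and count the octets.
    return len(text[:char_position].encode(encoding))


# 50 ASCII characters followed by 100 occurrences of U+00E9.
sample = "a" * 50 + "\u00e9" * 100
offsets = {enc: octet_offset(sample, 100, enc)
           for enc in ("iso-8859-1", "utf-8", "utf-16-le")}
```

For this sample, the position after the 100th character lies at
octet 100 in ISO-8859-1, octet 150 in UTF-8, and octet 200 in
UTF-16 (little-endian, without byte order mark).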

525	   ftp://example.com/text.txt#line=10,20

527	   This URI identifies lines 11 to 20 of the text.txt MIME entity.  If
528	   the MIME entity has fewer than 11 lines, it identifies the position
529	   after the last line.  If the MIME entity has fewer than 20 but at
530	   least 11 lines, it identifies the range from line 11 to the last line
531	   of the MIME entity.
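
Because line positions fall between lines, they behave much like
slice indices.  The following sketch makes that analogy concrete; it
is an assumption-laden illustration (the name select_lines is
hypothetical, and only CR, LF, and CRLF are treated as line
endings), and it does not perform the ordering check of Section 4.

```python
import re
from typing import List, Optional

# A line is any run of non-line-ending characters followed by CRLF, CR,
# or LF, or a final run of characters without a line ending.
_LINE = re.compile(r"[^\r\n]*(?:\r\n|\r|\n)|[^\r\n]+")


def select_lines(text: str, first: Optional[int],
                 second: Optional[int]) -> List[str]:
    """Return the lines between two line positions (None means absent)."""
    lines = _LINE.findall(text)
    # Slicing clamps positions beyond the last line to the end of the
    # entity, and treats None as "from the start" or "to the end".
    return lines[first:second]
```

With a 15-line entity, select_lines(text, 10, 20) returns lines 11
to 15; with a 5-line entity it returns an empty list, which
corresponds to the position after the last line.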

533	   ftp://example.com/text.txt#line=,1

535	   This URI identifies the first line.

537	   ftp://example.com/text.txt#line=10,20;length=9876,UTF-8

539	   As in the second example, this URI identifies lines 11 to 20 of the
540	   text.txt MIME entity.  The additional length hash sum specifies that
541	   the MIME entity has a length of 9876 characters when encoded in
542	   UTF-8.  If the client supports the length hash sum scheme, it may
543	   test the retrieved MIME entity for its length, but only if the
544	   retrieved MIME entity uses the UTF-8 encoding or has been locally
545	   transcoded into this encoding.

547	6.  IANA Considerations

549	   Note to RFC Editor: Please change this section to read as follows
550	   after the IANA action has been completed: "IANA has added a reference
551	   to this specification in the Text/Plain Media Type registration."

553	   IANA is requested to update the registration of the MIME Media type
554	   text/plain at http://www.iana.org/assignments/media-types/text/ with
555	   the fragment identifier defined in this memo by adding a reference to
556	   this memo (with the appropriate RFC number once it is known).

558	7.  Security Considerations

560	   The fact that software implementing fragment identifiers for plain
561	   text and software not implementing them differs in behavior, and the
562	   fact that different software may show fragments to users in different
563	   ways, can lead to misunderstandings on the part of users.  Such
564	   misunderstandings might be exploited in a way similar to spoofing or
565	   phishing, although concrete examples of how this might be done are
566	   not currently known.

568	   Implementers and users of fragment identifiers for plain text should
569	   also be aware of the security considerations in RFC 3986 [4] and RFC
570	   3987 [9].

572	8.  Change Log

574	   Note to RFC Editor: Please remove this section before publication.

576	8.1.  From -06 to -07 (addressing IETF Last Call Comments)

578	   o  Completely removed regular expressions to simplify
579	      implementations.

581	   o  Removed the possibility to combine multiple schemes.  As a result,
582	      fragments will always consist of consecutive characters.

584	   o  Changed "MacOS uses CR" to "MacOS traditionally used CR".

586	   o  Changed 'number' syntax rule from "number = 1*( DIGIT )" to
587	      "number = 0*( DIGIT )" to take into account examples such as
588	      "#line=,1".

590	   o  Added a sentence explaining that lengths are a weak but cheaply
591	      calculated hash function.

593	   o  Moved UTF-8 reference to non-normative.

595	   o  Moved ABNF from %xdd.dd... back to direct literals, stating that
596	      they are case-sensitive (see RFC 3862 for an example of this).

598	   o  Changed StringWithEscapedSemicolon to
599	      <StringWithEscapedSemicolon>, and said that it must not be quoted.

601	   o  In "Clients SHOULD NOT make any attempt to correct or guess
602	      fragment identifiers.", changed "SHOULD NOT" to "MUST NOT".

604	   o  Removed some redundant normative text in Examples section.

606	   o  Added "Calculation of hash sums is done after stripping any
607	      potential content-encodings or content-transfer-encodings." to
608	      section on hash sums.

610	   o  Wording improvements and updates to Acknowledgements.

612	   o  Changed abstract for more clarity.

614	8.2.  From -05 to -06

616	   o  Clarified that this is intended as an update of the text/plain
617	      MIME type registration, in newly added IANA consideration section
618	      and elsewhere.

620	   o  Added normative reference to UTF-8 (STD63/RFC3629).

622	   o  Fixed section about non-ASCII characters in regular expressions to
623	      be more accurate with respect to IRIs.

625	   o  Fixed some text about decomposition and Unicode.

627	   o  Clarified that UTF-16 can also use 4 octets per character.

629	   o  Changed ABNF to make sure schemes are case-sensitive (string
630	      literals in ABNF are case-insensitive).

632	   o  Used HEXDIG from RFC 4234, made clear DIGIT and HEXDIG are from
633	      that spec.

635	   o  Specified order of decoding the various escapings.

637	   o  Moved section on line endings to the back, and changed
638	      requirements to be more in line with practice.

640	   o  Added IANA Consideration section.

642	   o  Expanded Security Consideration section.

644	   o  Removed quote from RFC 3986, because the quoted text does not
645	      actually exist there anymore; changed text appropriately.

647	   o  Reorganized section two to get rid of one section level.

649	   o  Added overview in introduction, and some glue text here and there.

651	   o  Changed to more IETF-like wording in some instances (e.g. intro to
652	      this section; removing "Compliant software MUST follow this
653	      specification." at the start of the Introduction,...).

655	   o  Removed 'where to send comments' section.

657	   o  Fixed wording in some cases, tried to make shorter sentences and
658	      eliminate parenthesized expressions.

660	   o  Removed acknowledgement for xml2rfc; we are nevertheless very
661	      grateful for this work!

663	8.3.  From -04 to -05

665	   o  Added some explanatory text to the last paragraph of Section 2.3.

667	   o  Added a paragraph about the importance of having fragment
668	      identification capabilities for out-of-line linking methods such
669	      as XLink to Section 1.3.

671	   o  Added explanation of why the charset is important for length hash
672	      sums to Section 3.1.

674	   o  Added text that makes hash sum handling optional and allows
675	      clients to interpret fragment identifiers even if the hash sum did
676	      not match (changed MUST NOT to SHOULD NOT) to Section 4.3.

678	   o  Added example using a length hash sum in Section 5.

680	   o  RFC 2234 (ABNF) has been obsoleted by [6].

682	   o  Removed the "Open Issues" section for preparation of final draft
683	      before submission as RFC.

685	8.4.  From -03 to -04

687	   o  URIs are now defined by RFC 3986 [4], so the text and the
688	      references have been updated.  In particular, RFC3986 defines a
689	      fragment identifier to be part of the URI, whereas in the
690	      obsoleted RFC 2396 URI specification, it was not part of a URI as
691	      such, but of a "URI reference".

693	   o  IRIs are now defined by RFC 3987 [9], so the text and the
694	      references have been updated.

696	   o  Changed IPR clause from RFC 3667 to RFC 3978 (updated version of
697	      RFC 3667).

699	8.5.  From -02 to -03

701	   o  Replaced most occurrences of 'resource' with 'MIME entity',
702	      because the result of dereferencing a URI is not the resource
703	      itself, but some MIME entity (in our case of type text/plain)
704	      representing it.  Thanks to Sandro Hawke for pointing this out.

706	   o  Moved "Open Issues" to the very back of the document.

708	   o  Added Section 4 to define the processing model for fragment
709	      identifiers (moved Section 4.2 from Section 3 to Section 4).

711	   o  Added hash scheme to make fragment identifiers more robust
712	      (Section 2.3).

714	   o  Changed IPR clause from RFC 2026 to RFC 3667 (updated version of
715	      RFC 2026).

717	8.6.  From -01 to -02

719	   o  Fundamental change in semantics: counts turn into positions
720	      (between characters or lines), so in order to identify a character
721	      or line, ranges must be used (which now use positions to specify
722	      the upper and lower bounds of the range).

724	   o  Made the first value of a range optional as well, so that line=,5
725	      also is legal, identifying everything from the start of the MIME
726	      entity to the 5th line.

728	   o  Changed the syntax from parenthesis-style to a more traditional
729	      style using equals-signs.

731	8.7.  From -00 to -01

733	   o  Made the second count value of ranges optional, so that something
734	      like line(10,) is legal and properly defined.

736	   o  Added non-normative reference to Internet draft about non-ASCII
737	      characters in search strings.

739	   o  Added Section 1.4 about incremental deployment.

741	   o  Added more elaborate examples.

743	   o  Added text about regex buffer overflow problems in Section 7.

745	   o  Added Section 4.1 about line endings in text/plain resources.

747	   o  Added "Open Issues" to collect open issues regarding this memo
748	      (will be deleted in final RFC text).

750	9.  References

752	9.1.  Normative References

754	   [1]   Freed, N. and N. Borenstein, "Multipurpose Internet Mail
755	         Extensions (MIME) Part Two: Media Types", RFC 2046,
756	         November 1996.

758	   [2]   Freed, N. and N. Borenstein, "Multipurpose Internet Mail
759	         Extensions (MIME) Part One: Format of Internet Message Bodies",
760	         RFC 2045, November 1996.

762	   [3]   Gellens, R., "The Text/Plain Format and DelSp Parameters",
763	         RFC 3676, February 2004.

765	   [4]   Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
766	         Resource Identifier (URI): Generic Syntax", RFC 3986,
767	         January 2005.

769	   [5]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
770	         Levels", RFC 2119, March 1997.

772	   [6]   Crocker, D. and P. Overell, "Augmented BNF for Syntax
773	         Specifications: ABNF", RFC 4234, October 2005.

775	   [7]   Freed, N. and J. Postel, "IANA Charset Registration
776	         Procedures", BCP 19, RFC 2978, October 2000.

778	   [8]   Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321,
779	         April 1992.

781	   [9]   Duerst, M. and M. Suignard, "Internationalized Resource
782	         Identifiers (IRI)", RFC 3987, January 2005.

784	9.2.  Non-Normative References

786	   [10]  ANSI X3.4-1986, "Coded Character Set - 7-Bit American National
787	         Standard Code for Information Interchange", 1986.

790	   [11]  Yergeau, F., "UTF-8, a transformation format of ISO 10646",
791	         STD 63, RFC 3629, November 2003.

793	   [12]  Connolly, D. and L. Masinter, "The 'text/html' Media Type",
794	         RFC 2854, June 2000.

796	   [13]  Freed, N. and J. Klensin, "Media Type Specifications and
797	         Registration Procedures", RFC 4288, December 2005.

799	   [14]  DeRose, S., Maler, E., and D. Orchard, "XML Linking Language
800	         (XLink) Version 1.0", W3C Recommendation REC-xlink-20010627,
801	         June 2001.

803	   [15]  Hoffman, P. and F. Yergeau, "UTF-16, an encoding of ISO 10646",
804	         RFC 2781, February 2000.

806	Appendix A.  Acknowledgements

808	   Thanks for comments and suggestions provided by Marcel Baschnagel,
809	   Stephane Bortzmeyer, Tim Bray, John Cowan, Spencer Dawkins, Benja
810	   Fallenstein, Ted Hardie, Sandro Hawke, Jeffrey Hutzelman, Graham
811	   Klyne, Dan Kohn, Henrik Levkowetz, Mark Nottingham, and Conrad
812	   Parker.

814	Authors' Addresses

816	   Erik Wilde
817	   UC Berkeley
818	   School of Information, 311 South Hall
819	   Berkeley, CA 94720-4600
820	   U.S.A.

822	   Phone: +1-510-6432253
823	   Email: dret@berkeley.edu
824	   URI:   http://dret.net/netdret/

826	   Martin Duerst (Note: Please write "Duerst" with u-umlaut wherever
827	                 possible, for example as "D&#252;rst" in XML and HTML.)
828	   Aoyama Gakuin University
829	   5-10-1 Fuchinobe
830	   Sagamihara, Kanagawa  229-8558
831	   Japan

833	   Phone: +81 42 759 6329
834	   Fax:   +81 42 759 6495
835	   Email: mailto:duerst@it.aoyama.ac.jp
836	   URI:   http://www.sw.it.aoyama.ac.jp/D%C3%BCrst/

838	Full Copyright Statement

840	   Copyright (C) The IETF Trust (2007).

842	   This document is subject to the rights, licenses and restrictions
843	   contained in BCP 78, and except as set forth therein, the authors
844	   retain all their rights.

846	   This document and the information contained herein are provided on an
847	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
848	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
849	   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
850	   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
851	   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
852	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

854	Intellectual Property

856	   The IETF takes no position regarding the validity or scope of any
857	   Intellectual Property Rights or other rights that might be claimed to
858	   pertain to the implementation or use of the technology described in
859	   this document or the extent to which any license under such rights
860	   might or might not be available; nor does it represent that it has
861	   made any independent effort to identify any such rights.  Information
862	   on the procedures with respect to rights in RFC documents can be
863	   found in BCP 78 and BCP 79.

865	   Copies of IPR disclosures made to the IETF Secretariat and any
866	   assurances of licenses to be made available, or the result of an
867	   attempt made to obtain a general license or permission for the use of
868	   such proprietary rights by implementers or users of this
869	   specification can be obtained from the IETF on-line IPR repository at
870	   http://www.ietf.org/ipr.

872	   The IETF invites any interested party to bring to its attention any
873	   copyrights, patents or patent applications, or other proprietary
874	   rights that may cover technology that may be required to implement
875	   this standard.  Please address the information to the IETF at
876	   ietf-ipr@ietf.org.

878	Acknowledgment

880	   Funding for the RFC Editor function is provided by the IETF
881	   Administrative Support Activity (IASA).