idnits 2.17.1 

draft-wilde-text-fragment-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not
     match the current year

  == Line 85 has weird spacing: '...  allow  forma...'

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords. 

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).
  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (July 11, 2002) is 7957 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO9945-2'

  ** Obsolete normative reference: RFC 2234 (Obsoleted by RFC 4234)

  ** Obsolete normative reference: RFC 2396 (Obsoleted by RFC 3986)

  ** Obsolete normative reference: RFC 2646 (Obsoleted by RFC 3676)

  -- Obsolete informational reference (is this intentional?): RFC 2629
     (Obsoleted by RFC 7749)


     Summary: 5 errors (**), 0 flaws (~~), 4 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                           E. Wilde
3	Internet-Draft                                Swiss Federal Institute of
4	Expires: January 9, 2003                                      Technology
5	                                                           July 11, 2002

7	         URI Fragment Identifiers for the text/plain Media Type
8	                      draft-wilde-text-fragment-00

10	Status of this Memo

12	   This document is an Internet-Draft and is in full conformance with
13	   all provisions of Section 10 of RFC2026.

15	   Internet-Drafts are working documents of the Internet Engineering
16	   Task Force (IETF), its areas, and its working groups.  Note that
17	   other groups may also distribute working documents as Internet-
18	   Drafts.

20	   Internet-Drafts are draft documents valid for a maximum of six months
21	   and may be updated, replaced, or obsoleted by other documents at any
22	   time.  It is inappropriate to use Internet-Drafts as reference
23	   material or to cite them other than as "work in progress."

25	   The list of current Internet-Drafts can be accessed at http://
26	   www.ietf.org/ietf/1id-abstracts.txt.

28	   The list of Internet-Draft Shadow Directories can be accessed at
29	   http://www.ietf.org/shadow.html.

31	   This Internet-Draft will expire on January 9, 2003.

33	Copyright Notice

35	   Copyright (C) The Internet Society (2002).  All Rights Reserved.

37	Abstract

39	   This memo defines URI fragment identifiers for text/plain resources.
40	   These fragment identifiers make it possible to refer to parts of a
41	   text resource, identified by character count or range, line count or
42	   range, or a regular expression.  These identification methods can be
43	   combined to identify more than one sub-resource of a text/plain
44	   resource.

46	Table of Contents

48	   1.    Introduction
49	   1.1   What is text/plain?
50	   1.2   What is a URI Fragment Identifier?
51	   1.3   Why text/plain Fragment Identifiers?
52	   2.    Fragment Identification Methods
53	   2.1   Fragment Identification Schemes
54	   2.1.1 Character Count
55	   2.1.2 Character Range
56	   2.1.3 Line Count
57	   2.1.4 Line Range
58	   2.1.5 Regular Expressions
59	   2.1.6 Combining Fragment Identification Schemes
60	   3.    Fragment Identification Syntax
61	   4.    Examples
62	   5.    Security Considerations
63	         Normative References
64	         Non-Normative References
65	         Author's Address
66	   A.    POSIX BRE Syntax
67	   B.    Where to send Comments
68	   C.    Acknowledgements
69	         Full Copyright Statement

71	1. Introduction

73	   Compliant software MUST follow this specification.  The capitalized
74	   key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
75	   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in this
76	   document are to be interpreted as described in RFC 2119 [RFC2119].

78	1.1 What is text/plain?

80	   Internet Media Types as defined in RFC 2045 [RFC2045] and RFC 2046
81	   [RFC2046] are used to identify different types and sub-types of
82	   media.  RFC 2046 [RFC2046] and RFC 2646 [RFC2646] specify the text/
83	   plain media type, which is used for simple, unformatted text.
84	   Quoting from RFC 2046 [RFC2046]: "Plain text does not provide for or
85	   allow  formatting commands, font attribute specifications, processing
86	   instructions, interpretation directives, or content markup.  Plain
87	   text is seen simply as a linear sequence of characters, possibly
88	   interrupted by line breaks or page breaks."

90	   The text/plain media type does not restrict the character encoding;
91	   any character encoding may be used.  In the absence of an explicit
92	   character encoding declaration, US-ASCII is assumed as the default
93	   character encoding.  This variability of the character encoding makes
94	   it impossible to count characters in a text/plain resource without
95	   taking the character encoding into account, because there are many
96	   character encodings using more than one octet per character.

98	   The biggest advantage of text/plain resources is their portability
99	   among different platforms.  As long as they use popular character
100	   encodings (such as US-ASCII), they can be displayed and processed on
101	   virtually every computer system.

103	1.2 What is a URI Fragment Identifier?

105	   URIs are the identification mechanism for resources on the Web.  The
106	   URI syntax specified in RFC 2396 [RFC2396] includes as part of a URI
107	   reference a fragment identifier, which (quoting from RFC 2396
108	   [RFC2396]) "consists of additional reference information to be
109	   interpreted by the user agent after the retrieval action has been
110	   successfully completed.  As such, it is not part of a URI, but is
111	   often used in conjunction with a URI".

113	   The most popular fragment identifier is defined for text/html
114	   (defined in RFC 2854 [RFC2854]), and makes it possible to refer to a
115	   specific element of an HTML document.

117	1.3 Why text/plain Fragment Identifiers?

119	   Referring to specific parts of a resource can be very useful, because
120	   it enables users to create more specific references.  Rather than
121	   pointing to a whole resource, users can create references to the part
122	   they really are interested in or want to talk about.  Even though it
123	   is suggested that fragment identification methods are specified in a
124	   media type's MIME registration, many media types do not have fragment
125	   identification methods associated with them.

127	   Fragment identifiers are only useful if supported by the client,
128	   because they are only interpreted by the client.  Therefore, a new
129	   fragment identification method will require some time to be adopted
130	   by clients, and older clients will not support it.  However, because
131	   the URI reference still works even if the fragment identifier is not
132	   supported (the resource is retrieved, but the fragment identifier is
133	   not interpreted), rapid adoption is not highly critical to ensure the
134	   success of a new fragment identification method.

136	   Fragment identifiers for text/plain make it possible to refer to
137	   specific parts of a text resource, either by line count, by character
138	   count, or by using a regular expression for searching for a specific
139	   character sequence.  Thus, text/plain fragment identifiers enable
140	   users to exchange information more specifically, thereby reducing
141	   time and effort that is necessary to manually search for the relevant
142	   part of a text/plain resource.

144	2. Fragment Identification Methods

146	   The identification of resource fragments of text/plain resources can
147	   be based on different foundations.  Since it is not necessary to
148	   insert explicit identifiers into a text/plain resource (as is
149	   possible with HTML documents by using special attributes), fragment
150	   identification has to rely on certain inherent criteria of the
151	   resource.  This memo specifies fragment identification using five
152	   different methods: character counts and ranges, line counts and
153	   ranges, and regular expression matching.

155	   When interpreting character or line numbers, implementations MUST
156	   take the character encoding of the resource into account, because
157	   character count and octet count may differ for the character encoding
158	   being used.  For example, a resource using UTF-16 encoding uses two
159	   or four octets per character, and it may have a leading BOM
160	   (Byte-Order Mark) which does not count as a character and thus also
161	   affects the mapping from a simple octet count to a character count.
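
   The difference between octet count and character count can be
   illustrated with a minimal Python sketch.  The function name
   count_characters and the sample data are assumptions made for this
   illustration and are not part of the memo; a "character" is taken
   here to be what Python's len() counts in a decoded string.

```python
import codecs


def count_characters(octets: bytes, charset: str = "us-ascii") -> int:
    """Count characters (not octets) in a text/plain entity body.

    Hypothetical helper: `charset` stands for the character encoding
    declared for the resource, with US-ASCII as the default.
    """
    text = octets.decode(charset)
    # Python's "utf-16" codec consumes a leading BOM itself; for codecs
    # that keep it (e.g. "utf-16-le"), drop it so it is not counted.
    if text.startswith("\ufeff"):
        text = text[1:]
    return len(text)


# Five characters, encoded as UTF-16 with a leading BOM: 12 octets.
sample = codecs.BOM_UTF16_LE + "hello".encode("utf-16-le")
octet_count = len(sample)
character_count = count_characters(sample, "utf-16")
```

   In this sketch the octet count is 12 while the character count is 5,
   which is why a fragment such as char(5) cannot be resolved from
   octet offsets alone.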

163	2.1 Fragment Identification Schemes

165	2.1.1 Character Count

167	   The simplest way to identify a fragment is to point to a certain
168	   character of the resource.  Rather than identifying a fragment
169	   consisting of a number of characters, this method only identifies a
170	   single character, but this is often sufficient for referring to the
171	   start of a region of interest.  Character counting starts with 1, so
172	   the first character of a text/plain resource has the count 1.

174	2.1.2 Character Range

176	   If it is necessary to identify a fragment of multiple characters
177	   using character counting, this can be done by using a character
178	   range.  A character range is a consecutive region of the resource
179	   that extends from the starting character of the range to the ending
180	   character of the range.  The ending character of the range must have
181	   a greater number than the starting character.
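
   As a rough illustration of character counts and character ranges,
   the following Python sketch selects characters from an already
   decoded resource.  The name char_fragment is hypothetical, the
   range is assumed to include both its starting and its ending
   character, and all counts are assumed to lie within the resource.

```python
from typing import Optional


def char_fragment(text: str, start: int, end: Optional[int] = None) -> str:
    """Return the character at `start`, or characters `start`..`end`.

    Counts are 1-based, as in the memo; the range is treated as
    inclusive at both ends (an assumption of this sketch).
    """
    if end is None:
        return text[start - 1]
    return text[start - 1 : end]


first_character = char_fragment("plain text", 1)
last_word = char_fragment("plain text", 7, 10)
```

   Because counting starts with 1, the sketch subtracts one from the
   starting count before indexing into the zero-based Python string.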

183	2.1.3 Line Count

185	   Lines in text/plain resources are separated by CRLF sequences, and
186	   consequently it is easy to identify lines.  Because lines are the
187	   only structural property of text/plain resources, it is possible to
188	   identify a fragment of a resource by referring to a particular line.
189	   Line counting starts with 1, so the first line of a text/plain
190	   resource has the count 1.  If a resource does not contain any CRLF
191	   sequences, then it consists of a single (the first) line.

193	2.1.4 Line Range

195	   If it is necessary to identify a fragment of multiple lines using
196	   line counting, this can be done by using a line range.  A line range
197	   is a consecutive region of the resource that extends from the
198	   starting line of the range to the ending line of the range.  The
199	   ending line of the range must have a greater number than the starting
200	   line.
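
   Line counts and line ranges could be resolved along the following
   lines.  This is only a sketch: the names split_lines and
   line_fragment are hypothetical, the range is assumed to be
   inclusive, and a CRLF at the very end of the resource is assumed to
   be followed by one final empty line, which the memo does not
   address.

```python
from typing import List, Optional


def split_lines(text: str) -> List[str]:
    """Split a decoded resource into lines separated by CRLF."""
    return text.split("\r\n")


def line_fragment(text: str, start: int, end: Optional[int] = None) -> List[str]:
    """Return line `start`, or lines `start`..`end` (1-based, inclusive)."""
    last = start if end is None else end
    return split_lines(text)[start - 1 : last]


resource = "alpha\r\nbeta\r\ngamma"
second_line = line_fragment(resource, 2)
first_two_lines = line_fragment(resource, 1, 2)
```

   A resource without any CRLF sequence yields a single-element list,
   which matches the rule that such a resource consists of one line.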

202	2.1.5 Regular Expressions

204	   A common problem with fragment identifiers is their robustness (to
205	   changes in the resource), and character and line counts can be broken
206	   very easily.  A more robust way of identifying a fragment is by
207	   searching for a specific pattern.  Thus, it is possible to use a
208	   Basic Regular Expression (BRE) as defined by ISO 9945-2 [ISO9945-2]
209	   (the POSIX standard) as a fragment identifier (Appendix A contains a
210	   short summary of the POSIX BRE syntax).

212	2.1.6 Combining Fragment Identification Schemes

214	   While in most cases only one fragment identification scheme will be
215	   used, it is possible to combine them.  By simply concatenating
216	   different fragment identification schemes, the whole fragment
217	   identifier refers to the union of all parts of the text resource
218	   identified by the individual fragment identification schemes.  This
219	   way, it is possible to identify disjoint ranges, such as multiple
220	   line ranges.

222	   It should be noted that regular expressions by themselves may
223	   identify disjoint fragments, which is true in any case where the
224	   regular expression matches more than one occurrence in the resource.

226	   Since disjoint fragments can be identified, implementations SHOULD
227	   make sure that these fragments are appropriately marked, for example
228	   by highlighting the fragment (rather than only scrolling to some
229	   line, which only identifies a single location in the resource).
230	   However, the exact method of how implementations deal with disjoint
231	   fragments depends on the application and interface, and is beyond the
232	   scope of this memo.
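
   One possible way to compute the union of several identified parts
   is sketched below in Python.  It assumes that every scheme has
   already been resolved to an inclusive pair of 1-based character
   positions; this representation and the name union_of_ranges are
   assumptions of the sketch, not something the memo prescribes.

```python
from typing import Iterable, List, Tuple

Span = Tuple[int, int]  # inclusive (start, end) character positions


def union_of_ranges(spans: Iterable[Span]) -> List[Span]:
    """Merge overlapping or adjacent spans into sorted, disjoint spans."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous span: extend it.
            previous_start, previous_end = merged[-1]
            merged[-1] = (previous_start, max(previous_end, end))
        else:
            merged.append((start, end))
    return merged


disjoint = union_of_ranges([(40, 50), (10, 20), (15, 30)])
```

   Each span in the result is one region that an interface might
   highlight; more than one span in the result means the fragment is
   disjoint.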

234	3. Fragment Identification Syntax

236	   The syntax for the fragment identifiers is very straightforward.  The
237	   syntax defines three schemes, 'char', 'line', and 'match'.  The
238	   'char' and 'line' can be used in two different variants, either the
239	   count variant (with a single number), or the range variant (with two
240	   comma-separated numbers).  The 'match' scheme has a regular
241	   expression as parameter, which must be specified as a string with
242	   balanced parentheses.

244	   The following syntax definition uses ABNF as defined in RFC 2234
245	   [RFC2234].

247	   text-fragment =  1*text-scheme
248	   text-scheme   =  ( char-scheme / line-scheme / match-scheme )
249	   char-scheme   =  "char(" ( count / range ) ")"
250	   line-scheme   =  "line(" ( count / range ) ")"
251	   match-scheme  =  "match(" regex ")"
252	   count         =  1*DIGIT
253	   range         =  count "," count
254	   regex         =  StringWithBalancedParens
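
   The grammar can be read by a small hand-written scanner.  The
   Python sketch below is one possible reading, not a reference
   implementation: the name parse_text_fragment is hypothetical, the
   fragment is assumed to be percent-decoded already, and the regex
   argument is returned with its '^' escapes still in place.

```python
import re
from typing import List, Tuple


def parse_text_fragment(fragment: str) -> List[Tuple[str, str]]:
    """Split e.g. 'line(1)match(foo)' into (scheme, argument) pairs."""
    schemes: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(fragment):
        open_paren = fragment.find("(", pos)
        if open_paren == -1:
            raise ValueError("scheme name without '('")
        name = fragment[pos:open_paren]
        if name not in ("char", "line", "match"):
            raise ValueError(f"unknown scheme {name!r}")
        # Scan to the matching ')' while tracking nesting depth.
        depth = 1
        i = open_paren + 1
        while i < len(fragment) and depth > 0:
            ch = fragment[i]
            if name == "match" and ch == "^":
                i += 2  # '^' escapes the next character; skip both
                continue
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            i += 1
        if depth != 0:
            raise ValueError("unbalanced parentheses")
        argument = fragment[open_paren + 1 : i - 1]
        # char and line take either a count or a range of two counts.
        if name != "match" and not re.fullmatch(r"\d+(,\d+)?", argument):
            raise ValueError(f"bad count or range {argument!r}")
        schemes.append((name, argument))
        pos = i
    if not schemes:
        raise ValueError("at least one scheme is required")
    return schemes


parsed = parse_text_fragment("line(1)match(searchterm)")
```

   For the input shown, the sketch yields the pairs ('line', '1') and
   ('match', 'searchterm').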

256	   The StringWithBalancedParens may only contain balanced parentheses;
257	   if unbalanced parentheses need to be used, they must be escaped with
258	   a '^' character.  A literal '^' must be escaped as '^^'.  Thus,
259	   before interpreting the StringWithBalancedParens as a BRE, it must be
260	   searched for '^(', '^)', and '^^', and these strings must be
261	   substituted with their unescaped variants.
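
   The substitution step might look as follows in Python.  The name
   unescape_regex is hypothetical.  A single left-to-right pass is
   used so that '^^(' becomes '^(' rather than '('; a '^' followed by
   any other character is left untouched, which is an assumption since
   the memo does not cover that case.

```python
import re


def unescape_regex(argument: str) -> str:
    """Replace '^(', '^)' and '^^' with '(', ')' and '^' in one pass."""
    return re.sub(r"\^([()^])", r"\1", argument)


plain_paren = unescape_regex("hello^(")
literal_caret = unescape_regex("a^^b")
```

   Applying three separate search-and-replace passes instead could
   unescape the same character twice, which is why the sketch matches
   all three escape sequences with one pattern.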

263	   If any count value is greater than the value for the actual resource,
264	   then it identifies the last character or line of the resource.  If a
265	   range scheme's counts are not properly ordered (i.e., the first number
266	   is not less than the second), then this scheme part has to be ignored.
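
   These two rules could be applied as in the following Python sketch.
   The name resolve_span is hypothetical.  The sketch assumes that all
   counts are at least 1, that the resource is not empty, and that the
   ordering of a range is checked on the numbers as written, before
   any clamping, so that a range lying entirely beyond the end of the
   resource still identifies the last character or line.

```python
from typing import Optional, Tuple


def resolve_span(total: int, first: int,
                 second: Optional[int] = None) -> Optional[Tuple[int, int]]:
    """Map a count or range onto a resource of `total` characters or lines.

    Returns an inclusive 1-based (start, end) pair, or None when the
    scheme part has to be ignored.
    """
    if second is None:
        position = min(first, total)  # too-large count: last item
        return (position, position)
    if second <= first:
        return None  # range not properly ordered: ignore this part
    return (min(first, total), min(second, total))


clamped_range = resolve_span(15, 10, 20)
beyond_end = resolve_span(5, 10, 20)
ignored = resolve_span(50, 20, 10)
```

   With these assumptions, a range of 10 to 20 on a 15-line resource
   resolves to lines 10 to 15, and on a 5-line resource to line 5
   only.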

268	4. Examples

270	   The following examples show some usages for the fragment identifiers
271	   defined in this memo.

273	   http://example.com/text.txt#char(100)

275	   This URI reference identifies the 100th character of the text.txt
276	   resource.  It should be noted that it is not clear which octet(s) of
277	   the resource this will be without retrieving the resource and thus
278	   knowing which character encoding is used for it (in case of HTTP,
279	   this information will be given in the response's Content-type
280	   header).

282	   http://example.com/text.txt#line(10,20)

284	   This URI reference identifies lines 10 to 20 of the text.txt
285	   resource.  If the resource has fewer than 10 lines, it identifies the
286	   last line.  If the resource has fewer than 20 but at least 10 lines,
287	   it identifies the lines 10 to the last line of the resource.

289	   http://example.com/text.txt#match(searchterm)

291	   This URI reference identifies all occurrences of the regular
292	   expression 'searchterm' in the resource, i.e., all occurrences of the
293	   string 'searchterm'.  If there is more than one occurrence, then this
294	   URI references a disjoint fragment, consisting of all of these
295	   occurrences.

297	   http://example.com/text.txt#line(1)match(searchterm)

299	   This URI reference identifies the first line and all occurrences of
300	   the regular expression 'searchterm' in the resource.  If there is an
301	   occurrence of 'searchterm' outside of the first line, then this URI
302	   references a disjoint fragment.

304	   http://example.com/text.txt#match(hello%5E()

306	   This URI reference identifies all occurrences of the regular
307	   expression 'hello(' in the resource.  It must first be URL decoded,
308	   which leads to the scheme part 'hello^('.  This is then interpreted
309	   according to the definition of a string with balanced parentheses,
310	   treating the '^(' as an escaped '(', so that the actual regular
311	   expression is 'hello('.  If there is more than one occurrence of this
312	   regular expression, then this URI references a disjoint fragment,
313	   consisting of all of these occurrences.
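
   The URL decoding step of this last example can be reproduced with
   Python's standard urllib.parse module.  The function name
   decoded_fragment is an assumption of this sketch; resolving the
   '^(' escape is a separate, later step and is not shown here.

```python
from urllib.parse import unquote, urlsplit


def decoded_fragment(uri_reference: str) -> str:
    """Return the percent-decoded fragment identifier of a URI reference."""
    return unquote(urlsplit(uri_reference).fragment)


scheme_part = decoded_fragment("http://example.com/text.txt#match(hello%5E()")
```

   Here scheme_part is the string 'match(hello^(', in which '%5E' has
   been decoded to '^'.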

315	5. Security Considerations

317	   There are no relevant security considerations for this memo.

319	Normative References

321	   [ISO9945-2]  International Organization for Standardization,
322	                "Information technology - Portable Operating System
323	                Interface (POSIX) - Part 2: Shell and Utilities", ISO
324	                9945-2, xxxxx 1993.

326	   [RFC2045]    Freed, N. and N. Borenstein, "Multipurpose Internet Mail
327	                Extensions (MIME) Part One: Format of Internet Message
328	                Bodies", RFC 2045, November 1996.

330	   [RFC2046]    Freed, N. and N. Borenstein, "Multipurpose Internet Mail
331	                Extensions (MIME) Part Two: Media Types", RFC 2046,
332	                November 1996.

334	   [RFC2119]    Bradner, S., "Key words for use in RFCs to Indicate
335	                Requirement Levels", RFC 2119, March 1997.

337	   [RFC2234]    Crocker, D. and P. Overell, "Augmented BNF for Syntax
338	                Specifications: ABNF", RFC 2234, November 1997.

340	   [RFC2396]    Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform
341	                Resource Identifiers (URI): Generic Syntax", RFC 2396,
342	                August 1998.

344	   [RFC2646]    Gellens, R., "The Text/Plain Format Parameter", RFC
345	                2646, August 1999.

347	Non-Normative References

349	   [RFC2629]  Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629,
350	              June 1999.

352	   [RFC2854]  Connolly, D. and L. Masinter, "The 'text/html' Media
353	              Type", RFC 2854, June 2000.

355	Author's Address

357	   Erik Wilde
358	   Swiss Federal Institute of Technology
359	   ETH-Zentrum
360	   8092 Zurich
361	   Switzerland

363	   Phone: +41-1-6325132
364	   EMail: ietf@dret.net
365	   URI:   http://dret.net/netdret/

367	Appendix A. POSIX BRE Syntax

369	   This section contains a short (and non-normative) summary of the
370	   POSIX BRE syntax defined in ISO 9945-2 [ISO9945-2].

372	   (tbd - is there some rfc that could be referenced instead?)

374	Appendix B. Where to send Comments

376	   Please send all comments about this document to Erik Wilde.

378	Appendix C. Acknowledgements

380	   This document has been written using the IETF document DTD described
381	   in RFC 2629 [RFC2629].

383	Full Copyright Statement

385	   Copyright (C) The Internet Society (2002).  All Rights Reserved.

387	   This document and translations of it may be copied and furnished to
388	   others, and derivative works that comment on or otherwise explain it
389	   or assist in its implementation may be prepared, copied, published
390	   and distributed, in whole or in part, without restriction of any
391	   kind, provided that the above copyright notice and this paragraph are
392	   included on all such copies and derivative works.  However, this
393	   document itself may not be modified in any way, such as by removing
394	   the copyright notice or references to the Internet Society or other
395	   Internet organizations, except as needed for the purpose of
396	   developing Internet standards in which case the procedures for
397	   copyrights defined in the Internet Standards process must be
398	   followed, or as required to translate it into languages other than
399	   English.

401	   The limited permissions granted above are perpetual and will not be
402	   revoked by the Internet Society or its successors or assigns.

404	   This document and the information contained herein is provided on an
405	   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
406	   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
407	   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
408	   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
409	   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

411	Acknowledgement

413	   Funding for the RFC Editor function is currently provided by the
414	   Internet Society.