idnits 2.17.1 

draft-hausenblas-csv-fragment-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The abstract seems to contain references ([11], [12]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.

  -- The draft header indicates that this document updates RFC4180, but the
     abstract doesn't seem to mention this, which it should.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords. 

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).
     (Using the creation date from RFC4180, updated by this document, for
     RFC5378 checks: 2005-02-03)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (July 2, 2013) is 3951 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  ** Obsolete normative reference: RFC 4234 (ref. '6') (Obsoleted by RFC 5234)


     Summary: 2 errors (**), 0 flaws (~~), 2 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                      M. Hausenblas
3	Internet-Draft                                          DERI, NUI Galway
4	Updates: 4180 (if approved)                                     E. Wilde
5	Intended status: Informational                           EMC Corporation
6	Expires: January 3, 2014                                     J. Tennison
7	                                                     Open Data Institute
8	                                                            July 2, 2013

10	          URI Fragment Identifiers for the text/csv Media Type
11	                    draft-hausenblas-csv-fragment-04

13	Abstract

15	   This memo defines URI fragment identifiers for text/csv MIME
16	   entities.  These fragment identifiers make it possible to refer to
17	   parts of a text/csv MIME entity, identified by row, column, or cell.
18	   Fragment identification can use single items, or ranges.

20	Note to Readers

22	   This draft should be discussed on the apps-discuss mailing list [11].

24	   Online access to all versions and files is available on github [12].

26	Status of this Memo

28	   This Internet-Draft is submitted in full conformance with the
29	   provisions of BCP 78 and BCP 79.

31	   Internet-Drafts are working documents of the Internet Engineering
32	   Task Force (IETF).  Note that other groups may also distribute
33	   working documents as Internet-Drafts.  The list of current Internet-
34	   Drafts is at http://datatracker.ietf.org/drafts/current/.

36	   Internet-Drafts are draft documents valid for a maximum of six months
37	   and may be updated, replaced, or obsoleted by other documents at any
38	   time.  It is inappropriate to use Internet-Drafts as reference
39	   material or to cite them other than as "work in progress."

41	   This Internet-Draft will expire on January 3, 2014.

43	Copyright Notice

45	   Copyright (c) 2013 IETF Trust and the persons identified as the
46	   document authors.  All rights reserved.

48	   This document is subject to BCP 78 and the IETF Trust's Legal
49	   Provisions Relating to IETF Documents
50	   (http://trustee.ietf.org/license-info) in effect on the date of
51	   publication of this document.  Please review these documents
52	   carefully, as they describe your rights and restrictions with respect
53	   to this document.  Code Components extracted from this document must
54	   include Simplified BSD License text as described in Section 4.e of
55	   the Trust Legal Provisions and are provided without warranty as
56	   described in the Simplified BSD License.

58	Table of Contents

60	   1.  Introduction
61	     1.1.  What is text/csv?
62	     1.2.  Why text/csv Fragment Identifiers?
63	       1.2.1.  Motivation
64	       1.2.2.  Use Cases
65	     1.3.  Incremental Deployment
66	     1.4.  Notation Used in this Memo
67	   2.  Fragment Identification Methods
68	     2.1.  Row-based selection
69	     2.2.  Column-based selection
70	     2.3.  Cell-based selection
71	     2.4.  Multi-Selections
72	   3.  Fragment Identification Syntax
73	   4.  Fragment Identifier Processing
74	     4.1.  Syntax Errors in Fragment Identifiers
75	     4.2.  Semantics of Fragment Identifiers
76	   5.  IANA Considerations
77	   6.  Security Considerations
78	   7.  Implementation Status
79	   8.  Change Log
80	     8.1.  From -03 to -04
81	     8.2.  From -02 to -03
82	     8.3.  From -01 to -02
83	     8.4.  From -00 to -01
84	   9.  References
85	     9.1.  Normative References
86	     9.2.  Non-Normative References
87	   Appendix A.  Acknowledgements
88	   Authors' Addresses

90	1.  Introduction

92	   This memo updates the text/csv media type defined in RFC 4180 [1] by
93	   defining URI fragment identifiers for text/csv MIME entities.

95	   This section gives an introduction to the general concepts of text/
96	   csv MIME entities and URI fragment identifiers, and discusses the
97	   need for fragment identifiers for text/csv and deployment issues.
98	   Section 2 discusses the principles and methods on which this memo is
99	   based.  Section 3 defines the syntax, and Section 4 discusses
100	   processing of text/csv fragment identifiers.

102	1.1.  What is text/csv?

104	   Internet Media Types (often referred to as "MIME types") as defined
105	   in RFC 2045 [2] and RFC 2046 [3] are used to identify different types
106	   and sub-types of media.  The text/csv media type is defined in RFC
107	   4180 [1], using US-ASCII [8] as the default character encoding (other
108	   character encodings can be used as well).  Apart from a media type
109	   parameter for specifying the character encoding ("charset"), there is
110	   a second media type parameter ("header") that indicates whether there
111	   is a header row in the CSV document or not.

113	1.2.  Why text/csv Fragment Identifiers?

115	   URIs are the identification mechanism for resources on the Web.  The
116	   URI syntax specified in RFC 3986 [4] optionally includes a so-called
117	   "fragment identifier", separated by a number sign ("#").  The
118	   fragment identifier consists of additional reference information to
119	   be interpreted by the client after the retrieval action has been
120	   successfully completed.  The semantics of a fragment identifier is a
121	   property of the media type resulting from a retrieval action,
122	   regardless of the URI scheme used in the URI reference.  Therefore,
123	   the format and interpretation of fragment identifiers is dependent on
124	   the media type of the retrieval result.
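
   As a minimal illustration of the split described above, the
   following Python sketch uses the standard library function
   urllib.parse.urldefrag to separate the part of a URI that is
   retrieved from the fragment identifier that the client interprets
   afterwards.  The variable names are illustrative only and are not
   defined by this memo.

```python
from urllib.parse import urldefrag

# Split a URI reference into the part that is retrieved and the fragment
# identifier that the client interprets locally after retrieval.
uri = "http://example.com/data.csv#row=4"
resource, fragment = urldefrag(uri)

print(resource)  # http://example.com/data.csv
print(fragment)  # row=4
```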

126	1.2.1.  Motivation

128	   Similar to the motivation in RFC 5147 [9], which defines fragment
129	   identifiers for plain text files, referring to specific parts of a
130	   resource can be very useful, because it enables users and
131	   applications to create more specific references.  Users can create
132	   references to the part they really are interested in or want to talk
133	   about, rather than always pointing to a complete resource.  Even
134	   though it is suggested that fragment identification methods are
135	   specified in a media type's registration (see [10]), many media types
136	   do not have fragment identification methods associated with them.

138	   Fragment identifiers are only useful if supported by the client,
139	   because they are only interpreted by the client.  Therefore, a new
140	   fragment identification method will require some time to be adopted
141	   by clients, and older clients will not support it.  However, because
142	   the URI still works even if the fragment identifier is not supported
143	   (the resource is retrieved, but the fragment identifier is not
144	   interpreted), rapid adoption is not highly critical to ensure the
145	   success of a new fragment identification method.

147	1.2.2.  Use Cases

149	   Fragment identifiers for text/csv as defined in this memo make it
150	   possible to refer to specific parts of a text/csv MIME entity.  Use
151	   cases include, but are not limited to, selecting a part for visual
152	   rendering, stream processing, making assertions about a certain value
153	   (provenance, confidence, comments, etc.), or data integration.

155	1.3.  Incremental Deployment

157	   As long as text/csv fragment identifiers are not supported
158	   universally, it is important to consider the implications of
159	   incremental deployment.  Clients (for example, Web browsers) not
160	   supporting the text/csv fragment identifier described in this memo
161	   will work with URI references to text/csv MIME entities, but they
162	   will fail to understand the identification of the sub-resource
163	   specified by the fragment identifier, and thus will behave as if the
164	   complete resource was referenced.  This is a reasonable fallback
165	   behavior, and in general users should take into account the
166	   possibility that a program interpreting a given URI will fail to
167	   interpret the fragment identifier part.  Since fragment identifier
168	   evaluation is local to the client (and happens after retrieving the
169	   MIME entity), there is no reliable way for a server to determine
170	   whether a requesting client is using a URI containing a fragment
171	   identifier.

173	1.4.  Notation Used in this Memo

175	   The capitalized key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
176	   "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
177	   "OPTIONAL" in this document are to be interpreted as described in RFC
178	   2119 [5].

180	2.  Fragment Identification Methods

182	   This memo specifies fragment identification using the following
183	   methods: "row" for row selections, "col" for column selections, and
184	   "cell" for cell selections.

186	   Throughout the sections below, the following example table in CSV
187	   (having 7 rows, including one header row, and 3 columns) is used:
188	   date,temperature,place
189	   2011-01-01,1,Galway
190	   2011-01-02,-1,Galway
191	   2011-01-03,0,Galway
192	   2011-01-01,6,Berkeley
193	   2011-01-02,8,Berkeley
194	   2011-01-03,5,Berkeley

196	2.1.  Row-based selection

198	   To select a specific record, the "row" scheme followed by a single
199	   number is used (the first row is at position 1).
200	   http://example.com/data.csv#row=4

202	   The above CSV fragment identifies the fourth row:
203	   2011-01-03,0,Galway

205	   Fragments can also select ranges of rows:
206	   http://example.com/data.csv#row=5-7

208	   The above CSV fragment identifies three consecutive rows:
209	   2011-01-01,6,Berkeley
210	   2011-01-02,8,Berkeley
211	   2011-01-03,5,Berkeley

213	   The value "*" can be used to indicate the last row, so the previous
214	   URI is equivalent to:
215	   http://example.com/data.csv#row=5-*
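
   The row selections above can be sketched in Python as follows.  This
   is only an illustration under stated assumptions: select_rows is a
   hypothetical helper, not part of this memo, and it leaves out the
   error handling described in Section 4.

```python
import csv
import io

# Example table from Section 2 (7 rows, including the header row).
CSV_TEXT = """date,temperature,place
2011-01-01,1,Galway
2011-01-02,-1,Galway
2011-01-03,0,Galway
2011-01-01,6,Berkeley
2011-01-02,8,Berkeley
2011-01-03,5,Berkeley
"""


def select_rows(rows, spec):
    """Return the rows named by one spec such as "4", "5-7", or "5-*".

    Hypothetical helper: positions are 1-based and "*" means the last row.
    """
    start, _, end = spec.partition("-")
    end = end or start                  # a single position is a 1-row range
    first = len(rows) if start == "*" else int(start)
    last = len(rows) if end == "*" else int(end)
    return rows[first - 1:last]         # 1-based inclusive range -> slice


rows = list(csv.reader(io.StringIO(CSV_TEXT)))
print(select_rows(rows, "4"))  # [['2011-01-03', '0', 'Galway']]
print(select_rows(rows, "5-*") == select_rows(rows, "5-7"))  # True
```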

217	2.2.  Column-based selection

219	   To select values from a certain column, the "col" scheme is used,
220	   followed by a position (the first column is at position 1):
221	   http://example.com/data.csv#col=2

223	   The above CSV fragment addresses the second column, identifying the
224	   column:
225	   temperature
226	   1
227	   -1
228	   0
229	   6
230	   8
231	   5

233	   The "col" scheme can also be used to identify ranges of columns:

235	   http://example.com/data.csv#col=1-2

237	   The above CSV fragment addresses the first and second column:
238	   date,temperature
239	   2011-01-01,1
240	   2011-01-02,-1
241	   2011-01-03,0
242	   2011-01-01,6
243	   2011-01-02,8
244	   2011-01-03,5

246	   As for rows, the value "*" can be used to indicate the last column.
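
   A column selection can be sketched in the same way.  The helper
   select_columns below is hypothetical; it assumes that the CSV has
   already been parsed into a list of rows, and it uses last=None to
   stand in for "*".

```python
def select_columns(rows, first, last=None):
    """Keep columns first..last (1-based, inclusive) from every row.

    Hypothetical helper; last=None stands in for "*" (the last column).
    """
    return [row[first - 1:last] for row in rows]


# First three rows of the example table from Section 2.
rows = [
    ["date", "temperature", "place"],
    ["2011-01-01", "1", "Galway"],
    ["2011-01-02", "-1", "Galway"],
]

print(select_columns(rows, 2, 2))  # [['temperature'], ['1'], ['-1']]
print(select_columns(rows, 1, 2)[0])  # ['date', 'temperature']
print(select_columns(rows, 2)[0])  # ['temperature', 'place']
```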

248	2.3.  Cell-based selection

250	   To select particular fields, the "cell" scheme is used, followed by a
251	   row number, a comma, and a column number.
252	   http://example.com/data.csv#cell=4,1

254	   The above CSV fragment addresses the field in the first column within
255	   the fourth row, yielding:
256	   2011-01-03

258	   It is also possible to select cell-based fragments that have more
259	   than just one cell, in which case the cell selection uses the same
260	   range syntax as for row and column range selections.  For these
261	   selections, the syntax uses the upper-lefthand cell as the starting
262	   point of the selection, followed by a minus sign, and then the lower-
263	   righthand cell as the end point of the selection.
264	   http://example.com/data.csv#cell=4,1-6,2

266	   The above CSV fragment selects a region that starts at the fourth row
267	   and the first column, and ends at the sixth row and the second
268	   column:
269	   2011-01-03,0
270	   2011-01-01,6
271	   2011-01-02,8
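
   The region selection above amounts to slicing rows and then columns.
   The following Python sketch illustrates this; select_cells is a
   hypothetical helper, and splitting lines on commas is a shortcut
   that is only safe because the example table has no quoted fields.

```python
# Example table from Section 2, split naively on commas (no quoted fields).
TABLE = [line.split(",") for line in """date,temperature,place
2011-01-01,1,Galway
2011-01-02,-1,Galway
2011-01-03,0,Galway
2011-01-01,6,Berkeley
2011-01-02,8,Berkeley
2011-01-03,5,Berkeley""".splitlines()]


def select_cells(rows, top, left, bottom, right):
    """Return the region from cell (top, left) to cell (bottom, right).

    Hypothetical helper; all four positions are 1-based and inclusive.
    """
    return [row[left - 1:right] for row in rows[top - 1:bottom]]


print(select_cells(TABLE, 4, 1, 4, 1))  # [['2011-01-03']]
print(select_cells(TABLE, 4, 1, 6, 2))
# [['2011-01-03', '0'], ['2011-01-01', '6'], ['2011-01-02', '8']]
```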

273	2.4.  Multi-Selections

275	   Row, column, and cell selections can make more than one selection, in
276	   which case the individual selections are separated by semicolons.  In
277	   these cases, the resulting fragment may be a disjoint fragment, such
278	   as the selection "#row=3;6" for the example CSV, which would select
279	   the third and the sixth row.  It is up to the user agent to decide
280	   how to handle disjoint fragments, but since they are allowed, user
281	   agents should be prepared to handle disjoint fragments.
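
   One possible way for a user agent to take a multi-selection apart is
   sketched below.  The helper split_selection is hypothetical and
   performs no validation; it only shows how the semicolon-separated
   selections of "#row=3;6" lead to a disjoint result.

```python
def split_selection(fragment):
    """Split e.g. "row=3;6" into its scheme and its individual selections."""
    scheme, _, specs = fragment.partition("=")
    return scheme, specs.split(";")


# Example table from Section 2, one string per row.
lines = [
    "date,temperature,place",
    "2011-01-01,1,Galway",
    "2011-01-02,-1,Galway",
    "2011-01-03,0,Galway",
    "2011-01-01,6,Berkeley",
    "2011-01-02,8,Berkeley",
    "2011-01-03,5,Berkeley",
]

scheme, specs = split_selection("row=3;6")
print(scheme, specs)  # row ['3', '6']

# Rows 3 and 6 are not adjacent, so the identified fragment is disjoint.
picked = [lines[int(spec) - 1] for spec in specs]
print(picked)  # ['2011-01-02,-1,Galway', '2011-01-02,8,Berkeley']
```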

283	3.  Fragment Identification Syntax

285	   The syntax for the text/csv fragment identifiers is as follows.

287	   The following syntax definition uses ABNF as defined in RFC 4234 [6],
288	   including the rule DIGIT.

290	   NOTE:  In the descriptions that follow, specified text values MUST be
291	      used exactly as given, using exactly the indicated lower-case
292	      letters.  In this respect, the ABNF usage differs from [6].

294	   csv-fragment =  rowsel / colsel / cellsel
295	   rowsel       =  "row=" singlespec 0*( ";" singlespec)
296	   colsel       =  "col=" singlespec 0*( ";" singlespec)
297	   cellsel      =  "cell=" cellspec 0*( ";" cellspec)
298	   singlespec   =  position [ "-" position ]
299	   cellspec     =  cellrow "," cellcol [ "-" cellrow "," cellcol ]
300	   cellrow      =  position
301	   cellcol      =  position
302	   position     =  number / "*"
303	   number       =  1*( DIGIT )
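
   For illustration, the grammar above can be transcribed into a
   regular expression.  The following Python sketch is one such
   transcription, not a normative part of this memo; the names
   POSITION, SINGLESPEC, CELLSPEC, and is_valid_fragment are
   assumptions made for the example.  Matching is case-sensitive, in
   line with the NOTE above.

```python
import re

# Each pattern mirrors one ABNF rule; DIGIT is restricted to ASCII 0-9.
POSITION = r"(?:[0-9]+|\*)"
SINGLESPEC = rf"{POSITION}(?:-{POSITION})?"
CELLSPEC = rf"{POSITION},{POSITION}(?:-{POSITION},{POSITION})?"
CSV_FRAGMENT = re.compile(
    rf"(?:row={SINGLESPEC}(?:;{SINGLESPEC})*"
    rf"|col={SINGLESPEC}(?:;{SINGLESPEC})*"
    rf"|cell={CELLSPEC}(?:;{CELLSPEC})*)"
)


def is_valid_fragment(fragment):
    """Return True if the fragment (without "#") matches csv-fragment."""
    return CSV_FRAGMENT.fullmatch(fragment) is not None


print(is_valid_fragment("row=5-*"))       # True
print(is_valid_fragment("cell=4,1-6,2"))  # True
print(is_valid_fragment("ROW=4"))         # False
print(is_valid_fragment("cell=4"))        # False
```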

305	4.  Fragment Identifier Processing

307	   Applications implementing support for the mechanism described in this
308	   memo MUST behave as described in the following sections.

310	4.1.  Syntax Errors in Fragment Identifiers

312	   If a fragment identifier contains a syntax error (i.e., does not
313	   conform to the syntax specified in Section 3), then it MUST be
314	   ignored by clients.  Clients MUST NOT make any attempt to correct or
315	   guess fragment identifiers.  Syntax errors MAY be reported by
316	   clients.

318	4.2.  Semantics of Fragment Identifiers

320	   Rows and columns in CSV are counted from one.  Positions thus refer
321	   to the rows and columns starting from position 1, which identifies
322	   the first row or column of a CSV.  The special character "*" can be
323	   used to refer to the last row or column of a CSV, thus allowing
324	   fragment identifiers to easily identify ranges that extend to the
325	   last row or column.

327	   If single selections refer to non-existing rows or columns (i.e.,
328	   beyond the size of the CSV), they MUST be ignored.

330	   If ranges extend beyond the size of the CSV (by extending to rows or
331	   columns beyond the size of the CSV), they MUST be interpreted to only
332	   extend to the actual size of the CSV.

334	   If selections of ranges of rows or columns or selections of cell
335	   ranges are specified in a way so that they select "inversely" (i.e.,
336	   "#row=10-5" or "#cell=10,10-5,5"), they MUST be ignored.

338	   Each specification of an identified region is processed
339	   independently, and ignored specifications (because of the reasons
340	   listed in the previous paragraphs) do not cause the whole fragment
341	   identifier to fail; they just mean that this single specification is
342	   ignored.  For the example file, the fragment identifier
343	   "#row=1-2;5-4;13-16" identifies the first two rows: the second
344	   specification is an "inverse" specification and thus is ignored, and
345	   the third specification targets rows beyond the actual size of the
346	   CSV and thus is also ignored.
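
   The rules above can be sketched for row selections as follows.  This
   Python fragment is an illustration only: resolve_row_specs is a
   hypothetical helper, and two points are assumptions rather than
   statements of this memo, namely that position 0 is treated as a
   non-existing row and that a range starting beyond the size of the
   CSV is ignored as a whole (which matches the "13-16" example).

```python
def resolve_row_specs(specs, row_count):
    """Evaluate row specs such as "1-2" against a CSV with row_count rows.

    Returns (first, last) pairs, 1-based and inclusive.  Specs that are to
    be ignored are dropped without affecting the remaining specs.
    """
    def position(token):
        return row_count if token == "*" else int(token)

    resolved = []
    for spec in specs:
        start, _, end = spec.partition("-")
        first = position(start)
        last = position(end) if end else first
        if first < 1 or first > row_count:
            continue  # starts outside the CSV: ignored (assumption for 0)
        if last < first:
            continue  # "inverse" range: ignored
        resolved.append((first, min(last, row_count)))  # clamp to CSV size
    return resolved


print(resolve_row_specs(["1-2", "5-4", "13-16"], 7))  # [(1, 2)]
print(resolve_row_specs(["6-20", "9"], 7))            # [(6, 7)]
```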

348	   The complete fragment identifier identifies all the successfully
349	   evaluated identified parts as a single identified fragment.  This
350	   fragment can be disjoint because of multiple selections.  Multiple
351	   selections also can result in overlapping individual parts, and it is
352	   up to the user agent how to process such a fragment, and whether the
353	   individual parts are still made accessible (i.e., visualized in
354	   visual user agents), or are presented as one unit.  For example, the
355	   fragment identifier "#row=3-6;4-5" contains a second identified part
356	   that is completely contained in the first identified part.  Whether a
357	   user agent maintains this selection as two parts, or simply signals
358	   that the identified fragment spans from the third to the sixth row,
359	   is up to the user agent to decide.
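
   A user agent that prefers to present overlapping parts as one unit
   could merge them as sketched below.  This is only one of the
   behaviors permitted above; merge_ranges is a hypothetical helper,
   and merging directly adjacent ranges as well is a design choice of
   this sketch.

```python
def merge_ranges(ranges):
    """Merge overlapping or adjacent (first, last) inclusive ranges."""
    merged = []
    for first, last in sorted(ranges):
        if merged and first <= merged[-1][1] + 1:
            # Overlaps or touches the previous range: extend that range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], last))
        else:
            merged.append((first, last))
    return merged


print(merge_ranges([(3, 6), (4, 5)]))  # [(3, 6)]
print(merge_ranges([(3, 3), (6, 6)]))  # [(3, 3), (6, 6)]
```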

361	5.  IANA Considerations

363	   Note to RFC Editor: Please change this section to read as follows
364	   after the IANA action has been completed: "IANA has added a reference
365	   to this specification in the text/csv Media Type registration."

367	   IANA is requested to update the registration of the MIME Media type
368	   text/csv at http://www.iana.org/assignments/media-types/text/ with
369	   the fragment identifier defined in this memo by adding a reference to
370	   this memo (with the appropriate RFC number once it is known).

372	6.  Security Considerations

374	   The fact that software implementing fragment identifiers for CSV and
375	   software not implementing them differs in behavior, and the fact that
376	   different software may show documents or fragments to users in
377	   different ways, can lead to misunderstandings on the part of users.
378	   Such misunderstandings might be exploited in a way similar to
379	   spoofing or phishing.

381	   Implementers and users of fragment identifiers for CSV text should
382	   also be aware of the security considerations in RFC 3986 [4] and RFC
383	   3987 [7].

385	7.  Implementation Status

387	   Note to RFC Editor: Please remove this section before publication.

389	   As explained in a draft currently under development
390	   <http://tools.ietf.org/html/draft-sheffer-running-code>, this section
391	   contains information about implementation status, so that reviews of
392	   the draft document can take implementation reports into account as
393	   well.  If you are implementing this draft, please contact this
394	   draft's authors.  Any implementation status reports are intended for
395	   draft publications only; the section will be removed when the draft
396	   is published in RFC form.

398	8.  Change Log

400	   Note to RFC Editor: Please remove this section before publication.

402	8.1.  From -03 to -04

404	   o  Switched category from "std" to "info".

406	   o  Changed the definition of positions to start counting from 1
407	      instead of 0.

409	8.2.  From -02 to -03

411	   o  Added section on "Implementation Status" (Section 7).

413	   o  Added examples of ranges of rows and columns.

415	   o  Corrected errors in examples.

417	8.3.  From -01 to -02

419	   o  Removed slices ("#where:") as fragment identification method.

421	   o  Removed any special support for headers, which means that they are
422	      now treated as a regular (the first) row (if a header row is
423	      present).

425	   o  Changed semantics and syntax to allow multiple selection of rows,
426	      columns, and cells, and to allow ranges of rows and columns.

428	8.4.  From -00 to -01

430	   o  Added cell-based selections.

432	   o  Added Jeni Tennison as author; updated Erik Wilde's affiliation to
433	      EMC.

435	9.  References

437	9.1.  Normative References

439	   [1]   Shafranovich, Y., "Common Format and MIME Type for Comma-
440	         Separated Values (CSV) Files", RFC 4180, October 2005.

442	   [2]   Freed, N. and N. Borenstein, "Multipurpose Internet Mail
443	         Extensions (MIME) Part One: Format of Internet Message Bodies",
444	         RFC 2045, November 1996.

446	   [3]   Freed, N. and N. Borenstein, "Multipurpose Internet Mail
447	         Extensions (MIME) Part Two: Media Types", RFC 2046,
448	         November 1996.

450	   [4]   Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
451	         Resource Identifier (URI): Generic Syntax", RFC 3986,
452	         January 2005.

454	   [5]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
455	         Levels", RFC 2119, March 1997.

457	   [6]   Crocker, D. and P. Overell, "Augmented BNF for Syntax
458	         Specifications: ABNF", RFC 4234, October 2005.

460	   [7]   Duerst, M. and M. Suignard, "Internationalized Resource
461	         Identifiers (IRI)", RFC 3987, January 2005.

463	9.2.  Non-Normative References

465	   [8]   ANSI X3.4-1986, "Coded Character Set - 7-Bit American National
466	         Standard Code for Information Interchange", 1986.

469	   [9]   Wilde, E. and M. Duerst, "URI Fragment Identifiers for the
470	         text/plain Media Type", RFC 5147, April 2008.

472	   [10]  Freed, N., Klensin, J., and T. Hansen, "Media Type
473	         Specifications and Registration Procedures", BCP 13, RFC 6838,
474	         January 2013.

476	URIs

478	   [11]  <https://www.ietf.org/mailman/listinfo/apps-discuss>

480	   [12]  <https://github.com/dret/I-D/tree/master/csv-fragment>

482	Appendix A.  Acknowledgements

484	   Thanks for comments and suggestions provided by Richard Cyganiak, Ian
485	   Davis, Leigh Dodds, and Gannon Dick.

487	Authors' Addresses

489	   Michael Hausenblas
490	   DERI, NUI Galway
491	   IDA Business Park
492	   Galway
493	   Ireland

495	   Phone: +353-91-495730
496	   Email: michael.hausenblas@deri.org
497	   URI:   http://sw-app.org/about.html

499	   Erik Wilde
500	   EMC Corporation
501	   6801 Koll Center Parkway
502	   Pleasanton, CA 94566
503	   U.S.A.

505	   Phone: +1-925-6006244
506	   Email: erik.wilde@emc.com
507	   URI:   http://dret.net/netdret/

508	   Jeni Tennison
509	   Open Data Institute
510	   65 Clifton Street
511	   London EC2A 4JE
512	   U.K.

514	   Phone: +44-797-4420482
515	   Email: jeni@jenitennison.com
516	   URI:   http://www.jenitennison.com/blog/