idnits 2.17.1

draft-hausenblas-csv-fragment-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The abstract seems to contain references ([12]), which it shouldn't.
     Please replace those with straight textual mentions of the documents in
     question.

  -- The draft header indicates that this document updates RFC4180, but the
     abstract doesn't seem to mention this, which it should.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords.

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).

     (Using the creation date from RFC4180, updated by this document, for
     RFC5378 checks: 2005-02-03)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (December 29, 2012) is 4135 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Downref: Normative reference to an Informational RFC: RFC 4180 (ref. '1')

  ** Obsolete normative reference: RFC 4234 (ref. '6') (Obsoleted by RFC 5234)

  -- Possible downref: Non-RFC (?) normative reference: ref. '7'

  -- Obsolete informational reference (is this intentional?): RFC 4288 (ref.
     '11') (Obsoleted by RFC 6838)

     Summary: 3 errors (**), 0 flaws (~~), 2 warnings (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                                      M. Hausenblas
Internet-Draft                                          DERI, NUI Galway
Updates: 4180 (if approved)                                     E. Wilde
Intended status: Standards Track                         EMC Corporation
Expires: July 2, 2013                                        J. Tennison
                                                     Open Data Institute
                                                       December 29, 2012

          URI Fragment Identifiers for the text/csv Media Type
                    draft-hausenblas-csv-fragment-01

Abstract

   This memo defines URI fragment identifiers for text/csv MIME
   entities.  These fragment identifiers make it possible to refer to
   parts of a text/csv MIME entity, identified by cell, row, column, or
   slice.

Note to Readers

   This draft should be discussed on the apps-discuss mailing list [12].

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 2, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  What is text/csv?
     1.2.  Why text/csv Fragment Identifiers?
       1.2.1.  Motivation
       1.2.2.  Use Cases
     1.3.  Incremental Deployment
     1.4.  Notation Used in this Memo
   2.  Fragment Identification Methods
     2.1.  Header
     2.2.  Row-based selection
     2.3.  Column-based selection
     2.4.  Cell-based selection
     2.5.  Slice-based selection
   3.  Fragment Identification Syntax
   4.  Fragment Identifier Processing
     4.1.  Syntax Errors in Fragment Identifiers
   5.  IANA Considerations
   6.  Security Considerations
   7.  Change Log
     7.1.  From -00 to -01
   8.  References
     8.1.  Normative References
     8.2.  Non-Normative References
   Appendix A.  Acknowledgements
   Authors' Addresses

1.  Introduction

   This memo updates the text/csv media type defined in RFC 4180 [1] by
   defining URI fragment identifiers for text/csv MIME entities.

   This section gives an introduction to the general concepts of
   text/csv MIME entities and URI fragment identifiers, and discusses
   the need for fragment identifiers for text/csv and deployment issues.
   Section 2 discusses the principles and methods on which this memo is
   based.  Section 3 defines the syntax, and Section 4 discusses
   processing of text/csv fragment identifiers.

1.1.  What is text/csv?

   Internet Media Types (often referred to as "MIME types") as defined
   in RFC 2045 [2] and RFC 2046 [3] are used to identify different types
   and sub-types of media.  The text/csv media type is defined in RFC
   4180 [1], using US-ASCII [9] as the default character encoding (other
   character encodings can be used as well).

1.2.  Why text/csv Fragment Identifiers?

   URIs are the identification mechanism for resources on the Web.  The
   URI syntax specified in RFC 3986 [4] optionally includes a so-called
   "fragment identifier", separated by a number sign ('#').
   The fragment identifier consists of additional reference information
   to be interpreted by the user agent after the retrieval action has
   been successfully completed.  The semantics of a fragment identifier
   is a property of the data resulting from a retrieval action,
   regardless of the type of URI used in the reference.  Therefore, the
   format and interpretation of fragment identifiers is dependent on the
   media type of the retrieval result.

1.2.1.  Motivation

   Similar to the motivation in RFC 5147 [10], referring to specific
   parts of a resource can be very useful, because it enables users and
   applications to create more specific references.  Users can create
   references to the part they really are interested in or want to talk
   about, rather than always pointing to a complete resource.  Even
   though it is suggested that fragment identification methods are
   specified in a media type's MIME registration (see [11]), many media
   types do not have fragment identification methods associated with
   them.

   Fragment identifiers are only useful if supported by the client,
   because they are only interpreted by the client.  Therefore, a new
   fragment identification method will require some time to be adopted
   by clients, and older clients will not support it.  However, because
   the URI still works even if the fragment identifier is not supported
   (the resource is retrieved, but the fragment identifier is not
   interpreted), rapid adoption is not highly critical to ensure the
   success of a new fragment identification method.

1.2.2.  Use Cases

   Fragment identifiers for text/csv as defined in this memo make it
   possible to refer to specific parts of a text/csv MIME entity.
   Use cases include, but are not limited to, discovery (what column
   headings or how many rows are available), selecting a part for visual
   rendering, stream processing, making assertions about a certain value
   (provenance, confidence, etc.), or data integration.

1.3.  Incremental Deployment

   As long as text/csv fragment identifiers are not supported
   universally, it is important to consider the implications of
   incremental deployment.  Clients (for example, Web browsers) not
   supporting the text/csv fragment identifier described in this memo
   will work with URI references to text/csv MIME entities, but they
   will fail to understand the identification of the sub-resource
   specified by the fragment identifier, and thus will behave as if the
   complete resource was referenced.  This is a reasonable fallback
   behavior, and in general users should take into account the
   possibility that a program interpreting a given URI will fail to
   interpret the fragment identifier part.  Since fragment identifier
   evaluation is local to the client (and happens after retrieving the
   MIME entity), there is no reliable way for a server to determine
   whether a requesting client is using a URI containing a fragment
   identifier.

1.4.  Notation Used in this Memo

   The capitalized key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
   "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in RFC
   2119 [5].

2.  Fragment Identification Methods

   This memo specifies fragment identification using the following
   methods: header, row, column, cell, and slice.  Per RFC 4180 [1], the
   header line is optional; hence, whether a given method can be applied
   depends on the actual format of the text/csv MIME entity.
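
   As an informal illustration only (it is not part of this
   specification), the five methods can be told apart by the scheme
   keyword at the start of the fragment identifier.  The Python sketch
   below assumes the scheme keywords and the TEXTDATA character range
   given in Section 3.  The names classify_fragment, _TEXTDATA, and
   _PATTERNS are hypothetical, and the regular expressions only
   approximate the ABNF.

```python
import re

# TEXTDATA character range taken from the ABNF in Section 3:
# %x23-2B / %x2D-3C / %x3E-7E (this excludes ',' and '=').
_TEXTDATA = r"[\x23-\x2B\x2D-\x3C\x3E-\x7E]+"

# One hypothetical pattern per method, keyed by the scheme keyword.
# Matching is case-sensitive, as the NOTE in Section 3 requires.
_PATTERNS = {
    "head": re.compile(r"head"),
    "row": re.compile(r"row:(\*|[0-9]+)"),
    "col": re.compile(rf"col:({_TEXTDATA})"),
    "cell": re.compile(rf"cell:([0-9]+),({_TEXTDATA})"),
    "where": re.compile(rf"where:((?:{_TEXTDATA}={_TEXTDATA},?)+)"),
}


def classify_fragment(fragment):
    """Return (method, args) for a fragment, or None on a syntax error.

    Returning None mirrors Section 4.1: a fragment identifier with a
    syntax error is ignored rather than corrected or guessed.
    """
    for method, pattern in _PATTERNS.items():
        match = pattern.fullmatch(fragment)
        if match:
            return method, match.groups()
    return None
```

   For example, classify_fragment("cell:2,date") returns
   ("cell", ("2", "date")), while classify_fragment("Row:2") returns
   None because the scheme keywords are case-sensitive.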

   Throughout the sections below, the following table in CSV is used:

      date,temperature,place
      2011-01-01,1,Galway
      2011-01-02,-1,Galway
      2011-01-03,0,Galway
      2011-01-01,6,Berkeley
      2011-01-02,8,Berkeley
      2011-01-03,5,Berkeley

2.1.  Header

   For discovery purposes, the "head" scheme is used, returning the
   first row.  If the "header" parameter per RFC 4180 [1] is available
   and its value is "present", the client can reliably determine that
   the first row is a header.

      http://example.com/data.csv#head

   Applied to the reference table, the above CSV fragment would select
   the header row, yielding:

      date,temperature,place

2.2.  Row-based selection

   To select a specific record, the "row" scheme followed by a single
   number is used (the first record has the index 0).  If the fragment
   is given in the form row:*, then no record is selected but the
   overall number of records is returned.

      http://example.com/data.csv#row:2

   The above CSV fragment yields:

      2011-01-03,0,Galway

   The following computes the number of records (which equals 6 in the
   reference table):

      http://example.com/data.csv#row:*

2.3.  Column-based selection

   To select values from a certain column, the "col" scheme, followed
   either by a single number or the value of a header field, is used.

      http://example.com/data.csv#col:temperature

   The above CSV fragment addresses a column by name, yielding:

      1,-1,0,6,8,5

   A column can also be addressed by position, as shown in the next
   example:

      http://example.com/data.csv#col:2

   The above CSV fragment selects the third column:

      Galway,Galway,Galway,Berkeley,Berkeley,Berkeley

2.4.  Cell-based selection

   To select a particular field within a row, use the "cell" scheme,
   followed by a row number, a comma, and either a single number or the
   value of a header field.

      http://example.com/data.csv#cell:2,date

   The above CSV fragment addresses the field in the date column within
   the third row, yielding:

      2011-01-03

   A field can also be addressed by position, as shown in the next
   example:

      http://example.com/data.csv#cell:3,1

   The above CSV fragment selects the second column in the fourth row:

      6

2.5.  Slice-based selection

   To select a part of a table, called a slice in the following, the
   "where" scheme is used.  The allowed values are a comma-separated
   list of header fields with corresponding field values in the table.

      http://example.com/data.csv#where:date=2011-01-01

   The above CSV fragment selects a slice, yielding another CSV table as
   follows:

      temperature,place
      1,Galway
      6,Berkeley

3.  Fragment Identification Syntax

   The syntax for the text/csv fragment identifiers is as follows.

   The following syntax definition uses ABNF as defined in RFC 4234 [6],
   including the rules DIGIT and HEXDIG.  The mime-charset rule is
   defined in RFC 2978 [7].

   NOTE: In the descriptions that follow, specified text values MUST be
   used exactly as given, using exactly the indicated lower-case
   letters.  In this respect, the ABNF usage differs from [6].

   csv-fragment = headersel / wheresel / colsel / rowsel / cellsel
   headersel    = "head"
   rowsel       = "row:" rowspec
   colsel       = "col:" colspec
   cellsel      = "cell:" cellspec
   wheresel     = "where:" kvpairs
   kvpairs      = 1*( col "=" val 0*1(",") )
   col          = 1*TEXTDATA
   val          = 1*TEXTDATA
   colspec      = column
   rowspec      = "*" / rownum
   cellspec     = rownum "," column
   column       = 1*TEXTDATA / 1*DIGIT
   rownum       = 1*DIGIT
   TEXTDATA     = %x23-2B / %x2D-3C / %x3E-7E
   DIGIT        = %x30-39

4.  Fragment Identifier Processing

   Applications implementing support for the mechanism described in this
   memo MUST behave as described in the following sections.

4.1.  Syntax Errors in Fragment Identifiers

   If a fragment identifier contains a syntax error (i.e., does not
   conform to the syntax specified in Section 3), then it MUST be
   ignored by clients.  Clients MUST NOT make any attempt to correct or
   guess fragment identifiers.  Syntax errors MAY be reported by
   clients.

5.  IANA Considerations

   Note to RFC Editor: Please change this section to read as follows
   after the IANA action has been completed: "IANA has added a reference
   to this specification in the text/csv Media Type registration."

   IANA is requested to update the registration of the MIME Media type
   text/csv at http://www.iana.org/assignments/media-types/text/ with
   the fragment identifier defined in this memo by adding a reference to
   this memo (with the appropriate RFC number once it is known).

6.  Security Considerations

   The fact that software implementing fragment identifiers for CSV and
   software not implementing them differ in behavior, and the fact that
   different software may show documents or fragments to users in
   different ways, can lead to misunderstandings on the part of users.
   Such misunderstandings might be exploited in a way similar to
   spoofing or phishing.

   ...

   Implementers and users of fragment identifiers for CSV text should
   also be aware of the security considerations in RFC 3986 [4] and RFC
   3987 [8].

7.  Change Log

   Note to RFC Editor: Please remove this section before publication.

7.1.  From -00 to -01

   o  Added cell-based selections.

   o  Added Jeni Tennison as author; updated Erik Wilde's affiliation to
      EMC.

8.  References

8.1.  Normative References

   [1]   Shafranovich, Y., "Common Format and MIME Type for
         Comma-Separated Values (CSV) Files", RFC 4180, October 2005.

   [2]   Freed, N. and N. Borenstein, "Multipurpose Internet Mail
         Extensions (MIME) Part One: Format of Internet Message Bodies",
         RFC 2045, November 1996.

   [3]   Freed, N. and N. Borenstein, "Multipurpose Internet Mail
         Extensions (MIME) Part Two: Media Types", RFC 2046,
         November 1996.

   [4]   Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
         Resource Identifier (URI): Generic Syntax", RFC 3986,
         January 2005.

   [5]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
         Levels", RFC 2119, March 1997.

   [6]   Crocker, D. and P. Overell, "Augmented BNF for Syntax
         Specifications: ABNF", RFC 4234, October 2005.

   [7]   Freed, N. and J. Postel, "IANA Charset Registration
         Procedures", BCP 19, October 2000.

   [8]   Duerst, M. and M. Suignard, "Internationalized Resource
         Identifiers (IRI)", RFC 3987, January 2005.

8.2.  Non-Normative References

   [9]   ANSI X3.4-1986, "Coded Character Set - 7-Bit American National
         Standard Code for Information Interchange", STD 63, RFC 3629,
         1992.

   [10]  Wilde, E. and M. Duerst, "URI Fragment Identifiers for the
         text/plain Media Type", RFC 5147, April 2008.

   [11]  Freed, N. and J. Klensin, "Media Type Specifications and
         Registration Procedures", RFC 4288, December 2005.

URIs

   [12]

Appendix A.  Acknowledgements

   Thanks for comments and suggestions provided by Richard, Ian, Gannon.

Authors' Addresses

   Michael Hausenblas
   DERI, NUI Galway
   IDA Business Park
   Galway
   Ireland

   Phone: +353-91-495730
   Email: michael.hausenblas@deri.org
   URI:   http://sw-app.org/about.html

   Erik Wilde
   EMC Corporation
   6801 Koll Center Parkway
   Pleasanton, CA 94566
   U.S.A.

   Phone: +1-925-6006244
   Email: erik.wilde@emc.com
   URI:   http://dret.net/netdret/

   Jeni Tennison
   Open Data Institute
   65 Clifton Street
   London EC2A 4JE
   U.K.

   Phone: +44-797-4420482
   Email: jeni@jenitennison.com
   URI:   http://www.jenitennison.com/blog/