Network Working Group                                      M. Hausenblas
Internet-Draft                                          DERI, NUI Galway
Updates: 4180 (if approved)                                     E. Wilde
Intended status: Standards Track                         EMC Corporation
Expires: July 16, 2013                                       J. Tennison
                                                     Open Data Institute
                                                        January 12, 2013


          URI Fragment Identifiers for the text/csv Media Type
                    draft-hausenblas-csv-fragment-02

Abstract

   This memo defines URI fragment identifiers for text/csv MIME
   entities.  These fragment identifiers make it possible to refer to
   parts of a text/csv MIME entity, identified by row, column, or cell.
   Fragment identification can use single items, or ranges.

Note to Readers

   This draft should be discussed on the apps-discuss mailing list [11].

   Online access to all versions and files is available at github [12].

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 16, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  What is text/csv?
     1.2.  Why text/csv Fragment Identifiers?
       1.2.1.  Motivation
       1.2.2.  Use Cases
     1.3.  Incremental Deployment
     1.4.  Notation Used in this Memo
   2.  Fragment Identification Methods
     2.1.  Selections
     2.2.  Row-based selection
     2.3.  Column-based selection
     2.4.  Cell-based selection
     2.5.  Multi-Selections
   3.  Fragment Identification Syntax
   4.  Fragment Identifier Processing
     4.1.  Syntax Errors in Fragment Identifiers
     4.2.  Semantics of Fragment Identifiers
   5.  IANA Considerations
   6.  Security Considerations
   7.  Change Log
     7.1.  From -01 to -02
     7.2.  From -00 to -01
   8.  References
     8.1.  Normative References
     8.2.  Non-Normative References
   Appendix A.  Acknowledgements
   Authors' Addresses

1.  Introduction

   This memo updates the text/csv media type defined in RFC 4180 [1] by
   defining URI fragment identifiers for text/csv MIME entities.

   This section gives an introduction to the general concepts of
   text/csv MIME entities and URI fragment identifiers, and discusses
   the need for fragment identifiers for text/csv and deployment issues.
   Section 2 discusses the principles and methods on which this memo is
   based.  Section 3 defines the syntax, and Section 4 discusses
   processing of text/csv fragment identifiers.

1.1.  What is text/csv?

   Internet Media Types (often referred to as "MIME types") as defined
   in RFC 2045 [2] and RFC 2046 [3] are used to identify different types
   and sub-types of media.  The text/csv media type is defined in RFC
   4180 [1], using US-ASCII [8] as the default character encoding (other
   character encodings can be used as well).
   Apart from a media type parameter for specifying the character
   encoding ("charset"), there is a second media type parameter
   ("header") that indicates whether there is a header row in the CSV
   document or not.

1.2.  Why text/csv Fragment Identifiers?

   URIs are the identification mechanism for resources on the Web.  The
   URI syntax specified in RFC 3986 [4] optionally includes a so-called
   "fragment identifier", separated by a number sign ("#").  The
   fragment identifier consists of additional reference information to
   be interpreted by the client after the retrieval action has been
   successfully completed.  The semantics of a fragment identifier is a
   property of the media type resulting from a retrieval action,
   regardless of the URI scheme used in the URI reference.  Therefore,
   the format and interpretation of fragment identifiers is dependent on
   the media type of the retrieval result.

1.2.1.  Motivation

   Similar to the motivation in RFC 5147 [9], which defines fragment
   identifiers for plain text files, referring to specific parts of a
   resource can be very useful, because it enables users and
   applications to create more specific references.  Users can create
   references to the part they really are interested in or want to talk
   about, rather than always pointing to a complete resource.  Even
   though it is suggested that fragment identification methods are
   specified in a media type's MIME registration (see [10]), many media
   types do not have fragment identification methods associated with
   them.

   Fragment identifiers are only useful if supported by the client,
   because they are only interpreted by the client.  Therefore, a new
   fragment identification method will require some time to be adopted
   by clients, and older clients will not support it.
   However, because the URI still works even if the fragment identifier
   is not supported (the resource is retrieved, but the fragment
   identifier is not interpreted), rapid adoption is not highly critical
   to ensure the success of a new fragment identification method.

1.2.2.  Use Cases

   Fragment identifiers for text/csv as defined in this memo make it
   possible to refer to specific parts of a text/csv MIME entity.  Use
   cases include, but are not limited to, selecting a part for visual
   rendering, stream processing, making assertions about a certain value
   (provenance, confidence, comments, etc.), or data integration.

1.3.  Incremental Deployment

   As long as text/csv fragment identifiers are not supported
   universally, it is important to consider the implications of
   incremental deployment.  Clients (for example, Web browsers) not
   supporting the text/csv fragment identifier described in this memo
   will work with URI references to text/csv MIME entities, but they
   will fail to understand the identification of the sub-resource
   specified by the fragment identifier, and thus will behave as if the
   complete resource was referenced.  This is a reasonable fallback
   behavior, and in general users should take into account the
   possibility that a program interpreting a given URI will fail to
   interpret the fragment identifier part.  Since fragment identifier
   evaluation is local to the client (and happens after retrieving the
   MIME entity), there is no reliable way for a server to determine
   whether a requesting client is using a URI containing a fragment
   identifier.

1.4.  Notation Used in this Memo

   The capitalized key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
   "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in RFC
   2119 [5].

2.  Fragment Identification Methods

   This memo specifies fragment identification using the following
   methods: "row" for row selections, "col" for column selections, and
   "cell" for cell selections.

   Throughout the sections below, the following example table in CSV
   (having 7 rows, including one header row, and 3 columns) is used:
   date,temperature,place
   2011-01-01,1,Galway
   2011-01-02,-1,Galway
   2011-01-03,0,Galway
   2011-01-01,6,Berkeley
   2011-01-02,8,Berkeley
   2011-01-03,5,Berkeley

2.1.  Selections

2.2.  Row-based selection

   To select a specific record, the "row" scheme followed by a single
   number is used (the first row has the position 0).
   http://example.com/data.csv#row=3

   The above CSV fragment yields:
   2011-01-03,0,Galway

2.3.  Column-based selection

   To select values from a certain column, the "col" scheme followed by
   a position is used:
   http://example.com/data.csv#col=1

   The above CSV fragment addresses the second column, yielding the
   column:
   temperature
   1
   -1
   0
   6
   8
   5

2.4.  Cell-based selection

   To select particular fields, use the "cell" scheme, followed by a row
   number, a comma, and a column number.
   http://example.com/data.csv#cell=3,0

   The above CSV fragment addresses the field in the first column within
   the fourth row, yielding:
   2011-01-03

   It is also possible to select ranges that have more than one row or
   column, in which case the cell selection uses the same range syntax
   as for row and column selections.  For these selections, the syntax
   uses the upper-lefthand cell as the starting point of the selection,
   followed by a minus sign, and then the lower-righthand cell as the
   end point of the selection.
   http://example.com/data.csv#cell=3,0-5,1

   The above CSV fragment selects a region that starts at the fourth row
   and the first column, and ends at the sixth row and the second
   column:
   2011-01-03,0
   2011-01-01,6
   2011-01-02,8

2.5.  Multi-Selections

   Row, column, and cell selections can make more than one selection, in
   which case the individual selections are separated by semicolons.  In
   these cases, the resulting fragment may be a disjoint fragment, such
   as the selection "#row=2;5" for the example CSV, which would select
   the third and the sixth row.  It is up to the user agent to decide
   how to handle disjoint fragments, but since they are allowed, user
   agents should be prepared to handle disjoint fragments.

3.  Fragment Identification Syntax

   The syntax for the text/csv fragment identifiers is as follows.

   The following syntax definition uses ABNF as defined in RFC 4234 [6],
   including the rule DIGIT.

   NOTE: In the descriptions that follow, specified text values MUST be
   used exactly as given, using exactly the indicated lower-case
   letters.  In this respect, the ABNF usage differs from [6].

   csv-fragment =  rowsel / colsel / cellsel
   rowsel       =  "row=" singlespec 0*( ";" singlespec)
   colsel       =  "col=" singlespec 0*( ";" singlespec)
   cellsel      =  "cell=" cellspec 0*( ";" cellspec)
   singlespec   =  position [ "-" position ]
   cellspec     =  cellrow "," cellcol [ "-" cellrow "," cellcol ]
   cellrow      =  position
   cellcol      =  position
   position     =  number / "*"
   number       =  1*( DIGIT )

4.  Fragment Identifier Processing

   Applications implementing support for the mechanism described in this
   memo MUST behave as described in the following sections.

4.1.  Syntax Errors in Fragment Identifiers

   If a fragment identifier contains a syntax error (i.e., does not
   conform to the syntax specified in Section 3), then it MUST be
   ignored by clients.
   Clients MUST NOT make any attempt to correct or guess fragment
   identifiers.  Syntax errors MAY be reported by clients.

4.2.  Semantics of Fragment Identifiers

   Rows and columns in CSV are counted from zero.  Positions thus refer
   to the rows and columns starting from position 0, which identifies
   the first row or column of a CSV.  The special character "*" can be
   used to refer to the last row or column of a CSV, thus allowing
   fragment identifiers to easily identify ranges that extend to the
   last row or column.

   If single selections refer to non-existing rows or columns (i.e.,
   beyond the size of the CSV), they MUST be ignored.

   If ranges extend beyond the size of the CSV (by extending to rows or
   columns beyond the size of the CSV), they MUST be interpreted to only
   extend to the actual size of the CSV.

   If selections of ranges of rows or columns or selections of cell
   ranges are specified in a way so that they select "inversely" (e.g.,
   "#row=10-5" or "#cell=10,10-5,5"), they MUST be ignored.

   Each specification of an identified region is processed
   independently.  Ignored specifications (for the reasons listed in the
   previous paragraphs) do not cause the whole fragment identifier to
   fail; only that single specification is ignored.  For the example
   file, the fragment identifier "#row=0-1;4-3;12-15" identifies only
   the first two rows, because the second specification is an "inverse"
   one and thus ignored, and the third specification selects rows beyond
   the actual size of the CSV.

   The result of evaluating the complete fragment identifier joins all
   the successfully evaluated identified parts, and then treats this
   joint fragment as the single identified fragment.  This fragment can
   be disjoint because of multiple selections.
   Multiple selections can also result in overlapping individual parts,
   and it is up to the user agent how to process such a fragment, and
   whether the individual parts are still made accessible (e.g.,
   visualized in visual user agents), or are presented as one unit.  For
   example, the fragment identifier "#row=2-5;3-4" contains a second
   identified part that is completely contained in the first identified
   part.  Whether a user agent maintains this selection as two parts, or
   simply signals that the identified fragment spans from the third to
   the sixth row, is up to the user agent to decide.

5.  IANA Considerations

   Note to RFC Editor: Please change this section to read as follows
   after the IANA action has been completed: "IANA has added a reference
   to this specification in the text/csv Media Type registration."

   IANA is requested to update the registration of the MIME Media type
   text/csv at http://www.iana.org/assignments/media-types/text/ with
   the fragment identifier defined in this memo by adding a reference to
   this memo (with the appropriate RFC number once it is known).

6.  Security Considerations

   The fact that software implementing fragment identifiers for CSV and
   software not implementing them differs in behavior, and the fact that
   different software may show documents or fragments to users in
   different ways, can lead to misunderstandings on the part of users.
   Such misunderstandings might be exploited in a way similar to
   spoofing or phishing.

   ...

   Implementers and users of fragment identifiers for CSV text should
   also be aware of the security considerations in RFC 3986 [4] and RFC
   3987 [7].

7.  Change Log

   Note to RFC Editor: Please remove this section before publication.

7.1.  From -01 to -02

   o  Removed slices ("#where:") as fragment identification method.

   o  Removed any special support for headers, which means that they are
      now treated as a regular (the first) row (if a header row is
      present).

   o  Changed semantics and syntax to allow multiple selection of rows,
      columns, and cells, and to allow ranges of rows and columns.

7.2.  From -00 to -01

   o  Added cell-based selections.

   o  Added Jeni Tennison as author; updated Erik Wilde's affiliation to
      EMC.

8.  References

8.1.  Normative References

   [1]   Shafranovich, Y., "Common Format and MIME Type for Comma-
         Separated Values (CSV) Files", RFC 4180, October 2005.

   [2]   Freed, N. and N. Borenstein, "Multipurpose Internet Mail
         Extensions (MIME) Part One: Format of Internet Message Bodies",
         RFC 2045, November 1996.

   [3]   Freed, N. and N. Borenstein, "Multipurpose Internet Mail
         Extensions (MIME) Part Two: Media Types", RFC 2046,
         November 1996.

   [4]   Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
         Resource Identifier (URI): Generic Syntax", RFC 3986,
         January 2005.

   [5]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
         Levels", RFC 2119, March 1997.

   [6]   Crocker, D. and P. Overell, "Augmented BNF for Syntax
         Specifications: ABNF", RFC 4234, October 2005.

   [7]   Duerst, M. and M. Suignard, "Internationalized Resource
         Identifiers (IRI)", RFC 3987, January 2005.

8.2.  Non-Normative References

   [8]   ANSI X3.4-1986, "Coded Character Set - 7-Bit American National
         Standard Code for Information Interchange", STD 63, RFC 3629,
         1992.

   [9]   Wilde, E. and M. Duerst, "URI Fragment Identifiers for the
         text/plain Media Type", RFC 5147, April 2008.

   [10]  Freed, N. and J. Klensin, "Media Type Specifications and
         Registration Procedures", RFC 4288, December 2005.

URIs

   [11]

   [12]

Appendix A.  Acknowledgements

   Thanks for comments and suggestions provided by Richard, Ian, Gannon.
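
   As a non-normative illustration of the syntax in Section 3 and the
   processing rules in Sections 4.1 and 4.2, the following Python sketch
   shows one way a client might turn a fragment identifier into a list
   of selected regions.  The names select, Region, and _span are
   hypothetical and do not come from this memo.  The sketch assumes the
   fragment is passed without the leading "#", that a range whose start
   lies beyond the CSV selects nothing, and that "*" is resolved to the
   last index before a range is checked for being "inverse"; the memo
   does not spell out the last two points.

```python
import re
from typing import List, Optional, Tuple

# Inclusive, zero-based (first_row, last_row, first_col, last_col).
Region = Tuple[int, int, int, int]

# Regular-expression rendering of the ABNF in Section 3 (case-sensitive).
_POS = r"(?:[0-9]+|\*)"
_SINGLE = rf"{_POS}(?:-{_POS})?"
_CELL = rf"{_POS},{_POS}(?:-{_POS},{_POS})?"
_FRAGMENT = re.compile(
    rf"row={_SINGLE}(?:;{_SINGLE})*"
    rf"|col={_SINGLE}(?:;{_SINGLE})*"
    rf"|cell={_CELL}(?:;{_CELL})*"
)


def _span(start: str, end: Optional[str], size: int) -> Optional[Tuple[int, int]]:
    """Return an inclusive (first, last) index pair, or None if ignored."""
    if size == 0:
        return None
    first = size - 1 if start == "*" else int(start)
    last = first if end is None else (size - 1 if end == "*" else int(end))
    if first > last or first >= size:
        return None  # "inverse" selection, or entirely beyond the CSV
    return first, min(last, size - 1)  # clamp ranges to the actual size


def select(fragment: str, n_rows: int, n_cols: int) -> List[Region]:
    """Evaluate a text/csv fragment identifier against a table size."""
    if not _FRAGMENT.fullmatch(fragment):
        return []  # Section 4.1: identifiers with syntax errors are ignored
    scheme, _, body = fragment.partition("=")
    regions: List[Region] = []
    for spec in body.split(";"):  # each specification is independent
        start, _, end = spec.partition("-")
        if scheme == "cell":
            r0, c0 = start.split(",")
            r1, c1 = end.split(",") if end else (None, None)
        elif scheme == "row":
            r0, r1, c0, c1 = start, end or None, "0", "*"
        else:  # "col"
            r0, r1, c0, c1 = "0", "*", start, end or None
        rows = _span(r0, r1, n_rows)
        cols = _span(c0, c1, n_cols)
        if rows is not None and cols is not None:
            regions.append(rows + cols)
    return regions
```

   For the example table in Section 2 (7 rows, 3 columns), calling
   select("cell=3,0-5,1", 7, 3) describes rows 3 to 5 of columns 0 and
   1, and select("row=0-1;4-3;12-15", 7, 3) keeps only the first two
   rows.  The sketch returns overlapping regions unmerged, which leaves
   the presentation decision to the caller, in line with Section 4.2.
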

Authors' Addresses

   Michael Hausenblas
   DERI, NUI Galway
   IDA Business Park
   Galway
   Ireland

   Phone: +353-91-495730
   Email: michael.hausenblas@deri.org
   URI:   http://sw-app.org/about.html


   Erik Wilde
   EMC Corporation
   6801 Koll Center Parkway
   Pleasanton, CA 94566
   U.S.A.

   Phone: +1-925-6006244
   Email: erik.wilde@emc.com
   URI:   http://dret.net/netdret/


   Jeni Tennison
   Open Data Institute
   65 Clifton Street
   London EC2A 4JE
   U.K.

   Phone: +44-797-4420482
   Email: jeni@jenitennison.com
   URI:   http://www.jenitennison.com/blog/