Network Working Group                                      M. Hausenblas
Internet-Draft                                          DERI, NUI Galway
Updates: 4180 (if approved)                                     E. Wilde
Intended status: Informational                           EMC Corporation
Expires: January 3, 2014                                     J. Tennison
                                                     Open Data Institute
                                                            July 2, 2013


          URI Fragment Identifiers for the text/csv Media Type
                    draft-hausenblas-csv-fragment-04

Abstract

   This memo defines URI fragment identifiers for text/csv MIME
   entities. These fragment identifiers make it possible to refer to
   parts of a text/csv MIME entity, identified by row, column, or cell.
   Fragment identification can use single items, or ranges.

Note to Readers

   This draft should be discussed on the apps-discuss mailing list [11].

   Online access to all versions and files is available on github [12].

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 3, 2014.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  What is text/csv?
     1.2.  Why text/csv Fragment Identifiers?
       1.2.1.  Motivation
       1.2.2.  Use Cases
     1.3.  Incremental Deployment
     1.4.  Notation Used in this Memo
   2.  Fragment Identification Methods
     2.1.  Row-based selection
     2.2.  Column-based selection
     2.3.  Cell-based selection
     2.4.  Multi-Selections
   3.  Fragment Identification Syntax
   4.  Fragment Identifier Processing
     4.1.  Syntax Errors in Fragment Identifiers
     4.2.  Semantics of Fragment Identifiers
   5.  IANA Considerations
   6.  Security Considerations
   7.  Implementation Status
   8.  Change Log
     8.1.  From -03 to -04
     8.2.  From -02 to -03
     8.3.  From -01 to -02
     8.4.  From -00 to -01
   9.  References
     9.1.  Normative References
     9.2.  Non-Normative References
   Appendix A.  Acknowledgements
   Authors' Addresses

1. Introduction

   This memo updates the text/csv media type defined in RFC 4180 [1] by
   defining URI fragment identifiers for text/csv MIME entities.

   This section gives an introduction to the general concepts of
   text/csv MIME entities and URI fragment identifiers, and discusses
   the need for fragment identifiers for text/csv and deployment issues.
   Section 2 discusses the principles and methods on which this memo is
   based. Section 3 defines the syntax, and Section 4 discusses
   processing of text/csv fragment identifiers.

1.1. What is text/csv?

   Internet Media Types (often referred to as "MIME types") as defined
   in RFC 2045 [2] and RFC 2046 [3] are used to identify different types
   and sub-types of media. The text/csv media type is defined in RFC
   4180 [1], using US-ASCII [8] as the default character encoding (other
   character encodings can be used as well). Apart from a media type
   parameter for specifying the character encoding ("charset"), there is
   a second media type parameter ("header") that indicates whether there
   is a header row in the CSV document or not.

1.2. Why text/csv Fragment Identifiers?

   URIs are the identification mechanism for resources on the Web.
   The URI syntax specified in RFC 3986 [4] optionally includes a
   so-called "fragment identifier", separated by a number sign ("#").
   The fragment identifier consists of additional reference information
   to be interpreted by the client after the retrieval action has been
   successfully completed. The semantics of a fragment identifier is a
   property of the media type resulting from a retrieval action,
   regardless of the URI scheme used in the URI reference. Therefore,
   the format and interpretation of fragment identifiers depend on the
   media type of the retrieval result.

1.2.1. Motivation

   Similar to the motivation in RFC 5147 [9], which defines fragment
   identifiers for plain text files, referring to specific parts of a
   resource can be very useful, because it enables users and
   applications to create more specific references. Users can create
   references to the part they really are interested in or want to talk
   about, rather than always pointing to a complete resource. Even
   though it is suggested that fragment identification methods are
   specified in a media type's registration (see [10]), many media types
   do not have fragment identification methods associated with them.

   Fragment identifiers are only useful if supported by the client,
   because they are only interpreted by the client. Therefore, a new
   fragment identification method will require some time to be adopted
   by clients, and older clients will not support it. However, because
   the URI still works even if the fragment identifier is not supported
   (the resource is retrieved, but the fragment identifier is not
   interpreted), rapid adoption is not highly critical to ensure the
   success of a new fragment identification method.

1.2.2. Use Cases

   Fragment identifiers for text/csv as defined in this memo make it
   possible to refer to specific parts of a text/csv MIME entity.
   Use cases include, but are not limited to, selecting a part for
   visual rendering, stream processing, making assertions about a
   certain value (provenance, confidence, comments, etc.), or data
   integration.

1.3. Incremental Deployment

   As long as text/csv fragment identifiers are not supported
   universally, it is important to consider the implications of
   incremental deployment. Clients (for example, Web browsers) not
   supporting the text/csv fragment identifier described in this memo
   will work with URI references to text/csv MIME entities, but they
   will fail to understand the identification of the sub-resource
   specified by the fragment identifier, and thus will behave as if the
   complete resource was referenced. This is a reasonable fallback
   behavior, and in general users should take into account the
   possibility that a program interpreting a given URI will fail to
   interpret the fragment identifier part. Since fragment identifier
   evaluation is local to the client (and happens after retrieving the
   MIME entity), there is no reliable way for a server to determine
   whether a requesting client is using a URI containing a fragment
   identifier.

1.4. Notation Used in this Memo

   The capitalized key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
   "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in RFC
   2119 [5].

2. Fragment Identification Methods

   This memo specifies fragment identification using the following
   methods: "row" for row selections, "col" for column selections, and
   "cell" for cell selections.

   Throughout the sections below, the following example table in CSV
   (having 7 rows, including one header row, and 3 columns) is used:

   date,temperature,place
   2011-01-01,1,Galway
   2011-01-02,-1,Galway
   2011-01-03,0,Galway
   2011-01-01,6,Berkeley
   2011-01-02,8,Berkeley
   2011-01-03,5,Berkeley

2.1. Row-based selection

   To select a specific record, the "row" scheme followed by a single
   number is used (the first row is at position 1).

   http://example.com/data.csv#row=4

   The above CSV fragment identifies the fourth row:

   2011-01-03,0,Galway

   Fragments can also select ranges of rows:

   http://example.com/data.csv#row=5-7

   The above CSV fragment identifies three consecutive rows:

   2011-01-01,6,Berkeley
   2011-01-02,8,Berkeley
   2011-01-03,5,Berkeley

   The value "*" can be used to indicate the last row, so the previous
   URI is equivalent to:

   http://example.com/data.csv#row=5-*

2.2. Column-based selection

   To select values from a certain column, the "col" scheme is used,
   followed by a position (the first column is at position 1):

   http://example.com/data.csv#col=2

   The above CSV fragment addresses the second column, identifying the
   column:

   temperature
   1
   -1
   0
   6
   8
   5

   The "col" scheme can also be used to identify ranges of columns:

   http://example.com/data.csv#col=1-2

   The above CSV fragment addresses the first and second columns:

   date,temperature
   2011-01-01,1
   2011-01-02,-1
   2011-01-03,0
   2011-01-01,6
   2011-01-02,8
   2011-01-03,5

   As for rows, the value "*" can be used to indicate the last column.

2.3. Cell-based selection

   To select particular fields, the "cell" scheme is used, followed by a
   row number, a comma, and a column number.

   http://example.com/data.csv#cell=4,1

   The above CSV fragment addresses the field in the first column within
   the fourth row, yielding:

   2011-01-03

   It is also possible to select cell-based fragments that have more
   than just one cell, in which case the cell selection uses the same
   range syntax as for row and column range selections. For these
   selections, the syntax uses the upper-lefthand cell as the starting
   point of the selection, followed by a minus sign, and then the
   lower-righthand cell as the end point of the selection.

   http://example.com/data.csv#cell=4,1-6,2

   The above CSV fragment selects a region that starts at the fourth row
   and the first column, and ends at the sixth row and the second
   column:

   2011-01-03,0
   2011-01-01,6
   2011-01-02,8

2.4. Multi-Selections

   Row, column, and cell selections can make more than one selection, in
   which case the individual selections are separated by semicolons. In
   these cases, the resulting fragment may be a disjoint fragment, such
   as the selection "#row=3;6" for the example CSV, which would select
   the third and the sixth row. It is up to the user agent to decide
   how to handle disjoint fragments, but since they are allowed, user
   agents should be prepared to handle disjoint fragments.

3. Fragment Identification Syntax

   The syntax for the text/csv fragment identifiers is as follows.

   The following syntax definition uses ABNF as defined in RFC 4234 [6],
   including the rule DIGIT.

   NOTE: In the descriptions that follow, specified text values MUST be
   used exactly as given, using exactly the indicated lower-case
   letters. In this respect, the ABNF usage differs from [6].

   csv-fragment = rowsel / colsel / cellsel
   rowsel       = "row=" singlespec 0*( ";" singlespec)
   colsel       = "col=" singlespec 0*( ";" singlespec)
   cellsel      = "cell=" cellspec 0*( ";" cellspec)
   singlespec   = position [ "-" position ]
   cellspec     = cellrow "," cellcol [ "-" cellrow "," cellcol ]
   cellrow      = position
   cellcol      = position
   position     = number / "*"
   number       = 1*( DIGIT )

4. Fragment Identifier Processing

   Applications implementing support for the mechanism described in this
   memo MUST behave as described in the following sections.

4.1. Syntax Errors in Fragment Identifiers

   If a fragment identifier contains a syntax error (i.e., does not
   conform to the syntax specified in Section 3), then it MUST be
   ignored by clients. Clients MUST NOT make any attempt to correct or
   guess fragment identifiers. Syntax errors MAY be reported by
   clients.

4.2. Semantics of Fragment Identifiers

   Rows and columns in CSV are counted from one. Positions thus refer
   to the rows and columns starting from position 1, which identifies
   the first row or column of a CSV. The special character "*" can be
   used to refer to the last row or column of a CSV, thus allowing
   fragment identifiers to easily identify ranges that extend to the
   last row or column.

   If single selections refer to non-existing rows or columns (i.e.,
   beyond the size of the CSV), they MUST be ignored.

   If ranges extend beyond the size of the CSV (by extending to rows or
   columns beyond the size of the CSV), they MUST be interpreted to only
   extend to the actual size of the CSV.

   If selections of ranges of rows or columns or selections of cell
   ranges are specified in a way so that they select "inversely" (i.e.,
   "#row=10-5" or "#cell=10,10-5,5"), they MUST be ignored.
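
   The syntax in Section 3 and the processing rules stated so far can
   be illustrated with a short, non-normative Python sketch. It is not
   part of this memo's definition. The names CSV_FRAGMENT,
   is_valid_fragment, resolve_singlespec, and resolve_fragment are
   hypothetical, and only "row" and "col" selections are resolved.
   Treating position 0 as a non-existing row or column is an
   assumption, since this memo only states that counting starts at 1.
   A range that starts beyond the size of the CSV is treated as
   selecting nothing.

```python
import re
from typing import List, Optional, Tuple

# Regular expressions mirroring the ABNF in Section 3.
# Matching is case-sensitive and accepts ASCII digits only.
POSITION = r"(?:[0-9]+|\*)"
SINGLESPEC = rf"{POSITION}(?:-{POSITION})?"
CELLSPEC = rf"{POSITION},{POSITION}(?:-{POSITION},{POSITION})?"
CSV_FRAGMENT = re.compile(
    rf"(row|col)=({SINGLESPEC}(?:;{SINGLESPEC})*)"
    rf"|(cell)=({CELLSPEC}(?:;{CELLSPEC})*)"
)


def is_valid_fragment(fragment: str) -> bool:
    """Check a fragment (without the leading '#') against the syntax."""
    return CSV_FRAGMENT.fullmatch(fragment) is not None


def resolve_singlespec(spec: str, size: int) -> Optional[Tuple[int, int]]:
    """Resolve one singlespec against a CSV with `size` rows or columns.

    Returns an inclusive, 1-based (start, end) pair, or None if the
    specification is to be ignored.
    """
    first, sep, second = spec.partition("-")
    start = size if first == "*" else int(first)
    if sep:
        end = size if second == "*" else int(second)
    else:
        end = start
    if start > end:
        return None  # "inverse" selection, ignored
    if start < 1 or start > size:
        return None  # non-existing position (0 is an assumption)
    return (start, min(end, size))  # clamp ranges to the actual size


def resolve_fragment(fragment: str, size: int) -> List[Tuple[int, int]]:
    """Resolve a "row=" or "col=" fragment into a list of ranges."""
    match = CSV_FRAGMENT.fullmatch(fragment)
    if match is None:
        return []  # syntax error: ignored, no attempt to correct it
    if match.group(1) is None:
        raise NotImplementedError("cell selections are not in this sketch")
    resolved = (resolve_singlespec(s, size) for s in match.group(2).split(";"))
    return [r for r in resolved if r is not None]
```

   For the 7-row example table, resolve_fragment("row=5-*", 7) returns
   [(5, 7)], and resolve_fragment("row=10-5", 7) returns an empty list
   because the inverse range is ignored.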

   Each specification of an identified region is processed
   independently, and ignored specifications (because of the reasons
   listed in the previous paragraphs) do not cause the whole fragment
   identifier to fail; they just mean that this single specification is
   ignored. For the example file, the fragment identifier
   "#row=1-2;5-4;13-16" identifies the first two rows: the second
   specification is an "inverse" specification and thus is ignored, and
   the third specification targets rows beyond the actual size of the
   CSV and thus is also ignored.

   The complete fragment identifier identifies all the successfully
   evaluated identified parts as a single identified fragment. This
   fragment can be disjoint because of multiple selections. Multiple
   selections also can result in overlapping individual parts, and it is
   up to the user agent how to process such a fragment, and whether the
   individual parts are still made accessible (i.e., visualized in
   visual user agents), or are presented as one unit. For example, the
   fragment identifier "#row=3-6;4-5" contains a second identified part
   that is completely contained in the first identified part. Whether a
   user agent maintains this selection as two parts, or simply signals
   that the identified fragment spans from the third to the sixth row,
   is up to the user agent to decide.

5. IANA Considerations

   Note to RFC Editor: Please change this section to read as follows
   after the IANA action has been completed: "IANA has added a reference
   to this specification in the text/csv Media Type registration."

   IANA is requested to update the registration of the MIME Media type
   text/csv at http://www.iana.org/assignments/media-types/text/ with
   the fragment identifier defined in this memo by adding a reference to
   this memo (with the appropriate RFC number once it is known).

6. Security Considerations

   The fact that software implementing fragment identifiers for CSV and
   software not implementing them differs in behavior, and the fact that
   different software may show documents or fragments to users in
   different ways, can lead to misunderstandings on the part of users.
   Such misunderstandings might be exploited in a way similar to
   spoofing or phishing.

   Implementers and users of fragment identifiers for CSV text should
   also be aware of the security considerations in RFC 3986 [4] and RFC
   3987 [7].

7. Implementation Status

   Note to RFC Editor: Please remove this section before publication.

   As explained in a draft currently under development, this section
   contains information about implementation status, so that reviews of
   the draft document can take implementation reports into account as
   well. If you are implementing this draft, please contact this
   draft's authors. Any implementation status reports are intended for
   draft publications only; the section will be removed when the draft
   is published in RFC form.

8. Change Log

   Note to RFC Editor: Please remove this section before publication.

8.1. From -03 to -04

   o  Switched category from "std" to "info".

   o  Changed the definition of positions to start counting from 1
      instead of 0.

8.2. From -02 to -03

   o  Added section on "Implementation Status" (Section 7).

   o  Added examples of ranges of rows and columns.

   o  Corrected errors in examples.

8.3. From -01 to -02

   o  Removed slices ("#where:") as fragment identification method.

   o  Removed any special support for headers, which means that they are
      now treated as a regular (the first) row (if a header row is
      present).

   o  Changed semantics and syntax to allow multiple selection of rows,
      columns, and cells, and to allow ranges of rows and columns.

8.4. From -00 to -01

   o  Added cell-based selections.

   o  Added Jeni Tennison as author; updated Erik Wilde's affiliation to
      EMC.

9. References

9.1. Normative References

   [1]   Shafranovich, Y., "Common Format and MIME Type for
         Comma-Separated Values (CSV) Files", RFC 4180, October 2005.

   [2]   Freed, N. and N. Borenstein, "Multipurpose Internet Mail
         Extensions (MIME) Part One: Format of Internet Message Bodies",
         RFC 2045, November 1996.

   [3]   Freed, N. and N. Borenstein, "Multipurpose Internet Mail
         Extensions (MIME) Part Two: Media Types", RFC 2046,
         November 1996.

   [4]   Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
         Resource Identifier (URI): Generic Syntax", RFC 3986,
         January 2005.

   [5]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
         Levels", RFC 2119, March 1997.

   [6]   Crocker, D. and P. Overell, "Augmented BNF for Syntax
         Specifications: ABNF", RFC 4234, October 2005.

   [7]   Duerst, M. and M. Suignard, "Internationalized Resource
         Identifiers (IRI)", RFC 3987, January 2005.

9.2. Non-Normative References

   [8]   ANSI X3.4-1986, "Coded Character Set - 7-Bit American National
         Standard Code for Information Interchange", STD 63, RFC 3629,
         1992.

   [9]   Wilde, E. and M. Duerst, "URI Fragment Identifiers for the
         text/plain Media Type", RFC 5147, April 2008.

   [10]  Freed, N., Klensin, J., and T. Hansen, "Media Type
         Specifications and Registration Procedures", BCP 13, RFC 6838,
         January 2013.

URIs

   [11]

   [12]

Appendix A. Acknowledgements

   Thanks for comments and suggestions provided by Richard Cyganiak, Ian
   Davis, Leigh Dodds, and Gannon Dick.

Authors' Addresses

   Michael Hausenblas
   DERI, NUI Galway
   IDA Business Park
   Galway
   Ireland

   Phone: +353-91-495730
   Email: michael.hausenblas@deri.org
   URI:   http://sw-app.org/about.html

   Erik Wilde
   EMC Corporation
   6801 Koll Center Parkway
   Pleasanton, CA 94566
   U.S.A.

   Phone: +1-925-6006244
   Email: erik.wilde@emc.com
   URI:   http://dret.net/netdret/

   Jeni Tennison
   Open Data Institute
   65 Clifton Street
   London EC2A 4JE
   U.K.

   Phone: +44-797-4420482
   Email: jeni@jenitennison.com
   URI:   http://www.jenitennison.com/blog/