Network Working Group                                      M. Nottingham
Internet-Draft                                            6 January 2022
Intended status: Informational
Expires: 10 July 2022


                  Retrofit Structured Fields for HTTP
              draft-nottingham-http-structure-retrofit-02

Abstract

   This specification defines how a selection of existing HTTP fields
   can be handled as Structured Fields.

About This Document

   This note is to be removed before publishing as an RFC.

   Status information for this document may be found at
   https://datatracker.ietf.org/doc/draft-nottingham-http-structure-retrofit/.

   Information can be found at https://mnot.github.io/I-D/.

   Source for this draft and an issue tracker can be found at
   https://github.com/mnot/I-D/labels/http-structure-retrofit.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 10 July 2022.

Copyright Notice

   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Revised BSD License text as described in Section 4.e of the
   Trust Legal Provisions and are provided without warranty as described
   in the Revised BSD License.

Table of Contents

   1.  Introduction
     1.1.  Notational Conventions
   2.  Compatible Fields
   3.  Mapped Fields
     3.1.  URLs
     3.2.  Dates
     3.3.  ETags
     3.4.  Links
     3.5.  Cookies
   4.  IANA Considerations
   5.  Security Considerations
   6.  Normative References
   Appendix A.  Data Supporting Field Compatibility
   Author's Address

1.  Introduction

   Structured Field Values for HTTP [STRUCTURED-FIELDS] introduced a
   data model with associated parsing and serialisation algorithms for
   use by new HTTP field values. Header fields that are defined as
   Structured Fields can realise a number of benefits, including:

   *  Improved interoperability and security: precisely defined parsing
      and serialisation algorithms are typically not available for
      fields defined with just ABNF and/or prose.

   *  Reuse of common implementations: many parsers for other fields are
      specific to a single field or a small family of fields.

   *  Canonical form: because a deterministic serialisation algorithm is
      defined for each type, Structured Fields have a canonical
      representation.

   *  Enhanced API support: a regular data model makes it easier to
      expose field values as a native data structure in implementations.

   *  Alternative serialisations: while [STRUCTURED-FIELDS] defines a
      textual serialisation of that data model, other, more efficient
      serialisations of the underlying data model are also possible.

   However, a field needs to be defined as a Structured Field for these
   benefits to be realised. Many existing fields are not, and they make
   up the bulk of the header and trailer fields seen in HTTP traffic on
   the Internet.

   This specification defines how a selection of existing HTTP fields
   can be handled as Structured Fields, so that these benefits can be
   realised -- thereby making them Retrofit Structured Fields.

   It does so using two techniques. Section 2 lists compatible fields
   -- those that can be handled as if they were Structured Fields due to
   the similarity of their defined syntax to that in Structured Fields.
   Section 3 lists mapped fields -- those whose syntax needs to be
   transformed into an underlying data model which is then mapped into
   that defined by Structured Fields.

   While implementations can parse and serialise compatible fields as
   Structured Fields subject to the caveats in Section 2, a sender
   cannot generate mapped fields from Section 3 and expect them to be
   understood and acted upon by the recipient without prior negotiation.
   This specification does not define such a mechanism.

1.1.  Notational Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.  Compatible Fields

   HTTP fields with the following names can usually have their values
   handled as Structured Fields according to the listed parsing and
   serialisation algorithms in [STRUCTURED-FIELDS], subject to the
   listed caveats.

   The listed types are chosen for compatibility with the defined syntax
   of the field as well as with actual Internet traffic (see
   Appendix A). However, not all instances of these fields will
   successfully parse. This might be because the field value is clearly
   invalid, or it might be because it is valid but not parseable as a
   Structured Field.

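   As a hedged illustration of that distinction, the following Python
   sketch tries to handle a Retry-After value (listed below as an Item)
   as a bare Structured Fields Integer. The function name
   parse_sf_integer is hypothetical, and the parser is deliberately
   partial: it accepts only an Integer as defined in
   [STRUCTURED-FIELDS] (an optional "-" followed by 1 to 15 digits),
   with no parameters and none of the other Item types.

```python
import re
from typing import Optional

# Integer per RFC 8941: an optional "-" followed by 1 to 15 digits.
_SF_INTEGER = re.compile(r"-?[0-9]{1,15}")


def parse_sf_integer(field_value: str) -> Optional[int]:
    """Return the value as an int if it is a bare SF Integer, else None.

    Hypothetical helper: parameters and the other Item types
    (Decimal, String, Token, Byte Sequence, Boolean) are not handled.
    """
    candidate = field_value.strip(" ")  # SF parsing discards surrounding spaces
    if _SF_INTEGER.fullmatch(candidate) is None:
        return None
    return int(candidate)


# One parseable value, one valid-but-unparseable value, one invalid value.
for raw in ("120", "Fri, 31 Dec 1999 23:59:59 GMT", "two minutes"):
    print(repr(raw), "->", parse_sf_integer(raw))
```

   Here "120" parses, the HTTP-date form is a valid Retry-After value
   that nonetheless fails to parse, and "two minutes" is simply invalid;
   the sketch reports both failures the same way and leaves the choice
   of handling to the caller.
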
   An application using this specification will need to consider how to
   handle such field values. Depending on its requirements, it might be
   advisable to reject such values, treat them as opaque strings, or
   attempt to recover a structured value from them in an ad hoc fashion.

   *  Accept - List

   *  Accept-Encoding - List

   *  Accept-Language - List

   *  Accept-Patch - List

   *  Accept-Ranges - List

   *  Access-Control-Allow-Credentials - Item

   *  Access-Control-Allow-Headers - List

   *  Access-Control-Allow-Methods - List

   *  Access-Control-Allow-Origin - Item

   *  Access-Control-Expose-Headers - List

   *  Access-Control-Max-Age - Item

   *  Access-Control-Request-Headers - List

   *  Access-Control-Request-Method - Item

   *  Age - Item

   *  Allow - List

   *  ALPN - List

   *  Alt-Svc - Dictionary

   *  Alt-Used - Item

   *  Cache-Control - Dictionary

   *  Connection - List

   *  Content-Encoding - List

   *  Content-Language - List

   *  Content-Length - List

   *  Content-Type - Item

   *  Cross-Origin-Resource-Policy - Item

   *  Expect - Item

   *  Expect-CT - Dictionary

   *  Host - Item

   *  Keep-Alive - Dictionary

   *  Origin - Item

   *  Pragma - Dictionary

   *  Prefer - Dictionary

   *  Preference-Applied - Dictionary

   *  Retry-After - Item

   *  Surrogate-Control - Dictionary

   *  TE - List

   *  Timing-Allow-Origin - List

   *  Trailer - List

   *  Transfer-Encoding - List

   *  Vary - List

   *  X-Content-Type-Options - Item

   *  X-Frame-Options - Item

   *  X-XSS-Protection - List

   Note the following caveats:

   Parameter names:  HTTP parameter names are case-insensitive (as per
      Section 5.6.6 of [HTTP]), but Structured Fields require them to be
      all-lowercase. Although the vast majority of parameters seen in
      typical traffic are all-lowercase, compatibility can be improved
      by force-lowercasing parameters when encountered.

   Empty Field Values:  Empty and whitespace-only field values are
      considered errors in Structured Fields. For compatible fields, an
      empty field indicates that the field should be silently ignored.

   Alt-Svc:  Some ALPN tokens (e.g., h3-Q43) do not conform to the key
      syntax of Structured Fields. Since the final version of HTTP/3
      uses the h3 token, this shouldn't be a long-term issue, although
      future tokens may again violate this assumption.

   Cache-Control, Expect-CT, Pragma, Prefer, Preference-Applied,
   Surrogate-Control:  These Dictionary-based fields consider the key to
      be case-insensitive, but Structured Fields require keys to be
      all-lowercase. Although the vast majority of values seen in
      typical traffic are all-lowercase, compatibility can be improved
      by force-lowercasing these Dictionary keys when encountered.

   Content-Length:  Content-Length is defined as a List because it is
      not uncommon for implementations to mistakenly send multiple
      values. See Section 8.6 of [HTTP] for handling requirements.

   Retry-After:  Only the delta-seconds form of Retry-After is
      supported; a Retry-After value containing an HTTP-date will need
      to be either converted into delta-seconds or represented as a raw
      value.

3.  Mapped Fields

   Some HTTP fields can have their values represented in Structured
   Fields by mapping them into Structured Fields data types and then
   serialising the result using an alternative field name.

   For example, the Date HTTP header field carries a string representing
   a date:

   Date: Sun, 06 Nov 1994 08:49:37 GMT

   Its value is more efficiently represented as an integer number of
   seconds relative to the Unix epoch (00:00:00 UTC on 1 January 1970,
   minus leap seconds). Thus, the example above would be mapped as:

   SF-Date: 784111777

   As in Section 2, these fields are unable to represent values that are
   not parseable, and so an application using this specification will
   need to decide how to support such values. Typically, handling them
   using the original field name is sufficient.

   Each field name listed below indicates a replacement field name and a
   means of mapping its original value into a Structured Field.

3.1.  URLs

   The following field names (paired with their replacement field names)
   have values that can be represented as Structured Fields by
   considering the original field's value as a string.

   *  Content-Location - SF-Content-Location

   *  Location - SF-Location

   *  Referer - SF-Referer

   For example, a Location field could be represented as:

   SF-Location: "https://example.com/foo"

3.2.  Dates

   The following field names (paired with their replacement field names)
   have values that can be represented as Structured Fields by parsing
   their payload according to Section 5.6.7 of [HTTP] and representing
   the result as an integer number of seconds relative to the Unix epoch
   (00:00:00 UTC on 1 January 1970, minus leap seconds).

   *  Date - SF-Date

   *  Expires - SF-Expires

   *  If-Modified-Since - SF-IMS

   *  If-Unmodified-Since - SF-IUS

   *  Last-Modified - SF-LM

   For example, an Expires field could be represented as:

   SF-Expires: 1571965240

3.3.  ETags

   The field value of the ETag header field can be represented as a
   String Structured Field by representing the entity-tag as a string,
   and the weakness flag as a boolean "w" parameter on it, where true
   indicates that the entity-tag is weak; if false or unset, the
   entity-tag is strong.

   For example:

   SF-ETag: "abcdef"; w=?1

   If-None-Match's field value can be represented as SF-INM, which is a
   List of the structure described above.

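   The ETag and If-None-Match mappings described above can be sketched
   in Python as follows. This is an illustration under stated
   assumptions, not a definitive implementation: the function names
   etag_to_sf and if_none_match_to_sf are hypothetical, entity-tags
   containing non-ASCII (obs-text) characters are rejected because a
   Structured Fields String cannot carry them, the "*" form of
   If-None-Match is not handled, and the output spacing mirrors the
   examples in this section.

```python
import re

# entity-tag = [ "W/" ] DQUOTE *etagc DQUOTE, restricted here to ASCII etagc.
_ENTITY_TAG = re.compile(r'(W/)?"([\x21\x23-\x7e]*)"')
# List separator: a comma with optional surrounding whitespace.
_SEPARATOR = re.compile(r"[ \t]*,[ \t]*")


def _serialise(match: "re.Match[str]") -> str:
    """Render one matched entity-tag as an SF String with optional w=?1."""
    weak, opaque = match.group(1), match.group(2)
    # SF Strings escape backslashes; DQUOTE cannot occur inside an entity-tag.
    sf_string = '"' + opaque.replace("\\", "\\\\") + '"'
    return sf_string + ("; w=?1" if weak else "")


def etag_to_sf(entity_tag: str) -> str:
    """Map an ETag field value to a candidate SF-ETag field value."""
    match = _ENTITY_TAG.fullmatch(entity_tag.strip(" \t"))
    if match is None:
        raise ValueError("not an ASCII entity-tag: %r" % entity_tag)
    return _serialise(match)


def if_none_match_to_sf(field_value: str) -> str:
    """Map a list of entity-tags to a candidate SF-INM field value."""
    text = field_value.strip(" \t")
    members, pos = [], 0
    while True:
        # Scan tag by tag: commas are legal inside an entity-tag,
        # so splitting the whole value on "," would be wrong.
        match = _ENTITY_TAG.match(text, pos)
        if match is None:
            raise ValueError("unparseable entity-tag list: %r" % field_value)
        members.append(_serialise(match))
        pos = match.end()
        separator = _SEPARATOR.match(text, pos)
        if separator is None:
            break
        pos = separator.end()
    if pos != len(text):
        raise ValueError("trailing data in entity-tag list: %r" % field_value)
    return ", ".join(members)


print(etag_to_sf('W/"abcdef"'))
print(if_none_match_to_sf('W/"abcdef", "ghijkl"'))
```

   Values that raise ValueError here would, as noted at the start of
   Section 3, typically be left in the original field.
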
   For example:

   SF-INM: "abcdef"; w=?1, "ghijkl"

3.4.  Links

   The field value of the Link header field [RFC8288] can be represented
   in the SF-Link List Structured Field by representing the
   URI-Reference as a string, and link-param as parameters.

   For example:

   SF-Link: "/terms"; rel="copyright"; anchor="#foo"

3.5.  Cookies

   The field values of the Cookie and Set-Cookie fields [RFC6265] can be
   represented in the SF-Cookie Structured Field (a List) and
   SF-Set-Cookie Structured Field (a Dictionary), respectively.

   In each case, cookie names are serialised as Tokens, whereas their
   values are serialised as Strings, unless they can be represented
   accurately and unambiguously using the textual representation of
   another structured type (e.g., an Integer or Decimal).

   Set-Cookie parameters map to parameters on the appropriate
   SF-Set-Cookie member, with the parameter name being forced to
   lowercase. Set-Cookie parameter values are Strings unless a specific
   type is defined. This specification defines the following parameter
   types:

   *  Max-Age: Integer

   *  Secure: Boolean

   *  HttpOnly: Boolean

   *  SameSite: Token

   Note that cookies in both fields are separated by commas, not
   semicolons, and multiple cookies can appear in each field.

   For example:

   SF-Set-Cookie: lang=en-US; expires="Wed, 09 Jun 2021 10:18:14 GMT";
                  samesite=Strict
   SF-Cookie: SID=31d4d96e407aad42, lang=en-US

4.  IANA Considerations

   Please add the following note to the HTTP Field Name Registry:

      The "Structured Type" column indicates the type of the field as
      per RFC8941, if any, and may be "Dictionary", "List" or "Item". A
      prefix of "*" indicates that it is a retrofit type (i.e., not
      natively Structured); see [this specification].

   Then, add a new column, "Structured Type", with the values from
   Section 2 assigned to the nominated registrations, prefixing each
   with "*" to indicate that it is a retrofit type.

   Then, add the following field names into the HTTP Field Name
   Registry, with the corresponding Structured Type as indicated, a
   status of "permanent" and referring to this document:

   *  SF-Content-Location - String

   *  SF-Location - String

   *  SF-Referer - String

   *  SF-Date - Integer

   *  SF-Expires - Integer

   *  SF-IMS - Integer

   *  SF-IUS - Integer

   *  SF-LM - Integer

   *  SF-ETag - Item

   *  SF-INM - List

   *  SF-Link - List

   *  SF-Set-Cookie - Dictionary

   *  SF-Cookie - List

5.  Security Considerations

   Section 2 identifies existing HTTP fields that can be parsed and
   serialised with the algorithms defined in [STRUCTURED-FIELDS].
   Variances from other implementations might be exploitable,
   particularly if they allow an attacker to target one implementation
   in a chain (e.g., an intermediary). However, given the considerable
   variance in parsers already deployed, convergence towards a single
   parsing algorithm is likely to have a net security benefit in the
   longer term.

   Section 3 defines alternative representations of existing fields.
   Because downstream consumers might interpret the message differently
   based upon whether they recognise the alternative representation,
   implementations are prohibited from generating such fields unless
   they have negotiated support for them with their peer. This
   specification does not define such a mechanism, but any such
   definition needs to consider the implications of doing so carefully.

6.  Normative References

   [HTTP]     Fielding, R. T., Nottingham, M., and J. Reschke, "HTTP
              Semantics", Work in Progress, Internet-Draft,
              draft-ietf-httpbis-semantics-19, 12 September 2021.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC6265]  Barth, A., "HTTP State Management Mechanism", RFC 6265,
              DOI 10.17487/RFC6265, April 2011.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC8288]  Nottingham, M., "Web Linking", RFC 8288,
              DOI 10.17487/RFC8288, October 2017.

   [STRUCTURED-FIELDS]
              Nottingham, M. and P-H. Kamp, "Structured Field Values for
              HTTP", RFC 8941, DOI 10.17487/RFC8941, February 2021.

Appendix A.  Data Supporting Field Compatibility

   To help guide decisions about compatible fields, the HTTP response
   headers captured by the HTTP Archive (https://httparchive.org) in
   September 2021 (representing more than 528,000,000 HTTP exchanges)
   were parsed as Structured Fields using the types listed in Section 2.
   For each field, the table shows the number of header instances that
   parsed successfully, the number that failed, and the resulting
   failure rate:

   Field                              Successes   Failures  Failure rate
   accept                                 9,099         34        0.372%*
   accept-encoding                      116,708         58        0.050%*
   accept-language                      127,710         95        0.074%*
   accept-patch                             281          0        0.000%
   accept-ranges                    289,341,375      7,776        0.003%
   access-control-allow-credentials  36,159,371      2,671        0.007%
   access-control-allow-headers      25,980,519     23,181        0.089%
   access-control-allow-methods      32,071,437     17,424        0.054%
   access-control-allow-origin      165,719,859    130,247        0.079%
   access-control-expose-headers     20,787,683      1,973        0.009%
   access-control-max-age             9,549,494      9,846        0.103%
   access-control-request-headers       165,882        503        0.302%*
   access-control-request-method        346,135     30,680        8.142%*
   age                              107,395,872     36,649        0.034%
   allow                                579,822        281        0.048%
   alt-svc                           56,773,977  4,914,119        7.966%
   cache-control                    395,402,834  1,146,080        0.289%
   connection                       112,017,641      3,491        0.003%
   content-encoding                 225,568,224        237        0.000%
   content-language                   3,339,291      1,744        0.052%
   content-length                   422,415,406        126        0.000%
   content-type                     503,950,894    507,133        0.101%
   cross-origin-resource-policy     102,483,430        799        0.001%
   expect                                     0         53      100.000%*
   expect-ct                         54,129,244     80,333        0.148%
   host                                  57,134      1,486        2.535%*
   keep-alive                        50,606,877      1,509        0.003%
   origin                                32,438      1,396        4.126%*
   pragma                            66,321,848     97,328        0.147%
   preference-applied                       189          0        0.000%
   referrer-policy                   14,274,787      8,091        0.057%
   retry-after                          523,533      7,585        1.428%
   surrogate-control                    282,846        976        0.344%
   te                                         1          0        0.000%
   timing-allow-origin               91,979,983          8        0.000%
   trailer                                1,171          0        0.000%
   transfer-encoding                 15,098,518          0        0.000%
   vary                             246,483,644     69,607        0.028%
   x-content-type-options           166,063,072    237,255        0.143%
   x-frame-options                   56,863,322  1,014,464        1.753%
   x-xss-protection                 132,739,109    347,133        0.261%

   Note that this data set only includes response headers, although some
   request headers are present, indicated with an asterisk (because, the
   Web). Also, Dictionary and Parameter keys have not been
   force-lowercased, with the result that any values containing
   uppercase keys are considered to fail.

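   The failure rates in the table are consistent with failures taken as
   a share of all instances (successes plus failures) rather than as a
   ratio of the two counts. The following Python sketch recomputes
   three rows under that reading; the function name failure_rate is
   hypothetical and the counts are copied from the table.

```python
def failure_rate(successes: int, failures: int) -> float:
    """Return failures as a percentage of all parsed instances."""
    total = successes + failures
    if total == 0:
        raise ValueError("no instances observed")
    return 100.0 * failures / total


# (successes, failures) copied from the accept, alt-svc and expect rows.
rows = {
    "accept": (9_099, 34),
    "alt-svc": (56_773_977, 4_914_119),
    "expect": (0, 53),
}

for name, (ok, bad) in rows.items():
    print(f"{name}: {failure_rate(ok, bad):.3f}%")
# accept: 0.372%
# alt-svc: 7.966%
# expect: 100.000%
```
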
   The top thirty header fields in that data set that were not
   considered compatible are (* indicates that the field is mapped in
   Section 3):

   *  *date: 524,810,577

   *  server: 470,777,294

   *  *last-modified: 383,437,099

   *  *expires: 292,109,781

   *  *etag: 255,788,799

   *  strict-transport-security: 111,993,787

   *  x-cache: 70,713,258

   *  via: 55,983,914

   *  cf-ray: 54,556,881

   *  p3p: 54,479,183

   *  report-to: 54,056,804

   *  cf-cache-status: 53,536,789

   *  nel: 44,815,769

   *  x-powered-by: 37,281,354

   *  content-security-policy-report-only: 33,104,387

   *  *location: 30,533,957

   *  x-amz-cf-pop: 28,549,182

   *  x-amz-cf-id: 28,444,359

   *  content-security-policy: 25,404,401

   *  x-served-by: 23,277,252

   *  x-cache-hits: 21,842,899

   *  *link: 20,761,372

   *  x-timer: 18,780,130

   *  content-disposition: 18,516,671

   *  x-request-id: 16,048,668

   *  referrer-policy: 15,596,734

   *  x-cdn: 10,153,756

   *  x-amz-version-id: 9,786,024

   *  x-amz-request-id: 9,680,689

   *  x-dc: 9,557,728

Author's Address

   Mark Nottingham
   Prahran
   VIC
   Australia

   Email: mnot@mnot.net
   URI:   https://www.mnot.net/