idnits 2.17.1

draft-montenegro-httpbis-uri-encoding-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document doesn't use any RFC 2119 keywords, yet seems to have RFC
     2119 boilerplate text.

  == The document seems to contain a disclaimer for pre-RFC5378 work, but was
     first submitted on or after 10 November 2008.  The disclaimer is usually
     necessary only for documents that revise or obsolete older RFCs, and that
     take significant amounts of text from those RFCs.  If you can contact all
     authors of the source material and they are willing to grant the BCP78
     rights to the IETF Trust, you can and should remove the disclaimer.
     Otherwise, the disclaimer is needed and you can ignore this comment.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (14 February 2014) is 3724 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 2616 (Obsoleted by RFC 7230, RFC 7231,
     RFC 7232, RFC 7233, RFC 7234, RFC 7235)


     Summary: 1 error (**), 0 flaws (~~), 3 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                                         O. Mazahir
Internet Draft                                                 D. Thaler
Intended status: Standards Track                                  M. Cox
Expires: August 2014                                       G. Montenegro
                                                   Microsoft Corporation
                                                        14 February 2014

                       Deterministic URI Encoding
                draft-montenegro-httpbis-uri-encoding-00

Abstract

   The "http" and "https" URI schemes do not have a fixed character
   encoding. This document defines HTTP headers to enable an
   explicit indication of the character encoding.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance
   with the provisions of BCP 78 and BCP 79. This document may
   contain material from IETF Documents or IETF Contributions
   published or made publicly available before November 10, 2008.
   The person(s) controlling the copyright in some of this material
   may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards
   Process. Without obtaining an adequate license from the
   person(s) controlling the copyright in such materials, this
   document may not be modified outside the IETF Standards Process,
   and derivative works of it may not be created outside the IETF
   Standards Process, except to format it for publication as an RFC
   or to translate it into languages other than English.

   Internet-Drafts are working documents of the Internet
   Engineering Task Force (IETF), its areas, and its working
   groups. Note that other groups may also distribute working
   documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work
   in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on August, 2014.

Copyright

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described
   in Section 4.e of the Trust Legal Provisions and are provided
   without warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Requirements Language
   2. URI Path and Query Encoding Headers
   3. IANA Considerations
      3.1. URI-Path-Encoding
      3.2. URI-Query-Encoding
   4. Security Considerations
   5.
   Acknowledgments
   6. References
      6.1. Normative References
      6.2. Informative References
   7. Authors' Addresses

1. Introduction

   The "http" and "https" URI schemes don't have a fixed character
   encoding. The URI RFC [RFC3986] describes the generic syntax
   for URI components:

   o  Legacy URI components (before 2005) tend to use UTF-8 "or
      some other superset of the US-ASCII character encoding"

   o  New schemes (after 2005) use UTF-8 with percent encoding for
      reserved characters.

   The first bullet explains why the character encoding for "http"
   and "https" URIs is not deterministic. This is particularly
   problematic when parsing URIs at the server side or at
   intermediate proxies (e.g., when looking for a cache hit).

   URIs have different components, each with its own character
   encoding issues.

   Per the IDNA rules in [RFC5890], the host component is encoded
   using A-labels.

   There is more non-determinism with respect to the path and query
   components. Furthermore, these two components are not
   necessarily encoded the same way [Handbook].

   This document defines HTTP headers that explicitly state the
   character encoding for the path and query components.

1.1. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
   NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described
   in RFC 2119 [RFC2119].

2.
URI Path and Query Encoding Headers

   The URI Path encoding is conveyed in the following header:

      URI-Path-Encoding = "URI-Path-Encoding" ":" 1charset

   The URI Query encoding is conveyed in the following header:

      URI-Query-Encoding = "URI-Query-Encoding" ":" 1charset

   charset is defined in section 3.4 of [RFC2616]. The expected value
   indicates the character encoding for the path or query component in
   the URI prior to percent encoding. (A value of UTF-8 does not mean
   that the URI carries raw UTF-8.)

   If the user agent is certain that the path component was formed from
   percent-encoded UTF-8, it sets the header as follows:

      URI-Path-Encoding: UTF-8

   Similarly, for the query component:

      URI-Query-Encoding: UTF-8

   This signals that the query component in the URI is in UTF-8 with
   percent encoding.

   Absence of the URI-Path-Encoding or URI-Query-Encoding header is
   equivalent to the legacy situation of non-determinism with respect
   to the path or query component, respectively, as mentioned above in
   section 1.

   Likewise, if the URI-Path-Encoding or URI-Query-Encoding header is
   set to an invalid value or unrecognized charset, this is equivalent
   to the legacy situation of non-determinism with respect to the path
   or query component, respectively, mentioned above in section 1.

3. IANA Considerations

   IANA is requested to add these headers to the "Permanent Message
   Header Field Names" registry. Per [RFC3864], the template for
   these headers is specified below.

3.1. URI-Path-Encoding

   Applicable protocol: http

   Status: standard

   Author/change controller:

      IETF (iesg@ietf.org)

   Specification document(s):

      This document.

3.2. URI-Query-Encoding

   Applicable protocol: http

   Status: standard

   Author/change controller:

      IETF (iesg@ietf.org)

   Specification document(s):

      This document.

4.
Security Considerations

   Because the character encoding of URIs is non-deterministic, URI
   parsing at servers or proxies may currently involve trying
   several candidate character encodings in search of a match.
   This represents a potential attack vector [RFC6943]. The headers
   proposed in this document could be used to reduce the attack
   surface by enabling a more explicit interpretation of the data
   within a URI, thus preventing unintended consequences.

5. Acknowledgments

   Thanks to Ivan Pashov and Wade Hilmo for useful discussions in
   this space.

   This document was prepared using 2-Word-v2.0.template.doc.

6. References

6.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
             Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
             Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

   [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter,
             "Uniform Resource Identifier (URI): Generic Syntax",
             STD 66, RFC 3986, January 2005.

6.2. Informative References

   [Handbook] Zalewski, M., "Browser Security Handbook, part 1",
             http://code.google.com/p/browsersec/wiki/Part1,
             March 2011.

   [RFC3864] Klyne, G., Nottingham, M., and J. Mogul, "Registration
             Procedures for Message Header Fields", BCP 90, RFC 3864,
             September 2004.

   [RFC5890] Klensin, J., "Internationalized Domain Names for
             Applications (IDNA): Definitions and Document Framework",
             RFC 5890, August 2010.

   [RFC6943] Thaler, D., Ed., "Issues in Identifier Comparison for
             Security Purposes", RFC 6943, May 2013.

7.
Authors' Addresses

   Osama Mazahir
   Microsoft Corporation

   Email: OsamaM@microsoft.com

   Dave Thaler
   Microsoft Corporation

   Email: DThaler@microsoft.com

   Matthew Cox
   Microsoft Corporation

   Email: MaCox@microsoft.com

   Gabriel Montenegro
   Microsoft Corporation

   Phone:
   Email: gabriel.montenegro@microsoft.com
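
   The following non-normative sketch, written in Python, illustrates
   one way a server or proxy might apply the URI-Path-Encoding and
   URI-Query-Encoding headers described in Section 2. The function
   names (decode_request_target, decode_component, declared_codec) are
   hypothetical and are not defined by this document. Two further
   assumptions are made: Python's codec registry stands in for charset
   recognition, and bytes that are not valid in the declared charset
   are treated the same as the legacy (undeclared) case, which this
   document does not specify.

```python
import codecs
from urllib.parse import unquote_to_bytes, urlsplit


def declared_codec(headers, name):
    """Return a Python codec name for the charset in header `name`, or None
    if the header is absent or its value is not a recognized charset."""
    # Header field names are case-insensitive in HTTP.
    value = next((v for k, v in headers.items() if k.lower() == name.lower()), None)
    if value is None:
        return None
    try:
        return codecs.lookup(value.strip()).name
    except LookupError:
        return None


def decode_component(raw, codec):
    """Undo percent-encoding, then decode the bytes with the declared charset.

    Returns None when no usable charset was declared, i.e. the legacy
    non-deterministic case. ASSUMPTION: bytes that are invalid in the
    declared charset are also reported as None.
    """
    if codec is None:
        return None
    try:
        return unquote_to_bytes(raw).decode(codec)
    except (LookupError, UnicodeDecodeError):
        # LookupError covers registry entries that are not text encodings.
        return None


def decode_request_target(target, headers):
    """Decode path and query independently; each has its own header."""
    parts = urlsplit(target)
    path = decode_component(parts.path, declared_codec(headers, "URI-Path-Encoding"))
    query = decode_component(parts.query, declared_codec(headers, "URI-Query-Encoding"))
    return path, query
```

   With "URI-Path-Encoding: UTF-8" present, the request target
   "/caf%C3%A9" yields the path "/café". With neither header present,
   both results are None, and the caller would fall back to whatever
   legacy handling it already uses. Decoding the two components
   separately reflects the observation in Section 1 that path and query
   are not necessarily encoded the same way.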