idnits 2.17.1 

draft-duerst-query-i18n-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.

     Expected boilerplate is as follows today (2024-04-26) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:
        This Internet-Draft is submitted in full conformance with the provisions
        of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:
        Copyright (c) 2024 IETF Trust and the persons identified as the document
        authors.  All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:
        This document is subject to BCP 78 and the IETF Trust's Legal Provisions
        Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document.  Please review these documents
        carefully, as they describe your rights and restrictions with
        respect to this document.  Code Components extracted from this
        document must include Simplified BSD License text as described in
        Section 4.e of the Trust Legal Provisions and are provided
        without warranty as described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents. 

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories. 

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a Security Considerations section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 181: '... query component SHOULD be sent back e...'
     RFC 2119 keyword, line 184: '...   nent MUST be sent back according to...'
     RFC 2119 keyword, line 190: '...est-header field MUST be sent back whe...'


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (July 1997) is 9782 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Missing reference section? 'Fileupload' on line 246 looks like a
     reference

  -- Missing reference section? 'RFC1738' on line 264 looks like a reference

  -- Missing reference section? 'URLsyntax' on line 289 looks like a reference

  -- Missing reference section? 'RFC2044' on line 267 looks like a reference

  -- Missing reference section? 'URLprocess' on line 285 looks like a
     reference

  -- Missing reference section? 'IMAPURL' on line 261 looks like a reference

  -- Missing reference section? 'FTPINT' on line 249 looks like a reference

  -- Missing reference section? 'RFC2130' on line 280 looks like a reference

  -- Missing reference section? 'RFC2070' on line 274 looks like a reference

  -- Missing reference section? 'RFC2045' on line 270 looks like a reference

  -- Missing reference section? 'IANA' on line 112 looks like a reference


     Summary: 9 errors (**), 0 flaws (~~), 1 warning (==), 13 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Internet Draft                                               M. Duerst
3	<draft-duerst-query-i18n-00.txt>                  University of Zurich
4	Expires January 1998                                         July 1997

6	          Handling Internationalized Query Components in URLs

8	Status of this Memo

10	   This document is an Internet-Draft.  Internet-Drafts are working doc-
11	   uments of the Internet Engineering Task Force (IETF), its areas, and
12	   its working groups. Note that other groups may also distribute work-
13	   ing documents as Internet-Drafts.

15	   Internet-Drafts are draft documents valid for a maximum of six
16	   months. Internet-Drafts may be updated, replaced, or obsoleted by
17	   other documents at any time.  It is not appropriate to use Internet-
18	   Drafts as reference material or to cite them other than as a "working
19	   draft" or "work in progress".

21	   To learn the current status of any Internet-Draft, please check the
22	   1id-abstracts.txt listing contained in the Internet-Drafts Shadow
23	   Directories on ds.internic.net (US East Coast), nic.nordu.net
24	   (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
25	   Rim).

27	   Distribution of this document is unlimited.  Please send comments to
28	   the author at <mduerst@ifi.unizh.ch> or to the uri mailing list at
29	   uri@bunyip.com. This document is currently a pre-draft, for
30	   restricted discussion only. It is intended to become part of a suite
31	   of documents related to the internationalization of URLs.

33	Abstract

35	   HTTP and HTML provide the facility to query the user and return the
36	   results. This is usually done in the query component of an URL. This
37	   mechanism works with full satisfaction for characters of the us-
38	   ascii repertoire. Due to the lack of an agreed encoding for other
39	   characters, the situation is much less satisfactory for characters
40	   outside the us-ascii repertoire.

42	   This document makes two contributions to the problem: (1) It
43	   describes an application convention mostly already respected, and
44	   sufficient in many cases. (2) It introduces an addition to HTTP to
45	   ease the transition to a general internationalized URL architecture.

47	Table of contents

49	   1. Introduction
50	     1.1 General
51	     1.2 Terms
52	   2. A Simple Application Convention for Browsers
53	   3. Upgrading of Query Component to UTF-8
54	     3.1 The Query-UTF-8 Request/Response-Header Field
55	     3.2 Rationale
56	   Bibliography
57	   Author's Address

59	1. Introduction

61	1.1 General

63	   HTTP (HyperText Transfer Protocol [HTTP1.1]) and HTML (HyperText
64	   Markup Language [HTML4.0]) provide the facility to query the user
65	   (with a FORM in HTML) and return the results to the server. There are
66	   various ways to return the result (see in particular [Fileupload]),
67	   but the one most widely used is to encode the result in the query
68	   component of an URL [RFC1738, URLsyntax].  This mechanism works with
69	   full satisfaction for characters of the us-ascii repertoire. Due to
70	   the lack of an agreed encoding for other characters, the situation is
71	   much less satisfactory for characters outside the us-ascii reper-
72	   toire.

74	   Ideally, the problem would be solved by agreeing on a single charac-
75	   ter encoding for all query parts or all URLs. The outstanding candi-
76	   date for this is UTF-8 [RFC2044].  UTF-8 is already the preferred
77	   encoding for new URL schemes [URLprocess], the only encoding for a
78	   recently defined URL scheme [IMAPURL], the encoding on the wire for
79	   beyond-ASCII FTP filenames [FTPINT] (thus making it the encoding for
80	   the ftp: URL scheme) and the encoding suggested for the Internet in
81	   general [RFC2130].  UTF-8 has various important properties, in par-
82	   ticular that it is completely compatible with US-ASCII and is easily
83	   detectable by simple heuristics.
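
   The two properties named above can be illustrated with a minimal
   Python sketch. The function name looks_like_utf8 is hypothetical and
   the check is only one possible heuristic (a strict decoding attempt),
   not a method prescribed by this document.

```python
def looks_like_utf8(octets: bytes) -> bool:
    """Heuristic: True if the octets contain non-ASCII data that is valid UTF-8."""
    if octets.isascii():
        # Pure US-ASCII is identical in UTF-8 and in most legacy encodings,
        # so there is nothing to distinguish.
        return False
    try:
        octets.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# US-ASCII text has the same octets in both encodings.
assert "query".encode("utf-8") == "query".encode("ascii")

print(looks_like_utf8("D\u00fcrst".encode("utf-8")))       # UTF-8 octets
print(looks_like_utf8("D\u00fcrst".encode("iso-8859-1")))  # legacy octets
```

   Octet sequences produced by legacy encodings rarely happen to be
   well-formed UTF-8, which is why such a test is usually reliable in
   practice, although it remains a heuristic.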

85	   Moving to UTF-8 for URLs is most difficult for the query component.
86	   This is due to the fact that for the other components, in particular
87	   for the path component, the namespace is very sparse and well known
88	   to the server, while it is dense and not well known in the case of
89	   the query part.  To increase the reliability of transmitting query
90	   information, this document describes an existing convention and
91	   proposes a new protocol element for HTTP.

93	1.2 Terms

95	   This section contains definitions and explanations for some terms
96	   that may otherwise not be clear.

98	   -  Accept-Charset attribute: An HTML attribute, proposed in [RFC2070]
99	      and taken up in HTML 4.0 [HTML4.0]. Please note that this is not
100	      the same as the Accept-Charset request-header field in HTTP.
101	      Please also note that the Accept-Charset attribute is on INPUT and
102	      TEXTAREA in RFC 2070, but on FORM in HTML 4.0. The HTML 4.0 syntax
103	      is preferred, and assumed in this document.

105	   -  CGI Script: In the context of this document, a placeholder for any
106	      kind of functional component used to process a response to a
107	      query.

109	   -  Character Encoding: A mapping from an octet sequence to a sequence
110	      of characters. Misleadingly called "character set" in some IETF
111	      documents [RFC2045]. Denoted by the value of the "charset" param-
112	      eter, with values from the corresponding IANA registry [IANA].

114	   -  Transcoding Server/Proxy: An HTTP server or proxy that transcodes
115	      the documents it serves, to respond to an "Accept-Charset" HTTP
116	      request header field.

118	   -  Transcoding: The act of changing the character encoding of a docu-
119	      ment, while not changing it otherwise (the length of the document
120	      may be affected).

122	2. A Simple Application Convention for Browsers

124	   This section spells out an application convention that is in use in
125	   most current and older browsers, although it is not followed, or not
126	   completely followed, by all browsers, and that can be implemented
127	   easily.

129	   The convention is that a user agent should send back the results of a
130	   query in exactly the same character encoding as the character encod-
131	   ing of the document that contained the FORM, as received by the user
132	   agent.
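
   As an illustration, the convention might be sketched in Python as
   follows. The function name build_query and the field name "name" are
   hypothetical; urlencode from the standard library performs the
   %-escaping of the encoded octets.

```python
from urllib.parse import urlencode


def build_query(fields: dict[str, str], document_charset: str) -> str:
    """Encode form fields in the character encoding of the FORM document."""
    # errors="strict" makes the call fail if a character cannot be
    # represented in the document's encoding, instead of silently
    # substituting another character.
    return urlencode(fields, encoding=document_charset, errors="strict")


# The same user input yields different octets depending on the encoding
# in which the document containing the FORM was received.
print(build_query({"name": "D\u00fcrst"}, "iso-8859-1"))
print(build_query({"name": "D\u00fcrst"}, "utf-8"))
```

   The sketch does not say what a user agent should do with characters
   that the document's encoding cannot represent; this document does not
   define that case either.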

134	   The advantage of this application convention is that it works nicely
135	   for documents and CGI scripts that are assuming a single character
136	   encoding. In the plain case, neither the server nor the CGI script
137	   has to do any special processing such as trying to detect the char-
138	   acter encoding of the query component or transcode the query compo-
139	   nent.

141	   This application convention fails if the document has been transcoded
142	   by a transcoding proxy. The query component is sent back in the
143	   character encoding requested by the user agent, which is the target
144	   character encoding of the transcoding undergone at the proxy. The
145	   query component sent back to the server, however, must not be changed
146	   by the proxy (see [HTTP1.1]).
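
   The failure can be reproduced with a short Python sketch. The two
   encodings are assumptions chosen for illustration: the origin server
   and its CGI script use iso-8859-1, and the proxy has transcoded the
   document to utf-8 before it reached the user agent.

```python
from urllib.parse import parse_qs, urlencode

origin_charset = "iso-8859-1"  # what the CGI script assumes (illustrative)
proxy_charset = "utf-8"        # what the user agent received (illustrative)

# The user agent follows the convention and answers in the encoding of
# the document as received, that is, the proxy's target encoding.
query = urlencode({"name": "D\u00fcrst"}, encoding=proxy_charset)

# The proxy passes the query through unchanged, and the script decodes
# it with the encoding it originally served the document in.
decoded = parse_qs(query, encoding=origin_charset)["name"][0]

print(query)
print(decoded == "D\u00fcrst")  # False: the script sees garbled text
```

   Each UTF-8 octet is interpreted as one iso-8859-1 character, so the
   script receives two wrong characters in place of the u-Umlaut.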

148	3. Upgrading of Query Component to UTF-8

150	   For those parts of an URL that originate at the server, in particular
151	   for the path component, the introduction of UTF-8 [RFC2044] as the
152	   encoding of choice can be made on a per-server or per-resource base.
153	   Because the name space of the path component is usually very sparsely
154	   populated, it is even possible to accept URLs with path components in
155	   different character encodings for the same resource.

157	   The query component of an URL, however, is in most cases generated
158	   independently in the user agent, and the namespace can be very
159	   densely populated. To upgrade it to UTF-8 therefore requires addi-
160	   tional provisions.

162	   Here, we propose to add a single header field to HTTP.  The header
163	   field is used both as a request header field and as a response header
164	   field.

166	3.1 The Query-UTF-8 Request/Response-Header Field

168	   The syntax of the Query-UTF-8 request/response-header field is
169	   defined as follows:

171	        query-utf-8    = "Query-UTF-8" ":" ( "Yes" | "No" )

173	   Both "Yes" and "No" above are case insensitive. I.e. "Yes" as well as
174	   "yes" or "yES", and so on, are acceptable.
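
   A minimal Python sketch of a parser for the field value is given
   below. The function name parse_query_utf8 is hypothetical, and
   stripping surrounding whitespace is an assumption based on usual
   HTTP header handling rather than on the grammar above.

```python
def parse_query_utf8(field_value: str) -> bool:
    """Map a Query-UTF-8 field value to True ("Yes") or False ("No")."""
    token = field_value.strip().lower()  # values are case insensitive
    if token == "yes":
        return True
    if token == "no":
        return False
    raise ValueError(f"invalid Query-UTF-8 value: {field_value!r}")


print(parse_query_utf8("Yes"))
print(parse_query_utf8(" yES "))
print(parse_query_utf8("no"))
```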

176	   As a response-header field (sent from the server to the client), the
177	   field indicates whether the user agent can send back the query compo-
178	   nent encoded as UTF-8 or not.  If the value is "Yes", and the scheme
179	   component and site component of the URL of the document containing
180	   the FORM and the URL given for query submission are identical, the
181	   query component SHOULD be sent back encoded as UTF-8.  If the value
182	   is "No", and the FORM does not have an Accept-Charset attribute that
183	   contains the "charset" parameter value "UTF-8", then the query compo-
184	   nent MUST be sent back according to the application convention
185	   described in Section 2, or in some other way by older browsers.
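
   One possible reading of these rules, as seen from the user agent, is
   sketched below in Python. All names are hypothetical. Two points are
   assumptions of the sketch: the "site component" is taken to be the
   network location (host and port) of the URL, and an Accept-Charset
   attribute containing "UTF-8" is treated as a request for UTF-8.

```python
from typing import Optional, Sequence
from urllib.parse import urlsplit


def same_scheme_and_site(url_a: str, url_b: str) -> bool:
    """Compare scheme and network location, ignoring case."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    return (a.scheme.lower(), a.netloc.lower()) == (b.scheme.lower(), b.netloc.lower())


def choose_query_encoding(
    header_yes: Optional[bool],      # Query-UTF-8 value; None if absent
    accept_charsets: Sequence[str],  # values of the FORM attribute
    document_url: str,
    action_url: str,
    document_charset: str,
) -> str:
    """Pick the character encoding for the query component."""
    if any(c.strip().lower() == "utf-8" for c in accept_charsets):
        return "utf-8"
    if header_yes and same_scheme_and_site(document_url, action_url):
        return "utf-8"
    # "No", no header, or a different site: fall back to the
    # application convention (encoding of the document as received).
    return document_charset


print(choose_query_encoding(True, [], "http://a.example/form.html",
                            "http://a.example/cgi-bin/q", "iso-8859-1"))
print(choose_query_encoding(True, [], "http://a.example/form.html",
                            "http://b.example/cgi-bin/q", "iso-8859-1"))
```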

187	   As a request-header field (sent from the client to the server; the
188	   term request-header field is somewhat misleading here), the field
189	   indicates whether the query component is encoded as UTF-8. A Query-
190	   UTF-8 request-header field MUST be sent back when the following con-
191	   ditions are all met:

193	   -  The URL sent back contains a query component.

195	   -  The document containing the FORM is received with a Query-UTF-8
196	      response-header field with value "Yes" or the Accept-Charset
197	      attribute of the FORM contains the charset parameter value of
198	      "UTF-8".

200	   -  The client recognizes the corresponding syntax.  (The intention of
201	      this last condition is to be able to phase out Query-UTF-8 after a
202	      transitory period.)
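
   The three conditions can be written as a single predicate. The
   following Python sketch uses hypothetical names and assumes that the
   value sent is "Yes" when the query component was actually encoded as
   UTF-8 and "No" otherwise.

```python
from typing import Optional, Sequence
from urllib.parse import urlsplit


def must_send_query_utf8(
    request_url: str,
    header_yes: Optional[bool],      # Query-UTF-8 response value, None if absent
    accept_charsets: Sequence[str],  # Accept-Charset values on the FORM
    client_supports_field: bool = True,
) -> bool:
    """True if all three conditions for sending the field are met."""
    has_query = urlsplit(request_url).query != ""
    utf8_offered = header_yes is True or any(
        c.strip().lower() == "utf-8" for c in accept_charsets
    )
    return has_query and utf8_offered and client_supports_field


def query_utf8_field(query_is_utf8: bool) -> str:
    """Format the request-header field line."""
    return "Query-UTF-8: " + ("Yes" if query_is_utf8 else "No")


url = "http://a.example/cgi-bin/q?name=D%C3%BCrst"
if must_send_query_utf8(url, True, []):
    print(query_utf8_field(True))
```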

204	3.2 Rationale

206	   The availability of both the Accept-Charset attribute on FORM and the
207	   Query-UTF-8 response-header field may seem unnecessary. The rationale
208	   for this is to allow two modes of operation, called server-driven and
209	   script-driven.

211	   In script-driven mode, the CGI script handles character encoding
212	   negotiation and identification. Typically, the author of a FORM docu-
213	   ment and the corresponding CGI script will use the Accept-Charset
214	   attribute on FORM with the value "UTF-8" to tell the client to send
215	   back data in UTF-8. The script will then check for the presence and
216	   value of the Query-UTF-8 request-header field in the submission from
217	   the client, and make conversions if necessary.
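
   A script-driven handler might look like the following Python sketch.
   It assumes the common CGI practice of exposing request-header fields
   as environment variables with an HTTP_ prefix, so that Query-UTF-8
   would appear as HTTP_QUERY_UTF_8. The function name and the fallback
   encoding are hypothetical.

```python
from urllib.parse import parse_qs


def decode_form(environ: dict[str, str], legacy_charset: str) -> dict[str, list[str]]:
    """Decode QUERY_STRING as UTF-8 if the client says so, else as legacy."""
    flag = environ.get("HTTP_QUERY_UTF_8", "").strip().lower()
    charset = "utf-8" if flag == "yes" else legacy_charset
    return parse_qs(environ.get("QUERY_STRING", ""),
                    encoding=charset, errors="strict")


# A client that understands the field and sends UTF-8.
new_client = {"QUERY_STRING": "name=D%C3%BCrst", "HTTP_QUERY_UTF_8": "Yes"}
# An older client that follows the convention of Section 2.
old_client = {"QUERY_STRING": "name=D%FCrst"}

print(decode_form(new_client, "iso-8859-1") == decode_form(old_client, "iso-8859-1"))
```

   In a real script the mapping passed in would be os.environ. Both
   kinds of client yield the same decoded text, which is the purpose of
   checking the field.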

219	   In server-driven mode, the character encoding that a CGI script
220	   expects to receive is registered with the server in a similar way as
221	   the character encodings of documents (including those generated by
222	   CGI scripts) are registered.  A server offering such a functionality
223	   adds the Query-UTF-8 response-header field with value "Yes" to outgo-
224	   ing documents containing FORMs, and converts from UTF-8 back to the
225	   encoding the CGI script is expecting when a query arrives with
226	   "Query-UTF-8: Yes".
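
   The conversion step performed by such a server can be sketched in
   Python as follows. The function name transcode_query is hypothetical,
   and the registered encoding of the script is passed in as a
   parameter, since this document does not define how registration is
   done.

```python
from urllib.parse import parse_qsl, urlencode


def transcode_query(query: str, script_charset: str) -> str:
    """Convert a UTF-8 query component to the encoding a script expects."""
    # Undo the %-escaping and decode the octets as UTF-8.
    pairs = parse_qsl(query, keep_blank_values=True,
                      encoding="utf-8", errors="strict")
    # Re-encode in the script's encoding and escape again. Characters
    # that the target encoding cannot represent raise UnicodeEncodeError.
    return urlencode(pairs, encoding=script_charset, errors="strict")


# Applied only to requests that arrive with "Query-UTF-8: Yes".
print(transcode_query("name=D%C3%BCrst", "iso-8859-1"))
```

   How a server should react when the query contains characters outside
   the script's encoding is left open by the sketch; raising an error is
   only one option.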

228	   The distinction between script-driven and server-driven mode is not
229	   made based on whether Query-UTF-8 or the Accept-Charset attribute is
230	   used. Both features are provided because it is easier for a document
231	   author to use Accept-Charset, and easier for a server to add Query-
232	   UTF-8. Also, because a server does not know about the facilities
233	   available on other servers, "Query-UTF-8: Yes" sent from the server
234	   to the client is only valid if the query result is sent back to the
235	   same server.  For query results sent to other servers, the Accept-
236	   Charset attribute must be used.

238	Acknowledgements

240	   I am grateful in particular to the following persons for their help
241	   and/or criticism: Roy Fielding, Eric van der Poel, Francois Yergeau,
242	   Gavin Nicol, Frank Tang, Larry Masinter, and Tim Greenwood.

244	Bibliography

246	   [Fileupload]   E. Nebel and L. Masinter, "Form-based File Upload in
247	                  HTML", draft-ietf-html-fileupload-03.txt, August 1995.

249	   [FTPINT]       B. Curtin, "Internationalization of the File Transfer
250	                  Protocol", draft-ietf-ftpext-intl-ftp-02.txt, June
251	                  1997.

253	   [HTTP1.1]      R. Fielding, J. Gettys, J. Mogul, H. Frystyk, and T.
254	                  Berners-Lee, "Hypertext Transfer Protocol --
255	                  HTTP/1.1", RFC 2068, January 1997.

257	   [HTML4.0]      D. Raggett, A. Le Hors, and I. Jacobs, "HTML 4.0 Spec-
258	                  ification", http://www.w3.org/TR/WD-html40/, July
259	                  1997.

261	   [IMAPURL]      Ch. Newman, "IMAP URL Scheme", draft-newman-url-
262	                  imap-10.txt, July 1997.

264	   [RFC1738]      T. Berners-Lee, L. Masinter, and M. McCahill, "Uniform
265	                  Resource Locators (URL)", CERN, Dec. 1994.

267	   [RFC2044]      F. Yergeau, "UTF-8, A Transformation Format of Unicode
268	                  and ISO 10646", Alis Technologies, October 1996.

270	   [RFC2045]      N. Freed, N. Borenstein, "Multipurpose Internet Mail
271	                  Extensions (MIME) Part One: Format of Internet Message
272	                  Bodies", November 1996.

274	   [RFC2070]      F. Yergeau, G. Nicol, G. Adams, and M. Duerst, "Inter-
275	                  nationalization of the Hypertext Markup Language", RFC
276	                  2070, January 1997 (Note: This RFC is currently being
277	                  updated to reference Unicode 2.0 and ISO 10646 includ-
278	                  ing AM-5. The new definition of UTF-8 should be used).

280	   [RFC2130]      C. Weider, C. Preston, K. Simonsen, H. Alvestrand, R.
281	                  Atkinson, M. Crispin, P. Svanberg, "The Report of the
282	                  IAB Character Set Workshop held 29 February - 1 March,
283	                  1996", April 1997.

285	   [URLprocess]   L. Masinter, D. Zigmond and H. Alvestrand, "Guidelines
286	                  and Process for new URL Schemes", draft-masinter-url-
287	                  process-01.txt, March 1997.

289	   [URLsyntax]    T. Berners-Lee, R. Fielding, L. Masinter, "Uniform
290	                  Resource Locators (URL): Generic Syntax and Seman-
291	                  tics", draft-fielding-url-syntax-05.txt, May 1997.

293	Author's Address

295	   Martin J. Duerst
296	   Multimedia-Laboratory
297	   Department of Computer Science
298	   University of Zurich
299	   Winterthurerstrasse 190
300	   CH-8057 Zurich
301	   Switzerland

303	   Tel: +41 1 257 43 16
304	   Fax: +41 1 363 00 35
305	   E-mail: mduerst@ifi.unizh.ch

307	     NOTE -- Please write the author's name with u-Umlaut wherever
308	     possible, e.g. in HTML as D&uuml;rst.