draft                    Charset policy                     June 97

IETF Policy on Character Sets and Languages

Sun Jun 15 14:23:36 MET DST 1997

Harald Tveit Alvestrand
UNINETT
Harald.T.Alvestrand@uninett.no

Status of this Memo

   This draft document is being circulated for comment.

   Please send comments to the author.

   The following text is required by the Internet-draft rules:

   This document is an Internet Draft. Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its
   Areas, and its Working Groups. Note that other groups may also
   distribute working documents as Internet Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months. Internet Drafts may be updated, replaced, or obsoleted by
   other documents at any time.
   It is not appropriate to use Internet Drafts as reference
   material or to cite them other than as a "working draft" or
   "work in progress."

   Please check the I-D abstract listing contained in each Internet
   Draft directory to learn the current status of this or any other
   Internet Draft.

   The file name of this version is
   draft-alvestrand-charset-policy-00.txt

1. Introduction

   The Internet is international.

   With the international Internet follows an absolute requirement to
   interchange data in a multiplicity of languages, which in turn
   utilize a bewildering number of characters or other character-like
   representation mechanisms.

   This document is (intended to be) a statement of the current
   policies being applied by the Internet Engineering Steering Group
   to the standardization efforts in the Internet Engineering Task
   Force, in order to help Internet protocols fulfil these
   requirements.

   The document is very much based upon the recommendations of the
   IAB Character Set Workshop of February 29-March 1, 1996, which is
   documented in RFC 2130 [WR]. This document attempts to be concise,
   explicit and clear; people wanting more background are encouraged
   to read RFC 2130.

   The document uses the terms "MUST", "SHOULD" and "MAY", and their
   negatives, in the way described in [RFC 2119]. In this case, "the
   specification" as used by RFC 2119 refers to the processing of
   protocols being submitted to the IETF standards process.

2. Where to do internationalization

   Internationalization is for humans. This means that protocols are
   not subject to internationalization; text strings are. Where
   protocols may masquerade as text strings, such as in many IETF
   application layer protocols, protocols MUST specify which parts
   are protocol and which are text.
   [WR 2.2.1.1]

   Names are a problem, because people feel strongly about them, many
   of them are mostly for local usage, and all of them tend to leak
   out of the local context at times. RFC 1958 [ARCH] recommends
   US-ASCII for all globally visible names.

   This document does not mandate a policy on name
   internationalization, but requires that all protocols describe
   whether names are internationalized or US-ASCII.

3. Character sets

   For a definition of the term "character set", refer to the
   workshop report. Like MIME, this document uses it to mean the
   combination of a coded character set and a character encoding
   scheme.

3.1. What character set to use

   All protocols MUST identify, for all character data, which
   character set is in use.

   Protocols MUST be able to use the ISO 10646 coded character set,
   with the UTF-8 character encoding scheme, for all text. (This is
   called "UTF-8" in the rest of this document.)

   They MAY specify how to use other character sets or other
   character encoding schemes, such as UTF-16, but lack of an ability
   to use UTF-8 needs clear and solid justification in the protocol
   specification document before being entered into or advanced upon
   the standards track.

   For existing protocols or protocols that move data from existing
   datastores, support of other character sets, or even using a
   default other than UTF-8, may be a requirement. This is
   acceptable, but UTF-8 support MUST be possible.

   When character sets other than UTF-8 are used, these MUST be
   registered in the IANA character set registry, if necessary by
   registering them when the protocol is published.

3.2. How to decide a character set

   In some cases, like HTTP, there is direct or semi-direct
   communication between the producer and the consumer of a character
   set. In this case, it may make sense to negotiate a character set
   before sending data.
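
   As an illustration only, the negotiation described above can be
   sketched in Python. The function names, the list-based preference
   format and the use of Python's codec registry to compare names are
   assumptions of this sketch, not part of any protocol. It falls back
   to UTF-8 on the basis that UTF-8 support MUST be possible.

```python
import codecs

def choose_charset(client_prefs, server_supported, fallback="utf-8"):
    """Return the first client-preferred charset the server also supports.

    Names are compared through Python's codec registry, so aliases such
    as "latin-1" and "ISO-8859-1" compare equal. The returned name is
    Python's canonical codec name, which is not necessarily the name
    used in the IANA registry.
    """
    supported = {codecs.lookup(name).name for name in server_supported}
    for name in client_prefs:
        try:
            canonical = codecs.lookup(name).name
        except LookupError:
            continue  # charset unknown to this Python build; skip it
        if canonical in supported:
            return canonical
    # No common charset was found: use UTF-8, which every protocol
    # is required to be able to carry.
    return fallback

def encode_labelled(text, charset):
    """Encode text and return (label, data) so the label travels with it."""
    return charset, text.encode(charset)

if __name__ == "__main__":
    chosen = choose_charset(["iso-8859-1", "utf-8"], ["utf-8"])
    label, data = encode_labelled("bl\u00e5b\u00e6r", chosen)
    print(label, data)
```

   Returning the label together with the encoded bytes reflects the
   requirement that the character set in use is always identified.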

   In other cases, like E-mail or stored data, there is no such
   communication, and the best one can do is to make sure the
   character set is clearly identified with the stored data, and to
   choose a character set that is as widely known as possible.

   Note that a character set is an absolute; for almost all languages
   but English and a few other Latin-based scripts, text cannot be
   rendered comprehensibly without supporting the right character
   set.

   Negotiating a character set may be regarded as an interim
   mechanism that is to be supported until UTF-8 support is
   prevalent; however, the timeframe of "interim" may be at least 50
   years, so there is every reason to think of it as permanent in
   practice.

4. Languages

4.1. The need for language information

   All human-readable text has a language.

   Many operations, including high quality formatting, text-to-speech
   synthesis, searching, sorting, spellchecking and so on need access
   to information about the language of a piece of text
   [WR 3.1.1.4].

   Humans have some tolerance for foreign languages, but are
   generally dissatisfied with being presented text in a language
   they do not understand; this is why negotiation of language is
   needed.

   In most cases, machines cannot deduce the language by themselves;
   the protocol must specify how to transfer the language information
   if it is to be available at all.

   (Some items, like domain names and other names, may in some cases
   be very useful without this information.)

   The interaction between language and processing is complex; for
   instance, if I compare "hosta(lang=en)" to "hosta(lang=no)" I will
   generally expect a match, while "aasmund" sorts after "attaboy"
   according to Norwegian rules, but before it using English rules.
   (The "aa" is sorted together with "latin letter a with ring
   above", which is at the end of the Norwegian alphabet.)

4.2. How to identify a language

   The RFC 1766 language tag is at the moment the most flexible tool
   available for identifying a language; protocols SHOULD use this,
   or provide clear and solid justification for doing otherwise in
   the document.

4.3. Considerations for negotiation

   Protocols that transfer human-readable text MUST provide for
   multiple languages.

   In some cases, a negotiation where the client proposes a set of
   languages and the server replies with one is appropriate; in other
   cases, supplying information in all available languages is a
   better solution. Most sites will either have very few languages
   installed or be willing to pay the overhead of sending error
   messages in many languages at once.

   Negotiation is useful in the case where one side of the protocol
   exchange is able to present text in multiple languages to the
   other side, and the other side has a preference for one of these;
   the most common example is the text part of error responses, or
   Web pages that are available in multiple languages.

   Negotiating a language should be regarded as a permanent
   requirement of the protocol that will not go away at any time in
   the future.

   In most cases, it should be possible to include it as part of the
   connection establishment, together with authentication and other
   preferences negotiation.

4.4. Default Language

   When human-readable text must be presented in a context where the
   sender has no knowledge of the recipient's language preferences
   (such as login failures or E-mailed warnings, or prior to language
   negotiation), text SHOULD be presented in the Default Language.
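
   As an illustration only, a negotiation in which the client
   proposes a set of languages and the server replies with one can be
   sketched in Python. The function name, the list-based preference
   format and the exact-match rule are assumptions of this sketch;
   tags are compared without regard to case, as RFC 1766 specifies.
   The fallback value assumes English as the Default Language.

```python
# Assumption of this sketch: English is the Default Language.
DEFAULT_LANGUAGE = "en"

def negotiate_language(client_prefs, available):
    """Pick the first client-proposed language tag the server can supply.

    Returns (tag, negotiated). negotiated is False when no proposed
    tag was available and the Default Language was used instead.
    Only exact tag matches are considered; matching "en" against
    "en-US" is deliberately left out to keep the sketch small.
    """
    by_lowercase = {tag.lower(): tag for tag in available}
    for tag in client_prefs:
        match = by_lowercase.get(tag.lower())
        if match is not None:
            return match, True
    return DEFAULT_LANGUAGE, False

if __name__ == "__main__":
    print(negotiate_language(["no", "en"], ["en", "no"]))
    print(negotiate_language(["fr"], ["no"]))
```

   The second element of the result lets a caller tell a negotiated
   "en" apart from a fallback to the Default Language.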

   The Default Language is English, since this is the language which
   most people will be able to get adequate help in interpreting when
   working with computers.

   Note that negotiating English is NOT the same as using the Default
   Language; the Default Language is an emergency measure in
   otherwise unmanageable situations.

5. Locale

   POSIX defines a concept called a "locale", which includes a lot of
   information about collating order, date format, currency format
   and so on.

   In some cases, and especially with text where the user is expected
   to do processing on the text, locale information may be usefully
   attached to the text.

   This document does not require the communication of locale
   information on all text, but encourages its inclusion when
   appropriate.

   Note that the language and character set will often be present as
   parts of a locale tag (such as no_NO.iso-8859-1; the language is
   before the _ and the character set is after the dot); care must be
   taken to define precisely which specification of character set and
   language applies to any one text item.

   The default locale is the POSIX locale.

6. Security considerations

   Apart from the fact that security warnings in a foreign language
   may cause inappropriate behaviour from the user, and the fact that
   multilingual systems usually have problems with consistency
   between language variants, no relevant security considerations
   have been identified.

7. References

   [RFC 2119]
        S. Bradner, "Key words for use in RFCs to Indicate
        Requirement Levels", 03/26/1997, RFC 2119

   [WR] C. Weider, C. Preston, K. Simonsen, H. Alvestrand, R.
        Atkinson, M. Crispin, P. Svanberg, "The Report of the IAB
        Character Set Workshop held 29 February - 1 March, 1996",
        04/21/1997, RFC 2130

   [ARCH]
        B. Carpenter, "Architectural Principles of the Internet",
        06/06/1996, RFC 1958

8. Author's address

   Harald Tveit Alvestrand
   UNINETT
   P.O.Box 6883 Elgeseter
   N-7002 TRONDHEIM
   NORWAY

   +47 73 59 70 94
   Harald.T.Alvestrand@uninett.no