idnits 2.17.1 

draft-alvestrand-charset-policy-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.

     Expected boilerplate is as follows today (2024-04-25) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:
        This Internet-Draft is submitted in full conformance with the provisions
        of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:
        Copyright (c) 2024 IETF Trust and the persons identified as the document
        authors.  All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:
        This document is subject to BCP 78 and the IETF Trust's Legal Provisions
        Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document.  Please review these documents
        carefully, as they describe your rights and restrictions with
        respect to this document.  Code Components extracted from this
        document must include Simplified BSD License text as described in
        Section 4.e of the Trust Legal Provisions and are provided
        without warranty as described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing document type: Expected "INTERNET-DRAFT" in the upper left hand
     corner of the first page

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents. 

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories. 

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Abstract section.

  ** The document seems to lack an Introduction section.
     (A line matching the expected section header was found, but with an
    unexpected indentation:
     '    1.  Introduction' )

  ** The document seems to lack a Security Considerations section.
     (A line matching the expected section header was found, but with an
    unexpected indentation:
     '    6.  Security considerations' )

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack an Authors' Addresses Section.

  ** There is 1 instance of too long lines in the document, the longest one
     being 3 characters in excess of 72.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 68: '...ocols, protocols MUST specify which pa...'
     RFC 2119 keyword, line 91: '...   All protocols MUST identify, for al...'
     RFC 2119 keyword, line 94: '...    Protocols MUST be able to use the ...'
     RFC 2119 keyword, line 98: '...    They MAY specify how to use other ...'
     RFC 2119 keyword, line 107: '...e, but UTF-8 support MUST be possible....'
     (4 more instances...)


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date () is 739383 days in the past.  Is this intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Missing reference section? 'WR' on line 260 looks like a reference

  -- Missing reference section? 'RFC 2119' on line 253 looks like a reference

  -- Missing reference section? 'ARCH' on line 265 looks like a reference


     Summary: 14 errors (**), 0 flaws (~~), 1 warning (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	draft                       Charset policy                     June 97

3	               IETF Policy on Character Sets and Languages

5	                     Sun Jun 15 14:23:36 MET DST 1997

7	                         Harald Tveit Alvestrand
8	                                 UNINETT
9	                      Harald.T.Alvestrand@uninett.no

11	    Status of this Memo

13	    This draft document is being circulated for comment.

15	    Please send comments to the author.

17	    The following text is required by the Internet-draft rules:

19	    This document is an Internet Draft.  Internet Drafts are working
20	    documents of the Internet Engineering Task Force (IETF), its
21	    Areas, and its Working Groups. Note that other groups may also
22	    distribute working documents as Internet Drafts.

24	    Internet Drafts are draft documents valid for a maximum of six
25	    months. Internet Drafts may be updated, replaced, or obsoleted by
26	    other documents at any time.  It is not appropriate to use
27	    Internet Drafts as reference material or to cite them other than
28	    as a "working draft" or "work in progress."

30	    Please check the I-D abstract listing contained in each Internet
31	    Draft directory to learn the current status of this or any other
32	    Internet Draft.

34	    The file name of this version is draft-alvestrand-charset-policy-00.txt

38	    1.  Introduction

40	    The Internet is international.

42	    With the international Internet follows an absolute requirement to
43	    interchange data in a multiplicity of languages, which in turn
44	    utilize a bewildering number of characters or other character-like
45	    representation mechanisms.

47	    This document is (INTENDED TO BE) a statement of the current policies
48	    applied by the Internet Engineering Steering Group to the
49	    standardization efforts of the Internet Engineering Task Force, in
50	    order to help Internet protocols fulfil these requirements.

52	    The document is very much based upon the recommendations of the
53	    IAB Character Set Workshop of February 29-March 1, 1996, which is
54	    documented in RFC 2130 [WR]. This document attempts to be concise,
55	    explicit and clear; people wanting more background are encouraged
56	    to read RFC 2130.

58	    The document uses the terms "MUST", "SHOULD" and "MAY", and their
59	    negatives, in the way described in [RFC 2119]. In this case, "the
60	    specification" as used by RFC 2119 refers to the processing of
61	    protocols being submitted to the IETF standards process.

63	    2.  Where to do internationalization

65	    Internationalization is for humans. This means that protocols are
66	    not subject to internationalization; text strings are. Where
67	    protocols may masquerade as text strings, such as in many IETF
68	    application layer protocols, protocols MUST specify which parts
69	    are protocol and which are text. [WR 2.2.1.1]

71	    Names are a problem, because people feel strongly about them, many
72	    of them are mostly for local usage, and all of them tend to leak
73	    out of the local context at times. RFC 1958 [ARCH] recommends US-
74	    ASCII for all globally visible names.

76	    This document does not mandate a policy on name
77	    internationalization, but requires that all protocols describe
78	    whether names are internationalized or US-ASCII.

82	    3.  Character sets

84	    For a definition of the term "character set", refer to the
85	    workshop report. Like MIME, this document uses it to mean the
86	    combination of a coded character set and a character encoding
87	    scheme.

89	    3.1.  What character set to use

91	    All protocols MUST identify, for all character data, which
92	    character set is in use.

94	    Protocols MUST be able to use the ISO 10646 coded character set,
95	    with the UTF-8 character encoding scheme, for all text. (This is
96	    called "UTF-8" in the rest of this document.)

98	    They MAY specify how to use other character sets or other
99	    character encoding schemes, such as UTF-16, but lack of an ability
100	    to use UTF-8 needs clear and solid justification in the protocol
101	    specification document before being entered into or advanced upon
102	    the standards track.

104	    For existing protocols or protocols that move data from existing
105	    datastores, support of other character sets, or even using a
106	    default other than UTF-8, may be a requirement. This is
107	    acceptable, but UTF-8 support MUST be possible.

109	    When character sets other than UTF-8 are used, they MUST be
110	    registered in the IANA character set registry, if necessary by
111	    registering them when the protocol is published.
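
    As an illustration of the rules above, the following Python sketch
    re-encodes labelled character data as UTF-8. It is not part of the
    policy. The function name to_utf8 is hypothetical, and the sketch
    assumes that Python's codec names are an adequate stand-in for the
    IANA character set registry, which they only approximate.

```python
import codecs


def to_utf8(data: bytes, charset: str) -> bytes:
    """Re-encode labelled character data as UTF-8.

    `charset` is the label that travels with the data, e.g. "iso-8859-1".
    Assumption: Python codec names stand in for IANA registry names.
    """
    try:
        codecs.lookup(charset)  # raises LookupError for unknown labels
    except LookupError:
        raise ValueError(f"unknown character set label: {charset!r}")
    # Decode with the labelled charset, then encode as UTF-8.
    return data.decode(charset).encode("utf-8")


# "Blåbær" written with escapes so the source file stays pure ASCII.
latin1_bytes = "Bl\u00e5b\u00e6r".encode("iso-8859-1")
utf8_bytes = to_utf8(latin1_bytes, "iso-8859-1")
```

    Without the label, the six bytes in latin1_bytes could not be
    interpreted reliably, which is why the character set must always
    be identified.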

113	    3.2.  How to decide a character set

115	    In some cases, like HTTP, there is direct or semi-direct
116	    communication between the producer and the consumer of a character
117	    set. In this case, it may make sense to negotiate a character set
118	    before sending data.
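
    A minimal sketch of such a negotiation, in Python, might look as
    follows. It is not the algorithm of HTTP or of any other protocol.
    The function name choose_charset and the fallback to UTF-8 when
    nothing matches are assumptions made for illustration.

```python
def choose_charset(offered, supported, fallback="utf-8"):
    """Pick a character set for the exchange.

    `offered` is the consumer's list of charset names in order of
    preference; `supported` is what the producer can emit.
    Assumption: names compare case-insensitively, and UTF-8 is the
    fallback when the two lists have nothing in common.
    """
    supported_lower = {name.lower() for name in supported}
    for name in offered:
        if name.lower() in supported_lower:
            return name.lower()
    return fallback


chosen = choose_charset(["ISO-8859-1", "UTF-8"], ["utf-8", "iso-8859-1"])
no_overlap = choose_charset(["koi8-r"], ["utf-8"])
```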

120	    In other cases, like E-mail or stored data, there is no such
121	    communication, and the best one can do is to make sure the
122	    character set is clearly identified with the stored data, and to
123	    choose a character set that is as widely known as possible.

127	    Note that a character set is an absolute; for almost all languages
128	    but English and a few other Latin-based scripts, text cannot be
129	    rendered comprehensibly without supporting the right character
130	    set.

132	    Negotiating a character set may be regarded as an interim
133	    mechanism that is to be supported until UTF-8 support is
134	    prevalent; however, the timeframe of "interim" may be at least 50
135	    years, so there is every reason to think of it as permanent in
136	    practice.

138	    4.  Languages

140	    4.1.  The need for language information

142	    All human-readable text has a language.

144	    Many operations, including high quality formatting, text-to-speech
145	    synthesis, searching, sorting, spellchecking and so on need access
146	    to information about the language of a piece of text. [WR
147	    3.1.1.4]

149	    Humans have some tolerance for foreign languages, but are
150	    generally dissatisfied with being presented text in a language
151	    they do not understand; this is why negotiation of language is
152	    needed.

154	    In most cases, machines cannot deduce the language by themselves;
155	    the protocol must specify how to transfer the language information
156	    if it is to be available at all.

158	    (Some items, like domain names and other names, may in some cases
159	    be very useful without this information.)

161	    The interaction between language and processing is complex; for
162	    instance, if I compare "hosta(lang=en)" to "hosta(lang=no)" I will
163	    generally expect a match, while "aasmund" sorts after "attaboy"
164	    according to Norwegian rules, but before it using English rules.
165	    (The "aa" is sorted together with "latin letter a with ring
166	    above", which is at the end of the Norwegian alphabet.)
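
    The sorting example can be sketched in Python as below. The
    collation keys are deliberately simplified and hypothetical: the
    Norwegian key only folds "aa" into a-ring and places the three
    extra letters after "z". A real implementation would use a proper
    collation library.

```python
# Simplified Norwegian alphabet: ae, o-slash and a-ring follow "z".
NORWEGIAN_ALPHABET = "abcdefghijklmnopqrstuvwxyz\u00e6\u00f8\u00e5"


def norwegian_key(word):
    """Simplified sort key: "aa" is treated as a-ring (assumption)."""
    folded = word.lower().replace("aa", "\u00e5")
    return [
        NORWEGIAN_ALPHABET.index(ch)
        if ch in NORWEGIAN_ALPHABET
        else len(NORWEGIAN_ALPHABET) + ord(ch)  # unknown letters sort last
        for ch in folded
    ]


def english_key(word):
    """Plain case-insensitive ordering by code point."""
    return word.lower()


words = ["attaboy", "aasmund"]
english_order = sorted(words, key=english_key)
norwegian_order = sorted(words, key=norwegian_key)
```

    With these keys, english_order places "aasmund" first, while
    norwegian_order places it last, so the same two strings need
    language information to be sorted correctly.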

170	    4.2.  How to identify a language

172	    The RFC 1766 language tag is at the moment the most flexible tool
173	    available for identifying a language; protocols SHOULD use this,
174	    or provide clear and solid justification for doing otherwise in
175	    the document.
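
    RFC 1766 defines a language tag as a primary tag of one to eight
    letters, followed by zero or more subtags of one to eight letters
    each, separated by hyphens. A syntax check along those lines might
    be sketched in Python as follows. The function name is
    hypothetical, and the check covers syntax only, not whether a tag
    is actually registered or meaningful.

```python
import re

# Primary tag of 1-8 letters, then optional hyphen-separated subtags
# of 1-8 letters each (syntax only, no registry lookup).
_LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z]{1,8})*")


def is_rfc1766_tag(tag):
    """Return True if `tag` is syntactically an RFC 1766 language tag."""
    return _LANGUAGE_TAG.fullmatch(tag) is not None


examples = {tag: is_rfc1766_tag(tag) for tag in ["en", "en-US", "en_US", ""]}
```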

177	    4.3.  Considerations for negotiation

179	    Protocols that transfer human-readable text MUST provide for
180	    multiple languages.

182	    In some cases, a negotiation where the client proposes a set of
183	    languages and the server replies with one is appropriate; in other
184	    cases, supplying information in all available languages is a
185	    better solution, since most sites will either have very few
186	    languages installed or be willing to pay the overhead of sending
187	    error messages in many languages at once.

189	    Negotiation is useful in the case where one side of the protocol
190	    exchange is able to present text in multiple languages to the
191	    other side, and the other side has a preference for one of these;
192	    the most common example is the text part of error responses, or
193	    Web pages that are available in multiple languages.
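
    The case where the client proposes a set of languages and the
    server replies with one can be sketched in Python as below. The
    function name negotiate_language is hypothetical, and matching on
    the primary tag when no exact match exists is an assumption, not a
    rule taken from this document.

```python
def negotiate_language(client_prefs, server_langs):
    """Return the server language that best fits the client, or None.

    `client_prefs` lists language tags in order of preference;
    `server_langs` lists the tags the server can present text in.
    Assumption: tags compare case-insensitively, and a tag such as
    "en-GB" may be satisfied by another tag with the same primary tag.
    """
    available = {tag.lower(): tag for tag in server_langs}
    for pref in client_prefs:
        wanted = pref.lower()
        if wanted in available:
            return available[wanted]  # exact match
        primary = wanted.split("-", 1)[0]
        for key, tag in available.items():
            if key.split("-", 1)[0] == primary:
                return tag  # same primary tag
    return None  # no common language; the caller decides what to do


reply = negotiate_language(["no", "en"], ["en", "de"])
```

    When the function returns None, the server has no language in
    common with the client, and has to choose another strategy, such
    as sending the text in all available languages.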

195	    Negotiating a language should be regarded as a permanent
196	    requirement of the protocol that will not go away at any time in
197	    the future.

199	    In most cases, it should be possible to include it as part of the
200	    connection establishment, together with authentication and other
201	    preferences negotiation.

203	    4.4.  Default Language

205	    When human-readable text must be presented in a context where the
206	    sender has no knowledge of the recipient's language preferences
207	    (such as login failures or E-mailed warnings, or prior to language
208	    negotiation), text SHOULD be presented in the Default Language.

210	    The Default Language is English, since this is the language which
211	    most people will be able to get adequate help in interpreting when
215	    working with computers.

217	    Note that negotiating English is NOT the same as Default Language;
218	    Default Language is an emergency measure in otherwise unmanageable
219	    situations.

221	    5.  Locale

223	    POSIX defines a concept called a "locale", which includes a lot of
224	    information about collating order, date format, currency format
225	    and so on.

227	    In some cases, and especially with text where the user is expected
228	    to do processing on the text, locale information may be usefully
229	    attached to the text.

231	    This document does not require the communication of locale
232	    information on all text, but encourages its inclusion when
233	    appropriate.

235	    Note that the language and character set will often be present as
236	    parts of a locale tag (such as no_NO.iso-8859-1; the language is
237	    before the _ and the character set is after the dot); care must be
238	    taken to define precisely which specification of character set and
239	    language applies to any one text item.
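
    The structure of such a locale tag can be illustrated with a short
    Python sketch. The names LocaleParts and parse_locale_tag are
    hypothetical, and the sketch handles only the
    language_TERRITORY.charset form shown above. Other parts that a
    locale name may carry are not considered.

```python
from typing import NamedTuple, Optional


class LocaleParts(NamedTuple):
    language: str
    territory: Optional[str]
    charset: Optional[str]


def parse_locale_tag(tag: str) -> LocaleParts:
    """Split a tag like "no_NO.iso-8859-1" into its parts.

    The language is before the "_", the character set is after the
    dot; parts that are absent are returned as None.
    """
    name, dot, charset = tag.partition(".")
    language, underscore, territory = name.partition("_")
    return LocaleParts(
        language,
        territory if underscore else None,
        charset if dot else None,
    )


parts = parse_locale_tag("no_NO.iso-8859-1")
```

    A protocol that also carries a separate language tag or character
    set label still has to state which of the two takes precedence
    when they disagree with the locale tag.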

241	    The default locale is the POSIX locale.

243	    6.  Security considerations

245	    Apart from the fact that security warnings in a foreign language
246	    may cause inappropriate behaviour from the user, and the fact that
247	    multilingual systems usually have problems with consistency
248	    between language variants, no relevant security considerations
249	    have been identified.

251	    7.  References

253	    [RFC 2119]
254	         S. Bradner, "Key words for use in RFCs to Indicate
258	         Requirement Levels", 03/26/1997, RFC 2119

260	    [WR] C. Weider, C. Preston, K. Simonsen, H. Alvestrand, R.
261	         Atkinson, M. Crispin, P. Svanberg, "The Report of the IAB
262	         Character Set Workshop held 29 February - 1 March, 1996",
263	         04/21/1997, RFC 2130

265	    [ARCH]
266	         B. Carpenter, "Architectural Principles of the Internet",
267	         06/06/1996, RFC 1958

269	    8.  Author's address

271	    Harald Tveit Alvestrand
272	    UNINETT
273	    P.O.Box 6883 Elgeseter
274	    N-7002 TRONDHEIM
275	    NORWAY

277	    +47 73 59 70 94
278	    Harald.T.Alvestrand@uninett.no