Network Working Group                                         R. McGowan
Internet-Draft                                                   Unicode
Expires: January 5, 2004                                   July 07, 2003

A Summary of Unicode Consortium Procedures, Policies, Stability, and Public Access
draft-rmcgowan-unicode-procs-03

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on January 5, 2004.

Copyright Notice

Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

This memo describes various internal workings of the Unicode Consortium for the benefit of participants in the IETF. It is intended solely for informational purposes.
Included are discussions of how the decision-making bodies of the Consortium work and what their procedures are, as well as information on public access to the character encoding and standardization processes.

1. Introduction

This memo describes various internal workings of the Unicode Consortium for the benefit of participants in the IETF. It is intended solely for informational purposes. Included are discussions of how the decision-making bodies of the Consortium work and what their procedures are, as well as information on public access to the character encoding and standardization processes.

2. About The Unicode Consortium

The Unicode Consortium is a corporation. Legally speaking, it is a "California Nonprofit Mutual Benefit Corporation", organized under Section 501(c)(6) of the Internal Revenue Code of the United States. As such, it is a "business league" not focused on profiting by sales or production of goods and services, but neither is it formally a "charitable" organization. It is an alliance of member companies whose purpose is to "extend, maintain, and promote the Unicode Standard". To this end, the Consortium keeps a small office, a few editorial and technical staff, a World Wide Web presence, and a mail list presence.

The corporation is presided over by a Board of Directors who meet annually. The Board is composed of individuals who are elected annually by the full members for three-year terms. The Board appoints Officers of the corporation to run the daily operations.

Membership in the Consortium is open to "all corporations, other business entities, governmental agencies, not-for-profit organizations and academic institutions" who support the Consortium's purpose. Formally, one class of voting membership is recognized, and dues-paying members are typically for-profit corporations, research and educational institutions, or national governments.
Each such full member sends representatives to meetings of the Unicode Technical Committee (see below), as well as to a brief annual Membership meeting.

3. The Unicode Technical Committee

The Unicode Technical Committee (UTC) is the technical decision-making body of the Consortium. The UTC inherited the work and prior decisions of the Unicode Working Group (UWG) that was active prior to formation of the Consortium in January 1991.

Formally, the UTC is a technical body instituted by resolution of the Board of Directors. Each member appoints one principal and one or two alternate representatives to the UTC. UTC representatives frequently do, but need not, act as the ordinary member representatives for the purposes of the annual meeting.

The UTC is presided over by a Chair and Vice-Chair, appointed by the Board of Directors for an unspecified term of service.

The UTC meets 4 to 5 times a year to discuss proposals, additions, and various other technical topics. Each meeting lasts 3 to 4 full days. Meetings are held in locations decided upon by the membership, frequently in the San Francisco Bay Area. There is no fee for participation in UTC meetings. Agendas for meetings are not generally posted to any public forum, but meeting dates, locations, and logistics are posted well in advance on the "Unicode Calendar of Events" web page.

At the discretion of the UTC Chair, meetings are open to participation of member and liaison organizations, and to observation by others. The minutes of meetings are also posted publicly on the "UTC Minutes" page of the Unicode web site.

All UTC meetings are held jointly with INCITS Technical Committee L2, the body responsible for Character Code standards in the United States. They constitute "ad hoc" meetings of the L2 body and are usually followed by a full meeting of the L2 committee.
Further information on L2 is available on the official INCITS web page.

4. Unicode Technical Committee Procedures

The formal procedures of the UTC are publicly available in a document entitled "UTC Procedures", available from the Consortium and on the Unicode web site.

Despite the invocation of Robert's Rules of Order, UTC meetings are conducted with relative informality in view of the highly technical nature of most discussions. Meetings focus on items from a technical agenda organized and published by the UTC Chair prior to the meeting. Technical items are usually proposals in one of the following categories:

1. Addition of new characters (whole scripts, additions to existing scripts, or other characters)

2. Preparation and editing of Technical Reports and Standards

3. Changes in the semantics of specific characters

4. Extensions to the encoding architecture and forms of use

Note: There may also be changes to the architecture, character properties or semantics. Such changes, which are rare, are always constrained by the "Unicode Stability Policies" posted on the Unicode web site. Significant changes are undertaken in consultation with liaison organizations, such as W3C and IETF, which have standards that may be affected by such changes. See sections 5 and 6 below.

Typical outputs of the UTC are:

1. The Unicode Standard, major and minor versions (including the Unicode Character Database)

2. Unicode Technical Reports

3. Stand-alone Unicode Technical Standards

4. Formal resolutions

5. Liaison statements and instructions to the Unicode liaisons to other organizations

For each technical item on the meeting agenda, there is a general process as follows:

1. Introduction by the topic sponsor

2. Proposals and discussion

3. Consensus statements or formal motions

4. Assignment of formal actions to implement decisions

5. Unicode Technical Committee Motions

Technical topics of any complexity never proceed from initial proposal to final ratification or adoption into the standard in the course of one UTC meeting. The UTC members and presiding officers are aware that technical changes to the standard have broad consequences for other standards, implementers, and end-users of the standard. Input from other organizations and experts is often vital to the understanding of various proposals and for successful adoption into the standard.

Technical topics are decided in UTC through the use of formal motions, either taken in meetings or by means of 30-day letter ballots. Formal UTC motions are of two types:

1. Simple motions

2. Precedents

Simple motions may pass with a simple majority constituting more than 50% of the qualified voting members, or by a special majority constituting 2/3 or more of the qualified voting members.

Precedents are defined, according to the UTC Procedures, as either

(A) an existing Unicode Policy (an implicit precedent), or

(B) an explicit precedent.

Precedents must be passed or overturned by a special majority.

Examples of implicit precedents include:

1. Publication of a character in the standard

2. Published normative character properties

3. Algorithms required for formal conformance

An Explicit Precedent is a policy, procedure, encoding, algorithm, or other item that is established by a separate motion saying (in effect) that a particular prior motion establishes a precedent.

A proposal may be passed either by a formal motion and vote, or by consensus. If there is broad agreement as to the proposal, and no member wishes to force a vote, then the proposal passes by consensus and is recorded as such in the minutes.

6. Unicode Consortium Policies

Because the Unicode Standard is continually evolving to approach the ideal of encoding "all the world's scripts", new characters will constantly be added. In this sense, the standard is unstable: in the standard's useful lifetime, there may never be a final point at which no more characters are added. Realizing this, the Consortium has adopted certain policies to promote and maintain stability of the characters that are already encoded, as well as laying out a Roadmap to future encodings.

The overall policies of the Consortium with regard to encoding stability, as well as other issues such as privacy, are published on a "Unicode Consortium Policies" web page. Deliberations and encoding proposals in the UTC are bound by these policies.

The general effect of the stability policies may be stated in this way: once a character is encoded, it will not be moved or removed and its name will not be changed. Any of those actions has the potential for causing obsolescence of data, and they are not permitted. The canonical combining class and decompositions of characters will not be changed in any way that affects normalization. In this sense, normalization, such as that used for Internationalized Domain Names (IDN) and "early normalization" for use on the World Wide Web, is fixed and stable for every character at the time that character is encoded. (Any changes that are undertaken because of outright errors in properties or decompositions are dealt with by means of an adjunct data file so that normalization stability can still be maintained by those who need it.)

Once published, each version of the Unicode Standard is absolutely stable and will never be changed retroactively. Implementations or specifications that refer to a specific version of the Unicode Standard can rely upon this stability.
If future versions of such implementations or specifications upgrade to a later version of the Unicode Standard, then some changes may be necessary.

Property values of characters, such as directionality for the Unicode Bidi algorithm, may be changed between versions of the standard in some circumstances. As less well-documented characters and scripts are encoded, the exact character properties and behavior may not be well known at the time the characters are first encoded. As more experience is gathered in implementing the newly encoded characters, adjustments in the properties may become necessary. This re-working is kept to a minimum. New and old versions of the relevant property tables are made available on the Consortium's web site.

Normative and some informative data about characters is kept in the Unicode Character Database (UCD). The structure of many of these property values will not be changed. Instead, when new properties are defined, the Consortium adds new files for these properties, so as not to affect the stability of existing implementations that use the values and properties defined in the existing formats and files. The latest version of the UCD is available on the Consortium web site via the "Unicode Data" heading.

Note on data redistribution: Unlike the situation with IETF documents, some parts of the Unicode Character Database may have restrictions on their verbatim redistribution with source-code products. Users should read the notices in files they intend to use in such products. The information contained in the UCD may be freely used to create derivative works (such as programs, compressed data files, subroutines, data structures, etc.) that may be redistributed freely, but some files may not be redistributable verbatim.
Such restrictions on Unicode data files are never meant to prohibit or control the use of the data in products, but only to help ensure that users retrieve the latest official releases of data files when using the data in products.

7. UTC and ISO (WG2 and WG20)

The character repertoire, names, and general architecture of the Unicode Standard are identical to the parallel international standard ISO/IEC 10646. ISO/IEC 10646 only contains a small fraction of the semantics, properties and implementation guidelines supplied by the Unicode Standard and associated technical standards and reports. Implementations conformant to Unicode are conformant to ISO/IEC 10646.

ISO/IEC 10646 is maintained by the committee ISO/IEC JTC1/SC2/WG2. The WG2 committee is composed of national body representatives to ISO. Details of ISO organization may be found on the official web site of the International Organization for Standardization (ISO).

Details and history of the relationship between ISO/IEC JTC1/SC2/WG2 and Unicode, Inc. may be found in Appendix C of The Unicode Standard. (A PDF rendition of the most recent printed edition of the Unicode Standard can be found on the Unicode web site.)

WG2 shares with UTC the policies regarding stability: WG2 neither removes characters nor changes their names once published. Changes in both standards are closely tracked by the respective committees, and a very close working relationship is fostered to maintain synchronization between the standards.

The Unicode Collation Algorithm (UCA) is one of a small set of other independent standards defined and maintained by UTC. It is not, properly speaking, part of the Unicode Standard itself, but is separately defined in Unicode Technical Standard #10 (UTS #10).
There is no conformance relationship between the two standards, except that conformance to a specific base version of the Unicode Standard (e.g., 4.0) is specified in a particular version of a UTS. The collation algorithm specified in UTS #10 is conformant to ISO/IEC 14651, maintained by ISO/IEC JTC1/SC2/WG20, and the two organizations maintain a close relationship. Beyond what is specified in ISO/IEC 14651, the UCA contains additional constraints on collation, specifies additional options, and provides many more implementation guidelines.

8. Process of Technical Changes to the Unicode Standard

Changes to The Unicode Standard are of two types: architectural changes and character additions.

Most architectural changes, for example the addition of various character properties to Unicode, do not affect ISO/IEC 10646. Those architectural changes that do affect both standards, such as additional UTF formats or allocation of planes, are very carefully coordinated by the committees. As always, on the UTC side, architectural changes that establish precedents are carefully monitored and the above-described rules and procedures are followed.

Additional characters for inclusion in The Unicode Standard must be approved both by the UTC and by WG2. Proposals for additional characters enter the standards process in one of several ways:

1. through a national body member of WG2

2. through a member company or associate of UTC

3. directly from an individual "expert" contributor

The two committees have jointly produced a "Proposal Summary Form" that is required to accompany all additional character proposals. This form may be found online at the WG2 web site, and on the Unicode web site along with information about "Submitting New Characters or Scripts". Instructions for submitting proposals to UTC may likewise be found online.

Often, submission of proposals to both committees (UTC and WG2) is simultaneous. Members of UTC also frequently forward to WG2 proposals that have been initially reviewed by UTC.

In general, a proposal that is submitted to UTC before being submitted to WG2 passes through several stages:

1. Initial presentation to UTC

2. Review and re-drafting

3. Forwarding to WG2 for consideration

4. Re-drafting for technical changes

5. Balloting for approval in UTC

6. Re-forwarding and recommendation to WG2

7. At least two rounds of international balloting in ISO

About two years are required to complete this process. Initial proposals most often do not include sufficient information or justification to be approved. These are returned to the submitters with comments on how the proposal needs to be amended or extended. Repertoire addition proposals that are submitted to WG2 before being submitted to UTC are generally forwarded immediately to UTC through committee liaisons. The crucial parts of the process (steps 5 through 7 above) are never short-circuited. A two-thirds majority in UTC is required for approval at step 5.

Proposals for additional scripts are required to be coordinated with relevant user communities. Often there are ad hoc subcommittees of UTC or expert mail list participants who are responsible for actually drafting proposals, garnering community support, or representing user communities.

The rounds of international balloting (step 7) have participation both by UTC and WG2, though UTC does not directly vote in the ISO process.

Occasionally a proposal approved by one body is considered too immature for approval by the other body, and may be blocked de facto by either of the two. Only after both bodies have approved the additional characters do they proceed to the rounds of international balloting.
(The first round is a draft international standard, during which some changes may occur; the second round is final approval, during which only editorial changes are made.)

This process assures that proposals for additional characters are mature and stable by the time they appear in a final international ballot.

9. Public Access to the Character Encoding Process

While Unicode, Inc. is a membership organization, and the final say in technical matters rests with UTC, the process is quite open to public input and scrutiny of processes and proposals. There are many influential individual experts and industry groups who are not formally members, but whose input to the process is taken seriously by UTC.

Internally, UTC maintains a mail list called the "Unicore" list, which carries traffic related to meetings, technical content of the standard, and so forth. Members of the list are UTC representatives; employees and staff of member organizations (such as the Research Libraries Group); individual liaisons to and from other standards bodies (such as WG2 and IETF); and invited experts from institutions such as the Library of Congress and some universities. Subscription to the list for external individuals is subject to "sponsorship" by the corporate officers.

Unicode, Inc. also maintains a public discussion list called the "Unicode" list. Subscription is open to anyone, and proceedings of the "Unicode" mail list are publicly archived. Details are on the Consortium web site under the "Mail Lists" heading.

Technical proposals for changes to the standard are posted to both of these mail lists on a regular basis. Discussion on the public list may result in a written proposal being generated for a later UTC meeting.
Technical issues and other standardization "events" of any significance, such as beta releases and availability of draft documents, are announced and then discussed in this public forum, well before standardization is finalized. From time to time, the UTC also publishes on the Consortium web site "Public Review Issues" to gather feedback and generate discussion of specific proposals whose impact may be unclear, or for which sufficiently broad review may not yet have been brought to the UTC deliberations.

Anyone may make a character encoding or architectural proposal to UTC. Membership in the organization is not required to submit a proposal. To be taken seriously, the proposal must be framed in a substantial way, and be accompanied by sufficient documentation to warrant discussion. Examples of proposals are easily available by following links from the "Proposed Characters" and "Roadmaps" headings on the Unicode web site. Guidelines for proposals are also available under the heading "Submitting Proposals".

In general, proposals are publicly aired on the "Unicode" mail list, sometimes for a long period, prior to formal submission. Generally this is of benefit to the proposer as it tends to reduce the number of times the proposal is sent back for clarification or with requests for additional information. Once a proposal reaches the stage of being ready for discussion by UTC, the proposer will have received contact through the public mail list with one or more UTC members willing to explain or defend it in a UTC meeting.

10. Acknowledgements

Thanks to Mark Davis, Simon Josefsson, and Ken Whistler for their extensive review and feedback on previous drafts of this document.

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any intellectual property or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; neither does it represent that it has made any effort to identify any such rights. Information on the IETF's procedures with respect to rights in standards-track and standards-related documentation can be found in BCP-11. Copies of claims of rights made available for publication and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementors or users of this specification can be obtained from the IETF Secretariat.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights which may cover technology that may be required to practice this standard. Please address the information to the IETF Executive Director.

Full Copyright Statement

Copyright (C) The Internet Society (2003). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works.
However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assignees.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgment

Funding for the RFC Editor function is currently provided by the Internet Society.