HTML Working Group                                       T. Berners-Lee
INTERNET-DRAFT                                                  MIT/W3C
                                                            D. Connolly
Expires: In six months                                     May 31, 1995

                   Hypertext Markup Language - 2.0

CONTENTS

   1.  Introduction
   2.  HTML as an Application of SGML
   3.  HTML as an Internet Media Type
   4.  Document Structure
   5.  Character, Words, and Paragraphs
   6.  Hyperlinks
   7.  Forms
   8.  HTML Public Text
   9.  Glossary
   10. Bibliography
   11. Appendices
   12. Acknowledgments

Status of this Memo

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.
   It is inappropriate to use Internet-Drafts as reference material
   or to cite them other than as ``work in progress.''

   To learn the current status of any Internet-Draft, please check the
   1id-abstracts.txt listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

   Distribution of this document is unlimited. Please send comments to
   the HTML working group (HTML-WG) of the Internet Engineering Task
   Force (IETF) at . Discussions of the group are
   archived at .

   In this draft, the first three sections are considered essentially
   finished. Sections 4 and 5 have been significantly revised and are
   open to comments, though I'm fairly happy with those parts. Section
   6 is somewhat new: it collects all information about hyperlinking
   into one place. Section 7 (form elements) has also been revised, and
   there are a few points I'm not sure about. The glossary (section 9)
   has also been tweaked. Section 8 ``public text'' has been stable for
   some time, but as it's critical, I'd appreciate a careful review
   just the same.

ABSTRACT

   The Hypertext Markup Language (HTML) is a simple markup
   language used to create hypertext documents that are
   platform independent. HTML documents are SGML documents with
   generic semantics that are appropriate for representing
   information from a wide range of domains. HTML markup can
   represent hypertext news, mail, documentation, and
   hypermedia; menus of options; database query results; simple
   structured documents with in-lined graphics; and hypertext
   views of existing bodies of information.

   HTML has been in use by the World Wide Web (WWW) global
   information initiative since 1990. This specification
   roughly corresponds to the capabilities of HTML in common
   use prior to June 1994.
   HTML is an application of ISO
   Standard 8879:1986 Information Processing Text and Office
   Systems; Standard Generalized Markup Language (SGML).

   The `"text/html; version=2.0"' Internet Media Type (RFC
   1590) and MIME Content Type (RFC 1521) is defined by this
   specification.

1. Introduction

   The HyperText Markup Language (HTML) is a simple data format
   used to create hypertext documents that are portable from
   one platform to another. HTML documents are SGML documents
   with generic semantics that are appropriate for representing
   information from a wide range of domains.

1.1. Scope

   HTML has been in use by the World-Wide Web (WWW) global
   information initiative since 1990. This specification
   corresponds to the capabilities of HTML in common use prior
   to June 1994 and referred to as ``HTML 2.0''.

   HTML is an application of ISO Standard 8879:1986
   _Information Processing Text and Office Systems; Standard
   Generalized Markup Language_ (SGML). The HTML Document Type
   Definition (DTD) is a formal definition of the HTML syntax
   in terms of SGML.

   This specification also defines HTML as an Internet Media
   Type[IMEDIA] and MIME Content Type[MIME] called `text/html',
   or `text/html; version=2.0'. As such, it defines the
   semantics of the HTML syntax and how that syntax should be
   interpreted by user agents.

1.2. Conformance

   This specification governs the syntax of HTML documents and
   the behaviour of HTML user agents.

1.2.1. Documents

   A document is a conforming HTML document only if:

      * It is a conforming SGML document, and it conforms to
        the HTML DTD (see 8.1, "HTML DTD").

           NOTE - There are a number of syntactic idioms that are
           not supported or are supported inconsistently in some
           historical user agent implementations. These idioms are
           called out in notes like this throughout this
           specification. HTML documents should not contain these
           idioms, at least until such time as support for them is
           widely deployed.

      * It conforms to the application conventions in this
        specification. For example, the value of the HREF
        attribute of the <A> element must conform to the URI
        syntax.

      * Its document character set includes ANSI/ISO 8859-1
        and agrees with ISO/IEC 10646-1; that is, each code
        position listed in 11.1, "The ANSI/ISO 8859-1 Coded
        Character Set" is included, and each code position in
        the document character set is mapped to the same
        character as ISO10646 designates for that code
        position.

           NOTE - The document character set is somewhat
           independent of the character encoding scheme used to
           represent a document. For example, the ISO-2022-JP
           character encoding scheme can be used for HTML
           documents, since its repertoire is a subset of the
           ISO10646 repertoire. The critical distinction is that
           numeric character references agree with ISO10646
           regardless of how the document is encoded.

   The HTML DTD defines a standard HTML document type and
   several variations, based on feature test entities:

   HTML.Recommended
           Certain features of the language are necessary for
           compatibility with widespread usage, but they may
           compromise the structural integrity of a document.
           This feature test entity enables a more
           prescriptive document type definition that
           eliminates those features.

           For example, in order to preserve the structure of
           a document, an editing user agent may translate
           HTML documents to the recommended subset, or it
           may require that the documents be in the
           recommended subset for import.

   HTML.Deprecated
           Certain features of the language are necessary for
           compatibility with earlier versions of the
           specification, but they tend to be used and
           implemented inconsistently, and their use is
           deprecated. This feature test entity enables a
           document type definition that eliminates these
           features.

           Documents generated by translation software or
           editing software should not contain these idioms.

1.2.2. User Agents

   An HTML user agent conforms to this specification if:

      * It parses the characters of an HTML document into
        data characters and markup according to [SGML].

           NOTE - In the interest of robustness and extensibility,
           there are a number of widely deployed conventions for
           handling non-conforming documents. See 3.2.1,
           "Undeclared Markup Error Handling" for details.

      * It supports the `ISO-8859-1' character encoding
        scheme and processes each character in the ISO Latin
        Alphabet No. 1 as specified in 5.1, "The ISO Latin 1
        Character Repertoire".

           NOTE - To support non-western writing systems, HTML
           user agents should support ISO-10646-UCS-2 or similar
           character encoding schemes and as much of the character
           repertoire of ISO10646 as is practical.

      * It behaves identically for documents whose parsed
        token sequences are identical. For example, comments
        and the whitespace in tags disappear during
        tokenization, and hence they do not influence the
        behaviour of conforming user agents.

      * It allows the user to traverse (or at least attempt
        to traverse, resources permitting) all hyperlinks in an
        HTML document.

      * It allows the user to express all form field values
        specified in an HTML document and to (attempt to)
        submit the values as requests to information services.

2. HTML as an Application of SGML

   HTML is an application of ISO 8879:1986 -- Standard
   Generalized Markup Language (SGML). SGML is a system for
   defining structured document types and markup languages to
   represent instances of those document types[SGML]. The
   public text -- DTD and SGML declaration -- of the HTML
   document type definition are provided in 8, "HTML Public
   Text".

   The term _HTML_ refers to both the document type defined
   here and the markup language for representing instances of
   this document type.

2.1. SGML Documents

   An HTML document is an SGML document; that is, a sequence of
   characters organized physically into a set of entities, and
   logically as a hierarchy of elements.

   The first production of the SGML grammar separates an SGML
   document into three parts: an SGML declaration, a prologue,
   and an instance. For the purposes of this specification, the
   prologue is a DTD. This DTD describes another grammar: the
   start symbol is given in the doctype declaration, the
   terminals are data characters and tags, and the productions
   are determined by the element declarations. The instance
   must conform to the DTD, that is, it must be in the language
   defined by this grammar.

   The SGML declaration determines the lexicon of the grammar.
   It specifies the document character set, which determines a
   character repertoire that contains all characters that occur
   in all text entities in the document, and the code positions
   associated with those characters.

   The SGML declaration also specifies the syntax-reference
   character set of the document, and a few other parameters
   that bind the abstract syntax of SGML to a concrete syntax.
   This concrete syntax determines how the sequence of
   characters of the document is mapped to a sequence of
   terminals in the grammar of the prologue.

   For example, consider the following document:

        <!DOCTYPE html PUBLIC "-//IETF//DTD HTML 2.0//EN">
        <title>Parsing Example</title>
        <p>Some text. <em>&#42;wow&#42;</em></p>

   An HTML user agent should use the SGML declaration that is
   given in 8.2, "SGML Declaration for HTML". According to its
   document character set, `&#42;' refers to an asterisk
   character.

   The instance above is regarded as the following sequence of
   terminals:

        1. TITLE start-tag

        2. data characters: ``Parsing Example''

        3. TITLE end-tag

        4. P start-tag

        5. data characters: ``Some text. ''

        6. EM start-tag

        7. data characters: ``*wow*''

        8. EM end-tag

        9. P end-tag

   The start symbol of the DTD grammar is HTML, and the
   productions are given in the public text identified by
   `-//IETF//DTD HTML 2.0//EN' (8.1, "HTML DTD"). Hence the
   terminals above parse as:

        HTML
         |
         \-HEAD
         |  |
         |  \-TITLE
         |      |
         |      \-<TITLE>
         |      |
         |      \-"Parsing Example"
         |      |
         |      \-</TITLE>
         |
         \-BODY
           |
           \-P
             |
             \-<P>
             |
             \-"Some text. "
             |
             \-EM
             |  |
             |  \-<EM>
             |  |
             |  \-"*wow*"
             |  |
             |  \-</EM>
             |
             \-</P>
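
   The terminal sequence listed above can be approximated with a
   short sketch. It is only an illustration: it uses the
   html.parser module from the Python standard library, which is
   not an SGML parser and does not consult the HTML DTD or the
   SGML declaration. The names DOCUMENT, TerminalLister, and
   list_terminals are hypothetical and are not part of this
   specification.

```python
from html.parser import HTMLParser

# The example instance from this section, with its markup written out.
DOCUMENT = (
    '<!DOCTYPE html PUBLIC "-//IETF//DTD HTML 2.0//EN">\n'
    "<title>Parsing Example</title>\n"
    "<p>Some text. <em>&#42;wow&#42;</em></p>\n"
)


class TerminalLister(HTMLParser):
    """Collects start-tags, end-tags, and data strings in document order."""

    def __init__(self):
        # convert_charrefs=True resolves &#42; to "*" before handle_data runs.
        super().__init__(convert_charrefs=True)
        self.terminals = []

    def handle_starttag(self, tag, attrs):
        self.terminals.append(f"{tag.upper()} start-tag")

    def handle_endtag(self, tag):
        self.terminals.append(f"{tag.upper()} end-tag")

    def handle_data(self, data):
        # Skip the line breaks between elements; they carry no text here.
        if data.strip():
            self.terminals.append(f"data characters: {data!r}")


def list_terminals(document):
    """Return the terminals of an HTML instance as a list of strings."""
    lister = TerminalLister()
    lister.feed(document)
    lister.close()
    return lister.terminals


if __name__ == "__main__":
    for number, terminal in enumerate(list_terminals(DOCUMENT), start=1):
        print(f"{number}. {terminal}")
```

   The doctype declaration is consumed without producing a
   terminal, which mirrors the list above: it belongs to the
   prologue, not to the instance.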

2.2. HTML Lexical Syntax

   SGML specifies an abstract syntax and a reference concrete
   syntax. Aside from certain quantities and capacities (e.g.
   the limit on the length of a name), all HTML documents use
   the reference concrete syntax. In particular, all markup
   characters are in the repertoire of ISO 646 IRV. Data
   characters are drawn from the document character set (see 5,
   "Character, Words, and Paragraphs").

   A complete discussion of SGML parsing, e.g. the mapping of a
   sequence of characters to a sequence of tags and data, is
   left to the SGML standard[SGML]. This section is only a
   summary.

2.2.1. Data Characters

   Any sequence of characters that does not constitute markup
   (see 9.6 ``Delimiter Recognition'' of [SGML]) is mapped
   directly to a string of data characters. Some markup also
   maps to data character strings. Each numeric character
   reference maps to a single-character string, via the document
   character set. Each reference to one of the general entities
   defined in the HTML DTD also maps to a single-character
   string.
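
   The mapping of references to data characters can be sketched
   as follows. This is a minimal illustration under stated
   assumptions, not a conforming SGML parser: the ENTITIES table
   holds only four of the general entities (the HTML DTD defines
   more), code positions are resolved with Python's chr() on the
   assumption that the document character set agrees with
   ISO10646, and the names ENTITIES, REFERENCE, and
   resolve_references are hypothetical.

```python
import re

# Assumed subset of the general entities; the HTML DTD defines more.
ENTITIES = {"lt": "<", "gt": ">", "amp": "&", "quot": '"'}

# "&#" plus digits, or "&" plus a name; the closing ";" is optional.
# A name is a letter followed by letters, digits, periods, or hyphens.
REFERENCE = re.compile(r"&(?:#([0-9]+)|([A-Za-z][A-Za-z0-9.-]*));?")


def resolve_references(text):
    """Replace character and entity references with data characters."""

    def replace(match):
        number, name = match.group(1), match.group(2)
        if number is not None:
            code = int(number)
            # Leave out-of-range code positions untouched.
            return chr(code) if code <= 0x10FFFF else match.group(0)
        # Entity names are case sensitive; unknown names are kept as-is.
        return ENTITIES.get(name, match.group(0))

    return REFERENCE.sub(replace, text)


if __name__ == "__main__":
    for sample in ("abc&lt;def", "abc&#60;def", "abc & lt def"):
        print(repr(sample), "=>", repr(resolve_references(sample)))
```

   The sketch returns one joined string; a parser that keeps
   each reference as its own single-character string would
   return the pieces separately.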

   For example,

        abc&lt;def    => "abc","<","def"
        abc&#60;def   => "abc","<","def"

   Note that the terminating semicolon is only necessary when
   the character following the reference would otherwise be
   recognized as markup:

        abc &lt def     => "abc ","<"," def"
        abc &#60 def    => "abc ","<"," def"

   And note that an ampersand is only recognized as markup when
   it is followed by a letter or digit:

        abc & lt def    => "abc & lt def"
        abc & 60 def    => "abc & 60 def"

   A useful technique for translating plain text to HTML is to
   replace each '<', '&', and '>' by an entity reference or
   numeric character reference as follows:

                     ENTITY      NUMERIC
        CHARACTER   REFERENCE   CHAR REF    CHARACTER DESCRIPTION
           &          &amp;       &#38;       Ampersand
           <          &lt;        &#60;       Less than
           >          &gt;        &#62;       Greater than

        NOTE - There are SGML mechanisms, CDATA and RCDATA, to
        allow most `<', `>', and `&' characters to be entered
        without the use of entity references. Because these
        features tend to be used and implemented
        inconsistently, and because they conflict with
        techniques for reducing HTML to 7 bit ASCII for
        transport, they are not used in this version of the
        HTML DTD.

2.2.2. Tags

   Tags delimit elements such as headings, paragraphs, lists,
   character highlighting, and links. Most HTML elements are
   identified in a document as a start-tag, which gives the
   element name and attributes, followed by the content,
   followed by the end tag. Start-tags are delimited by `<' and
   `>'; end tags are delimited by `</' and `>'. An example is:

        <H1>This is a Heading</H1>

   Some elements only have a start-tag without an end-tag. For
   example, to create a line break, you use the `<BR>' tag.
   Additionally, the end tags of some other elements, such as
   Paragraph (`</P>'), List Item (`</LI>'), Definition Term
   (`</DT>'), and Definition Description (`</DD>') elements, may
   be omitted.

   The content of an element is a sequence of data character
   strings and nested elements. Some elements, such as anchors,
   cannot be nested. Anchors and character highlighting may be
   put inside other constructs. See the HTML DTD, 8.1, "HTML
   DTD" for full details.

        NOTE - The SGML declaration for HTML specifies SHORTTAG
        YES, which means that there are other valid syntaxes
        for tags, such as NET tags, `<em/.../'; empty start
        tags, `<>'; and empty end-tags, `</>'. Until support
        for these idioms is widely deployed, their use is
        strongly discouraged.

2.2.3. Names

   A name consists of a letter followed by up to 71 letters,
   digits, periods, or hyphens. Element names are not case
   sensitive, but entity names are. For example,
   `<BLOCKQUOTE>', `<BlockQuote>', and `<blockquote>' are
   equivalent, whereas `&amp;' is different from `&AMP;'.

   In a start-tag, the element name must immediately follow the
   tag open delimiter `<'.

2.2.4. Attributes

   In a start-tag, white space and attributes are allowed
   between the element name and the closing delimiter. An
   attribute typically consists of an attribute name, an equal
   sign, and a value, though some attributes may be just a
   value. White space is allowed around the equal sign.

   The value of the attribute may be either:

      * A string literal, delimited by single quotes or
        double quotes and not containing any occurrences of the
        delimiting character.

           NOTE - Some historical implementations consider any
           occurrence of the `>' character to signal the end of a
           tag. For compatibility with such implementations, when
           `>' appears in an attribute value, it should be
           represented with a numeric character reference. For
           example, the value `a>b' should be written `a&#62;b'
           or `a&gt;b'.

      * A name token (a sequence of letters, digits, periods,
        or hyphens).

           NOTE - Some historical implementations allow any
           character except space or `>' in a name token.

   In this example, <img> is the element name, src is the
   attribute name, and `http://host/dir/file.gif' is the
   attribute value:

        <img src='http://host/dir/file.gif'>

   A useful technique for computing an attribute value literal
   for a given string is to replace each quote and space
   character by an entity reference or numeric character
   reference as follows:

                     ENTITY      NUMERIC
        CHARACTER   REFERENCE   CHAR REF    CHARACTER DESCRIPTION
          TAB                     &#9;       Tab
          LF                      &#10;      Line Feed
          CR                      &#13;      Carriage Return
          SP                      &#32;      Space
          "           &quot;      &#34;      Quotation mark
          &           &amp;       &#38;      Ampersand

   For example, the string `First "real" example' becomes the
   attribute value literal:

        "First &quot;real&quot; example"

   Note that the SGML declaration in 8.2, "SGML Declaration for
   HTML" limits the length of an attribute value to 1024
   characters.

   Attributes such as ISMAP and COMPACT may be written using a
   minimized syntax. The markup: