HTML Working Group                                      T. Berners-Lee
INTERNET-DRAFT                                                 MIT/W3C
                                                           D. Connolly
Expires: In six months                                     May 6, 1995

                   Hypertext Markup Language - 2.0

CONTENTS

   1.  Introduction
   2.  HTML as an Application of SGML
   3.  HTML as an Internet Media Type
   4.  Document Structure Elements
   5.  Character Content
   6.  Data Elements
   7.  Character Format Elements
   8.  Hyperlink Elements
   9.  Block Structuring Elements
   10. Form-based Input Elements
   11. HTML Public Text
   12. Glossary
   13. Bibliography
   14. Appendices
   15. Acknowledgments

Status of this Memo

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference material
   or to cite them other than as ``work in progress.''

   To learn the current status of any Internet-Draft, please check the
   1id-abstracts.txt listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

   Distribution of this document is unlimited. Please send comments to
   the HTML working group (HTML-WG) of the Internet Engineering Task
   Force (IETF) at . Discussions of the group are
   archived at .

ABSTRACT

   The Hypertext Markup Language (HTML) is a simple markup
   language used to create hypertext documents that are
   platform independent. HTML documents are SGML documents with
   generic semantics that are appropriate for representing
   information from a wide range of domains. HTML markup can
   represent hypertext news, mail, documentation, and
   hypermedia; menus of options; database query results; simple
   structured documents with in-lined graphics; and hypertext
   views of existing bodies of information.

   HTML has been in use by the World Wide Web (WWW) global
   information initiative since 1990. This specification
   roughly corresponds to the capabilities of HTML in common
   use prior to June 1994. HTML is an application of ISO
   Standard 8879:1986 Information Processing Text and Office
   Systems; Standard Generalized Markup Language (SGML).

   The `"text/html; version=2.0"' Internet Media Type (RFC
   1590) and MIME Content Type (RFC 1521) is defined by this
   specification.

1. Introduction

   The HyperText Markup Language (HTML) is a simple data format
   used to create hypertext documents that are portable from
   one platform to another.
   HTML documents are SGML documents
   with generic semantics that are appropriate for representing
   information from a wide range of domains.

1.1. Scope

   HTML has been in use by the World-Wide Web (WWW) global
   information initiative since 1990. This specification
   corresponds to the capabilities of HTML in common use prior
   to June 1994 and referred to as ``HTML 2.0''.

   HTML is an application of ISO Standard 8879:1986
   _Information Processing Text and Office Systems; Standard
   Generalized Markup Language_ (SGML). The HTML Document Type
   Definition (DTD) is a formal definition of the HTML syntax
   in terms of SGML.

   This specification also defines HTML as an Internet Media
   Type[IMEDIA] and MIME Content Type[MIME] called `text/html',
   or `text/html; version=2.0'. As such, it defines the
   semantics of the HTML syntax and how that syntax should be
   interpreted by user agents.

1.2. Conformance

   This specification governs the syntax of HTML documents and
   the behaviour of HTML user agents.

1.2.1. Documents

   A document is a conforming HTML document only if:

      * It is a conforming SGML document, and it conforms to
        the HTML DTD (see 11.1, "HTML DTD").
      * It conforms to the application conventions in this
        specification. For example, the value of the `HREF'
        attribute of the <A> element must conform to the URI
        syntax.
      * Its document character set includes ISO-8859-1 and
        agrees with ISO10646; that is, each code position
        listed in 14.1, "The ISO-8859-1 Coded Character Set" is
        included, and each code position in the document
        character set is mapped to the same character as
        ISO10646 designates for that code position.

        NOTE - The document character set is somewhat
        independent of the character encoding scheme used to
        represent a document.
        For example, the ISO-2022-JP
        character encoding scheme can be used for HTML
        documents, since its repertoire is a subset of the
        ISO10646 repertoire. The critical distinction is that
        numeric character references agree with ISO10646
        regardless of how the document is encoded.

      NOTE - There are a number of syntactic idioms that are
      not supported or are supported inconsistently in some
      historical user agent implementations. These idioms are
      called out in notes like this throughout this
      specification.

      HTML documents should not contain these idioms, at
      least until such time as support for them is widely
      deployed.

   The HTML DTD defines a standard HTML document type and
   several variations, based on feature test entities:

   HTML.Recommended
         Certain features of the language are necessary for
         compatibility with widespread usage, but they may
         compromise the structural integrity of a document.
         This feature test entity enables a more
         prescriptive document type definition that
         eliminates those features.

         For example, in order to preserve the structure of
         a document, an editing user agent may translate
         HTML documents to the recommended subset, or it
         may require that the documents be in the
         recommended subset for import.

   HTML.Deprecated
         Certain features of the language are necessary for
         compatibility with earlier versions of the
         specification, but they tend to be used and
         implemented inconsistently, and their use is
         deprecated. This feature test entity enables a
         document type definition that eliminates these
         features.

         Documents generated by translation software or
         editing software should not contain these idioms.

1.2.2. User Agents

   An HTML user agent conforms to this specification if:

      * It parses the characters of an HTML document into
        data characters and markup as per [SGML].
      * It supports the ISO-8859-1 character encoding scheme,
        and processes each character in the ISO Latin Alphabet
        Nr. 1 as specified in 5.1, "The ISO Latin 1 Character
        Repertoire".

        NOTE - To support non-western writing systems, HTML
        user agents should support the Unicode-1-1-UTF-8 and
        Unicode-1-1-UCS-2 encodings and as much of the
        character repertoire of ISO10646 as is possible as
        well.
      * It behaves identically for documents whose parsed
        token sequences are identical.
        For example, comments and the whitespace in tags
        disappear during tokenization, and hence they do not
        influence the behaviour of conforming user agents.
      * It allows the user to traverse (or at least attempt
        to traverse, resources permitting) all hyperlinks in an
        HTML document.
      * It allows the user to express all form field values
        specified in an HTML document and to (attempt to)
        submit the values as requests to information services.

      NOTE - In the interest of robustness and extensibility,
      there are a number of widely deployed conventions for
      handling non-conforming documents. See 3.2.1,
      "Undeclared Markup Error Handling" for details.

2. HTML as an Application of SGML

   HTML is an application of ISO Standard 8879:1986 - Standard
   Generalized Markup Language (SGML). SGML is a system for
   defining structured document types and markup languages to
   represent instances of those document types[SGML]. The
   public text -- DTD and SGML declaration -- of the HTML
   document type definition are provided in 11, "HTML Public
   Text".

   The term _HTML_ refers to both the document type defined
   here and the markup language for representing instances of
   this document type.

2.1. SGML Documents

   An HTML document is an SGML document; that is, a sequence of
   characters organized physically into a set of entities, and
   logically as a hierarchy of elements.

   The first production of the SGML grammar separates an SGML
   document into three parts: an SGML declaration, a prologue,
   and an instance. For the purposes of this specification, the
   prologue is a DTD. This DTD describes another grammar: the
   start symbol is given in the doctype declaration; the
   terminals are data characters and tags, and the productions
   are determined by the element declarations. The instance
   must conform to the DTD, that is, it must be in the language
   defined by this grammar.

   The SGML declaration determines the lexicon of the grammar.
   It specifies the document character set, which determines a
   character repertoire that contains all characters that occur
   in all text entities in the document, and the code positions
   associated with those characters.

   The SGML declaration also specifies the syntax-reference
   character set of the document, and a few other parameters
   that bind the abstract syntax of SGML to a concrete syntax.
   This concrete syntax determines how the sequence of
   characters of the document is mapped to a sequence of
   terminals in the grammar of the prologue.

   For example, consider the following document:

      <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
      <title>Parsing Example</title>
      <p>Some text. <em>&#42;wow&#42;</em></p>

   An HTML user agent should use the SGML declaration that is
   given in 11.2, "SGML Declaration for HTML". According to the
   document character set there, `&#42;' refers to an asterisk
   character.

   The instance above is regarded as the following sequence of
   terminals:

      1. TITLE start-tag
      2. data characters: ``Parsing Example''
      3. TITLE end-tag
      4. P start-tag
      5. data characters: ``Some text. ''
      6. EM start-tag
      7. data characters: ``*wow*''
      8. EM end-tag
      9. P end-tag

   The start symbol of the DTD grammar is HTML, and the
   productions are given in the public text identified by
   `-//IETF//DTD HTML 2.0//EN' (11.1, "HTML DTD"). Hence the
   terminals above parse as:

      HTML
       |
       \-HEAD
       |  |
       |  \-TITLE
       |      |
       |      \-<TITLE>
       |      |
       |      \-"Parsing Example"
       |      |
       |      \-</TITLE>
       |
       \-BODY
         |
         \-P
           |
           \-<P>
           |
           \-"Some text. "
           |
           \-EM
           |  |
           |  \-<EM>
           |  |
           |  \-"*wow*"
           |  |
           |  \-</EM>
           |
           \-</P>

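   The mapping from characters to terminals described in this
   section can be illustrated with a short Python sketch. It is
   not an SGML parser: it assumes tags contain no `>' inside
   attribute values, skips markup declarations such as the
   doctype, drops whitespace-only data between tags, and resolves
   only numeric character references. The function name
   `terminals' and the sample string (written with an explicit P
   end-tag) are hypothetical.

```python
import re

# Simplified markup recognizer (an assumption, not the SGML rules):
MARKUP = re.compile(
    r"<!(?P<decl>[^>]*)>"                                    # markup declaration
    r"|<(?P<slash>/?)(?P<name>[A-Za-z][A-Za-z0-9.-]*)[^>]*>"  # start-tag or end-tag
    r"|&#(?P<num>[0-9]+);?"                                  # numeric character reference
)

EXAMPLE = (
    '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n'
    "<title>Parsing Example</title>\n"
    "<p>Some text. <em>&#42;wow&#42;</em></p>\n"
)


def terminals(document):
    """Return a list of (kind, value) terminals for the instance."""
    result = []
    data = []  # pieces of the data character string being collected
    pos = 0

    def flush():
        # Emit the collected data characters, unless only whitespace.
        text = "".join(data)
        data.clear()
        if text.strip():
            result.append(("data", text))

    for m in MARKUP.finditer(document):
        data.append(document[pos:m.start()])
        pos = m.end()
        if m.group("num") is not None:
            # The reference maps to one character via the character set.
            data.append(chr(int(m.group("num"))))
        elif m.group("name") is not None:
            flush()
            kind = "end-tag" if m.group("slash") else "start-tag"
            # Element names are not case sensitive; normalize to upper case.
            result.append((kind, m.group("name").upper()))
        else:
            flush()  # a declaration is not part of the instance
    data.append(document[pos:])
    flush()
    return result


if __name__ == "__main__":
    for kind, value in terminals(EXAMPLE):
        print(kind, repr(value))
```

   Building the parse tree from these terminals would additionally
   require the productions of the DTD, which this sketch does not
   model.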
2.2. HTML Lexical Syntax

   SGML specifies an abstract syntax and a reference concrete
   syntax. Aside from certain quantities and capacities (e.g.
   the limit on the length of a name), all HTML documents use
   the reference concrete syntax. In particular, all markup
   characters are in the ISO-646-IRV character repertoire. Data
   characters are drawn from the document character set (see 5,
   "Character Content").

   A complete discussion of SGML parsing, e.g. the mapping of a
   sequence of characters to a sequence of tags and data, is
   left to the SGML standard[SGML]. This section is only a
   summary.

2.2.1. Data Characters

   Any sequence of characters that do not constitute markup
   (see 9.6 ``Delimiter Recognition'' of [SGML]) are mapped
   directly to strings of data characters. Some markup also
   maps to data character strings. Numeric character references
   also map to single-character strings, via the document
   character set. Each reference to one of the general entities
   defined in the HTML DTD also maps to a single-character
   string.

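   As a rough illustration of this mapping, the following Python
   sketch splits a string into data character strings, turning
   each recognized reference into a single-character string. The
   entity table is a small hypothetical subset of the general
   entities in the HTML DTD, the function name `data_strings' is
   invented for the illustration, and tags are not handled at all.

```python
import re

# Hypothetical subset of the general entities defined in the HTML DTD.
# Entity names are case sensitive, so only these exact spellings match.
ENTITIES = {"lt": "<", "gt": ">", "amp": "&", "quot": '"'}

# An ampersand followed by "#" and digits, or by a name; the
# terminating semicolon is optional.
REFERENCE = re.compile(r"&(?:#([0-9]+)|([A-Za-z][A-Za-z0-9.-]*));?")


def data_strings(text):
    """Split text into data character strings, one per reference."""
    out = []
    pos = 0
    for m in REFERENCE.finditer(text):
        number, name = m.group(1), m.group(2)
        if number is not None:
            char = chr(int(number))  # code position in ISO10646
        elif name in ENTITIES:
            char = ENTITIES[name]
        else:
            continue  # unknown entity name: leave it as ordinary text
        if m.start() > pos:
            out.append(text[pos:m.start()])
        out.append(char)
        pos = m.end()
    if pos < len(text):
        out.append(text[pos:])
    return out


if __name__ == "__main__":
    print(data_strings("abc&lt;def"))
    print(data_strings("abc &#60 def"))
    print(data_strings("abc & lt def"))
```

   Because the name in a reference is matched greedily, a string
   such as `abc&ltdef' is read as a reference to an unknown entity
   `ltdef' and left unchanged in this sketch.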
   For example,

      abc&lt;def    => "abc","<","def"
      abc&#60;def   => "abc","<","def"

   Note that the terminating semicolon is only necessary when
   the character following the reference would otherwise be
   recognized as markup:

      abc &lt def     => "abc ","<"," def"
      abc &#60 def    => "abc ","<"," def"

   And note that an ampersand is only recognized as markup when
   it is followed by a letter or digit:

      abc & lt def    => "abc & lt def"
      abc & 60 def    => "abc & 60 def"

   A useful technique for translating plain text to HTML is to
   replace each '<', '&', and '>' by an entity reference or
   numeric character reference as follows:

                   ENTITY      NUMERIC
         CHARACTER REFERENCE   CHAR REF     CHARACTER DESCRIPTION
           &       &amp;       &#38;        Ampersand
           <       &lt;        &#60;        Less than
           >       &gt;        &#62;        Greater than

      NOTE - There are SGML mechanisms, CDATA and RCDATA, to
      allow most `<', `>', and `&' characters to be entered
      without the use of entity references. Because these
      features tend to be used and implemented
      inconsistently, and because they conflict with
      techniques for reducing HTML to 7 bit ASCII for
      transport, they are not used in this version of the
      HTML DTD.

2.2.2. Tags

   Tags delimit elements such as headings, paragraphs, lists,
   character highlighting and links. Most HTML elements are
   identified in a document as a start-tag, which gives the
   element name and attributes, followed by the content,
   followed by the end tag. Start-tags are delimited by `<' and
   `>'; end tags are delimited by `</' and `>'. An example is:

      <H1>This is a Heading</H1>

   Some elements only have a start-tag without an end-tag. For
   example, to create a line break, you use the `<BR>' tag.
   Additionally, the end tags of some other elements, such as
   Paragraph (`</P>'), List Item (`</LI>'), Definition Term
   (`</DT>'), and Definition Description (`</DD>') elements, may
   be omitted.

   The content of an element is a sequence of data character
   strings and nested elements. Some elements, such as anchors,
   cannot be nested. Anchors and character highlighting may be
   put inside other constructs. See the HTML DTD, 11.1, "HTML
   DTD" for full details.

      NOTE - The SGML declaration for HTML specifies SHORTTAG
      YES, which means that there are other valid syntaxes
      for tags, such as NET tags, `<EM/.../'; empty start tags,
      `<>'; and empty end-tags, `</>'. Until support
      for these idioms is widely deployed, their use is
      strongly discouraged.

2.2.3. Names

   A name consists of a letter followed by up to 71 letters,
   digits, periods, or hyphens. Element names are not case
   sensitive, but entity names are. For example,
   `<BLOCKQUOTE>', `<BlockQuote>', and `<blockquote>' are
   equivalent, whereas `&amp;' is different from `&AMP;'.

   In a start-tag, the element name must immediately follow the
   tag open delimiter `<'.

2.2.4. Attributes

   In a start-tag, white space and attributes are allowed
   between the element name and the closing delimiter. An
   attribute typically consists of an attribute name, an equal
   sign, and a value, though some attributes may be just a
   value. White space is allowed around the equal sign.

   The value of the attribute may be either:

      * A string literal, delimited by single quotes or
        double quotes and not containing any occurrences of the
        delimiting character.
      * A name token (a sequence of letters, digits, periods,
        or hyphens)

   In this example, img is the element name, `src' is the
   attribute name, and `http://host/dir/file.gif' is the
   attribute value:

      <img src='http://host/dir/file.gif'>

      NOTE - Some historical implementations consider any
      occurrence of the `>' character to signal the end of a
      tag. For compatibility with such implementations, when
      `>' appears in an attribute value, it should be
      represented with a numeric character reference, such as
      in: `a&#62;b'.

   A useful technique for computing an attribute value literal
   for a given string is to replace each quote and space
   character by an entity reference or numeric character
   reference as follows:

                   ENTITY      NUMERIC
         CHARACTER REFERENCE   CHAR REF     CHARACTER DESCRIPTION
           TAB                 &#9;         Tab
           LF                  &#10;        Line Feed
           CR                  &#13;        Carriage Return
                               &#32;        Space
           "       &quot;      &#34;        Quotation mark
           &       &amp;       &#38;        Ampersand

   For example:

      "First &quot;real&quot; example"

      NOTE - Some historical implementations allow any
      character except space or `>' in a name token.
      Attribute values must be quoted only if they don't
      satisfy the syntax for a name token.

   Note that the SGML declaration in 11.2, "SGML Declaration
   for HTML" limits the length of an attribute value to 1024
   characters.

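   The technique for computing an attribute value literal can be
   sketched in Python as follows. The function names are
   hypothetical. The sketch assumes double-quote delimiters,
   leaves ordinary spaces literal unless asked otherwise, also
   replaces `>' as the note on historical implementations
   suggests, and assumes that a name token is limited to 72
   characters in the same way as a name. It does not enforce the
   1024 character limit.

```python
import re

# Name token: letters, digits, periods, or hyphens. The length bound
# of 72 is an assumption carried over from the limit on names.
NAME_TOKEN = re.compile(r"[A-Za-z0-9.-]{1,72}")

# Replacements taken from the table above, plus ">" for compatibility
# with historical implementations.
REPLACEMENTS = {
    "&": "&amp;",
    '"': "&quot;",
    "\t": "&#9;",
    "\n": "&#10;",
    "\r": "&#13;",
    ">": "&#62;",
}


def attribute_literal(value, escape_space=False):
    """Return value as a double-quoted attribute value literal."""
    table = dict(REPLACEMENTS)
    if escape_space:
        table[" "] = "&#32;"
    # Replace character by character so that the "&" introduced by a
    # reference is never escaped a second time.
    body = "".join(table.get(ch, ch) for ch in value)
    return '"' + body + '"'


def format_attribute(name, value):
    """Write name=value, quoting only when value is not a name token."""
    if NAME_TOKEN.fullmatch(value):
        return f"{name}={value}"
    return f"{name}={attribute_literal(value)}"


if __name__ == "__main__":
    print(attribute_literal('First "real" example'))
    print(format_attribute("alt", "a>b"))
    print(format_attribute("align", "top"))
```

   Quoting every value would also be valid; the name token check
   is shown only because unquoted values are permitted in that
   case.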
   Attributes such as ISMAP and COMPACT may be written using a
   minimized syntax. The markup: