idnits 2.17.1

draft-ietf-html-spec-05.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.
     Expected boilerplate is as follows today (2024-04-19) according to
     https://trustee.ietf.org/license-info :

       IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:

          This Internet-Draft is submitted in full conformance with the
          provisions of BCP 78 and BCP 79.

       IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:

          Copyright (c) 2024 IETF Trust and the persons identified as the
          document authors. All rights reserved.

       IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:

          This document is subject to BCP 78 and the IETF Trust's Legal
          Provisions Relating to IETF Documents
          (https://trustee.ietf.org/license-info) in effect on the date of
          publication of this document. Please review these documents
          carefully, as they describe your rights and restrictions with
          respect to this document. Code Components extracted from this
          document must include Simplified BSD License text as described in
          Section 4.e of the Trust Legal Provisions and are provided without
          warranty as described in the Simplified BSD License.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date. The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents.

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts.

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack separate sections for Informative/Normative
     References. All references will be assumed normative when checking for
     downward references.

  ** There are 18 instances of too long lines in the document, the longest
     one being 14 characters in excess of 72.

  == There are 1 instance of lines with non-RFC2606-compliant FQDNs in the
     document.

  == There are 2 instances of lines with non-RFC6890-compliant IPv4 addresses
     in the document. If these are example addresses, they should be changed.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to
     grant the BCP78 rights to the IETF Trust, then this is fine, and you can
     ignore this comment. If not, you may need to add the pre-RFC5378
     disclaimer. (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (August 8, 1995) is 10482 days in the past. Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'ISO 8859-1' is mentioned on line 1710, but not
     defined

  == Unused Reference: 'URI' is defined on line 3304, but no explicit
     reference was found in the text

  == Unused Reference: 'HTTP' is defined on line 3317, but no explicit
     reference was found in the text

  == Unused Reference: 'GOLD90' is defined on line 3335, but no explicit
     reference was found in the text

  == Unused Reference: 'SQ91' is defined on line 3354, but no explicit
     reference was found in the text

  ** Downref: Normative reference to an Informational RFC: RFC 1630 (ref.
     'URI')

  ** Obsolete normative reference: RFC 1738 (ref. 'URL') (Obsoleted by RFC
     4248, RFC 4266)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'HTTP'

  ** Obsolete normative reference: RFC 1521 (ref. 'MIME') (Obsoleted by RFC
     2045, RFC 2046, RFC 2047, RFC 2048, RFC 2049)

  ** Obsolete normative reference: RFC 1808 (ref. 'RELURL') (Obsoleted by RFC
     3986)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'GOLD90'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'DEXTER'

  ** Obsolete normative reference: RFC 1590 (ref. 'IMEDIA') (Obsoleted by RFC
     2045, RFC 2046, RFC 2047, RFC 2048, RFC 2049)

  ** Obsolete normative reference: RFC 1700 (ref. 'IANA') (Obsoleted by RFC
     3232)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'SQ91'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO-646'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO-10646'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO-8859-1'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'SGML'

     Summary: 15 errors (**), 0 flaws (~~), 8 warnings (==), 11 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

HTML Working Group                                        T. Berners-Lee
INTERNET-DRAFT                                                   MIT/W3C
                                                             D. Connolly
Expires: In six months                                    August 8, 1995

                    Hypertext Markup Language - 2.0

Status of this Memo

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as ``work in progress.''

   To learn the current status of any Internet-Draft, please check the
   1id-abstracts.txt listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

   Distribution of this document is unlimited. Please send comments to
   the HTML working group (HTML-WG) of the Internet Engineering Task
   Force (IETF) at . Discussions of the group are archived at .

ABSTRACT

   The Hypertext Markup Language (HTML) is a simple markup language used
   to create hypertext documents that are platform independent. HTML
   documents are SGML documents with generic semantics that are
   appropriate for representing information from a wide range of
   domains. HTML markup can represent hypertext news, mail,
   documentation, and hypermedia; menus of options; database query
   results; simple structured documents with in-lined graphics; and
   hypertext views of existing bodies of information.

   HTML has been in use by the World Wide Web (WWW) global information
   initiative since 1990. This specification roughly corresponds to the
   capabilities of HTML in common use prior to June 1994. HTML is an
   application of ISO Standard 8879:1986 Information Processing Text and
   Office Systems; Standard Generalized Markup Language (SGML).

   The `text/html' Internet Media Type (RFC 1590) and MIME Content Type
   (RFC 1521) is defined by this specification.

CONTENTS

   1     Introduction
   1.1   Scope
   1.2   Conformance
   2     Terms
   3     HTML as an Application of SGML
   3.1   SGML Documents
   3.2   HTML Lexical Syntax
   3.3   HTML Public Text Identifiers
   3.4   Example HTML Document
   4     HTML as an Internet Media Type
   4.1   text/html media type
   4.2   HTML Document Representation
   5     Document Structure
   5.1   Document Element: HTML
   5.2   Head: HEAD
   5.3   Body: BODY
   5.4   Headings: H1 ... H6
   5.5   Block Structuring Elements
   5.6   List Elements
   5.7   Phrase Markup
   5.8   Line Break: BR
   5.9   Horizontal Rule: HR
   5.10  Image: IMG
   6     Characters, Words, and Paragraphs
   6.1   The HTML Document Character Set
   7     Hyperlinks
   7.1   Accessing Resources
   7.2   Activation of Hyperlinks
   7.3   Simultaneous Presentation of Image Resources
   7.4   Fragment Identifiers
   7.5   Queries and Indexes
   7.6   Image Maps
   8     Forms
   8.1   Form Elements
   8.2   Form Submission
   9     HTML Public Text
   9.1   HTML DTD
   9.2   Strict HTML DTD
   9.3   Level 1 HTML DTD
   9.4   Strict Level 1 HTML DTD
   9.5   SGML Declaration for HTML
   9.6   Sample SGML Open Entity Catalog for HTML
   9.7   Character Entity Sets
   10    Security Considerations
   11    References
   12    Acknowledgments
   12.1  Authors' Addresses
   13    The HTML Coded Character Set
   14    Proposed Entities

1. Introduction

   The HyperText Markup Language (HTML) is a simple data format used to
   create hypertext documents that are portable from one platform to
   another.
   HTML documents are SGML documents with generic semantics that are
   appropriate for representing information from a wide range of
   domains.

   As HTML is an application of SGML, this specification assumes a
   working knowledge of [SGML].

1.1. Scope

   HTML has been in use by the World-Wide Web (WWW) global information
   initiative since 1990. This specification corresponds to the
   capabilities of HTML in common use prior to June 1994 and referred to
   as ``HTML 2.0''.

   HTML is an application of ISO Standard 8879:1986 _Information
   Processing Text and Office Systems; Standard Generalized Markup
   Language_ (SGML). The HTML Document Type Definition (DTD) is a formal
   definition of the HTML syntax in terms of SGML.

   This specification also defines HTML as an Internet Media Type
   [IMEDIA] and MIME Content Type [MIME] called `text/html'. As such, it
   defines the semantics of the HTML syntax and how that syntax should
   be interpreted by user agents.

1.2. Conformance

   This specification governs the syntax of HTML documents and aspects
   of the behavior of HTML user agents.

1.2.1. Documents

   A document is a conforming HTML document if:

      * It is a conforming SGML document, and it conforms to the HTML
        DTD (see 9.1, "HTML DTD").

           NOTE - There are a number of syntactic idioms that are not
           supported or are supported inconsistently in some historical
           user agent implementations. These idioms are identified in
           notes like this throughout this specification.

      * It conforms to the application conventions in this
        specification. For example, the value of the HREF attribute of
        the <A> element must conform to the URI syntax.

      * Its document character set includes [ISO-8859-1] and agrees with
        [ISO-10646]; that is, each code position listed in 13, "The HTML
        Coded Character Set" is included, and each code position in the
        document character set is mapped to the same character as
        [ISO-10646] designates for that code position.

           NOTE - The document character set is somewhat independent of
           the character encoding scheme used to represent a document.
           For example, the `ISO-2022-JP' character encoding scheme can
           be used for HTML documents, since its repertoire is a subset
           of the [ISO-10646] repertoire. The critical distinction is
           that numeric character references agree with [ISO-10646]
           regardless of how the document is encoded.

1.2.2. Feature Test Entities

   The HTML DTD defines a standard HTML document type and several
   variations, by way of feature test entities. Feature test entities
   are declarations in the HTML DTD that control the inclusion or
   exclusion of portions of the DTD.

   HTML.Recommended
      Certain features of the language are necessary for compatibility
      with widespread usage, but they may compromise the structural
      integrity of a document. This feature test entity selects a more
      prescriptive document type definition that eliminates those
      features. It is set to `IGNORE' by default.

      For example, in order to preserve the structure of a document, an
      editing user agent may translate HTML documents to the recommended
      subset, or it may require that the documents be in the recommended
      subset for import.

   HTML.Deprecated
      Certain features of the language are necessary for compatibility
      with earlier versions of the specification, but they tend to be
      used and implemented inconsistently, and their use is deprecated.
      This feature test entity enables a document type definition that
      allows these features. It is set to `INCLUDE' by default.

      Documents generated by translation software or editing software
      should not contain deprecated idioms.

1.2.3. User Agents

   An HTML user agent conforms to this specification if:

      * It parses the characters of an HTML document into data
        characters and markup according to [SGML].

           NOTE - In the interest of robustness and extensibility, there
           are a number of widely deployed conventions for handling
           non-conforming documents. See 4.2.1, "Undeclared Markup Error
           Handling" for details.

      * It supports the `ISO-8859-1' character encoding scheme and
        processes each character in the ISO Latin Alphabet No. 1 as
        specified in 6.1, "The HTML Document Character Set".

           NOTE - To support non-western writing systems, HTML user
           agents are encouraged to support `ISO-10646-UCS-2' or similar
           character encoding schemes and as much of the character
           repertoire of [ISO-10646] as is practical.

      * It behaves identically for documents whose parsed token
        sequences are identical.

        For example, comments and the whitespace in tags disappear
        during tokenization, and hence they do not influence the
        behavior of conforming user agents.

      * It allows the user to traverse (or at least attempt to traverse,
        resources permitting) all hyperlinks from <A> elements in an
        HTML document.

   An HTML user agent is a level 2 user agent if, additionally:

      * It allows the user to express all form field values specified in
        an HTML document and to (attempt to) submit the values as
        requests to information services.

2. Terms

   absolute URI
      a URI in absolute form; for example, as per [URL]

   anchor
      one of two ends of a hyperlink; typically, a phrase marked as an
      <A> element.

   base URI
      an absolute URI used in combination with a relative URI to
      determine another absolute URI.

   character
      An atom of information, for example a letter or a digit.
      Graphic characters have associated glyphs, whereas control
      characters have associated processing semantics.

   character encoding scheme
      A function whose domain is the set of sequences of octets, and
      whose range is the set of sequences of characters from a character
      repertoire; that is, a sequence of octets and a character encoding
      scheme determines a sequence of characters.

   character repertoire
      A finite set of characters; e.g. the range of a coded character
      set.

   code position
      An integer. A coded character set and a code position from its
      domain determine a character.

   coded character set
      A function whose domain is a subset of the integers and whose
      range is a character repertoire. That is, for some set of integers
      (usually of the form {0, 1, 2, ..., N}), a coded character set and
      an integer in that set determine a character. Conversely, a
      character and a coded character set determine the character's code
      position (or, in rare cases, a few code positions).

   conforming HTML user agent
      A user agent that conforms to this specification in its processing
      of the Internet Media Type `text/html'.

   data character
      Characters other than markup, which make up the content of
      elements.

   document character set
      a coded character set whose range includes all characters used in
      a document. Every SGML document has exactly one document character
      set. Numeric character references are resolved via the document
      character set.

   DTD
      document type definition. Rules that apply SGML to the markup of
      documents of a particular type, including a set of element and
      entity declarations. [SGML]

   element
      A component of the hierarchical structure defined by a document
      type definition; it is identified in a document instance by
      descriptive markup, usually a start-tag and end-tag. [SGML]

   end-tag
      Descriptive markup that identifies the end of an element. [SGML]

   entity
      data with an associated notation or interpretation; for example, a
      sequence of octets associated with an Internet Media Type. [SGML]

   fragment identifier
      the portion of an HREF attribute value following the `#' character
      which modifies the presentation of the destination of a hyperlink.

   form data set
      a sequence of name/value pairs; the names are given by an HTML
      document and the values are given by a user.

   HTML document
      An SGML document conforming to this document type definition.

   hyperlink
      a relationship between two anchors, called the tail and the head.

   markup
      Syntactically delimited characters added to the data of a document
      to represent its structure. There are four different kinds of
      markup: descriptive markup (tags), references, markup
      declarations, and processing instructions. [SGML]

   may
      A document or user interface is conforming whether this statement
      applies or not.

   media type
      an Internet Media Type, as per [IMEDIA].

   message entity
      a head and body. The head is a collection of name/value fields,
      and the body is a sequence of octets. The head defines the content
      type and content transfer encoding of the body. [MIME]

   minimally conforming HTML user agent
      A user agent that conforms to this specification except for form
      processing. It may only process level 1 HTML documents.

   must
      Documents or user agents in conflict with this statement are not
      conforming.

   numeric character reference
      markup that refers to a character by its code position in the
      document character set.

   SGML document
      A sequence of characters organized physically as a set of entities
      and logically into a hierarchy of elements.
      An SGML document consists of data characters and markup; the
      markup describes the structure of the information and an instance
      of that structure. [SGML]

   shall
      If a document or user agent conflicts with this statement, it does
      not conform to this specification.

   should
      If a document or user agent conflicts with this statement,
      undesirable results may occur in practice even though it conforms
      to this specification.

   start-tag
      Descriptive markup that identifies the start of an element and
      specifies its generic identifier and attributes. [SGML]

   syntax-reference character set
      A coded character set whose range includes all characters used for
      markup; e.g. name characters and delimiter characters.

   tag
      Markup that delimits an element. A tag includes a name which
      refers to an element declaration in the DTD, and may include
      attributes. [SGML]

   text entity
      A finite sequence of characters. A text entity typically takes the
      form of a sequence of octets with some associated character
      encoding scheme, transmitted over the network or stored in a file.
      [SGML]

   typical
      Typical processing is described for many elements. This is not a
      mandatory part of the specification but is given as guidance for
      designers and to help explain the uses for which the elements were
      intended.

   URI
      A Uniform Resource Identifier is a formatted string that serves as
      an identifier for a resource, typically on the Internet. URIs are
      used in HTML to identify the anchors of hyperlinks. URIs in common
      practice include Uniform Resource Locators (URLs) [URL] and
      Relative URLs [RELURL].

   user agent
      A component of a distributed system that presents an interface and
      processes requests on behalf of a user; for example, a www browser
      or a mail user agent.

   WWW
      The World-Wide Web is a hypertext-based, distributed information
      system created by researchers at CERN in Switzerland.

3. HTML as an Application of SGML

   HTML is an application of ISO 8879:1986 -- Standard Generalized
   Markup Language (SGML). SGML is a system for defining structured
   document types and markup languages to represent instances of those
   document types [SGML]. The public text -- DTD and SGML declaration --
   of the HTML document type definition are provided in 9, "HTML Public
   Text".

   The term _HTML_ refers to both the document type defined here and the
   markup language for representing instances of this document type.

3.1. SGML Documents

   An HTML document is an SGML document; that is, a sequence of
   characters organized physically into a set of entities, and logically
   as a hierarchy of elements.

   In the SGML specification, the first production of the SGML syntax
   grammar separates an SGML document into three parts: an SGML
   declaration, a prologue, and an instance. For the purposes of this
   specification, the prologue is a DTD. This DTD describes another
   grammar: the start symbol is given in the doctype declaration, the
   terminals are data characters and tags, and the productions are
   determined by the element declarations. The instance must conform to
   the DTD, that is, it must be in the language defined by this grammar.

   The SGML declaration determines the lexicon of the grammar. It
   specifies the document character set, which determines a character
   repertoire that contains all characters that occur in all text
   entities in the document, and the code positions associated with
   those characters.

   The SGML declaration also specifies the syntax-reference character
   set of the document, and a few other parameters that bind the
   abstract syntax of SGML to a concrete syntax.
   This concrete syntax determines how the sequence of characters of the
   document is mapped to a sequence of terminals in the grammar of the
   prologue.

   For example, consider the following document:

        <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
        <title>Parsing Example</title>
        <p>Some text. <em>&#42;wow&#42;</em></p>

   An HTML user agent should use the SGML declaration that is given in
   9.5, "SGML Declaration for HTML". According to its document character
   set, `&#42;' refers to an asterisk character, `*'.

   The instance above is regarded as the following sequence of
   terminals:

      1. start-tag: TITLE

      2. data characters: ``Parsing Example''

      3. end-tag: TITLE

      4. start-tag: P

      5. data characters: ``Some text. ''

      6. start-tag: EM

      7. data characters: ``*wow*''

      8. end-tag: EM

      9. end-tag: P

   The start symbol of the DTD grammar is HTML, and the productions are
   given in the public text identified by `-//IETF//DTD HTML 2.0//EN'
   (9.1, "HTML DTD"). The terminals above parse as:

        HTML
         |
         \-HEAD
         |  |
         |  \-TITLE
         |      |
         |      \-<TITLE>
         |      |
         |      \-"Parsing Example"
         |      |
         |      \-</TITLE>
         |
         \-BODY
           |
           \-P
             |
             \-<P>
             |
             \-"Some text. "
             |
             \-EM
             |  |
             |  \-<EM>
             |  |
             |  \-"*wow*"
             |  |
             |  \-</EM>
             |
             \-</P>
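
   The terminal sequence listed above can be approximated with a short
   Python sketch. It relies on the standard library's html.parser, which
   is a lenient HTML tokenizer and not an SGML parser, so it stands in
   for the process described here and does not implement it. The class
   name TerminalLister, the function list_terminals, and the constant
   EXAMPLE are hypothetical names chosen for illustration, and the
   markup in EXAMPLE is an assumption derived from the terminal list.
   White-space-only data between tags is dropped to mirror that list.

```python
from html.parser import HTMLParser


class TerminalLister(HTMLParser):
    """Collect start-tags, end-tags, and data characters in document order."""

    def __init__(self):
        # convert_charrefs=True resolves numeric character references
        # such as &#42; before the text reaches handle_data().
        super().__init__(convert_charrefs=True)
        self.terminals = []

    def handle_starttag(self, tag, attrs):
        self.terminals.append(("start-tag", tag.upper()))

    def handle_endtag(self, tag):
        self.terminals.append(("end-tag", tag.upper()))

    def handle_data(self, data):
        # Assumption: skip white-space-only runs between tags, which do
        # not appear in the terminal list of section 3.1.
        if data.strip():
            self.terminals.append(("data characters", data))


def list_terminals(document):
    """Return the (kind, value) terminals found in an HTML text entity."""
    parser = TerminalLister()
    parser.feed(document)
    parser.close()
    return parser.terminals


# Assumed markup for the parsing example of section 3.1.
EXAMPLE = (
    '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n'
    "<title>Parsing Example</title>\n"
    "<p>Some text. <em>&#42;wow&#42;</em></p>\n"
)

if __name__ == "__main__":
    for kind, value in list_terminals(EXAMPLE):
        print(f"{kind}: {value}")
```

   The sketch reports only the terminals; inferring the HTML, HEAD, and
   BODY elements that have no tags in the instance requires the DTD and
   is outside its scope.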

   Some of the elements are delimited explicitly by tags, while the
   boundaries of others are inferred. The <HTML> element contains a
   <HEAD> element and a <BODY> element. The <HEAD> contains <TITLE>,
   which is explicitly delimited by start- and end-tags.

3.2. HTML Lexical Syntax

   SGML specifies an abstract syntax and a reference concrete syntax.
   Aside from certain quantities and capacities (e.g. the limit on the
   length of a name), all HTML documents use the reference concrete
   syntax. In particular, all markup characters are in the repertoire of
   [ISO-646]. Data characters are drawn from the document character set
   (see 6, "Characters, Words, and Paragraphs").

   A complete discussion of SGML parsing, e.g. the mapping of a sequence
   of characters to a sequence of tags and data, is left to the SGML
   standard [SGML]. This section is only a summary.

3.2.1. Data Characters

   Any sequence of characters that does not constitute markup (see 9.6
   ``Delimiter Recognition'' of [SGML]) is mapped directly to a string
   of data characters. Some markup also maps to data character strings.
   Numeric character references map to single-character strings, via the
   document character set. Each reference to one of the general entities
   defined in the HTML DTD maps to a single-character string.

   For example,

        abc&lt;def    => "abc","<","def"
        abc&#60;def   => "abc","<","def"

   The terminating semicolon on entity or numeric character references
   is only necessary when the character following the reference would
   otherwise be recognized as part of the name (see 9.4.5 ``Reference
   End'' in [SGML]).

        abc &lt def     => "abc ","<"," def"
        abc &#60 def    => "abc ","<"," def"

   An ampersand is only recognized as markup when it is followed by a
   letter or a `#' and a digit:

        abc & lt def     => "abc & lt def"
        abc &# 60 def    => "abc &# 60 def"

   A useful technique for translating plain text to HTML is to replace
   each '<', '&', and '>' by an entity reference or numeric character
   reference as follows:

                     ENTITY      NUMERIC
           CHARACTER REFERENCE   CHAR REF     CHARACTER DESCRIPTION
           --------- ----------  -----------  ---------------------
             &       &amp;       &#38;        Ampersand
             <       &lt;        &#60;        Less than
             >       &gt;        &#62;        Greater than

        NOTE - There are SGML mechanisms, CDATA and RCDATA declared
        content, that allow most `<', `>', and `&' characters to be
        entered without the use of entity references. Because these
        mechanisms tend to be used and implemented inconsistently, and
        because they conflict with techniques for reducing HTML to 7 bit
        ASCII for transport, they are deprecated in this version of
        HTML. See 5.5.2.1, "Example and Listing: XMP, LISTING".

3.2.2. Tags

   Tags delimit elements such as headings, paragraphs, lists, character
   highlighting, and links. Most HTML elements are identified in a
   document as a start-tag, which gives the element name and attributes,
   followed by the content, followed by the end-tag. Start-tags are
   delimited by `<' and `>'; end-tags are delimited by `</' and `>'. An
   example is:

        <H1>This is a Heading</H1>

   Some elements only have a start-tag without an end-tag. For example,
   to create a line break, you use the `<BR>' tag. Additionally, the
   end-tags of some other elements, such as Paragraph (`</P>'), List
   Item (`</LI>'), Definition Term (`</DT>'), and Definition Description
   (`</DD>') elements, may be omitted.

   The content of an element is a sequence of data character strings and
   nested elements. Some elements, such as anchors, cannot be nested.
   Anchors and character highlighting may be put inside other
   constructs. See the HTML DTD, 9.1, "HTML DTD" for full details.

        NOTE - The SGML declaration for HTML specifies SHORTTAG YES,
        which means that there are other valid syntaxes for tags, such
        as NET tags, `<EM/.../'; empty start tags, `<>'; and empty
        end-tags, `</>'. Until support for these idioms is widely
        deployed, their use is strongly discouraged.

3.2.3. Names

   A name consists of a letter followed by letters, digits, periods, or
   hyphens. The length of a name is limited to 72 characters by the
   `NAMELEN' parameter in the SGML declaration for HTML, 9.5, "SGML
   Declaration for HTML". Element and attribute names are not case
   sensitive, but entity names are. For example, `<BLOCKQUOTE>',
   `<BlockQuote>', and `<blockquote>' are equivalent, whereas `&amp;' is
   different from `&AMP;'.

   In a start-tag, the element name must immediately follow the tag open
   delimiter `<'.

3.2.4. Attributes

   In a start-tag, white space and attributes are allowed between the
   element name and the closing delimiter. An attribute specification
   typically consists of an attribute name, an equal sign, and a value,
   though some attribute specifications may be just a name token. White
   space is allowed around the equal sign.

   The value of the attribute may be either:

      * A string literal, delimited by single quotes or double quotes
        and not containing any occurrences of the delimiting character.

           NOTE - Some historical implementations consider any
           occurrence of the `>' character to signal the end of a tag.
           For compatibility with such implementations, when `>' appears
           in an attribute value, it should be represented with a
           numeric character reference. For example, `<IMG SRC="eq1.jpg"
           alt="a>b">' should be written `<IMG SRC="eq1.jpg"
           alt="a&#62;b">' or `<IMG SRC="eq1.jpg" alt="a&gt;b">'.

      * A name token (a sequence of letters, digits, periods, or
        hyphens). Name tokens are not case sensitive.

           NOTE - Some historical implementations allow any character
           except space or `>' in a name token.

   In this example, <img> is the element name, src is the attribute
   name, and `http://host/dir/file.gif' is the attribute value:

        <img src='http://host/dir/file.gif'>

   A useful technique for computing an attribute value literal for a
   given string is to replace each quote and white space character by an
   entity reference or numeric character reference as follows:

                     ENTITY      NUMERIC
           CHARACTER REFERENCE   CHAR REF     CHARACTER DESCRIPTION
           --------- ----------  -----------  ---------------------
             HT                  &#9;         Tab
             LF                  &#10;        Line Feed
             CR                  &#13;        Carriage Return
             SP                  &#32;        Space
             "       &quot;      &#34;        Quotation mark
             &       &amp;       &#38;        Ampersand

   For example:

        <IMG SRC="image.jpg" alt="First &quot;real&quot; example">

   The `LITLEN' parameter in the SGML declaration (9.5, "SGML
   Declaration for HTML") limits the length of an attribute value to
   1024 characters.

   Attributes such as ISMAP and COMPACT may be written using a minimized
   syntax (see 7.9.1.2 ``Omitted Attribute Name'' in [SGML]). The
   markup:

        <UL COMPACT="compact">

   can be written using a minimized syntax:

        <UL COMPACT>

        NOTE - Some historical implementations only understand the
        minimized syntax.

3.2.5. Comments

   To include comments in an HTML document, use a comment declaration. A
   comment declaration consists of `<!' followed by zero or more
   comments followed by `>'. Each comment starts with `--' and includes
   all text up to and including the next occurrence of `--'. In a
   comment declaration, white space is allowed after each comment, but
   not before the first comment. The entire comment declaration is
   ignored.

        NOTE - Some historical HTML implementations incorrectly consider
        any `>' character to be the termination of a comment.
For example:

    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <HEAD>
    <TITLE>HTML Comment Example</TITLE>
    <!-- another -- -- comment -->
    <!>
    </HEAD>
    <BODY>
    <p> <!- not a comment, just regular old data characters ->
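
The comment declaration syntax described in 3.2.5 can be sketched
as a regular expression. This is only an illustration under stated
assumptions, not a conforming SGML parser: the function name
`strip_comment_declarations' is hypothetical, and the sketch
ignores context (for example, it would also match inside CDATA
content).

```python
import re

# A comment declaration is "<!" + zero or more comments + ">".
# Each comment is "--", text, and the next "--". White space may
# follow a comment but may not precede the first one.
_COMMENT_DECL = re.compile(r"<!(?:--(?:(?!--).)*--\s*)*>", re.DOTALL)


def strip_comment_declarations(text: str) -> str:
    """Remove every well-formed comment declaration from text.

    Hypothetical helper for illustration; it does not track whether
    the match occurs in a context where markup is recognized.
    """
    return _COMMENT_DECL.sub("", text)
```

Note that a `>' inside a comment does not end the declaration in
this sketch, unlike the historical implementations mentioned in
the NOTE above.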

3.3. HTML Public Text Identifiers

To identify information as an HTML document conforming to this
specification, each document must start with one of the following
document type declarations.

    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">

This document type declaration refers to the HTML DTD in 9.1,
"HTML DTD".

    NOTE - If the body of a `text/html' message entity does not
    begin with a document type declaration, an HTML user agent
    should infer the above document type declaration.

    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Level 2//EN">

This document type declaration also refers to the HTML DTD which
appears in 9.1, "HTML DTD".

    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Level 1//EN">

This document type declaration refers to the level 1 HTML DTD in
9.3, "Level 1 HTML DTD". Form elements must not occur in level 1
documents.

    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Strict//EN">
    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0 Strict Level 1//EN">

These two document type declarations refer to the HTML DTDs in
9.2, "Strict HTML DTD" and 9.4, "Strict Level 1 HTML DTD". They
refer to the more structurally rigid definition of HTML.

HTML user agents may support other document types. In particular,
they may support other formal public identifiers, or other
document types altogether. They may support an internal
declaration subset with supplemental entity, element, and other
markup declarations.

3.4. Example HTML Document

    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <HTML>
    <!-- Here's a good place to put a comment. -->
    <HEAD>
    <TITLE>Structural Example</TITLE>
    </HEAD><BODY>
    <H1>First Header</H1>
    <P>This is a paragraph in the example HTML file. Keep in mind
    that the title does not appear in the document text, but that
    the header (defined by H1) does.</P>
    <OL>
    <LI>First item in an ordered list.
    <LI>Second item in an ordered list.
      <UL COMPACT>
      <LI> Note that lists can be nested;
      <LI> Whitespace may be used to assist in reading the
           HTML source.
      </UL>
    <LI>Third item in an ordered list.
    </OL>
    <P>This is an additional paragraph. Technically, end tags are
    not required for paragraphs, although they are allowed. You can
    include character highlighting in a paragraph. <EM>This sentence
    of the paragraph is emphasized.</EM> Note that the &lt;/P&gt;
    end tag has been omitted.
    <P>
    <IMG SRC ="triangle.xbm" alt="Warning: ">
    Be sure to read these <b>bold instructions</b>.
    </BODY></HTML>

4. HTML as an Internet Media Type

An HTML user agent allows users to interact with resources which
have HTML representations. At a minimum, it must allow users to
examine and navigate the content of HTML level 1 documents. HTML
user agents should be able to preserve all formatting
distinctions represented in an HTML document, and be able to
simultaneously present resources referred to by IMG elements
(they may ignore some formatting distinctions or IMG resources at
the request of the user). Level 2 HTML user agents should support
form entry and submission.

4.1. text/html media type

This specification defines the Internet Media Type [IMEDIA]
(formerly referred to as the Content Type [MIME]) called
`text/html'. The following is to be registered with [IANA].

    Media Type name
            text

    Media subtype name
            html

    Required parameters
            none

    Optional parameters
            level, charset

    Encoding considerations
            any encoding is allowed

    Security considerations
            see 10, "Security Considerations"

The optional parameters are defined as follows:

    Level
            The level parameter specifies the feature set used in
            the document. The level is an integer number, implying
            that any features of same or lower level may be present
            in the document. Level 1 is all features defined in
            this specification except those that require the <FORM>
            element. Level 2 includes form processing. Level 2 is
            the default.

    Charset
            The charset parameter (as defined in section 7.1.1 of
            RFC 1521 [MIME]) may be given to specify the character
            encoding scheme used to represent the HTML document as
            a sequence of octets. The default value is outside the
            scope of this specification; but for example, the
            default is `US-ASCII' in the context of MIME mail, and
            `ISO-8859-1' in the context of HTTP.

4.2. HTML Document Representation

A message entity with a content type of `text/html' represents an
HTML document, consisting of a single text entity. The `charset'
parameter (whether implicit or explicit) identifies a character
encoding scheme. The text entity consists of the characters
determined by this character encoding scheme and the octets of
the body of the message entity.

4.2.1. Undeclared Markup Error Handling

To facilitate experimentation and interoperability between
implementations of various versions of HTML, the installed base
of HTML user agents supports a superset of the HTML 2.0 language
by reducing it to HTML 2.0: markup in the form of a start-tag or
end-tag whose generic identifier is not declared is mapped to
nothing during tokenization. Undeclared attributes are treated
similarly. The entire attribute specification of an unknown
attribute (i.e., the unknown attribute and its value, if any)
should be ignored. On the other hand, references to undeclared
entities should be treated as data characters.

For example:

    <div class=chapter><h1>foo</h1><p>...</div>
      => <H1>,"foo",</H1>,<P>,"..."
    xxx <P ID=z23> yyy
      => "xxx ",<P>," yyy
    Let &alpha; &amp; &beta; be finite sets.
      => "Let &alpha; & &beta; be finite sets."

Support for notifying the user of such errors is encouraged.

Information providers are warned that this convention is not
binding: unspecified behavior may result, as such markup does not
conform to this specification.

4.2.2. Conventional Representation of Newlines

SGML specifies that a text entity is a sequence of records, each
beginning with a record start character and ending with a record
end character (code positions 10 and 13 respectively) (section
7.6.1, ``Record Boundaries'' in [SGML]).

[MIME] specifies that a body of type `text/*' is a sequence of
lines, each terminated by CRLF, that is, octets 13, 10.

In practice, HTML documents are frequently represented and
transmitted using an end of line convention that depends on the
conventions of the source of the document; frequently, that
representation consists of CR only, LF only, or a CR LF sequence.
Hence the decoding of the octets will often result in a text
entity with some missing record start and record end characters.

Since there is no ambiguity, HTML user agents are encouraged to
infer the missing record start and end characters.

An HTML user agent should treat end of line in any of its
variations as a word space in all contexts except preformatted
text. Within preformatted text, an HTML user agent should treat
any of the three common representations of end-of-line as
starting a new line.

5. Document Structure

An HTML document is a tree of elements, including a head and
body, headings, paragraphs, lists, etc. Form elements are
discussed in 8, "Forms".

5.1. Document Element: HTML

The HTML document element consists of a head and a body, much
like a memo or a mail message. The head contains the title and
optional elements. The body is a text flow consisting of
paragraphs, lists, and other elements.
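
Returning briefly to the end-of-line convention of 4.2.2, the
treatment of CR, LF, and CR LF might be sketched as follows. The
function name `normalize_eol' and its `preformatted' flag are
assumptions made for illustration; a real user agent would apply
this during parsing rather than on a plain string.

```python
import re

# The three common end-of-line representations: CR LF, CR only, LF only.
# CR LF is listed first so that it is consumed as a single line ending.
_EOL = re.compile(r"\r\n|\r|\n")


def normalize_eol(text: str, preformatted: bool = False) -> str:
    """Map every end-of-line variant to one canonical character.

    Outside preformatted text an end of line acts as a word space;
    inside preformatted text it starts a new line.
    """
    return _EOL.sub("\n" if preformatted else " ", text)
```

For instance, `normalize_eol("a\r\nb")' yields `a b', while the
same call with `preformatted=True' keeps a single line break
between the two words.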
5.2. Head: HEAD

The head of an HTML document is an unordered collection of
information about the document. For example:

    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <HEAD>
    <TITLE>Introduction to HTML</TITLE>
    </HEAD>
    ...

5.2.1. Title: TITLE

Every HTML document must contain a <TITLE> element.

The title should identify the contents of the document in a
global context. A short title, such as ``Introduction'' may be
meaningless out of context. A title such as ``Introduction to
HTML Elements'' is more appropriate.

    NOTE - The length of a title is not limited; however, long
    titles may be truncated in some applications. To minimize
    this possibility, titles should be fewer than 64 characters.

A user agent may display the title of a document in a history
list or as a label for the window displaying the document. This
differs from headings (5.4, "Headings: H1 ... H6"), which are
typically displayed within the body text flow.

5.2.2. Base Address: BASE

The optional <BASE> element allows the address of a document to
be recorded in situations in which the document may be read out
of context. The required HREF attribute specifies the base URI
(see 7, "Hyperlinks") for navigating the document, overriding any
context otherwise known to the user agent. The value of the HREF
attribute must be an absolute URI.

5.2.3. Keyword Index: ISINDEX

The <ISINDEX> element indicates that the user agent should allow
the user to search an index by giving keywords. See 7.5, "Queries
and Indexes" for details.

5.2.4. Link: LINK

The <LINK> element represents a hyperlink (see 7, "Hyperlinks").
It has the same attributes as the <A> element (see 5.7.3,
"Anchor: A").

The <LINK> element is typically used to indicate authorship,
related indexes and glossaries, older or more recent versions,
style sheets, document hierarchy, etc.

5.2.5.
Associated Meta-information: META

The <META> element is an extensible container for use in
identifying specialized document meta-information.
Meta-information has two main functions:

    * to provide a means to discover that the data set exists and
      how it might be obtained or accessed; and

    * to document the content, quality, and features of a data
      set, indicating its fitness for use.

Each <META> element specifies a name/value pair. If multiple META
elements are provided with the same name, their combined
contents--concatenated as a comma-separated list--is the value
associated with that name.

    NOTE - The <META> element should not be used where a specific
    element, such as <TITLE>, would be more appropriate.

HTTP servers may read the content of the document <HEAD> to
generate header fields corresponding to any elements defining a
value for the attribute HTTP-EQUIV.

    NOTE - The method by which the server extracts document
    meta-information is unspecified and not mandatory. The <META>
    element only provides an extensible mechanism for identifying
    and embedding document meta-information -- how it may be used
    is up to the individual server implementation and the HTML
    user agent.

Attributes of the META element:

    HTTP-EQUIV
            binds the element to an HTTP header field. An HTTP
            server may use this information to process the
            document. In particular, it may include a header field
            in the responses to requests for this document: the
            header name is taken from the HTTP-EQUIV attribute
            value, and the header value is taken from the value of
            the CONTENT attribute. HTTP header names are not case
            sensitive.

    NAME
            specifies the name of the name/value pair. If not
            present, HTTP-EQUIV gives the name.

    CONTENT
            specifies the value of the name/value pair.

Examples

If the document contains:

    <META HTTP-EQUIV="Expires"
          CONTENT="Tue, 04 Dec 1993 21:29:02 GMT">
    <meta http-equiv="Keywords" CONTENT="Fred">
    <META HTTP-EQUIV="Reply-to"
          content="fielding@ics.uci.edu (Roy Fielding)">
    <Meta Http-equiv="Keywords" CONTENT="Barney">

then the server may include the following header fields:

    Expires: Tue, 04 Dec 1993 21:29:02 GMT
    Keywords: Fred, Barney
    Reply-to: fielding@ics.uci.edu (Roy Fielding)

as part of the HTTP response to a `GET' or `HEAD' request for
that document.

An HTTP server must not use the <META> element to form an HTTP
response header unless the HTTP-EQUIV attribute is present.

An HTTP server may disregard any <META> elements that specify
information controlled by the HTTP server, for example `Server',
`Date', and `Last-modified'.

5.2.6. Next Id: NEXTID

The <NEXTID> element is included for historical reasons only.
HTML documents should not contain <NEXTID> elements.

The <NEXTID> element gives a hint for the name to use for a new
<A> element when editing an HTML document. It should be distinct
from all NAME attribute values on <A> elements. For example:

    <NEXTID N=Z27>

5.3. Body: BODY

The <BODY> element contains the text flow of the document,
including headings, paragraphs, lists, etc.

For example:

    <BODY>
    <h1>Important Stuff</h1>
    <p>Explanation about important stuff...
    </BODY>

5.4. Headings: H1 ... H6

The six heading elements, <H1> through <H6>, denote section
headings. Although the order and occurrence of headings is not
constrained by the HTML DTD, documents should not skip levels
(for example, from H1 to H3), as converting such documents to
other representations is often problematic.

Example of use:

    <H1>This is a heading</H1>
    Here is some text
    <H2>Second level heading</H2>
    Here is some more text.

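
The recommendation against skipping heading levels can be checked
mechanically. The following is a minimal sketch, assuming a naive
regular-expression scan for heading start-tags rather than a real
parse; the name `skipped_heading_levels' is hypothetical, and the
scan would also count heading-like text inside comments or CDATA.

```python
import re

# Matches the start-tag of H1 through H6 in any letter case.
_HEADING = re.compile(r"<H([1-6])\b", re.IGNORECASE)


def skipped_heading_levels(html: str) -> list[tuple[int, int]]:
    """Return (previous, current) pairs where a heading skips a level.

    Moving to a shallower level (for example H3 back to H1) is not
    reported; only descents of more than one level are.
    """
    skips = []
    previous = None
    for match in _HEADING.finditer(html):
        level = int(match.group(1))
        if previous is not None and level > previous + 1:
            skips.append((previous, level))
        previous = level
    return skips
```

Applied to the example above the function returns an empty list,
whereas a document that jumps from <H1> to <H3> yields the pair
(1, 3).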
Typical renderings are:

    H1
            Bold, very-large font, centered. One or two blank
            lines above and below.

    H2
            Bold, large font, flush-left. One or two blank lines
            above and below.

    H3
            Italic, large font, slightly indented from the left
            margin. One or two blank lines above and below.

    H4
            Bold, normal font, indented more than H3. One blank
            line above and below.

    H5
            Italic, normal font, indented as H4. One blank line
            above.

    H6
            Bold, indented same as normal text, more than H5. One
            blank line above.

5.5. Block Structuring Elements

Block structuring elements include paragraphs, lists, and block
quotes. They must not contain heading elements, but they may
contain phrase markup, and in some cases, they may be nested.

5.5.1. Paragraph: P

The <P> element indicates a paragraph. The exact indentation,
leading space, etc. of a paragraph is not specified and may be a
function of other tags, style sheets, etc.

Typically, paragraphs are surrounded by a vertical space of one
line or half a line. The first line in a paragraph is indented in
some cases.

Example of use:

    <H1>This Heading Precedes the Paragraph</H1>
    <P>This is the text of the first paragraph.
    <P>This is the text of the second paragraph. Although you do not
    need to start paragraphs on new lines, maintaining this
    convention facilitates document maintenance.</P>
    <P>This is the text of a third paragraph.</P>

5.5.2. Preformatted Text: PRE

The <PRE> element represents a character cell block of text and
is suitable for text that has been formatted for a monospaced
font.

The <PRE> tag may be used with the optional WIDTH attribute. The
WIDTH attribute specifies the maximum number of characters for a
line and allows the HTML user agent to select a suitable font and
indentation.

Within preformatted text:

    * Line breaks within the text are rendered as a move to the
      beginning of the next line.

        NOTE - References to the ``beginning of a new line'' do
        not imply that the renderer is forbidden from using a
        constant left indent for rendering preformatted text. The
        left indent may be constrained by the width required.

    * Anchor elements and phrase markup may be used.

        NOTE - Constraints on the processing of <PRE> content may
        limit or prevent the ability of the HTML user agent to
        faithfully render phrase markup.

    * Elements that define paragraph formatting (headings,
      address, etc.) must not be used.

        NOTE - Some historical documents contain <P> tags in
        <PRE> elements. User agents are encouraged to treat this
        as a line break. A <P> tag followed by a newline
        character should produce only one line break, not a line
        break plus a blank line.

    * The horizontal tab character (code position 9 in the HTML
      document character set) must be interpreted as the smallest
      positive number of spaces which will leave the number of
      characters so far on the line as a multiple of 8. Documents
      should not contain tab characters, as they are not
      supported consistently.

Example of use:

    <PRE>
    Line 1.
           Line 2 is to the right of line 1.     <a href="abc">abc</a>
           Line 3 aligns with line 2.            <a href="def">def</a>
    </PRE>

5.5.2.1. Example and Listing: XMP, LISTING

The <XMP> and <LISTING> elements are similar to the <PRE>
element, but they have a different syntax. Their content is
declared as CDATA, which means that no markup except the end-tag
open delimiter-in-context is recognized (see 9.6 ``Delimiter
Recognition'' of [SGML]).

    NOTE - In a previous draft of the HTML specification, the
    syntax of <XMP> and <LISTING> elements allowed closing tags
    to be treated as data characters, as long as the tag name was
    not <XMP> or <LISTING>, respectively.

Since CDATA declared content has a number of unfortunate
interactions with processing techniques and tends to be used and
implemented inconsistently, HTML documents should not contain
<XMP> or <LISTING> elements -- the <PRE> tag is more expressive
and more consistently supported.

The <LISTING> element should be rendered so that at least 132
characters fit on a line. The <XMP> element should be rendered so
that at least 80 characters fit on a line but is otherwise
identical to the <LISTING> element.

    NOTE - In a previous draft, HTML included a <PLAINTEXT>
    element that is similar to the <LISTING> element, except that
    there is no closing tag: all characters after the <PLAINTEXT>
    start-tag are data.

5.5.3. Address: ADDRESS

The <ADDRESS> element contains such information as address,
signature and authorship, often at the beginning or end of the
body of a document.

Typically, the <ADDRESS> element is rendered in an italic
typeface and may be indented.

Example of use:

    <ADDRESS>
    Newsletter editor<BR>
    J.R. Brown<BR>
    JimquickPost News, Jimquick, CT 01234<BR>
    Tel (123) 456 7890
    </ADDRESS>

5.5.4. Block Quote: BLOCKQUOTE

The <BLOCKQUOTE> element contains text quoted from another
source.

A typical rendering might be a slight extra left and right
indent, and/or italic font. The <BLOCKQUOTE> typically provides
space above and below the quote.

Single-font rendition may reflect the quotation style of Internet
mail by putting a vertical line of graphic characters, such as
the greater than symbol (>), in the left margin.

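
A single-font rendition in the Internet mail style might look
like the sketch below. It assumes the quoted text has already
been extracted from the element as a plain string; the function
name `render_blockquote' and the default width of 72 columns are
illustrative choices, not part of this specification.

```python
import textwrap


def render_blockquote(text: str, width: int = 72) -> str:
    """Render quoted text with a column of '>' in the left margin.

    White space is collapsed and the text is re-wrapped so that
    each output line, including its '> ' prefix, fits in width.
    """
    prefix = "> "
    words = " ".join(text.split())
    lines = textwrap.wrap(words, width - len(prefix))
    return "\n".join(prefix + line for line in lines)
```

The blank space above and below the quote is left to the caller
in this sketch.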
Example of use:

    I think the poem ends
    <BLOCKQUOTE>
    <P>Soft you now, the fair Ophelia. Nymph, in thy orisons, be all
    my sins remembered.
    </BLOCKQUOTE>
    but I am not sure.

5.6. List Elements

HTML includes a number of list elements. They may be used in
combination; for example, an <OL> may be nested in an <LI>
element of a <UL>.

The COMPACT attribute suggests that a compact rendering be used.

5.6.1. Unordered List: UL, LI

The <UL> element represents a list of items -- typically a
bulleted list.

The content of a <UL> element is a sequence of <LI> elements.
For example:

    <UL>
    <LI>First list item
    <LI>Second list item
     <p>second paragraph of second item
    <LI>Third list item
    </UL>

5.6.2. Ordered List: OL

The <OL> element represents an ordered list of items, sorted by
sequence or order of importance. It is typically rendered as a
numbered list.

The content of an <OL> element is a sequence of <LI> elements.
For example:

    <OL>
    <LI>Click the Web button to open URI window.
    <LI>Enter the URI number in the text field of the Open URI
    window. The Web document you specified is displayed.
      <ol>
       <li>substep 1
       <li>substep 2
      </ol>
    <LI>Click highlighted text to move from one link to another.
    </OL>

5.6.3. Directory List: DIR

The <DIR> element is similar to the <UL> element. It represents a
list of short items, typically up to 20 characters each. Items in
a directory list may be arranged in columns, typically 24
characters wide.

The content of a <DIR> element is a sequence of <LI> elements.
Nested block elements are not allowed in the content of <DIR>
elements. For example:

    <DIR>
    <LI>A-H<LI>I-M
    <LI>M-R<LI>S-Z
    </DIR>

5.6.4. Menu List: MENU

The <MENU> element is a list of items with typically one line per
item.
The menu list style is typically more compact than the style of
an unordered list.

The content of a <MENU> element is a sequence of <LI> elements.
Nested block elements are not allowed in the content of <MENU>
elements. For example:

    <MENU>
    <LI>First item in the list.
    <LI>Second item in the list.
    <LI>Third item in the list.
    </MENU>

5.6.5. Definition List: DL, DT, DD

A definition list is a list of terms and corresponding
definitions. Definition lists are typically formatted with the
term flush-left and the definition, formatted paragraph style,
indented after the term.

The content of a <DL> element is a sequence of <DT> elements
and/or <DD> elements, usually in pairs. Multiple <DT> may be
paired with a single <DD> element. Documents should not contain
multiple consecutive <DD> elements.

Example of use:

    <DL>
    <DT>Term<DD>This is the definition of the first term.
    <DT>Term<DD>This is the definition of the second term.
    </DL>

If the DT term does not fit in the DT column (typically one third
of the display area), it may be extended across the page with the
DD section moved to the next line, or it may be wrapped onto
successive lines of the left hand column.

The optional COMPACT attribute suggests that a compact rendering
be used, because the list items are small and/or the entire list
is large.

Unless the COMPACT attribute is present, an HTML user agent may
leave white space between successive DT, DD pairs. The COMPACT
attribute may also reduce the width of the left-hand (DT) column.

    <DL COMPACT>
    <DT>Term<DD>This is the first definition in compact format.
    <DT>Term<DD>This is the second definition in compact format.
    </DL>

5.7. Phrase Markup

Phrases may be marked up according to idiomatic usage,
typographic appearance, or for use as hyperlink anchors.

User agents must render highlighted phrases distinctly from plain
text. Additionally, <EM> content must be rendered as distinct
from <STRONG> content, and <B> content must be rendered as
distinct from <I> content.

Phrase elements may be nested within the content of other phrase
elements; however, HTML user agents may render nested phrase
elements indistinctly from non-nested elements:

    plain <B>bold <I>italic</I></B> may be rendered
    the same as plain <B>bold </B><I>italic</I>

5.7.1. Idiomatic Elements

Phrases may be marked up to indicate certain idioms.

    NOTE - User agents may support the <DFN> element, not
    included in this specification, as it has been deployed to
    some extent. It is used to indicate the defining instance of
    a term, and it is typically rendered in italic or bold
    italic.

5.7.1.1. Citation: CITE

The <CITE> element is used to indicate the title of a book or
other citation. It is typically rendered as italics. For example:

    He just couldn't get enough of <cite>The Grapes of Wrath</cite>.

5.7.1.2. Code: CODE

The <CODE> element indicates an example of code, typically
rendered in a mono-spaced font. The <CODE> element is intended
for short words or phrases of code; the <PRE> block structuring
element (5.5.2, "Preformatted Text: PRE") is more appropriate for
multiple-line listings. For example:

    The expression <code>x += 1</code>
    is short for <code>x = x + 1</code>.

5.7.1.3. Emphasis: EM

The <EM> element indicates an emphasized phrase, typically
rendered as italics. For example:

    A singular subject <em>always</em> takes a singular verb.

5.7.1.4. Keyboard: KBD

The <KBD> element indicates text typed by a user, typically
rendered in a mono-spaced font. This is commonly used in
instruction manuals. For example:

    Enter <kbd>FIND IT</kbd> to search the database.

5.7.1.5.
Sample: SAMP

The <SAMP> element indicates a sequence of literal characters,
typically rendered in a mono-spaced font. For example:

    The only word containing the letters <samp>mt</samp> is dreamt.

5.7.1.6. Strong Emphasis: STRONG

The <STRONG> element indicates strong emphasis, typically
rendered in bold. For example:

    <strong>STOP</strong>, or I'll say "<strong>STOP</strong>" again!

5.7.1.7. Variable: VAR

The <VAR> element indicates a placeholder variable, typically
rendered as italic. For example:

    Type <SAMP>html-check <VAR>file</VAR> | more</SAMP>
    to check <VAR>file</VAR> for markup errors.

5.7.2. Typographic Elements

Typographic elements are used to specify the format of marked
text.

Typical renderings for idiomatic elements may vary between user
agents. If a specific rendering is necessary -- for example, when
referring to a specific text attribute as in ``The italic parts
are mandatory'' -- a typographic element can be used to ensure
that the intended typography is used where possible.

    NOTE - User agents may support some typographic elements not
    included in this specification, as they have been deployed to
    some extent. The <STRIKE> element indicates a horizontal line
    through the characters, and the <U> element indicates an
    underline.

5.7.2.1. Bold: B

The <B> element indicates bold text. Where bold typography is
unavailable, an alternative representation may be used.

5.7.2.2. Italic: I

The <I> element indicates italic text. Where italic typography is
unavailable, an alternative representation may be used.

5.7.2.3. Teletype: TT

The <TT> element indicates teletype (monospaced) text. Where a
teletype font is unavailable, an alternative representation may
be used.

5.7.3. Anchor: A

The <A> element indicates a hyperlink anchor (see 7,
"Hyperlinks").
At least one of the NAME and HREF attributes should be present.
Attributes of the <A> element:

    HREF
            gives the URI of the head anchor of a hyperlink.

    NAME
            gives the name of the anchor, and makes it available
            as a head of a hyperlink.

    TITLE
            suggests a title for the destination resource --
            advisory only. The TITLE attribute may be used:

            * for display prior to accessing the destination
              resource, for example, as a margin note or in a
              small box while the mouse is over the anchor, or
              while the document is being loaded;

            * for resources that do not include a title, such as
              graphics, plain text and Gopher menus, for use as a
              window title.

    REL
            The REL attribute gives the relationship(s) described
            by the hyperlink. The value is a whitespace separated
            list of relationship names.

    REV
            same as the REL attribute, but the semantics of the
            relationship are in the reverse direction. A link from
            A to B with REL=``X'' expresses the same relationship
            as a link from B to A with REV=``X''. An anchor may
            have both REL and REV attributes.

    URN
            specifies a preferred, more persistent identifier for
            the head anchor of the hyperlink. The syntax and
            semantics of the URN attribute are not yet specified.

    METHODS
            specifies methods to be used in accessing the
            destination, as a whitespace-separated list of names.
            The set of applicable names is a function of the
            scheme of the URI in the HREF attribute. For similar
            reasons as for the TITLE attribute, it may be useful
            to include the information in advance in the link.
            For example, the HTML user agent may choose a
            different rendering as a function of the methods
            allowed; for example, something that is searchable may
            get a different icon.

5.8.
Line Break: BR

The <BR> element specifies a line break between words (see 6,
"Characters, Words, and Paragraphs"). For example:

    <P> Pease porridge hot<BR>
    Pease porridge cold<BR>
    Pease porridge in the pot<BR>
    Nine days old.

5.9. Horizontal Rule: HR

The <HR> element is a divider between sections of text; typically
a full width horizontal rule or equivalent graphic. For example:

    <HR>
    <ADDRESS>February 8, 1995, CERN</ADDRESS>
    </BODY>

5.10. Image: IMG

The <IMG> element refers to an image or icon via a hyperlink (see
7.3, "Simultaneous Presentation of Image Resources").

HTML user agents may process the value of the ALT attribute as an
alternative to processing the image resource indicated by the SRC
attribute.

    NOTE - Some HTML user agents can process graphics linked via
    anchors, but not <IMG> graphics. If a graphic is essential,
    it should be referenced from an <A> element rather than an
    <IMG> element. If the graphic is not essential, then the
    <IMG> element is appropriate.

Attributes of the <IMG> element:

    ALIGN
            alignment of the image with respect to the text
            baseline.

            * `TOP' specifies that the top of the image aligns
              with the tallest item on the line containing the
              image.

            * `MIDDLE' specifies that the center of the image
              aligns with the baseline of the line containing the
              image.

            * `BOTTOM' specifies that the bottom of the image
              aligns with the baseline of the line containing the
              image.

    ALT
            text to use in place of the referenced image resource,
            for example due to processing constraints or user
            preference.

    ISMAP
            indicates an image map (see 7.6, "Image Maps").

    SRC
            specifies the URI of the image resource.

              NOTE - In practice, the media types of image
              resources are limited to a few raster graphic
              formats: typically `image/gif', `image/jpeg'. In
              particular, `text/html' resources are not intended
              to be used as image resources.

Examples of use:

    <IMG SRC="triangle.xbm" ALT="Warning:"> Be sure
    to read these instructions.

    <a href="http://machine/htbin/imagemap/sample">
    <IMG SRC="sample.xbm" ISMAP>
    </a>

6. Characters, Words, and Paragraphs

An HTML user agent should present the body of an HTML document as
a collection of typeset paragraphs and preformatted text. Except
for preformatted elements (<PRE>, <XMP>, <LISTING>, <TEXTAREA>),
each block structuring element is regarded as a paragraph by
taking the data characters in its content and the content of its
descendant elements, concatenating them, and splitting the result
into words, separated by space, tab, or record end characters
(and perhaps hyphen characters). The sequence of words is typeset
as a paragraph by breaking it into lines.

6.1. The HTML Document Character Set

The document character set specified in 9.5, "SGML Declaration
for HTML" must be supported by HTML user agents. It includes the
graphic characters of Latin Alphabet No. 1, or simply Latin-1.
Latin-1 comprises 191 graphic characters, including the alphabets
of most Western European languages.

    NOTE - Use of the non-breaking space and soft hyphen
    indicator characters is discouraged because support for them
    is not widely deployed.

    NOTE - To support non-western writing systems, a larger
    character repertoire will be specified in a future version of
    HTML. The document character set will be [ISO-10646], or some
    subset that agrees with [ISO-10646]; in particular, all
    numeric character references must use code positions assigned
    by [ISO-10646].

1694 In SGML applications, the use of control characters is limited 1695 in order to maximize the chance of successful interchange over 1696 heterogeneous networks and operating systems. In the HTML 1697 document character set only three control characters are 1698 allowed: Horizontal Tab, Carriage Return, and Line Feed (code 1699 positions 9, 13, and 10). 1701 The HTML DTD references the Added Latin 1 entity set, to allow 1702 mnemonic representation of selected Latin 1 characters using 1703 only the widely supported ASCII character repertoire. For 1704 example: 1706 Kurt Gödel was a famous logician and mathematician. 1708 See 9.7.2, "ISO Latin 1 Character Entity Set" for a table of the 1709 ``Added Latin 1'' entities, and 13, "The HTML Coded Character 1710 Set" for a table of the code positions of [ISO 8859-1] and the 1711 control characters in the HTML document character set. 1713 7. Hyperlinks 1715 In addition to general purpose elements such as paragraphs and 1716 lists, HTML documents can express hyperlinks. An HTML user agent 1717 allows the user to navigate these hyperlinks. 1719 A hyperlink is a relationship between two anchors, called the 1720 head and the tail of the hyperlink[DEXTER]. Each anchor is 1721 addressed, or uniquely identified, by an absolute Uniform 1722 Resource Identifier (URI), optionally followed by a '#' and a 1723 sequence of characters called a fragment identifier, as per 1724 [RELURL]. For example: 1726 http://www.w3.org/hypertext/WWW/TheProject.html 1727 http://www.w3.org/hypertext/WWW/TheProject.html#z31 1729 In an anchor address, the URI refers to a resource; it may be 1730 used in a variety of information retrieval protocols to obtain 1731 an entity that represents the resource, such as an HTML 1732 document. The fragment identifier, if present, refers to some 1733 view on, or portion of the resource. 
1735 An HTML user agent begins navigation with an absolute URI, 1736 called the base URI, and an HTML document that is a 1737 representation of the resource identified by the base URI. 1739 Each of the following markup constructs indicates the tail 1740 anchor of a hyperlink or set of hyperlinks: 1742 * <A> elements with HREF present. 1744 * <LINK> elements. 1746 * <IMG> elements. 1748 * <INPUT> elements with the SRC attribute present. 1750 * <ISINDEX> elements. 1752 * <FORM> elements with `METHOD=GET'. 1754 These markup constructs refer to head anchors either directly by 1755 means of an absolute URI, or indirectly by means of a relative 1756 URI, which must be combined with the base URI as in [RELURL] to 1757 determine the address of the head anchor. The markup may also 1758 include fragment identifiers, separated from the URI by a '#' 1759 character. 1761 7.1. Accessing Resources 1763 Once the address of the head anchor is determined, the user 1764 agent may obtain a representation of the resource, for example 1765 as in [URL]. 1767 For example, if the base URI is `http://host/x/y.html' and the 1768 document contains: 1770 <img src="../icons/abc.gif"> 1772 then the user agent uses the URI `http://host/icons/abc.gif' to 1773 access the resource linked from the <IMG> element. 1775 If the URI in the address of the head anchor is the same as the 1776 base URI, then the base document is sufficient as a 1777 representation of the resource. A user agent must _not_, for 1778 example, use any network information retrieval protocols to 1779 obtain a new representation of the resourse. 
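
    The resolution step and the same-document rule described above
    can be sketched in Python. This is an illustrative sketch only
    and not part of the specification: the helper names
    `head_anchor' and `needs_retrieval' are hypothetical, and the
    standard library's urljoin follows later generic URI resolution
    rules rather than [RELURL] itself, which is assumed here to give
    the same result for simple references such as these.

```python
from urllib.parse import urljoin, urldefrag


def head_anchor(base_uri, reference):
    """Return (uri, fragment) for a reference found in a document.

    The reference may be absolute or relative; a relative reference
    is combined with the base URI. The fragment is '' when absent.
    """
    absolute = urljoin(base_uri, reference)
    uri, fragment = urldefrag(absolute)
    return uri, fragment


def needs_retrieval(base_uri, reference):
    """Return False when the head anchor's URI equals the base URI.

    In that case the base document already represents the resource,
    so no network retrieval is needed (hypothetical helper).
    """
    uri, _ = head_anchor(base_uri, reference)
    return uri != urldefrag(base_uri)[0]


base = "http://host/x/y.html"
print(head_anchor(base, "../icons/abc.gif"))
print(needs_retrieval(base, "#xyz"))
print(needs_retrieval(base, "../icons/abc.gif"))
```

    With the base URI used in the text, the relative reference
    `../icons/abc.gif' resolves to `http://host/icons/abc.gif', and
    a bare fragment reference such as `#xyz' is reported as needing
    no retrieval.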
1781	    For example, if the base URI is
1782	    `http://www.w3.org/hypertext/WWW/TheProject.html', then each of
1783	    the following markup constructs indicates a link whose head and
1784	    tail anchors have the same URI in their address:

1786	        <a href="#xyz">
1787	        <a href="../WWW/TheProject.html">
1788	        <a href="./TheProject.html">
1789	        <a href="TheProject.html">
1790	        <a href="TheProject.html#z21">
1791	        <a href="../../hypertext/WWW/TheProject.html">
1792	        <a href="http://www.w3.org/hypertext/WWW/TheProject.html">

1794	7.2. Activation of Hyperlinks

1796	    An HTML user agent allows the user to navigate the content of
1797	    the document and request activation of hyperlinks denoted by <A>
1798	    elements. HTML user agents should also allow activation of
1799	    <LINK> element hyperlinks.

1801	    To activate a link, the user agent obtains a representation of
1802	    the resource identified in the address of the head anchor. If
1803	    the representation is another HTML document, navigation may
1804	    begin again with this new document. The base URI for navigation
1805	    is taken from the head anchor by default; however, any <BASE>
1806	    tag in the destination document overrides this default. The
1807	    process of obtaining the destination document may also override
1808	    the base URI, as in the case of an HTTP `URI:' header or
1809	    redirection transaction.

1811	7.3. Simultaneous Presentation of Image Resources

1813	    An HTML user agent may activate hyperlinks indicated by <IMG>
1814	    and <INPUT> elements concurrently with processing the document;
1815	    that is, image hyperlinks may be processed without explicit
1816	    request by the user. Image resources should be embedded in the
1817	    presentation at the point of the tail anchor, that is the <IMG>
1818	    or <INPUT> element.

1820	    <LINK> hyperlinks may also be processed without explicit user
1821	    request; for example, style sheet resources may be processed
1822	    before or during the processing of the document.

1824	7.4. Fragment Identifiers

1826	    Any characters following a `#' character in a hypertext address
1827	    constitute a fragment identifier. In particular, an address of
1828	    the form `#fragment' refers to an anchor in the same document.

1830	    The meaning of fragment identifiers depends on the media type of
1831	    the representation of the anchor's resource. For `text/html'
1832	    representations, it refers to the <A> element with a NAME
1833	    attribute whose value is the same as the fragment identifier.
1834	    The matching is case sensitive. The document should have exactly
1835	    one such element. The user agent should indicate the anchor
1836	    element, for example by scrolling to and/or highlighting the
1837	    phrase.

1839	    For example, if the base URI is `http://host/x/y.html' and the
1840	    user activated the link denoted by the following markup:

1842	        <p> See: <a href="app1.html#bananas">appendix 1</a>
1843	        for more detail on bananas.

1845	    Then the user agent accesses the resource identified by
1846	    `http://host/x/app1.html'. Assuming the resource is represented
1847	    using the `text/html' media type, the user agent must locate the
1848	    <A> element whose NAME attribute is `bananas' and begin
1849	    navigation there.

1851	7.5. Queries and Indexes

1853	    The <ISINDEX> element represents a set of hyperlinks. The user
1854	    can choose from the set by providing keywords to the user agent.
1855	    The user agent computes the head URI by appending `?' and the
1856	    keywords to the base URI. The keywords are escaped according to
1857	    [URL] and joined by `+'. For example, if a document contains:

1859	        <BASE HREF="http://host/index">
1860	        <ISINDEX>

1862	    and the user provides the keywords `apple' and `berry', then the
1863	    user agent must access the resource
1864	    `http://host/index?apple+berry'.

1866	    <FORM> elements with `METHOD=GET' also represent sets of
1867	    hyperlinks. See 8.2.2, "Query Forms: METHOD=GET" for details.

1869	7.6. Image Maps

1871	    If the ISMAP attribute is present on an <IMG> element, the <IMG>
1872	    element must be contained in an <A> element with an HREF
1873	    present. This construct represents a set of hyperlinks. The user
1874	    can choose from the set by choosing a pixel of the image. The
1875	    user agent computes the head URI by appending `?' and the x and
1876	    y coordinates of the pixel to the URI given in the <A> element.
1877	    For example, if a document contains:

1879	        <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
1880	        <head><title>ImageMap Example</title>
1881	        <BASE HREF="http://host/index"></head>
1882	        <body>
1883	        <p> Choose any of these icons:<br>
1884	        <a href="/cgi-bin/imagemap"><img ismap src="icons.gif"></a>

1886	    and the user chooses the upper-leftmost pixel, the chosen
1887	    hyperlink is the one with the URI
1888	    `http://host/cgi-bin/imagemap?0,0'.

1890	8. Forms

1892	    A form is a template for a form data set and an associated
1893	    method and action URI. A form data set is a sequence of
1894	    name/value pair fields. The names are specified on the NAME
1895	    attributes of form input elements, and the values are given
1896	    initial values by various forms of markup and edited by the
1897	    user. The resulting form data set is used to access an
1898	    information service as a function of the action and method.

1900	    Form elements can be mixed in with document structuring
1901	    elements. For example, a <PRE> element may contain a <FORM>
1902	    element, or a <FORM> element may contain lists which contain
1903	    <INPUT> elements. This gives considerable flexibility in
1904	    designing the layout of forms.

1906	    Form processing is a level 2 feature.

1908	8.1. Form Elements

1910	8.1.1. Form: FORM

1912	    The <FORM> element contains a sequence of input elements, along
1913	    with document structuring elements. The attributes are:

1915	    ACTION
1916	            specifies the action URI for the form. The action URI of
1917	            a form defaults to the base URI of the document (see 7,
1918	            "Hyperlinks").

1920	    METHOD
1921	            selects a method of accessing the action URI. The set of
1922	            applicable methods is a function of the scheme of the
1923	            action URI of the form. See 8.2.2, "Query Forms:
1924	            METHOD=GET" and 8.2.3, "Forms with Side-Effects:
1925	            METHOD=POST".

1927	    ENCTYPE
1928	            specifies the media type used to encode the name/value
1929	            pairs for transport, in case the protocol does not
1930	            itself impose a format. See 8.2.1, "The form-urlencoded
1931	            Media Type".
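
    As a rough illustration of how name/value pairs might be encoded
    for transport, the sketch below produces a form-urlencoded body
    in Python. It is not part of the specification. The helper name
    `form_urlencode' is hypothetical, the field values are borrowed
    from the sample submission shown later in this document, and the
    conversion of line breaks to CR LF pairs is an assumption
    inferred from that sample (where `abc\ndef' is sent as
    `abc%0D%0Adef').

```python
from urllib.parse import urlencode


def form_urlencode(fields):
    """Encode an ordered sequence of (name, value) pairs.

    Line breaks in values are first normalized to CR LF (an
    assumption taken from the sample message body), then each pair
    is escaped, written as name=value, and joined with '&'.
    """
    normalized = [
        (name, value.replace("\r\n", "\n").replace("\n", "\r\n"))
        for name, value in fields
    ]
    return urlencode(normalized)


# Field values borrowed from the sample submission in this document.
fields = [
    ("name", "John Doe"),
    ("gender", "male"),
    ("family", "5"),
    ("city", "kent"),
    ("city", "miami"),
    ("other", "abc\ndef"),
    ("nickname", "J&D"),
]
body = form_urlencode(fields)
print(body)
```

    A list of pairs is used instead of a dictionary so that repeated
    names such as `city' keep both their values and their order.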

1933	8.1.2. Input Field: INPUT

1935	    The <INPUT> element represents a field for user input. The TYPE
1936	    attribute discriminates between several variations of fields.

1938	    The <INPUT> element has a number of attributes. The set of
1939	    applicable attributes depends on the value of the TYPE
1940	    attribute.

1942	8.1.2.1. Text Field: INPUT TYPE=TEXT

1944	    The default value of the TYPE attribute is `TEXT', indicating a
1945	    single line text entry field. (Use the 

2165	    The content of the 
2273	    
2274	    Nickname: 
2275	    

Thank you for responding to this questionnaire. 2276

2277

2279 The initial state of the form data set is: 2281 name 2282 ``'' 2284 gender 2285 ``male'' 2287 family 2288 ``'' 2290 other 2291 ``'' 2293 nickname 2294 ``'' 2296 Note that the radio input has an initial value, while the 2297 checkbox has none. 2299 The user might edit the fields and request that the form be 2300 submitted. At that point, suppose the values are: 2302 name 2303 ``John Doe'' 2305 gender 2306 ``male'' 2308 family 2309 ``5'' 2311 city 2312 ``kent'' 2314 city 2315 ``miami'' 2317 other 2318 ``abc\ndef'' 2320 nickname 2321 ``J&D'' 2323 The user agent then conducts an HTTP POST transaction using the 2324 URI `http://www.w3.org/sample'. The message body would be 2325 (ignore the line break): 2327 name=John+Doe&gender=male&family=5&city=kent&city=miami& 2328 other=abc%0D%0Adef&nickname=J%26D 2330 9. HTML Public Text 2331 9.1. HTML DTD 2333 This is the Document Type Definition for the HyperText Markup 2334 Language, level 2. 2336 2348 2354 2355 ... 2356 2357 -- 2358 > 2360 2362 2371 2373 ]]> 2375 2384 2390 2396 2398 2403 2407 2409 2411 2413 2415 2417 %ISOlat1; 2419 2420 2421 2422 2424 2426 2443 2445 2447 2449 2451 2454 2456 2460 2462 2464 2465 2468 2471 2475 2476 2477 2479 2480 2481 2482 2483 2484 2485 2487 2489 ]]> 2490 2492 2493 2497 2499 2501 2503 2511 Heading 2514 is preferred to 2515

Heading

2516 --> 2517 ]]> 2519 2521 2522 " 2527 > 2528 2529 2530 2531 2532 2533 2534 2535 2537 2539 2540 #AttVal(Alt)" 2546 > 2548 2549 2550 2551 2552 2554 2556 2557 2561 2563 2565 2566 2570 2572 2573 2576 2579 2582 2585 2588 2592 2593 2594 2595 2596 2597 2599 2601 2603 ]]> 2605 2607 2609 ]]> 2611 2613 2617 2619 2620 2621 2626 2627 2629 2637 2638 2642 2647 2648 2650 2651 2653 2656 ]]> 2658 2660 2661 2667 2668 2672 2673 2677 2678 2679 2680 2682 2683 2687 2691 2692 2693 2694 2696 2697 Directory" 2701 > 2702 Menu" 2706 > 2708 2709 2710 2711 2713 2714 2718 2720 2722 Heading 2725

Text ... 2726 is preferred to 2727

Heading

2728 Text ... 2729 --> 2730 ]]> 2732 2735 2737 2739 2740 2744 2746 2747 2752 2753 2755 2758 Form:" 2763 %SDASUFF; "Form End." 2764 > 2766 2767 2768 2769 2771 2774 2775 2787 2788 2789 2790 2791 2792 2793 2794 2795 2797 2798 Select #AttVal(Multiple)" 2805 > 2807 2808 2809 2810 2812 2813 2821 2822 2823 2825 2826 2834 2835 2836 2837 2839 ]]> 2841 2843 2845 ]]> 2846 2848 2850 2852 2854 2855 2858 2860 2861 " > 2866 2867 2868 2869 2870 2871 2872 2874 2875 [Document is indexed/searchable.]"> 2879 2881 2882 2885 2886 2888 2889 2892 2893 2895 2896 2901 2902 2903 2904 2906 2908 2910 ]]> 2911 2912 2913 2915 2920 2922 9.2. Strict HTML DTD 2924 This document type declaration refers to the HTML DTD with the 2925 `HTML.Recommended' entity defined as `INCLUDE' rather than 2926 IGNORE; that is, it refers to the more structurally rigid 2927 definition of HTML. 2929 2940 2947 2948 ... 2949 2950 -- 2951 > 2953 2954 2956 2957 %html; 2959 9.3. Level 1 HTML DTD 2961 This document type declaration refers to the HTML DTD with the 2962 `HTML.Forms' entity defined as `IGNORE' rather than `INCLUDE'. 2964 Documents which contain
elements do not conform to this 2965 DTD, and must use the level 2 DTD. 2967 2978 2985 2986 ... 2987 2988 -- 2989 > 2991 2992 2994 2995 %html; 2997 9.4. Strict Level 1 HTML DTD 2999 This document type declaration refers to the level 1 HTML DTD 3000 with the `HTML.Recommended' entity defined as `INCLUDE' rather 3001 than IGNORE; that is, it refers to the more structurally rigid 3002 definition of HTML. 3004 3015 3021 3022 ... 3023 3024 -- 3025 > 3027 3028 3030 3031 %html-1; 3033 9.5. SGML Declaration for HTML 3035 This is the SGML Declaration for HyperText Markup Language. 3037 3117 3124 9.6. Sample SGML Open Entity Catalog for HTML 3126 The SGML standard describes an ``entity manager'' as the portion 3127 or component of an SGML system that maps SGML entities into the 3128 actual storage model (e.g., the file system). The standard 3129 itself does not define a particular mapping methodology or 3130 notation. 3132 To assist the interoperability among various SGML tools and 3133 systems, the SGML Open consortium has passed a technical 3134 resolution that defines a format for an application- independent 3135 entity catalog that maps external identifiers and/or entity 3136 names to file names. 3138 Each entry in the catalog associates a storage object identifier 3139 (such as a file name) with information about the external entity 3140 that appears in the SGML document. In addition to entries that 3141 associate public identifiers, a catalog entry can associate an 3142 entity name with a storage object identifier. 
For example, the 3143 following are possible catalog entries: 3145 -- catalog: SGML Open style entity catalog for HTML -- 3146 -- $Id: catalog,v 1.2 1994/11/30 23:45:18 connolly Exp $ -- 3148 -- Ways to refer to Level 2: most general to most specific -- 3149 PUBLIC "-//IETF//DTD HTML//EN" html.dtd 3150 PUBLIC "-//IETF//DTD HTML 2.0//EN" html.dtd 3151 PUBLIC "-//IETF//DTD HTML Level 2//EN" html.dtd 3152 PUBLIC "-//IETF//DTD HTML 2.0 Level 2//EN" html.dtd 3154 -- Ways to refer to Level 1: most general to most specific -- 3155 PUBLIC "-//IETF//DTD HTML Level 1//EN" html-1.dtd 3156 PUBLIC "-//IETF//DTD HTML 2.0 Level 1//EN" html-1.dtd 3158 -- Ways to refer to Level 0: most general to most specific -- 3159 PUBLIC "-//IETF//DTD HTML Level 0//EN" html-0.dtd 3160 PUBLIC "-//IETF//DTD HTML 2.0 Level 0//EN" html-0.dtd 3162 -- Ways to refer to Strict Level 2: most general to most specif\ 3163 c -- 3164 PUBLIC "-//IETF//DTD HTML Strict//EN" html-s.dtd 3165 PUBLIC "-//IETF//DTD HTML 2.0 Strict//EN" html-s.dtd 3166 PUBLIC "-//IETF//DTD HTML Strict Level 2//EN" html-s.dtd 3167 PUBLIC "-//IETF//DTD HTML 2.0 Strict Level 2//EN" html-s.dtd 3169 -- Ways to refer to Strict Level 1: most general to most specif\ 3170 c -- 3171 PUBLIC "-//IETF//DTD HTML Strict Level 1//EN" html-1s.dtd 3172 PUBLIC "-//IETF//DTD HTML 2.0 Strict Level 1//EN" html-1s.dtd 3173 -- Ways to refer to Strict Level 0: most general to most specif\ 3174 c -- 3175 PUBLIC "-//IETF//DTD HTML Strict Level 0//EN" html-0s.dtd 3176 PUBLIC "-//IETF//DTD HTML 2.0 Strict Level 0//EN" html-0s.dtd 3178 -- ISO latin 1 entity set for HTML -- 3179 PUBLIC "ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML" ISOlat1\ 3180 sgml 3182 9.7. Character Entity Sets 3184 The HTML DTD defines the following entities. They represent 3185 particular graphic characters which have special meanings in 3186 places in the markup, or may not be part of the character set 3187 available to the writer. 3189 9.7.1. 
Numeric and Special Graphic Entity Set 3191 The following table lists each of the characters included from 3192 the Numeric and Special Graphic entity set, along with its name, 3193 syntax for use, and description. This list is derived from `ISO 3194 Standard 8879:1986//ENTITIES Numeric and Special Graphic//EN'. 3195 However, HTML does not include for the entire entity set -- only 3196 the entities listed below are included. 3198 GLYPH NAME SYNTAX DESCRIPTION 3199 < lt < Less than sign 3200 > gt > Greater than sign 3201 & amp & Ampersand 3202 " quot " Double quote sign 3204 9.7.2. ISO Latin 1 Character Entity Set 3206 The following public text lists each of the characters specified 3207 in the Added Latin 1 entity set, along with its name, syntax for 3208 use, and description. This list is derived from ISO Standard 3209 8879:1986//ENTITIES Added Latin 1//EN. HTML includes the entire 3210 entity set. 3212 3217 3222 3224 3225 3226 3227 3228 3229 3230 3231 3232 3233 3234 3235 3236 3237 3238 3239 3240 3241 3242 3243 3244 3245 3246 3247 3248 3249 3250 3251 3252 3253 3254 3255 3256 3257 3258 3259 3260 3261 3262 3263 3264 3265 3266 3267 3268 3269 3270 3271 3272 3273 3274 3275 3276 3277 3278 3280 3281 3282 3283 3284 3285 3286 3288 10. Security Considerations 3290 Anchors, embedded images, and all other elements which contain 3291 URIs as parameters may cause the URI to be dereferenced in 3292 response to user input. In this case, the security 3293 considerations of [URL] apply. 3295 The widely deployed methods for submitting forms requests -- 3296 HTTP and SMTP -- provide little assurance of confidentiality. 3297 Information providers who request sensitive information via 3298 forms -- especially by way of the `PASSWORD' type input field 3299 (see 8.1.2, "Input Field: INPUT") -- should be aware and make 3300 their users aware of the lack of confidentiality. 3302 11. References 3304 [URI] 3305 T. Berners-Lee. 
``Universal Resource Identifiers in WWW: 3306 A Unifying Syntax for the Expression of Names and 3307 Addresses of Objects on the Network as used in the 3308 World- Wide Web.'' RFC 1630, CERN, June 1994. 3309 3311 [URL] 3312 T. Berners-Lee, L. Masinter, and M. McCahill. ``Uniform 3313 Resource Locators (URL).'' RFC 1738, CERN, Xerox PARC, 3314 University of Minnesota, October 1994. 3315 3317 [HTTP] 3318 T. Berners-Lee, R. T. Fielding, and H. Frystyk Nielsen. 3319 ``Hypertext Transfer Protocol - HTTP/1.0.'' Work in 3320 Progress, MIT, UC Irvine, CERN, March 1995. 3321 3323 [MIME] 3324 N. Borenstein and N. Freed. ``MIME (Multipurpose 3325 Internet Mail Extensions) Part One: Mechanisms for 3326 Specifying and Describing the Format of Internet Message 3327 Bodies.'' RFC 1521, Bellcore, Innosoft, September 1993. 3328 3330 [RELURL] 3331 R. Fielding. ``Relative Uniform Resource Locators.'' RFC 3332 1808, June 1995 3333 3335 [GOLD90] 3336 C. F. Goldfarb. ``The SGML Handbook.'' Y. Rubinsky, Ed., 3337 Oxford University Press, 1990. 3339 [DEXTER] 3340 Frank Halasz and Mayer Schwartz, ``The Dexter Hypertext 3341 Reference Model'', ``Communications of the ACM'', pp. 3342 30-39, vol. 37 no. 2, Feb 1994, 3344 [IMEDIA] 3345 J. Postel. ``Media Type Registration Procedure.'', 3346 USC/ISI, March 1994. 3347 3349 [IANA] 3350 J. Reynolds and J. Postel. ``Assigned Numbers.'' STD 2, 3351 RFC 1700, USC/ISI, October 1994. 3352 3354 [SQ91] 3355 SoftQuad. ``The SGML Primer.'' 3rd ed., SoftQuad Inc., 3356 1991. 3358 [ISO-646] 3359 ISO/IEC 646:1991 Information technology -- ISO 7-bit 3360 coded character set for information interchange 3361 3363 [ISO-10646] 3364 ISO/IEC 10646-1:1993 Information technology -- Universal 3365 Multiple-Octet Coded Character Set (UCS) -- Part 1: 3366 Architecture and Basic Multilingual Plane 3367 3369 [ISO-8859-1] 3370 ISO 8859. International Standard -- Information 3371 Processing -- 8-bit Single-Byte Coded Graphic Character 3372 Sets -- Part 1: Latin Alphabet No. 
1, ISO 8859-1:1987. 3373 3375 [SGML] 3376 ISO 8879. Information Processing -- Text and Office 3377 Systems - Standard Generalized Markup Language (SGML), 3378 1986. 3380 12. Acknowledgments 3382 The HTML document type was designed by Tim Berners-Lee at CERN 3383 as part of the 1990 World Wide Web project. In 1992, Dan 3384 Connolly wrote the HTML Document Type Definition (DTD) and a 3385 brief HTML specification. 3387 Since 1993, a wide variety of Internet participants have 3388 contributed to the evolution of HTML, which has included the 3389 addition of in-line images introduced by the NCSA Mosaic 3390 software for WWW. Dave Raggett played an important role in 3391 deriving the FORMS material from the HTML+ specification. 3393 Dan Connolly and Karen Olson Muldrow rewrote the HTML 3394 Specification in 1994. The document was then edited by the HTML 3395 working group as a whole, with updates being made by Eric 3396 Schieler, Mike Knezovich, and Eric W. Sink at Spyglass, Inc. 3397 Finally, Roy Fielding restructured the entire draft into its 3398 current form. 3400 Special thanks to the many active participants in the HTML 3401 working group, too numerous to list individually, without whom 3402 there would be no standards process and no standard. That this 3403 document approaches its objective of carefully converging a 3404 description of current practice and formalization of HTML's 3405 relationship to SGML is a tribute to their effort. 3407 12.1. Authors' Addresses 3409 Tim Berners-Lee 3411 Director, W3 Consortium 3412 MIT Laboratory for Computer Science 3413 545 Technology Square 3414 Cambridge, MA 02139, U.S.A. 3415 Tel: +1 (617) 253 9670 3416 Fax: +1 (617) 258 8682 3417 Email: timbl@w3.org 3419 Daniel W. Connolly 3421 Research Technical Staff, W3 Consortium 3422 MIT Laboratory for Computer Science 3423 545 Technology Square 3424 Cambridge, MA 02139, U.S.A. 
3425 Fax: +1 (617) 258 8682 3426 Email: connolly@w3.org 3427 URI: http://www.w3.org/hypertext/WWW/People/Connolly/ 3429 13. The HTML Coded Character Set 3431 This list details the code positions and characters of the HTML 3432 document character set, specified in 9.5, "SGML Declaration for 3433 HTML". This coded character set is based on [ISO-8859-1]. 3435 REFERENCE DESCRIPTION 3436 -------------- ----------- 3437 � -  Unused 3438 Horizontal tab 3439 Line feed 3440 - Unused 3441 Carriage Return 3442  -  Unused 3443 Space 3444 ! Exclamation mark 3445 " Quotation mark 3446 # Number sign 3447 $ Dollar sign 3448 % Percent sign 3449 & Ampersand 3450 ' Apostrophe 3451 ( Left parenthesis 3452 ) Right parenthesis 3453 * Asterisk 3454 + Plus sign 3455 , Comma 3456 - Hyphen 3457 . Period (fullstop) 3458 / Solidus (slash) 3459 0 - 9 Digits 0-9 3460 : Colon 3461 ; Semi-colon 3462 < Less than 3463 = Equals sign 3464 > Greater than 3465 ? Question mark 3466 @ Commercial at 3467 A - Z Letters A-Z 3468 [ Left square bracket 3469 \ Reverse solidus (backslash) 3470 ] Right square bracket 3471 ^ Caret 3472 _ Horizontal bar (underscore) 3473 ` Acute accent 3474 a - z Letters a-z 3475 { Left curly brace 3476 | Vertical bar 3477 } Right curly brace 3478 ~ Tilde 3479  - Ÿ Unused 3480   Non-breaking Space 3481 ¡ Inverted exclamation 3482 ¢ Cent sign 3483 £ Pound sterling 3484 ¤ General currency sign 3485 ¥ Yen sign 3486 ¦ Broken vertical bar 3487 § Section sign 3488 ¨ Umlaut (dieresis) 3489 © Copyright 3490 ª Feminine ordinal 3491 « Left angle quote, guillemotleft 3492 ¬ Not sign 3493 ­ Soft hyphen 3494 ® Registered trademark 3495 ¯ Macron accent 3496 ° Degree sign 3497 ± Plus or minus 3498 ² Superscript two 3499 ³ Superscript three 3500 ´ Acute accent 3501 µ Micro sign 3502 ¶ Paragraph sign 3503 · Middle dot 3504 ¸ Cedilla 3505 ¹ Superscript one 3506 º Masculine ordinal 3507 » Right angle quote, guillemotright 3508 ¼ Fraction one-fourth 3509 ½ Fraction one-half 3510 ¾ Fraction 
three-fourths 3511 ¿ Inverted question mark 3512 À Capital A, grave accent 3513 Á Capital A, acute accent 3514 Â Capital A, circumflex accent 3515 Ã Capital A, tilde 3516 Ä Capital A, dieresis or umlaut mark 3517 Å Capital A, ring 3518 Æ Capital AE dipthong (ligature) 3519 Ç Capital C, cedilla 3520 È Capital E, grave accent 3521 É Capital E, acute accent 3522 Ê Capital E, circumflex accent 3523 Ë Capital E, dieresis or umlaut mark 3524 Ì Capital I, grave accent 3525 Í Capital I, acute accent 3526 Î Capital I, circumflex accent 3527 Ï Capital I, dieresis or umlaut mark 3528 Ð Capital Eth, Icelandic 3529 Ñ Capital N, tilde 3530 Ò Capital O, grave accent 3531 Ó Capital O, acute accent 3532 Ô Capital O, circumflex accent 3533 Õ Capital O, tilde 3534 Ö Capital O, dieresis or umlaut mark 3535 × Multiply sign 3536 Ø Capital O, slash 3537 Ù Capital U, grave accent 3538 Ú Capital U, acute accent 3539 Û Capital U, circumflex accent 3540 Ü Capital U, dieresis or umlaut mark 3541 Ý Capital Y, acute accent 3542 Þ Capital THORN, Icelandic 3543 ß Small sharp s, German (sz ligature) 3544 à Small a, grave accent 3545 á Small a, acute accent 3546 â Small a, circumflex accent 3547 ã Small a, tilde 3548 ä Small a, dieresis or umlaut mark 3549 å Small a, ring 3550 æ Small ae dipthong (ligature) 3551 ç Small c, cedilla 3552 è Small e, grave accent 3553 é Small e, acute accent 3554 ê Small e, circumflex accent 3555 ë Small e, dieresis or umlaut mark 3556 ì Small i, grave accent 3557 í Small i, acute accent 3558 î Small i, circumflex accent 3559 ï Small i, dieresis or umlaut mark 3560 ð Small eth, Icelandic 3561 ñ Small n, tilde 3562 ò Small o, grave accent 3563 ó Small o, acute accent 3564 ô Small o, circumflex accent 3565 õ Small o, tilde 3566 ö Small o, dieresis or umlaut mark 3567 ÷ Division sign 3568 ø Small o, slash 3569 ù Small u, grave accent 3570 ú Small u, acute accent 3571 û Small u, circumflex accent 3572 ü Small u, dieresis or umlaut mark 3573 ý Small y, acute accent 3574 þ 
Small thorn, Icelandic 3575 ÿ Small y, dieresis or umlaut mark 3577 14. Proposed Entities 3579 The HTML DTD references the ``Added Latin 1'' entity set, which 3580 only supplies named entities for a subset of the non-ASCII 3581 characters in [ISO-8859-1], namely the accented characters. The 3582 following entities should be supported so that all ISO 8859-1 3583 characters may only be referenced symbolically. The names for 3584 these entities are taken from the appendixes of [SGML]. 3586 3587 3588 3589 3590 3591 3592 3593 3594 3595 3596 3597 3598 3599 3600 3601 3602 3603 3604 3605 3606 3607 3608 3609 3610 3611 3612 3613 3614 3615 3616 3617 3618 3619 3620 3621 3622 3623 3624 3625 3626 3627 3628 3629 3630 3631 3632 3633 3634 3635 3636 3637 3638 3639 3640 3641 3642 3643 3644 3645 3646 3647 3648 3649 3650 3651 3652 3653 3654 3655 3656 3657 3658 3659 3660 3661 3662 3663 3664 3665 3666 3667 3668 3669 3670 3671 3672 3673 3674 3675 3676 3677 3678 3679 3680 3681