idnits 2.17.1 draft-ietf-html-spec-04.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in this document. Expected boilerplate is as follows today (2024-04-18) according to https://trustee.ietf.org/license-info : IETF Trust Legal Provisions of 28-dec-2009, Section 6.a: This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2: Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3: This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** Missing expiration date. The document expiration date should appear on the first and last page. ** The document seems to lack a 1id_guidelines paragraph about Internet-Drafts being working documents. ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document seems to lack a 1id_guidelines paragraph about the list of current Internet-Drafts. 
** The document seems to lack a 1id_guidelines paragraph about the list of Shadow Directories. == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 1 longer page, the longest (page 1) being 59 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There are 19 instances of too long lines in the document, the longest one being 18 characters in excess of 72. == There are 1 instance of lines with non-RFC2606-compliant FQDNs in the document. == There are 2 instances of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. Miscellaneous warnings: ---------------------------------------------------------------------------- -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (June 16, 1995) is 10534 days in the past. Is this intentional? -- Found something which looks like a code comment -- if you have code sections in the document, please surround them with '' and '' lines. 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'ISO 8859-1' is mentioned on line 1711, but not defined == Unused Reference: 'URI' is defined on line 3279, but no explicit reference was found in the text == Unused Reference: 'HTTP' is defined on line 3292, but no explicit reference was found in the text == Unused Reference: 'GOLD90' is defined on line 3310, but no explicit reference was found in the text == Unused Reference: 'SQ91' is defined on line 3329, but no explicit reference was found in the text ** Downref: Normative reference to an Informational RFC: RFC 1630 (ref. 'URI') ** Obsolete normative reference: RFC 1738 (ref. 'URL') (Obsoleted by RFC 4248, RFC 4266) -- Possible downref: Non-RFC (?) normative reference: ref. 'HTTP' ** Obsolete normative reference: RFC 1521 (ref. 'MIME') (Obsoleted by RFC 2045, RFC 2046, RFC 2047, RFC 2048, RFC 2049) -- Possible downref: Non-RFC (?) normative reference: ref. 'RELURL' -- Possible downref: Non-RFC (?) normative reference: ref. 'GOLD90' -- Possible downref: Non-RFC (?) normative reference: ref. 'DEXTER' ** Obsolete normative reference: RFC 1590 (ref. 'IMEDIA') (Obsoleted by RFC 2045, RFC 2046, RFC 2047, RFC 2048, RFC 2049) ** Obsolete normative reference: RFC 1700 (ref. 'IANA') (Obsoleted by RFC 3232) -- Possible downref: Non-RFC (?) normative reference: ref. 'SQ91' -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO-646' -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO-10646' -- Possible downref: Non-RFC (?) normative reference: ref. 'ISO-8859-1' -- Possible downref: Non-RFC (?) normative reference: ref. 'SGML' Summary: 14 errors (**), 0 flaws (~~), 9 warnings (==), 12 comments (--). Run idnits with the --verbose option for more detailed information about the items above. 
-------------------------------------------------------------------------------- 2 HTML Working Group T. Berners-Lee 3 INTERNET-DRAFT MIT/W3C 4 D. Connolly 5 Expires: In six months June 16, 1995 7 Hypertext Markup Language - 2.0 9 Status of this Memo 11 This document is an Internet-Draft. Internet-Drafts are working 12 documents of the Internet Engineering Task Force (IETF), its areas, and 13 its working groups. Note that other groups may also distribute working 14 documents as Internet-Drafts. 16 Internet-Drafts are draft documents valid for a maximum of six months 17 and may be updated, replaced, or obsoleted by other documents at any 18 time. It is inappropriate to use Internet-Drafts as reference material 19 or to cite them other than as ``work in progress.'' 21 To learn the current status of any Internet-Draft, please check the 22 1id-abstracts.txt listing contained in the Internet-Drafts Shadow 23 Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), 24 munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or 25 ftp.isi.edu (US West Coast). 27 Distribution of this document is unlimited. Please send comments to the 28 HTML working group (HTML-WG) of the Internet Engineering Task Force 29 (IETF) at . Discussions of the group are archived at 30 . 32 ABSTRACT 34 The Hypertext Markup Language (HTML) is a simple markup language 35 used to create hypertext documents that are platform 36 independent. HTML documents are SGML documents with generic 37 semantics that are appropriate for representing information from 38 a wide range of domains. HTML markup can represent hypertext 39 news, mail, documentation, and hypermedia; menus of options; 40 database query results; simple structured documents with 41 in-lined graphics; and hypertext views of existing bodies of 42 information. 44 HTML has been in use by the World Wide Web (WWW) global 45 information initiative since 1990. 
This specification roughly 46 corresponds to the capabilities of HTML in common use prior to 47 June 1994. HTML is an application of ISO Standard 8879:1986 48 Information Processing Text and Office Systems; Standard 49 Generalized Markup Language (SGML). 51 The `text/html' Internet Media Type (RFC 1590) and MIME Content 52 Type (RFC 1521) is defined by this specification. 54 CONTENTS 56 1 Introduction .......................................... 3 57 1.1 Scope ................................................. 3 58 1.2 Conformance ........................................... 3 59 2 Terms ................................................. 5 60 3 HTML as an Application of SGML ........................ 9 61 3.1 SGML Documents ........................................ 9 62 3.2 HTML Lexical Syntax .................................. 11 63 3.3 HTML Public Text Identifiers ......................... 15 64 3.4 Example HTML Document ................................ 16 65 4 HTML as an Internet Media Type ....................... 16 66 4.1 text/html media type ................................. 16 67 4.2 HTML Document Representation ......................... 17 68 5 Document Structure ................................... 18 69 5.1 Document Element: HTML ............................... 19 70 5.2 Head: HEAD ........................................... 19 71 5.3 Body: BODY ........................................... 22 72 5.4 Headings: H1 ... H6 .................................. 22 73 5.5 Block Structuring Elements ........................... 23 74 5.6 List Elements ........................................ 25 75 5.7 Phrase Markup ........................................ 28 76 5.8 Line Break: BR ....................................... 31 77 5.9 Horizontal Rule: HR .................................. 31 78 5.10 Image: IMG ........................................... 31 79 6 Characters, Words, and Paragraphs .................... 33 80 6.1 The HTML Document Character Set ...................... 
33 81 7 Hyperlinks ........................................... 34 82 7.1 Accessing Resources .................................. 34 83 7.2 Activation of Hyperlinks ............................. 34 84 7.3 Simultaneous Presentation of Image Resources ......... 35 85 7.4 Fragment Identifiers ................................. 35 86 7.5 Queries and Indexes .................................. 35 87 7.6 Image Maps ........................................... 36 88 8 Forms ................................................ 36 89 8.1 Form Elements ........................................ 37 90 8.2 Form Submission ...................................... 42 91 9 HTML Public Text ..................................... 45 92 9.1 HTML DTD ............................................. 45 93 9.2 Strict HTML DTD ...................................... 56 94 9.3 Level 1 HTML DTD ..................................... 57 95 9.4 Strict Level 1 HTML DTD .............................. 58 96 9.5 SGML Declaration for HTML ............................ 58 97 9.6 Sample SGML Open Entity Catalog for HTML ............. 60 98 9.7 Character Entity Sets ................................ 61 99 10 Security Considerations .............................. 63 100 11 References ........................................... 64 101 12 Acknowledgments ...................................... 65 102 12.1 Authors' Addresses ................................... 66 103 13 The HTML Coded Character Set ......................... 66 104 14 Proposed Entities .................................... 69 106 1. Introduction 108 The HyperText Markup Language (HTML) is a simple data format 109 used to create hypertext documents that are portable from one 110 platform to another. HTML documents are SGML documents with 111 generic semantics that are appropriate for representing 112 information from a wide range of domains. 114 As HTML is an application of SGML, this specification assumes a 115 working knowledge of [SGML]. 117 1.1. 
Scope 119 HTML has been in use by the World-Wide Web (WWW) global 120 information initiative since 1990. This specification 121 corresponds to the capabilities of HTML in common use prior to 122 June 1994 and is referred to as ``HTML 2.0''. 124 HTML is an application of ISO Standard 8879:1986 _Information 125 Processing Text and Office Systems; Standard Generalized Markup 126 Language_ (SGML). The HTML Document Type Definition (DTD) is a 127 formal definition of the HTML syntax in terms of SGML. 129 This specification also defines HTML as an Internet Media 130 Type[IMEDIA] and MIME Content Type[MIME] called `text/html'. As 131 such, it defines the semantics of the HTML syntax and how that 132 syntax should be interpreted by user agents. 134 1.2. Conformance 136 This specification governs the syntax of HTML documents and 137 aspects of the behavior of HTML user agents. 139 1.2.1. Documents 141 A document is a conforming HTML document if: 143 * It is a conforming SGML document, and it conforms to the 144 HTML DTD (see 9.1, "HTML DTD"). 146 NOTE - There are a number of syntactic idioms that 147 are not supported or are supported inconsistently in 148 some historical user agent implementations. These 149 idioms are identified in notes like this throughout 150 this specification. 152 * It conforms to the application conventions in this 153 specification. For example, the value of the HREF attribute 154 of the <A> element must conform to the URI syntax. 156 * Its document character set includes [ISO-8859-1] and 157 agrees with [ISO-10646]; that is, each code position listed 158 in 13, "The HTML Coded Character Set" is included, and each 159 code position in the document character set is mapped to the 160 same character as [ISO-10646] designates for that code 161 position. 163 NOTE - The document character set is somewhat 164 independent of the character encoding scheme used to 165 represent a document.
For example, the `ISO-2022-JP' 166 character encoding scheme can be used for HTML 167 documents, since its repertoire is a subset of the 168 [ISO-10646] repertoire. The critical distinction is 169 that numeric character references agree with 170 [ISO-10646] regardless of how the document is 171 encoded. 173 1.2.2. Feature Test Entities 175 The HTML DTD defines a standard HTML document type and several 176 variations, by way of feature test entities. Feature test 177 entities are declarations in the HTML DTD that control the 178 inclusion or exclusion of portions of the DTD. 180 HTML.Recommended 181 Certain features of the language are necessary for 182 compatibility with widespread usage, but they may 183 compromise the structural integrity of a document. This 184 feature test entity selects a more prescriptive document 185 type definition that eliminates those features. It is 186 set to `IGNORE' by default. 188 For example, in order to preserve the structure of a 189 document, an editing user agent may translate HTML 190 documents to the recommended subset, or it may require 191 that the documents be in the recommended subset for 192 import. 194 HTML.Deprecated 195 Certain features of the language are necessary for 196 compatibility with earlier versions of the 197 specification, but they tend to be used and implemented 198 inconsistently, and their use is deprecated. This 199 feature test entity enables a document type definition 200 that allows these features. It is set to `INCLUDE' by 201 default. 203 Documents generated by translation software or editing 204 software should not contain deprecated idioms. 206 1.2.3. User Agents 208 An HTML user agent conforms to this specification if: 210 * It parses the characters of an HTML document into data 211 characters and markup according to [SGML]. 213 NOTE - In the interest of robustness and 214 extensibility, there are a number of widely deployed 215 conventions for handling non-conforming documents. 
216 See 4.2.1, "Undeclared Markup Error Handling" for 217 details. 219 * It supports the `ISO-8859-1' character encoding scheme and 220 processes each character in the ISO Latin Alphabet No. 1 as 221 specified in 6.1, "The HTML Document Character Set". 223 NOTE - To support non-western writing systems, HTML 224 user agents are encouraged to support 225 `ISO-10646-UCS-2' or similar character encoding 226 schemes and as much of the character repertoire of 227 [ISO-10646] as is practical. 229 * It behaves identically for documents whose parsed token 230 sequences are identical. 232 For example, comments and the whitespace in tags disappear 233 during tokenization, and hence they do not influence the 234 behavior of conforming user agents. 236 * It allows the user to traverse (or at least attempt to 237 traverse, resources permitting) all hyperlinks from 238 <A> elements in an HTML document. 240 An HTML user agent is a level 2 user agent if, additionally: 242 * It allows the user to express all form field values 243 specified in an HTML document and to (attempt to) submit the 244 values as requests to information services. 246 2. Terms 248 absolute URI 249 a URI in absolute form, as per [URL] 251 anchor 252 one of two ends of a hyperlink; typically, a phrase 253 marked as an <A> element. 255 base URI 256 URI used as the base of an HTML document for the purpose 257 of resolving hyperlink destinations. 259 character 260 An atom of information, for example a letter or a digit. 261 Graphic characters have associated glyphs, whereas 262 control characters have associated processing semantics. 264 character encoding 265 scheme 266 A function whose domain is the set of sequences of 267 octets, and whose range is the set of sequences of 268 characters from a character repertoire; that is, a 269 sequence of octets and a character encoding scheme 270 determine a sequence of characters. 272 character repertoire 273 A finite set of characters; e.g.
the range of a coded 274 character set. 276 code position 277 An integer. A coded character set and a code position 278 from its domain determine a character. 280 coded character set 281 A function whose domain is a subset of the integers and 282 whose range is a character repertoire. That is, for some 283 set of integers (usually of the form {0, 1, 2, ..., N} 284 ), a coded character set and an integer in that set 285 determine a character. Conversely, a character and a 286 coded character set determine the character's code 287 position (or, in rare cases, a few code positions). 289 conforming HTML user 290 agent 291 A user agent that conforms to this specification in its 292 processing of the Internet Media Type `text/html'. 294 data character 295 Characters other than markup, which make up the content 296 of elements. 298 document character set 299 a coded character set whose range includes all 300 characters used in a document. Every SGML document has 301 exactly one document character set. Numeric character 302 references are resolved via the document character set. 304 DTD 305 document type definition. Rules that apply SGML to the 306 markup of documents of a particular type, including a 307 set of element and entity declarations. [SGML] 309 element 310 A component of the hierarchical structure defined by a 311 document type definition; it is identified in a document 312 instance by descriptive markup, usually a start-tag and 313 end-tag. [SGML] 315 end-tag 316 Descriptive markup that identifies the end of an 317 element. [SGML] 319 entity 320 data with an associated notation or interpretation; for 321 example, a sequence of octets associated with an 322 Internet Media Type. [SGML] 324 fragment identifier 325 the portion of an HREF attribute value following the `#' 326 character which modifies the presentation of the 327 destination of a hyperlink. 
329 form data set 330 a sequence of name/value pairs; the names are given by 331 an HTML document and the values are given by a user. 333 HTML document 334 An SGML document conforming to this document type 335 definition. 337 hyperlink 338 a relationship between two anchors, called the tail and 339 the head. 341 markup 342 Syntactically delimited characters added to the data of 343 a document to represent its structure. There are four 344 different kinds of markup: descriptive markup (tags), 345 references, markup declarations, and processing 346 instructions. [SGML] 348 may 349 A document or user interface is conforming whether this 350 statement applies or not. 352 media type 353 an Internet Media Type, as per [IMEDIA]. 355 message entity 356 a head and body. The head is a collection of name/value 357 fields, and the body is a sequence of octets. The head 358 defines the content type and content transfer encoding 359 of the body. [MIME] 361 minimally conforming 362 HTML user agent 363 A user agent that conforms to this specification except 364 for form processing. It may only process level 1 HTML 365 documents. 367 must 368 Documents or user agents in conflict with this statement 369 are not conforming. 371 numeric character 372 reference 373 markup that refers to a character by its code position 374 in the document character set. 376 SGML document 377 A sequence of characters organized physically as a set 378 of entities and logically into a hierarchy of elements. 379 An SGML document consists of data characters and markup; 380 the markup describes the structure of the information 381 and an instance of that structure. [SGML] 383 shall 384 If a document or user agent conflicts with this 385 statement, it does not conform to this specification. 387 should 388 If a document or user agent conflicts with this 389 statement, undesirable results may occur in practice 390 even though it conforms to this specification. 
392 start-tag 393 Descriptive markup that identifies the start of an 394 element and specifies its generic identifier and 395 attributes. [SGML] 397 syntax-reference 398 character set 399 A coded character set whose range includes all 400 characters used for markup; e.g. name characters and 401 delimiter characters. 403 tag 404 Markup that delimits an element. A tag includes a name 405 which refers to an element declaration in the DTD, and 406 may include attributes. [SGML] 408 text entity 409 A finite sequence of characters. A text entity typically 410 takes the form of a sequence of octets with some 411 associated character encoding scheme, transmitted over 412 the network or stored in a file. [SGML] 414 typical 415 Typical processing is described for many elements. This 416 is not a mandatory part of the specification but is 417 given as guidance for designers and to help explain the 418 uses for which the elements were intended. 420 URI 421 A Universal Resource Identifier is a formatted string 422 that serves as an identifier for a resource, typically 423 on the Internet. URIs are used in HTML to identify the 424 destination of hyperlinks. URIs in common practice 425 include Uniform Resource Locators (URLs)[URL] and 426 Relative URLs [RELURL]. 428 user agent 429 A component of a distributed system that presents an 430 interface and processes requests on behalf of a user; 431 for example, a www browser or a mail user agent. 433 WWW 434 The World-Wide Web is a hypertext-based, distributed 435 information system created by researchers at CERN in 436 Switzerland. 438 3. HTML as an Application of SGML 440 HTML is an application of ISO 8879:1986 -- Standard Generalized 441 Markup Language (SGML). SGML is a system for defining structured 442 document types and markup languages to represent instances of 443 those document types[SGML]. The public text -- DTD and SGML 444 declaration -- of the HTML document type definition are provided 445 in 9, "HTML Public Text". 
447 The term _HTML_ refers to both the document type defined here 448 and the markup language for representing instances of this 449 document type. 451 3.1. SGML Documents 453 An HTML document is an SGML document; that is, a sequence of 454 characters organized physically into a set of entities, and 455 logically as a hierarchy of elements. 457 In the SGML specification, the first production of the SGML 458 syntax grammar separates an SGML document into three parts: an 459 SGML declaration, a prologue, and an instance. For the purposes 460 of this specification, the prologue is a DTD. This DTD describes 461 another grammar: the start symbol is given in the doctype 462 declaration, the terminals are data characters and tags, and the 463 productions are determined by the element declarations. The 464 instance must conform to the DTD, that is, it must be in the 465 language defined by this grammar. 467 The SGML declaration determines the lexicon of the grammar. It 468 specifies the document character set, which determines a 469 character repertoire that contains all characters that occur in 470 all text entities in the document, and the code positions 471 associated with those characters. 473 The SGML declaration also specifies the syntax-reference 474 character set of the document, and a few other parameters that 475 bind the abstract syntax of SGML to a concrete syntax. This 476 concrete syntax determines how the sequence of characters of the 477 document is mapped to a sequence of terminals in the grammar of 478 the prologue. 480 For example, consider the following document: 482 <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> 483 <title>Parsing Example</title> 484

<p>Some text. <em>&#42;wow&#42;</em></p>

486 An HTML user agent should use the SGML declaration that is given 487 in 9.5, "SGML Declaration for HTML". According to its document 488 character set, `&#42;' refers to an asterisk character, `*'. 490 The instance above is regarded as the following sequence of 491 terminals:
493 1. start-tag: TITLE
495 2. data characters: ``Parsing Example''
497 3. end-tag: TITLE
499 4. start-tag: P
501 5. data characters: ``Some text. ''
503 6. start-tag: EM
505 7. data characters: ``*wow*''
507 8. end-tag: EM
509 9. end-tag: P
511 The start symbol of the DTD grammar is HTML, and the productions 512 are given in the public text identified by `-//IETF//DTD HTML 513 2.0//EN' (9.1, "HTML DTD"). The terminals above parse as:
515 HTML
516  |
517  \-HEAD
518  |  |
519  |  \-TITLE
520  |      |
521  |      \-<TITLE>
522  |      |
523  |      \-"Parsing Example"
524  |      |
525  |      \-</TITLE>
526  |
527  \-BODY
528    |
529    \-P
530      |
531      \-<P>
532      |
533      \-"Some text. "
534      |
535      \-EM
536      |  |
537      |  \-<EM>
538      |  |
539      |  \-"*wow*"
540      |  |
541      |  \-</EM>
542      |
543      \-</P>

545 Some of the elements are delimited explicitly by tags, while the 546 boundaries of others are inferred. The <HTML> element contains a 547 <HEAD> element and a <BODY> element. The <HEAD> contains 548 <TITLE>, which is explicitly delimited by start- and end-tags. 550 3.2. HTML Lexical Syntax 552 SGML specifies an abstract syntax and a reference concrete 553 syntax. Aside from certain quantities and capacities (e.g. the 554 limit on the length of a name), all HTML documents use the 555 reference concrete syntax. In particular, all markup characters 556 are in the repertoire of [ISO-646]. Data characters are drawn 557 from the document character set (see 6, "Characters, Words, and 558 Paragraphs"). 560 A complete discussion of SGML parsing, e.g. the mapping of a 561 sequence of characters to a sequence of tags and data, is left 562 to the SGML standard[SGML]. This section is only a summary. 564 3.2.1. Data Characters 566 Any sequence of characters that do not constitute markup (see 567 9.6 ``Delimiter Recognition'' of [SGML]) are mapped directly to 568 strings of data characters. Some markup also maps to data 569 character strings. Numeric character references map to 570 single-character strings, via the document character set. Each 571 reference to one of the general entities defined in the HTML DTD 572 maps to a single-character string. 574 For example, 576 abc&lt;def => "abc","<","def" 577 abc&#60;def => "abc","<","def" 579 The terminating semicolon on entity or numeric character 580 references is only necessary when the character following the 581 reference would otherwise be recognized as part of the name (see 582 9.4.5 ``Reference End'' in [SGML]).
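The mapping from a character sequence to data character strings described in this section can be sketched in Python as follows. This is an illustrative sketch, not part of the specification: the function name `data_strings`, the four-entry `ENTITIES` table (a small subset of the general entities in the HTML DTD), and the choice to leave unknown entity names as literal text are all assumptions. Numeric references are resolved with `chr`, on the assumption that the document character set agrees with [ISO-10646].

```python
import re

# Hypothetical subset of the general entities defined in the HTML DTD.
ENTITIES = {"lt": "<", "gt": ">", "amp": "&", "quot": '"'}

# "&" counts as markup only when followed by a letter, or by "#" and a digit.
# The terminating semicolon is optional.
REFERENCE = re.compile(r"&(?:#([0-9]+)|([A-Za-z][A-Za-z0-9.-]*));?")


def data_strings(text):
    """Split text into data character strings, resolving references."""
    strings = []
    pos = 0
    for match in REFERENCE.finditer(text):
        number, name = match.groups()
        if number is not None:
            char = chr(int(number))   # numeric character reference
        elif name in ENTITIES:
            char = ENTITIES[name]     # entity reference
        else:
            continue                  # unknown name: keep as literal text (assumption)
        if match.start() > pos:
            strings.append(text[pos:match.start()])
        strings.append(char)
        pos = match.end()
    if pos < len(text):
        strings.append(text[pos:])
    return strings
```

Under these assumptions, `data_strings("abc&ltdef")` leaves the input untouched, because without the semicolon the following letters are read as part of the entity name `ltdef`.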
584 abc &lt def => "abc ","<"," def" 585 abc &#60 def => "abc ","<"," def" 587 An ampersand is only recognized as markup when it is followed by 588 a letter or a `#' and a digit: 590 abc & lt def => "abc & lt def" 591 abc &# 60 def => "abc &# 60 def" 593 A useful technique for translating plain text to HTML is to 594 replace each '<', '&', and '>' by an entity reference or numeric 595 character reference as follows:
597                  ENTITY      NUMERIC
598        CHARACTER REFERENCE   CHAR REF     CHARACTER DESCRIPTION
599        --------- ----------  -----------  ---------------------
600          &       &amp;       &#38;        Ampersand
601          <       &lt;        &#60;        Less than
602          >       &gt;        &#62;        Greater than
604 NOTE - There are SGML mechanisms, CDATA and RCDATA 605 declared content, that allow most `<', `>', and `&' 606 characters to be entered without the use of entity 607 references. Because these mechanisms tend to be used and 608 implemented inconsistently, and because they conflict 609 with techniques for reducing HTML to 7 bit ASCII for 610 transport, they are deprecated in this version of HTML. 611 See 5.5.2.1, "Example and Listing: XMP, LISTING". 613 3.2.2. Tags 615 Tags delimit elements such as headings, paragraphs, lists, 616 character highlighting, and links. Most HTML elements are 617 identified in a document as a start-tag, which gives the element 618 name and attributes, followed by the content, followed by the 619 end tag. Start-tags are delimited by `<' and `>'; end tags are 620 delimited by `</' and `>'. An example is: 622 <H1>This is a Heading</H1> 624 Some elements only have a start-tag without an end-tag. For 625 example, to create a line break, you use the `<BR>' tag. 626 Additionally, the end tags of some other elements, such as 627 Paragraph (`</P>'), List Item (`</LI>'), Definition Term 628 (`</DT>'), and Definition Description (`</DD>') elements, may be 629 omitted. 631 The content of an element is a sequence of data character 632 strings and nested elements. Some elements, such as anchors, 633 cannot be nested.
Anchors and character highlighting may be put 634 inside other constructs. See the HTML DTD, 9.1, "HTML DTD" for 635 full details. 637 NOTE - The SGML declaration for HTML specifies SHORTTAG 638 YES, which means that there are other valid syntaxes for 639 tags, such as NET tags, `<EM/.../'; empty start tags, 640 `<>'; and empty end-tags, `</>'. Until support for these 641 idioms is widely deployed, their use is strongly 642 discouraged. 644 3.2.3. Names 646 A name consists of a letter followed by letters, digits, 647 periods, or hyphens. The length of a name is limited to 72 648 characters by the `NAMELEN' parameter in the SGML declaration 649 for HTML, 9.5, "SGML Declaration for HTML". Element and 650 attribute names are not case sensitive, but entity names are. 651 For example, `<BLOCKQUOTE>', `<BlockQuote>', and `<blockquote>' 652 are equivalent, whereas `&amp;' is different from `&AMP;'. 654 In a start-tag, the element name must immediately follow the tag 655 open delimiter `<'. 657 3.2.4. Attributes 659 In a start-tag, white space and attributes are allowed between 660 the element name and the closing delimiter. An attribute 661 specification typically consists of an attribute name, an equal 662 sign, and a value, though some attribute specifications may be 663 just a name token. White space is allowed around the equal sign. 665 The value of the attribute may be either: 667 * A string literal, delimited by single quotes or double 668 quotes and not containing any occurrences of the delimiting 669 character. 671 NOTE - Some historical implementations consider any 672 occurrence of the `>' character to signal the end of 673 a tag. For compatibility with such implementations, 674 when `>' appears in an attribute value, it should be 675 represented with a numeric character reference. For 676 example, `<IMG SRC="eq1.jpg" alt="a>b">' should be 677 written `<IMG SRC="eq1.jpg" alt="a&#62;b">' or `<IMG 678 SRC="eq1.jpg" alt="a&gt;b">'.
680 * A name token (a sequence of letters, digits, periods, or 681 hyphens). Name tokens are not case sensitive. 683 NOTE - Some historical implementations allow any 684 character except space or `>' in a name token. 686 In this example, img is the element name, src is the attribute 687 name, and `http://host/dir/file.gif' is the attribute value: 689 <img src='http://host/dir/file.gif'> 691 A useful technique for computing an attribute value literal for 692 a given string is to replace each quote and white space 693 character by an entity reference or numeric character reference 694 as follows:
696                  ENTITY      NUMERIC
697        CHARACTER REFERENCE   CHAR REF     CHARACTER DESCRIPTION
698        --------- ----------  -----------  ---------------------
699          HT                  &#9;         Tab
700          LF                  &#10;        Line Feed
701          CR                  &#13;        Carriage Return
702          SP                  &#32;        Space
703          "       &quot;      &#34;        Quotation mark
704          &       &amp;       &#38;        Ampersand
706 For example: 708 <IMG SRC="image.jpg" alt="First &quot;real&quot; example"> 710 The `LITLEN' parameter in the SGML declaration (9.5, "SGML 711 Declaration for HTML") limits the length of an attribute value 712 to 1024 characters. 714 Attributes such as ISMAP and COMPACT may be written using a 715 minimized syntax (see 7.9.1.2 ``Omitted Attribute Name'' in 716 [SGML]). The markup: 718 <UL COMPACT="compact"> 720 can be written using a minimized syntax: 722 <UL COMPACT> 724 NOTE - Some historical implementations only understand 725 the minimized syntax. 727 3.2.5. Comments 729 To include comments in an HTML document, use a comment 730 declaration. A comment declaration consists of `<!' followed by 731 zero or more comments followed by `>'. Each comment starts with 732 `--' and includes all text up to and including the next 733 occurrence of `--'. In a comment declaration, white space is 734 allowed after each comment, but not before the first comment. 735 The entire comment declaration is ignored. 737 NOTE - Some historical HTML implementations incorrectly 738 consider any `>' character to be the termination of a 739 comment.
741 For example: 743 <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> 744 <HEAD> 745 <TITLE>HTML Comment Example 746 747 748 749 750 751

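   The comment declaration syntax in 3.2.5 can be checked
   mechanically. The following is a minimal sketch in Python, not
   part of this specification; the function name
   `is_comment_declaration' is hypothetical, and Python's `\s' class
   is assumed to be an acceptable stand-in for SGML white space.

```python
import re

# "<!" followed by zero or more comments, then ">".
# Each comment is "--", text containing no "--", then "--";
# white space may follow a comment but may not precede the first one.
_COMMENT_DECL = re.compile(r"<!(?:--(?:(?!--).)*--\s*)*>", re.DOTALL)


def is_comment_declaration(text):
    """Return True if text is exactly one comment declaration."""
    return _COMMENT_DECL.fullmatch(text) is not None
```

   Under this sketch a `>' inside a comment does not end the
   declaration, which is the behaviour the NOTE in 3.2.5 asks for,
   and `<!>' is accepted as a declaration with zero comments.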
753 3.3. HTML Public Text Identifiers 755 To identify information as an HTML document conforming to this 756 specification, each document should start with one of the 757 following document type declarations. 759 761 This document type declaration refers to the HTML DTD in 9.1, 762 "HTML DTD". 764 NOTE - If the body of a `text/html' message entity does 765 not begin with a document type declaration, an HTML user 766 agent should infer the above document type declaration. 768 770 This document type declaration also refers to the HTML DTD which 771 appears in 9.1, "HTML DTD". 773 775 This document type declaration refers to the level 1 HTML DTD in 776 9.3, "Level 1 HTML DTD". Form elements must not occur in level 1 777 documents. 779 780 782 These two document type declarations refer to the HTML DTD in 783 9.2, "Strict HTML DTD" and 9.4, "Strict Level 1 HTML DTD". They 784 refer to the more structurally rigid definition of HTML. 786 HTML user agents may support other document types. In 787 particular, they may support other formal public identifiers, or 788 other document types altogether. They may support an internal 789 declaration subset with supplemental entity, element, and other 790 markup declarations. 792 3.4. Example HTML Document 794 795 796 797 798 Structural Example 799 800

First Header

801

This is a paragraph in the example HTML file. Keep in mind 802 that the title does not appear in the document text, but that 803 the header (defined by H1) does.

804
    805
  1. First item in an ordered list. 806
  2. Second item in an ordered list. 807
      808
    • Note that lists can be nested; 809
    • Whitespace may be used to assist in reading the 810 HTML source. 811
    812
  3. Third item in an ordered list. 813
814

This is an additional paragraph. Technically, end tags are 815 not required for paragraphs, although they are allowed. You can 816 include character highlighting in a paragraph. This sentence 817 of the paragraph is emphasized. Note that the </P> 818 end tag has been omitted. 819

820 Warning: 821 Be sure to read these bold instructions. 822 824 4. HTML as an Internet Media Type 826 An HTML user agent allows users to interact with resources which 827 have HTML representations. At a minimum, it must allow users to 828 examine and navigate the content of HTML level 1 documents. HTML 829 user agents should be able to preserve all formatting 830 distinctions represented in an HTML document, and be able to 831 simultaneously present resources referred to by IMG elements 832 (they may ignore some formatting distinctions or IMG resources 833 at the request of the user). Level 2 HTML user agents should 834 support form entry and submission. 836 4.1. text/html media type 838 This specification defines the Internet Media Type[IMEDIA] 839 (formerly referred to as the Content Type[MIME]) called 840 `text/html'. The following is to be registered with [IANA]. 842 Media Type name 843 text 845 Media subtype name 846 html 848 Required parameters 849 none 851 Optional parameters 852 level, charset 854 Encoding considerations 855 any encoding is allowed 857 Security considerations 858 see 10, "Security Considerations" 860 The optional parameters are defined as follows: 862 Level 863 The level parameter specifies the feature set used in 864 the document. The level is an integer number, implying 865 that any features of same or lower level may be present 866 in the document. Level 1 is all features defined in this 867 specification except those that require the

868 element. Level 2 includes form processing. Level 2 is 869 the default. 871 Charset 872 The charset parameter (as defined in section 7.1.1 of 873 RFC 1521[MIME]) may be given to specify the character 874 encoding scheme used to represent the HTML document as a 875 sequence of octets. The default value is outside the 876 scope of this specification; but for example, the 877 default is `US-ASCII' in the context of MIME mail, and 878 `ISO-8859-1' in the context of HTTP. 880 4.2. HTML Document Representation 882 A message entity with a content type of `text/html' represents 883 an HTML document, consisting of a single text entity. The 884 `charset' parameter (whether implicit or explicit) identifies a 885 character encoding scheme. The text entity consists of the 886 characters determined by this character encoding scheme and the 887 octets of the body of the message entity. 889 4.2.1. Undeclared Markup Error Handling 891 To facilitate experimentation and interoperability between 892 implementations of various versions of HTML, the installed base 893 of HTML user agents supports a superset of the HTML 2.0 language 894 by reducing it to HTML 2.0: markup in the form of a start-tag or 895 end-tag, whose generic identifier is not declared is mapped to 896 nothing during tokenization. Undeclared attributes are treated 897 similarly. The entire attribute specification of an unknown 898 attribute (i.e., the unknown attribute and its value, if any) 899 should be ignored. On the other hand, references to undeclared 900 entities should be treated as data characters. 902 For example: 904

foo

...

905 =>

,"foo",

,

,"..." 906 xxx

yyy 907 => "xxx ",

," yyy 908 Let α & β be finite sets. 909 => "Let α & β be finite sets." 911 Support for notifying the user of such errors is encouraged. 913 Information providers are warned that this convention is not 914 binding: unspecified behavior may result, as such markup does 915 not conform to this specification. 917 4.2.2. Conventional Representation of Newlines 919 SGML specifies that a text entity is a sequence of records, each 920 beginning with a record start character and ending with a record 921 end character (code positions 10 and 13 respectively) (section 922 7.6.1, ``Record Boundaries'' in [SGML]). 924 [MIME] specifies that a body of type `text/*' is a sequence of 925 lines, each terminated by CRLF, that is, octets 13, 10. 927 In practice, HTML documents are frequently represented and 928 transmitted using an end of line convention that depends on the 929 conventions of the source of the document; frequently, that 930 representation consists of CR only, LF only, or a CR LF 931 sequence. Hence the decoding of the octets will often result in 932 a text entity with some missing record start and record end 933 characters. 935 Since there is no ambiguity, HTML user agents are encouraged to 936 infer the missing record start and end characters. 938 An HTML user agent should treat end of line in any of its 939 variations as a word space in all contexts except preformatted 940 text. Within preformatted text, an HTML user agent should treat 941 any of the three common representations of end-of-line as 942 starting a new line. 944 5. Document Structure 946 An HTML document is a tree of elements, including a head and 947 body, headings, paragraphs, lists, etc. Form elements are 948 discussed in 8, "Forms". 950 5.1. Document Element: HTML 952 The HTML document element consists of a head and a body, much 953 like a memo or a mail message. The head contains the title and 954 optional elements. The body is a text flow consisting of 955 paragraphs, lists, and other elements. 
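   Looking back at the end-of-line conventions of 4.2.2, the
   inference of record boundaries can be sketched as follows. This
   is an illustration only, written in Python; the names
   `split_records' and `eol_to_word_space' are hypothetical, and
   the input is assumed to be text that has already been decoded
   from octets.

```python
import re

# CR LF must be tried first so that it counts as one end of line,
# not as a CR followed by a separate LF.
_EOL = re.compile(r"\r\n|\r|\n")


def split_records(text):
    """Split text into records on CR LF, bare CR, or bare LF."""
    return _EOL.split(text)


def eol_to_word_space(text):
    """Treat each end of line as a word space (outside preformatted text)."""
    return _EOL.sub(" ", text)
```

   Within preformatted text a user agent would use the first
   function and start a new line per record; elsewhere the second
   function reflects the word-space treatment.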
957 5.2. Head: HEAD 959 The head of an HTML document is an unordered collection of 960 information about the document. For example: 962 963 964 Introduction to HTML 965 966 ... 968 5.2.1. Title: TITLE 970 Every HTML document must contain a element. 972 The title should identify the contents of the document in a 973 global context. A short title, such as ``Introduction'' may be 974 meaningless out of context. A title such as ``Introduction to 975 HTML Elements'' is more appropriate. 977 NOTE - The length of a title is not limited; however, 978 long titles may be truncated in some applications. To 979 minimize this possibility, titles should be fewer than 980 64 characters. 982 A user agent may display the title of a document in a history 983 list or as a label for the window displaying the document. This 984 differs from headings (5.4, "Headings: H1 ... H6"), which are 985 typically displayed within the body text flow. 987 5.2.2. Base Address: BASE 989 The optional <BASE> element specifies the base address for 990 resolving relative links from the document, overriding any 991 context otherwise known to the user agent. The required HREF 992 attribute specifies the URI for navigating the document (see 7, 993 "Hyperlinks"). The value of the HREF attribute must be an 994 absolute URI. 996 5.2.3. Keyword Index: ISINDEX 998 The <ISINDEX> element indicates that the user agent should allow 999 the user to search an index by giving keywords. See 7.5, 1000 "Queries and Indexes" for details. 1002 5.2.4. Link: LINK 1004 The <LINK> element represents a hyperlink (see 7, "Hyperlinks"). 1005 It has the same attributes as the <A> element (see 5.7.3, 1006 "Anchor: A"). 1008 The <LINK> element is typically used to indicate authorship, 1009 related indexes and glossaries, older or more recent versions, 1010 style sheets, document hierarchy etc. 1012 5.2.5. 
Associated Meta-information: META 1014 The <META> element is an extensible container for use in 1015 identifying specialized document meta-information. 1016 Meta-information has two main functions: 1018 * to provide a means to discover that the data set exists 1019 and how it might be obtained or accessed; and 1021 * to document the content, quality, and features of a data 1022 set, indicating its fitness for use. 1024 Each <META> element specifies a name/value pair. If multiple 1025 META elements are provided with the same name, their combined 1026 contents--concatenated as a comma-separated list--is the value 1027 associated with that name. 1029 NOTE - The <META> element should not be used where a 1030 specific element, such as <TITLE>, would be more 1031 appropriate. 1033 HTTP servers may read the content of the document <HEAD> to 1034 generate header fields corresponding to any elements defining a 1035 value for the attribute HTTP-EQUIV. 1037 NOTE - The method by which the server extracts document 1038 meta-information is unspecified and not mandatory. The 1039 <META> element only provides an extensible mechanism for 1040 identifying and embedding document meta-information -- 1041 how it may be used is up to the individual server 1042 implementation and the HTML user agent. 1044 Attributes of the META element: 1046 HTTP-EQUIV 1047 binds the element to an HTTP header field. An HTTP 1048 server may use this information to process the document. 1049 In particular, it may include a header field in the 1050 responses to requests for this document: the header name 1051 is taken from the HTTP-EQUIV attribute value, and the 1052 header value is taken from the value of the CONTENT 1053 attribute. HTTP header names are not case sensitive. 1055 NAME 1056 specifies the name of the name/value pair. If not 1057 present, HTTP-EQUIV gives the name. 1059 CONTENT 1060 specifies the value of the name/value pair. 
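   The name/value combination rule above can be sketched in Python
   as follows. This is only one possible reading, not a required
   server algorithm: the function name `combine_meta' is
   hypothetical, names are compared without regard to case because
   HTTP header names are not case sensitive, and the spelling of
   the first occurrence is kept for display.

```python
def combine_meta(pairs):
    """Combine (http_equiv, content) pairs into header fields.

    Values that share a name (compared case-insensitively) are
    concatenated as a comma-separated list, in document order.
    """
    combined = {}  # lowercased name -> (display name, list of values)
    for name, content in pairs:
        key = name.lower()
        if key not in combined:
            combined[key] = (name, [])
        combined[key][1].append(content)
    return {display: ", ".join(values)
            for display, values in combined.values()}


headers = combine_meta([
    ("Keywords", "Fred"),
    ("Reply-to", "fielding@ics.uci.edu (Roy Fielding)"),
    ("keywords", "Barney"),
])
```

   With these sample pairs, `headers' holds a single `Keywords'
   field whose value is `Fred, Barney'.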
   Examples

   If the document contains:

       <META HTTP-EQUIV="Expires"
             CONTENT="Tue, 04 Dec 1993 21:29:02 GMT">
       <meta http-equiv="Keywords" CONTENT="Fred">
       <META HTTP-EQUIV="Reply-to"
             content="fielding@ics.uci.edu (Roy Fielding)">
       <Meta Http-equiv="Keywords" CONTENT="Barney">

   then the server may include the following header fields:

       Expires: Tue, 04 Dec 1993 21:29:02 GMT
       Keywords: Fred, Barney
       Reply-to: fielding@ics.uci.edu (Roy Fielding)

   as part of the HTTP response to a `GET' or `HEAD' request for
   that document.

   An HTTP server must not use the <META> element to form an HTTP
   response header unless the HTTP-EQUIV attribute is present.

   An HTTP server may disregard any <META> elements that specify
   information controlled by the HTTP server, for example `Server',
   `Date', and `Last-modified'.

5.2.6. Next Id: NEXTID

   The <NEXTID> element is included for historical reasons only.
   HTML documents should not contain <NEXTID> elements.

   The <NEXTID> element gives a hint for the name to use for a new
   <A> element when editing an HTML document. It should be distinct
   from all NAME attribute values on <A> elements. For example:

       <NEXTID N=Z27>

5.3. Body: BODY

   The <BODY> element contains the text flow of the document,
   including headings, paragraphs, lists, etc.

   For example:

       <BODY>
       <h1>Important Stuff</h1>
       <p>Explanation about important stuff...
       </BODY>

5.4. Headings: H1 ... H6

   The six heading elements, <H1> through <H6>, denote section
   headings. Although the order and occurrence of headings is not
   constrained by the HTML DTD, documents should not skip levels
   (for example, from H1 to H3), as converting such documents to
   other representations is often problematic.

   Example of use:

       <H1>This is a heading</H1>
       Here is some text
       <H2>Second level heading</H2>
       Here is some more text.
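   The advice against skipping heading levels can be checked by an
   authoring tool. A minimal sketch in Python follows; the function
   name `skipped_levels' is hypothetical, and the input is assumed
   to be the heading levels (1 through 6) in document order,
   already extracted by some other means.

```python
def skipped_levels(levels):
    """Return (previous, current) pairs where a heading level is skipped.

    A skip is a step to a deeper level that jumps over at least one
    level, for example from H1 directly to H3. Steps back up to a
    shallower level are not reported.
    """
    skips = []
    for previous, current in zip(levels, levels[1:]):
        if current > previous + 1:
            skips.append((previous, current))
    return skips
```

   For instance, the sequence H1, H3 is reported as one skip, while
   H1, H2, H3, H2, H3 is reported as none.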
1127 Typical renderings are: 1129 H1 1130 Bold, very-large font, centered. One or two blank lines 1131 above and below. 1133 H2 1134 Bold, large font, flush-left. One or two blank lines 1135 above and below. 1137 H3 1138 Italic, large font, slightly indented from the left 1139 margin. One or two blank lines above and below. 1141 H4 1142 Bold, normal font, indented more than H3. One blank line 1143 above and below. 1145 H5 1146 Italic, normal font, indented as H4. One blank line 1147 above. 1149 H6 1150 Bold, indented same as normal text, more than H5. One 1151 blank line above. 1153 5.5. Block Structuring Elements 1155 Block structuring elements include paragraphs, lists, and block 1156 quotes. They must not contain heading elements, but they may 1157 contain phrase markup, and in some cases, they may be nested. 1159 5.5.1. Paragraph: P 1161 The <P> element indicates a paragraph. The exact indentation, 1162 leading space, etc. of a paragraph is not specified and may be a 1163 function of other tags, style sheets, etc. 1165 Typically, paragraphs are surrounded by a vertical space of one 1166 line or half a line. The first line in a paragraph is indented 1167 in some cases. 1169 Example of use: 1171 <H1>This Heading Precedes the Paragraph</H1> 1172 <P>This is the text of the first paragraph. 1173 <P>This is the text of the second paragraph. Although you do not 1174 need to start paragraphs on new lines, maintaining this 1175 convention facilitates document maintenance.</P> 1176 <P>This is the text of a third paragraph.</P> 1178 5.5.2. Preformatted Text: PRE 1180 The <PRE> element represents a character cell block of text and 1181 is suitable for text that has been formatted for a monospaced 1182 font. 1184 The <PRE> tag may be used with the optional WIDTH attribute. The 1185 WIDTH attribute specifies the maximum number of characters for a 1186 line and allows the HTML user agent to select a suitable font 1187 and indentation. 
   Within preformatted text:

      * Line breaks within the text are rendered as a move to the
      beginning of the next line.

           NOTE - References to the ``beginning of a new line'' do
           not imply that the renderer is forbidden from using a
           constant left indent for rendering preformatted text.
           The left indent may be constrained by the width
           required.

      * Anchor elements and phrase markup may be used.

           NOTE - Constraints on the processing of <PRE> content
           may limit or prevent the ability of the HTML user agent
           to faithfully render phrase markup.

      * Elements that define paragraph formatting (headings,
      address, etc.) must not be used.

           NOTE - Some historical documents contain <P> tags in
           <PRE> elements. User agents are encouraged to treat this
           as a line break. A <P> tag followed by a newline
           character should produce only one line break, not a line
           break plus a blank line.

      * The horizontal tab character (code position 9 in the HTML
      document character set) must be interpreted as the smallest
      positive nonzero number of spaces which will leave the number
      of characters so far on the line as a multiple of 8.
      Documents should not contain tab characters, as they are not
      supported consistently.

   Example of use:

       <PRE>
       Line 1.
              Line 2 is to the right of line 1.     <a href="abc">abc</a>
              Line 3 aligns with line 2.            <a href="def">def</a>
       </PRE>

5.5.2.1. Example and Listing: XMP, LISTING

   The <XMP> and <LISTING> elements are similar to the <PRE>
   element, but they have a different syntax. Their content is
   declared as CDATA, which means that no markup except the end-tag
   open delimiter-in-context is recognized (see 9.6 ``Delimiter
   Recognition'' of [SGML]).
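   As an aside on the horizontal tab rule for <PRE> content in
   5.5.2, the interpretation can be sketched in Python as follows.
   The function name `expand_tabs' is hypothetical, and the sketch
   assumes that ends of line have already been normalized to a
   single line feed character.

```python
def expand_tabs(text, tab_width=8):
    """Replace each tab by 1..tab_width spaces up to the next tab stop.

    The number of spaces is the smallest positive count that makes
    the number of characters so far on the line a multiple of
    tab_width. The column count restarts after each line feed.
    """
    out = []
    column = 0
    for ch in text:
        if ch == "\t":
            spaces = tab_width - (column % tab_width)
            out.append(" " * spaces)
            column += spaces
        elif ch == "\n":
            out.append(ch)
            column = 0
        else:
            out.append(ch)
            column += 1
    return "".join(out)
```

   A tab that already sits on a tab stop therefore advances by a
   full 8 columns rather than by zero.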
1238 NOTE - In a previous draft of the HTML specification, 1239 the syntax of <XMP> and <LISTING> elements allowed 1240 closing tags to be treated as data characters, as long 1241 as the tag name was not <XMP> or <LISTING>, 1242 respectively. 1244 Since CDATA declared content has a number of unfortunate 1245 interactions with processing techniques and tends to be used and 1246 implemented inconsistently, HTML documents should not contain 1247 <XMP> nor <LISTING> elements -- the <PRE> tag is more expressive 1248 and more consistently supported. 1250 The <LISTING> element should be rendered so that at least 132 1251 characters fit on a line. The <XMP> element should be rendered 1252 so that at least 80 characters fit on a line but is otherwise 1253 identical to the <LISTING> element. 1255 NOTE - In a previous draft, HTML included a <PLAINTEXT> 1256 element that is similar to the <LISTING> element, except 1257 that there is no closing tag: all characters after the 1258 <PLAINTEXT> start-tag are data. 1260 5.5.3. Address: ADDRESS 1262 The <ADDRESS> element contains such information as address, 1263 signature and authorship, often at the beginning or end of the 1264 body of a document. 1266 Typically, the <ADDRESS> element is rendered in an italic 1267 typeface and may be indented. 1269 Example of use: 1271 <ADDRESS> 1272 Newsletter editor<BR> 1273 J.R. Brown<BR> 1274 JimquickPost News, Jimquick, CT 01234<BR> 1275 Tel (123) 456 7890 1276 </ADDRESS> 1278 5.5.4. Block Quote: BLOCKQUOTE 1280 The <BLOCKQUOTE> element contains text quoted from another 1281 source. 1283 A typical rendering might be a slight extra left and right 1284 indent, and/or italic font. The <BLOCKQUOTE> typically provides 1285 space above and below the quote. 1287 Single-font rendition may reflect the quotation style of 1288 Internet mail by putting a vertical line of graphic characters, 1289 such as the greater than symbol (>), in the left margin. 
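   The single-font rendition described above can be sketched in
   Python as follows. This is one possible rendering, not a
   required one: the function name `mail_style_quote', the `> '
   prefix with a trailing space, and the 40-column width are all
   assumptions made for illustration.

```python
import textwrap


def mail_style_quote(text, width=40):
    """Wrap quoted text and mark each line with '> ' in the left margin."""
    return textwrap.fill(text, width=width,
                         initial_indent="> ",
                         subsequent_indent="> ")


quoted = mail_style_quote(
    "Soft you now, the fair Ophelia. Nymph, in thy orisons, "
    "be all my sins remembered.")
```

   Every line of `quoted' begins with the greater than symbol, in
   the style of quoted Internet mail.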
1291 Example of use: 1293 I think the poem ends 1294 <BLOCKQUOTE> 1295 <P>Soft you now, the fair Ophelia. Nymph, in thy orisons, be all 1296 my sins remembered. 1297 </BLOCKQUOTE> 1298 but I am not sure. 1300 5.6. List Elements 1302 HTML includes a number of list elements. They may be used in 1303 combination; for example, a <OL> may be nested in an <LI> 1304 element of a <UL>. 1306 The COMPACT attribute suggests that a compact rendering be used. 1308 5.6.1. Unordered List: UL, LI 1310 The <UL> represents a list of items -- typically a bulleted 1311 list. 1313 The content of a <UL> element is a sequence of <LI> elements. 1314 For example: 1316 <UL> 1317 <LI>First list item 1318 <LI>Second list item 1319 <p>second paragraph of second item 1320 <LI>Third list item 1321 </UL> 1323 5.6.2. Ordered List: OL 1325 The <OL> element represents an ordered list of items, sorted by 1326 sequence or order of importance. It is typically rendered as a 1327 numbered list. 1329 The content of a <OL> element is a sequence of <LI> elements. 1330 For example: 1332 <OL> 1333 <LI>Click the Web button to open URI window. 1334 <LI>Enter the URI number in the text field of the Open URI 1335 window. The Web document you specified is displayed. 1336 <ol> 1337 <li>substep 1 1338 <li>substep 2 1339 </ol> 1340 <LI>Click highlighted text to move from one link to another. 1341 </OL> 1343 5.6.3. Directory List: DIR 1345 The <DIR> element is similar to the <UL> element. It represents 1346 a list of short items, typically up to 20 characters each. Items 1347 in a directory list may be arranged in columns, typically 24 1348 characters wide. 1350 The content of a <DIR> element is a sequence of <LI> elements. 1351 Nested block elements are not allowed in the content of <DIR> 1352 elements. For example: 1354 <DIR> 1355 <LI>A-H<LI>I-M 1356 <LI>M-R<LI>S-Z 1357 </DIR> 1359 5.6.4. Menu List: MENU 1361 The <MENU> element is a list of items with typically one line 1362 per item. 
The menu list style is typically more compact than the 1363 style of an unordered list. 1365 The content of a <MENU> element is a sequence of <LI> elements. 1366 Nested block elements are not allowed in the content of <MENU> 1367 elements. For example: 1369 <MENU> 1370 <LI>First item in the list. 1371 <LI>Second item in the list. 1372 <LI>Third item in the list. 1373 </MENU> 1375 5.6.5. Definition List: DL, DT, DD 1377 A definition list is a list of terms and corresponding 1378 definitions. Definition lists are typically formatted with the 1379 term flush-left and the definition, formatted paragraph style, 1380 indented after the term. 1382 The content of a <DL> element is a sequence of <DT> elements 1383 and/or <DD> elements, usually in pairs. Multiple <DT> may be 1384 paired with a single <DD> element. Documents should not contain 1385 multiple consecutive <DD> elements. 1387 Example of use: 1389 <DL> 1390 <DT>Term<DD>This is the definition of the first term. 1391 <DT>Term<DD>This is the definition of the second term. 1392 </DL> 1394 If the DT term does not fit in the DT column (typically one 1395 third of the display area), it may be extended across the page 1396 with the DD section moved to the next line, or it may be wrapped 1397 onto successive lines of the left hand column. 1399 The optional COMPACT attribute suggests that a compact rendering 1400 be used, because the list items are small and/or the entire list 1401 is large. 1403 Unless the COMPACT attribute is present, an HTML user agent may 1404 leave white space between successive DT, DD pairs. The COMPACT 1405 attribute may also reduce the width of the left-hand (DT) 1406 column. 1408 <DL COMPACT> 1409 <DT>Term<DD>This is the first definition in compact format. 1410 <DT>Term<DD>This is the second definition in compact format. 1411 </DL> 1413 5.7. Phrase Markup 1415 Phrases may be marked up according to idiomatic usage, 1416 typographic appearance, or for use as hyperlink anchors. 
   User agents must render highlighted phrases distinctly from
   plain text. Additionally, <EM> content must be rendered as
   distinct from <STRONG> content, and <B> content must be rendered
   as distinct from <I> content.

   Phrase elements may be nested within the content of other phrase
   elements; however, HTML user agents may render nested phrase
   elements indistinctly from non-nested elements:

       plain <B>bold <I>italic</I></B> may be rendered
       the same as plain <B>bold </B><I>italic</I>

5.7.1. Idiomatic Elements

   Phrases may be marked up to indicate certain idioms.

        NOTE - User agents may support the <DFN> element, not
        included in this specification, as it has been deployed to
        some extent. It is used to indicate the defining instance
        of a term, and it is typically rendered in italic or bold
        italic.

5.7.1.1. Citation: CITE

   The <CITE> element is used to indicate the title of a book or
   other citation. It is typically rendered as italics. For
   example:

       He just couldn't get enough of <cite>The Grapes of Wrath</cite>.

5.7.1.2. Code: CODE

   The <CODE> element indicates an example of code, typically
   rendered in a mono-spaced font. The <CODE> element is intended
   for short words or phrases of code; the <PRE> block structuring
   element (5.5.2, "Preformatted Text: PRE") is more appropriate
   for multiple-line listings. For example:

       The expression <code>x += 1</code>
       is short for <code>x = x + 1</code>.

5.7.1.3. Emphasis: EM

   The <EM> element indicates an emphasized phrase, typically
   rendered as italics. For example:

       A singular subject <em>always</em> takes a singular verb.

5.7.1.4. Keyboard: KBD

   The <KBD> element indicates text typed by a user, typically
   rendered in a mono-spaced font. This is commonly used in
   instruction manuals. For example:

       Enter <kbd>FIND IT</kbd> to search the database.

5.7.1.5.
Sample: SAMP

   The <SAMP> element indicates a sequence of literal characters,
   typically rendered in a mono-spaced font. For example:

       The only word containing the letters <samp>mt</samp> is dreamt.

5.7.1.6. Strong Emphasis: STRONG

   The <STRONG> element indicates strong emphasis, typically
   rendered in bold. For example:

       <strong>STOP</strong>, or I'll say "<strong>STOP</strong>" again!.

5.7.1.7. Variable: VAR

   The <VAR> element indicates a placeholder variable, typically
   rendered as italic. For example:

       Type <SAMP>html-check <VAR>file</VAR> | more</SAMP>
       to check <VAR>file</VAR> for markup errors.

5.7.2. Typographic Elements

   Typographic elements are used to specify the format of marked
   text.

   Typical renderings for idiomatic elements may vary between user
   agents. If a specific rendering is necessary -- for example,
   when referring to a specific text attribute as in ``The italic
   parts are mandatory'' -- a typographic element can be used to
   ensure that the intended typography is used where possible.

        NOTE - User agents may support some typographic elements
        not included in this specification, as they have been
        deployed to some extent. The <STRIKE> element indicates a
        horizontal line through the characters, and the <U>
        element indicates an underline.

5.7.2.1. Bold: B

   The <B> element indicates bold text. Where bold typography is
   unavailable, an alternative representation may be used.

5.7.2.2. Italic: I

   The <I> element indicates italic text. Where italic typography
   is unavailable, an alternative representation may be used.

5.7.2.3. Teletype: TT

   The <TT> element indicates teletype (monospaced) text. Where a
   teletype font is unavailable, an alternative representation may
   be used.

5.7.3. Anchor: A

   The <A> element indicates a hyperlink anchor (see 7,
   "Hyperlinks").
   At least one of the NAME and HREF attributes should be present.
   Attributes of the <A> element:

   HREF
           gives the URI of the head anchor of a hyperlink.

   NAME
           gives the name of the anchor, and makes it available as
           a head of a hyperlink.

   TITLE
           suggests a title for the destination resource --
           advisory only. The TITLE attribute may be used:

              * for display prior to accessing the destination
              resource, for example, as a margin note or on a
              small box while the mouse is over the anchor, or
              while the document is being loaded;

              * for resources that do not include a title, such as
              graphics, plain text and Gopher menus, for use as a
              window title.

   REL
           The REL attribute gives the relationship(s) described by
           the hyperlink. The value is a whitespace-separated list
           of relationship names.

   REV
           same as the REL attribute, but the semantics of the
           relationship are in the reverse direction. A link from A
           to B with REL=``X'' expresses the same relationship as a
           link from B to A with REV=``X''. An anchor may have both
           REL and REV attributes.

   URN
           specifies a preferred, more persistent identifier for
           the head anchor of the hyperlink. The syntax and
           semantics of the URN attribute are not yet specified.

   METHODS
           specifies methods to be used in accessing the
           destination, as a whitespace-separated list of names.
           The set of applicable names is a function of the scheme
           of the URI in the HREF attribute. For similar reasons as
           for the TITLE attribute, it may be useful to include the
           information in advance in the link. For example, the
           HTML user agent may choose a different rendering as a
           function of the methods allowed; for example, something
           that is searchable may get a different icon.

5.8.
Line Break: BR 1585 The <BR> element specifies a line break between words (see 6, 1586 "Characters, Words, and Paragraphs"). For example: 1588 <P> Pease porridge hot<BR> 1589 Pease porridge cold<BR> 1590 Pease porridge in the pot<BR> 1591 Nine days old. 1593 5.9. Horizontal Rule: HR 1595 The <HR> element is a divider between sections of text; 1596 typically a full width horizontal rule or equivalent graphic. 1597 For example: 1599 <HR> 1600 <ADDRESS>February 8, 1995, CERN</ADDRESS> 1601 </BODY> 1603 5.10. Image: IMG 1605 The <IMG> element refers to an image or icon via a hyperlink 1606 (see 7.3, "Simultaneous Presentation of Image Resources"). 1608 HTML user agents may process the value of the ALT attribute as 1609 an alternative to processing the image resource indicated by the 1610 SRC attribute. 1612 NOTE - Some HTML user agents can process graphics linked 1613 via anchors, but not <IMG> graphics. If a graphic is 1614 essential, it should be referenced from an <A> element 1615 rather than an <IMG> element. If the graphic is not 1616 essential, then the <IMG> element is appropriate. 1618 Attributes of the <IMG> element: 1620 ALIGN 1621 alignment of the image with respect to the text 1622 baseline. 1624 * `TOP' specifies that the top of the image aligns 1625 with the tallest item on the line containing the 1626 image. 1628 * `MIDDLE' specifies that the center of the image 1629 aligns with the baseline of the line containing the 1630 image. 1632 * `BOTTOM' specifies that the bottom of the image 1633 aligns with the baseline of the line containing the 1634 image. 1636 ALT 1637 text to use in place of the referenced image resource, 1638 for example due to processing constraints or user 1639 preference. 1641 ISMAP 1642 indicates an image map (see 7.6, "Image Maps"). 1644 SRC 1645 specifies the URI of the image resource. 
                NOTE - In practice, the media types of image
                resources are limited to a few raster graphic
                formats: typically `image/gif', `image/jpeg'. In
                particular, `text/html' resources are not intended
                to be used as image resources.

   Examples of use:

       <IMG SRC="triangle.xbm" ALT="Warning:"> Be sure
       to read these instructions.

       <a href="http://machine/htbin/imagemap/sample">
       <IMG SRC="sample.xbm" ISMAP>
       </a>

6. Characters, Words, and Paragraphs

   An HTML user agent should present the body of an HTML document
   as a collection of typeset paragraphs and preformatted text.
   Except for preformatted elements (<PRE>, <XMP>, <LISTING>,
   <TEXTAREA>), each block structuring element is regarded as a
   paragraph by taking the data characters in its content and the
   content of its descendant elements, concatenating them, and
   splitting the result into words, separated by space, tab, or
   record end characters (and perhaps hyphen characters). The
   sequence of words is typeset as a paragraph by breaking it into
   lines.

6.1. The HTML Document Character Set

   The document character set specified in 9.5, "SGML Declaration
   for HTML" must be supported by HTML user agents. It includes the
   graphic characters of Latin Alphabet No. 1, or simply Latin-1.
   Latin-1 comprises 191 graphic characters, including the
   alphabets of most Western European languages.

        NOTE - Use of the non-breaking space and soft hyphen
        indicator characters is discouraged because support for
        them is not widely deployed.

        NOTE - To support non-western writing systems, a larger
        character repertoire will be specified in a future version
        of HTML. The document character set will be [ISO-10646], or
        some subset that agrees with [ISO-10646]; in particular,
        all numeric character references must use code positions
        assigned by [ISO-10646].
1695	    In SGML applications, the use of control characters is limited
1696	    in order to maximize the chance of successful interchange over
1697	    heterogeneous networks and operating systems. In the HTML
1698	    document character set only three control characters are
1699	    allowed: Horizontal Tab, Carriage Return, and Line Feed (code
1700	    positions 9, 13, and 10).

1702	    The HTML DTD references the Added Latin 1 entity set, to allow
1703	    mnemonic representation of selected Latin 1 characters using
1704	    only the widely supported ASCII character repertoire. For
1705	    example:

1707	        Kurt G&ouml;del was a famous logician and mathematician.

1709	    See 9.7.2, "ISO Latin 1 Character Entity Set" for a table of the
1710	    ``Added Latin 1'' entities, and 13, "The HTML Coded Character
1711	    Set" for a table of the code positions of [ISO 8859-1] and the
1712	    control characters in the HTML document character set.

1714	7. Hyperlinks

1716	    In addition to general purpose elements such as paragraphs and
1717	    lists, HTML documents can express hyperlinks. A hyperlink is a
1718	    relationship between two anchors, called the head and the tail
1719	    of the hyperlink [DEXTER]. An anchor is a resource such as an
1720	    HTML document, or some fragment of, i.e. view on or portion of a
1721	    resource. Typically, the user activates a link by indicating the
1722	    tail of the link; the head of the link is presented as a result.

1724	    Anchors are addressed by Uniform Resource Identifiers (URI).
1725	    URIs either refer directly to an anchor in absolute form for
1726	    example as in [URL], or they refer to an anchor relative to a
1727	    base URI which is absolute, as in [RELURL].

1729	    Each of the following markup constructs indicates the tail
1730	    anchor of a hyperlink or set of hyperlinks:

1732	      * <A> elements with HREF present.

1734	      * <LINK> elements.

1736	      * <IMG> elements.

1738	      * <INPUT> elements with the SRC attribute present.

1740	      * <ISINDEX> elements.

1742	      * <FORM> elements with `METHOD=GET'.

1744	7.1.
Accessing Resources

1746	    To access the head anchor of a hyperlink, the user agent
1747	    determines its URI from the URI given in the tail anchor, using
1748	    the base URI of the document containing the tail anchor if
1749	    necessary. Any fragment identifier is discarded, and the result
1750	    is used to access a resource, for example as in [URL].

1752	    For example, if a document identified as `http://host/x/y.html'
1753	    contains:

1755	        <img src="../icons/abc.gif">

1757	    then the user agent must use the URI `http://host/icons/abc.gif'
1758	    to access the resource linked from the <IMG> element.

1760	7.2. Activation of Hyperlinks

1762	    An HTML user agent allows the user to navigate the content of
1763	    the document and request activation of <A> element hyperlinks. A
1764	    request to activate a link is essentially a request to process
1765	    the resource indicated by the head anchor of the link, for
1766	    example to display the indicated HTML document. HTML user agents
1767	    should also allow activation of <LINK> element hyperlinks.

1769	    The base URI for navigating the head anchor may be different
1770	    from the URI used to access it. For example, it may be replaced
1771	    by a <BASE> tag in the destination document or by an HTTP
1772	    redirection transaction.

1774	7.3. Simultaneous Presentation of Image Resources

1776	    An HTML user agent may activate hyperlinks indicated by <IMG>
1777	    and <INPUT> elements concurrently with processing the document;
1778	    that is, image hyperlinks may be processed without explicit
1779	    request by the user. Image resources should be embedded in the
1780	    presentation at the point of the tail anchor, that is the <IMG>
1781	    or <INPUT> element.

1783	    <LINK> hyperlinks may also be processed without explicit user
1784	    request; for example, style sheet resources may be processed
1785	    before or during the processing of the document.

1787	7.4. Fragment Identifiers

1789	    Any characters following a `#' character in a URI constitute a
1790	    fragment identifier.
As a degenerate case, a URI of the form
1791	    `#fragment' refers to an anchor in the same document.

1793	    The meaning of fragment identifiers depends on the media type of
1794	    the resource containing the head anchor. For `text/html'
1795	    resources, it refers to the <A> element with a NAME attribute
1796	    whose value is the same as the fragment identifier. The matching
1797	    is case sensitive. The document should have exactly one such
1798	    element. The user agent should indicate the anchor element, for
1799	    example by scrolling to and/or highlighting the phrase.

1801	    For example, if a user agent was processing a document
1802	    identified as `http://host/x/y.html' and the user indicated the
1803	    following anchor:

1805	        <p> See: <a href="app1.html#bananas">appendix 1</a>
1806	        for more detail on bananas.

1808	    then the user agent must access the resource
1809	    `http://host/x/app1.html'. Assuming the resource is represented
1810	    using the `text/html' media type, the user agent must locate the
1811	    anchor named `bananas' and begin navigation there.

1813	7.5. Queries and Indexes

1815	    The <ISINDEX> element represents a set of hyperlinks. The user
1816	    can choose from the set by providing keywords to the user agent.

1818	    The user agent computes the head URI by appending `?' and the
1819	    keywords to the base URI. The keywords are escaped according to
1820	    [URL] and joined by `+'. For example, if a document contains:

1822	        <BASE HREF="http://host/index">
1823	        <ISINDEX>

1825	    and the user provides the keywords `apple' and `berry', then the
1826	    user agent must access the resource
1827	    `http://host/index?apple+berry'.

1829	    <FORM> elements with `METHOD=GET' also represent sets of
1830	    hyperlinks. See 8.2.2, "Query Forms: METHOD=GET" for details.

1832	7.6. Image Maps

1834	    If the ISMAP attribute is present on an <IMG> element, the <IMG>
1835	    element must be contained in an <A> element with an HREF
1836	    present. This construct represents a set of hyperlinks.
The user
1837	    can choose from the set by choosing a pixel of the image. The
1838	    user agent computes the head URI by appending `?' and the x and
1839	    y coordinates of the pixel to the URI given in the <A> element.
1840	    For example, if a document contains:

1842	        <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
1843	        <head><title>ImageMap Example
1844
1845
1846

Choose any of these icons:
1847
1849	    and the user chooses the upper-leftmost pixel, the chosen
1850	    hyperlink is the one with the URI
1851	    `http://host/cgi-bin/imagemap?0,0'.

1853	8. Forms

1855	    A form is a template for a form data set and an associated
1856	    method and action URI. A form data set is a sequence of
1857	    name/value pair fields. The names are specified on the NAME
1858	    attributes of form input elements, and the values are given
1859	    initial values by various forms of markup and edited by the
1860	    user. The resulting form data set is used to access an
1861	    information service as a function of the action and method.

1863	    Form elements can be mixed in with document structuring
1864	    elements. For example, a

 element may contain a 
1865	    element, or a  element may contain lists which contain
1866	     elements. This gives considerable flexibility in
1867	    designing the layout of forms.

1869	    Form processing is a level 2 feature.
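
The description of a form at the start of this section (a template holding an action URI, a method, and an ordered form data set) can be modelled roughly as follows. This is only a sketch in Python: the class name `Form` and the method `set_value` are hypothetical and do not come from this specification. The sample values are taken from the worked example later in this document.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Form:
    """A form: an action URI, a method, and an ordered form data set."""
    action: str   # action URI
    method: str   # method of accessing the action URI, e.g. "GET" or "POST"
    data_set: List[Tuple[str, str]] = field(default_factory=list)  # name/value pairs

    def set_value(self, name: str, value: str) -> None:
        """Record a user edit by replacing the value of the first field called `name`."""
        for i, (field_name, _) in enumerate(self.data_set):
            if field_name == name:
                self.data_set[i] = (name, value)
                return
        raise KeyError(name)


# Initial values come from the markup; the user then edits them.
example = Form(
    action="http://www.w3.org/sample",
    method="POST",
    data_set=[("name", ""), ("gender", "male")],
)
```

A list of pairs is used instead of a dictionary because the form data set is described as a sequence, so field order is preserved.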

1871	8.1. Form Elements

1873	8.1.1. Form: FORM

1875	    The <FORM> element contains a sequence of input elements, along
1876	    with document structuring elements. The attributes are:

1878	    ACTION
1879	            specifies the action URI for the form. The action URI of
1880	            a form defaults to the base URI of the document (see 7,
1881	            "Hyperlinks").

1883	    METHOD
1884	            selects a method of accessing the action URI. The set of
1885	            applicable methods is a function of the scheme of the
1886	            action URI of the form. See 8.2.2, "Query Forms:
1887	            METHOD=GET" and 8.2.3, "Forms with Side-Effects:
1888	            METHOD=POST".

1890	    ENCTYPE
1891	            specifies the media type used to encode the name/value
1892	            pairs for transport, in case the protocol does not
1893	            itself impose a format. See 8.2.1, "The form-urlencoded
1894	            Media Type".
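
To make the role of ENCTYPE concrete, the sketch below encodes a form data set in the style of the form-urlencoded media type. It is an illustration under stated assumptions, not the normative algorithm of 8.2.1: the function name `form_urlencode` is hypothetical, characters are assumed to be Latin-1, spaces are assumed to become `+`, other reserved characters `%HH` escapes, and line breaks CR LF pairs. The sample values are those of the worked example later in this document.

```python
from urllib.parse import quote_plus


def form_urlencode(data_set):
    """Encode an ordered list of (name, value) pairs as a single string."""
    pairs = []
    for name, value in data_set:
        # Assumption: every line break is normalised to CR LF before escaping.
        value = value.replace("\r\n", "\n").replace("\r", "\n").replace("\n", "\r\n")
        pairs.append(
            quote_plus(name, encoding="latin-1")
            + "="
            + quote_plus(value, encoding="latin-1")
        )
    # Fields are joined with '&' in the order they appear in the data set.
    return "&".join(pairs)


submitted = [
    ("name", "John Doe"),
    ("gender", "male"),
    ("family", "5"),
    ("city", "kent,miami"),
    ("other", "abc\ndef"),
    ("nickname", "J&D"),
]
print(form_urlencode(submitted))
```

For these values the sketch yields the same string as the sample HTTP POST message body shown in the worked example.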

1896	8.1.2. Input Field: INPUT

1898	    The <INPUT> element represents a field for user input. The TYPE
1899	    attribute discriminates between several variations of fields.

1901	    The <INPUT> element has a number of attributes. The set of
1902	    applicable attributes depends on the value of the TYPE
1903	    attribute.

1905	8.1.2.1. Text Field: INPUT TYPE=TEXT

1907	    The default value of the TYPE attribute is `TEXT', indicating a
1908	    single line text entry field. (Use the 

2128	    The content of the 
2236	    
2237	    Nickname: 
2238	    

Thank you for responding to this questionnaire. 2239

2240

2242 The initial state of the form data set is: 2244 name 2245 ``'' 2247 gender 2248 ``male'' 2250 family 2251 ``'' 2253 other 2254 ``'' 2256 nickname 2257 ``'' 2259 Note that the radio input has an initial value, while the 2260 checkbox has none. 2262 The user might edit the fields and request that the form be 2263 submitted. At that point, suppose the values are: 2265 name 2266 ``John Doe'' 2268 gender 2269 ``male'' 2271 family 2272 ``5'' 2274 city 2275 ``kent,miami'' 2277 other 2278 ``abc\ndef'' 2280 nickname 2281 ``J&D'' 2283 The user agent then conducts an HTTP POST transaction using the 2284 URI `http://www.w3.org/sample'. The message body would be 2285 (ignore the line break): 2287 name=John+Doe&gender=male&family=5&city=kent%2Cmiami& 2288 other=abc%0D%0Adef&nickname=J%26D 2290 9. HTML Public Text 2292 9.1. HTML DTD 2294 This is the Document Type Definition for the HyperText Markup 2295 Language, level 2. 2297 2309 2315 2316 ... 2317 2318 -- 2319 > 2321 2323 2333 2335 ]]> 2337 2346 2352 2358 2360 2365 2369 2382 2383 2385 2387 2389 2391 %ISOlat1; 2393 2394 2395 2396 2398 2400 2416 2418 2420 2422 2424 2427 2429 2433 2434 2436 2437 2440 2443 2447 2448 2449 2451 2452 2453 2454 2455 2456 2457 2459 2461 ]]> 2463 2465 2466 2470 2472 2474 2476 2484 Heading 2487 is preferred to 2488

Heading

2489 --> 2490 ]]> 2492 2494 2495 " 2500 > 2501 2502 2503 2504 2505 2506 2507 2508 2510 2512 2513 #AttVal(Alt)" 2519 > 2521 2522 2523 2524 2525 2527 2529 2530 2534 2536 2537 2538 2542 2544 2545 2548 2551 2554 2557 2560 2564 2565 2566 2567 2568 2569 2571 2573 2575 ]]> 2577 2579 2581 ]]> 2583 2585 2589 2590 2591 2592 2597 2598 2600 2608 2609 2613 2618 2619 2621 2622 2624 2627 ]]> 2629 2631 2632 2638 2639 2643 2644 2648 2649 2650 2651 2653 2654 2658 2662 2663 2664 2665 2667 2668 Directory" 2672 > 2673 Menu" 2677 > 2679 2680 2681 2682 2684 2685 2689 2691 2693 Heading 2696

Text ... 2697 is preferred to 2698

Heading

2699 Text ... 2700 --> 2701 ]]> 2703 2706 2708 2710 2711 2715 2717 2718 2723 2725 2727 2730 Form:" 2735 %SDASUFF; "Form End." 2736 > 2738 2739 2740 2741 2743 2746 2747 2759 2760 2761 2762 2763 2764 2765 2766 2767 2769 2770 Select #AttVal(Multiple)" 2777 > 2779 2780 2781 2782 2784 2785 2793 2794 2795 2797 2798 2806 2807 2808 2809 2811 ]]> 2813 2815 2817 ]]> 2818 2820 2822 2824 2826 2827 2830 2832 2833 " > 2838 2839 2840 2841 2842 2843 2844 2846 2847 [Document is indexed/searchable.]"> 2851 2853 2854 2857 2858 2860 2861 2864 2865 2867 2868 2873 2874 2875 2876 2878 2880 2882 ]]> 2883 2885 2886 2888 2893 2895 9.2. Strict HTML DTD 2897 This document type declaration refers to the HTML DTD with the 2898 `HTML.Recommended' entity defined as `INCLUDE' rather than 2899 IGNORE; that is, it refers to the more structurally rigid 2900 definition of HTML. 2902 2912 2919 2920 ... 2921 2922 -- 2923 > 2925 2926 2928 2929 %html; 2931 9.3. Level 1 HTML DTD 2933 This document type declaration refers to the HTML DTD with the 2934 `HTML.Forms' entity defined as `IGNORE' rather than `INCLUDE'. 2935 Documents which contain
elements do not conform to this 2936 DTD, and must use the level 2 DTD. 2938 2949 2956 2957 ... 2958 2959 -- 2960 > 2962 2963 2965 2966 %html; 2968 9.4. Strict Level 1 HTML DTD 2970 This document type declaration refers to the level 1 HTML DTD 2971 with the `HTML.Recommended' entity defined as `INCLUDE' rather 2972 than IGNORE; that is, it refers to the more structurally rigid 2973 definition of HTML. 2975 2986 2993 2994 ... 2995 2996 -- 2997 > 2999 3000 3002 3003 %html-1; 3005 9.5. SGML Declaration for HTML 3007 This is the SGML Declaration for HyperText Markup Language. 3009 3089 3097 9.6. Sample SGML Open Entity Catalog for HTML 3099 The SGML standard describes an ``entity manager'' as the portion 3100 or component of an SGML system that maps SGML entities into the 3101 actual storage model (e.g., the file system). The standard 3102 itself does not define a particular mapping methodology or 3103 notation. 3105 To assist the interoperability among various SGML tools and 3106 systems, the SGML Open consortium has passed a technical 3107 resolution that defines a format for an application- independent 3108 entity catalog that maps external identifiers and/or entity 3109 names to file names. 3111 Each entry in the catalog associates a storage object identifier 3112 (such as a file name) with information about the external entity 3113 that appears in the SGML document. In addition to entries that 3114 associate public identifiers, a catalog entry can associate an 3115 entity name with a storage object identifier. 
For example, the 3116 following are possible catalog entries: 3118 -- catalog: SGML Open style entity catalog for HTML -- 3119 -- $Id: catalog,v 1.2 1994/11/30 23:45:18 connolly Exp $ -- 3121 -- Ways to refer to Level 2: most general to most specific -- 3122 PUBLIC "-//IETF//DTD HTML//EN" html.dtd 3123 PUBLIC "-//IETF//DTD HTML 2.0//EN" html.dtd 3124 PUBLIC "-//IETF//DTD HTML Level 2//EN" html.dtd 3125 PUBLIC "-//IETF//DTD HTML 2.0 Level 2//EN" html.dtd 3127 -- Ways to refer to Level 1: most general to most specific -- 3128 PUBLIC "-//IETF//DTD HTML Level 1//EN" html-1.dtd 3129 PUBLIC "-//IETF//DTD HTML 2.0 Level 1//EN" html-1.dtd 3131 -- Ways to refer to Level 0: most general to most specific -- 3132 PUBLIC "-//IETF//DTD HTML Level 0//EN" html-0.dtd 3133 PUBLIC "-//IETF//DTD HTML 2.0 Level 0//EN" html-0.dtd 3135 -- Ways to refer to Strict Level 2: most general to most specif\ 3136 c -- 3137 PUBLIC "-//IETF//DTD HTML Strict//EN" html-s.dtd 3138 PUBLIC "-//IETF//DTD HTML 2.0 Strict//EN" html-s.dtd 3139 PUBLIC "-//IETF//DTD HTML Strict Level 2//EN" html-s.dtd 3140 PUBLIC "-//IETF//DTD HTML 2.0 Strict Level 2//EN" html-s.dtd 3142 -- Ways to refer to Strict Level 1: most general to most specif\ 3143 c -- 3144 PUBLIC "-//IETF//DTD HTML Strict Level 1//EN" html-1s.dtd 3145 PUBLIC "-//IETF//DTD HTML 2.0 Strict Level 1//EN" html-1s.dtd 3147 -- Ways to refer to Strict Level 0: most general to most specif\ 3148 c -- 3149 PUBLIC "-//IETF//DTD HTML Strict Level 0//EN" html-0s.dtd 3150 PUBLIC "-//IETF//DTD HTML 2.0 Strict Level 0//EN" html-0s.dtd 3152 -- ISO latin 1 entity set for HTML -- 3153 PUBLIC "ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML" ISOlat1\ 3154 sgml 3156 9.7. Character Entity Sets 3158 The HTML DTD defines the following entities. They represent 3159 particular graphic characters which have special meanings in 3160 places in the markup, or may not be part of the character set 3161 available to the writer. 3163 9.7.1. 
Numeric and Special Graphic Entity Set 3165 The following table lists each of the characters included from 3166 the Numeric and Special Graphic entity set, along with its name, 3167 syntax for use, and description. This list is derived from `ISO 3168 Standard 8879:1986//ENTITIES Numeric and Special Graphic//EN'. 3170 However, HTML does not include for the entire entity set -- only 3171 the entities listed below are included. 3173 GLYPH NAME SYNTAX DESCRIPTION 3174 < lt < Less than sign 3175 > gt > Greater than sign 3176 & amp & Ampersand 3177 " quot " Double quote sign 3179 9.7.2. ISO Latin 1 Character Entity Set 3181 The following public text lists each of the characters specified 3182 in the Added Latin 1 entity set, along with its name, syntax for 3183 use, and description. This list is derived from ISO Standard 3184 8879:1986//ENTITIES Added Latin 1//EN. HTML includes the entire 3185 entity set. 3187 3192 3197 3199 3200 3201 3202 3203 3204 3205 3206 3207 3208 3209 3210 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220 3221 3222 3223 3224 3225 3226 3227 3228 3229 3230 3231 3232 3233 3234 3235 3236 3237 3238 3239 3240 3241 3242 3243 3244 3245 3246 3247 3248 3249 3250 3251 3252 3253 3255 3256 3257 3258 3259 3260 3261 3263 10. Security Considerations 3265 Anchors, embedded images, and all other elements which contain 3266 URIs as parameters may cause the URI to be dereferenced in 3267 response to user input. In this case, the security 3268 considerations of [URL] apply. 3270 The widely deployed methods for submitting forms requests -- 3271 HTTP and SMTP -- provide little assurance of confidentiality. 3272 Information providers who request sensitive information via 3273 forms -- especially by way of the `PASSWORD' type input field 3274 (see 8.1.2, "Input Field: INPUT") -- should be aware and make 3275 their users aware of the lack of confidentiality. 3277 11. References 3279 [URI] 3280 T. Berners-Lee. 
``Universal Resource Identifiers in WWW: 3281 A Unifying Syntax for the Expression of Names and 3282 Addresses of Objects on the Network as used in the 3283 World- Wide Web.'' RFC 1630, CERN, June 1994. 3284 3286 [URL] 3287 T. Berners-Lee, L. Masinter, and M. McCahill. ``Uniform 3288 Resource Locators (URL).'' RFC 1738, CERN, Xerox PARC, 3289 University of Minnesota, October 1994. 3290 3292 [HTTP] 3293 T. Berners-Lee, R. T. Fielding, and H. Frystyk Nielsen. 3294 ``Hypertext Transfer Protocol - HTTP/1.0.'' Work in 3295 Progress, MIT, UC Irvine, CERN, March 1995. 3296 3298 [MIME] 3299 N. Borenstein and N. Freed. ``MIME (Multipurpose 3300 Internet Mail Extensions) Part One: Mechanisms for 3301 Specifying and Describing the Format of Internet Message 3302 Bodies.'' RFC 1521, Bellcore, Innosoft, September 1993. 3303 3305 [RELURL] 3306 R. T. Fielding. ``Relative Uniform Resource Locators.'' 3307 Work in Progress, UC Irvine, March 1995. 3308 3310 [GOLD90] 3311 C. F. Goldfarb. ``The SGML Handbook.'' Y. Rubinsky, Ed., 3312 Oxford University Press, 1990. 3314 [DEXTER] 3315 Frank Halasz and Mayer Schwartz, ``The Dexter Hypertext 3316 Reference Model'', ``Communications of the ACM'', pp. 3317 30-39, vol. 37 no. 2, Feb 1994, 3319 [IMEDIA] 3320 J. Postel. ``Media Type Registration Procedure.'', 3321 USC/ISI, March 1994. 3322 3324 [IANA] 3325 J. Reynolds and J. Postel. ``Assigned Numbers.'' STD 2, 3326 RFC 1700, USC/ISI, October 1994. 3327 3329 [SQ91] 3330 SoftQuad. ``The SGML Primer.'' 3rd ed., SoftQuad Inc., 3331 1991. 3333 [ISO-646] 3334 ISO/IEC 646:1991 Information technology -- ISO 7-bit 3335 coded character set for information interchange 3336 3338 [ISO-10646] 3339 ISO/IEC 10646-1:1993 Information technology -- Universal 3340 Multiple-Octet Coded Character Set (UCS) -- Part 1: 3341 Architecture and Basic Multilingual Plane 3342 3344 [ISO-8859-1] 3345 ISO 8859. 
International Standard -- Information 3346 Processing -- 8-bit Single-Byte Coded Graphic Character 3347 Sets -- Part 1: Latin Alphabet No. 1, ISO 8859-1:1987. 3348 3350 [SGML] 3351 ISO 8879. Information Processing -- Text and Office 3352 Systems - Standard Generalized Markup Language (SGML), 3353 1986. 3355 12. Acknowledgments 3357 The HTML document type was designed by Tim Berners-Lee at CERN 3358 as part of the 1990 World Wide Web project. In 1992, Dan 3359 Connolly wrote the HTML Document Type Definition (DTD) and a 3360 brief HTML specification. 3362 Since 1993, a wide variety of Internet participants have 3363 contributed to the evolution of HTML, which has included the 3364 addition of in-line images introduced by the NCSA Mosaic 3365 software for WWW. Dave Raggett played an important role in 3366 deriving the FORMS material from the HTML+ specification. 3368 Dan Connolly and Karen Olson Muldrow rewrote the HTML 3369 Specification in 1994. The document was then edited by the HTML 3370 working group as a whole, with updates being made by Eric 3371 Schieler, Mike Knezovich, and Eric W. Sink at Spyglass, Inc. 3372 Finally, Roy Fielding restructured the entire draft into its 3373 current form. 3375 Special thanks to the many active participants in the HTML 3376 working group, too numerous to list individually, without whom 3377 there would be no standards process and no standard. That this 3378 document approaches its objective of carefully converging a 3379 description of current practice and formalization of HTML's 3380 relationship to SGML is a tribute to their effort. 3382 12.1. Authors' Addresses 3384 Tim Berners-Lee 3386 Director, W3 Consortium 3387 MIT Laboratory for Computer Science 3388 545 Technology Square 3389 Cambridge, MA 02139, U.S.A. 3390 Tel: +1 (617) 253 9670 3391 Fax: +1 (617) 258 8682 3392 Email: timbl@w3.org 3394 Daniel W. 
Connolly 3396 Research Technical Staff, W3 Consortium 3397 MIT Laboratory for Computer Science 3398 545 Technology Square 3399 Cambridge, MA 02139, U.S.A. 3400 Fax: +1 (617) 258 8682 3401 Email: connolly@w3.org 3402 URI: http://www.w3.org/hypertext/WWW/People/Connolly/ 3404 13. The HTML Coded Character Set 3406 This list details the code positions and characters of the HTML 3407 document character set, specified in 9.5, "SGML Declaration for 3408 HTML". This coded character set is based on [ISO-8859-1]. 3410 REFERENCE DESCRIPTION 3411 -------------- ----------- 3412 � -  Unused 3413 Horizontal tab 3414 Line feed 3415 - Unused 3416 Carriage Return 3417  -  Unused 3418 Space 3419 ! Exclamation mark 3420 " Quotation mark 3421 # Number sign 3422 $ Dollar sign 3423 % Percent sign 3424 & Ampersand 3425 ' Apostrophe 3426 ( Left parenthesis 3427 ) Right parenthesis 3428 * Asterisk 3429 + Plus sign 3430 , Comma 3431 - Hyphen 3432 . Period (fullstop) 3433 / Solidus (slash) 3434 0 - 9 Digits 0-9 3435 : Colon 3436 ; Semi-colon 3437 < Less than 3438 = Equals sign 3439 > Greater than 3440 ? 
Question mark 3441 @ Commercial at 3442 A - Z Letters A-Z 3443 [ Left square bracket 3444 \ Reverse solidus (backslash) 3445 ] Right square bracket 3446 ^ Caret 3447 _ Horizontal bar (underscore) 3448 ` Acute accent 3449 a - z Letters a-z 3450 { Left curly brace 3451 | Vertical bar 3452 } Right curly brace 3453 ~ Tilde 3454  - Ÿ Unused 3455   Non-breaking Space 3456 ¡ Inverted exclamation 3457 ¢ Cent sign 3458 £ Pound sterling 3459 ¤ General currency sign 3460 ¥ Yen sign 3461 ¦ Broken vertical bar 3462 § Section sign 3463 ¨ Umlaut (dieresis) 3464 © Copyright 3465 ª Feminine ordinal 3466 « Left angle quote, guillemotleft 3467 ¬ Not sign 3468 ­ Soft hyphen 3469 ® Registered trademark 3470 ¯ Macron accent 3471 ° Degree sign 3472 ± Plus or minus 3473 ² Superscript two 3474 ³ Superscript three 3475 ´ Acute accent 3476 µ Micro sign 3477 ¶ Paragraph sign 3478 · Middle dot 3479 ¸ Cedilla 3480 ¹ Superscript one 3481 º Masculine ordinal 3482 » Right angle quote, guillemotright 3483 ¼ Fraction one-fourth 3484 ½ Fraction one-half 3485 ¾ Fraction three-fourths 3486 ¿ Inverted question mark 3487 À Capital A, grave accent 3488 Á Capital A, acute accent 3489 Â Capital A, circumflex accent 3490 Ã Capital A, tilde 3491 Ä Capital A, dieresis or umlaut mark 3492 Å Capital A, ring 3493 Æ Capital AE dipthong (ligature) 3494 Ç Capital C, cedilla 3495 È Capital E, grave accent 3496 É Capital E, acute accent 3497 Ê Capital E, circumflex accent 3498 Ë Capital E, dieresis or umlaut mark 3499 Ì Capital I, grave accent 3500 Í Capital I, acute accent 3501 Î Capital I, circumflex accent 3502 Ï Capital I, dieresis or umlaut mark 3503 Ð Capital Eth, Icelandic 3504 Ñ Capital N, tilde 3505 Ò Capital O, grave accent 3506 Ó Capital O, acute accent 3507 Ô Capital O, circumflex accent 3508 Õ Capital O, tilde 3509 Ö Capital O, dieresis or umlaut mark 3510 × Multiply sign 3511 Ø Capital O, slash 3512 Ù Capital U, grave accent 3513 Ú Capital U, acute accent 3514 Û Capital U, circumflex accent 3515 Ü 
Capital U, dieresis or umlaut mark 3516 Ý Capital Y, acute accent 3517 Þ Capital THORN, Icelandic 3518 ß Small sharp s, German (sz ligature) 3519 à Small a, grave accent 3520 á Small a, acute accent 3521 â Small a, circumflex accent 3522 ã Small a, tilde 3523 ä Small a, dieresis or umlaut mark 3524 å Small a, ring 3525 æ Small ae dipthong (ligature) 3526 ç Small c, cedilla 3527 è Small e, grave accent 3528 é Small e, acute accent 3529 ê Small e, circumflex accent 3530 ë Small e, dieresis or umlaut mark 3531 ì Small i, grave accent 3532 í Small i, acute accent 3533 î Small i, circumflex accent 3534 ï Small i, dieresis or umlaut mark 3535 ð Small eth, Icelandic 3536 ñ Small n, tilde 3537 ò Small o, grave accent 3538 ó Small o, acute accent 3539 ô Small o, circumflex accent 3540 õ Small o, tilde 3541 ö Small o, dieresis or umlaut mark 3542 ÷ Division sign 3543 ø Small o, slash 3544 ù Small u, grave accent 3545 ú Small u, acute accent 3546 û Small u, circumflex accent 3547 ü Small u, dieresis or umlaut mark 3548 ý Small y, acute accent 3549 þ Small thorn, Icelandic 3550 ÿ Small y, dieresis or umlaut mark 3552 14. Proposed Entities 3554 The HTML DTD references the ``Added Latin 1'' entity set, which 3555 only supplies named entities for a subset of the non-ASCII 3556 characters in [ISO-8859-1], namely the accented characters. The 3557 following entities should be supported so that all ISO 8859-1 3558 characters may only be referenced symbolically. The names for 3559 these entities are taken from the appendixes of [SGML]. 
3561 3562 3563 3564 3565 3566 3567 3568 3569 3570 3571 3572 3573 3574 3575 3576 3577 3578 3579 3580 3581 3582 3583 3584 3585 3586 3587 3588 3589 3590 3591 3592 3593 3594 3595 3596 3597 3598 3599 3600 3601 3602 3603 3604 3605 3606 3607 3608 3609 3610 3611 3612 3613 3614 3615 3616 3617 3618 3619 3620 3621 3622 3623 3624 3625 3626 3627 3628 3629 3630 3631 3632 3633 3634 3635 3636 3637 3638 3639 3640 3641 3642 3643 3644 3645 3646 3647 3648 3649 3650 3651 3652 3653 3654 3655 3656