INTERNET-DRAFT                                       Donald Eastlake 3rd
                                                   Motorola Laboratories
                                                        Declan McCullagh
                                                              Wired News
Expires: December 2003                                         June 2003

                       .sex Considered Dangerous
                       ---- ---------- ---------

Status of This Document

Distribution of this draft is unlimited. Comments should be sent to
the authors.

This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC 2026. Internet-Drafts are
working documents of the Internet Engineering Task Force (IETF), its
areas, and its working groups. Note that other groups may also
distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Copyright Notice

Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

Periodically there are proposals to mandate the use of a special top
level name or an IP address bit to flag "adult" or "unsafe" material
or the like. This document explains why this is an ill considered
idea from the legal, philosophical, and, particularly, the technical
points of view.

Table of Contents

Status of This Document
Copyright Notice
Abstract

Table of Contents

1. Introduction
2. Background
3. Legal and Philosophical Problems
4. Technical Difficulties
4.1 Content Filtering Using Names
4.1.1 Linguistic Problems
4.1.2 Explosion of Top Level Domain Names (TLDs)
4.1.3 You Can't Control What Names Point At You!
4.1.4 Particular Protocol Difficulties
4.1.4.1 Electronic Mail (SMTP)
4.1.4.2 Web Access (HTTP)
4.1.4.3 News (NNTP)
4.1.4.4 Internet Relay Chat (IRC)
4.2 Content Filtering Using IP Addressing
4.2.1 Hierarchical Routing
4.2.2 IP Version 4 Addresses
4.2.3 IP Version 6 Addresses
4.3 PICS Labels
5. Security Considerations
6. Conclusions

References

Authors Addresses

Full Copyright Statement
Expiration and File Name

1. Introduction

Periodically there are proposals to mandate the use of a special top
level name or an IP address bit to flag "adult" or "unsafe" material
or the like.
This document explains why this is an ill considered
idea from the legal, philosophical, and the technical points of view.

2. Background

The concept of a .sex, .xxx, .adult, or similar top-level domain in
which it would be mandatory to locate salacious or similar material
is periodically suggested by some politicians and commentators. Other
proposals have included a domain reserved exclusively for material
viewed as appropriate for minors, or using IP address bits or ranges
to segregate content.

In an October 1998 report accompanying the Child Online Protection
Act, the House Commerce committee said "there are no technical
barriers to creating an adult domain, and it would be very easy to
block all websites within an adult domain." The report also said that
the committee was wary of regulating the computer industry and that
any decision by the U.S. government "will have international
consequences." [HOUSEREPORT]

British Telecom has backed adult top-level domains, saying in a 1998
letter to the U.S. Department of Commerce that it "strongly
supported" that plan. The reason: "Sexually explicit services could
then be legally required to operate with domain names in this gTLD
[that] would make it much simpler and easier to control access to
such sites..." [BT] One of ICANN's progenitors, the GTLD-MOU
committee, suggested a "red-light-zone" top-level domain in a
September 1997 request for comment. [GTLD-MOU]

Some adult industry executives have endorsed the concept. In 1998,
Seth Warshavsky, president of the Internet Entertainment Group, told
the U.S. Senate Commerce committee that he would like to see a .adult
domain. "We're suggesting the creation of a new top-level domain
called '.adult' where all sexually explicit material on the Net would
reside," Warshavsky said in an interview at the time.
[WARSHAVSKY] More recently, other entrepreneurs in the industry have
said that they do not necessarily object to the creation of an adult
domain as long as they may continue to use .com.

Conservative groups in the U.S. say they are not eager for such a
domain, and prefer criminal laws directed at publishers and
distributors of sexually-explicit material. The National Law Center
for Children and Families in Fairfax, Virginia, said in February 2001
that it did not favor any such proposal. For different reasons, the
American Civil Liberties Union and other civil liberties groups also
oppose it.

Sen. Joseph Lieberman, the U.S. Democratic Party's vice presidential
nominee, endorsed the idea at a June 2000 meeting of the federal
Commission on Child Online Protection. Lieberman said in a prepared
statement that "we would ask the arbiters of the Internet to simply
abide by the same standard as the proprietor of an X-rated movie
theater or the owner of a convenience store who sells
sexually-explicit magazines." [LIEBERMAN]

In the 1998 law creating this commission, the U.S. Congress required
the members to investigate "the establishment of a domain name for
posting of any material that is harmful to minors." The commission
devoted a section of its October 2000 report to that topic. It
concluded that both a .xxx and a .kids domain are technically
possible, but would require action by ICANN. The report said that an
adult domain might be only "moderately effective" and would raise
privacy and free speech concerns. [COPAREPORT]

The commission also explored the creation of a so-called red zone or
green zone for content by means of allocation of a new set of IP
addresses under IPv6. Any material not in one of those two zones
would be viewed as in a gray zone and not necessarily appropriate or
inappropriate for minors.
Comments from commissioners were largely
negative: "Effectiveness would require substantial effort to attach
content to specific IP numbers. This approach could potentially
reduce flexibility and impede optimal network performance. It would
not be effective at blocking access to chat, newsgroups, or instant
messaging."

In October 2000, ICANN rejected a .xxx domain during its initial
round of approving additional top-level domains. The reasons are not
entirely clear, but former ICANN Chairwoman Esther Dyson said that
the adult industry did not entirely agree that such a domain would be
appropriate. One .xxx hopeful, ICM Registry of Ontario, Canada, in
December 2000 asked ICANN to reconsider its decision. [ICM-REGISTRY]

In 2002, the US Congress mandated the creation of a kids.us domain
for "child safe" material. This was after being convinced that, for
reasons some of which are described in the following section, trying
to legislate standards for the whole world with a .kids domain was
inappropriate.

3. Legal and Philosophical Problems

When it comes to sexually-explicit material, every person, court, and
government has a different view of what's acceptable and what is not.

Attitudes change over time, and what is viewed as appropriate in one
town or year may spark protests in the next. When faced with the
slippery nature of what depictions of sexual activity should be
illegal or not, one U.S. Supreme Court justice blithely defined
obscenity as: "I know it when I see it."

In the U.S.A., obscenity is defined as explicit sexual material that,
among other things, violates "contemporary community standards" -- in
other words, even at the national level, there is no agreed-upon rule
governing what is illegal and what is not.
Making matters more knotty
is that there are over 200 United Nations country codes, and in most
of them political subdivisions can impose their own restrictions.
Even for legal nude modeling, age restrictions differ. They're
commonly 18 years of age, but only 17 years of age in one
Scandinavian country. A photographer there conducting what's viewed
as a legal and proper photo shoot would be branded a felon and child
pornographer in the U.S.A. In yet other countries and groups, the
entire concept of nude photography or even any photography of a
person in any form may be religiously unacceptable.

Saudi Arabia, Iran, Northern Nigeria, and China are not likely to
have the same liberal views as, say, the Netherlands or Denmark.
Saudi Arabia and China, like some other nations, extensively filter
their Internet connections and have created government agencies to
protect their societies from web sites that officials view as immoral.
Their views on what should be included in a .sex domain would hardly
be identical to those in liberal western nations.

Those wildly different opinions on sexual material make it
inconceivable that a global consensus can ever be reached on what is
appropriate or inappropriate for a .sex or .adult top-level domain.
Moreover, the existence of such a domain would create an irresistible
temptation on the part of conservative legislators to require
controversial publishers to move to that domain and punish those who
do not.

Some conservative politicians already have complained that ICANN did
not approve .xxx in its October 2000 meeting. During a February 2001
hearing in the U.S. House of Representatives, legislators warned that
they "want to explore ICANN's rationale for not approving two
particular top level domain names -- .kids and .xxx -- as a means to
protect kids from the awful smut which is so widespread on the
Internet."

It seems plausible that only a few adult publishers, and not those
who have invested resources in building a brand around a .com site,
would voluntarily abandon their current domain name. Instead, they'd
likely add a .xxx variant and keep their original address. The
existence of .xxx could propel legislators in the U.S. and other
countries to require them to publish exclusively from an adult
domain, a move that would invite ongoing political interference with
Internet governance and raise concerns about forced speech and
self-labeling.

In fact, the ultimate arbiter of generic top-level domain names -- at
least currently -- is not ICANN, but the U.S. government. The U.S.
Congress' General Accounting Office in July 2000 reported that the
Commerce Department continues to be responsible for domain names
allowed by the authoritative root. [GAO] The GAO's auditors concluded
it was unclear whether the Commerce Department has the "requisite
authority" under current law to transfer that responsibility to
ICANN.

The American Civil Liberties Union -- and other members of the
international Global Internet Liberty Campaign -- caution that
publishers speaking frankly about birth control, AIDS prevention, gay
and lesbian sex, the social problem of prison rape, etc., could be
coerced into moving to an adult domain. Once there, they would be
stigmatized and easily blocked by schools, libraries, companies, and
other groups using filtering software. Publishers of such information
who do not view themselves as pornographers and retain their existing
addresses could be targeted for prosecution.

The existence of an adult top-level domain would likely open the door
for related efforts, either policy or legislative.
There are many
different axes through which offensive material can be defined: Sex,
violence, hate, heresy, subversion, blasphemy, illegal drugs,
profanity, political correctness, glorification of crime, incitement
to break the law, and so on. Such suggestions invite the ongoing
lobbying of ICANN, the U.S. government, or other policy-making bodies
by special-interest groups that are not concerned with the technical
feasibility or practicality of their advice.

An adult top-level domain could have negative legal repercussions by
endangering free expression. U.S. Supreme Court Justice Sandra Day
O'Connor has suggested that the presence of "adult zones" on the
Internet would make a future Communications Decency Act (CDA) more
likely to be viewed as constitutional. In her partial dissent to the
Supreme Court's rejection of the CDA in 1997 [CDA], O'Connor said
that "the prospects for the eventual zoning of the Internet appear
promising." (The Supreme Court ruled the CDA violated free speech
rights by making it a crime to distribute "indecent" or "patently
offensive" material online.)

Privacy could be harmed by such a proposal. It would become easier
for repressive governments and other institutions to track visits to
sites in a domain labeled as adult and record personally-identifiable
information about the visitor. Repressive governments would instantly
have more power to monitor naive users and prosecute them for their
activities. It's also implausible that a top-level domain would be
effective in controlling access to chat, email, newsgroups, instant
messaging, and new services as yet to be invented.

4. Technical Difficulties

Even ignoring the philosophical and legal difficulties outlined
above, there are substantial technical difficulties in attempting to
impose content classification by domain names or IP addresses.
Mandatory content labeling is usually advanced with the idea of using
a top level domain name, discussed in section 4.1, but we also
discuss the possibility of using IP address bits or ranges in section
4.2.

In section 4.1.4 difficulties with a few particular higher level
protocols are discussed. In some cases, these protocols use
different name spaces. It should be kept in mind that additional
future protocols may be devised with as yet undreamed of naming
characteristics.

We also discuss PICS labels [PICS] as an alternative technology in
section 4.3.

Only a limited technical background is assumed so some basic
information is included below. In some cases descriptions are
simplified and details omitted.

This technical discussion minimizes the definitional problems.
However, it is still necessary for evaluating some technical
considerations to have some estimate of the amount of categorization
that would be necessary for a realistic global censorship system.
There is no hope of agreement on this point. For our purposes, we
will arbitrarily assume that the world's population consists of
approximately 90,000 overlapping communities, each of which would
have a different categorization of interest. Further, we arbitrarily
assume that some unspecified but clever encoding scheme enables a
proper global categorization of all information by a 300 bit label.
Some would say a 300 bit label is too large, others that it is too
small. Regardless, we will use it for some technical evaluations.

4.1 Content Filtering Using Names

The most prominent user visible part of Internet naming and
addressing is the domain name system [RFC 1034, 1035]. Domain Names
are dotted sequences of labels such as aol.com, world.std.com,
www.rosslynchapel.org.uk, or ftp.gnu.lcs.mit.edu [RFC 1035, 1591,
2606].
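
The label structure just described can be sketched in a few lines of
Python. This is only an illustration: the function names and the set
of blocked top level labels are hypothetical and were chosen for this
example. It shows that a filter keyed on the top level name examines
nothing but the rightmost label.

```python
def labels(domain_name):
    """Split a dotted domain name into its labels, left to right."""
    return domain_name.rstrip(".").lower().split(".")


def top_level_label(domain_name):
    """Return the rightmost (most significant) label of a domain name."""
    return labels(domain_name)[-1]


def blocked_by_tld(domain_name, blocked_tlds=frozenset({"sex", "xxx"})):
    """Hypothetical filter that looks only at the top level label."""
    return top_level_label(domain_name) in blocked_tlds


# Example names taken from the text plus two made-up ones.
for name in ("aol.com", "www.rosslynchapel.org.uk",
             "example.sex", "sex.example.com"):
    print(name, labels(name), blocked_by_tld(name))
```

Note that a name such as sex.example.com passes this kind of filter,
since the word appears in a lower level label that the filter never
inspects.
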
Domain Names form an important part of most World Wide Web
addresses or URLs [RFC 2396], commonly appearing after "//". Security
for the domain name system is being standardized [RFC 2535] but has
not been deployed to any significant extent.

Domain names designate nodes in a global distributed hierarchically
delegated database. A wide variety of information can be stored at
these nodes including IP addresses of machines on the network (see
section 4.2 below), mail delivery information, and other types of
information. Thus, the data stored at foo.example.com could be the
numeric information for sending data to a particular machine, which
would be used if you tried to browse a web page at that name; the
name of a computer (say mailhost.example.com) to handle mail
addressed to anyone "@foo.example.com"; and/or other information.

There are also other naming systems in use, such as news group names
and Internet Relay Chat (IRC) channel names.

The usual labeling idea presented is to reserve a top level name,
such as .sex or .xxx for "adult" material and/or .kids for "safe"
material or the like. The technical and linguistic problems with
this are described in the subsections below.

4.1.1 Linguistic Problems

When using name labeling, the first problem is: from whose language
do you take the names to impose? Words and acronyms can have very
different meanings in different languages and the probability of
confusion is multiplied when phonetic collisions are considered.

As an example of possible problems, note that for several years the
government of Turkmenistan suspended new registrations in ".tm",
which had previously been a source of revenue, because some of the
registered second level domain names may have been problematic. In
particular, their web home page said:

   Statement from the .TM NIC

   The response to the .TM registry has been overwhelming.
   Thousands
   of names have been registered from all over the world. Some of
   the names registered, however, may be legally obscene in
   Turkmenistan, and as a result the .TM NIC registry is reviewing
   its naming policy for future registrations. The .TM NIC has
   suspended registrations until a new policy can be implemented. We
   hope to be live again shortly.

4.1.2 Explosion of Top Level Domain Names (TLDs)

An important aspect of the design of the Domain Name System (DNS) is
the hierarchical delegation of data maintenance. The DNS really only
works, and has been able to scale across the more than five orders of
magnitude it has grown since its initial deployment, due to this
delegation.

The first problem is that one would expect most computers or web
sites to have a mix of material only some of which should be
specially classified. Using special top level domain names (TLDs)
multiplies the number of DNS zones the site has to worry about. For
example, assume the site has somehow already sorted its material into
"kids", "normal", and "adult" piles. Without special TLD labels, it
can store them under kids.example.net, adult.example.net, and
other.example.net, for instance. This would require only the
maintenance of the single example.net zone of database entries. With
special TLD labeling, at least example.net (for normal stuff),
example.net.sex, and example.net.kids would need to be maintained,
which are three separate zones in different parts of the DNS tree
under three separate delegations.

As the number of categories expands, the number of category
combinations explodes, and this quickly becomes completely
unmanageable. If 300 bits worth of labeling is required, the system
could, in theory, need 2**300 name categories, an impossibility. No
individual site would need to use all categories and the category
domain names would not all have to be top level names.
But it would still be an unmanageable nightmare.

4.1.3 You Can't Control What Names Point At You!

Providers of data on the Internet cannot stop anyone from creating
misleading domain names that point to their computer's IP address.

The DNS system works as a database. It associates certain data,
called resource records, or RRs, with domain names. In particular,
it can associate IP address resource records with domain names. For
example, when you browse a URL, most commonly a domain name within
that URL is looked up in the DNS. The resulting address is then used
to address the packets sent from your web browser or other software
to the server or peer.

Remember what we said in Section 4.1.2 about hierarchical delegation?
Control is delegated and anyone controlling a DNS zone of data, say
example.com, can insert data at that name or any deeper name (except
to the extent they delegate some of the deeper namespace to yet
others). So the controller of example.com can insert data so that
purity.example.com has associated with it the same computer address
which is associated with www.obscene.example.sex. This directs any
reference to purity.example.com to use the associated IP address
which is the same as the www.obscene.example.sex web site. The
manager of that hypothetical web site, who controls the
obscene.example.sex zone, has no control over the example.com DNS
zone. They are technically incapable of causing it to conform to any
".sex" labeling law. In the alternative, someone could create a name
conforming to an adult labeling requirement, such as foo.stuff.sex,
that actually pointed to someone else's entirely unobjectionable
site, perhaps for the purpose of polluting the labeling. See diagram
below. Each "zone" could be hosted on a different set of physical
computers.

+-----------------------------------------+
|   . (root) zone                         |
|  .com  .org  .net  .us  .uk  .sex  ...  |
+---+---------------------------+---------+
    |                           |
    V                           V
+--------------------+   +--------------------+
|  .com zone         |   |  .sex zone         |
|   example.com ...  |   |   example.sex ...  |
+---------------+----+   +---------------+----+
                |                        |
                V                        V
+---------------------+             +----------------------+
| example.com zone    |             | example.sex zone     |
|                     |             |                      |
| purity.example.com -+--+      +---+- obscene.example.sex |
| virtue.example.com  |  |      |   |  porn.example.sex    |
|      |              |  |      |   |        |             |
+------+--------------+  |      |   +--------+-------------+
       |                 +------+------+     |
       |          +-------------+      |     |
       V          V                    V     V
   +-----------------+          +------------------+
   |  Virtuous Data  |          |  Salacious Data  |
   +-----------------+          +------------------+

4.1.4 Particular Protocol Difficulties

There are additional considerations related to particular protocols.
We consider only a few here. The first two, electronic mail and the
World Wide Web, use domain name addressing. The second two, net news
and IRC, use different name spaces and illustrate further technical
problems with name based labeling.

4.1.4.1 Electronic Mail (SMTP)

Standard Internet tools provide no way to stop users from putting
arbitrary domain names inside email headers.

The standard Internet electronic mail protocol separates "envelope"
information from content [RFC 2821, 2822]. The envelope information
indicates where a message claims to have originated and to whom it
should be delivered. The content has fields starting with labels
like "From:" and "To:" but these content fields actually have no
effect on delivery and can be arbitrarily forged using simple,
normally available software, such as telnetting to the SMTP port on a
mail server. Content fields are not compared with envelope fields.
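
The separation between envelope and content can be illustrated with
Python's standard email package. This is a minimal sketch, not a mail
client: no message is sent, and every address in it is made up for the
example. It only shows that the content header fields are set
independently of the values that would be used as the envelope.

```python
from email.message import EmailMessage

# Envelope information: what would be given to the SMTP server as the
# sender and recipients of the transaction. Hypothetical addresses.
envelope_from = "sender@mail.example.net"
envelope_to = ["recipient@foo.example.org"]

# Content: header fields inside the message itself. Nothing ties them
# to the envelope, so they can name any domain at all.
msg = EmailMessage()
msg["From"] = "someone@purity.example.com"
msg["To"] = "anyone@other.example"
msg["Subject"] = "Envelope and content are independent"
msg.set_content("The header fields above were chosen freely.")

# Compare the domain in the content "From" field with the envelope.
header_domain = str(msg["From"]).split("@")[1]
envelope_domain = envelope_from.split("@")[1]
print("envelope domain:", envelope_domain)
print("header domain:  ", header_domain)
print("domains match:  ", header_domain == envelope_domain)
```

With Python's smtplib, the envelope values would be passed as the
from_addr and to_addrs arguments of SMTP.sendmail, separately from the
message text, which is why the two sets of addresses need not agree.
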
To require them
to be the same would be like requiring that postal letters deposited
in a mail box list that mail box as their return address and only
allowing residence or business return addresses on mail picked up by
the post office from that residence or business.

While different mail clients display envelope information and headers
from the content of email differently, generally the principal
content fields are given prominence. Thus, while not exactly the
same as content labeling, it should be noted that it is trivial to
send mail to anyone with arbitrary domain names in the email
addresses appearing in the From and To headers, etc.

It is also easy to set up a host to forward mail to an email address
or mailing list. Mail sent with normal mail tools to this forwarder
will automatically have content headers reflecting the forwarder's
name but the forwarder will change the envelope information and cause
the mail to be actually sent to the forwarding destination mail
address.

For example, (with names disguised) there is a social mailing list
innocuous@foo.example.org and someone set up a forwarder at
cat-torturers@other.example. Mail sent to the forwarder is forwarded
and appears on the innocuous mailing list but with a
"To: cat-torturers@other.example" header in its body instead of the
usual "To: innocuous@foo.example.org" content header. Mail reader
software then displays the cat-torturers header. Similar things can
be done using the "bcc" or "blind courtesy copy" feature of Internet
mail.

There is work proceeding on securing email; however, such efforts at
present only allow you to verify whether or not a particular entity
was the actual author of the mail. When providing authentication,
they add yet a third type of "From" address to the envelope and
content "From" addresses.
But they do not relate to controlling or 511 authenticating domain names in the content of the mail. 513 4.1.4.2 Web Access (HTTP) 515 With modern web servers and browsers supporting HTTP 1.1 [RFC 2616], 516 the domain name used to access the site is available. Thus web sites 517 with different domain names can be accessed even if they are on the 518 same machine at the same IP address. This is a small plus for name- 519 based labeling since different categories of information on the same 520 computer can be set up to be accessed via different domain names. But 521 for a computer with any reasonable variety of data, the explosion of 522 trying to differently name all types of data would require an 523 unmanageable number of names. 525 With earlier HTTP 1.0 [RFC 1945], when a web request was sent to a 526 server machine, the original domain name used in the URI was not 527 included. 529 On the other hand, the web has automatic forwarding. Thus, when one 530 tries to access data at a particular domain name, the server there 531 can re-direct your browser, temporarily or permanently, to a 532 different name. Or it can re-direct you to a numeric IP address so as 533 to by-pass name filtering. 535 4.1.4.3 News (NNTP) 537 Net news [RFC 977, 2980] uses hierarchically structured newsgroup names 538 that are similar in appearance to domain names except that the most 539 significant label is on the left and the least on the right, the 540 opposite of domain names. However, while the names are structured 541 hierarchically, there is no central control. Instead, news servers 542 periodically connect to other news servers that have agreed to 543 exchange messages with them and they update each other on messages 544 only in those newsgroups in which they wish to exchange messages. 
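As a small illustration of the difference in label order noted above, the following sketch lists the labels of a dotted name from most to least significant. It is only an assumption-laden example: the function name and the newsgroup name "example.discussion.misc" are hypothetical and do not appear elsewhere in this document.

```python
def labels_by_significance(name, kind):
    """Return the labels of a dotted name, most significant first.

    kind is "domain" (most significant label on the right) or
    "newsgroup" (most significant label on the left).
    """
    labels = name.strip(".").split(".")
    if kind == "domain":
        # Domain names read from least to most significant, so reverse.
        return list(reversed(labels))
    if kind == "newsgroup":
        # Newsgroup names already start with the most significant label.
        return labels
    raise ValueError("unknown kind: " + kind)


# Hypothetical names used purely for illustration.
domain_order = labels_by_significance("purity.example.com", "domain")
news_order = labels_by_significance("example.discussion.misc", "newsgroup")
```

The two name spaces look alike on the page, but software that treats one as the other would read the hierarchy backwards.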
546 Although hierarchical zones in the domain name system are locally 547 managed, they need to be reachable starting at the top level root 548 servers which are in turn more or less controlled by ICANN and the US 549 Department of Commerce. With no such central point or points in the 550 net news world, any pair or larger set of news servers anywhere in 551 the world can agree to exchange news messages under any news group 552 names they like, including duplicates of those used elsewhere in the 553 net, making central control or even influence virtually impossible. 554 In fact, within some parts of the news group namespace on some 555 servers, anyone can create new newsgroups with arbitrary names. 557 Even if news group names could be controlled, the contents of the 558 messages are determined by posters. While some groups are moderated, 559 most are not. "Cancel" messages can be sent out for news messages, 560 but that mechanism is subject to abuse so some servers are configured 561 to ignore cancels. In any case, the message may have been distributed 562 to a huge number of computers world wide before any cancel is sent 563 out. 565 And of course, fitting 300 bits worth of labeling into news group 566 names is just as impossible as it is to fit into domain names. 568 4.1.4.4 Internet Relay Chat (IRC) 570 Internet Relay Chat [RFC 2810-2813] is another example of a service 571 which uses a different name space. It uses a single level space of 572 "channel names" which are meaningful within a particular network of 573 IRC servers. Because it is not hierarchical, each server must know 574 about all names, which limits the size of a network of servers. 576 As with newsgroup names, the fact that IRC channel names are local 577 decisions not subject to or reachable from any global "root" makes 578 centralized political control virtually impossible. 
580 4.2 Content Filtering Using IP Addressing 582 A key characteristic of the Internet Protocol (IP) on which the 583 Internet is based is that it breaks data up into "packets". These 584 packets are individually handled and routed from source to 585 destination. Each packet carries a numeric address for the 586 destination point to which the Internet will try to deliver the 587 packet. 589 (End users do not normally see these numeric addresses but instead 590 deal with "domain names" as described in section 4.1 above.) 592 The predominant numeric address system now in use is called IPv4, or 593 Internet Protocol Version 4, which provides for 32 bit addresses [RFC 594 791]. There is increasing migration to the newer IPv6 [RFC 2460], 595 which provides for 128 bit addresses [RFC 2373, 2374]. 597 Packets can be modified maliciously in transit but the most common 598 result of this is denial of service. 600 One problem in using addressing for content filtering is that this is 601 a very coarse technique. IP addresses refer to network interfaces, 602 which usually correspond to entire computer systems which could house 603 multiple web pages, sets of files, etc., only a small part of which 604 it was desired to block or enable. Increasingly, a single IP address 605 may correspond to a NAT (Network Address Translation) box [RFC 2663] 606 which hides multiple computers behind it, although in that case these 607 computers are usually not servers. 609 However, even beyond this problem of coarse granularity, the 610 practical constraints of hierarchical routing make the allocation of 611 even a single IPv4 address bit or a significant number of IPv6 612 address bits impossible. 614 4.2.1 Hierarchical Routing 616 IP addresses are technically inappropriate for content filtering 617 because their assignment is intimately tied to network routing and 618 topology. 
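The tie between addresses and topology can be seen in a minimal sketch of the longest prefix match rule on which the forwarding described in this section relies. This is an illustration under stated assumptions, not a description of any real router: the routing table, its documentation-range prefixes, and the next hop names are all hypothetical.

```python
import ipaddress

# Hypothetical routing table of (prefix, next hop) pairs. The prefixes
# are documentation addresses and the next hop names are made up.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "default-router"),
    (ipaddress.ip_network("192.0.2.0/24"), "provider-a"),
    (ipaddress.ip_network("192.0.2.128/25"), "customer-site"),
]


def next_hop(destination, routes=ROUTES):
    """Return the next hop of the matching route with the longest prefix."""
    address = ipaddress.ip_address(destination)
    # Keep every route whose prefix covers the destination address.
    matches = [(net, hop) for net, hop in routes if address in net]
    if not matches:
        return None
    # The most specific route is the one with the longest prefix.
    best_net, best_hop = max(matches, key=lambda item: item[0].prefixlen)
    return best_hop


hop_specific = next_hop("192.0.2.200")   # covered by /0, /24 and /25
hop_general = next_hop("192.0.2.7")      # covered by /0 and /24
hop_default = next_hop("198.51.100.1")   # covered only by the default route
```

Because the leading bits of an address select the route, those bits cannot also be given an unrelated meaning such as a content label without adding routes to the tables.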
620 As packets of data flow through the Internet, decisions must be made 621 as to how to forward them "towards" their destination. This is done 622 by comparing the initial bits of the packet destination address to 623 entries in a "routing table" and forwarding the packets as indicated 624 by the table entry with the longest prefix match. 626 While the Internet is actually a mesh, if, for simplicity, we 627 consider it to have a central backbone at the "top", a packet is 628 typically routed as follows: 630 The local networking code looks at its routing table to determine if 631 the packet should be sent directly to another computer on the "local" 632 network, to a router to specially forward it to another nearby 633 network, or routed "up" to a "default" router to forward it to a 634 higher level service provider's network. If the packet's destination 635 is "far enough away" it will eventually get forwarded up to a router 636 on the backbone. Such a router cannot send the packet "up" since it 637 is at the top or "default free" zone and must have a complete table 638 of what other top level router to send the packet to. Currently, 639 such top level routers are very large and expensive devices. They 640 must be able to maintain tables of tens of thousands of routes. When 641 the packet gets to the top level router of the part of the network 642 within which its destination lies, it gets forwarded "down" to 643 successive routers which are more and more specific and local until 644 eventually it gets to a router on the local network where its 645 destination address lies. This local router sends the packet 646 directly to the destination computer. 648 Because all of these routing decisions are made on a longest prefix 649 match basis, it can be seen that IP addresses are not general names 650 or labels but are critically and intimately associated with the 651 actual topology and routing structure of the network. 
If they were 652 assigned at random, routers would be required to remember so many 653 specific routes for specific addresses that it would far exceed the 654 current technical capabilities for router design. The Internet would 655 be fatally disrupted and would not work. 657 It should also be noted that there is some inefficiency in allocation 658 at each level of hierarchy [RFC 1715]. Generally allocations are of 659 a power of two addresses and as requirements grow and/or shrink, it 660 is not practical to use every address. 662 (The above simplified description ignores multi-homing and many other 663 details.) 665 4.2.2 IP Version 4 Addresses 667 There just isn't any practical way to reallocate even one bit of IPv4 668 global Internet Addresses for content filtering use. Such addresses 669 are in short supply. Such an allocation would, in effect, cut the 670 number of available addresses in half. There just aren't enough 671 addresses, even without the inefficiency of hierarchical allocation 672 [RFC 1715] and routing, to do this. Even if there were, current 673 numbers have not been allocated with this in mind so that a 674 renumbering by every organization with hosts on the Internet would be 675 required, a Herculean task costing in the billions of dollars. 677 Even if these problems were overcome, the allocation of even a single 678 bit near the top of the address bits would likely double the number 679 of routes in the default free zone. This would exceed the capacity of 680 current routers and require the upgrade of thousands of them to new 681 routers that do not exist yet, at a further gargantuan cost. The 682 allocation of a bit near the bottom of the address bits would require 683 world-wide local reconfiguration which would be impractical to 684 require or enforce, even if the bit were available. 686 And all this is if only a single bit is allocated to content 687 labeling, let alone more than one. 
And we are assuming you would 688 actually need 300 bits, more than there are! 690 Basically, the idea is a non-starter. 692 4.2.3 IP Version 6 Addresses 694 IPv6 provides 128 bit address fields [RFC 2373, 2374]. Furthermore, 695 allocation of IPv6 addresses is in its infancy. Thus the allocation 696 of, say, one bit of IPv6 address for labeling is conceivable. 698 However, as discussed above (section 4.2.1), every high bit allocated 699 for labeling doubles the cost imposed on the routing system. 700 Allocating one bit would generally double the size of routing tables. 702 Allocating two bits would multiply them by four. Allocating the 300 703 bits we assume necessary for realistic world wide labeling is 704 logically impossible for IPv6, 300 being a lot larger than 128, and 705 even if it were possible, would result in technically unachievable routing table 706 sizes. Even allocating, say, 20 bits, if that were possible, would 707 impossibly multiply table sizes by a million. 709 Allocating low bits also has problems. There are technical proposals 710 that use the bottom 64 bits in a manner incompatible with their use 711 for labels [RFC 2374]. So it would probably have to be "middle bits" 712 (actually low bits of the upper half). As with IPv4, it would be 713 impossible to enforce this world wide. If it were possible, one or 714 two bits could be allocated there, which would be clearly inadequate. 716 4.3 PICS Labels 718 PICS Labels (Platform for Internet Content Selection) is a 719 generalized system for providing "ratings" for Internet accessible 720 material. The PICS documents [PICS] should be consulted for the 721 details. In general, PICS assumes an arbitrarily large number of 722 rating services and rating systems. Each service and system is 723 identified by a URL. 725 It would be quite reasonable to have multiple PICS services that, in 726 the aggregate, provided 300 bits of label information or more. 
There 727 could be a PICS service for every community of interest. This sort 728 of technology is really the only reasonable way to make 729 categorizations or labelings of material available in a diverse and 730 dynamic world. 732 While such PICS label services could be used to distribute government 733 promulgated censorship categories, for example, it is not clear how 734 this is any worse than government censorship via national firewalls. 736 A PICS rating system is essentially a definition of one or more 737 dimensions and the numeric range of the values that can be assigned 738 in each dimension to a rated object. A service is a source of labels 739 where a label includes actual ratings. Ratings are either specific 740 or generic. A specific rating applies only to the material at a 741 particular URL [RFC 2396] and does not cover anything referenced from 742 it, even included image files. A generic rating applies to the URL 743 specified and to all URLs for which the stated URL is a prefix. 745 A simplified example label might look like the following: 747 (PICS-1.1 "http://movie-rating-service.example.net" 748 labels for "ftp://movies.example.sex/raunchy-movie" 749 ratings (sex 6 violence 1 language 8 drugs 2 Satanism 0)) 751 Machine readable rating system descriptions include the range of 752 values and set of dimensions provided. Additional information, such as 753 beginning and ending time of validity, can be incorporated into 754 labels. 756 Labels can currently be made available in three ways: (1) embedded in 757 HTML, (2) provided with data in an HTTP response, and (3) separately 758 from a third party. If content is required to have labels embedded in 759 it or transmitted by the source when data is returned, the first two 760 ways listed above, it raises the problems of categorization 761 granularity and forced speech. 
However, if used in the third way 762 whereby a separate party determines and provides labels for content 763 and users are free to select whatever such third party or parties 764 they wish to consult, it can support a myriad of categories, editors, 765 and evaluators to exist in parallel. 767 Digital signatures are available to secure PICS Labels [PICS]. 769 5. Security Considerations 771 Any labeling or categorization scheme must assume that there will be 772 deliberate attempts to cause data to be incorrectly labeled and 773 incorrectly categorized. This might be due to some perceived 774 advantage of particular labeling or merely to disrupt the system. 775 After all, if sources would always accurately and conveniently label 776 information they send, security would be much easier [RFC 3514]. Such 777 enforceability considerations are discussed in conjunction with the 778 various mechanisms mentioned in this document. 780 6. Conclusions 782 The concept that a single top level domain name, such as .sex, or a 783 single IP address bit, could be allocated and become the mandatory 784 home of "adult" or "offensive" material world wide is legal and 785 technical nonsense. 787 Global agreement on what sort of material should be in such a ghetto 788 is impossible. In the world wide context, the use of a single 789 category or small number of categories is absurd. The implementation 790 of a reasonable size label that could encompass the criteria of the 791 many communities of the world, such as 300 bits, is technically 792 impossible at the domain name or IP address level and will remain so 793 for the foreseeable future. Besides technical impossibility, such a 794 mandate would be an illegal forcing of speech in some jurisdictions 795 and for domain or other character string names faces severe 796 linguistic problems. 
798 However, the concept of a plethora of independent reviewers, some of 799 which might be governmental agencies, and the ability of those 800 accessing information to select and utilize ratings assigned by such 801 reviewers, is possible. 803 References 805 [BT] - "British Telecom comments to U.S. Commerce Department", 806 February 20, 1998, 807 809 [CDA] - "Reno v. American Civil Liberties Union", 117 S.Ct. 2329, 810 June 26, 1997, 811 813 [COPAREPORT] - "Final Report of the COPA Commission to the U.S. 814 Congress", October 20, 2000, 815 817 [GAO] - "GAO Report OGC-00-33R", July 7, 2000, 818 820 [GTLD-MOU] - "GTLD-MOU Policy Oversight committee RFC 97-02", 821 September 13, 1997, 823 [HOUSEREPORT] - "U.S. House Commerce Committee report", 105th 824 Congress, October 5, 1998. 825 827 [ICM-REGISTRY] - "Request for reconsideration from ICM Registry to 828 ICANN", December 15, 2000, 829 832 [LIEBERMAN] - "Testimony of Senator Joe Lieberman before Children's 833 Online Protection Act Commission", June 8, 2000, 834 836 [PICS] - Platform for Internet Content Selection 837 PICS 1.1 Rating Services and Rating Systems -- and Their Machine 838 Readable Descriptions , 839 October 1996 840 PICS 1.1 Label Distribution -- Label Syntax and Communication 841 Protocols , October 1996 842 PICSRules 1.1 Specification , 843 December 1997 844 PICS Signed Labels (DSIG) 1.0 Specification 845 , May 1998 847 [RFC 791] - "Internet Protocol", J. Postel, September 1981. 849 [RFC 977] - "Network News Transfer Protocol: A Proposed Standard for 850 the Stream-Based Transmission of News", B. Kantor, P. Lapsley, 851 February 1986. 853 [RFC 1034] - "Domain Names - Concepts and Facilities", P. 854 Mockapetris, STD 13, November 1987. 856 [RFC 1035] - "Domain Names - Implementation and Specifications", P. 857 Mockapetris, STD 13, November 1987. 859 [RFC 1591] - "Domain Name System Structure and Delegation", J. 860 Postel, March 1994. 
862 [RFC 1715] - "The H Ratio for Address Assignment Efficiency", C. 863 Huitema, November 1994. 865 [RFC 1945] - "Hypertext Transfer Protocol -- HTTP/1.0", T. Berners- 866 Lee, R. Fielding, H. Frystyk, May 1996. 868 [RFC 2373] - "IP Version 6 Addressing Architecture", R. Hinden, S. 869 Deering, July 1998. 871 [RFC 2374] - "An IPv6 Aggregatable Global Unicast Address Format", R. 872 Hinden, M. O'Dell, S. Deering, July 1998. 874 [RFC 2396] - "Uniform Resource Identifiers (URI): Generic Syntax", T. 875 Berners-Lee, R. Fielding, L. Masinter, August 1998. 877 [RFC 2460] - "Internet Protocol, Version 6 (IPv6) Specification", S. 878 Deering and R. Hinden, December 1998. 880 [RFC 2535] - "Domain Name System Security Extensions", D. Eastlake, 881 March 1999. 883 [RFC 2606] - "Reserved Top Level DNS Names", D. Eastlake, A. Panitz, 884 June 1999. 886 [RFC 2616] - "Hypertext Transfer Protocol -- HTTP/1.1", R. Fielding, 887 J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners- 888 Lee, June 1999. 890 [RFC 2663] - "IP Network Address Translator (NAT) Terminology and 891 Considerations", P. Srisuresh, M. Holdrege, August 1999. 893 [RFC 2810] - "Internet Relay Chat: Architecture", C. Kalt, April 894 2000. 896 [RFC 2811] - "Internet Relay Chat: Channel Management", C. Kalt, 897 April 2000. 899 [RFC 2812] - "Internet Relay Chat: Client Protocol", C. Kalt, April 900 2000. 902 [RFC 2813] - "Internet Relay Chat: Server Protocol", C. Kalt, April 903 2000. 905 [RFC 2821] - "Simple Mail Transfer Protocol", J. Klensin, Editor, 906 April 2001. 908 [RFC 2822] - "Internet Message Format", P. Resnick, Editor, April 909 2001. 911 [RFC 2980] - "Common NNTP Extensions", S. Barber, October 2000. 913 [RFC 3514] - "The Security Flag in the IPv4 Header", S. Bellovin, 1 914 April 2003. 916 [WARSHAVSKY] - "Congress weighs Net porn bills," CNET article, 917 February 10, 1998, 919 Authors' Addresses 921 Donald E. 
Eastlake 3rd 922 Motorola Laboratories 923 155 Beaver Street 924 Milford, MA 01757 USA 926 Telephone: +1-508-851-8280 (w) 927 +1-508-634-2066 (h) 928 EMail: Donald.Eastlake@motorola.com 930 Declan McCullagh 932 Telephone: +1-202-986-3455 933 FAX: +1-202-986-3472 934 EMail: Declan.McCullagh@cnet.com 936 Full Copyright Statement 938 Copyright (C) The Internet Society (2003). All Rights Reserved. 940 This document and translations of it may be copied and furnished to 941 others, and derivative works that comment on or otherwise explain it 942 or assist in its implementation may be prepared, copied, published 943 and distributed, in whole or in part, without restriction of any 944 kind, provided that the above copyright notice and this paragraph are 945 included on all such copies and derivative works. However, this 946 document itself may not be modified in any way, such as by removing 947 the copyright notice or references to the Internet Society or other 948 Internet organizations, except as needed for the purpose of 949 developing Internet standards in which case the procedures for 950 copyrights defined in the Internet Standards process must be 951 followed, or as required to translate it into languages other than 952 English. 954 The limited permissions granted above are perpetual and will not be 955 revoked by the Internet Society or its successors or assigns. 957 This document and the information contained herein is provided on an 958 "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING 959 TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING 960 BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION 961 HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF 962 MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 964 Expiration and File Name 966 This draft expires December 2003. 968 Its file name is draft-eastlake-xxx-06.txt.