Network Working Group                                          A. Newton
Internet-Draft                                            VeriSign, Inc.
Expires: August 10, 2005                                 Y. Shafranovich
                                          SolidMatrix Technologies, Inc.
                                                        February 9, 2005


                     Distributed Black/White Lists
          draft-newton-shafranovich-distributed-blacklists-00

Status of this Memo

   By submitting this Internet-Draft, I certify that any applicable
   patent or other IPR claims of which I am aware have been disclosed,
   and any of which I become aware will be disclosed, in accordance with
   RFC 3668.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.
   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on August 10, 2005.

Copyright Notice

   Copyright (C) The Internet Society (2005).  All Rights Reserved.

Abstract

   Many traditional, centrally-managed blacklists and whitelists
   describe Internet end-points by characteristics such as connectivity
   type or network function, and these characteristics are often used to
   infer behavior from which authorization is derived.  However,
   connectivity type or network function is often unrelated to good or
   bad behavior.  This document describes a means of creating
   blacklists and whitelists representative of Internet end-points
   based on observed behavior by many participants in a distributed
   monitoring network.  The authors hope that distributed lists will
   mitigate some of the problems associated with existing centrally
   managed lists.  While the concept, architecture, and data model are
   general enough to be applied to any type of network service, the
   authors of this document are specifically addressing the problem of
   spam in blogs.

Table of Contents

   1.  Introduction
   2.  Document Terminology
   3.  Motivations
   4.  Architecture
   5.  Data Model
   6.  Formal XML Syntax
   7.  References
     7.1   Normative References
     7.2   Informative References
       Authors' Addresses
       Intellectual Property and Copyright Statements

1.  Introduction

   For years, blacklists have been used as an authorization policy
   mechanism for public network services, mostly email.  These
   centrally-managed blacklists can be categorized into two groups:
   o  lists containing Internet end-points based on certain
      characteristics, such as how they are connected to the Internet
      (e.g. dial-up or residential broadband) or a type of network
      function they may serve (e.g. proxy or relay)
   o  lists containing Internet end-points that have been observed to
      exhibit certain behavior (e.g. sending unsolicited email).

   In addition, a smaller but ever-growing number of whitelists have
   recently been developed and deployed to assist network administrators
   in determining authorization rights for public network services.
   Centrally managed whitelists usually contain positive information
   about Internet end-points that is vouched for by the party that
   administers the list.  In some cases this information is collected by
   the administering party independently of the end-points listed, but
   in many cases the party administering the list charges a fee for
   inclusion, thus essentially operating an accreditation service.

   Some blacklists and whitelists do not necessarily list bad or good
   information, but rather seek to provide reputation information about
   Internet end-points.  Unfortunately, as is the case with blacklists,
   reputation services tend to suffer from many of the same
   accountability problems.

   The purpose of such lists is to eradicate certain undesirable
   side-effects of a highly successful network, usually unsolicited
   email.  However, these lists tend to inhibit universal network
   access, and in many cases this cost outweighs their perceived
   benefits.
   For example:
   o  While it is true that many senders of unsolicited email (spam) use
      dial-up network connections, it is not reasonable to assume that
      all dial-up network connections are used to send spam: the two are
      not inherently related.
   o  Constrained by the need for human verification, many lists
      specializing in observed unwanted behavior tend to mark whole
      networks as bad rather than specific end-points, though there is
      no evidence that every end-point in a network has exhibited
      undesirable behavior.
   o  There is often little guidance available on the criteria used to
      create these lists and seldom useful information on how to correct
      errors in these lists.
   o  In the case of whitelists, a fee charged for accreditation and
      inclusion into a whitelist may inhibit certain Internet users from
      obtaining network access.  For example, individuals and
      non-commercial users, especially ones from poorer countries, may
      not have the resources to pay an admission fee for inclusion into
      a whitelist.  If multiple whitelists become popular, the financial
      burden will greatly decrease accessibility of Internet services
      to those users.

   For these reasons and more, these centrally-managed lists have failed
   both to make an impact on the spam problem and to be universally
   adopted.  This is all too evident given that spam continues to be a
   growing problem, not only in email but increasingly in other network
   services as well.

   This document describes an architecture and data model for
   Distributed Black/White Lists (DxL).  The intent is to leverage a
   peer-to-peer web of trust as opposed to a centrally managed list,
   hopefully providing greater accuracy and clearer accountability.
   The concept, architecture, and data model for DxLs could also be
   applied to other network services.
   However, the authors chose to target the design of DxLs toward a
   relatively new type of web application called blogging.

2.  Document Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [1].

3.  Motivations

   Many of the problems arising in the use of blacklists and whitelists
   stem from the fact that they are centrally managed by a third party
   which may not be accountable to or trusted by a network administrator
   who wishes to use such lists.  List users may also wish to express
   their opinion on specific list entries or entire lists, but due to
   the central nature of these lists that is not currently possible.
   Additionally, many Internet users and network operators already have
   existing relationships in place with others which can be utilized to
   pass along blacklist and whitelist information, instead of
   establishing new ones with the parties administering central lists.

   In the real world, existing relationships and social networks are
   often used to pass along reputation information, and the digital
   world should in theory be no different.  Therefore, in order to step
   around the problem of trusting the party administering the central
   list, we chose to distribute DxL information in a peer-to-peer
   fashion.  This gives users the ability to use their existing
   relationships to establish a web of trust for the purposes of
   authorizing access to public network services (which in this case are
   the ability to leave comments and trackbacks on blog posts and to
   pass referrer information).  We also chose to allow lists to be
   combined and passed on as new lists, thus allowing trust information
   to be propagated via a social network.
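   Combining lists in this way can be illustrated with a minimal,
   non-normative Python sketch.  Beyond what this document specifies,
   it assumes the following: items are keyed by end-point address
   alone; a publisher assigns each component DxL a hypothetical trust
   factor between 0.0 and 1.0 that scales the weights it imports; an
   item the publisher observed itself keeps its own weight and a hop
   count of zero; and when several component DxLs list the same
   end-point, the copy with the fewest hops is kept.  The method values
   ("direct", "intersection", "union") and the hop semantics follow
   Section 5.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Item:
    """Hypothetical in-memory form of a DxL item (not the XML syntax)."""
    address: str            # IPv4 or IPv6 address of the end-point
    weight: float           # -1.0 (black) .. 1.0 (white)
    method: str = "direct"  # "direct", "intersection", or "union"
    hops: int = 0           # times derived from a component DxL


# Assumption: a component DxL is given as (trust, {address: Item}).
Component = Tuple[float, Dict[str, Item]]


def combine(own: Dict[str, Item], components: List[Component]) -> Dict[str, Item]:
    """Build a new DxL from direct observations plus component DxLs."""
    # Directly observed items start as "direct" with zero hops.
    result = {addr: replace(item, method="direct", hops=0)
              for addr, item in own.items()}
    for trust, items in components:
        if not 0.0 <= trust <= 1.0:
            raise ValueError("trust must be between 0.0 and 1.0")
        for addr, item in items.items():
            if addr in own:
                # Seen both directly and in a component DxL: keep our own
                # weight and hop count, but record the intersection.
                result[addr] = replace(result[addr], method="intersection")
                continue
            # Only known through a component DxL: one more hop, and the
            # imported weight is scaled by our trust in that list.
            candidate = Item(addr, item.weight * trust, "union", item.hops + 1)
            current = result.get(addr)
            if current is None or candidate.hops < current.hops:
                result[addr] = candidate
    return result
```

   Under these assumptions, an end-point that is both observed directly
   and listed by a component DxL becomes an "intersection" item with
   zero hops.  An end-point known only from a component DxL that
   observed it directly becomes a "union" item with one hop, and its
   weight is reduced by the trust factor.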
   Additionally, in order to enforce accountability and transparency, we
   chose to require URLs pointing to the original list from which the
   information originates, URLs pointing to a removal page, and
   creation/update dates for all entries.  While these may not be
   checked for validity in all cases, their presence nevertheless
   indicates to list creators and users that these are matters not to
   be ignored.  Additionally, we believe that users will take the
   validity of this information into account when deciding whether to
   trust specific lists.

   In order to allow flexibility for this system, we chose to add
   weights to the list entries indicating the "black" or "white" value.
   Many existing lists provide a binary "yes/no" decision in regard to
   their entries, which may not be flexible enough for all cases.
   Additionally, a weight mechanism allows users to adjust weight
   ratings on lists coming from other users based on their trust level.

   Though this document may be the first formalization of a distributed
   black/white list using XML, the concept of a peer-to-peer style
   distribution of these lists has been seen in <...> and <...>.

4.  Architecture

   Unlike DNS-based blacklists [9] (known as DNSBLs), which operate over
   DNS, a DxL is an XML document that is retrieved over the Internet
   using a protocol such as HTTP.  This is modeled after RSS, which is
   commonly found in the "blogosphere".  Once retrieved, a DxL is cached
   for a period of time and checked for updates upon expiration.  Note
   that this is not the only possible implementation or exchange
   mechanism available for this data.

   A DxL can be composed of entries derived from a private list based on
   direct observation and from other DxLs, known as component DxLs.
   Hence, a DxL propagates data from many sources.

5.  Data Model

   This section describes the data model of a DxL.
   The formal syntax for a DxL is described in Section 6.

   Each DxL has the following attributes:
   o  DxL URI - a URI pointing to the DxL
   o  description - a short, textual description of the DxL
   o  description URI - a URI pointing to a longer description of the
      DxL
   o  expiration date and time
   o  creation date and time
   o  last updated date and time
   Each of these attributes is optional.

   Each item in a DxL describes an observed instance with the following
   trace data:
   o  either an IPv4 or IPv6 address
   o  a protocol identifier: either a domain name or a URI (a domain
      name is RECOMMENDED given that URIs are free to manufacture)
   o  protocol content: domain names, URIs, or regular expressions
      (regex) describing parts of content (domain names are RECOMMENDED)
      -  regular expressions must be typed with one of the following
         identifiers:
         *  Perl - denotes a Perl style regular expression
         *  POSIX-enhanced - denotes a POSIX enhanced style regular
            expression
         *  POSIX-basic - denotes a POSIX basic style regular expression
   o  proxy - a simple note indicating it was possible to detect that
      the end-point served as a protocol-level proxy
   o  user agent
   o  application: text in the form of XXX.YYY where XXX is an
      application name and YYY is a sub-application name - describes the
      application or network service type specific to the trace data.
      These values are defined as:
      *  web.referrer - web-based referrals
      *  blog.comments
      *  blog.trackbacks

   The following are two examples of trace data from observed incidents:
   1.  A comment is left on a blog.  The blog software records the
       comment as coming from 192.0.2.1.  The "URL" field was submitted
       with the URI "http://example.org/foo" and the "comment" field was
       submitted with the text "Buy all your foos at foo.example.org for
       the lowest prices".
       The trace data would consist of the following:
       *  an IPv4 address of 192.0.2.1
       *  a protocol URI of http://example.org/foo
       *  a content domain of foo.example.org or example.org
   2.  An entry is left in a referrer log on a web server.  The entry
       shows the request coming from 192.0.10.1 with a referral URI of
       http://example.com/bar.  The trace data would consist of the
       following:
       *  an IPv4 address of 192.0.10.1
       *  a protocol URI of http://example.com/bar or a protocol domain
          name of example.com

   Each item in a DxL has the following meta-data associated with it:
   o  URI of DxL source - taken directly from the DxL URI of the DxL
      document where the item originated
   o  description
   o  description URI
   o  removal URI - points to a location where instructions may be found
      for removing an item from the source DxL
   o  method - describes what process was used to determine inclusion of
      the item if it originated from a component DxL.  These methods
      are:
      *  intersection - the item was found in a component DxL and by
         direct observation of the publisher of this DxL
      *  union - the item was found in a component DxL and was not
         directly observed by the publisher of this DxL
      *  direct - the item was found only by direct observation
   o  hops - a non-negative integer indicating the number of times the
      item has been derived from a component DxL.  Zero indicates the
      item is in the DxL of the publisher who made the observation.
   o  weight - a value between -1.0 and 1.0 indicating a value judgement
      on the item.  Values less than 0 are considered negative (i.e. a
      blacklisted item) and values greater than 0 are considered
      positive (i.e. a whitelisted item).  Zero is considered neutral.
      If value judgements are simply to be boolean (either positive or
      negative), the values 1.0 and -1.0 SHOULD be used.
   o  expiration date and time
   o  created date and time
   o  last updated date and time

   The following is an example of a DxL document:

      192.0.2.1
      online-poker.com
      www.online-poker.com
      online-poker.com
      http://www.online-poker.com/bogus
      false
      SpamBuddy/1.0
      http://hxr.us/grumpops/dxl.xml
      a persistent spammer
      http://hxr.us/grumpops/dxl?item=abc123
      http://hxr.us/grumpops/dxl-removal?item=abc123
      intersection
      0
      1.0
      2005-01-30T12:00:00Z
      2005-01-20T12:00:00Z
      2005-01-25T12:00:00Z

      ff:ee::00
      http://vegas-hotels.com/
      www.vegas-hotels.com
      visit.vegas-hotels.com
      http://www.vegas-hotels.com/offer
      http://www.vegas-hotels.com/redeem
      true
      SpamBuddy/1.0
      http://shaftek.org/dxl.xml
      a very persistent spammer
      http://shaftek.org/dxl?item=def456
      http://shaftek.org/dxl-removal?item=def456
      intersection
      1
      0.7
      2005-01-31T12:00:00Z
      2005-01-22T12:00:00Z
      2005-01-25T12:00:00Z

6.  Formal XML Syntax

   The following describes the formal XML syntax for DxL instances using
   XML Schema (see [2], [3], [5], and [4]).  Implementors should note
   that this is only a formalization of the syntax for creation of
   interoperable processes and that an XML Schema capable parser is not
   required.

   This formal definition uses the XML Schema 'anyType' in places where
   formal syntax definitions already exist:
   o  the syntax for domains is defined in [8]
   o  the syntax for IPv4 addresses is defined in [7]
   o  the syntax for IPv6 addresses is defined in [6]
   In these cases, the formal syntax defers to the appropriate original
   definition.
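   Because an XML Schema capable parser is not required, an
   implementation might instead check item values directly against the
   definitions that the schema defers to.  The following Python sketch
   is illustrative only.  The function validate_item and its parameters
   are hypothetical, the protocol identifier is assumed to be in its
   domain name form, and only some of the item fields from Section 5
   are checked.  Addresses are parsed with the standard library
   ipaddress module, and domain names are matched against the RFC 1035
   preferred name syntax.

```python
import ipaddress
import re
from typing import List

# RFC 1035, Section 2.3.1 preferred name syntax: a label starts with a
# letter, ends with a letter or digit, contains only letters, digits, and
# hyphens, and is at most 63 characters long.  (RFC 1123 later relaxed
# the first character to allow a digit; this sketch stays with RFC 1035.)
_LABEL = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

METHODS = {"intersection", "union", "direct"}
APPLICATIONS = {"web.referrer", "blog.comments", "blog.trackbacks"}


def is_domain(name: str) -> bool:
    # 253 characters of dotted text fit the 255-octet wire-format limit.
    if not name or len(name) > 253:
        return False
    return all(_LABEL.fullmatch(label) for label in name.split("."))


def is_address(text: str) -> bool:
    # Accepts a textual IPv4 (RFC 791) or IPv6 (RFC 3513) address.
    try:
        ipaddress.ip_address(text)
    except ValueError:
        return False
    return True


def validate_item(address: str, protocol_domain: str, application: str,
                  method: str, hops: int, weight: float) -> List[str]:
    """Return the names of the fields that fail the checks (empty if valid)."""
    errors = []
    if not is_address(address):
        errors.append("address")
    if not is_domain(protocol_domain):
        errors.append("protocol domain")
    if application not in APPLICATIONS:
        errors.append("application")
    if method not in METHODS:
        errors.append("method")
    if isinstance(hops, bool) or not isinstance(hops, int) or hops < 0:
        errors.append("hops")
    if not -1.0 <= weight <= 1.0:
        errors.append("weight")
    return errors
```

   For example, validate_item("192.0.2.1", "online-poker.com",
   "blog.comments", "intersection", 0, 1.0) returns an empty list.
   The same call with a weight of 1.5 returns ["weight"].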
      A schema for describing
      distributed black/white lists (DxL)

      as defined by RFC 0791
      as defined by RFC 3513
      as defined by RFC 1035
      as defined by RFC 1035

7.  References

7.1  Normative References

   [1]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", RFC 2119, BCP 14, March 1997.

   [2]  World Wide Web Consortium, "Extensible Markup Language (XML)
        1.0", W3C XML, February 1998.

   [3]  World Wide Web Consortium, "Namespaces in XML", W3C XML
        Namespaces, January 1999.

   [4]  World Wide Web Consortium, "XML Schema Part 2: Datatypes", W3C
        XML Schema, October 2000.

   [5]  World Wide Web Consortium, "XML Schema Part 1: Structures", W3C
        XML Schema, October 2000.

   [6]  Hinden, R. and S. Deering, "Internet Protocol Version 6 (IPv6)
        Addressing Architecture", RFC 3513, April 2003.

   [7]  Postel, J., "Internet Protocol", STD 5, RFC 791, September 1981.

   [8]  Mockapetris, P., "Domain names - implementation and
        specification", STD 13, RFC 1035, November 1987.

7.2  Informative References

   [9]  Levine, J., "DNS Based Blacklists and Whitelists for E-Mail",
        draft-irtf-asrg-dnsbl-01.txt (work in progress), November 2004.

Authors' Addresses

   Andrew L. Newton
   VeriSign, Inc.
   21345 Ridgetop Circle
   Sterling, VA  20166
   USA

   Phone: +1 703 948 3382
   EMail: anewton@verisignlabs.com; andy@hxr.us
   URI:   http://www.verisignlabs.com/

   Yakov Shafranovich
   SolidMatrix Technologies, Inc.
   EMail: YakovS@solidmatrix.com; ietf@shaftek.org
   URI:   http://www.shaftek.org/

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The Internet Society (2005).
   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.