Network Working Group                                          A. Newton
Internet-Draft                                            VeriSign, Inc.
Expires: August 10, 2005                                Y. Shafranovich
                                         SolidMatrix Technologies, Inc.
                                                        February 9, 2005


                     Distributed Black/White Lists
          draft-newton-shafranovich-distributed-blacklists-00

Status of this Memo

   By submitting this Internet-Draft, I certify that any applicable
   patent or other IPR claims of which I am aware have been disclosed,
   and any of which I become aware will be disclosed, in accordance with
   RFC 3668.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on August 10, 2005.

Copyright Notice

   Copyright (C) The Internet Society (2005). All Rights Reserved.

Abstract

   Many traditional, centrally-managed blacklists and whitelists
   describe Internet end-points by characteristics such as connectivity
   type or network function, and these characteristics are often used to
   infer behavior from which authorization is derived. However,
   connectivity type or network function is often unrelated to good or
   bad behavior. This document describes a means of creating blacklists
   and whitelists of Internet end-points based on behavior observed by
   many participants in a distributed monitoring network. The authors
   hope that distributed lists will mitigate some of the problems
   associated with existing centrally managed lists. While the concept,
   architecture, and data model are general enough to be applied to any
   type of network service, the authors of this document are
   specifically addressing the problem of spam in blogs.

Table of Contents

   1.  Introduction
   2.  Document Terminology
   3.  Motivations
   4.  Architecture
   5.  Data Model
   6.  Formal XML Syntax
   7.  References
     7.1   Normative References
     7.2   Informative References
       Authors' Addresses
       Intellectual Property and Copyright Statements

1.  Introduction

   For years, blacklists have been used as an authorization policy
   mechanism for public network services, mostly email. These
   centrally-managed blacklists can be categorized into two groups:
   o  lists containing Internet end-points based on certain
      characteristics, such as how they are connected to the Internet
      (e.g. dial-up or residential broadband) or a type of network
      function they may serve (e.g. proxy or relay);
   o  lists containing Internet end-points that have been observed to
      exhibit certain behavior (e.g. sending unsolicited email).

   Additionally, a small but ever-growing number of whitelists have
   recently been developed and deployed to assist network
   administrators in determining authorization rights for public network
   services. Centrally managed whitelists usually contain positive
   information about Internet end-points that is vouched for by the
   party that administers the list. In some cases this information is
   collected by the administering party independently of the end-points
   listed, but in many cases the party administering the list charges a
   fee for inclusion, thus essentially operating an accreditation
   service.

   Some blacklists and whitelists do not necessarily list bad or good
   information, but rather seek to provide reputation information about
   Internet end-points. Unfortunately, such reputation services tend to
   suffer from many of the same accountability problems as blacklists.

   The purpose of such lists is to eradicate certain undesirable
   side-effects of a highly successful network, usually unsolicited
   email. However, these lists have a great tendency to inhibit
   universal network access, in many cases outweighing their perceived
   benefits. For example:
   o  While it is true that many senders of unsolicited email (spam) use
      dial-up network connections, it is not reasonable to assume that
      all dial-up network connections are used to send spam: the two
      are not inherently related.
   o  Constrained by the need for human verification, many lists
      specializing in observed unwanted behavior tend to mark whole
      networks as bad rather than specific end-points, even though there
      is no evidence that every end-point in a network has exhibited
      undesirable behavior.
   o  There is often little guidance available on the criteria used to
      create these lists and seldom useful information on how to correct
      errors in these lists.
   o  In the case of whitelists, a fee charged for accreditation and
      inclusion in a whitelist may prevent certain Internet users from
      obtaining network access. For example, individuals and
      non-commercial users, especially ones from poorer countries, may
      not have the resources to pay an admission fee for inclusion in a
      whitelist. If multiple whitelists become popular, the financial
      burden will greatly decrease the accessibility of Internet
      services to those users.
   For these reasons and more, these centrally-managed lists have
   neither made a significant impact on the spam problem nor been
   universally adopted. This is all too evident given that spam
   continues to be a growing problem in email and is slowly spreading
   to other network services as well.

   This document describes an architecture and data model for
   Distributed Black/White Lists (DxL). The intent is to leverage a
   peer-to-peer web of trust as opposed to a centrally managed list,
   hopefully providing greater accuracy and clearer accountability.
   Although the concept, architecture, and data model for DxLs could be
   applied to other network services, the authors chose to target the
   design of DxLs toward a relatively new type of web application called
   blogging.

2.  Document Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [1].

3.  Motivations

   Many of the problems arising in the use of blacklists and whitelists
   stem from the fact that they are centrally managed by a third party
   which may not be accountable to, or trusted by, a network
   administrator who wishes to use such lists. List users may also wish
   to express their opinion on specific list entries or entire lists,
   but due to the central nature of these lists that is not currently
   possible. Additionally, many Internet users and network operators
   already have relationships in place with others which can be used to
   pass along blacklist and whitelist information, instead of
   establishing new ones with the parties administering central lists.

   In the real world, existing relationships and social networks are
   often used to pass along reputation information, and the digital
   world should in theory be no different. Therefore, in order to step
   around the problem of trusting the party administering the central
   list, we chose to distribute DxL information in a peer-to-peer
   fashion. This gives users the ability to use their existing
   relationships to establish a web of trust for the purposes of
   authorizing access to public network services (which in this case
   are the ability to leave comments and trackbacks on blog posts, and
   to pass referrer information). We also chose to allow lists to be
   combined and passed on as new lists, thus allowing trust information
   to be propagated via a social network.

   Additionally, in order to enforce accountability and transparency,
   we chose to require URLs pointing to the original list from which
   the information originates, URLs pointing to a removal page, and
   creation/update dates for all entries. While these may not be
   checked for validity in all cases, their presence nevertheless
   indicates to list creators and users that these matters are not to
   be ignored. We also believe that users will take the validity of
   this information into account when deciding whether to trust
   specific lists.

   In order to make this system flexible, we chose to add weights to
   the list entries indicating the "black" or "white" value. Many
   existing lists provide a binary "yes/no" decision regarding their
   entries, which may not be flexible enough for all cases. A weight
   mechanism also allows users to adjust the weight ratings on lists
   coming from other users based on their trust level.

   Though this document may be the first formalization of a distributed
   black/white list using XML, the concept of a peer-to-peer style
   distribution of these lists has been seen in <...> and <...>.

4.  Architecture

   Unlike DNS-based blacklists [9] (known as DNSBLs), which operate over
   DNS, a DxL is an XML document and is retrieved over the Internet
   using a protocol such as HTTP. This is modelled after RSS, which is
   commonly found in the "blogosphere". Once retrieved, a DxL is cached
   for a period of time and checked for updates upon expiration. Note
   that this is not the only possible implementation or exchange
   mechanism for this data.
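
   The caching behavior described above can be sketched in Python as a
   small client-side cache keyed by DxL URI. This is a minimal sketch,
   not part of this specification. The class name DxLCache, the
   injected fetch function (which would wrap an HTTP GET and read the
   DxL's expiration date and time), and the fallback default_ttl used
   when a DxL carries no expiration are all hypothetical. The fallback
   is needed because the expiration attribute is optional.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical fetcher: uri -> (document text, expiration in epoch seconds or None).
Fetcher = Callable[[str], Tuple[str, Optional[float]]]


class DxLCache:
    """Hypothetical client-side cache: a DxL is refetched only after it expires."""

    def __init__(self, fetch: Fetcher, default_ttl: float = 3600.0,
                 clock: Callable[[], float] = time.time):
        self._fetch = fetch              # e.g. an HTTP GET plus DxL parsing
        self._default_ttl = default_ttl  # assumed fallback lifetime, in seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, uri: str) -> str:
        now = self._clock()
        cached = self._entries.get(uri)
        if cached is not None and now < cached[1]:
            return cached[0]             # still fresh: no network access
        text, expires = self._fetch(uri)
        if expires is None:              # the DxL expiration attribute is optional
            expires = now + self._default_ttl
        self._entries[uri] = (text, expires)
        return text
```

   The fetch function and clock are passed in so that the sketch can be
   exercised without network access. A second get() call made before
   expiration returns the cached text without calling fetch again.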

   A DxL can be composed of entries derived from a private list based on
   direct observation and from other DxLs, known as component DxLs.
   Hence, a DxL propagates data from many sources.
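
   Composition can be sketched in Python using the "method" and "hops"
   meta-data defined in Section 5. This is a minimal sketch under
   several stated assumptions, none of which this document specifies:

   o  items are keyed only by address;
   o  a publisher's trust in a component DxL is a factor between 0.0
      and 1.0 that multiplies that component's weights;
   o  an intersection item averages the two weights and keeps hops 0;
   o  intersection and direct items are labelled with the publisher's
      own DxL URI.

   The names Item and compose are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Item:
    """Hypothetical, simplified DxL item; trace data reduced to an address."""
    address: str
    weight: float            # -1.0 (black) .. 1.0 (white)
    method: str = "direct"   # "direct", "union", or "intersection"
    hops: int = 0
    source: str = ""         # DxL URI where the item originated


def compose(own_uri: str, observed: Dict[str, float],
            components: List[Tuple[float, List[Item]]]) -> Dict[str, Item]:
    """Merge this publisher's direct observations with component DxLs.

    observed:   address -> weight from the publisher's private list
    components: (trust, items) pairs; trust in 0.0..1.0 scales the weights
    """
    merged: Dict[str, Item] = {}
    for trust, items in components:
        for item in items:
            scaled = item.weight * trust
            if item.address in observed:
                # In a component DxL and observed directly: intersection.
                # Assumptions: average the two weights; hops stays 0 because
                # this publisher made the observation itself.
                weight = (observed[item.address] + scaled) / 2
                candidate = Item(item.address, weight, "intersection", 0, own_uri)
            else:
                # Only known from the component: union, one hop further away,
                # keeping the component's source URI.
                candidate = replace(item, weight=scaled, method="union",
                                    hops=item.hops + 1)
            # Assumption: when several components list an address, keep the
            # candidate with the fewest hops (the first one on ties).
            current = merged.get(item.address)
            if current is None or candidate.hops < current.hops:
                merged[item.address] = candidate
    # Addresses seen only by this publisher are direct observations.
    for address, weight in observed.items():
        merged.setdefault(address, Item(address, weight, "direct", 0, own_uri))
    return merged
```

   Averaging is only one way to combine an intersection item's weights.
   A publisher could instead keep its own weight or take the stronger of
   the two.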

5.  Data Model

   This section describes the data model of a DxL. The formal syntax
   for a DxL is described in Section 6.

   Each DxL has the following attributes:
   o  DxL URI - a URI pointing to the DxL
   o  description - a short, textual description of the DxL
   o  description URI - a URI pointing to a longer description of the
      DxL
   o  expiration date and time
   o  creation date and time
   o  last updated date and time
   Each of these attributes is optional.

   Each item in a DxL describes an observed instance with the following
   trace data:
   o  either an IPv4 or IPv6 address
   o  a protocol identifier: either a domain name or a URI (a domain
      name is RECOMMENDED given that URIs are free to manufacture)
   o  protocol content: domain names, URIs, or regular expressions
      (regex) describing parts of content (domain names are RECOMMENDED)
      -  regular expressions must be typed with one of the following
         identifiers:
         *  Perl - denotes a Perl style regular expression
         *  POSIX-enhanced - denotes a POSIX enhanced style regular
            expression
         *  POSIX-basic - denotes a POSIX basic style regular expression
   o  proxy - a simple note indicating it was possible to detect that
      the end-point served as a protocol-level proxy
   o  user agent
   o  application: text in the form of XXX.YYY where XXX is an
      application name and YYY is a sub-application name - describes the
      application or network service type specific to the trace data.
      These values are defined as:
      *  web.referrer - web-based referrals
      *  blog.comments
      *  blog.trackbacks
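
   The trace data listed above can be held in a validating Python
   record. This is a minimal sketch: the class and field names are
   hypothetical. It uses the standard ipaddress module to accept either
   address family, and it checks only the application and
   regular-expression type identifiers enumerated in this section.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List, Tuple

# Identifiers enumerated in this section.
APPLICATIONS = {"web.referrer", "blog.comments", "blog.trackbacks"}
REGEX_TYPES = {"Perl", "POSIX-enhanced", "POSIX-basic"}


@dataclass
class TraceData:
    """Hypothetical in-memory form of one DxL item's trace data."""
    address: str                 # IPv4 or IPv6 address
    protocol: str                # domain name (RECOMMENDED) or URI
    application: str             # e.g. "blog.comments"
    content: List[str] = field(default_factory=list)               # domains or URIs
    regexes: List[Tuple[str, str]] = field(default_factory=list)   # (type, pattern)
    proxy: bool = False
    user_agent: str = ""

    def __post_init__(self) -> None:
        # ip_address() accepts IPv4 or IPv6 text and raises ValueError
        # otherwise; storing str() of the result normalizes the text form.
        self.address = str(ipaddress.ip_address(self.address))
        if self.application not in APPLICATIONS:
            raise ValueError(f"unknown application: {self.application}")
        for kind, _pattern in self.regexes:
            if kind not in REGEX_TYPES:
                raise ValueError(f"unknown regular expression type: {kind}")
```

   The sketch does not compile the patterns. Python's re module is
   Perl-like but not identical to Perl, and the standard library
   provides no POSIX regular expression engine.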

   The following are two examples of trace data from observed
   incidents:
   1.  A comment is left on a blog. The blog software records the
       comment as coming from 192.0.2.1. The "URL" field was submitted
       with the URI "http://example.org/foo" and the "comment" field was
       submitted with the text "Buy all your foos at foo.example.org for
       the lowest prices". The trace data would consist of the
       following:
       *  an IPv4 address of 192.0.2.1
       *  a protocol URI of http://example.org/foo
       *  a content domain of foo.example.org or example.org
   2.  An entry is left in a referrer log on a web server. The entry
       shows the request coming from 198.51.100.1 with a referral URI of
       http://example.com/bar. The trace data would consist of the
       following:
       *  an IPv4 address of 198.51.100.1
       *  a protocol URI of http://example.com/bar or a protocol domain
          name of example.com
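
   The first example can be sketched in Python as a function that turns
   a submitted blog comment into trace data. This is a minimal sketch:
   the function names and dictionary keys are hypothetical, and the
   host-name pattern is a deliberately loose assumption.
   naive_parent() takes only the last two labels. It reproduces the
   "foo.example.org or example.org" choice in the example, but it is
   wrong for names such as example.co.uk.

```python
import re
from urllib.parse import urlsplit

# Hypothetical, deliberately loose pattern for host names embedded in text.
DOMAIN_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}\b", re.IGNORECASE)


def naive_parent(domain: str) -> str:
    """Last two labels only; wrong for names such as example.co.uk."""
    return ".".join(domain.split(".")[-2:])


def trace_from_comment(address: str, url_field: str, comment: str) -> dict:
    """Collect trace data for a blog comment, as in the first example."""
    domains = [d.lower() for d in DOMAIN_RE.findall(comment)]
    return {
        "address": address,
        "protocol_uri": url_field,
        # Alternative protocol identifier: the host of the URL field.
        "protocol_domain": urlsplit(url_field).hostname,
        "content_domains": domains,
        # Broader alternative, e.g. example.org for foo.example.org.
        "content_parents": sorted({naive_parent(d) for d in domains}),
        "application": "blog.comments",
    }
```

   Applied to the first example, the content domains are
   ["foo.example.org"] and the broader alternative is ["example.org"].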

   Each item in a DxL has the following meta-data associated with it:
   o  URI of DxL source - taken directly from the DxL URI of the DxL
      document where the item originated
   o  description
   o  description URI
   o  removal URI - points to a location where instructions may be found
      for removing an item from the source DxL
   o  method - describes what process was used to determine inclusion of
      the item if it originated from a component DxL. These methods
      are:
      *  intersection - the item was found in a component DxL and by
         direct observation by the publisher of this DxL
      *  union - the item was found in a component DxL and was not
         directly observed by the publisher of this DxL
      *  direct - the item was found only by direct observation
   o  hops - a non-negative integer indicating the number of times the
      item has been derived from a component DxL. Zero indicates the
      item is in the DxL of the publisher who made the observation.
   o  weight - a value between -1.0 and 1.0 indicating a value judgement
      on the item. Values less than 0 are considered negative (i.e. a
      blacklisted item) and values greater than 0 are considered
      positive (i.e. a whitelisted item). Zero is considered neutral.
      If value judgements are simply to be boolean (either positive or
      negative), the values 1.0 and -1.0 SHOULD be used.
   o  expiration date and time
   o  created date and time
   o  last updated date and time
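
   The weight rules above, together with an expiration check, can be
   sketched in Python. This is a minimal sketch: the function names are
   hypothetical. is_expired() assumes the UTC timestamp form used in
   this document's example (e.g. 2005-01-30T12:00:00Z). It does not
   handle the time-zone offsets or fractional seconds that an XML
   Schema dateTime may also carry.

```python
from datetime import datetime, timezone


def classify(weight: float) -> str:
    """Map an item weight to the value judgement described above."""
    if not -1.0 <= weight <= 1.0:   # NaN also fails this range check
        raise ValueError(f"weight out of range: {weight!r}")
    if weight < 0:
        return "blacklisted"
    if weight > 0:
        return "whitelisted"
    return "neutral"


def boolean_weight(positive: bool) -> float:
    """Boolean judgements SHOULD use exactly 1.0 and -1.0."""
    return 1.0 if positive else -1.0


def is_expired(expiration: str, now: datetime) -> bool:
    """Assumes UTC text like 2005-01-30T12:00:00Z; `now` must be timezone-aware."""
    expires = datetime.strptime(expiration, "%Y-%m-%dT%H:%M:%SZ")
    return now >= expires.replace(tzinfo=timezone.utc)
```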

   The following shows the values carried by the two items of an
   example DxL document:

   item 1
      trace data:
         IPv4 address:    192.0.2.1
         protocol domain: online-poker.com
         content:         www.online-poker.com
                          online-poker.com
                          http://www.online-poker.com/bogus
         proxy:           false
         user agent:      SpamBuddy/1.0
      meta-data:
         DxL source URI:  http://hxr.us/grumpops/dxl.xml
         description:     a persistent spammer
         description URI: http://hxr.us/grumpops/dxl?item=abc123
         removal URI:     http://hxr.us/grumpops/dxl-removal?item=abc123
         method:          intersection
         hops:            0
         weight:          -1.0
         expiration:      2005-01-30T12:00:00Z
         created:         2005-01-20T12:00:00Z
         last updated:    2005-01-25T12:00:00Z

   item 2
      trace data:
         IPv6 address:    ff:ee::00
         protocol URI:    http://vegas-hotels.com/
         content:         www.vegas-hotels.com
                          visit.vegas-hotels.com
                          http://www.vegas-hotels.com/offer
                          http://www.vegas-hotels.com/redeem
         proxy:           true
         user agent:      SpamBuddy/1.0
      meta-data:
         DxL source URI:  http://shaftek.org/dxl.xml
         description:     a very persistent spammer
         description URI: http://shaftek.org/dxl?item=def456
         removal URI:     http://shaftek.org/dxl-removal?item=def456
         method:          intersection
         hops:            1
         weight:          -0.7
         expiration:      2005-01-31T12:00:00Z
         created:         2005-01-22T12:00:00Z
         last updated:    2005-01-25T12:00:00Z

6.  Formal XML Syntax

   The following describes the formal XML syntax for DxL instances
   using XML Schema (see [2], [3], [5], and [4]). Implementors should
   note that this is only a formalization of the syntax for the creation
   of interoperable processes and that an XML Schema-capable parser is
   not required.

   This formal definition uses the XML Schema 'anyType' in places where
   formal syntax definitions already exist:
   o  the syntax for domains is defined in [8]
   o  the syntax for IPv4 addresses is defined in [7]
   o  the syntax for IPv6 addresses is defined in [6]
   In these cases, the formal syntax defers to the appropriate original
   definition.
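
   Because the schema defers to external definitions for these three
   syntaxes, a processor that does not use an XML Schema-capable parser
   would need to check them itself. The following minimal Python sketch
   shows one way to do so; the function names are hypothetical.

   o  Addresses are checked with the standard ipaddress module. This is
      an assumption about which textual forms are acceptable.
   o  Domain names follow the RFC 1035 preferred name syntax. RFC 1123
      later relaxed that syntax to allow labels beginning with a digit.

```python
import ipaddress
import re

# RFC 1035 "preferred name syntax" label: starts with a letter, ends with a
# letter or digit, letters/digits/hyphens in between, at most 63 characters.
LABEL_RE = re.compile(r"[A-Za-z](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ipv4(text: str) -> bool:
    try:
        return isinstance(ipaddress.ip_address(text), ipaddress.IPv4Address)
    except ValueError:
        return False


def is_ipv6(text: str) -> bool:
    try:
        return isinstance(ipaddress.ip_address(text), ipaddress.IPv6Address)
    except ValueError:
        return False


def is_domain(text: str) -> bool:
    name = text[:-1] if text.endswith(".") else text   # allow a trailing root dot
    # The 255-octet wire-format limit corresponds to 253 characters of dotted text.
    if not name or len(name) > 253:
        return False
    return all(LABEL_RE.fullmatch(label) for label in name.split("."))
```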

   [XML Schema listing not recoverable from this copy. Surviving
   annotations: "A schema for describing distributed black/white lists
   (DxL)"; "as defined by RFC 0791"; "as defined by RFC 3513"; "as
   defined by RFC 1035" (twice).]

7.  References

7.1  Normative References

   [1]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", RFC 2119, BCP 14, March 1997.

   [2]  World Wide Web Consortium, "Extensible Markup Language (XML)
        1.0", W3C XML, February 1998, <...>.

   [3]  World Wide Web Consortium, "Namespaces in XML", W3C XML
        Namespaces, January 1999, <...>.

   [4]  World Wide Web Consortium, "XML Schema Part 2: Datatypes", W3C
        XML Schema, October 2000, <...>.

   [5]  World Wide Web Consortium, "XML Schema Part 1: Structures", W3C
        XML Schema, October 2000, <...>.

   [6]  Hinden, R. and S. Deering, "Internet Protocol Version 6 (IPv6)
        Addressing Architecture", RFC 3513, April 2003.

   [7]  Postel, J., "Internet Protocol", STD 5, RFC 791, September 1981.

   [8]  Mockapetris, P., "Domain names - implementation and
        specification", STD 13, RFC 1035, November 1987.

7.2  Informative References

   [9]  Levine, J., "DNS Based Blacklists and Whitelists for E-Mail",
        draft-irtf-asrg-dnsbl-01.txt (work in progress), November 2004.

Authors' Addresses

   Andrew L. Newton
   VeriSign, Inc.
   21345 Ridgetop Circle
   Sterling, VA  20166
   USA

   Phone: +1 703 948 3382
   EMail: anewton@verisignlabs.com; andy@hxr.us
   URI:   http://www.verisignlabs.com/


   Yakov Shafranovich
   SolidMatrix Technologies, Inc.

   EMail: YakovS@solidmatrix.com; ietf@shaftek.org
   URI:   http://www.shaftek.org/

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights. Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard. Please address the information to the IETF at
   ietf-ipr@ietf.org.


Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Copyright Statement

   Copyright (C) The Internet Society (2005). This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.


Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.