Internet Research Task Force                                   J. Levine
Internet-Draft                                      Taughannock Networks
Intended status: Experimental                          December 28, 2010
Expires: July 1, 2011

    An efficient method to publish ranges of IP addresses in the DNS
                       draft-levine-iprangepub-01

Abstract

   The DNS has long been used to publish lists of IPv4 address ranges
   in blacklists and whitelists.  The size of the IPv6 address space
   makes the entry-per-IP approach used for IPv4 lists impractical.  A
   new technique for publishing IP address ranges is described.
   It is intended to permit efficient publishing and querying, and to
   have good DNS cache behavior.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   This Internet-Draft will expire on July 1, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Assumptions and Goals
   3.  DNS record format
   4.  Lookup algorithm
   5.  Details of blob representation
   6.  Building and updating DNSBLs
     6.1.  Building static DNSxLs
     6.2.  Building and updating dynamic DNSxLs
   7.  Estimated performance
   8.  Security considerations
   9.  Topics for further consideration
   10. IANA considerations
   11. References
     11.1.  References - Normative
     11.2.  References - Informative
   Appendix A.  Change Log
     A.1.  Changes from -00 to -01
   Author's Address

1. Introduction

   For many years, the Domain Name System [RFC1034] [RFC1035] has been
   the Internet's de facto distributed database.  Blacklists and
   whitelists of IPv4 addresses have been published in the DNS using a
   simple system adapted from rDNS [RFC5782].  A DNSxL (a DNSBL or
   DNSWL) is a DNS sub-tree, typically also a DNS zone, with each
   listed IP having an A and/or TXT record at a name corresponding to
   the IP address.  While this publication method has worked well for
   IPv4 addresses, the size of the IPv6 address space makes an
   analogous approach unworkable.

   In an IPv4 Internet, each network is typically limited to a few
   thousand or at most a few million addresses.  A single host
   typically has a single address, or at most a few hundred addresses.
   The limited size of each network forces a host to use its assigned
   address or addresses.  In IPv6 networks, hosts typically use
   Stateless Address Autoconfiguration [RFC4862] to select an IP
   address, with the low 64 bits of the address being almost entirely
   arbitrary.
   A hostile host sending mail can easily switch to a new address for
   every message it sends, never reusing an address, due to the vast
   size of the IPv6 address space.

   An IPv6 DNS blacklist would use wildcards or a specialized server
   such as rbldnsd [RBLDNSD] to list entire /64 or larger ranges, but
   that does not help DNS performance.  Since wildcards are expanded
   by the DNS server, every query for a unique IP address causes a
   unique query to the DNSBL's server.  Moreover, every unique query
   will take up a cache entry in the client's local DNS cache, either
   a regular entry if the DNSBL lists the query's address, or a
   negative entry [RFC2308] if it doesn't.  In the event that hostile
   mailers (which we will call "spammers" for short) use a unique
   address per message, the normal DNSBL query traffic will both flood
   the DNSBL's server and fill local caches with useless single-use
   entries, forcing out other cached data and causing excess traffic
   to all the other servers the caches query as well.

   For blacklists, an obvious approach would be to limit the
   granularity of DNSBLs, so that, say, each /64 had a separate
   listing, and the queries only used the high 64 bits of each
   address.  While this might limit the damage from DNSBL queries, it
   is not helpful for DNS whitelists, which by their nature list
   individual IP addresses, and are likely to be far more popular with
   IPv6 mail than they have been with IPv4 mail.

   The problem of looking up an address in a sorted list of IP
   addresses stored on a remote DNS server is not unlike the problem
   of searching for a key in a sorted list of keys stored on a disk, a
   problem that is common in database applications.  An approach
   called _B-trees_ has been widely used in databases since the 1970s,
   and is described in standard references such as [KNUTHV3].
   The technique in this document stores ordered sets of IP address
   ranges in DNS records, using a straightforward adaptation of the
   way that B-trees store ordered sets of key strings in disk blocks.

2. Assumptions and Goals

   This design is intended to meet this set of design goals:

   1.  Handle arbitrary mixtures of prefix ranges and individual IPs.

   2.  Work well with existing DNS servers and caches.  The number of
       different queries should be limited, both to limit
       cache->server traffic and the number of cache entries used.  I
       assume that client->cache queries are cheap, while
       cache->server queries are much more expensive.

   3.  Don't assume senders will be well behaved.  In particular, bad
       guys may use a new IPv6 address for every message, hopping
       around within each /64 (or whatever the network size is) in
       which a zombie lives, and there will be many zombies active at
       the same time.

   4.  Don't assume MTAs remember query results from one lookup to the
       next, so the necessary info should be available in the DNS
       cache.  (If they do, it doesn't hurt.)

   5.  Be compatible with DNSSEC, but don't depend on DNSSEC.

   6.  Don't attempt to be compatible with existing DNSBLs.  I expect
       these lists will mostly be served from something like rbldnsd,
       although it would also be possible to have an external program
       create the zone files and serve them from a conventional DNS
       server such as BIND.

3. DNS record format

   Each DNS record is a blob of bytes containing multiple self-
   describing logical entries, each of which is an IP address prefix,
   which might be a single IP if the length of the prefix is 128
   (IPv6) or 32 (IPv4).  In each blob, the prefixes are in address
   order from lowest to highest.  The address ranges of prefixes
   cannot overlap.  The name of each record is the address entry that
   implicitly points to it, which is the entry immediately preceding
   it.
   In this context, there is no advantage to doing rDNS-style nibble-
   reversed naming, so the name is just 32 ASCII characters for an
   IPv6 DNSxL or 8 characters for an IPv4 DNSxL, the address as a hex
   number.  One blob is the tree root, which is named with all zeros,
   such as 00000000000000000000000000000000.dnsxl.example or
   00000000.dnsxl.example.

   The simplest way to represent a blob is as a TXT record, with all
   the strings concatenated.  The more entries that fit in a blob, the
   better this will work, so although it would be possible for each
   logical entry to be a separate string in the TXT record, it doesn't
   pay to do that.

   Each blob contains a set of the prefixes that are in the DNSxL.
   For blobs that are not leaves (identified by a per-blob flag,
   described below), the entries in the blob also partition the
   address space: the prefixes that fall between the entries in the
   current blob are stored in sub-blobs, each named by an entry in the
   current blob.

   To minimize edge cases, the root blob always contains the lowest
   and highest entries.

4. Lookup algorithm

   A client looks up an IP address in the DNSxL as follows:

   1.  Make the root blob the current blob and fetch it.

   2.  Search through the current blob for a prefix entry that
       contains the target IP address.  If you find it, stop; the
       DNSxL contains the IP.

   3.  If the IP address is lower than the first entry in the current
       blob, or higher than the last entry, stop; the DNSxL does not
       contain the IP.

   4.  If this is a leaf blob, stop; the DNSxL does not contain the
       IP.

   5.  Find the entry in the current blob that is just below the IP.
       Use the address in that entry as the name of the new blob,
       fetch that blob, and make it the current one.

   6.  Go to step 2.

   It should be evident that this is analogous to a binary tree
   search, except that each node has a lot more than two descendants.

5. Details of blob representation

   The first byte of each blob is a flag:

      L P P P P P P P

   The L bit is set if this is a leaf.  The PPPPPPP is the implicit
   prefix size.  If all of the addresses in the blob have the same
   initial bits as the name of the blob does, which is fairly likely
   due to the way that IPv6 addresses are allocated, this is the
   number of common initial bits.  The common prefix bits are not
   stored in the blob's entries, but are logically prefixed to each
   address in the blob.

   After that is some number of prefixes:

      X S S S S S S S
      Address

   X is reserved for now.  SSSSSSS is 0-127 (IPv6) or 0-31 (IPv4), the
   prefix size minus one.  The Address field is the prefix size minus
   P bits long, rounded up to a whole byte.  That is, it omits the
   common prefix bits, which are obtained from the name of the blob,
   and the suffix bits beyond the prefix size.

   For example, say an entry is 2001:1234:5678:9ABC::/64, and the
   common prefix size is 16.  Then the entry would be (in hex)

      3f 12 34 56 78 9A BC

   The 3f is the prefix size minus one (64-1), the leading 2001 is
   omitted since that is the common prefix, and the rest is the
   address.

   Each blob is as big as will fit in a DNS answer.  If you don't
   believe in EDNS0, the limit is about 450 bytes.  If you do believe
   in EDNS0, it's whatever size you think clients ask for, probably in
   the 2K to 4K range.

6. Building and updating DNSBLs

   [[ Note: This section is somewhat speculative.  I have experience
   with disk-based B-trees, but haven't implemented any of this yet.
   ]]

   DNSxLs are typically compiled as a list of prefixes and lengths.
   They must be turned into a tree of named blobs before being served
   in the DNS.  The technique varies a little depending on whether the
   tree will be updated incrementally, or rebuilt from scratch if it
   changes.

6.1. Building static DNSxLs

   A static DNSxL should have the minimum number of blobs, each of
   which should be as full as possible.  The technique to build the
   tree is a direct adaptation of the B-tree building technique in
   [WPBTREE].

   Start with a sorted list of prefix entries.  Save one entry for the
   next pass, then take as many entries as possible and make a blob
   out of them.  Repeat saving an entry and creating a blob until all
   entries are used.  These will be the leaf blobs.  Now, take the
   list of saved entries and repeat to create the blobs at the next
   level up.  Keep repeating to create each level of the tree until
   the process creates only one blob.  Insert the one saved entry into
   that blob, which will be the root.  That might make the blob
   overflow, in which case split it in half and move the first,
   middle, and last entries into a new root blob.

   When the list changes, rebuild it from scratch.

6.2. Building and updating dynamic DNSxLs

   One of the reasons that B-trees are so widely used is that it is
   possible to update them efficiently without rebuilding them.  The
   same should apply here.

   The general approach to updates is to add or delete an entry,
   then, if that makes a blob too big or makes it empty, rebalance the
   tree to fix the problem.  If a blob is too big, move entries into
   an adjacent blob if possible, otherwise split the blob.  This will
   require updating the blob above, which in turn might overflow if
   the update involves adding rather than replacing an entry.  (It
   might overflow even with a replacement if it makes the compressible
   prefix shorter.)  In that case, repeat the process, potentially all
   the way to the root.  When deleting an entry, if the blob becomes
   empty, move its pointer entry from the blob above up into one of
   the adjacent blobs, then adjust the upper blob as needed.
   Again, this might cause overflow, in which case move entries
   between blobs or split the full one.

   A tree in which each blob is completely full is quite expensive to
   update.  The first insertion will cause a leaf to overflow, with
   overflows rippling all the way up the tree to the root.  It would
   probably be a good idea when building a list that is intended to be
   updated to leave some slack space in each blob, to limit the ripple
   effect from changes.

   A significant difference between this design and a conventional
   B-tree is the version skew due to DNS caching.  In a normal
   B-tree, the pages (blobs) are locked while being changed, and the
   changes are immediately visible to all the clients.  In this case,
   the clients cache each blob for the DNS TTL.  If updates change
   the entries in non-leaf blobs, they will break the links between
   blobs, since the entries are used as pointers.  A possible
   band-aid is to add temporary CNAME records at the former names
   pointing to the closest new name, so most (admittedly not all) of
   the entries can still be located.  Once the TTL on the old blobs
   has expired, the CNAMEs can be deleted.

7. Estimated performance

   The size of entries varies depending on the length of the prefixes
   and the amount of common prefix compression.  A /64 with no common
   prefix would take 9 bytes, so I'll use 10 bytes as an estimate of
   average entry size.  With EDNS0 and 4K records, that would allow
   400 entries per blob.  A two-level tree could hold 160,000
   entries; a three-level tree could hold 64 million entries, which
   would need 160,000 blobs.  Large v4 DNSBLs like the CBL have about
   seven million entries now, so this should be adequate.  If blobs
   have to fit in 512-byte responses, that would be about 40 entries
   per blob.  A five-level tree could hold 100 million entries in
   about 2.5 million blobs, still adequate.
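   The capacity arithmetic above can be checked with a short sketch.
   The entry size, blob sizes, and fanout are the estimates from the
   preceding paragraph, not values fixed by the record format:

   ```python
   # Rough capacity check for the blob tree, using the draft's
   # estimates: ~10 bytes per entry, 4K blobs with EDNS0, ~450
   # usable bytes without.  These are assumptions for illustration.

   def entries_per_blob(blob_bytes, entry_bytes=10):
       # One leading flag byte per blob; the rest holds entries.
       return (blob_bytes - 1) // entry_bytes

   def tree_capacity(levels, fanout):
       # Each blob holds `fanout` entries, and each non-leaf entry
       # also names one sub-blob, so capacity grows as fanout**levels.
       return fanout ** levels

   print(entries_per_blob(4096))   # about 400 entries per 4K blob
   print(entries_per_blob(450))    # about 40 per 512-byte response
   print(tree_capacity(2, 400))    # two levels: 160,000 entries
   print(tree_capacity(3, 400))    # three levels: 64 million entries
   print(tree_capacity(5, 40))     # five levels: just over 100 million
   ```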
   The number of queries for any particular lookup is the number of
   levels, which is unlikely to be more than five in a DNSxL of
   plausible size.  The cache behavior obviously depends on both the
   layout of the entries and the query pattern, but this design
   avoids some obvious worst cases.  If a /64 is either entirely
   listed, not listed at all, or just has a single /128 listed, all
   queries for addresses in that /64 will refetch the same four or
   five records.  If a large range of addresses is either listed in
   one prefix, or not listed at all, all queries will refetch the
   same set of blobs, which would be likely to be cached.

   The total number of DNS records used is always less than the
   number of records for a traditional entry-per-IP DNSxL for the
   same set of entries.  Since all the DNS queries are made by
   following the tree of entries, clients shouldn't make queries that
   fail, so there will be no negative cache entries.  (This isn't
   quite true due to version skew in updated DNSxLs, but it's hard to
   imagine a plausible scenario in which there would be a lot of
   different failing queries.)  This suggests that the overall cache
   behavior will be no worse than, and quite possibly much better
   than, the behavior of traditional IPv4 DNSxLs.

8. Security considerations

   Semantically, there is little difference between a DNSxL published
   using this scheme and one published using the traditional entry-
   per-IP approach, since both publish the operator's opinion about
   some subset of the IP address space.

   One significant practical difference is that it is much easier for
   clients to obtain copies of all or part of the database.
   For a traditional DNSxL, the only way to determine its contents is
   to query the entire address space (or at least the active part of
   it) one address at a time, which would require several billion
   queries for IPv4, and is deterred by rate limiting the queries.
   In this scheme, the names of all of the DNS records are easy for
   clients to determine, so they can efficiently walk the tree.
   While rate limiting is possible, it is less effective since
   clients fetch more data with each query.  It is also easy for a
   client to fetch all the entries for a particular IP range, such as
   the range of a network the client controls, to see what parts of
   it are blacklisted.

9. Topics for further consideration

   o  Conventional IPv4 DNSBLs generally can return an A record or a
      TXT record.  The A record often has information coded in the
      low byte or two that identifies the type of listing, or which
      of several sub-lists the entry is on.  If it's important to
      return a byte or two of result, it would be straightforward to
      add the byte or two to each entry, at a modest increase in
      entry size and hence some loss in performance.  The TXT records
      are typically either all the same, or there's one per A value,
      perhaps with the IP address interpolated into the string to
      provide a URL to look up.  Those strings could be constructed
      in client libraries, with templates stored in the DNS, e.g.,
      the string for code 42 might be in a TXT record at
      42._strings.dnsxl.example.

   o  There might be better ways to do prefix compression, e.g., a
      per-entry field that says how many bits are the same as the
      previous entry.  Entries in blobs could be bit rather than byte
      aligned, although I expect that would be a lot more work for
      minimal extra compression.  There may be clever tricks to
      allocate entries into blobs to maximize the size of the prefix
      in each blob.
      If a blob consists entirely of /128's, it might be worth a
      special case, leaving out the length byte on each entry.

   o  When adding a new entry to a full leaf blob, another
      possibility would be to make the blob a non-leaf, and create
      two new leaves below it.  This would make updates faster and
      less disruptive, at the cost of possibly slower lookups, since
      some parts of the tree will be deeper.  Perhaps a hybrid
      approach would make sense: rebuild or rebalance the tree when
      it gets too ragged, with more than a one-level depth difference
      between the deepest and shallowest leaves.

10. IANA considerations

   This document makes no requests to IANA.  All data are stored and
   queried using existing DNS record types and operations.

11. References

11.1. References - Normative

   [RFC1034]  Mockapetris, P., "Domain names - concepts and
              facilities", STD 13, RFC 1034, November 1987.

   [RFC1035]  Mockapetris, P., "Domain names - implementation and
              specification", STD 13, RFC 1035, November 1987.

11.2. References - Informative

   [KNUTHV3]  Knuth, D., "The Art of Computer Programming: Volume 3,
              Sorting and Searching", 1998.

   [RBLDNSD]  Tokarev, M., "rbldnsd: Small Daemon for DNSBLs".

   [RFC2308]  Andrews, M., "Negative Caching of DNS Queries (DNS
              NCACHE)", RFC 2308, March 1998.

   [RFC4862]  Thomson, S., Narten, T., and T. Jinmei, "IPv6 Stateless
              Address Autoconfiguration", RFC 4862, September 2007.

   [RFC5782]  Levine, J., "DNS Blacklists and Whitelists", RFC 5782,
              February 2010.

   [WPBTREE]  Wikipedia, "B-tree", December 2010.

Appendix A. Change Log

   *NOTE TO RFC EDITOR: This section may be removed upon publication
   of this document as an RFC.*

A.1. Changes from -00 to -01

   Change CIDRs to prefixes.  Allow for IPv4 addresses.

   Add possible updates producing unbalanced trees.
Author's Address

   John Levine
   Taughannock Networks
   PO Box 727
   Trumansburg, NY  14886

   Phone: +1 831 480 2300
   Email: standards@taugh.com
   URI:   http://jl.ly