Internet Research Task Force                                   J. Levine
Internet-Draft                                      Taughannock Networks
Intended status: Experimental                          December 21, 2010
Expires: June 24, 2011

    An efficient method to publish ranges of IP addresses in the DNS
                      draft-levine-iprangepub-00

Abstract

   The DNS has long been used to publish lists of IPv4 address ranges
   in blacklists and whitelists.  The size of the IPv6 address space
   makes the entry-per-IP approach used for IPv4 lists impractical.  A
   new technique for publishing IPv6 address ranges is described.  It
   is intended to permit efficient publishing and querying, and to
   have good DNS cache behavior.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   This Internet-Draft will expire on June 24, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Assumptions and Goals
   3.  DNS record format
   4.  Lookup algorithm
   5.  Details of blob representation
   6.  Building and updating DNSBLs
     6.1.  Building static DNSxLs
     6.2.  Building and updating dynamic DNSxLs
   7.  Estimated performance
   8.  Security considerations
   9.  Topics for further consideration
   10. IANA considerations
   11. References
     11.1.  References - Normative
     11.2.  References - Informative
   Author's Address

1.  Introduction

   For many years, the Domain Name System [RFC1034][RFC1035] has been
   the Internet's de facto distributed database.  Blacklists and
   whitelists of IPv4 addresses have been published in the DNS using a
   simple system adapted from rDNS [RFC5782].  A DNSxL (a DNSBL or
   DNSWL) is a DNS sub-tree, typically also a DNS zone, with each
   listed IP having an A and/or TXT record at a name corresponding to
   the IP address.  While this publication method has worked well for
   IPv4 addresses, the size of the IPv6 address space makes an
   analogous approach unworkable.

   In the IPv4 Internet, each network is typically limited to a few
   thousand or at most a few million addresses, and a single host
   typically has a single address, or at most a few hundred.  The
   limited size of each network forces a host to use its assigned
   address or addresses.  In IPv6 networks, hosts typically use
   Stateless Address Autoconfiguration [RFC4862] to select an IP
   address, with the low 64 bits of the address being almost entirely
   arbitrary.  Because the IPv6 address space is so vast, a hostile
   host sending mail can easily switch to a new address for every
   message it sends, never reusing an address.

   An IPv6 DNS blacklist could use wildcards or a specialized server
   such as rbldnsd [RBLDNSD] to list entire /64 or larger ranges, but
   that does not help DNS performance.  Since wildcards are expanded
   by the DNS server, every query for a unique IP address causes a
   unique query to the DNSBL's server.  Moreover, every unique query
   takes up an entry in the client's local DNS cache: a regular entry
   if the DNSBL lists the query's address, or a negative entry
   [RFC2308] if it doesn't.  If hostile mailers (which we will call
   "spammers" for short) use a unique address per message, normal
   DNSBL query traffic will both flood the DNSBL's server and fill
   local caches with useless single-use entries, forcing out other
   cached data and causing excess traffic to all the other servers
   the caches query as well.

   For blacklists, an obvious approach would be to limit the
   granularity of DNSBLs so that, say, each /64 had a separate
   listing and queries used only the high 64 bits of each address.
   While this might limit the damage from DNSBL queries, it does not
   help DNS whitelists, which by their nature list individual IP
   addresses, and which are likely to be far more popular with IPv6
   mail than they have been with IPv4 mail.

   The problem of looking up an address in a sorted list of IP
   addresses stored on a remote DNS server is not unlike the problem
   of searching for a key in a sorted list of keys stored on a disk,
   a problem that is common in database applications.  B-trees have
   been widely used in databases since the 1970s and are described in
   standard references such as [KNUTHV3].  The technique in this
   document stores ordered sets of IP address ranges in DNS records,
   using a straightforward adaptation of the way that B-trees store
   ordered sets of key strings in disk blocks.
2.  Assumptions and Goals

   This design is intended to meet the following goals:

   1.  Handle arbitrary mixtures of CIDR ranges and individual IPs.

   2.  Work well with existing DNS servers and caches.  The number of
       distinct queries should be small, both to limit cache->server
       traffic and to limit the number of cache entries the list
       consumes.  I assume that client->cache queries are cheap,
       while cache->server queries are much more expensive.

   3.  Don't assume senders will be well behaved.  In particular, bad
       guys may use a new IP for every message, hopping around within
       each /64 (or whatever the network size is) in which a zombie
       lives, and there will be many zombies active at the same time.

   4.  Don't assume MTAs remember query results from one lookup to
       the next, so the necessary information should be available in
       the DNS cache.  (If they do remember, it doesn't hurt.)

   5.  Be compatible with DNSSEC, but don't depend on DNSSEC.

   6.  Don't attempt to be compatible with existing DNSBLs.  I expect
       these lists will mostly be served from something like rbldnsd,
       although it would also be possible to have an external program
       create the zone files and serve them from a conventional DNS
       server such as BIND.

3.  DNS record format

   Each DNS record is a blob of bytes containing multiple self-
   describing logical entries, each of which is a CIDR range; a
   single IP is represented as a /128.  In each blob, the CIDRs are
   in address order from lowest to highest.  The name of each record
   is the address of the entry that implicitly points to it, which is
   the entry immediately preceding it.  In this context there is no
   advantage to rDNS-style nibble-reversed naming, so the name is
   just 32 ASCII characters: the address as a hex number.  One blob
   is the tree root, which is named with all zeros, such as
   00000000000000000000000000000000.dnsxl.example.

   The simplest way to represent a blob is as a TXT record with all
   the strings concatenated.  The more entries that fit in a blob,
   the better this will work, so although it would be possible to
   make each logical entry a separate string in the TXT record, there
   is no advantage in doing so.

   Each blob contains a set of the CIDRs that are in the DNSxL.  For
   blobs that are not leaves (identified by a per-blob flag,
   described below), the entries in the blob also partition the
   address space: the CIDRs that fall between two adjacent entries in
   the current blob are stored in a sub-blob named by the lower of
   those two entries.

   To minimize edge cases, the root blob always contains the lowest
   and highest entries.
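   For illustration, here is a minimal Python sketch of how a client
   might form a blob's owner name: the 128-bit address as 32
   lowercase hex digits, prepended to the list's zone.  The zone name
   "dnsxl.example" and the function name are placeholders, not part
   of this specification.

      import ipaddress

      def blob_name(entry_addr, zone="dnsxl.example"):
          # 128-bit address as 32 hex digits, zero padded
          ip = ipaddress.IPv6Address(entry_addr)
          return "%032x.%s" % (int(ip), zone)

      # The root blob is named with all zeros:
      assert blob_name("::") == "0" * 32 + ".dnsxl.example"
      # 2001:db8::1 -> 20010db8000000000000000000000001.dnsxl.example
      print(blob_name("2001:db8::1"))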
4.  Lookup algorithm

   A client looks up an IP address in the DNSxL as follows:

   1.  Make the root blob the current blob and fetch it.

   2.  Search through the current blob for a CIDR entry that contains
       the target IP address.  If you find one, stop; the DNSxL
       contains the IP.

   3.  If the IP address is lower than the first entry in the current
       blob, or higher than the last entry, stop; the DNSxL does not
       contain the IP.

   4.  If this is a leaf blob, stop; the DNSxL does not contain the
       IP.

   5.  Find the entry in the current blob that is just below the IP.
       Use the address in that entry as the name of a new blob, fetch
       that blob, and make it the current one.

   6.  Go to step 2.

   It should be evident that this is analogous to a binary tree
   search, except that each node has many more than two descendants.
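   The loop below is a non-normative Python sketch of these six
   steps.  It assumes a caller-supplied fetch_blob() helper, not
   defined here, that retrieves the blob at a name and parses it into
   a sorted list of ipaddress.IPv6Network entries plus a leaf flag;
   the zone name is also a placeholder.

      import ipaddress

      ZONE = "dnsxl.example"                # placeholder list zone

      def lookup(ip_str, fetch_blob):
          ip = ipaddress.IPv6Address(ip_str)
          name = "0" * 32 + "." + ZONE      # step 1: root blob
          while True:
              entries, is_leaf = fetch_blob(name)
              for cidr in entries:          # step 2: containment
                  if ip in cidr:
                      return True           # listed
              if ip < entries[0].network_address:
                  return False              # step 3: below first entry
              if ip > entries[-1].broadcast_address:
                  return False              # step 3: above last entry
              if is_leaf:
                  return False              # step 4: leaf, not found
              # step 5: descend via the entry just below the IP
              below = max(c.network_address for c in entries
                          if c.network_address < ip)
              name = "%032x.%s" % (int(below), ZONE)  # step 6: repeat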
5.  Details of blob representation

   The first byte of each blob is a flag:

      L P P P P P P P

   The L bit is set if this is a leaf.  PPPPPPP is the prefix size.
   If all of the CIDR addresses in the blob have the same initial
   bits as the name of the blob does, which is fairly likely given
   the way IPv6 addresses are allocated, this is the number of common
   initial bits.  The prefix bits are not stored in the blob's
   entries, but are logically prefixed to each CIDR address in the
   blob.

   After that come some number of CIDR entries:

      X S S S S S S S
      Address

   X is reserved for now.  SSSSSSS is 0-127: the CIDR size minus one.
   The Address field is the CIDR size minus the prefix size bits
   long, rounded up to a whole byte.  That is, it omits the prefix
   bits, which are obtained from the name of the blob, and the suffix
   bits beyond the CIDR size.

   For example, say an entry is 2001:1234:5678:9ABC::/64, and the
   prefix size is 16.  Then the entry would be (in hex)

      3f 12 34 56 78 9A BC

   The 3f is the CIDR size minus one (64 - 1 = 63 = 0x3f), the
   leading 2001 is omitted since that's the prefix, and the rest is
   the address.

   Each blob is as big as will fit in a DNS answer.  If you don't
   believe in EDNS0, the limit is about 450 bytes.  If you do believe
   in EDNS0, it's whatever size you think clients ask for, probably
   in the 2K to 4K range.
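   As a check on the format, here is a non-normative Python sketch
   that encodes a single entry as described above; the function name
   is invented for illustration.  It reproduces the worked example:
   2001:1234:5678:9ABC::/64 with a 16-bit prefix encodes to
   3f 12 34 56 78 9a bc.

      import ipaddress

      def encode_entry(cidr, prefix_bits):
          net = ipaddress.IPv6Network(cidr)
          flag = (net.prefixlen - 1) & 0x7F  # X bit left as zero
          stored_bits = net.prefixlen - prefix_bits
          nbytes = (stored_bits + 7) // 8    # round up to whole bytes
          addr = int(net.network_address)
          # drop the shared prefix bits, keep the next stored_bits
          # bits, most significant first
          body = (addr << prefix_bits) & ((1 << 128) - 1)
          body >>= 128 - nbytes * 8
          return bytes([flag]) + body.to_bytes(nbytes, "big")

      print(encode_entry("2001:1234:5678:9ABC::/64", 16).hex())
      # prints 3f123456789abc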
6.  Building and updating DNSBLs

   [[ Note: This section is somewhat speculative.  I have experience
   with disk-based B-trees, but haven't implemented any of this
   yet. ]]

   DNSxLs are typically compiled as a list of IP addresses or CIDR
   ranges.  They must be turned into a tree of named blobs before
   being served in the DNS.  The technique varies a little depending
   on whether the tree will be updated incrementally or rebuilt from
   scratch when it changes.

6.1.  Building static DNSxLs

   A static DNSxL should have the minimum number of blobs, each of
   which should be as full as possible.  The technique to build the
   tree is a direct adaptation of the B-tree building technique in
   [WPBTREE]; a code sketch appears at the end of this section.

   Start with a sorted list of addresses or CIDRs.  Save one entry
   for the next pass, then take as many entries as possible and make
   a blob out of them.  Repeat saving an entry and creating a blob
   until all entries are used.  These will be the leaf blobs.  Now
   take the list of saved addresses and repeat the process to create
   the blobs at the next level up.  Keep repeating to create each
   level of the tree until a pass creates only one blob.  Insert the
   one saved entry into that blob, which will be the root.  That
   might make the blob overflow, in which case split it in half and
   move the first, middle, and last entries into a new root blob.

   When the list changes, rebuild it from scratch.
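   The following non-normative Python sketch shows the bulk-loading
   pass just described.  Entries are assumed to be pre-sorted;
   MAX_PER_BLOB and the function names are placeholders, and the
   serialization of blobs (Section 5) and the root overflow split are
   omitted.

      MAX_PER_BLOB = 400    # e.g., ~10-byte entries in a 4K answer

      def build_level(entries):
          # One pass: save an entry, then fill a blob with the
          # following entries; each saved entry names the blob
          # built immediately after it.
          blobs, saved, i = [], [], 0
          while i < len(entries):
              saved.append(entries[i])
              blob = entries[i + 1 : i + 1 + MAX_PER_BLOB]
              blobs.append((entries[i], blob))
              i += 1 + len(blob)
          return blobs, saved

      def build_tree(entries):
          levels = []                  # leaf level first, root last
          while True:
              blobs, saved = build_level(entries)
              levels.append(blobs)
              if len(blobs) == 1:
                  # insert the one leftover saved entry into the
                  # root; if it overflows, split it in half and move
                  # first, middle, last into a new root (not shown)
                  name, root = blobs[0]
                  root.insert(0, name)
                  return levels
              entries = saved          # build the next level up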
6.2.  Building and updating dynamic DNSxLs

   One of the reasons that B-trees are so widely used is that it is
   possible to update them efficiently without rebuilding them.  The
   same should apply here.

   The general approach to updates is to add or delete an entry,
   then, if that makes a blob too big or makes it empty, rebalance
   the tree to fix the problem.  If a blob is too big, move entries
   into an adjacent blob if possible; otherwise split the blob.  This
   will require updating the blob above, which in turn might overflow
   if the update involves adding rather than replacing an entry.  (It
   might overflow even with a replacement, if the replacement makes
   the compressible prefix shorter.)  In that case, repeat the
   process, potentially all the way up to the root.  When deleting an
   entry, if the blob becomes empty, move its pointer entry from the
   blob above into one of the adjacent blobs, then adjust the upper
   blob as needed.  Again, this might cause an overflow, in which
   case move entries between blobs or split the full one.

   A tree in which each blob is completely full is quite expensive to
   update.  The first insertion will cause a leaf to overflow, with
   overflows rippling all the way up the tree to the root.  When
   building a list that is intended to be updated, it would probably
   be a good idea to leave some slack space in each blob, to limit
   the ripple effect from changes.

   A significant difference between this design and a conventional
   B-tree is the version skew due to DNS caching.  In a normal
   B-tree, the pages (blobs) are locked while being changed, and the
   changes are immediately visible to all the clients.  In this case,
   clients cache each blob for the DNS TTL.  If updates change the
   entries in non-leaf blobs, they will break the links between
   blobs, since the entries are used as pointers.  A possible
   band-aid is to add temporary CNAME records at the former names
   pointing to the closest new name, so that most (admittedly not
   all) of the entries can still be located.  Once the TTL on the old
   blobs has expired, the CNAMEs can be deleted.

7.  Estimated performance

   The size of entries varies depending on the CIDR size and the
   amount of prefix compression.  An uncompressed /64 takes 9 bytes,
   so I'll use 10 bytes as an estimate of average entry size.  With
   EDNS0 and 4K records, that would allow 400 entries per blob.  A
   two-level tree could hold 160,000 entries; a three-level tree
   could hold 64 million entries, which would need 160,000 blobs.
   Large IPv4 DNSBLs like the CBL have about seven million entries
   now, so this should be adequate.  If blobs have to fit in 512-byte
   responses, that would be about 40 entries per blob; a five-level
   tree could then hold 100 million entries in about 2.5 million
   blobs, still adequate.

   The number of queries for any particular lookup is the number of
   levels, which is unlikely to be more than five.  The cache
   behavior obviously depends on both the layout of the entries and
   the query pattern, but this design avoids some obvious worst
   cases.  If a /64 is entirely listed, not listed at all, or
   contains just a single /128, all queries for addresses in that /64
   will refetch the same few records, one per level of the tree.  If
   a large range of addresses is either listed in one CIDR or not
   listed at all, all queries will refetch the same small set of
   blobs, which are likely to be cached.

   The total number of DNS records used is always less than the
   number of records for a traditional entry-per-IP DNSxL.  Since all
   the DNS queries are made by following the tree of entries, clients
   shouldn't make queries that fail, so there will be no negative
   cache entries.  (This isn't quite true due to version skew in
   updated DNSxLs, but it's hard to imagine a plausible scenario in
   which there would be a large number of failing queries.)  This
   suggests that the overall cache behavior will be no worse than,
   and quite possibly much better than, the behavior of IPv4 DNSxLs.
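   The capacity figures above follow directly from the estimated
   entries per blob; a quick arithmetic check:

      def capacity(entries_per_blob, levels):
          # a full tree of this depth holds about this many entries
          return entries_per_blob ** levels

      print(capacity(400, 2))   # 160,000 (two levels, 4K blobs)
      print(capacity(400, 3))   # 64,000,000 (three levels)
      print(capacity(40, 5))    # 102,400,000, ~100M in 512-byte blobs
      print(capacity(40, 4))    # 2,560,000 leaf blobs in that tree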
8.  Security considerations

   Semantically, there is little difference between a DNSxL published
   using this scheme and one published using the traditional entry-
   per-IP approach, since both publish the operator's opinion about
   some subset of the IP address space.

   One significant practical difference is that it is much easier for
   clients to obtain copies of all or part of the database.  For a
   traditional DNSxL, the only way to determine its contents is to
   query the entire IPv4 address space (or at least the active part
   of it), which would require several billion queries and can be
   deterred by rate-limiting the queries.  In this scheme, the names
   of all of the DNS records are easy for clients to determine, so
   they can efficiently walk the tree.  While rate limiting is
   possible, it is less effective, since clients fetch more data with
   each query.  It is also easy for a client to fetch all the entries
   for a particular IP range, such as the range of a network the
   client controls, to see what parts of it are blacklisted.

9.  Topics for further consideration

   o  Conventional IPv4 DNSBLs generally can return an A record or a
      TXT record.  The A record often has information coded in the
      low byte or two that identifies the type of listing, or which
      of several sub-lists the entry is on.  If it's important to
      return a byte or two of result, it would be straightforward to
      add the byte or two to each entry, at a modest increase in
      entry size and hence some loss in performance.  The TXT records
      are typically either all the same, or there's one per A value,
      perhaps with the IP address interpolated into the string to
      provide a URL to look up.  Those strings could be constructed
      in client libraries, with templates stored in the DNS; e.g.,
      the string for code 42 might be in a TXT record at
      42._strings.dnsxl.example.

   o  There might be better ways to do prefix compression, e.g., a
      per-entry field that says how many bits are the same as the
      previous entry.  Entries in blobs could be bit- rather than
      byte-aligned, although I expect that would be a lot more work
      for minimal extra compression.  There may be clever tricks to
      allocate entries into blobs so as to maximize the size of the
      prefix in each blob.  If a blob consists entirely of /128s, it
      might be worth a special case that leaves out the length byte
      on each entry.

10.  IANA considerations

   This document makes no requests to IANA.  All data are stored and
   queried using existing DNS record types and operations.

11.  References

11.1.  References - Normative

   [RFC1034]  Mockapetris, P., "Domain names - concepts and
              facilities", STD 13, RFC 1034, November 1987.

   [RFC1035]  Mockapetris, P., "Domain names - implementation and
              specification", STD 13, RFC 1035, November 1987.

11.2.  References - Informative

   [KNUTHV3]  Knuth, D., "The Art of Computer Programming: Volume 3,
              Sorting and Searching", 1998.

   [RBLDNSD]  Tokarev, M., "rbldnsd: Small Daemon for DNSBLs".

   [RFC2308]  Andrews, M., "Negative Caching of DNS Queries (DNS
              NCACHE)", RFC 2308, March 1998.

   [RFC4862]  Thomson, S., Narten, T., and T. Jinmei, "IPv6 Stateless
              Address Autoconfiguration", RFC 4862, September 2007.

   [RFC5782]  Levine, J., "DNS Blacklists and Whitelists", RFC 5782,
              February 2010.

   [WPBTREE]  Wikipedia, "B-tree", December 2010.

Author's Address

   John Levine
   Taughannock Networks
   PO Box 727
   Trumansburg, NY 14886

   Phone: +1 831 480 2300
   Email: standards@taugh.com
   URI:   http://jl.ly