Internet Research Task Force                                   J. Levine
Internet-Draft                                      Taughannock Networks
Intended status: Experimental                          December 28, 2010
Expires: July 1, 2011

    An efficient method to publish ranges of IP addresses in the DNS
                       draft-levine-iprangepub-01

Abstract

   The DNS has long been used to publish lists of IPv4 address ranges
   in blacklists and whitelists.  The size of the IPv6 address space
   makes the entry-per-IP approach used for IPv4 lists impractical.  A
   new technique for publishing IP address ranges is described.
   It is intended to permit efficient publishing and querying, and to
   have good DNS cache behavior.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.  It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   This Internet-Draft will expire on July 1, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Assumptions and Goals
   3.  DNS record format
   4.  Lookup algorithm
   5.  Details of blob representation
   6.  Building and updating DNSBLs
     6.1.  Building static DNSxLs
     6.2.  Building and updating dynamic DNSxLs
   7.  Estimated performance
   8.  Security considerations
   9.  Topics for further consideration
   10. IANA considerations
   11. References
     11.1.  References - Normative
     11.2.  References - Informative
   Appendix A.  Change Log
     A.1.  Changes from -00 to -01
   Author's Address

1. Introduction

   For many years, the Domain Name System [RFC1034] [RFC1035] has been
   the Internet's de facto distributed database.  Blacklists and
   whitelists of IPv4 addresses have been published in the DNS using a
   simple system adapted from rDNS [RFC5782].  A DNSxL (a DNSBL or
   DNSWL) is a DNS sub-tree, typically also a DNS zone, with each
   listed IP having an A and/or TXT record at a name corresponding to
   the IP address.  While this publication method has worked well for
   IPv4 addresses, the size of the IPv6 address space makes an
   analogous approach unworkable.

   In an IPv4 Internet, each network is typically limited to a few
   thousand or at most a few million addresses.  A single host
   typically has a single address, or at most a few hundred addresses.
   The limited size of each network forces a host to use its assigned
   address or addresses.  In IPv6 networks, hosts typically use
   Stateless Address Autoconfiguration [RFC4862] to select an IP
   address, with the low 64 bits of the address being almost entirely
   arbitrary.
   A hostile host sending mail can easily switch to a new address for
   every message it sends, never reusing an address, due to the vast
   size of the IPv6 address space.

   An IPv6 DNS blacklist would use wildcards or a specialized server
   such as rbldnsd [RBLDNSD] to list entire /64 or larger ranges, but
   that does not help DNS performance.  Since wildcards are expanded
   by the DNS server, every query for a unique IP address causes a
   unique query to the DNSBL's server.  Moreover, every unique query
   will take up a cache entry in the client's local DNS cache, either
   a regular entry if the DNSBL lists the query's address, or a
   negative entry [RFC2308] if it doesn't.  In the event that hostile
   mailers (which we will call "spammers" for short) use a unique
   address per message, the normal DNSBL query traffic will both flood
   the DNSBL's server and fill local caches with useless single-use
   entries, forcing out other cached data and causing excess traffic
   to all the other servers the caches query as well.

   For blacklists, an obvious approach would be to limit the
   granularity of DNSBLs, so that, say, each /64 had a separate
   listing, and the queries only used the high 64 bits of each
   address.  While this might limit the damage from DNSBL queries, it
   is not helpful for DNS whitelists, which by their nature list
   individual IP addresses, and are likely to be far more popular with
   IPv6 mail than they have been with IPv4 mail.

   The problem of looking up an address in a sorted list of IP
   addresses stored on a remote DNS server is not unlike the problem
   of searching for a key in a sorted list of keys stored on a disk, a
   problem that is common in database applications.  An approach
   called _B-trees_ has been widely used in databases since the 1970s,
   and is described in standard references such as [KNUTHV3].
   The technique in this document stores ordered sets of IP address
   ranges in DNS records, using a straightforward adaptation of the
   way that B-trees store ordered sets of key strings in disk blocks.

2. Assumptions and Goals

   This design is intended to meet this set of design goals:

   1.  Handle arbitrary mixtures of prefix ranges and individual IPs.

   2.  Work well with existing DNS servers and caches.  The number of
       different queries should be limited, both to limit
       cache->server traffic and the number of cache entries used.  I
       assume that client->cache queries are cheap, while
       cache->server queries are much more expensive.

   3.  Don't assume senders will be well behaved.  In particular, bad
       guys may use a new IPv6 address for every message, hopping
       around within each /64 (or whatever the network size is) in
       which a zombie lives, and there will be many zombies active at
       the same time.

   4.  Don't assume MTAs remember query results from one lookup to the
       next, so the necessary info should be available in the DNS
       cache.  (If they do, it doesn't hurt.)

   5.  Be compatible with DNSSEC, but don't depend on DNSSEC.

   6.  Don't attempt to be compatible with existing DNSBLs.  I expect
       these lists will mostly be served from something like rbldnsd,
       although it would also be possible to have an external program
       create the zone files and serve them from a conventional DNS
       server such as BIND.

3. DNS record format

   Each DNS record is a blob of bytes containing multiple self-
   describing logical entries, each of which is an IP address prefix,
   which might be a single IP if the length of the prefix is 128
   (IPv6) or 32 (IPv4).  In each blob, the prefixes are in address
   order from lowest to highest.  The address ranges of prefixes
   cannot overlap.  The name of each record is the address entry that
   implicitly points to it, which is the entry immediately preceding
   it.
   In this context, there is no advantage to doing rDNS-style nibble-
   reversed naming, so the name is just 32 ASCII characters for an
   IPv6 DNSxL or 8 characters for an IPv4 DNSxL, the address as a hex
   number.  One blob is the tree root, which is named with all zeros,
   such as 00000000000000000000000000000000.dnsxl.example or
   00000000.dnsxl.example.

   The simplest way to represent a blob is as a TXT record, with all
   the strings concatenated.  The more entries that fit in a blob, the
   better this will work, so although it would be possible for each
   logical entry to be a separate string in the TXT record, it doesn't
   pay to do that.

   Each blob contains a set of the prefixes that are in the DNSxL.
   For blobs that are not leaves (identified by a per-blob flag,
   described below), the entries in the blob also partition the
   address space: the prefixes that fall between the entries in the
   current blob are stored in sub-blobs, each named by an entry in the
   current blob.

   To minimize edge cases, the root blob always contains the lowest
   and highest entries.

4. Lookup algorithm

   A client looks up an IP address in the DNSxL as follows:

   1.  Make the root blob the current blob and fetch it.

   2.  Search through the current blob for a prefix entry that
       contains the target IP address.  If you find it, stop; the
       DNSxL contains the IP.

   3.  If the IP address is lower than the first entry in the current
       blob, or higher than the last entry, stop; the DNSxL does not
       contain the IP.

   4.  If this is a leaf blob, stop; the DNSxL does not contain the
       IP.

   5.  Find the entry in the current blob that is just below the IP.
       Use the address in that entry as the name of the new blob,
       fetch that blob, and make it the current one.

   6.  Go to step 2.

   It should be evident that this is analogous to a binary tree
   search, except that each node has a lot more than two descendants.

5. Details of blob representation

   The first byte of each blob is a flag:

      L P P P P P P P

   The L bit is set if this is a leaf.  The PPPPPPP is the implicit
   prefix size.  If all of the addresses in the blob have the same
   initial bits as the name of the blob does, which is fairly likely
   due to the way that IPv6 addresses are allocated, this is the
   number of common initial bits.  The common prefix bits are not
   stored in the blob's entries, but are logically prefixed to each
   address in the blob.

   After that is some number of prefixes:

      X S S S S S S S
      Address

   X is reserved for now.  SSSSSSS is 0-127 (IPv6) or 0-31 (IPv4), the
   prefix size minus one.  The Address field is the prefix size minus
   P bits long, rounded up to a whole byte.  That is, it omits the
   common prefix bits, which are obtained from the name of the blob,
   and the suffix bits beyond the prefix size.

   For example, say an entry is 2001:1234:5678:9ABC::/64, and the
   common prefix size is 16.  Then the entry would be (in hex)

      3f 12 34 56 78 9A BC

   The 3f is the prefix size minus one (64-1), the leading 2001 is
   omitted since that is the common prefix, and the rest is the
   address.

   Each blob is as big as will fit in a DNS answer.  If you don't
   believe in EDNS0, the limit is about 450 bytes.  If you do believe
   in EDNS0, it's whatever size you think clients ask for, probably in
   the 2K to 4K range.

6. Building and updating DNSBLs

   [[ Note: This section is somewhat speculative.  I have experience
   with disk-based B-trees, but haven't implemented any of this yet.
   ]]

   DNSxLs are typically compiled as a list of prefixes and lengths.
   They must be turned into a tree of named blobs before being served
   in the DNS.  The technique varies a little depending on whether the
   tree will be updated incrementally, or rebuilt from scratch if it
   changes.

6.1. Building static DNSxLs

   A static DNSxL should have the minimum number of blobs, each of
   which should be as full as possible.  The technique to build the
   tree is a direct adaptation of the B-tree building technique in
   [WPBTREE].

   Start with a sorted list of prefix entries.  Save one entry for the
   next pass, then take as many entries as possible and make a blob
   out of them.  Repeat saving an entry and creating a blob until all
   entries are used.  These will be the leaf blobs.  Now, take the
   list of saved entries and repeat to create the blobs at the next
   level up.  Keep repeating to create each level of the tree until
   the process creates only one blob.  Insert the one saved entry into
   that blob, which will be the root.  That might make the blob
   overflow, in which case split it in half and move the first,
   middle, and last entries into a new root blob.

   When the list changes, rebuild it from scratch.

6.2. Building and updating dynamic DNSxLs

   One of the reasons that B-trees are so widely used is that it is
   possible to update them efficiently without rebuilding them.  The
   same should apply here.

   The general approach to updates is to add or delete an entry,
   then, if that makes a blob too big or makes it empty, rebalance the
   tree to fix the problem.  If a blob is too big, move entries into
   an adjacent blob if possible, otherwise split the blob.  This will
   require updating the blob above, which in turn might overflow if
   the update involves adding rather than replacing an entry.  (It
   might overflow even with a replacement if it makes the compressible
   prefix shorter.)  In that case, repeat the process, potentially all
   the way to the root.  When deleting an entry, if the blob becomes
   empty, move its pointer entry from the blob above up into one of
   the adjacent blobs, then adjust the upper blob as needed.
   Again, this might cause overflow, in which case move entries
   between blobs or split the full one.

   A tree in which each blob is completely full is quite expensive to
   update.  The first insertion will cause a leaf to overflow, with
   overflows rippling all the way up the tree to the root.  It would
   probably be a good idea when building a list that is intended to be
   updated to leave some slack space in each blob, to limit the ripple
   effect from changes.

   A significant difference between this design and a conventional
   B-tree is the version skew due to DNS caching.  In a normal
   B-tree, the pages (blobs) are locked while being changed, and the
   changes are immediately visible to all the clients.  In this case,
   the clients cache each blob for the DNS TTL.  If updates change
   the entries in non-leaf blobs, they will break the links between
   blobs, since the entries are used as pointers.  A possible
   band-aid is to add temporary CNAME records at the former names
   pointing to the closest new name, so most (admittedly not all) of
   the entries can still be located.  Once the TTL on the old blobs
   has expired, the CNAMEs can be deleted.

7. Estimated performance

   The size of entries varies depending on the length of the prefixes
   and the amount of common prefix compression.  A /64 with no common
   prefix would take 9 bytes, so I'll use 10 bytes as an estimate of
   average entry size.  With EDNS0 and 4K records, that would allow
   400 entries per blob.  A two-level tree could hold 160,000
   entries; a three-level tree could hold 64 million entries, which
   would need 160,000 blobs.  Large v4 DNSBLs like the CBL have about
   seven million entries now, so this should be adequate.  If blobs
   have to fit in 512-byte responses, that would be about 40 entries
   per blob.  A five-level tree could hold 100 million entries in
   about 2.5 million blobs, still adequate.
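   The capacity arithmetic above can be checked with a short sketch.
   The entry size, blob sizes, and fanout are the estimates from the
   preceding paragraph, not values fixed by the record format:

   ```python
   # Rough capacity check for the blob tree, using the draft's
   # estimates: ~10 bytes per entry, 4K blobs with EDNS0, ~450
   # usable bytes without.  These are assumptions for illustration.

   def entries_per_blob(blob_bytes, entry_bytes=10):
       # One leading flag byte per blob; the rest holds entries.
       return (blob_bytes - 1) // entry_bytes

   def tree_capacity(levels, fanout):
       # Each blob holds `fanout` entries, and each non-leaf entry
       # also names one sub-blob, so capacity grows as fanout**levels.
       return fanout ** levels

   print(entries_per_blob(4096))   # about 400 entries per 4K blob
   print(entries_per_blob(450))    # about 40 per 512-byte response
   print(tree_capacity(2, 400))    # two levels: 160,000 entries
   print(tree_capacity(3, 400))    # three levels: 64 million entries
   print(tree_capacity(5, 40))     # five levels: just over 100 million
   ```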
   The number of queries for any particular lookup is the number of
   levels, which is unlikely to be more than five in a DNSxL of
   plausible size.  The cache behavior obviously depends on both the
   layout of the entries and the query pattern, but this design
   avoids some obvious worst cases.  If a /64 is either entirely
   listed, not listed at all, or just has a single /128 listed, all
   queries for addresses in that /64 will refetch the same four or
   five records.  If a large range of addresses is either listed in
   one prefix, or not listed at all, all queries will refetch the
   same set of blobs, which would be likely to be cached.

   The total number of DNS records used is always less than the
   number of records for a traditional entry-per-IP DNSxL for the
   same set of entries.  Since all the DNS queries are made by
   following the tree of entries, clients shouldn't make queries that
   fail, so there will be no negative cache entries.  (This isn't
   quite true due to version skew in updated DNSxLs, but it's hard to
   imagine a plausible scenario in which there would be a lot of
   different failing queries.)  This suggests that the overall cache
   behavior will be no worse than, and quite possibly much better
   than, the behavior of traditional IPv4 DNSxLs.

8. Security considerations

   Semantically, there is little difference between a DNSxL published
   using this scheme and one published using the traditional entry-
   per-IP approach, since both publish the operator's opinion about
   some subset of the IP address space.

   One significant practical difference is that it is much easier for
   clients to obtain copies of all or part of the database.
   For a traditional DNSxL, the only way to determine its contents is
   to query the entire address space (or at least the active part of
   it) one address at a time, which would require several billion
   queries for IPv4, and is deterred by rate limiting the queries.
   In this scheme, the names of all of the DNS records are easy for
   clients to determine, so they can efficiently walk the tree.
   While rate limiting is possible, it is less effective since
   clients fetch more data with each query.  It is also easy for a
   client to fetch all the entries for a particular IP range, such as
   the range of a network the client controls, to see what parts of
   it are blacklisted.

9. Topics for further consideration

   o  Conventional IPv4 DNSBLs generally can return an A record or a
      TXT record.  The A record often has information coded in the
      low byte or two that identifies the type of listing, or which
      of several sub-lists the entry is on.  If it's important to
      return a byte or two of result, it would be straightforward to
      add the byte or two to each entry, at a modest increase in
      entry size and hence some loss in performance.  The TXT records
      are typically either all the same, or there's one per A value,
      perhaps with the IP address interpolated into the string to
      provide a URL to look up.  Those strings could be constructed
      in client libraries, with templates stored in the DNS, e.g.,
      the string for code 42 might be in a TXT record at
      42._strings.dnsxl.example.

   o  There might be better ways to do prefix compression, e.g., a
      per-entry field that says how many bits are the same as the
      previous entry.  Entries in blobs could be bit rather than byte
      aligned, although I expect that would be a lot more work for
      minimal extra compression.  There may be clever tricks to
      allocate entries into blobs to maximize the size of the prefix
      in each blob.
      If a blob consists entirely of /128's, it might be worth a
      special case, leaving out the length byte on each entry.

   o  When adding a new entry to a full leaf blob, another
      possibility would be to make the blob a non-leaf, and create
      two new leaves below it.  This would make updates faster and
      less disruptive, at the cost of possibly slower lookups, since
      some parts of the tree will be deeper.  Perhaps a hybrid
      approach would make sense: rebuild or rebalance the tree when
      it gets too ragged, with more than a one-level depth difference
      between the deepest and shallowest leaves.

10. IANA considerations

   This document makes no requests to IANA.  All data are stored and
   queried using existing DNS record types and operations.

11. References

11.1. References - Normative

   [RFC1034]  Mockapetris, P., "Domain names - concepts and
              facilities", STD 13, RFC 1034, November 1987.

   [RFC1035]  Mockapetris, P., "Domain names - implementation and
              specification", STD 13, RFC 1035, November 1987.

11.2. References - Informative

   [KNUTHV3]  Knuth, D., "The Art of Computer Programming: Volume 3,
              Sorting and Searching", 1998.

   [RBLDNSD]  Tokarev, M., "rbldnsd: Small Daemon for DNSBLs".

   [RFC2308]  Andrews, M., "Negative Caching of DNS Queries (DNS
              NCACHE)", RFC 2308, March 1998.

   [RFC4862]  Thomson, S., Narten, T., and T. Jinmei, "IPv6 Stateless
              Address Autoconfiguration", RFC 4862, September 2007.

   [RFC5782]  Levine, J., "DNS Blacklists and Whitelists", RFC 5782,
              February 2010.

   [WPBTREE]  Wikipedia, "B-tree", December 2010.

Appendix A. Change Log

   *NOTE TO RFC EDITOR: This section may be removed upon publication
   of this document as an RFC.*

A.1. Changes from -00 to -01

   Change CIDRs to prefixes.  Allow for IPv4 addresses.

   Add possible updates producing unbalanced trees.
Author's Address

   John Levine
   Taughannock Networks
   PO Box 727
   Trumansburg, NY  14886

   Phone: +1 831 480 2300
   Email: standards@taugh.com
   URI:   http://jl.ly