idnits 2.17.1 draft-bellovin-dnsext-bloomfilt-00.txt: ** The Abstract section seems to be numbered Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 10 longer pages, the longest (page 2) being 60 lines == It seems as if not all pages are separated by form feeds - found 0 form feeds but 11 pages -- Found 11 instances of the string 'FORMFEED[Page...' -- is this a case of missing nroff postprocessing? Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack separate sections for Informative/Normative References. All references will be assumed normative when checking for downward references. ** There are 22 instances of too long lines in the document, the longest one being 5 characters in excess of 72. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 152: '...t return a Bloom filter RR SHOULD also...' RFC 2119 keyword, line 175: '...acks, the client MUST check that the p...' Miscellaneous warnings: ---------------------------------------------------------------------------- -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (December 2001) is 8165 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Looks like a reference, but probably isn't: '1' on line 93 -- Looks like a reference, but probably isn't: '2' on line 94 -- Possible downref: Non-RFC (?) normative reference: ref. 'Bloom70' ** Obsolete normative reference: RFC 2535 (Obsoleted by RFC 4033, RFC 4034, RFC 4035) ** Downref: Normative reference to an Informational RFC: RFC 3174 -- Possible downref: Non-RFC (?) normative reference: ref. 'SHA2' Summary: 8 errors (**), 0 flaws (~~), 3 warnings (==), 7 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group Steven M. Bellovin 3 Internet Draft AT&T Labs Research 5 Expiration Date: June 2002 December 2001 7 Using Bloom Filters for Authenticated Yes/No Answers in the DNS 9 draft-bellovin-dnsext-bloomfilt-00.txt 11 1. Status of this Memo 13 This document is an Internet-Draft and is in full conformance with 14 all provisions of Section 10 of RFC2026. 16 Internet-Drafts are working documents of the Internet Engineering 17 Task Force (IETF), its areas, and its working groups. Note that 18 other groups may also distribute working documents as Internet- 19 Drafts. 21 Internet-Drafts are draft documents valid for a maximum of six months 22 and may be updated, replaced, or obsoleted by other documents at any 23 time. It is inappropriate to use Internet- Drafts as reference 24 material or to cite them other than as "work in progress." 26 The list of current Internet-Drafts can be accessed at 27 http://www.ietf.org/ietf/1id-abstracts.txt 29 The list of Internet-Draft Shadow Directories can be accessed at 30 http://www.ietf.org/shadow.html. 32 2. Abstract 34 Some aspects of DNSSEC, such as NXDOMAIN error messages, require an 35 authenticated answer. Producing this answer requires complex 36 mechanisms, online storage of the zone's secret key, expensive online 37 computations, or massive zone files. As an alternative, we propose 38 storage of authenticated pointers to Bloom filters. This scheme 39 provides large reductions in the size of, and computational expense 40 to produce, partially-signed zone files. 42 3. Introduction 44 Some aspects of DNSSEC [RFC2535], such as NXDOMAIN error messages, 45 require an authenticated answer. Producing this answer requires 46 complex mechanisms, online storage of the zone's secret key, 47 expensive online computations, or massive zone files. The current 48 scheme relies on NXT records, which have a number of troublesome 49 properties. Among these are the space required to store and transmit 50 them; additionally, some people have objected that NXT records make 51 it possible to dump a zone by chaining through NXT records. 53 The expense of storing DNSSEC records for a zone as large as .COM has 54 led some people to suggest an "opt-in" process: only those parties 55 who wish to have a signed record will have one. This raises the 56 question of how to receive an authenticated message saying that a 57 given RRset is not supposed to be signed. Two mechanisms, NOKEY and 58 NXT records, have been proposed; both have their disadvantages. 60 Both problems could be solved if the DNS server were to digitally 61 sign its answers. But this is too expensive in CPU time (and exposes 62 the server to denial of service attacks), and requires that the 63 signing key be available online, a serious security risk for major 64 zones such as the root and .COM. 66 We suggest using Bloom filters [Bloom70] to store yes/no answers to 67 such questions. The filter can be precomputed and presigned, but 68 queries to it are quite efficient. The total amount of storage and 69 computation required appear to be significantly less than is needed 70 for today's solutions. Furthermore, it is not possible to recover 71 names from a filter, thus protecting privacy. 73 One caveat must be mentioned. The filter needed for a zone such as 74 .COM is far too large to ship around in DNS responses. Accordingly, 75 we propose using indirect references to such filters. While this 76 seems to be an inconvenience, in fact using indirection allows load- 77 shifting and load-balancing. 79 4. Bloom Filters 81 A Bloom filter is a very efficient way to store information about the 82 existence of a record in a database. It is susceptible to false 83 positives; however, the probability of a false positive can be made 84 as small as desired. 86 A Bloom filter is an array of m bits, initialized to zero. It 87 requires a set of k hash functions that are independent and produce 88 uniformly distributed output in the range [0,m-1] on the possible 89 inputs. 91 To add an entry R to the filter, calculate 93 b[1] = H1(R) 94 b[2] = H2(R) 95 ... 96 b[k] = Hk(R) 98 and set bits b[i] to 1 in the array. 100 To see if a record exists, calculate the same b[i] and check the bit 101 values. If all k bits are 1, the record exists; if even a single bit 102 is 0, the record does not exist. 104 In a database of any reasonable size, it is not possible to determine 105 the input records from the bit array. Many different records can set 106 any one bit; there is no way to tell which records actually did. 108 If there are n records stored in the Bloom filter, the probability of 109 a false positive is given by 111 P = (1 - (1 - 1/m)**(k*n) )**k 113 which may be approximated by 115 P = (1 - exp(-k*n/m))**k 117 From the equations, it is clear that it is the ratio of the number of 118 records to the bit array size is what is important, rather than the 119 absolute size of either. Further, there is an optimal range for k; 120 values that are too small or too large will produce too many false 121 positives. The exact set of parameters for any filter need to be 122 chosen with some care. Some possible values for use in the DNS are 123 given in Appendix A. 125 5. A Bloom Filter Resource Record 127 As mentioned above, a Bloom filter for the DNS will be large: for the 128 .COM zone, at least 75 Mbytes. We clearly do not wish to transmit 129 such bit arrays on a routine basis! Accordingly, the filter RR 130 contains a URL pointing to the actual filter. 132 In order to query the filter, clients need to know k and m; they also 133 need to know what hash functions were used. We put these values in 134 the RR, with m in units of kilobytes. (Note: our initial value for 135 m will be around 600,000-800,000,000 bits. This is close enough to 136 what can be held in a 32-bit field that we prefer to buy our factor 137 of 8192 in advance.) 139 As is discussed below, it may be desirable to divide the filter into 140 chunks. It is therefore necessary to include the chunk size in the 141 RR as well. 143 Open issue: a public key (or certificate containing a public key) is 144 necessary to validate the filter. Where should this key be stored? 145 In a KEY record? In the RR? In some other record? It is not 146 necessary that this key be the same as the one used to sign the zone; 147 in one variant discussed below, it is better that the two not be the 148 same. 150 Since Bloom filters are specific to any given instance of a zone, the 151 SOA serial number is an essential part of the authentication process. 152 Accordingly, name servers that return a Bloom filter RR SHOULD also 153 return the relevant SOA record in the Additional Information section. 155 6. Using Bloom Filter RRs 157 A client who wishes to authenticate an NXDOMAIN response to a secure 158 query first obtains and authenticates a signed Bloom filter record. 159 It then calculates the b[i] values for the desired name, and checks 160 the bit positions in the Bloom filter. Finally, it authenticates the 161 content of the Bloom filter itself. We present two options for how 162 to perform these last two steps. 164 6.1. Using TLS for Filter Queries 166 To avoid the need to transmit a large bit array, one option is to 167 query a Bloom filter server via TLS [rfctls]. The client calculates 168 the bit positions, based on the domain name, k, and m, and then 169 queries the URL specified in the filter RR: 171 https://bloomfilter.ns.example.com?324+3248+23980+89732+... 173 The server responds with a simple yes/no answer. 175 To avoid attacks, the client MUST check that the public key in the 176 server's TLS certificate matches that returned by the RR. 178 Unfortunately, this scheme requires expensive public key operations 179 by the server for each query. Furthermore, it requires that a 180 private signature key be online. Fortunately, there is no need to 181 make this key the same as the zone-signing key. CPU load concerns 182 can be ameliorated by replicating the server, using any standard 183 load-sharing technique. Again, note that in general the contents of 184 the bit array need not be kept confidential. 186 6.2. Retrieving Filter Chunks 188 To avoid the need for online, server-based cryptography, we present 189 an alternate scheme based on bit array downloads. For this version, 190 the server divides the bit array into chunks. Each chunk is 191 digitally signed (the signature is calculated over the bit array 192 chunk, the metadata in the filter RR, and the SOA serial number). 193 The chunk size, and hence the number of chunks, is determined by the 194 tradeoff between download size and the number of chunks that must be 195 signed by the server. 197 A client that connects to the specified URL will receive the chunks 198 that contain the requested bit positions. (Open issue: should the 199 URL contain bit position numbers or chunk numbers? I suspect that 200 the former is better, since it means that both options can be 201 implemented using the same query syntax.) 203 A client need not query all k values. It can trade off its own need 204 for certainty, up to the limits set by k, against download time. 205 This determination can be done dynamically, since the chunk size and 206 k are in the Bloom filter RR. 208 Since the content of any given chunk is static during the lifetime of 209 that zone instance, chunk URLs can be cached or distributed by any 210 content distribution network. To avoid confusion and cache coherency 211 issues at zone change time, we recommend that the SOA serial number 212 for the zone be included in the filename portion of the URL. 214 7. Dealing with False Positives 216 As noted, false positives are possible with Bloom Filters. The 217 implications differ for different possible uses of the technology. 219 For authenticating NXDOMAIN requests, there is no ready recourse. A 220 false positive means that a name server has returned an error report 221 for a domain that the Bloom filter claims exists. This could be a 222 false positive, or it could be the exact form of attack that the 223 Bloom filter mechanism is intended to prevent. Palliative measures 224 include retrying the query from a client not believed to be under 225 attack, or waiting for a new instance of the zone file, and hence a 226 different bit array (see below). 228 The situation is brighter for opt-in secure zones. In this latter 229 case, the bit array represents zones for which signature records 230 should exist. A server building a bit array can check the remaining 231 names in the zone to see if there are any false positives. If so, a 232 NOKEY record can be generated for such names. This greater tolerance 233 of false positives permits selection of filter parameters that yield 234 smaller bit array sizes and/or fewer hash function calculations. Of 235 course, that must be traded off against the extra signed NOKEY 236 records. 238 8. Hash Function Families 240 The behavior of Bloom filters depends strongly on the quality of the 241 hash functions that are used. One good choice is to use SHA2-512 242 [SHA2], which produces 512 bits of uniformly distributed output. 243 This can be divided into 16 32-bit hash values, which in turn can be 244 reduced modulo m to represent the output of 16 hash functions. If 245 more hash functions are needed, a counter can be concatenated with 246 the the name: 248 SHA2("1" || name) 249 SHA2("2" || name) 250 ... 252 If we set k to 8, we can use SHA2-256, which is significantly 253 cheaper. 255 To avoid persistent problems from false positives, it may be 256 desirable to change the hash function for each new zone instance. 257 This is most easily done by including the zone serial number in the 258 hash: 260 SHA2("1" || name || serial) 261 SHA2("2" || name || serial) 262 ... 264 Open issue: is this desirable? It leads to more unpredictability in 265 behavior from day to day. On the other hand, addition of any new 266 records to the zone can generate new false positives. 268 It is not clear that the cryptographic properties of SHA2 are helpful 269 here. There does not appear to be any advantage to an enemy who can 270 deliberately cause collisions. Accordingly, it might make sense to 271 investigate cheaper hash functions. 273 9. Performance Issues 275 To process a zone as suggested above, one or two invocations of 276 SHA2-512 per name are needed. Signing an RRset would require a 277 single invocation of a cheaper hash function (probably SHA1 278 [RFC3174]) per name, plus a very expensive digital signature 279 operation. Until the percentage of signed RRsets becomes quite high, 280 it is clear that Bloom filters are much cheaper. 282 Chunk size determination relies on the tradeoff between the number of 283 chunk signatures that must be computed versus the bandwidth needed to 284 retrieve chunks. In general, one should opt for more signatures and 285 smaller chunks; the signing operations happen once per zone, while 286 the chunks are retrieved frequently. A 1 KB chunk size is not 287 unreasonable, but that would require 75,000 signatures for a 75 MB 288 bit array. 290 Bit array size per se is not a major issue, unless many replicas of 291 the data are desired. 293 10. Dynamic Update 295 Bloom filters are not compatible with certain forms of dynamic 296 updates. However, the problem caused is bounded and often 297 manageable. Ultimately, the acceptability will depend on the rate of 298 dynamic updates and the number of chunks used. 300 Deleting records is easy: do nothing. A deleted record will still 301 appear in the bit array, but that manifests itself as a false 302 positive, a problem inherent to Bloom filters. A new filter should 303 be calculated and distributed when the number of deletions has raised 304 the false positive response probability to an unacceptable level. If 305 necessary, the initial selection of k and m can be adjusted to 306 compensate for some predicted rate of deletions. 308 Additions are trickier. Conceptually, all that is necessary is to 309 calculate the new bit positions that need to be set to 1; however, 310 the existence of signed chunks complicates the matter. The exact 311 behavior will depend on the addition rate and upon the number of 312 chunks. 314 Without trying to calculate the exact probability distribution, it is 315 clear that the number of chunks changed per unit time is bounded by 316 the product of k and the number of update operations. As long as the 317 computer signing the chunks can keep up with that rate of operations, 318 there should not be a problem. And the network bandwidth required is 319 minimal; all that needs to be sent out is a set of new signatures, 320 plus the bit positions that must be turned on in each chunk. In 321 particular, it is not necessary to redistribute the entire bit array. 323 11. IANA Considerations 325 New families of hash functions may be defined and registered with 326 IANA. Registration may be done only by means of a standards-track 327 RFC. 329 12. Security Considerations 331 If the signatures specified here are checked, there do not appear to 332 be any correctness issues. However, the chunk retrieval protocol may 333 be abused to flood the server. Note that this server need not be co- 334 located with the zone servers; doing that would limit the effect of 335 such an attack. 337 As with other aspects of DNSSEC, replay attacks are possible. An 338 enemy could return a stale -- but signed -- Bloom filter RR in 339 response to a query. Similar attacks can be carried out against the 340 chunk retrieval protocol, but not against the TLS variant. 342 13. References 344 [Bloom70] "Space/time trade-offs in hash coding with allowable 345 errors". Bloom, B. H. Communications of ACM 13, 7 (July 1970), 346 pp. 422-426. 348 [RFC2535] "Domain Name System Security Extensions". D. Eastlake. 349 March 1999. 351 [RFC3174] "US Secure Hash Algorithm 1 (SHA1)". D. Eastlake. September 352 2001. 354 [SHA2] "Secure Hash Standard", Draft Federal Information 355 Processing Standards Publication 180-2, May 2001. 357 14. Author Information 359 Steven M. Bellovin 360 AT&T Labs Research 361 Shannon Laboratory 362 180 Park Avenue 363 Florham Park, NJ 07932 364 Phone: +1 973-360-8656 365 email: bellovin@acm.org 366 A. Appendix: Bloom Filter Parameters 368 Below is a table showing the false positive probability p for various 369 values of k and ratios of m/n. We set n -- the number of entries in 370 the zone -- to 25,000,000, the approximate size of .COM at this time. 372 While the actual choices dependon the acceptable false positive rate, 373 the choices of (8,32), (8,40), (16,32), and (16,40) seem to produce 374 favorable ratios of false positive rate to chunk size and bit array 375 size. 377 p k m/n 378 -------------------------- 379 0.000000005065 24 40 380 0.000000005112 32 40 381 0.000000010765 40 40 382 0.000000019475 16 40 383 0.000000216758 24 32 384 0.000000330046 16 32 385 0.000000422277 32 32 386 0.000001165717 8 40 387 0.000001366603 40 32 388 0.000005731505 8 32 389 0.000009874368 16 24 390 0.000016565253 24 24 391 0.000041690856 8 24 392 0.000055936473 32 24 393 0.000230939749 40 24 394 0.000574496205 8 16 395 0.000649828308 16 16 396 0.002335383727 24 16 397 0.009530761107 32 16 398 0.025491730341 8 8 399 0.032516117391 40 16 400 0.097625617676 16 8 401 0.293563779663 24 8 402 0.553477428654 32 8 403 0.763051324818 40 8 405 The following (ugly) ASCII graph summarizes the calculations: 407 1 ++--+----------+----------+---------+----------+------*************** 408 + + + + **************(1 - exp(-k/8))**k ****** + 409 0.1 ++ ************** (1 - exp(-k/16))**k ######++ 410 +************** (1 - exp(-k/24))**k $$$$#### 411 0.01 ** (1 - exp(-k/32))**k######%++ 412 + ###(1 - exp(-k/40))**k @@@@@@ + 413 + ############# + 414 0.001 ########################## ++ 415 + $$$$$$ 416 0.0001 ++ $$$$$$$$$$$$$ ++ 417 $$$$$$ $$$$$$$$$$$$$$$ + 418 1e-05 ++ $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$ ++ 419 %%%% + 420 1e-06 @@ %%%%%% %%%%%%% 421 +@@@ %%%%%%%%%% %%%%%%%%%%%%%%%%% + 422 + @@@@@ %%%%%%%%%%%%%%%%%%%%%%%%%% + 423 1e-07 ++ @@@@@ ++ 424 + @@@@@@@@ + 425 1e-08 ++ @@@@@@@@@@@@@@@@ @@@@@@@@@@@@@@@@@@@@@ 426 + + + + +@@@@@@@@@@@ + + 427 1e-09 ++--+----------+----------+---------+----------+---------+---------++ 428 10 15 20 25 30 35 40