idnits 2.17.1 

draft-bellovin-dnsext-bloomfilt-00.txt:
  ** The Abstract section seems to be numbered


  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Looks like you're using RFC 2026 boilerplate.  This must be updated to
     follow RFC 3978/3979, as updated by RFC 4748.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 10
     longer pages, the longest (page 2) being 60 lines

  == It seems as if not all pages are separated by form feeds - found 0 form
     feeds but 11 pages

  -- Found 11 instances of the string 'FORMFEED[Page...' -- is this a case of
     missing nroff postprocessing?


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.

  ** There are 22 instances of too long lines in the document, the longest
     one being 5 characters in excess of 72.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords. 

     RFC 2119 keyword, line 152: '...t return a Bloom filter RR SHOULD also...'
     RFC 2119 keyword, line 175: '...acks, the client MUST check that the p...'


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (December 2001) is 8165 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '1' on line 93

  -- Looks like a reference, but probably isn't: '2' on line 94

  -- Possible downref: Non-RFC (?) normative reference: ref. 'Bloom70'

  ** Obsolete normative reference: RFC 2535 (Obsoleted by RFC 4033, RFC 4034,
     RFC 4035)

  ** Downref: Normative reference to an Informational RFC: RFC 3174

  -- Possible downref: Non-RFC (?) normative reference: ref. 'SHA2'


     Summary: 8 errors (**), 0 flaws (~~), 3 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                 Steven M. Bellovin
3	Internet Draft                                        AT&T Labs Research

5	Expiration Date: June 2002                                 December 2001

7	    Using Bloom Filters for Authenticated Yes/No Answers in the DNS

9	                 draft-bellovin-dnsext-bloomfilt-00.txt

11	1. Status of this Memo

13	   This document is an Internet-Draft and is in full conformance with
14	   all provisions of Section 10 of RFC2026.

16	   Internet-Drafts are working documents of the Internet Engineering
17	   Task Force (IETF), its areas, and its working groups.  Note that
18	   other groups may also distribute working documents as Internet-
19	   Drafts.

21	   Internet-Drafts are draft documents valid for a maximum of six months
22	   and may be updated, replaced, or obsoleted by other documents at any
23	   time.  It is inappropriate to use Internet- Drafts as reference
24	   material or to cite them other than as "work in progress."

26	   The list of current Internet-Drafts can be accessed at
27	   http://www.ietf.org/ietf/1id-abstracts.txt

29	   The list of Internet-Draft Shadow Directories can be accessed at
30	   http://www.ietf.org/shadow.html.

32	2. Abstract

34	   Some aspects of DNSSEC, such as NXDOMAIN error messages, require an
35	   authenticated answer.  Producing this answer requires complex
36	   mechanisms, online storage of the zone's secret key, expensive online
37	   computations, or massive zone files.  As an alternative, we propose
38	   storage of authenticated pointers to Bloom filters.  This scheme
39	   provides large reductions in the size of, and computational expense
40	   to produce, partially-signed zone files.

42	3. Introduction

44	   Some aspects of DNSSEC [RFC2535], such as NXDOMAIN error messages,
45	   require an authenticated answer.  Producing this answer requires
46	   complex mechanisms, online storage of the zone's secret key,
47	   expensive online computations, or massive zone files.  The current
48	   scheme relies on NXT records, which have a number of troublesome
49	   properties.  Among these are the space required to store and transmit
50	   them; additionally, some people have objected that NXT records make
51	   it possible to dump a zone by chaining through NXT records.

53	   The expense of storing DNSSEC records for a zone as large as .COM has
54	   led some people to suggest an "opt-in" process:  only those parties
55	   who wish to have a signed record will have one.  This raises the
56	   question of how to receive an authenticated message saying that a
57	   given RRset is not supposed to be signed.  Two mechanisms, NOKEY and
58	   NXT records, have been proposed; both have their disadvantages.

60	   Both problems could be solved if the DNS server were to digitally
61	   sign its answers.  But this is too expensive in CPU time (and exposes
62	   the server to denial of service attacks), and requires that the
63	   signing key be available online, a serious security risk for major
64	   zones such as the root and .COM.

66	   We suggest using Bloom filters [Bloom70] to store yes/no answers to
67	   such questions.  The filter can be precomputed and presigned, but
68	   queries to it are quite efficient.  The total amount of storage and
69	   computation required appear to be significantly less than is needed
70	   for today's solutions.  Furthermore, it is not possible to recover
71	   names from a filter, thus protecting privacy.

73	   One caveat must be mentioned.  The filter needed for a zone such as
74	   .COM is far too large to ship around in DNS responses.  Accordingly,
75	   we propose using indirect references to such filters.  While this
76	   seems to be an inconvenience, in fact using indirection allows load-
77	   shifting and load-balancing.

79	4. Bloom Filters

81	   A Bloom filter is a very efficient way to store information about the
82	   existence of a record in a database.  It is susceptible to false
83	   positives; however, the probability of a false positive can be made
84	   as small as desired.

86	   A Bloom filter is an array of m bits, initialized to zero.  It
87	   requires a set of k hash functions that are independent and produce
88	   uniformly distributed output in the range [0,m-1] on the possible
89	   inputs.

91	   To add an entry R to the filter, calculate

93	        b[1] = H1(R)
94	        b[2] = H2(R)
95	        ...
96	        b[k] = Hk(R)

98	   and set bits b[i] to 1 in the array.

100	   To see if a record exists, calculate the same b[i] and check the bit
101	   values.  If all k bits are 1, the record exists; if even a single bit
102	   is 0, the record does not exist.

104	   In a database of any reasonable size, it is not possible to determine
105	   the input records from the bit array.  Many different records can set
106	   any one bit; there is no way to tell which records actually did.

108	   If there are n records stored in the Bloom filter, the probability of
109	   a false positive is given by

111	        P = (1 - (1 - 1/m)**(k*n) )**k

113	   which may be approximated by

115	        P = (1 - exp(-k*n/m))**k

117	   From the equations, it is clear that it is the ratio of the number of
118	   records to the bit array size is what is important, rather than the
119	   absolute size of either.  Further, there is an optimal range for k;
120	   values that are too small or too large will produce too many false
121	   positives.  The exact set of parameters for any filter need to be
122	   chosen with some care.  Some possible values for use in the DNS are
123	   given in Appendix A.

125	5. A Bloom Filter Resource Record

127	   As mentioned above, a Bloom filter for the DNS will be large: for the
128	   .COM zone, at least 75 Mbytes.  We clearly do not wish to transmit
129	   such bit arrays on a routine basis!  Accordingly, the filter RR
130	   contains a URL pointing to the actual filter.

132	   In order to query the filter, clients need to know k and m; they also
133	   need to know what hash functions were used.  We put these values in
134	   the RR, with m in units of kilobytes.  (Note:  our initial value for
135	   m will be around 600,000-800,000,000 bits.  This is close enough to
136	   what can be held in a 32-bit field that we prefer to buy our factor
137	   of 8192 in advance.)

139	   As is discussed below, it may be desirable to divide the filter into
140	   chunks.  It is therefore necessary to include the chunk size in the
141	   RR as well.

143	   Open issue:  a public key (or certificate containing a public key) is
144	   necessary to validate the filter.  Where should this key be stored?
145	   In a KEY record?  In the RR?  In some other record?  It is not
146	   necessary that this key be the same as the one used to sign the zone;
147	   in one variant discussed below, it is better that the two not be the
148	   same.

150	   Since Bloom filters are specific to any given instance of a zone, the
151	   SOA serial number is an essential part of the authentication process.
152	   Accordingly, name servers that return a Bloom filter RR SHOULD also
153	   return the relevant SOA record in the Additional Information section.

155	6. Using Bloom Filter RRs

157	   A client who wishes to authenticate an NXDOMAIN response to a secure
158	   query first obtains and authenticates a signed Bloom filter record.
159	   It then calculates the b[i] values for the desired name, and checks
160	   the bit positions in the Bloom filter.  Finally, it authenticates the
161	   content of the Bloom filter itself.  We present two options for how
162	   to perform these last two steps.

164	6.1. Using TLS for Filter Queries

166	   To avoid the need to transmit a large bit array, one option is to
167	   query a Bloom filter server via TLS [rfctls].  The client calculates
168	   the bit positions, based on the domain name, k, and m, and then
169	   queries the URL specified in the filter RR:

171	        https://bloomfilter.ns.example.com?324+3248+23980+89732+...

173	   The server responds with a simple yes/no answer.

175	   To avoid attacks, the client MUST check that the public key in the
176	   server's TLS certificate matches that returned by the RR.

178	   Unfortunately, this scheme requires expensive public key operations
179	   by the server for each query.  Furthermore, it requires that a
180	   private signature key be online.  Fortunately, there is no need to
181	   make this key the same as the zone-signing key.  CPU load concerns
182	   can be ameliorated by replicating the server, using any standard
183	   load-sharing technique.  Again, note that in general the contents of
184	   the bit array need not be kept confidential.

186	6.2. Retrieving Filter Chunks

188	   To avoid the need for online, server-based cryptography, we present
189	   an alternate scheme based on bit array downloads.  For this version,
190	   the server divides the bit array into chunks.  Each chunk is
191	   digitally signed (the signature is calculated over the bit array
192	   chunk, the metadata in the filter RR, and the SOA serial number).
193	   The chunk size, and hence the number of chunks, is determined by the
194	   tradeoff between download size and the number of chunks that must be
195	   signed by the server.

197	   A client that connects to the specified URL will receive the chunks
198	   that contain the requested bit positions.  (Open issue: should the
199	   URL contain bit position numbers or chunk numbers?  I suspect that
200	   the former is better, since it means that both options can be
201	   implemented using the same query syntax.)

203	   A client need not query all k values.  It can trade off its own need
204	   for certainty, up to the limits set by k, against download time.
205	   This determination can be done dynamically, since the chunk size and
206	   k are in the Bloom filter RR.

208	   Since the content of any given chunk is static during the lifetime of
209	   that zone instance, chunk URLs can be cached or distributed by any
210	   content distribution network.  To avoid confusion and cache coherency
211	   issues at zone change time, we recommend that the SOA serial number
212	   for the zone be included in the filename portion of the URL.

214	7. Dealing with False Positives

216	   As noted, false positives are possible with Bloom Filters.  The
217	   implications differ for different possible uses of the technology.

219	   For authenticating NXDOMAIN requests, there is no ready recourse.  A
220	   false positive means that a name server has returned an error report
221	   for a domain that the Bloom filter claims exists.  This could be a
222	   false positive, or it could be the exact form of attack that the
223	   Bloom filter mechanism is intended to prevent.  Palliative measures
224	   include retrying the query from a client not believed to be under
225	   attack, or waiting for a new instance of the zone file, and hence a
226	   different bit array (see below).

228	   The situation is brighter for opt-in secure zones.  In this latter
229	   case, the bit array represents zones for which signature records
230	   should exist.  A server building a bit array can check the remaining
231	   names in the zone to see if there are any false positives.  If so, a
232	   NOKEY record can be generated for such names.  This greater tolerance
233	   of false positives permits selection of filter parameters that yield
234	   smaller bit array sizes and/or fewer hash function calculations.  Of
235	   course, that must be traded off against the extra signed NOKEY
236	   records.

238	8. Hash Function Families

240	   The behavior of Bloom filters depends strongly on the quality of the
241	   hash functions that are used.  One good choice is to use SHA2-512
242	   [SHA2], which produces 512 bits of uniformly distributed output.
243	   This can be divided into 16 32-bit hash values, which in turn can be
244	   reduced modulo m to represent the output of 16 hash functions.  If
245	   more hash functions are needed, a counter can be concatenated with
246	   the the name:

248	        SHA2("1" || name)
249	        SHA2("2" || name)
250	        ...

252	   If we set k to 8, we can use SHA2-256, which is significantly
253	   cheaper.

255	   To avoid persistent problems from false positives, it may be
256	   desirable to change the hash function for each new zone instance.
257	   This is most easily done by including the zone serial number in the
258	   hash:

260	        SHA2("1" || name || serial)
261	        SHA2("2" || name || serial)
262	        ...

264	   Open issue:  is this desirable?  It leads to more unpredictability in
265	   behavior from day to day.  On the other hand, addition of any new
266	   records to the zone can generate new false positives.

268	   It is not clear that the cryptographic properties of SHA2 are helpful
269	   here.  There does not appear to be any advantage to an enemy who can
270	   deliberately cause collisions.  Accordingly, it might make sense to
271	   investigate cheaper hash functions.

273	9. Performance Issues

275	   To process a zone as suggested above, one or two invocations of
276	   SHA2-512 per name are needed.  Signing an RRset would require a
277	   single invocation of a cheaper hash function (probably SHA1
278	   [RFC3174]) per name, plus a very expensive digital signature
279	   operation.  Until the percentage of signed RRsets becomes quite high,
280	   it is clear that Bloom filters are much cheaper.

282	   Chunk size determination relies on the tradeoff between the number of
283	   chunk signatures that must be computed versus the bandwidth needed to
284	   retrieve chunks.  In general, one should opt for more signatures and
285	   smaller chunks; the signing operations happen once per zone, while
286	   the chunks are retrieved frequently.  A 1 KB chunk size is not
287	   unreasonable, but that would require 75,000 signatures for a 75 MB
288	   bit array.

290	   Bit array size per se is not a major issue, unless many replicas of
291	   the data are desired.

293	10. Dynamic Update

295	   Bloom filters are not compatible with certain forms of dynamic
296	   updates.  However, the problem caused is bounded and often
297	   manageable.  Ultimately, the acceptability will depend on the rate of
298	   dynamic updates and the number of chunks used.

300	   Deleting records is easy: do nothing.  A deleted record will still
301	   appear in the bit array, but that manifests itself as a false
302	   positive, a problem inherent to Bloom filters.  A new filter should
303	   be calculated and distributed when the number of deletions has raised
304	   the false positive response probability to an unacceptable level.  If
305	   necessary, the initial selection of k and m can be adjusted to
306	   compensate for some predicted rate of deletions.

308	   Additions are trickier.  Conceptually, all that is necessary is to
309	   calculate the new bit positions that need to be set to 1; however,
310	   the existence of signed chunks complicates the matter.  The exact
311	   behavior will depend on the addition rate and upon the number of
312	   chunks.

314	   Without trying to calculate the exact probability distribution, it is
315	   clear that the number of chunks changed per unit time is bounded by
316	   the product of k and the number of update operations.  As long as the
317	   computer signing the chunks can keep up with that rate of operations,
318	   there should not be a problem.  And the network bandwidth required is
319	   minimal; all that needs to be sent out is a set of new signatures,
320	   plus the bit positions that must be turned on in each chunk.  In
321	   particular, it is not necessary to redistribute the entire bit array.

323	11. IANA Considerations

325	   New families of hash functions may be defined and registered with
326	   IANA.  Registration may be done only by means of a standards-track
327	   RFC.

329	12. Security Considerations

331	   If the signatures specified here are checked, there do not appear to
332	   be any correctness issues.  However, the chunk retrieval protocol may
333	   be abused to flood the server.  Note that this server need not be co-
334	   located with the zone servers; doing that would limit the effect of
335	   such an attack.

337	   As with other aspects of DNSSEC, replay attacks are possible.  An
338	   enemy could return a stale -- but signed -- Bloom filter RR in
339	   response to a query.  Similar attacks can be carried out against the
340	   chunk retrieval protocol, but not against the TLS variant.

342	13. References

344	   [Bloom70] "Space/time trade-offs in hash coding with allowable
345	        errors". Bloom, B. H. Communications of ACM 13, 7 (July 1970),
346	        pp. 422-426.

348	   [RFC2535] "Domain Name System Security Extensions". D. Eastlake.
349	        March 1999.

351	   [RFC3174] "US Secure Hash Algorithm 1 (SHA1)". D. Eastlake. September
352	        2001.

354	   [SHA2]    "Secure Hash Standard", Draft Federal Information
355	        Processing Standards Publication 180-2, May 2001.

357	14. Author Information

359	Steven M. Bellovin
360	AT&T Labs Research
361	Shannon Laboratory
362	180 Park Avenue
363	Florham Park, NJ 07932
364	Phone: +1 973-360-8656
365	email: bellovin@acm.org
366	A. Appendix:  Bloom Filter Parameters

368	   Below is a table showing the false positive probability p for various
369	   values of k and ratios of m/n.  We set n -- the number of entries in
370	   the zone -- to 25,000,000, the approximate size of .COM at this time.

372	   While the actual choices dependon the acceptable false positive rate,
373	   the choices of (8,32), (8,40), (16,32), and (16,40) seem to produce
374	   favorable ratios of false positive rate to chunk size and bit array
375	   size.

377	      p          k    m/n
378	--------------------------
379	0.000000005065   24   40
380	0.000000005112   32   40
381	0.000000010765   40   40
382	0.000000019475   16   40
383	0.000000216758   24   32
384	0.000000330046   16   32
385	0.000000422277   32   32
386	0.000001165717    8   40
387	0.000001366603   40   32
388	0.000005731505    8   32
389	0.000009874368   16   24
390	0.000016565253   24   24
391	0.000041690856    8   24
392	0.000055936473   32   24
393	0.000230939749   40   24
394	0.000574496205    8   16
395	0.000649828308   16   16
396	0.002335383727   24   16
397	0.009530761107   32   16
398	0.025491730341    8    8
399	0.032516117391   40   16
400	0.097625617676   16    8
401	0.293563779663   24    8
402	0.553477428654   32    8
403	0.763051324818   40    8

405	The following (ugly) ASCII graph summarizes the calculations:

407	      1 ++--+----------+----------+---------+----------+------***************
408	        +   +          +          + **************(1 - exp(-k/8))**k ****** +
409	    0.1 ++            **************             (1 - exp(-k/16))**k ######++
410	        +**************                          (1 - exp(-k/24))**k $$$$####
411	   0.01 **                                       (1 - exp(-k/32))**k######%++
412	        +                                     ###(1 - exp(-k/40))**k @@@@@@ +
413	        +                         #############                             +
414	  0.001 ##########################                                         ++
415	        +                                                              $$$$$$
416	 0.0001 ++                                                 $$$$$$$$$$$$$   ++
417	        $$$$$$                              $$$$$$$$$$$$$$$                 +
418	  1e-05 ++   $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$                              ++
419	        %%%%                                                                +
420	  1e-06 @@ %%%%%%                                                     %%%%%%%
421	        +@@@     %%%%%%%%%%                          %%%%%%%%%%%%%%%%%      +
422	        +   @@@@@          %%%%%%%%%%%%%%%%%%%%%%%%%%                       +
423	  1e-07 ++      @@@@@                                                      ++
424	        +            @@@@@@@@                                               +
425	  1e-08 ++                   @@@@@@@@@@@@@@@@           @@@@@@@@@@@@@@@@@@@@@
426	        +   +          +          +         +@@@@@@@@@@@         +          +
427	  1e-09 ++--+----------+----------+---------+----------+---------+---------++
428	           10         15         20        25         30        35         40