idnits 2.17.1 

draft-lear-lisp-nerd-08.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** You're using the IETF Trust Provisions' Section 6.b License Notice from
     12 Sep 2009 rather than the newer Notice from 28 Dec 2009.  (See
     https://trustee.ietf.org/license-info/)


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 2 instances of lines with private range IPv4 addresses in the
     document.  If these are generic example addresses, they should be changed
     to use any of the ranges defined in RFC 6890 (or successor): 192.0.2.x,
     198.51.100.x or 203.0.113.x.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (March 6, 2010) is 5166 days in the past.  Is this
     intentional?


  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  == Outdated reference: A later version (-24) exists of draft-ietf-lisp-06

  -- Obsolete informational reference (is this intentional?): RFC 2616
     (Obsoleted by RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235)

  -- Obsolete informational reference (is this intentional?): RFC  977
     (Obsoleted by RFC 3977)

  -- Obsolete informational reference (is this intentional?): RFC 4346
     (Obsoleted by RFC 5246)

  == Outdated reference: A later version (-10) exists of
     draft-ietf-lisp-alt-02

  == Outdated reference: A later version (-16) exists of draft-ietf-lisp-ms-04


     Summary: 1 error (**), 0 flaws (~~), 5 warnings (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                            E. Lear
3	Internet-Draft                                        Cisco Systems GmbH
4	Intended status: Experimental                              March 6, 2010
5	Expires: September 7, 2010

7	               NERD: A Not-so-novel EID to RLOC Database
8	                      draft-lear-lisp-nerd-08.txt

10	Abstract

12	   LISP is a protocol to encapsulate IP packets in order to allow end
13	   sites to multihome without injecting routes from one end of the
14	   Internet to another.  This memo presents an experimental database and
15	   a discussion of methods to transport the mapping of EIDs to RLOCs to
16	   routers in a reliable, scalable, and secure manner.  Our analysis
17	   concludes that transport of all EID/RLOC mappings scales well to
18	   at least 10^8 entries.

20	Status of this Memo

22	   This Internet-Draft is submitted to IETF in full conformance with the
23	   provisions of BCP 78 and BCP 79.

25	   Internet-Drafts are working documents of the Internet Engineering
26	   Task Force (IETF), its areas, and its working groups.  Note that
27	   other groups may also distribute working documents as Internet-
28	   Drafts.

30	   Internet-Drafts are draft documents valid for a maximum of six months
31	   and may be updated, replaced, or obsoleted by other documents at any
32	   time.  It is inappropriate to use Internet-Drafts as reference
33	   material or to cite them other than as "work in progress."

35	   The list of current Internet-Drafts can be accessed at
36	   http://www.ietf.org/ietf/1id-abstracts.txt.

38	   The list of Internet-Draft Shadow Directories can be accessed at
39	   http://www.ietf.org/shadow.html.

41	   This Internet-Draft will expire on September 7, 2010.

43	Copyright Notice

45	   Copyright (c) 2010 IETF Trust and the persons identified as the
46	   document authors.  All rights reserved.

48	   This document is subject to BCP 78 and the IETF Trust's Legal
49	   Provisions Relating to IETF Documents
50	   (http://trustee.ietf.org/license-info) in effect on the date of
51	   publication of this document.  Please review these documents
52	   carefully, as they describe your rights and restrictions with respect
53	   to this document.  Code Components extracted from this document must
54	   include Simplified BSD License text as described in Section 4.e of
55	   the Trust Legal Provisions and are provided without warranty as
56	   described in the BSD License.

58	Table of Contents

60	   1.  Introduction
61	     1.1.  Base Assumptions
62	     1.2.  What is NERD?
63	     1.3.  Glossary
64	   2.  Theory of Operation
65	     2.1.  Database Updates
66	     2.2.  Communications between ITR and ETR
67	     2.3.  Who are database authorities?
68	   3.  NERD Format
69	     3.1.  NERD Record Format
70	     3.2.  Database Update Format
71	   4.  NERD Distribution Mechanism
72	     4.1.  Initial Bootstrap
73	     4.2.  Retrieving Changes
74	   5.  Analysis
75	     5.1.  Database Size
76	     5.2.  Router Throughput Versus Time
77	     5.3.  Number of Servers Required
78	     5.4.  Security Considerations
79	       5.4.1.  Use of Public Key Infrastructures (PKIs)
80	       5.4.2.  Other Risks
81	   6.  Why not use XML?
82	   7.  Other Distribution Mechanisms
83	     7.1.  What About DNS as a retrieval model?
84	     7.2.  Use of BGP and LISP+ALT
85	     7.3.  Perhaps use a hybrid model?
86	   8.  Deployment Issues
87	     8.1.  HTTP
88	   9.  Open Questions
89	   10. Conclusions
90	   11. IANA Considerations
91	   12. Acknowledgments
92	   13. References
93	     13.1. Normative References
94	     13.2. Informative References
95	   Appendix A.  Generating and verifying the database signature
96	                with OpenSSL
97	   Appendix B.  Changes
98	   Author's Address

100	1.  Introduction

102	   Locator/ID Separation Protocol (LISP) [I-D.ietf-lisp] separates an IP
103	   address used by a host and local routing system from the locators
104	   advertised by BGP participants on the Internet in general, and in the
105	   default free zone (DFZ) in particular.  It accomplishes this by
106	   establishing a mapping between globally unique endpoint identifiers
107	   (EIDs) and routing locators (RLOCs).  This reduces the amount of
108	   state change that occurs on routers within the default-free zone on
109	   the Internet, while enabling end sites to be multihomed.

111	   In some mapping distribution approaches to LISP the mapping is
112	   learned via data-triggered control messages between ingress tunnel
113	   routers (ITRs) and egress tunnel routers (ETRs) through an alternate
114	   routing topology [I-D.ietf-lisp-alt].  In other approaches to LISP,
115	   the mapping from EIDs to RLOCs is instead learned through some other
116	   means.  This memo addresses different approaches to the problem, and
117	   specifies a Not-so-novel EID RLOC Database (NERD) and methods to both
118	   receive the database and to receive updates.

120	   NERD is offered primarily as a way to avoid dropping packets, the
121	   underlying assumption being that dropping packets is bad for
122	   applications and end users.  Those who do not agree with this
123	   underlying assumption may find that other approaches make more sense.

125	   NERD is specified in such a way that the methods used to distribute
126	   or retrieve it may vary over time.  Multiple databases are supported
127	   in order to allow for multiple data sources.  An effort has been made
128	   to divorce the database from access methods so that both can evolve
129	   independently through experimentation and operational validation.

131	1.1.  Base Assumptions

133	   In order to specify a mapping it is important to understand how it
134	   will be used, and the nature of the data being mapped.  In the case
135	   of LISP, the following assumptions are pertinent:

137	   o  The data contained within the mapping changes only on provisioning
138	      or configuration operations, and is not intended to change when a
139	      link either fails or is restored.  Some other mechanism such as
140	      the use of LISP Reachability Bits with mapping replies handles
141	      healing operations, particularly when a tail circuit within a
142	      service provider's aggregate goes down.  NERD can be used as a
143	      verification method to ensure that whatever operational mapping
144	      changes an ITR receives are authorized.
145	   o  While weight and priority are defined, these are not hop-by-hop
146	      metrics.  Hence the information contained within the mapping does
147	      not change based on where one sits within the topology.

149	   o  A purpose of LISP being to reduce control plane overhead by
150	      reducing "rate X state" complexity, updates to the mapping will be
151	      relatively rare.
152	   o  Because NERD is designed to ease interdomain routing, its use is
153	      intended within the inter-domain environment.  That is, NERD is
154	      best implemented at either the customer edge or provider edge, and
155	      there will be on the order of as many ITRs and EID Prefixes as
156	      there are connections to Internet Service Providers by end
157	      customers.
158	   o  As such, NERD cannot be the sole means to implement host mobility,
159	      although NERD may be used in conjunction with other mechanisms.

161	1.2.  What is NERD?

163	   NERD is a Not-so-novel EID to RLOC Database.  It consists of the
164	   following components:

166	   1.  a network database format;
167	   2.  a change distribution format;
168	   3.  a database retrieval/bootstrapping method;
169	   4.  a change distribution method.

171	   The network database format is compressible.  However, at this time
172	   we specify no compression method.  NERD will make use of potentially
173	   several transport methods, but most notably HTTP [RFC2616].  HTTP has
174	   restart and compression capabilities.  It is also widely deployed.

176	   There exist many methods to show differences between two versions of
177	   a database or a file, UNIX's "diff" being the classic example.  In
178	   this case, because the data is well structured and easily keyed, we
179	   can make use of a very simple format for version differences that
180	   simply provides a list of EID/RLOC mappings that have changed using
181	   the same record format as the database, and a list of EIDs that are
182	   to be removed.

184	1.3.  Glossary

186	   The reader is once again referred to [I-D.ietf-lisp] for a general
187	   glossary of terms related to LISP.  The following terms are specific
188	   to this memo.

190	   Base Distribution URI:  An Absolute-URI as defined in Section 4.3 of
191	      [RFC3986] from which other references are relative.  The base
192	      distribution URI is used to construct a URI to an EID/RLOC mapping
193	      database.  If more than one NERD is known then there will be one
194	      or more base distribution URIs associated with each (although each
195	      such base distribution URI may have the same value).

197	   EID Database Authority:  The authority that will sign database files
198	      and updates.  It is the source of both.

200	   The Authority:  Shorthand for the EID Database Authority.

202	   NERD:  (N)ot-so-novel (E)ID to (R)LOC (D)atabase.

204	   AFI:  Address Family Identifier.

206	   Pull Model:  An architecture where clients pull only the information
207	      they need at any given time, such as when a packet arrives for
208	      forwarding.

210	   Push Model:  An architecture in which clients receive an entire
211	      dataset, containing data they may or may not require, such as
212	      mappings for EIDs that no host served is attempting to send to.

214	   Hybrid Model:  An architecture in which some information is pushed
215	      toward the receiver from a source and some information is pulled
216	      by the receiver.

218	2.  Theory of Operation

220	   Operational functions are split into two components: database updates
221	   and state exchange between ITR and ETR during a communication.

223	2.1.  Database Updates

225	   What follows is a summary of how NERDs are generated and updated.
226	   Specifics can be found in Section 3.  The general way in which NERD
227	   works is as follows:

229	   1.  A NERD is generated by an authority that allocates provider
230	       independent (PI) addresses (e.g., IANA or an RIR) which are used
231	       by sites as EIDs.  As part of this process the authority
232	       generates a digest for the database and signs it with a private
233	       key whose public key is part of an X.509 certificate
234	       [ITU.X509.2000].  That signature along with a copy of the
235	       authority's public key is included in the NERD.
236	   2.  The NERD is distributed to a group of well known servers.
237	   3.  ITRs retrieve an initial copy of the NERD via HTTP when they come
238	       into service.

240	   4.  ITRs are preconfigured with a group of certificates whose private
241	       keys are used by database authorities to sign the NERD.  This
242	       list of certificates should be configurable by administrators.
243	   5.  ITRs next verify both the validity of the public key and the
244	       signed digest.  If either fails validation, the ITR attempts to
245	       retrieve the NERD from a different source.  The process iterates
246	       until either a valid database is found or the list of sources is
247	       exhausted.
248	   6.  Once a valid NERD is retrieved, the ITR installs it into both
249	       non-volatile and local memory.
250	   7.  At some point the authority updates the NERD and increments the
251	       database version counter.  At the same time it generates a list
252	       of changes, which it also signs, as it does with the original
253	       database.
254	   8.  Periodically ITRs will poll from their list of servers to
255	       determine if a new version of the database exists.  When a new
256	       version is found, an ITR will attempt to retrieve a change file,
257	       using its list of preconfigured servers.
258	   9.  The ITR validates a change file just as it does the original
259	       database.  Assuming the change file passes validation, the ITR
260	       installs new entries, overwrites existing ones, and removes empty
261	       entries, based on the content of the change file.
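
Steps 3 through 6 above can be sketched as a loop over the configured
sources.  This is only an illustration under stated assumptions, not part
of the specification: `fetch` and `verify` are hypothetical callables
standing in for HTTP retrieval and for validation of the certificate and
signed digest.

```python
from typing import Callable, Iterable, Optional


def retrieve_valid_nerd(
    sources: Iterable[str],
    fetch: Callable[[str], Optional[bytes]],
    verify: Callable[[bytes], bool],
) -> Optional[bytes]:
    """Try each source in turn; return the first database that validates."""
    for source in sources:
        blob = fetch(source)  # hypothetical: returns None if retrieval fails
        if blob is not None and verify(blob):
            return blob  # the caller installs it (step 6)
    return None  # the list of sources is exhausted
```

A caller would treat a `None` result as "no valid database found" and
retry later.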

263	   As time goes on it is quite possible that an ITR may probe a list of
264	   configured peers for a database or change file copy.  It is equally
265	   possible that peers might advertise to each other the version number
266	   of their database.  Such methods are not explored in depth in this
267	   memo, but are mentioned for future consideration.

269	2.2.  Communications between ITR and ETR

271	   [I-D.ietf-lisp] describes the basic approach to what happens when a
272	   packet arrives at an ITR, and what communications between ITR and ETR
273	   take place.  NERD provides an optimistic approach to establishing
274	   communications with an ETR that is responsible for a given EID
275	   prefix.  State must be kept, however, on an ITR to determine whether
276	   that ETR is in fact reachable.  It is expected that this is a common
277	   requirement across LISP mapping systems, and will be handled in the
278	   core LISP architecture.

280	2.3.  Who are database authorities?

282	   This memo does not specify who the database authority is.  That is
283	   because there are several possible operational models.  In each case
284	   the number of database authorities is meant to be small so that ITRs
285	   need only keep a small list of authorities, similar to the way a name
286	   server might cache a list of root servers.

288	   o  A single database authority exists.  In this case all entries in
289	      the database are registered to a single entity, and that entity
290	      distributes the database.  Because the EID space is provider
291	      independent address space, there is no architectural requirement
292	      that address space be hierarchically distributed to anyone, as
293	      there is with provider-assigned address space.  Hence, there is a
294	      natural affinity between the IANA function and the database
295	      authority function.
296	   o  Each region runs a database authority.  In this case, provider
297	      independent address space is allocated to either Regional Internet
298	      Registries (RIRs) or to affiliates of such organizations of
299	      network operations guilds (NOGs).  The benefit of this approach is
300	      that there is no single organization that controls the database.
301	      It allows one database authority to back up another.  One could
302	      envision as many as ten database authorities in this scenario.
303	      One drawback to this approach, however, is that any reference to a
304	      region imposes a notion of locality, thus potentially diminishing
305	      the split between locator and identifier.
306	   o  Each country runs a database authority.  This could occur should
307	      countries decide to regulate this function.  While limiting the
308	      scope of any single database authority as the previous scenario
309	      describes, this approach would introduce some overhead as the list
310	      of database authorities would grow to as many as 200, and possibly
311	      more if jurisdictions within countries attempted to regulate the
312	      function.  There are two drawbacks to this approach.  First, as
313	      distribution of EIDs is driven to more local jurisdictions, an EID
314	      prefix is tied even tighter to a location.  Second, a large number
315	      of database authorities will demand some sort of discovery
316	      mechanism.
317	   o  Independent operators manage database authorities.  This has the
318	      appeal of being location independent, and of enabling competition
319	      for good performance.  This method has the drawback of potentially
320	      requiring a discovery mechanism.

322	   The latter two approaches are not mutually exclusive.  While this
323	   specification allows for multiple databases, discovery mechanisms are
324	   left as future work.

326	3.  NERD Format

328	   The NERD consists of a header that contains a database version and a
329	   signature that is generated by ignoring the signature field and
330	   setting the authentication block length to 0 (NULL).  The
331	   authentication block itself consists of a signature and a certificate
332	   whose private key counterpart was used to generate the signature.

334	   Records are kept sorted in numeric order with AFI plus EID as primary
335	   key and prefix length as secondary.  This is so that after a database
336	   update it should be possible to reconstruct the database to verify
337	   the digest signature, which may be retrieved separately from the
338	   database for verification purposes.
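
One possible reading of this ordering is sketched below.  It assumes that
"AFI plus EID" means sorting first by AFI, then by the numeric value of
the EID, with prefix length as the tie-breaker; the text does not spell
out how the two are combined, so the tuple layout is an assumption.  The
prefixes are documentation examples.

```python
import ipaddress

# IANA Address Family Numbers: 1 = IPv4, 2 = IPv6.
AFI_BY_IP_VERSION = {4: 1, 6: 2}


def sort_key(prefix: str):
    """Return (AFI, EID as integer, prefix length) for an EID prefix."""
    net = ipaddress.ip_network(prefix)
    return (AFI_BY_IP_VERSION[net.version],
            int(net.network_address),
            net.prefixlen)


prefixes = ["2001:db8::/32", "192.0.2.0/25", "192.0.2.0/24",
            "198.51.100.0/24"]
ordered = sorted(prefixes, key=sort_key)
```

Because every holder of the database applies the same total order, two
routers that apply the same updates end up with byte-identical record
sequences, which is what makes the digest reproducible.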

340	        0                   1                   2                   3
341	        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
342	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
343	       | Schema Vers=1 |  DB Code      |     Database Name Size        |
344	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
345	       |                      Database Version                         |
346	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
347	       |                   Old Database Version or 0                   |
348	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
349	       |                                                               |
350	       |                        Database Name                          |
351	       |                                                               |
352	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
353	       |       PKCS#7 Block Size       |          Reserved             |
354	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
355	       |                                                               |
356	       |      PKCS#7 Block containing Certificate and Signature        |
357	       |                                                               |
358	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

360	   Database Header

362	   The DB Code indicates 0 if what follows is an entire database or 1 if
363	   what follows is an update.  The database file version is incremented
364	   each time the complete database is generated by the authority.  In
365	   the case of an update, the database file version indicates the new
366	   database file version, and the old database file version is indicated
367	   in the "Old Database Version" field.  The database file version is used by
368	   routers to determine whether or not they have the most current
369	   database.

371	   The database name is an ASCII-encoded domain name, as specified in
372	   [RFC5321].  This is the name that will appear in the Subject field of
373	   the certificate used to verify the database.  The purpose of the
374	   database name is to allow for more than one database.  Such databases
375	   would be merged by the router.  It is important that an EID/RLOC
376	   mapping be listed in no more than one database, lest inconsistencies
377	   arise.  However, it may be possible to transition a mapping from one
378	   database to another.  During the transition period, the mappings
379	   would be identical.  When they are not, the resultant behavior will
380	   be undefined.  The database name is padded with NULLs to the next
381	   four-byte boundary.

383	   The PKCS#7 [RFC2315] authentication block contains a DER encoded
384	   [ITU.X509.2000] signature and associated public key.  For purposes of
385	   this experiment all implementations will support the RSA encryption
386	   signature algorithm and SHA1 digest algorithm, and the standard
387	   attributes are expected to be present.
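
The header layout can be read with a few fixed-width fields, as in the
sketch below.  Several details are assumptions rather than statements
from this memo: fields are taken to be in network byte order, the
Database Name Size is taken to count the unpadded name, and the PKCS#7
block is taken to follow its size field immediately.  The function and
dictionary key names are illustrative only.

```python
import struct


def pad4(n: int) -> int:
    """Round n up to the next multiple of four."""
    return (n + 3) & ~3


def parse_header(buf: bytes) -> dict:
    """Split a NERD header into its fields (assumed network byte order)."""
    schema, db_code, name_size, version, old_version = struct.unpack_from(
        "!BBHII", buf, 0)
    offset = 12  # size of the five fixed fields above
    name = buf[offset:offset + name_size].decode("ascii")
    offset += pad4(name_size)  # skip the NULL padding after the name
    pkcs7_size, _reserved = struct.unpack_from("!HH", buf, offset)
    offset += 4
    return {
        "schema": schema,
        "is_update": db_code == 1,  # 0 = entire database, 1 = update
        "name": name,
        "version": version,
        "old_version": old_version,
        "pkcs7": buf[offset:offset + pkcs7_size],
    }
```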

389	   N.B., it has been suggested that Cryptographic Message Syntax (CMS)
390	   [RFC5652] be used instead of PKCS#7.  At the time of this writing,
391	   CMS is not yet widely deployed.  However, it is certainly the correct
392	   direction, and should be strongly considered should NERD be
393	   standardized.

395	3.1.  NERD Record Format

397	   As distributed over the network, NERD records appear as follows:

399	        0                   1                   2                   3
400	        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
401	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
402	       | Num. RLOCs    | EID Pref. Len  |           EID AFI            |
403	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
404	       |                       End point identifier                    |
405	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
406	       | Priority 1    |    Weight 1   |             AFI 1             |
407	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
408	       |                       Routing Locator 1                       |
409	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
410	       | Priority 2    |    Weight 2   |             AFI 2             |
411	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
412	       |                       Routing Locator 2                       |
413	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
414	       | Priority 3    |    Weight 3   |             AFI 3             |
415	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
416	       |                       Routing Locator 3...                    |
417	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

419	   EID AFI is the AFI of the EID.  Priority N, Weight N, and AFI N
420	   are associated with Routing Locator N.  There will always be at least
421	   one routing locator.  The minimum record size for IPv4 is 16 bytes.

423	   Each additional IPv4 RLOC increases the record size by 8 bytes.  The
424	   purpose of this format is to keep the database compact, but somewhat
425	   easily read.  The meanings of weight and priority are described in
426	   [I-D.ietf-lisp].  The format of the AFI is specified by IANA as
427	   "Address Family Numbers", with the exception of how IPv6 EID prefixes
428	   are stored.

430	   In order to reduce storage and transmission amounts for IPv6, only
431	   the necessary number of bytes of an EID as specified by the prefix
432	   length are kept in the record, rounded up to the next four-byte
433	   (word) boundary.  For instance, a /49 prefix needs seven bytes, which
434	   rounds up to the next four-byte word boundary, so eight bytes are
435	   stored.  IPv6 RLOCs are represented as normal 128-bit IPv6 addresses.
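
The record sizes quoted in this section can be checked with a small
encoder.  This is a sketch, not a reference implementation: network byte
order is assumed, IPv4 EIDs are assumed to be stored as a full four
bytes (consistent with the 16-byte minimum), and the function names are
hypothetical.  AFI values 1 and 2 are the IANA Address Family Numbers
for IPv4 and IPv6.

```python
import ipaddress
import struct

AFI_IPV4, AFI_IPV6 = 1, 2  # IANA Address Family Numbers


def stored_eid_len(afi: int, prefix_len: int) -> int:
    """Number of EID bytes kept in a record."""
    if afi == AFI_IPV4:
        return 4
    needed = (prefix_len + 7) // 8  # bytes needed to cover the prefix
    return (needed + 3) // 4 * 4    # rounded up to a four-byte word


def encode_record(eid_prefix: str, rlocs) -> bytes:
    """rlocs is a list of (priority, weight, address string) tuples."""
    net = ipaddress.ip_network(eid_prefix)
    afi = AFI_IPV4 if net.version == 4 else AFI_IPV6
    out = struct.pack("!BBH", len(rlocs), net.prefixlen, afi)
    out += net.network_address.packed[:stored_eid_len(afi, net.prefixlen)]
    for priority, weight, addr in rlocs:
        ip = ipaddress.ip_address(addr)
        rloc_afi = AFI_IPV4 if ip.version == 4 else AFI_IPV6
        out += struct.pack("!BBH", priority, weight, rloc_afi) + ip.packed
    return out


one = encode_record("192.0.2.0/24", [(1, 100, "198.51.100.1")])
two = encode_record("192.0.2.0/24", [(1, 50, "198.51.100.1"),
                                     (1, 50, "203.0.113.1")])
```

Here `len(one)` is 16 and `len(two)` is 24, matching the minimum IPv4
record size and the 8-byte cost of each additional IPv4 RLOC.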

437	3.2.  Database Update Format

439	   A database update contains a set of changes to an existing database.
440	   Each AFI/EID/mask-length tuple may have zero or more RLOCs associated
441	   with it.  In the case where there are no RLOCs, the EID entry is
442	   removed from the database.  Records that contain EIDs and prefix
443	   lengths that were not previously listed are simply added.  Otherwise,
444	   the old record for the EID and prefix length is replaced by the more
445	   current information.  The record format used by a database update
446	   is the same as described in Section 3.1.
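
The three cases above (remove, add, replace) can be sketched with an
in-memory table.  The dictionary representation, keyed by an
(AFI, EID, prefix length) tuple, is an assumption made for illustration;
the memo does not prescribe how a router stores the database.

```python
def apply_update(db: dict, changes: dict) -> dict:
    """Return a new table with the changes applied.

    Both arguments map (afi, eid, prefix_len) to a list of RLOC entries.
    """
    updated = dict(db)
    for key, rlocs in changes.items():
        if rlocs:
            updated[key] = rlocs    # add a new entry or replace an old one
        else:
            updated.pop(key, None)  # zero RLOCs: remove the EID entry
    return updated
```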

448	4.  NERD Distribution Mechanism

450	4.1.  Initial Bootstrap

452	   Bootstrap occurs when a router needs to retrieve the entire database.
453	   It knows it needs to retrieve the entire database because either it
454	   has none or the pending update is too substantial to process, as
455	   might be the case if a router has been out of service for a lengthy
456	   period of time.

458	   To bootstrap the ITR appends the database name plus "/current/
459	   entiredb" to a Base Distribution URI and retrieves the file via HTTP.
460	   More formally (using ABNF from [RFC5234]):

462	      entire-db =    base-uri dbname "/current/entiredb"
463	      base-uri  =    uri ; From RFC 3986
464	      dbname    =    Domain ; from RFC5321

466	   For example, if the base distribution URI is
467	   "http://www.example.com/eiddb/", and assuming a database name of
468	   "nerd.arin.net", the ITR would request
470	   "http://www.example.com/eiddb/nerd.arin.net/current/entiredb".
471	   Routers check the signature on the database prior to installing it,
472	   and check that the database schema matches a schema they understand.
473	   Once a router has a valid database it stores that database in some
474	   sort of non-volatile memory (e.g., disk, flash memory, etc).

476	   N.B., the host component for such URIs should not resolve to a LISP
477	   EID, lest a circular dependency be created.

479	4.2.  Retrieving Changes

481	   In order to retrieve a set of database changes an ITR will have
482	   previously retrieved the entire database.  Hence it knows the current
483	   version of the database it has.  Its first step for retrieving
484	   changes is to retrieve the current version of the database.  It does
485	   so by appending "/current/version" to the base distribution URI and
486	   database name and retrieving the file.  Its format is text and it
487	   contains the integer value of the current database version.

489	   Once an ITR has retrieved the current version it compares that value
490	   with the version of its local copy.  If there is no difference, the
491	   router is up to date and need take no further action until it next checks.

493	   If the versions differ, the router next sends a request for the
494	   appropriate change file by appending "/current/changes/" and the
495	   textual representation of the version of its local copy of the
496	   database to the base distribution URI and database name.  More formally:

498	      db-version    =    base-uri dbname "/current/version"
499	      db-curupdate  =    base-uri dbname "/current/changes/" old-version
500	      old-version   =    1*DIGIT

502	   For example, if the current version of the database is 1105503 and
503	   the router's version is 1105500, and the base URI and database name are
504	   the same as above, the router would first request
505	   "http://www.example.com/eiddb/nerd.arin.net/current/version" to
506	   determine that it is out of date, and to also learn the current
507	   version.  It would then attempt to retrieve
508	   "http://www.example.com/eiddb/nerd.arin.net/current/changes/1105500".
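
The URI construction rules in this section are plain string
concatenation, as the following sketch shows.  The function names are
hypothetical, and the base distribution URI is assumed to end in "/", as
it does in the examples.

```python
from typing import Optional


def entire_db_uri(base_uri: str, dbname: str) -> str:
    return f"{base_uri}{dbname}/current/entiredb"


def version_uri(base_uri: str, dbname: str) -> str:
    return f"{base_uri}{dbname}/current/version"


def changes_uri(base_uri: str, dbname: str, local_version: int) -> str:
    return f"{base_uri}{dbname}/current/changes/{local_version}"


def next_request(base_uri: str, dbname: str,
                 local_version: int, current_version: int) -> Optional[str]:
    """Return the change file URI to fetch, or None if already current."""
    if local_version == current_version:
        return None
    return changes_uri(base_uri, dbname, local_version)


BASE = "http://www.example.com/eiddb/"
uri = next_request(BASE, "nerd.arin.net", 1105500, 1105503)
```

With the example values, `uri` is the change file URI for version
1105500 given in the text.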

510	   The server may not have that change file, either because there are
511	   too many versions between what the router has and what is current, or
512	   because no such change file was generated.  If the server has changes
513	   from the router's version to any later version, the server issues an
514	   HTTP redirect to that change file, and the router retrieves and
515	   processes it.  More formally:

517	      db-incupdate    =    base-uri dbname "/" newer-version
518	                           "/changes/" old-version
519	      newer-version   =    1*DIGIT

521	   For example:

523	   "http://www.example.com/eiddb/nerd.arin.net/1105450/changes/1105401"
524	   would update a router from version 1105401 to 1105450.  Once it has
525	   done so, the router should then repeat the process until it has
526	   brought itself up to date.

528	   This raises the question: how does a router know to retrieve version
529	   1105450 in our example above?  It cannot.  A redirect must be given
530	   by the server to that URI when the router attempts to retrieve
531	   differences from the current version, say, 1105503.
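   The repeat-until-current behavior described above can be sketched as
   a small loop.  This is only an illustration: the four callables are
   hypothetical stand-ins for the HTTP requests and local processing,
   and falling back to a full copy when no change file exists is our
   assumption rather than a requirement stated here.

```python
from typing import Callable, Optional, Tuple

# Hypothetical callables supplied by the caller (none are defined by the memo):
#   fetch_version()     -> the integer found at .../current/version
#   fetch_changes(old)  -> (newer_version, change_file) after following any
#                          redirect from .../current/changes/<old>, or None
#                          when the server has no usable change file
#   apply_changes(data) -> applies one verified change file locally
#   fetch_full()        -> retrieves a full copy and returns its version


def bring_up_to_date(
    local_version: int,
    fetch_version: Callable[[], int],
    fetch_changes: Callable[[int], Optional[Tuple[int, bytes]]],
    apply_changes: Callable[[bytes], None],
    fetch_full: Callable[[], int],
) -> int:
    """Apply change files one after another until the local copy is current."""
    current = fetch_version()
    while local_version != current:
        result = fetch_changes(local_version)
        if result is None:
            # Assumption: with no change file available, get a full copy.
            return fetch_full()
        newer_version, change_file = result
        if newer_version == local_version:
            raise RuntimeError("change file made no progress")
        apply_changes(change_file)
        local_version = newer_version
    return local_version
```

   With the versions from the example, a router at 1105401 would apply
   the change file leading to 1105450 and then ask again from 1105450.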

533	   While it is unlikely that database versions would wrap, as they
534	   consist of 32-bit integers, should the event occur, ITRs should
535	   attempt first to retrieve a change file when their current version
536	   number is within 10,000 of 2^32 and they see a version available that
537	   is less than 10,000.  Barring the availability of a change file, the
538	   ITR can still assume that the database version has wrapped and
539	   retrieve a new copy.  It may be safer in future work to include
540	   additional wrap information or a larger field to avoid having to use
541	   any heuristics.
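   The wrap heuristic above reduces to a two-sided comparison.  A minimal
   sketch, in which the function and constant names are ours and "within
   10,000 of 2^32" is read as "at least 2^32 - 10,000":

```python
WRAP_WINDOW = 10_000   # threshold given in the text
MODULUS = 2 ** 32      # versions are 32-bit integers


def looks_wrapped(local_version: int, server_version: int) -> bool:
    """True when the heuristic suggests the version counter has wrapped."""
    near_top = local_version >= MODULUS - WRAP_WINDOW
    near_bottom = server_version < WRAP_WINDOW
    return near_top and near_bottom
```

   When this returns true, the ITR would first try a change file and
   only then fall back to retrieving a new copy.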

543	5.  Analysis

545	   We will start our analysis by looking at how much data will be
546	   transferred to a router during bootstrap conditions.  We will then
547	   look at the bandwidth required.  Next we will turn our concerns to
548	   servers.  Finally we will ponder the effect of providing only
549	   changes.

551	   In the analysis below we treat the overhead of the database header as
552	   insignificant (because it is).  The analysis should be similar,
553	   whether a single database or multiple databases are employed, as we
554	   would assume that no entry would appear more than once.

556	5.1.  Database Size

558	   By its very nature the information to be transported is relatively
559	   static and is specifically designed to be topologically insensitive.
560	   That is, every ITR is intended to have the same set of RLOCs for a
561	   given EID.  While some processing power will be necessary to install
562	   a table, the amount required should be far less than that of a
563	   routing information database because the level of entropy is intended
564	   to be lower.

566	   For purposes of this analysis, we will assume that the world has
567	   migrated to IPv6, as this increases the size of the database, which
568	   would be our primary concern.  However, to mitigate the size
569	   increase, we have limited the size of the prefix transmitted.  For
570	   purposes of this analysis, we shall assume an average prefix length
571	   of 64 bits.

573	   Based on that assumption, Section 3.1 states that mapping information
574	   for each EID/Prefix includes a group of RLOCs, each with an
575	   associated priority and weight, and that a minimum record size with
576	   IPv6 EIDs with at least one RLOC is 30 bytes uncompressed.  Each
577	   additional IPv6 RLOC costs 20 bytes.

579	                 +-----------+--------+--------+---------+
580	                 | 10^n EIDs | 2 RLOC | 4 RLOC |  8 RLOC |
581	                 +-----------+--------+--------+---------+
582	                 |         4 | 500 KB | 900 KB | 1.70 MB |
583	                 |         5 | 5.0 MB | 9.0 MB | 17.0 MB |
584	                 |         6 |  50 MB |  90 MB |  170 MB |
585	                 |         7 | 500 MB | 900 MB | 1.70 GB |
586	                 |         8 | 5.0 GB | 9.0 GB | 17.0 GB |
587	                 +-----------+--------+--------+---------+

589	    Database size for IPv6 routes with average prefix length = 64 bits

591	                                  Table 1

593	   Entries in the above table are derived as follows:

595	        E * (30 + 20 * (R - 1 ))

597	   where E = number of EIDs (10^n), R = number of RLOCs per EID.
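   As a check on Table 1, the formula can be evaluated directly.  The
   sketch below uses decimal units (1 KB = 1,000 bytes), which is what
   the table entries imply; the function name is illustrative.

```python
def database_bytes(eids: int, rlocs_per_eid: int) -> int:
    """E * (30 + 20 * (R - 1)): uncompressed size for IPv6 EIDs and RLOCs."""
    if rlocs_per_eid < 1:
        raise ValueError("each record carries at least one RLOC")
    return eids * (30 + 20 * (rlocs_per_eid - 1))


for exponent in range(4, 9):
    sizes = [database_bytes(10 ** exponent, r) for r in (2, 4, 8)]
    print(exponent, sizes)
```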

599	   Our scaling target is to accommodate 10^8 multihomed systems, which
600	   is one order of magnitude greater than what is discussed in [CARP07].
601	   At 10^8 entries, a device could be expected to use between 5 and 17
602	   gigabytes of RAM for the mapping.  No matter the method of
603	   distribution, any router that sits in the core of the Internet would
604	   require near this amount of memory in order to perform the ITR
605	   function.  Large enterprise ETRs would be similarly strained, simply
606	   due to the diversity of sites that communicate with one another.
607	   The good news is that this is not our starting point, but rather our
608	   scaling target, a number that we intend to reach by the year 2050.
609	   Our starting point is more likely in the neighborhood of 10^4 or 10^5
610	   EIDs, thus requiring between 500 KB and 17 MB.

612	5.2.  Router Throughput Versus Time

614	        +-------------------+---------+--------+---------+-------+
615	        | Table Size (10^N) |   1Mb/s | 10Mb/s | 100Mb/s | 1Gb/s |
616	        +-------------------+---------+--------+---------+-------+
617	        |                 6 |       8 |    0.8 |    0.08 | 0.008 |
618	        |                 7 |      80 |      8 |     0.8 |  0.08 |
619	        |                 8 |     800 |     80 |       8 |   0.8 |
620	        |                 9 |   8,000 |    800 |      80 |     8 |
621	        |                10 |  80,000 |  8,000 |     800 |    80 |
622	        |                11 | 800,000 | 80,000 |   8,000 |   800 |
623	        +-------------------+---------+--------+---------+-------+

625	             Number of seconds to process a NERD of 10^N bytes

627	                                  Table 2

629	   The length of time it takes to process the database is significant in
630	   models where the device acquires the entire table.  During this
631	   period of time, either the router will be unable to route packets
632	   using LISP or it must use some sort of query mechanism for specific
633	   EIDs as it populates the rest of its table through the transfer.
634	   Table 2 shows us that at our scaling target, the length of time it
635	   would take for a router using 1 Mb/s of bandwidth is about 80,000
636	   seconds.  We can measure the processing rate in small numbers of
637	   hours for any transfer speed greater than that.  At the fastest rate
638	   shown, processing an entire table takes 8 seconds for 10^9 bytes and
639	   80 seconds for 10^10 bytes.
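   The entries in Table 2 follow from dividing the table size in bits by
   the link rate.  A short sketch, assuming the table sizes are in bytes
   and ignoring protocol overhead (the function name is ours):

```python
def transfer_seconds(table_bytes: int, bits_per_second: int) -> float:
    """Seconds needed to move table_bytes over a link of the given rate."""
    return table_bytes * 8 / bits_per_second


for exponent in range(6, 12):
    row = [transfer_seconds(10 ** exponent, rate)
           for rate in (10 ** 6, 10 ** 7, 10 ** 8, 10 ** 9)]
    print(exponent, row)
```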

641	5.3.  Number of Servers Required

643	   As easy as it may be for a router to retrieve, the aggregate
644	   information may be difficult for servers to transmit, assuming the
645	   information is transmitted in aggregate (we'll revisit that
646	   assumption later).

648	   +----------------+------------+-----------+------------+------------+
649	   | # Simultaneous | 10 Servers |       100 |      1,000 |     10,000 |
650	   |       Requests |            |   Servers |    Servers |    Servers |
651	   +----------------+------------+-----------+------------+------------+
652	   |            100 |        720 |        72 |         72 |         72 |
653	   |          1,000 |      7,200 |       720 |         72 |         72 |
654	   |         10,000 |     72,000 |     7,200 |        720 |         72 |
655	   |        100,000 |    720,000 |    72,000 |      7,200 |        720 |
656	   |      1,000,000 |  7,200,000 |   720,000 |     72,000 |      7,200 |
657	   |     10,000,000 | 72,000,000 | 7,200,000 |    720,000 |     72,000 |
658	   +----------------+------------+-----------+------------+------------+

660	     Retrieval time per number of servers in seconds.  Assumes average
661	   10^8 entries with 4 RLOCs per EID and that each server has access to
662	    1 Gb/s and 100% efficient use of that bandwidth and no compression.

664	                                  Table 3

666	   Entries in the above table were generated using the following method:

668	   For 10^8 entries with four RLOCs per EID, the table size is 9.0 GB,
669	   per Table 1.  Assume 1 Gb/s transfer rates and 100% utilization.
670	   Protocol overhead is ignored for this exercise.  Hence a single
671	   transfer X takes 72 seconds and can get no faster.

673	   With this in mind, each entry is as follows:

675	            max(1X,N*X/S)

677	     where N=number of transfers, X = 72 seconds,
678	     and S = number of servers.
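   The method behind Table 3 can be written out as follows.  The sketch
   derives X from a 9.0 GB table over a 1 Gb/s link and then applies
   max(X, N*X/S); the names are illustrative.

```python
TABLE_BYTES = 9_000_000_000        # 10^8 entries, 4 RLOCs per EID
LINK_BITS_PER_SECOND = 1_000_000_000

# Time for one transfer at 100% link utilization.
X = TABLE_BYTES * 8 / LINK_BITS_PER_SECOND


def retrieval_seconds(transfers: int, servers: int, x: float = X) -> float:
    """max(X, N*X/S): no batch finishes faster than a single transfer."""
    return max(x, transfers * x / servers)
```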

680	   If we have a distribution model in which every device must retrieve
681	   the mapping information upon start, Table 3 shows the length of time
682	   in seconds it will take for a given number of servers to complete a
683	   transfer to a given number of devices.  This table says, as an
684	   example, that it would take 72,000 seconds (20 hours) for one million
685	   ITRs to simultaneously retrieve the database from one thousand
686	   servers, assuming equal load distribution.  Should a cold start
687	   scenario occur, this number should be of some concern.  Hence it is
688	   important to take some measures both to avoid such a scenario, and to
689	   ease the load should it occur.  The primary defense should be for
690	   ITRs to first attempt to retrieve their databases from their peers or
691	   upstream providers.  Secondary defenses could include data sanity
692	   checks within ITRs, with agreed norms for how much the database
693	   should change in any given update or over any given period of time.

695	   As we will see below, dissemination of changes involves considerably
696	   less volume.

698	     +----------------+-------------+---------------+----------------+
699	     | % Daily Change | 100 Servers | 1,000 Servers | 10,000 Servers |
700	     +----------------+-------------+---------------+----------------+
701	     |           0.1% |         300 |            30 |              3 |
702	     |           0.5% |       1,500 |           150 |             15 |
703	     |             1% |       3,000 |           300 |             30 |
704	     |             5% |      15,000 |         1,500 |            150 |
705	     |            10% |      30,000 |         3,000 |            300 |
706	     +----------------+-------------+---------------+----------------+

708	     Assuming 10 million routers and a database size of 9GB, resulting
709	   transfer times for hourly updates are shown in seconds, given number
710	     of servers and daily rate of change.  Note that when insufficient
711	    resources are devoted to servers, an unsustainable situation arises
712	   where updates for the next batch would begin prior to the completion
713	                           of the current batch.

715	                                  Table 4

717	   This table shows us that with 10,000 servers the average transfer
718	   time with 1Gb/s links for 10,000,000 routers will be 300 seconds with
719	   10% daily change spread over 24 hourly updates.  For a 0.1% daily
720	   change, that number is 3 seconds for a database of size 9.0GB.
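   The memo does not spell out how the entries in Table 4 were computed.
   One derivation that reproduces them, offered here only as a sketch,
   treats the daily change as a fraction of the 9 GB database, spreads
   it evenly over 24 hourly updates, and divides the routers evenly
   among servers with 1 Gb/s links.  All names are illustrative.

```python
def hourly_update_seconds(
    daily_change_percent: float,
    servers: int,
    routers: int = 10_000_000,
    database_bytes: int = 9_000_000_000,
    link_bits_per_second: int = 1_000_000_000,
) -> float:
    """Seconds for every router to fetch one hourly update."""
    # Assumption: change volume is proportional to database size.
    bytes_per_update = database_bytes * daily_change_percent / 100 / 24
    seconds_per_router = bytes_per_update * 8 / link_bits_per_second
    # Assumption: routers are spread evenly and served one after another.
    return seconds_per_router * routers / servers
```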

722	   The amount of change goes to the purpose of LISP.  If its purpose is
723	   to provide effective multihoming support to end customers, then we
724	   might anticipate relatively few changes.  If, on the other hand,
725	   service providers attempt to use LISP to provide some form of traffic
726	   engineering, we can expect the same data to change more often.  We
727	   can probably not conclude much in this regard without additional
728	   operational experience.  The one thing we can say is that different
729	   applications of the LISP protocol may require new and different
730	   distribution mechanisms.  Such optimization is left for another day.

732	5.4.  Security Considerations

734	   Whatever the answer to our previous question, we must consider the
735	   security of the information being transported.  If an attacker can
736	   forge an update or tamper with the database, he can in effect
737	   redirect traffic to end sites.  Hence, integrity and authenticity of
738	   the NERD is critical.  In addition, a means is required to determine
739	   whether a source is authorized to modify a given database.  No data
740	   privacy is required.  Quite to the contrary, this information will be
741	   necessary for any ITR.

743	   The first question one must ask is who to trust to provide the ITR a
744	   mapping.  Ultimately the owner of the EID prefix is most
745	   authoritative for the mapping to RLOCs.  However, were all owners to
746	   sign all such mappings, ITRs would need to know which owner is
747	   authorized to modify which mapping, creating a problem of O(N^2)
748	   complexity.

750	   We can reduce this problem substantially by investing some trust in a
751	   small number of entities that are allowed to sign entries.  If an
752	   authority manages EIDs much the same way a domain name registrar
753	   handles domains, then the owner of the EID would choose a database
754	   authority she or he trusts, and ITRs must trust each such authority
755	   in order to map the EIDs listed by that authority to RLOCs.  This
756	   reduces the amount of management complexity on the ETR to retaining
757	   knowledge of O(#authorities), but does require that each authority
758	   establish procedures for authenticating the owner of an EID.  Those
759	   procedures needn't be the same.

761	   There are two classic methods to ensure integrity of data:

763	   o  secure transport of the source of the data to the consumer, such
764	      as Transport Layer Security (TLS) [RFC4346]; and
765	   o  provide object level security.

767	   These methods are not mutually exclusive, although one can argue
768	   about the need for the former, given the latter.

770	   In the case of TLS, when it is properly implemented, the objects
771	   being transported cannot easily be modified by interlopers or so-
772	   called men in the middle.  When data objects are distributed to
773	   multiple servers, each of those servers must be trusted.  As we have
774	   seen above, we could have quite a large number of servers, thus
775	   providing an attacker a large number of targets.  We conclude that
776	   some form of object level security is required.

778	   Object level security involves an authority signing an object in a
779	   way that can easily be verified by a consumer, in this case a router.
780	   In this case, we would want the mapping table and any incremental
781	   update to be signed by the originator of the update.  This implies
782	   that we cannot simply make use of a tool like CVS [CVS].  Instead,
783	   the originator will want to generate diffs, sign them, and make them
784	   available either directly or through some sort of content
785	   distribution or peer to peer network.

787	5.4.1.  Use of Public Key Infrastructures (PKIs)

789	   X.509 provides a certificate hierarchy that has scaled to the size of
790	   the Internet.  The system is most manageable when there are few
791	   certificates to manage.  The model proposed in this memo makes use of
792	   one current certificate per database authority.  The two pieces of
793	   information necessary to verify a signature, therefore, are as
794	   follows:

796	   o  the certificate of the database authority, which can be provided
797	      along with the database; and
798	   o  the certificate authority's certificate.

800	   The latter two pieces of information must be very well known and must
801	   be configured on each ITR.  It is expected that both would change
802	   very rarely, and it would not be unreasonable for such updates to
803	   occur as part of a normal OS release process.

805	   The tools for both signing and verifying are readily available.
806	   OpenSSL [1] provides tools and libraries for both signing and
807	   verifying.  Other tools commonly exist.

809	   Use of PKIs is not without implementation, operational complexity or
810	   risk.  The following risks and mitigations are identified with NERD's
811	   use of PKIs:

813	   If a NERD database authority private key is exposed:

815	      In this case an attacker could sign a false database update,
816	      either redirecting traffic, or otherwise causing havoc.  In this
817	      case, the NERD database administrator must revoke its existing key
818	      and issue a new one.  The certificate is added to a certificate
819	      revocation list (CRL), which may be distributed with both this and
820	      other databases, as well as through other channels.  Because this
821	      event is expected to be rare, and the number of database
822	      authorities is expected to be small, a CRL will be small.  When a
823	      router receives a revocation, it checks it against its existing
824	      databases, and attempts to update the one that is revoked.  This
825	      implies that prior to issuing the revocation, the database
826	      authority would sign an update with the new key.  Routers would
827	      discard updates they have already received that were signed after
828	      the revocation was generated.  If a router cannot confirm whether
829	      the authority's certificate was revoked before or after a
830	      particular update, it will retrieve a fresh copy of the
831	      database with a valid signature.

833	   The private key associated with a CA in the chain of trust of the
834	   Authority's certificate is compromised:

836	      In this case, it becomes possible for an attacker to masquerade as
837	      the database authority.  To ameliorate damage, the database
838	      authority revokes its certificate and gets a new certificate
839	      issued from a CA that is not compromised.  Once it has done so,
840	      the previous procedure is followed.  The compromised certificate
841	      can be removed during the normal operating system upgrade cycle.
842	      In the case of the root authority, the situation could be more
843	      serious.  Updates to the OS in the ITR need to be validated prior
844	      to installation.  One possible method of doing this is provided in
845	      [RFC4108].  Because trust anchors are assumed to be updated as
846	      part of an OS update, implementers should consider using a key
847	      other than the trust anchor for validating OS updates.

849	   An algorithm used in either the certificate or the signature is
850	   cracked:

852	      This is a catastrophic failure and the above forms of attack
853	      become possible.  The only mitigation is to make use of a new
854	      algorithm.  In theory this should be possible, but in practice has
855	      proved very difficult.  For this reason, additional work is
856	      recommended to make alternative algorithms available.

858	   The Database Authority loses its key or disappears:

860	      In this case nobody can update the existing database.  There are
861	      few programmatic mitigations.  If the database authority places
862	      its private keys and suitable amounts of information in escrow,
863	      then under agreed-upon circumstances (for example, no updates for
864	      three days), the escrow agent would release the information to a
865	      party capable of generating a database update.
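   For the first risk above (an exposed database authority key), the
   router-side handling of updates it has already received can be
   sketched as follows.  The record type and its signing-time field are
   hypothetical; nothing here asserts that the database format carries
   such a field, and treating "signed at exactly the revocation time" as
   suspect is our own conservative choice.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ReceivedUpdate:
    version: int
    signed_at: Optional[int]  # hypothetical signing time; None if unknown


def triage_after_revocation(
    updates: List[ReceivedUpdate], revoked_at: int
) -> Tuple[List[ReceivedUpdate], bool]:
    """Return (updates to keep, whether a fresh full copy is needed)."""
    kept = []
    for update in updates:
        if update.signed_at is None:
            # Ordering relative to the revocation cannot be confirmed.
            return [], True
        if update.signed_at < revoked_at:
            kept.append(update)
        # Otherwise: signed at or after the revocation, so discard it.
    return kept, False
```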

867	5.4.2.  Other Risks

869	   Because this specification does not require secure transport, if an
870	   attacker prevents updates to an ITR for the purposes of having that
871	   ITR continue to use a compromised ETR, the ITR could continue to use
872	   an old version of the database without realizing a new version has
873	   been made available.  If one is worried about such an attack, a
874	   secure channel such as SSL to a secure chain back to the database
875	   authority should be used.  It is possible that after some operational
876	   experience, later versions of this format will contain additional
877	   semantics to address this attack.  SSL would also prevent attempts
878	   to spoof false database versions on the server.

880	   As discussed above, a cold start scenario poses substantial risk.
881	   If an attacker found a bug in a common operating system that allowed
882	   it to erase an ITR's database, and was able to disseminate that bug,
883	   the collective ability of ITRs to retrieve new copies of the database
884	   could be taxed by collective demand.  The remedy to this is for
885	   devices to share copies of the database with their peers, thus making
886	   each potential requester a potential service.

888	6.  Why not use XML?

890	   Many objects these days are distributed as either XML pages or
891	   something derived from XML [W3C.REC-xml11-20040204], such as SOAP
892	   [W3C.REC-soap12-part1-20070427],[W3C.REC-soap12-part2-20070427].  Use
893	   of such well known standards allows for high level tools and library
894	   reuse.  XML's strength is extensibility.  Without a doubt XML would
895	   be more extensible than a fixed field database.  Why not, then, use
896	   these standards in this case?  The greatest concern the author had
897	   was compactness of the data stream.  Inasmuch as this mechanism is
898	   used at all in the future, so long as that concern could be
899	   addressed, and so long as signatures of the database can be verified,
900	   XML probably should be considered.

902	7.  Other Distribution Mechanisms

904	   We now consider various different mechanisms.  The problem of
905	   distributing changes in various databases is as old as databases.
906	   The author is aware of two obvious approaches that have been well
907	   used in the past.  One approach would be the wide distribution of CVS
908	   repositories.  However, for reasons mentioned in Section 5.4, CVS is
909	   insufficient to the task.

911	   The other tried and true approach is the use of periodic updates in
912	   the form of messages.  Good old NNTP [RFC0977] itself provides two
913	   separate mechanisms (one push and another pull) to provide a coherent
914	   update process.  This was in fact used to update molecular biology
915	   databases [gb91] in the early 1990s.  Netnews offers a way to
916	   determine whether articles with specified Article-Ids have been
917	   received.  In the case where the mapping file source of authority
918	   wishes to transmit updates, it can sign a change file and then post
919	   it into the network.  Routers merely need to keep a record of the
920	   article ids that they have received.  Netnews systems years ago
921	   handled a far greater volume of traffic than we envision. [2]
922	   Initially this is probably overkill, but it may not be so later in
923	   this process.  Some consideration should be given to a mechanism
924	   known to widely distribute vast amounts of data, as instantaneously
925	   as either the sender or the receiver wishes.

927	   To attain an additional level of hierarchy in the distribution
928	   network, service providers could retrieve information to their own
929	   local servers, and configure their routers with the host portion of
930	   the above URI.

932	   Another possibility would be for providers to establish an agreement
933	   on a small set of anycast addresses for use for this purpose.  There
934	   are limitations to the use of anycast, particularly with TCP.  In the
935	   midst of a routing flap, an anycast address can become all but
936	   unusable.  Careful study of such a use as well as appropriate use of
937	   HTTP redirects is expected.

939	7.1.  What About DNS as a retrieval model?

941	   It has been proposed that a query/response mechanism be used for this
942	   information, and that specifically the domain name system (DNS)
943	   [RFC1034] be used.  The previous models do not preclude the DNS.  DNS
944	   has the advantage that the administrative lines are well drawn, and
945	   that the ID/RLOC mapping is likely to appear very close to these
946	   boundaries.  DNS also has the added benefit that an entire
947	   distribution infrastructure already exists.  There are, however, some
948	   problems that could impact end hosts when intermediate routers make
949	   queries, some of which were first pointed out in [RFC1383]:

951	   o  Any query mechanism offers an opportunity for a resource attack if
952	      an attacker can force the ITR to query for information.  In this
953	      case, all that would be necessary would be for a "botnet" (a group
954	      of computers that have been compromised and used as vehicles to
955	      attack others) to ping or otherwise contact via some normal
956	      service hosts that sit behind the ETR.  If the botnet hosts
957	      themselves are behind ETRs, the victim's ITR will need to query
958	      for each and every one of them, thus becoming part of a classic
959	      reflector attack.
960	   o  Packets will be delayed at the very least, and probably dropped in
961	      the process of a mapping query.  This could be at the beginning of
962	      a communication, but it will be impossible for a router to
963	      conclude with certainty that this is the case.
964	   o  The DNS has a backoff algorithm that presumes that applications
965	      are making queries prior to the beginning of a communication.
966	      This is appropriate for end hosts who know in fact when a
967	      communication begins.  An end user may not enjoy a router waiting
968	      seconds for a retry.
969	   o  While the administrative lines may appear to be correct, the
970	      location of name servers may not be.  If name servers sit within
971	      PI address space, thus requiring LISP to reach, a circular
972	      dependency is created.  This is precisely where many enterprise
973	      name servers sit.  The LISP experiment should not predicate its
974	      success on relocation of such name servers.

976	   Nevertheless, DNS may be able to play a role in providing the
977	   enterprise control over the mapping of its EIDs to RLOCs.  Posit a
978	   new DNS record "EID2RLOC".  This record is used by the authority to
979	   collect and aggregate mapping information so that it may be
980	   distributed through one of the other mechanisms.  As an example:

982	      $ORIGIN 0.10.PI-SPACE.
983	       128   EID2RLOC   mask 23 priority 10 weight 5 172.16.5.60
984	             EID2RLOC   mask 23 priority 15 weight 5 192.168.1.5

986	   In the above figure, network 10.0.128/23 would be delegated to some
987	   end system, say EXAMPLE.COM.  They would manage the above zone
988	   information.  This would allow a DNS mechanism to work, but it would
989	   also allow someone to aggregate the information and distribute a
990	   table.
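   To illustrate how an aggregator might collect such records, the
   sketch below parses the textual form used in the figure.  EID2RLOC is
   only posited in this memo, so the field layout (mask, priority,
   weight, then an IPv4 RLOC) is taken from the example alone, and the
   type and function names are ours.

```python
import ipaddress
from typing import NamedTuple, Optional


class Eid2Rloc(NamedTuple):
    owner: Optional[str]  # None when the owner name is inherited
    mask: int
    priority: int
    weight: int
    rloc: ipaddress.IPv4Address


def parse_eid2rloc(line: str) -> Eid2Rloc:
    """Parse one line such as '128 EID2RLOC mask 23 priority 10 weight 5 a.b.c.d'."""
    fields = line.split()
    at = fields.index("EID2RLOC")
    owner = fields[0] if at == 1 else None
    rdata = fields[at + 1:]
    # Expected record data: mask <n> priority <n> weight <n> <rloc>
    keys, values = rdata[0:6:2], rdata[1:6:2]
    if keys != ["mask", "priority", "weight"] or len(rdata) != 7:
        raise ValueError(f"unrecognized EID2RLOC data: {line!r}")
    mask, priority, weight = (int(v) for v in values)
    return Eid2Rloc(owner, mask, priority, weight,
                    ipaddress.IPv4Address(rdata[6]))
```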

992	7.2.  Use of BGP and LISP+ALT

994	   Border Gateway Protocol (BGP) [RFC4271] is currently used to
995	   distribute inter-domain routing throughout the Internet.  Why not,
996	   then, use BGP to distribute mapping entries, or provide a rendezvous
997	   mechanism to initialize mapping entries?  In fact this is precisely
998	   what LISP+ALT [I-D.ietf-lisp-alt] accomplishes, using a completely
999	   separate topology from the normal DFZ.  It does so using existing
1000	   code paths and expertise.  The alternate topology also provides an
1001	   extremely accurate control path from ITRs to ETRs, whereas NERD's
1002	   operational model requires an optimistic assumption and control plane
1003	   functionality to cycle through unresponsive ETRs in an EID prefix's
1004	   mapping entry.  The memory scaling characteristics of LISP+ALT are
1005	   extremely attractive because of expected strong aggregation, whereas
1006	   NERD makes almost no attempt at aggregation.

1008	   A number of key deployment issues are left open.  The principal issue
1009	   is whether it is deemed acceptable for routers to drop packets
1010	   occasionally while mapping information is being gathered.  This
1011	   should be the subject of future research for ALT, as it was a key
1012	   design goal of NERD to avoid such a situation.

1014	7.3.  Perhaps use a hybrid model?

1016	   Perhaps it would be useful to use both a prepopulated database such
1017	   as NERD and a query mechanism (perhaps LISP+ALT, LISP-CONS
1018	   [I-D.meyer-lisp-cons], or DNS) to determine an EID/RLOC mapping.  One
1019	   idea would be to receive a subset of the mappings, say, by taking
1020	   only the NERD for certain regions.  This alleviates the need to drop
1021	   packets for some subset of destinations under the assumption that
1022	   one's business is localized to a particular region.  If one did not
1023	   have a local entry for a particular EID one would then make a query.

1025	   One approach to using DNS to query live would be to periodically walk
1026	   "interesting" portions of the network, in search of relevant records,
1027	   and caching them to non-volatile storage.  While preventing resource
1028	   attacks, the walk itself could be viewed as an attack, if the
1029	   algorithm was not selective enough about what it thought was
1030	   interesting.  A similar approach could be applied to LISP+ALT or
1031	   LISP-CONS by forcing a data-driven Map Reply for certain sites.

1033	8.  Deployment Issues

1035	   While LISP and NERD are intended as experiments at this point, it is
1036	   already obvious one must give serious consideration to circular
1037	   dependencies with regard to the protocols used and the elements
1038	   within them.

1040	8.1.  HTTP

1042	   Inasmuch as HTTP depends on DNS, either due to the authority
1043	   section of a URI, or due to the configured base distribution URI,
1044	   these same concerns apply.  In addition, any HTTP server that itself
1045	   makes use of provider independent addresses would be a poor choice to
1046	   distribute the database for these exact same reasons.

1048	   One issue with using HTTP is that it is possible that a middlebox of
1049	   some form, such as a cache, may intercept and process requests.  In
1050	   some cases this might be a good thing.  For instance, if a cache
1051	   correctly returns a database, some amount of bandwidth is conserved.
1052	   On the other hand, if the cache itself fails to function properly for
1053	   whatever reason, end to end connectivity could be impaired.  For
1054	   example, if the cache itself depended on the mapping being in place
1055	   and functional, a cold start scenario might leave the cache
1056	   functioning improperly, in turn providing routers no means to update
1057	   their databases.  Some care must be given to avoid such
1058	   circumstances.

1060	9.  Open Questions

1062	   Do we need to discuss reachability in more detail?  This was clearly
1063	   an issue at the IST-RING workshop.  There are two key issues.  First,
1064	   what is the appropriate architectural separation between the data
1065	   plane and the control plane?  Second, is there some specific way in
1066	   which NERD impacts the data plane?

1068	   Should we specify a (perhaps compressed) tarball that treads a middle
1069	   ground for the last question, where each update tarball contains both
1070	   a signature for the update and a signature for the entire database
1071	   once the update is applied?

1073	   Should we compress?  In some initial testing of databases with 1, 5,
1074	   and 10 million IPv4 EIDs and a random distribution of IPv4 RLOCs, the
1075	   current format in this document compresses down by a factor of
1076	   between 35% and 36%, using the Burrows-Wheeler block-sorting text
1077	   compression algorithm (bzip2).  The NERD used random EIDs with prefix
1078	   lengths varying from 19-29, with probability weighted toward the
1079	   smaller masks.  This only very roughly reflects reality.  A better
1080	   test would be to start with the existing prefixes found in the DFZ.
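   A test of this kind is easy to repeat.  The sketch below is not the
   test that produced the figures above: the 11-byte record layout is a
   stand-in rather than the format defined in this memo, the weighting
   toward shorter prefixes is our guess at the distribution, and the
   ratio it yields should not be expected to match.

```python
import bz2
import random
import struct


def synthetic_records(count: int, seed: int = 0) -> bytes:
    """Build count stand-in records: EID, mask, priority, weight, RLOC."""
    rng = random.Random(seed)
    lengths = list(range(19, 30))          # prefix lengths 19 through 29
    weights = [30 - n for n in lengths]    # assumption: favor shorter masks
    out = bytearray()
    for _ in range(count):
        mask = rng.choices(lengths, weights)[0]
        netmask = (0xFFFFFFFF << (32 - mask)) & 0xFFFFFFFF
        eid = rng.getrandbits(32) & netmask
        rloc = rng.getrandbits(32)
        # Network byte order, no padding: 4 + 1 + 1 + 1 + 4 = 11 bytes.
        out += struct.pack("!IBBBI", eid, mask, 10, 5, rloc)
    return bytes(out)


def compressed_fraction(data: bytes) -> float:
    """Size after bzip2 compression divided by the original size."""
    return len(bz2.compress(data)) / len(data)


print(compressed_fraction(synthetic_records(100_000)))
```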

1082	10.  Conclusions

1084	   This memo has specified a database format, an update format, a URI
1085	   convention, an update method, and a validation method for EID/RLOC
1086	   mappings.  We have shown that beyond the predictions of 10^8 EID-
1087	   prefix entries, the aggregate database size would likely be at most
1088	   17 GB.  We have considered the number of servers needed to distribute
1089	   that information and we have demonstrated the limitations of a
1090	   simple content distribution network and other well known mechanisms.
1091	   The effort required to retrieve a database change amounts to between
1092	   3 and 30 seconds of processing time per hour at today's gigabit
1093	   speeds.  We conclude that there is no need for an off box query
1094	   mechanism today, and that there are distinct disadvantages for having
1095	   such a mechanism in the control plane.

1097	   Beyond this we have examined alternatives that allow for hybrid
1098	   models that do use query mechanisms, should our operating assumptions
1099	   prove overly optimistic.  Use of NERD today does not foreclose use of
1100	   such models in the future, and in fact both models can happily co-
1101	   exist.

1103	   We leave to future work how the list of databases is distributed, how
1104	   BGP can play a role in distributing knowledge of the databases, and
1105	   how DNS can play a role in aggregating information into these
1106	   databases.

1108	   We also leave to future work whether HTTP is the best protocol for
1109	   the job, and whether the scheme described in this document is the
1110	   most efficient.  One could easily envision that when applied in high
1111	   delay or high loss environments, a broadcast or multicast method may
1112	   prove more effective.

1114	   Speaking of multicast, we also leave to future work how multicast is
1115	   implemented, if at all, either in conjunction with or as an extension
1116	   to this model.

1118	   Finally, perhaps the most interesting future work would be to
1119	   understand if and how NERD could be integrated with the LISP mapping
1120	   server [I-D.ietf-lisp-ms].

1122	11.  IANA Considerations

1124	   This memo makes no requests of IANA.

1126	12.  Acknowledgments

1128	   Dino Farinacci, Patrik Faltstrom, Dave Meyer, Joel Halpern, Jim
1129	   Schaad, Dave Thaler, Mohamed Boucadair, Robin Whittle, Max Pritikin,
1130	   and Scott Brim were very helpful with their reviews of this work.
1131	   Thanks also to the participants of the Routing Research Group and the
1132	   IST-RING workshop held in Madrid in December of 2007 for their
1133	   incisive comments.  The astute will notice a lengthy References
1134	   section.  This work stands on the shoulders of many others' efforts.

1136	13.  References

1138	13.1.  Normative References

1140	   [I-D.ietf-lisp]
1141	              Farinacci, D., Fuller, V., Meyer, D., and D. Lewis,
1142	              "Locator/ID Separation Protocol (LISP)",
1143	              draft-ietf-lisp-06 (work in progress), January 2010.

1145	   [ITU.X509.2000]
1146	              International Telecommunication Union, "Information
1147	              technology - Open Systems Interconnection - The Directory:
1148	              Public-key and attribute certificate frameworks", ITU-T
1149	              Recommendation X.509, ISO Standard 9594-8, March 2000.

1151	   [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
1152	              Resource Identifier (URI): Generic Syntax", STD 66,
1153	              RFC 3986, January 2005.

1155	   [RFC5321]  Klensin, J., "Simple Mail Transfer Protocol", RFC 5321,
1156	              October 2008.

1158	   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
1159	              Specifications: ABNF", STD 68, RFC 5234, January 2008.

1161	13.2.  Informative References

1163	   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
1164	              Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext
1165	              Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

1167	   [RFC2315]  Kaliski, B., "PKCS #7: Cryptographic Message Syntax
1168	              Version 1.5", RFC 2315, March 1998.

1170	   [RFC5652]  Housley, R., "Cryptographic Message Syntax (CMS)",
1171	              RFC 5652, September 2009.

1173	   [RFC0977]  Kantor, B. and P. Lapsley, "Network News Transfer
1174	              Protocol", RFC 977, February 1986.

1176	   [RFC1034]  Mockapetris, P., "Domain names - concepts and facilities",
1177	              STD 13, RFC 1034, November 1987.

1179	   [RFC1383]  Huitema, C., "An Experiment in DNS Based IP Routing",
1180	              RFC 1383, December 1992.

1182	   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
1183	              Protocol 4 (BGP-4)", RFC 4271, January 2006.

1185	   [RFC4108]  Housley, R., "Using Cryptographic Message Syntax (CMS) to
1186	              Protect Firmware Packages", RFC 4108, August 2005.

1188	   [RFC4346]  Dierks, T. and E. Rescorla, "The Transport Layer Security
1189	              (TLS) Protocol Version 1.1", RFC 4346, April 2006.

1191	   [CARP07]   Carpenter, B., "IETF Plenary Presentation: Routing and
1192	              Addressing: Where we are today", March 2007.

1194	   [CVS]      Grune, R., Baalbergen, E., Waage, M., Berliner, B., and J.
1195	              Polk, "CVS: Concurrent Versions System", November 1985.

1197	   [gb91]     Smith, R., Gottesman, Y., Hobbs, B., Lear, E.,
1198	              Kristofferson, D., Benton, D., and P. Smith, "A mechanism
1199	              for maintaining an up-to-date GenBank database via
1200	              Usenet", CABIOS , April 1991.

1202	   [W3C.REC-xml11-20040204]
1203	              Yergeau, F., Maler, E., Paoli, J., Cowan, J., Bray, T.,
1204	              and C. Sperberg-McQueen, "Extensible Markup Language (XML)
1205	              1.1", World Wide Web Consortium FirstEdition REC-xml11-
1206	              20040204, February 2004,
1207	              <http://www.w3.org/TR/2004/REC-xml11-20040204>.

1209	   [W3C.REC-soap12-part1-20070427]
1210	              Gudgin, M., Karmarkar, A., Nielsen, H., Mendelsohn, N.,
1211	              Hadley, M., Lafon, Y., and J. Moreau, "SOAP Version 1.2
1212	              Part 1: Messaging Framework (Second Edition)", World Wide
1213	              Web Consortium Recommendation REC-soap12-part1-20070427,
1214	              April 2007,
1215	              <http://www.w3.org/TR/2007/REC-soap12-part1-20070427>.

1217	   [W3C.REC-soap12-part2-20070427]
1218	              Mendelsohn, N., Karmarkar, A., Moreau, J., Lafon, Y.,
1219	              Gudgin, M., Hadley, M., and H. Nielsen, "SOAP Version 1.2
1220	              Part 2: Adjuncts (Second Edition)", World Wide Web
1221	              Consortium Recommendation REC-soap12-part2-20070427,
1222	              April 2007,
1223	              <http://www.w3.org/TR/2007/REC-soap12-part2-20070427>.

1225	   [I-D.ietf-lisp-alt]
1226	              Fuller, V., Farinacci, D., Meyer, D., and D. Lewis, "LISP
1227	              Alternative Topology (LISP+ALT)", draft-ietf-lisp-alt-02
1228	              (work in progress), January 2010.

1230	   [I-D.meyer-lisp-cons]
1231	              Brim, S., "LISP-CONS: A Content distribution Overlay
1232	              Network Service for LISP", draft-meyer-lisp-cons-04 (work
1233	              in progress), April 2008.

1235	   [I-D.ietf-lisp-ms]
1236	              Fuller, V. and D. Farinacci, "LISP Map Server",
1237	              draft-ietf-lisp-ms-04 (work in progress), October 2009.

1239	URIs

1241	   [1]  <http://www.openssl.org>

1243	   [2]  <http://en.wikipedia.org/wiki/Usenet>

1245	Appendix A.  Generating and verifying the database signature with
1246	             OpenSSL

1248	   As previously mentioned, one goal of NERD was to use off-the-shelf
1249	   tools to both generate and retrieve the database.  To many, PKI is
1250	   magic.  This section is meant to provide at least some clarification
1251	   as to both the generation and verification process, complete with
1252	   command line examples.  Not included is how you get the entries
1253	   themselves.  We'll assume they exist, and that you're just trying to
1254	   sign the database.

1256	   To sign the database, you first need a database file that has the
1257	   database header described in Section 3.  The PKCS#7 block size field
1258	   should be zero, and there should be no PKCS#7 block at this point.
1259	   You also need a certificate and its private key with which you will
1260	   sign the database.

1262	   The OpenSSL "smime" command contains all the functions we need from
1263	   this point forth.  To sign the database, issue the following command:

1265	         openssl smime -binary -sign -outform DER -signer yourcert.crt \
1266	                 -inkey yourcert.key -in database-file -out signature

1268	   -binary states that no MIME canonicalization should be performed.
1269	   -sign indicates that you are signing the file that was given as the
1270	   argument to -in.  The output format (-outform) is binary DER, and
1271	   your public certificate is provided with -signer along with your key
1272	   with -inkey.  The signature is written to the file given with -out.

1274	   The resulting file "signature" is then copied into the PKCS#7 block
1275	   in the database header, and its size in bytes is recorded in the
1276	   PKCS#7 block size field.  The resulting file is then ready for
1277	   distribution to ITRs.

1279	   To verify a database file, first retrieve the PKCS#7 block from the
1280	   file by copying the appropriate number of bytes into another file,
1281	   say "signature".  Next, zero this field, and set the block size field
1282	   to 0.  Next use the "smime" command to verify the signature as
1283	   follows:

1285	       openssl smime -binary -verify -inform DER \
1286	               -content database-file -out /dev/null -in signature

1288	   OpenSSL will print "Verification successful" if the signature is
1289	   correct.  OpenSSL provides sufficiently rich libraries to accomplish
1290	   the above within the C programming language with a single pass.
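
   For those who would rather script the two commands than call the C
   libraries, the following Python sketch may help.  It assembles only
   the argument lists shown in this appendix and runs them with the
   standard subprocess module.  The function names and the optional
   ca_file parameter are illustrative assumptions.  -CAfile is an
   existing "openssl smime" option for naming trusted certificates, but
   its use here is a suggestion of this sketch rather than a
   requirement of this memo.

```python
import subprocess

def sign_command(cert, key, database, signature):
    """Argument list for the signing command shown in this appendix."""
    return ["openssl", "smime", "-binary", "-sign", "-outform", "DER",
            "-signer", cert, "-inkey", key,
            "-in", database, "-out", signature]

def verify_command(database, signature, ca_file=None):
    """Argument list for the verification command shown in this appendix."""
    cmd = ["openssl", "smime", "-binary", "-verify", "-inform", "DER",
           "-content", database, "-out", "/dev/null", "-in", signature]
    if ca_file is not None:
        # Assumed addition: name the trust anchor explicitly.
        cmd += ["-CAfile", ca_file]
    return cmd

def run(cmd):
    """Run the command; True when openssl exits with status 0."""
    return subprocess.run(cmd, capture_output=True).returncode == 0
```

   For example, run(sign_command("yourcert.crt", "yourcert.key",
   "database-file", "signature")) performs the signing step, and
   run(verify_command("database-file", "signature")) performs the
   check.  Copying the signature into and out of the database header
   is left to the caller.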

1292	Appendix B.  Changes

1294	   This section to be removed prior to publication.

1296	   o  06-08: editorial.  Clarify sending diffs.
1297	   o  05: Fix normative/informative references.  Wordsmithing.
1298	   o  04: Analysis change: IPv6 RLOCs are 128 bits.  While they can be
1299	      shortened to 64 bits, that involves substantial ETR changes and
1300	      expenditure of IPv6 networks, which is probably unnecessary, and
1301	      can be left as a later optimization.  Added an option of
1302	      independent operators.  Processed all but two of Dino's comments.
1303	      Addressed Scott's comments.  Removed existing work analysis.
1304	      Saving that for another day.  Clarified OpenSSL Appendix.
1305	   o  05: clean DOWN. reinsert some text for historical purposes.
1306	   o  04: cleanup
1307	   o  03: Change dbname to a domain name, indicate that is what is in
1308	      the subject of the X.509 certificate, and list editorial changes,
1309	      update acknowledgments.
1310	   o  02: Incorporate some of Dave Thaler's comments.  Add
1311	      authentication block detail.  Modify analysis to take IPv6 into
1312	      account, along with a more realistic number of RLOCs per EID.  Add
1313	      some comments about potential risks of a cold start.  Add S/MIME
1314	      example as appendix A and take out old ToDo.  Provide some amount
1315	      of compression of IPv6 addresses by limiting their size to
1316	      significant bytes rounded to a four byte word boundary.
1317	   o  01: Massive spelling correction, URI example correction.
1318	   o  00: Initial Revision.

1320	Author's Address

1322	   Eliot Lear
1323	   Cisco Systems GmbH
1324	   Glatt-com
1325	   Glattzentrum, ZH  CH-8301
1326	   Switzerland

1328	   Phone: +41 44 878 7525
1329	   Email: lear@cisco.com