idnits 2.17.1 

draft-lear-lisp-nerd-05.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** You're using the IETF Trust Provisions' Section 6.b License Notice from
     12 Sep 2009 rather than the newer Notice from 28 Dec 2009.  (See
     https://trustee.ietf.org/license-info/)


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 2 instances of lines with private range IPv4 addresses in the
     document.  If these are generic example addresses, they should be changed
     to use any of the ranges defined in RFC 6890 (or successor): 192.0.2.x,
     198.51.100.x or 203.0.113.x.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (December 14, 2009) is 5247 days in the past.  Is this
     intentional?


  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  == Outdated reference: A later version (-12) exists of
     draft-farinacci-lisp-07

  ** Obsolete normative reference: RFC 2616 (ref. '2') (Obsoleted by RFC
     7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235)

  -- Obsolete informational reference (is this intentional?): RFC  977 (ref.
     '7') (Obsoleted by RFC 3977)

  -- Obsolete informational reference (is this intentional?): RFC 4346 (ref.
     '11') (Obsoleted by RFC 5246)

  == Outdated reference: A later version (-05) exists of
     draft-fuller-lisp-alt-02


     Summary: 2 errors (**), 0 flaws (~~), 4 warnings (==), 4 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                            E. Lear
3	Internet-Draft                                        Cisco Systems GmbH
4	Intended status: Experimental                          December 14, 2009
5	Expires: June 17, 2010

7	               NERD: A Not-so-novel EID to RLOC Database
8	                      draft-lear-lisp-nerd-05.txt

10	Abstract

12	   LISP is a protocol to encapsulate IP packets in order to allow end
13	   sites to multihome without injecting routes from one end of the
14	   Internet to another.  This memo specifies a database and a method to
15	   transport the mapping of EIDs to RLOCs to routers in a reliable,
16	   scalable, and secure manner.  Our analysis concludes that transport
17	   of all EID/RLOC mappings scales well to at least 10^8 entries.

19	Status of this Memo

21	   This Internet-Draft is submitted to IETF in full conformance with the
22	   provisions of BCP 78 and BCP 79.

24	   Internet-Drafts are working documents of the Internet Engineering
25	   Task Force (IETF), its areas, and its working groups.  Note that
26	   other groups may also distribute working documents as Internet-
27	   Drafts.

29	   Internet-Drafts are draft documents valid for a maximum of six months
30	   and may be updated, replaced, or obsoleted by other documents at any
31	   time.  It is inappropriate to use Internet-Drafts as reference
32	   material or to cite them other than as "work in progress."

34	   The list of current Internet-Drafts can be accessed at
35	   http://www.ietf.org/ietf/1id-abstracts.txt.

37	   The list of Internet-Draft Shadow Directories can be accessed at
38	   http://www.ietf.org/shadow.html.

40	   This Internet-Draft will expire on June 17, 2010.

42	Copyright Notice

44	   Copyright (c) 2009 IETF Trust and the persons identified as the
45	   document authors.  All rights reserved.

47	   This document is subject to BCP 78 and the IETF Trust's Legal
48	   Provisions Relating to IETF Documents
49	   (http://trustee.ietf.org/license-info) in effect on the date of
50	   publication of this document.  Please review these documents
51	   carefully, as they describe your rights and restrictions with respect
52	   to this document.  Code Components extracted from this document must
53	   include Simplified BSD License text as described in Section 4.e of
54	   the Trust Legal Provisions and are provided without warranty as
55	   described in the BSD License.

57	Table of Contents

59	   1.  Introduction
60	     1.1.  Base Assumptions
61	     1.2.  What is NERD?
62	     1.3.  Glossary
63	   2.  Theory of Operation
64	     2.1.  Database Updates
65	     2.2.  Communications between ITR and ETR
66	     2.3.  Who are database authorities?
67	   3.  NERD Format
68	     3.1.  NERD Record Format
69	     3.2.  Database Update Format
70	   4.  NERD Distribution Mechanism
71	     4.1.  Initial Bootstrap
72	     4.2.  Retrieving Changes
73	   5.  Analysis
74	     5.1.  Database Size
75	     5.2.  Router Throughput Versus Time
76	     5.3.  Number of Servers Required
77	     5.4.  Security Considerations
78	       5.4.1.  Use of Public Key Infrastructures (PKIs)
79	       5.4.2.  Other Risks
80	   6.  Why not use XML?
81	   7.  Other Distribution Mechanisms
82	     7.1.  What About DNS as a retrieval model?
83	     7.2.  Use of BGP and LISP+ALT
84	     7.3.  Perhaps use a hybrid model?
85	   8.  Deployment Issues
86	     8.1.  HTTP
87	   9.  Open Questions
88	   10. Conclusions
89	   11. IANA Considerations
90	   12. Acknowledgments
91	   13. References
92	     13.1. Normative References
93	     13.2. Informational References
94	   Appendix A.  Generating and verifying the database signature
95	                with OpenSSL

97	   Appendix B.  Changes
98	   Author's Address

100	1.  Introduction

102	   Locator/ID Separation Protocol (LISP) [1] separates an IP address
103	   used by a host and local routing system from the locators advertised
104	   by BGP participants on the Internet in general, and in the default
105	   free zone (DFZ) in particular.  It accomplishes this by establishing
106	   a mapping between globally unique endpoint identifiers (EIDs) and
107	   routing locators (RLOCs).  This reduces the amount of state change
108	   that occurs on routers within the default-free zone on the Internet,
109	   while enabling end sites to be multihomed.

111	   In some mapping distribution approaches to LISP the mapping is
112	   learned via data-triggered control messages between ingress tunnel
113	   routers (ITRs) and egress tunnel routers (ETRs) through an alternate
114	   routing topology [19].  In other approaches to LISP, the mapping from
115	   EIDs to RLOCs is instead learned through some other means.  This memo
116	   addresses different approaches to the problem, and specifies a Not-
117	   so-novel EID RLOC Database (NERD) and methods to both receive the
118	   database and to receive updates.

120	   NERD is offered primarily as a way to avoid dropping packets, the
121	   underlying assumption being that dropping packets is bad for
122	   applications and end users.  Those who do not agree with this
123	   underlying assumption may find that other approaches make more sense.

125	   LISP and NERD are both currently experimental protocols.  The NERD
126	   database is specified in such a way that the methods used to
127	   distribute or retrieve it may vary over time.  Multiple databases are
128	   supported in order to allow for multiple data sources.  An effort has
129	   been made to divorce the database from access methods so that both
130	   can evolve independently through experimentation and operational
131	   validation.

133	1.1.  Base Assumptions

135	   In order to specify a mapping it is important to understand how it
136	   will be used, and the nature of the data being mapped.  In the case
137	   of LISP, the following assumptions are pertinent:

139	   o  The data contained within the mapping changes only on provisioning
140	      or configuration operations, and is not intended to change when a
141	      link either fails or is restored.  Some other mechanism such as
142	      the use of LISP Reachability Bits with mapping replies handles
143	      healing operations, particularly when a tail circuit within a
144	      service provider's aggregate goes down.  NERD can be used as a
145	      verification method to ensure that whatever operational mapping
146	      changes an ITR receives are authorized.

148	   o  While weight and priority are defined, these are not hop-by-hop
149	      metrics.  Hence the information contained within the mapping does
150	      not change based on where one sits within the topology.
151	   o  A purpose of LISP being to reduce control plane overhead by
152	      reducing "rate X state" complexity, updates to the mapping will be
153	      relatively rare.
154	   o  Because LISP and NERD are designed to ease interdomain routing,
155	      their use is intended within the inter-domain environment.  That
156	      is, LISP is best implemented at either the customer edge or
157	      provider edge, and there will be on the order of as many ITRs and
158	      EID Prefixes as there are connections to Internet Service
159	      Providers by end customers.
160	   o  As such, NERD cannot be the sole means to implement host mobility,
161	      although NERD may be used in conjunction with other mechanisms.
162	      For instance, it would be possible for a mobile node to receive a
163	      local address that is an EID and pass that to the correspondent
164	      node, who could also make use of an EID.  As such, use of LISP in
165	      this case would be transparent, and no mapping entries are changed
166	      for mobility.

168	1.2.  What is NERD?

170	   NERD is a Not-so-novel EID to RLOC Database.  It consists of the
171	   following components:

173	   1.  a network database format;
174	   2.  a change distribution format;
175	   3.  a database retrieval/bootstrapping method;
176	   4.  a change distribution method.

178	   The network database format is compressible.  However, at this time
179	   we specify no compression method.  NERD will make use of potentially
180	   several transport methods, but most notably HTTP [2].  HTTP has
181	   restart and compression capabilities.  It is also widely deployed.

183	   There exist many methods to show differences between two versions of
184	   a database or a file, UNIX's "diff" being the classic example.  In
185	   this case, because the data is well structured and easily keyed, we
186	   can make use of a very simple format for version differences that
187	   simply provides a list of EID/RLOC mappings that have changed using
188	   the same record format as the database, and a list of EIDs that are
189	   to be removed.

191	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
192	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
193	   document are to be interpreted as described in RFC 2119 [3].

195	1.3.  Glossary

197	   The reader is once again referred to [1] for a general glossary of
198	   terms related to LISP.  The following terms are specific to this
199	   memo.

201	   Base Distribution URI:  An Absolute-URI as defined in Section 4.3 of
202	      [6] from which other references are relative.  The base
203	      distribution URI is used to construct a URI to an EID/RLOC mapping
204	      database.  If more than one NERD is known then there will be one
205	      or more base distribution URIs associated with each (although each
206	      such base distribution URI may have the same value).

208	   EID Database Authority:  The authority that will sign database files
209	      and updates.  It is the source of both.

211	   The Authority:  Shorthand for the EID Database Authority.

213	   NERD:  (N)ot-so-novel (E)ID to (R)LOC (D)atabase.

215	   AFI:  Address Family Identifier.

217	   Pull Model:  An architecture where clients pull only the information
218	      they need at any given time, such as when a packet arrives for
219	      forwarding.

221	   Push Model:  An architecture in which clients receive an entire
222	      dataset, containing data they may or may not require, such as
223	      mappings for EIDs that no host served is attempting to send to.

225	   Hybrid Model:  An architecture in which some information is pushed
226	      toward the receiver from a source and some information is pulled
227	      by the receiver.

229	2.  Theory of Operation

231	   Operational functions are split into two components: database updates
232	   and state exchange between ITR and ETR during a communication.

234	2.1.  Database Updates

236	   What follows is a summary of how NERDs are generated and updated.
237	   Specifics can be found in Section 3.  The general way in which NERD
238	   works is as follows:

240	   1.  A NERD is generated by an authority that allocates provider
241	       independent (PI) addresses (e.g., IANA or an RIR) which are used
242	       by sites as EIDs.  As part of this process the authority
243	       generates a digest for the database and signs it with a private
244	       key whose public key is part of an X.509 certificate [15].  That
245	       signature along with a copy of the authority's public key is
246	       included in the NERD.
247	   2.  The NERD is distributed to a group of well known servers.
248	   3.  ITRs retrieve an initial copy of the NERD via HTTP when they come
249	       into service.
250	   4.  ITRs are preconfigured with a group of certificates whose private
251	       keys are used by database authorities to sign the NERD.  This
252	       list of certificates should be configurable by administrators.
253	   5.  ITRs next verify both the validity of the public key and the
254	       signed digest.  If either fails validation, the ITR attempts to
255	       retrieve the NERD from a different source.  The process iterates
256	       until either a valid database is found or the list of sources is
257	       exhausted.
258	   6.  Once a valid NERD is retrieved, the ITR installs it into both
259	       non-volatile and local memory.
260	   7.  At some point the authority updates the NERD and increments the
261	       database version counter.  At the same time it generates a list
262	       of changes, which it also signs, as it does with the original
263	       database.
264	   8.  Periodically ITRs will poll from their list of servers to
265	       determine if a new version of the database exists.  When a new
266	       version is found, an ITR will attempt to retrieve a change file,
267	       using its list of preconfigured servers.
268	   9.  The ITR validates a change file just as it does the original
269	       database.  Assuming the change file passes validation, the ITR
270	       installs new entries, overwrites existing ones, and removes empty
271	       entries, based on the content of the change file.
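
   As an illustration only, the retry behavior in steps 5 and 6 can be
   sketched in Python.  The function name and the two callables (fetch
   and is_valid) are hypothetical hooks, not part of this memo; the
   certificate and digest checks are assumed to happen inside is_valid.

```python
from typing import Callable, Iterable, Optional


def first_valid_database(
    sources: Iterable[str],
    fetch: Callable[[str], Optional[bytes]],
    is_valid: Callable[[bytes], bool],
) -> Optional[bytes]:
    """Return the first retrieved database that passes validation.

    Returns None when the list of sources is exhausted.
    """
    for uri in sources:
        blob = fetch(uri)  # hypothetical retrieval hook (e.g., an HTTP GET)
        if blob is not None and is_valid(blob):
            return blob  # the caller would now install this copy (step 6)
    return None
```

   A caller would pass its configured server list as sources and fall
   back to whatever local policy applies when None is returned.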

273	   As time goes on it is quite possible that an ITR may probe a list of
274	   configured neighbors for a database or change file copy.  It is
275	   equally possible that neighbors might advertise to each other the
276	   version number of their database.  Such methods are not explored in
277	   depth in this memo, but are mentioned for future consideration.

279	2.2.  Communications between ITR and ETR

281	   [1] describes the basic approach to what happens when a packet
282	   arrives at an ITR, and what communications between ITR and ETR take
283	   place.  NERD provides an optimistic approach to establishing
284	   communications with an ETR that is responsible for a given EID
285	   prefix.  State must be kept, however, on an ITR to determine whether
286	   that ETR is in fact reachable.  It is expected that this is a common
287	   requirement across LISP mapping systems, and will be handled in the
288	   core LISP architecture.

290	2.3.  Who are database authorities?

292	   This memo does not specify who the database authority is.  That is
293	   because there are several possible operational models.  In each case
294	   the number of database authorities is meant to be small so that ITRs
295	   need only keep a small list of authorities, similar to the way a name
296	   server might cache a list of root servers.

298	   o  A single database authority exists.  In this case all entries in
299	      the database are registered to a single entity, and that entity
300	      distributes the database.  Because the EID space is provider
301	      independent address space, there is no architectural requirement
302	      that address space be hierarchically distributed to anyone, as
303	      there is with provider-assigned address space.  Hence, there is a
304	      natural affinity between the IANA function and the database
305	      authority function.
306	   o  Each region runs a database authority.  In this case, provider
307	      independent address space is allocated to either Regional Internet
308	      Registries (RIRs) or to affiliates of such organizations of
309	      network operations guilds (NOGs).  The benefit of this approach is
310	      that there is no single organization that controls the database.
311	      It allows one database authority to back up another.  One could
312	      envision as many as ten database authorities in this scenario.
313	      One drawback to this approach, however, is that any reference to a
314	      region imposes a notion of locality, thus potentially diminishing
315	      the split between locator and identifier.
316	   o  Each country runs a database authority.  This could occur should
317	      countries decide to regulate this function.  While limiting the
318	      scope of any single database authority as the previous scenario
319	      describes, this approach would introduce some overhead as the list
320	      of database authorities would grow to as many as 200, and possibly
321	      more if jurisdictions within countries attempted to regulate the
322	      function.  There are two drawbacks to this approach.  First, as
323	      distribution of EIDs is driven to more local jurisdictions, an EID
324	      prefix is tied even tighter to a location.  Second, a large number
325	      of database authorities will demand some sort of discovery
326	      mechanism.
327	   o  Independent operators manage database authorities.  This has the
328	      appeal of being location independent, and enabling competition
329	      for good performance.  This method has the drawback of potentially
330	      requiring a discovery mechanism.

332	   The latter two approaches are not mutually exclusive.  While this
333	   specification allows for multiple databases, discovery mechanisms are
334	   left as future work.

336	3.  NERD Format

338	   The NERD consists of a header that contains a database version and a
339	   signature that is generated by ignoring the signature field and
340	   setting the authentication block length to 0 (NULL).  The
341	   authentication block itself consists of a signature and a certificate
342	   whose private key counterpart was used to generate the signature.

344	   Records are kept sorted in numeric order with AFI plus EID as primary
345	   key and mask length as secondary.  This is so that after a database
346	   update it should be possible to reconstruct the database to verify
347	   the digest signature, which may be retrieved separately from the
348	   database for verification purposes.
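
   A minimal Python sketch of this ordering follows.  It assumes each
   record is held in memory as a hypothetical (AFI, EID, mask length)
   tuple with the EID in textual form; that representation is chosen
   for illustration and is not the wire format.

```python
import ipaddress


def record_sort_key(afi: int, eid: str, mask_len: int):
    """Sort key: AFI plus numeric EID first, then mask length."""
    return (afi, int(ipaddress.ip_address(eid)), mask_len)


# Example addresses are taken from the documentation ranges.
records = [
    (1, "198.51.100.0", 24),
    (1, "192.0.2.0", 25),
    (1, "192.0.2.0", 24),
]
records.sort(key=lambda r: record_sort_key(*r))
```

   Keeping a deterministic order is what lets a router rebuild the same
   byte sequence after applying an update and then check the digest.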

350	        0                   1                   2                   3
351	        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
352	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
353	       | Schema Vers=1 |  DB Code      |     Database Name Size        |
354	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
355	       |                      Database Version                         |
356	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
357	       |                   Old Database Version or 0                   |
358	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
359	       |                                                               |
360	       |                        Database Name                          |
361	       |                                                               |
362	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
363	       |       PKCS#7 Block Size       |          Reserved             |
364	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
365	       |                                                               |
366	       |      PKCS#7 Block containing Certificate and Signature        |
367	       |                                                               |
368	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

370	   Database Header

372	   The DB Code is 0 if what follows is an entire database or 1 if what
373	   follows is an update.  The database file version is incremented each
374	   time the complete database is generated by the authority.  In the
375	   case of an update, the Database Version field carries the new
376	   database file version, and the previous version is carried in the
377	   "Old Database Version" field.  The database file version is used by
378	   routers to determine whether or not they have the most current
379	   database.

381	   The database name is a domain name.  This is the name that will
382	   appear in the Subject field of the certificate used to verify the
383	   database.  The purpose of the database name is to allow for more than
384	   one database.  Such databases would be merged by the router.  It is
385	   important that an EID/RLOC mapping be listed in no more than one
386	   database, lest inconsistencies arise.  However, it may be possible to
387	   transition a mapping from one database to another.  During the
388	   transition period, the mappings MUST be identical.  When they are
389	   not, the resultant behavior will be undefined.

391	   The PKCS#7 [4] authentication block contains a DER encoded [5]
392	   signature and associated public key.
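
   The header layout above might be decoded along the following lines.
   This is a sketch under two assumptions that this section does not
   state: fields are in network byte order, and the database name is
   stored without padding.  The names NerdHeader and parse_header are
   hypothetical.

```python
import struct
from typing import NamedTuple, Tuple


class NerdHeader(NamedTuple):
    schema_version: int
    db_code: int        # 0 = entire database, 1 = update
    version: int
    old_version: int    # 0 when there is no old version
    name: str
    pkcs7: bytes        # raw PKCS#7 block, not verified here


def parse_header(buf: bytes) -> Tuple[NerdHeader, int]:
    """Decode the header and return it with the offset of the first record."""
    fixed = "!BBHII"  # schema, DB code, name size, version, old version
    schema, code, name_size, version, old_version = struct.unpack_from(fixed, buf, 0)
    offset = struct.calcsize(fixed)  # 12 bytes
    name = buf[offset:offset + name_size].decode("ascii")
    offset += name_size  # assumption: the name is not padded
    pkcs7_size, _reserved = struct.unpack_from("!HH", buf, offset)
    offset += 4
    pkcs7 = buf[offset:offset + pkcs7_size]
    offset += pkcs7_size
    return NerdHeader(schema, code, version, old_version, name, pkcs7), offset
```

   Signature verification is deliberately left out; the sketch only
   locates the fields.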

394	3.1.  NERD Record Format

396	   As distributed over the network, NERD records appear as follows:

398	        0                   1                   2                   3
399	        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
400	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
401	       | Num. RLOCs    | EID Mask Len  |            EID AFI            |
402	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
403	       |                       End point identifier                    |
404	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
405	       | Priority 1    |    Weight 1   |             AFI 1             |
406	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
407	       |                       Routing Locator 1                       |
408	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
409	       | Priority 2    |    Weight 2   |             AFI 2             |
410	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
411	       |                       Routing Locator 2                       |
412	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
413	       | Priority 3    |    Weight 3   |             AFI 3             |
414	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
415	       |                       Routing Locator 3...                    |
416	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

418	   Priority N and Weight N, and AFI N are associated with Routing
419	   Locator N. There will always be at least one routing locator.  The
420	   minimum record size for IPv4 is 16 bytes.  Each additional IPv4 RLOC
421	   increases the record size by 8 bytes.  The purpose of this format is
422	   to keep the database compact, but somewhat easily read.  The meaning
423	   of weight and priority is described in [1].  The format of the AFI
424	   is specified by IANA as "Address Family Numbers", with the exception
425	   of how IPv6 EID prefixes are stored.
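
   For the IPv4 case, the record layout can be read as sketched below.
   Network byte order is an assumption, AFI value 1 is the IANA Address
   Family Number for IPv4, and parse_ipv4_record is a hypothetical
   name.  The offsets show where the 16-byte minimum and the 8 bytes
   per additional RLOC come from.

```python
import ipaddress
import struct

AFI_IPV4 = 1  # IANA Address Family Number for IPv4


def parse_ipv4_record(buf: bytes, offset: int = 0):
    """Decode one all-IPv4 record; return ((eid, mask_len, rlocs), next_offset)."""
    num_rlocs, mask_len, eid_afi = struct.unpack_from("!BBH", buf, offset)
    if eid_afi != AFI_IPV4:
        raise ValueError("this sketch handles IPv4 EIDs only")
    eid = ipaddress.IPv4Address(buf[offset + 4:offset + 8])
    offset += 8  # 4-byte record header plus 4-byte EID
    rlocs = []
    for _ in range(num_rlocs):
        priority, weight, afi = struct.unpack_from("!BBH", buf, offset)
        if afi != AFI_IPV4:
            raise ValueError("this sketch handles IPv4 RLOCs only")
        rloc = ipaddress.IPv4Address(buf[offset + 4:offset + 8])
        rlocs.append((priority, weight, str(rloc)))
        offset += 8  # each IPv4 RLOC entry occupies 8 bytes
    return (str(eid), mask_len, rlocs), offset
```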

427	   In order to reduce storage and transmission amounts for IPv6, only
428	   the necessary number of bytes as specified by the prefix length are
429	   kept in the record, rounded up to the next four-byte (word) boundary.
430	   For instance, if the prefix length is /49, seven bytes are needed,
431	   and rounding up to the word boundary means that eight bytes are
432	   stored.  IPv6 RLOCs are represented as normal 128-bit IPv6 addresses.
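
   The arithmetic can be checked with a short Python function.  It
   reads the rule as rounding up to a whole word, which is the reading
   that matches the /49 example; the function name is hypothetical.

```python
def stored_eid_bytes(prefix_len: int) -> int:
    """Bytes kept for an IPv6 EID prefix of the given length."""
    if not 0 <= prefix_len <= 128:
        raise ValueError("IPv6 prefix length must be between 0 and 128")
    needed = (prefix_len + 7) // 8   # whole bytes covering the prefix
    return ((needed + 3) // 4) * 4   # round up to a four-byte word
```

   For example, stored_eid_bytes(49) yields 8, as does any prefix
   length from /33 through /64.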

434	3.2.  Database Update Format

436	   A database update contains a set of changes to an existing database.
437	   Each AFI/EID/mask-length tuple may have zero or more RLOCs associated
438	   with it.  In the case where there are no RLOCs, the EID entry is
439	   removed from the database.  Records that contain EIDs and mask
440	   lengths that were not previously listed are simply added.  Otherwise,
441	   the old record for the EID and mask length is replaced by the more
442	   current information.  The record format used by a database update
443	   is the same as described in Section 3.1.
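
   The update semantics amount to a keyed merge, sketched here in
   Python.  The in-memory form (a dict keyed by a hypothetical
   (AFI, EID, mask length) tuple, with a list of RLOCs as the value)
   is an assumption made for illustration.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, str, int]  # (AFI, EID, mask length)


def apply_update(db: Dict[Key, List[str]],
                 changes: Dict[Key, List[str]]) -> Dict[Key, List[str]]:
    """Apply a change set to db in place and return it."""
    for key, rlocs in changes.items():
        if not rlocs:
            db.pop(key, None)      # no RLOCs: the EID entry is removed
        else:
            db[key] = list(rlocs)  # new entry is added, old one is replaced
    return db
```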

445	4.  NERD Distribution Mechanism

447	4.1.  Initial Bootstrap

449	   Bootstrap occurs when a router needs to retrieve the entire database.
450	   It knows it needs to do so either because it has no database or
451	   because the update it would need is too substantial to process, as
452	   might be the case if a router has been out of service for a lengthy
453	   period of time.

455	   To bootstrap, the ITR appends the database name plus "/current/
456	   entiredb" to a Base Distribution URI and retrieves the file via HTTP.
457	   For example, if the configured URI is
458	   "http://www.example.com/eiddb/", and assuming a database name of
459	   "nerd.arin.net", the ITR would request
460	   "http://www.example.com/eiddb/nerd.arin.net/current/entiredb".
461	   Routers MUST check the signature on the database prior to installing
462	   it, and MUST check that the database schema matches a schema they
463	   understand.  Once a router has a valid database it MUST store that
464	   database in some sort of non-volatile memory (e.g., disk or flash
465	   memory).
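
   A small Python sketch of the URI construction follows.  It applies
   the stated rule literally (database name, then "/current/entiredb",
   appended to the Base Distribution URI); the function name is
   hypothetical, and tolerating a missing trailing slash on the base
   URI is a convenience of the sketch rather than a requirement.

```python
def bootstrap_uri(base_uri: str, db_name: str) -> str:
    """Build the URI of the entire database for one named NERD."""
    return base_uri.rstrip("/") + "/" + db_name + "/current/entiredb"
```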

467	   N.B., the host component for such URIs MUST NOT resolve to a LISP
468	   EID, lest a circular dependency be created.

470	4.2.  Retrieving Changes

472	   In order to retrieve a set of database changes, an ITR will have
473	   previously retrieved the entire database.  Hence it knows the version
474	   of the database it has.  Its first step for retrieving changes is to
475	   learn the current version number of the database.  It does so by
476	   appending "current/version" to the base distribution URI and
477	   retrieving the file.  Its format is text and it contains the integer
478	   value of the current database version.

480	   Once an ITR has retrieved the current version, it compares it with
481	   that of its local copy.  If there is no difference, the router is up
482	   to date and need take no further action until it next checks.

484	   If the versions differ, the router next sends a request for the
485	   appropriate change file by appending the database name, "/current/
486	   changes/", and the textual representation of the version of its
487	   local copy to the base distribution URI.  For example, if the
488	   current version of the database is 1105503 and the router's version
489	   is 1105500, and the base URI and database name are the same as above,
490	   the router would request
491	   "http://www.example.com/eiddb/nerd.arin.net/current/changes/1105500".
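
   The version check and the resulting request can be sketched in
   Python as follows.  Both function names are hypothetical, and the
   URI layout is taken from the example request shown above.

```python
from typing import Optional


def parse_version_file(text: str) -> int:
    """The version file is text holding the integer database version."""
    return int(text.strip())


def change_request_uri(base_uri: str, db_name: str,
                       local_version: int,
                       current_version: int) -> Optional[str]:
    """Return the change-file URI to request, or None if already current."""
    if local_version == current_version:
        return None  # up to date; nothing to do until the next poll
    return f"{base_uri.rstrip('/')}/{db_name}/current/changes/{local_version}"
```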

493	   The server may not have that change file, either because there are
494	   too many versions between what the router has and what is current, or
495	   because no such change file was generated.  If the server has changes
496	   from the router's version to any later version, the server SHOULD
497	   issue an HTTP redirect to that change file, and the router SHOULD
498	   retrieve and process it.  Once it has done so, the router should then
499	   repeat the process until it has brought itself up to date.  It is
500	   thus important for servers to expire old change files in the order in
501	   which they were generated.

503	   By way of convention, it is suggested that the URIs issued in
504	   redirects be of the following form:

506	   {base dist.  URI}/{dbname}/{more-recent-version}/changes/
507	   {older-version}

509	   where "base dist.  URI" is the base distribution URI, "dbname" is the
510	   name of the database, and each version is the textual representation
511	   of the integer version value.

513	   For example, if the current database version was 1105503 and a router
514	   made a request for
515	   "http://www.example.com/eiddb/nerd.arin.net/current/changes/1105400"
516	   but there was no change file from 1105400 to 1105503, and the server
517	   had a group of change files to make the router current, it would
518	   issue a redirect to
519	   "http://www.example.com/eiddb/nerd.arin.net/1105450/changes/1105400"
520	   that the router would then process.  The router would then make a
521	   request for
522	   "http://www.example.com/eiddb/nerd.arin.net/current/changes/1105450"
523	   that the server would have.
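
	   The retrieval loop described above can be sketched roughly as
	   follows.  This is only an illustration: the function names and the
	   "fetch_and_apply" callback are assumptions made for the example and
	   are not part of this specification.  The callback stands in for the
	   HTTP request, any redirect handling, and application of the change
	   file.  Version wrap is ignored here.

```python
from typing import Callable, Optional


def change_uri(base_uri: str, dbname: str, have: int) -> str:
    """Build {base dist. URI}/{dbname}/current/changes/{have}."""
    return f"{base_uri.rstrip('/')}/{dbname}/current/changes/{have}"


def catch_up(base_uri: str, dbname: str, have: int,
             fetch_and_apply: Callable[[str], Optional[int]],
             max_rounds: int = 32) -> int:
    """Request change files until no newer version is offered.

    fetch_and_apply is a hypothetical callback: it retrieves the URI,
    follows any HTTP redirect, applies the change file it receives, and
    returns the resulting database version (or None if there is nothing
    to apply).
    """
    for _ in range(max_rounds):  # bound the loop against a misbehaving server
        new_version = fetch_and_apply(change_uri(base_uri, dbname, have))
        if new_version is None or new_version <= have:
            break  # already current, or no usable change file (wrap ignored)
        have = new_version
    return have
```

	   With the base URI and database name used in the examples above,
	   change_uri("http://www.example.com/eiddb", "nerd.arin.net", 1105500)
	   yields the request URI shown earlier in this section.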

525	   While it is unlikely that database versions would wrap, as they
526	   consist of 32-bit integers, should the event occur, ITRs MUST
527	   attempt first to retrieve a change file when their current version
528	   number is within 10,000 of 2^32 and they see a version available that
529	   is less than 10,000.  Barring the availability of a change file, the
530	   ITR MUST still assume that the database version has wrapped and
531	   retrieve a new copy.
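
	   A minimal sketch of the wrap test follows.  It assumes that "within
	   10,000 of 2^32" means a local version of at least 2^32 - 10,000;
	   the constant and function names are illustrative only.

```python
WRAP_WINDOW = 10_000   # threshold taken from the text above
MODULUS = 2 ** 32      # versions are 32-bit integers


def version_has_wrapped(local: int, available: int) -> bool:
    """Return True when the local version is near 2^32 and the
    available version is small, which the text treats as a wrap."""
    return local >= MODULUS - WRAP_WINDOW and available < WRAP_WINDOW
```

	   When this test is true, an ITR would first try a change file and,
	   failing that, retrieve a new copy of the database.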

533	5.  Analysis

535	   We will start our analysis by looking at how much data will be
536	   transferred to a router during bootstrap conditions.  We will then
537	   look at the bandwidth required.  Next we will turn our concerns to
538	   servers.  Finally we will ponder the effect of providing only
539	   changes.

541	   In the analysis below we treat the overhead of the database header as
542	   insignificant (because it is).  The analysis should be similar,
543	   whether a single database or multiple databases are employed, as we
544	   would assume that no entry would appear more than once.

546	5.1.  Database Size

548	   By its very nature the information to be transported is relatively
549	   static and is specifically designed to be topologically insensitive.
550	   That is, every ITR is intended to have the same set of RLOCs for a
551	   given EID.  While some processing power will be necessary to install
552	   a table, the amount required should be far less than that of a
553	   routing information database because the level of entropy is intended
554	   to be lower.

556	   For purposes of this analysis, we will assume that the world has
557	   migrated to IPv6, as this increases the size of the database, which
558	   would be our primary concern.  However, to mitigate the size
559	   increase, we have limited the size of the prefix transmitted.  For
560	   purposes of this analysis, we shall assume an average prefix length
561	   of 64 bits.

563	   Based on that assumption, Section 3.1 states that mapping information
564	   for each EID/Prefix includes a group of RLOCs, each with an
565	   associated priority and weight, and that a minimum record size with
566	   IPv6 EIDs with at least one RLOC is 30 bytes uncompressed.  Each
567	   additional IPv6 RLOC costs 20 bytes.

569	                 +-----------+--------+--------+---------+
570	                 | 10^n EIDs | 2 RLOC | 4 RLOC |  8 RLOC |
571	                 +-----------+--------+--------+---------+
572	                 |         4 | 500 KB | 900 KB | 1.70 MB |
573	                 |         5 | 5.0 MB | 9.0 MB | 17.0 MB |
574	                 |         6 |  50 MB |  90 MB |  170 MB |
575	                 |         7 | 500 MB | 900 MB | 1.70 GB |
576	                 |         8 | 5.0 GB | 9.0 GB | 17.0 GB |
577	                 +-----------+--------+--------+---------+

579	    Database size for IPv6 routes with average prefix length = 64 bits

581	                                  Table 1

583	   Entries in the above table are derived as follows:

585	        E * (30 + 20 * (R - 1 ))

587	   where E = number of EIDs (10^n), R = number of RLOCs per EID.
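
	   The formula can be checked with a few lines of code.  The function
	   name below is an assumption made for illustration; the constants 30
	   and 20 are the record sizes given above.

```python
def database_size_bytes(eids: int, rlocs_per_eid: int) -> int:
    """Uncompressed size: 30 bytes for an IPv6 EID with one RLOC,
    plus 20 bytes for each additional IPv6 RLOC."""
    if rlocs_per_eid < 1:
        raise ValueError("each EID needs at least one RLOC")
    return eids * (30 + 20 * (rlocs_per_eid - 1))


if __name__ == "__main__":
    for n in range(4, 9):
        row = [database_size_bytes(10 ** n, r) for r in (2, 4, 8)]
        print(f"10^{n} EIDs:", row)
```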

589	   Our scaling target is to accommodate 10^8 multihomed systems, which
590	   is one order of magnitude greater than what is discussed in [12].  At
591	   10^8 entries, a device could be expected to use between 5 and 17
592	   gigabytes of RAM for the mapping.  No matter the method of
593	   distribution, any router that sits in the core of the Internet would
594	   require near this amount of memory in order to perform the ITR
595	   function.  Large enterprise ETRs would be similarly strained, simply
596	   due to the diversity of sites that communicate with one another.
597	   The good news is that this is not our starting point, but rather our
598	   scaling target, a number that we intend to reach by the year 2050.
599	   Our starting point is more likely in the neighborhood of 10^4 or 10^5
600	   EIDs, thus requiring between 500 KB and 17 MB.

602	5.2.  Router Throughput Versus Time

604	        +-------------------+---------+--------+---------+-------+
605	        | Table Size (10^N) |   1Mb/s | 10Mb/s | 100Mb/s | 1Gb/s |
606	        +-------------------+---------+--------+---------+-------+
607	        |                 6 |       8 |    0.8 |    0.08 | 0.008 |
608	        |                 7 |      80 |      8 |     0.8 |  0.08 |
609	        |                 8 |     800 |     80 |       8 |   0.8 |
610	        |                 9 |   8,000 |    800 |      80 |     8 |
611	        |                10 |  80,000 |  8,000 |     800 |    80 |
612	        |                11 | 800,000 | 80,000 |   8,000 |   800 |
613	        +-------------------+---------+--------+---------+-------+

615	                     Number of seconds to process NERD

617	                                  Table 2

619	   The length of time it takes to process the database is significant in
620	   models where the device acquires the entire table.  During this
621	   period of time, either the router will be unable to route packets
622	   using LISP or it must use some sort of query mechanism for specific
623	   EIDs as it populates the rest of its table through the transfer.
624	   Table 2 shows us that at our scaling target (about 10^10 bytes), a
625	   router using 1 Mb/s of bandwidth would need about 80,000 seconds
626	   (roughly 22 hours).  For any greater transfer speed, the time is a
627	   small number of hours or less.  At the fastest rate shown, 1 Gb/s,
628	   processing an entire table takes 8 seconds for 10^9 bytes and 80
629	   seconds for 10^10 bytes.
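
	   The entries in Table 2 follow from dividing the table size in bits
	   by the link rate.  A small sketch, with a function name chosen only
	   for illustration and with protocol overhead ignored as in the table:

```python
def transfer_seconds(table_bytes: int, link_bits_per_second: int) -> float:
    """Seconds to move table_bytes over a link, assuming 100% utilization."""
    return table_bytes * 8 / link_bits_per_second


if __name__ == "__main__":
    rates = (10 ** 6, 10 ** 7, 10 ** 8, 10 ** 9)  # 1 Mb/s through 1 Gb/s
    for n in range(6, 12):
        print(n, [transfer_seconds(10 ** n, r) for r in rates])
```

	   For instance, a table of 10^9 bytes over a 1 Gb/s link gives 8
	   seconds, matching the corresponding table entry.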

631	5.3.  Number of Servers Required

633	   As easy as it may be for a router to retrieve, the aggregate
634	   information may be difficult for servers to transmit, assuming the
635	   information is transmitted in aggregate (we'll revisit that
636	   assumption later).

638	   +----------------+------------+-----------+------------+------------+
639	   | # Simultaneous | 10 Servers |       100 |      1,000 |     10,000 |
640	   |       Requests |            |   Servers |    Servers |    Servers |
641	   +----------------+------------+-----------+------------+------------+
642	   |            100 |        720 |        72 |         72 |         72 |
643	   |          1,000 |      7,200 |       720 |         72 |         72 |
644	   |         10,000 |     72,000 |     7,200 |        720 |         72 |
645	   |        100,000 |    720,000 |    72,000 |      7,200 |        720 |
646	   |      1,000,000 |  7,200,000 |   720,000 |     72,000 |      7,200 |
647	   |     10,000,000 | 72,000,000 | 7,200,000 |    720,000 |     72,000 |
648	   +----------------+------------+-----------+------------+------------+

650	     Retrieval time per number of servers in seconds.  Assumes average
651	   10^8 entries with 4 RLOCs per EID and that each server has access to
652	   1 Gb/s and 100% efficient use of that bandwidth and no compression.

654	                                  Table 3

656	   Entries in the above table were generated using the following method:

658	   For 10^8 entries with four RLOCs per EID, the table size is 9.0 GB,
659	   per Table 1.  Assume 1 Gb/s transfer rates and 100% utilization.
660	   Protocol overhead is ignored for this exercise.  Hence a single
661	   transfer X of 72 gigabits takes 72 seconds and can get no faster.

663	   With this in mind, each entry is as follows:

665	            max(1X,N*X/S)

667	     where N=number of transfers, X = 72 seconds,
668	     and S = number of servers.
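
	   The same calculation in code form, as a sketch.  The function name
	   is illustrative, and X defaults to the 72 seconds given above.

```python
def retrieval_seconds(requests: int, servers: int,
                      single_transfer_seconds: float = 72.0) -> float:
    """max(1X, N*X/S): a transfer can never beat one full transfer time,
    and otherwise the requests are spread evenly over the servers."""
    x = single_transfer_seconds
    return max(x, requests * x / servers)


if __name__ == "__main__":
    for n in (100, 1_000, 10_000, 100_000, 1_000_000, 10_000_000):
        print(n, [retrieval_seconds(n, s) for s in (10, 100, 1_000, 10_000)])
```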

670	   If we have a distribution model in which every device must retrieve
671	   the mapping information upon start, Table 3 shows the length of time
672	   in seconds it will take for a given number of servers to complete a
673	   transfer to a given number of devices.  This table says, as an
674	   example, that it would take 72,000 seconds (20 hours) for one million
675	   ITRs to simultaneously retrieve the database from one thousand
676	   servers.  Should a cold start scenario occur, this number should be
677	   of some concern.  Hence it is important to take some measures both to
678	   avoid such a scenario, and to ease the load should it occur.  The
679	   primary defense should be for ITRs to first attempt to retrieve their
680	   databases from their peers or upstream providers.  Secondary defenses
681	   could include data sanity checks within ITRs, with agreed norms for
682	   how much the database should change in any given update or over any
683	   given period of time.  As we will see below, dissemination of changes
684	   involves considerably less volume.

686	     +----------------+-------------+---------------+----------------+
687	     | % Daily Change | 100 Servers | 1,000 Servers | 10,000 Servers |
688	     +----------------+-------------+---------------+----------------+
689	     |           0.1% |         300 |            30 |              3 |
690	     |           0.5% |       1,500 |           150 |             15 |
691	     |             1% |       3,000 |           300 |             30 |
692	     |             5% |      15,000 |         1,500 |            150 |
693	     |            10% |      30,000 |         3,000 |            300 |
694	     +----------------+-------------+---------------+----------------+

696	     Assuming 10 million routers and a database size of 9GB, resulting
697	    hourly transfer times are shown in seconds, given number of servers
698	                         and daily rate of change.

700	                                  Table 4

702	   This table shows us that with 10,000 servers the average transfer
703	   time with 1Gb/s links for 10,000,000 routers will be 300 seconds with
704	   10% daily change spread over 24 hourly updates.  For a 0.1% daily
705	   change, that number is 3 seconds for a database of size 9.0GB.
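
	   One reading that reproduces the figures in Table 4 is sketched
	   below.  It is an assumption about how the table was derived, not a
	   statement of the method actually used: the daily change is split
	   into 24 equal hourly updates, each sent at 1 Gb/s, and the load is
	   spread evenly across the servers.  All names are illustrative.

```python
def hourly_update_seconds(routers: int, servers: int, daily_change: float,
                          db_bytes: int = 9 * 10 ** 9,
                          link_bits_per_second: int = 10 ** 9) -> float:
    """Seconds for all routers to fetch one hourly update.

    daily_change is a fraction of the database (0.10 means 10% per day).
    """
    hourly_bytes = db_bytes * daily_change / 24          # one hourly update
    per_router = hourly_bytes * 8 / link_bits_per_second  # seconds per copy
    return routers * per_router / servers
```

	   Under these assumptions, 10 million routers, 10,000 servers, and a
	   10% daily change give approximately 300 seconds.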

707	   The amount of change goes to the purpose of LISP.  If its purpose is
708	   to provide effective multihoming support to end customers, then we
709	   might anticipate relatively few changes.  If, on the other hand,
710	   service providers use LISP to provide some form of traffic
711	   engineering, we can expect the same data to change more often.  We
712	   can probably not conclude much in this regard without additional
713	   operational experience.  The one thing we can say is that different
714	   applications of the LISP protocol may require new and different
715	   distribution mechanisms.  Such optimization is left for another day.

717	5.4.  Security Considerations

719	   Whichever the answer to our previous question, we must consider the
720	   security of the information being transported.  If an attacker can
721	   forge an update or tamper with the database, he can in effect
722	   redirect traffic to end sites.  Hence, integrity and authenticity of
723	   the NERD is critical.  In addition, a means is required to determine
724	   whether a source is authorized to modify a given database.  No data
725	   privacy is required.  Quite to the contrary, this information will be
726	   necessary for any ITR.

728	   The first question one must ask is who to trust to provide the ITR a
729	   mapping.  Ultimately the owner of the EID prefix is most
730	   authoritative for the mapping to RLOCs.  However, were all owners to
731	   sign all such mappings, ITRs would need to know which owner is
732	   authorized to modify which mapping, creating a problem of O(N^2)
733	   complexity.

735	   We can reduce this problem substantially by investing some trust in a
736	   small number of entities that are allowed to sign entries.  If an
737	   authority manages EIDs much the same way a domain name registrar
738	   handles domains, then the owner of the EID would choose a database
739	   authority she or he trusts, and ITRs must trust each such authority
740	   in order to map the EIDs listed by that authority to RLOCs.  This
741	   reduces the amount of management complexity on the ETR to retaining
742	   knowledge of O(#authorities), but does require that each authority
743	   establish procedures for authenticating the owner of an EID.  Those
744	   procedures needn't be the same.

746	   There are two classic methods to ensure integrity of data:

748	   o  secure transport of the source of the data to the consumer, such
749	      as Transport Layer Security (TLS) [11]; and
750	   o  provide object level security.

752	   These methods are not mutually exclusive, although one can argue
753	   about the need for the former, given the latter.

755	   In the case of TLS, when it is properly implemented, the objects
756	   being transported cannot easily be modified by interlopers or so-
757	   called men in the middle.  When data objects are distributed to
758	   multiple servers, each of those servers must be trusted.  As we have
759	   seen above, we could have quite a large number of servers, thus
760	   providing an attacker a large number of targets.  We conclude that
761	   some form of object level security is required.

763	   Object level security involves an authority signing an object in a
764	   way that can easily be verified by a consumer, in this case a router.
765	   In this case, we would want the mapping table and any incremental
766	   update to be signed by the originator of the update.  This implies
767	   that we cannot simply make use of a tool like CVS [13].  Instead, the
768	   originator will want to generate diffs, sign them, and make them
769	   available either directly or through some sort of content
770	   distribution or peer to peer network.

772	5.4.1.  Use of Public Key Infrastructures (PKIs)

774	   X.509 provides a certificate hierarchy that has scaled to the size of
775	   the Internet.  The system is most manageable when there are few
776	   certificates to manage.  The model proposed in this memo makes use of
777	   one current certificate per database authority.  The three pieces of
778	   information necessary to verify a signature, therefore, are as
779	   follows:

781	   o  the certificate of the database authority, which can be provided
782	      along with the database;
783	   o  the certificate authority's certificate; and
784	   o  A table of database names and distinguished names (DNs) that are
785	      allowed to update them.

787	   The latter two pieces of information must be very well known and must
788	   be configured on each ITR.  It is expected that both would change
789	   very rarely, and it would not be unreasonable for such updates to
790	   occur as part of a normal OS release process.

792	   The tools for both signing and verifying are readily available.
793	   OpenSSL [21] provides tools and libraries for both signing and
794	   verifying.  Other tools commonly exist.

796	   Use of PKIs is not without implementation, operational complexity or
797	   risk.  The following risks and mitigations are identified with NERD's
798	   use of PKIs:

800	   If a NERD database authority private key is exposed:

802	      In this case an attacker could sign a false database update,
803	      either redirecting traffic, or otherwise causing havoc.  In this
804	      case, the NERD database administrator must revoke its existing key
805	      and issue a new one.  The certificate is added to a certificate
806	      revocation list (CRL), which may be distributed with both this and
807	      other databases, as well as through other channels.  Because this
808	      event is expected to be rare, and the number of database
809	      authorities is expected to be small, a CRL will be small.  When a
810	      router receives a revocation, it checks it against its existing
811	      databases, and attempts to update the one that is revoked.  This
812	      implies that prior to issuing the revocation, the database
813	      authority MUST sign an update with the new key.  Routers SHOULD
814	      discard updates they have already received that were signed after
815	      the revocation was generated.  If a router cannot confirm
816	      whether the authority's certificate was revoked before or after a
817	      particular update, it MUST retrieve a fresh copy of the
818	      database with a valid signature.

820	   The private key associated with the CA that signed the Authority's
821	   certificate is compromised:

823	      In this case, it becomes possible for an attacker to masquerade as
824	      the database authority.  To ameliorate damage, the database
825	      authority SHOULD revoke its certificate and get a new certificate
826	      issued from a CA that is not compromised.  Once it has done so,
827	      the previous procedure is followed.  The compromised certificate
828	      can be removed during the normal operating system upgrade cycle.

830	   An algorithm used in either the certificate or the signature is
831	   cracked:

833	      This is a catastrophic failure and the above forms of attack
834	      become possible.  The only mitigation is to make use of a new
835	      algorithm.  In theory this should be possible, but in practice has
836	      proved very difficult.  For this reason, additional work is
837	      recommended to make alternative algorithms available.

839	   The Database Authority loses its key or disappears:

841	      In this case nobody can update the existing database.  There are
842	      few programmatic mitigations.  If the database authority places
843	      its private keys and suitable amounts of information in escrow,
844	      then under agreed-upon circumstances (for example, no updates for
845	      three days) the escrow agent would release the information to a
846	      party capable of generating a database update.

848	5.4.2.  Other Risks

850	   Because this specification does not require secure transport, if an
851	   attacker prevents updates to an ITR for the purposes of having that
852	   ITR continue to use a compromised ETR, the ITR could continue to use
853	   an old version of the database without realizing a new version has
854	   been made available.  If one is worried about such an attack, a
855	   secure channel such as SSL to a secure chain back to the database
856	   authority should be used.  It is possible that after some operational
857	   experience, later versions of this format will contain additional
858	   semantics to address this attack.

860	   As discussed above, a cold start scenario is a substantial risk.
861	   If an attacker found a bug in a common operating system that allowed
862	   it to erase an ITR's database, and was able to disseminate that bug,
863	   the collective ability of ITRs to retrieve new copies of the database
864	   could be taxed by collective demand.  The remedy to this is for
865	   devices to share copies of the database with their neighbors, thus
866	   making each potential requester a potential service.

868	6.  Why not use XML?

870	   Many objects these days are distributed as either XML pages or
871	   something derived from XML [16], such as SOAP [17],[18].  Use of such
872	   well known standards allows for high level tools and library reuse.
873	   XML's strength is extensibility.  Without a doubt XML would be more
874	   extensible than a fixed field database.  Why not, then, use these
875	   standards in this case?  The greatest concern the author had was
876	   compactness of the data stream.  Inasmuch as this mechanism is used
877	   at all in the future, so long as that concern could be addressed, and
878	   so long as signatures of the database can be verified, XML probably
879	   should be considered.

881	7.  Other Distribution Mechanisms

883	   We now consider various different mechanisms.  The problem of
884	   distributing changes in various databases is as old as databases.
885	   The author is aware of two obvious approaches that have been well
886	   used in the past.  One approach would be the wide distribution of CVS
887	   repositories.  However, for reasons mentioned in Section 5.4, CVS is
888	   insufficient to the task.

890	   The other tried and true approach is the use of periodic updates in
891	   the form of messages.  Good old NNTP [7] itself provides two separate
892	   mechanisms (one push and another pull) to provide a coherent update
893	   process.  This was in fact used to update molecular biology databases
894	   [14] in the early 1990s.  Netnews offers a way to determine whether
895	   articles with specified Article-Ids have been received.  In the case
896	   where the mapping file source of authority wishes to transmit
897	   updates, it can sign a change file and then post it into the network.
898	   Routers merely need to keep a record of the article IDs they have
899	   received.  Years ago, Netnews systems handled a far greater volume
900	   of traffic than we envision [22].  Initially this is probably
901	   overkill, but it may not be so later in this process.  Some
902	   consideration should be given to a mechanism known to widely
903	   distribute vast amounts of data, as instantaneously as either the
904	   sender or the receiver wishes.

906	   To attain an additional level of hierarchy in the distribution
907	   network, service providers could retrieve information to their own
908	   local servers, and configure their routers with the host portion of
909	   the above URI.

911	   Another possibility would be for providers to establish an agreement
912	   on a small set of anycast addresses for use for this purpose.  There
913	   are limitations to the use of anycast, particularly with TCP.  In the
914	   midst of a routing flap, an anycast address can become all but
915	   unusable.  Careful study of such a use as well as appropriate use of
916	   HTTP redirects is expected.

918	7.1.  What About DNS as a retrieval model?

920	   It has been proposed that a query/response mechanism be used for this
921	   information, and that specifically the domain name system (DNS) [8]
922	   be used.  The previous models do not preclude the DNS.  DNS has the
923	   advantage that the administrative lines are well drawn, and that the
924	   ID/RLOC mapping is likely to appear very close to these boundaries.
925	   DNS also has the added benefit that an entire distribution
926	   infrastructure already exists.  There are, however, some problems
927	   that could impact end hosts when intermediate routers make queries,
928	   some of which were first pointed out in [9]:

930	   o  Any query mechanism offers an opportunity for a resource attack if
931	      an attacker can force the ITR to query for information.  In this
932	      case, all that would be necessary would be for a "botnet" (a group
933	      of computers that have been compromised and used as vehicles to
934	      attack others) to ping or otherwise contact via some normal
935	      service hosts that sit behind the ETR.  If the botnet hosts
936	      themselves are behind ETRs, the victim's ITR will need to query
937	      for each and every one of them, thus becoming part of a classic
938	      reflector attack.
939	   o  Packets will be delayed at the very least, and probably dropped in
940	      the process of a mapping query.  This could be at the beginning of
941	      a communication, but it will be impossible for a router to
942	      conclude with certainty that this is the case.
943	   o  The DNS has a backoff algorithm that presumes that applications
944	      are making queries prior to the beginning of a communication.
945	      This is appropriate for end hosts who know in fact when a
946	      communication begins.  An end user may not enjoy a router waiting
947	      seconds for a retry.
948	   o  While the administrative lines may appear to be correct, the
949	      location of name servers may not be.  If name servers sit within
950	      PI address space, thus requiring LISP to reach, a circular
951	      dependency is created.  This is precisely where many enterprise
952	      name servers sit.  The LISP experiment should not predicate its
953	      success on relocation of such name servers.

955	   Nevertheless, DNS may be able to play a role in providing the
956	   enterprise control over the mapping of its EIDs to RLOCs.  Posit a
957	   new DNS record "EID2RLOC".  This record is used by the authority to
958	   collect and aggregate mapping information so that it may be
959	   distributed through one of the other mechanisms.  As an example:

961	      $ORIGIN 0.10.PI-SPACE.
962	       128   EID2RLOC   mask 23 priority 10 weight 5 172.16.5.60
963	             EID2RLOC   mask 23 priority 15 weight 5 192.168.1.5

965	   In the above figure, network 10.0.128/23 would be delegated to some
966	   end system, say EXAMPLE.COM.  They would manage the above zone
967	   information.  This would allow a DNS mechanism to work, but it would
968	   also allow someone to aggregate the information and distribute a
969	   table.
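
	   To illustrate how an aggregator might consume such records, the
	   sketch below parses the record data shown in the figure.  The
	   "EID2RLOC" record is only posited in this memo, so the field order
	   and the parser are assumptions drawn from that one example.

```python
import ipaddress
from typing import NamedTuple, Union


class Eid2Rloc(NamedTuple):
    mask: int
    priority: int
    weight: int
    rloc: Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def parse_eid2rloc(rdata: str) -> Eid2Rloc:
    """Parse 'mask M priority P weight W ADDRESS' (hypothetical syntax)."""
    tokens = rdata.split()
    if len(tokens) != 7 or tokens[0:6:2] != ["mask", "priority", "weight"]:
        raise ValueError(f"unrecognized EID2RLOC data: {rdata!r}")
    return Eid2Rloc(int(tokens[1]), int(tokens[3]), int(tokens[5]),
                    ipaddress.ip_address(tokens[6]))


def by_preference(records):
    """Order RLOCs so that the lowest priority value comes first."""
    return sorted(records, key=lambda r: r.priority)
```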

971	7.2.  Use of BGP and LISP+ALT

973	   Border Gateway Protocol (BGP) [10] is currently used to distribute
974	   inter-domain routing throughout the Internet.  Why not, then, use BGP
975	   to distribute mapping entries, or provide a rendezvous mechanism to
976	   initialize mapping entries?  In fact this is precisely what LISP+ALT
977	   [19] accomplishes, using a completely separate topology from the
978	   normal DFZ.  It does so using existing code paths and expertise.  The
979	   alternate topology also provides an extremely accurate control path
980	   from ITRs to ETRs, whereas NERD's operational model requires an
981	   optimistic assumption and control plane functionality to cycle
982	   through unresponsive ETRs in an EID prefix's mapping entry.  The
983	   memory scaling characteristics of LISP+ALT are extremely attractive
984	   because of expected strong aggregation, whereas NERD makes almost no
985	   attempt at aggregation.

987	   A number of key deployment issues are left open.  The principal issue
988	   is whether it is deemed acceptable for routers to drop packets
989	   occasionally while mapping information is being gathered.  This
990	   should be the subject of future research for ALT, as it was a key
991	   design goal of NERD to avoid such a situation.

993	7.3.  Perhaps use a hybrid model?

995	   Perhaps it would be useful to use both a prepopulated database such
996	   as NERD and a query mechanism (perhaps LISP+ALT, LISP-CONS [20], or
997	   DNS) to determine an EID/RLOC mapping.  One idea would be to receive
998	   a subset of the mappings, say, by taking only the NERD for certain
999	   regions.  This alleviates the need to drop packets for some subset of
1000	   destinations under the assumption that one's business is localized to
1001	   a particular region.  If one did not have a local entry for a
1002	   particular EID one would then make a query.
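
	   A hybrid lookup of this kind might be sketched as follows.  The
	   class, its method names, and the "query" callback (which stands in
	   for LISP+ALT, LISP-CONS, or DNS) are assumptions for illustration
	   only.  The local table is consulted first using longest-prefix
	   match, and the query mechanism is used only on a miss.

```python
import ipaddress
from typing import Callable, Dict, List, Tuple


class HybridMapper:
    def __init__(self, local_table: Dict[str, List[str]],
                 query: Callable[[str], List[str]]):
        # Prepopulated (NERD-like) mappings, keyed by EID prefix.
        self.local = {ipaddress.ip_network(p): rlocs
                      for p, rlocs in local_table.items()}
        self.query = query  # hypothetical fallback query mechanism

    def lookup(self, eid: str) -> Tuple[List[str], str]:
        addr = ipaddress.ip_address(eid)
        matches = [n for n in self.local
                   if n.version == addr.version and addr in n]
        if matches:
            best = max(matches, key=lambda n: n.prefixlen)  # longest prefix
            return self.local[best], "local"
        return self.query(eid), "query"
```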

1004	   One approach to using DNS to query live would be to periodically walk
1005	   "interesting" portions of the network, in search of relevant records,
1006	   and caching them to non-volatile storage.  While preventing resource
1007	   attacks, the walk itself could be viewed as an attack, if the
1008	   algorithm was not selective enough about what it thought was
1009	   interesting.  A similar approach could be applied to LISP+ALT or
1010	   LISP-CONS by forcing a data-driven Map Reply for certain sites.

1012	8.  Deployment Issues

1014	   While LISP and NERD are intended as experiments at this point, it is
1015	   already obvious one must give serious consideration to circular
1016	   dependencies with regard to the protocols used and the elements
1017	   within them.

1019	8.1.  HTTP

1021	   Inasmuch as HTTP depends on DNS, either due to the authority
1022	   section of a URI, or due to the configured base distribution URI,
1023	   these same concerns apply.  In addition, any HTTP server that itself
1024	   makes use of provider independent addresses would be a poor choice to
1025	   distribute the database for these exact same reasons.

1027	   One issue with using HTTP is that it is possible that a middlebox of
1028	   some form, such as a cache, may intercept and process requests.  In
1029	   some cases this might be a good thing.  For instance, if a cache
1030	   correctly returns a database, some amount of bandwidth is conserved.
1031	   On the other hand, if the cache itself fails to function properly for
1032	   whatever reason, end to end connectivity could be impaired.  For
1033	   example, if the cache itself depended on the mapping being in place
1034	   and functional, a cold start scenario might leave the cache
1035	   functioning improperly, in turn providing routers no means to update
1036	   their databases.  Some care must be given to avoid such
1037	   circumstances.

1039	9.  Open Questions

1041	   Do we need to discuss reachability in more detail?  This was clearly
1042	   an issue at the IST-RING workshop.  There are two key issues.  First,
1043	   what is the appropriate architectural separation between the data
1044	   plane and the control plane?  Second, is there some specific way in
1045	   which NERD impacts the data plane?

1047	   Should we specify a (perhaps compressed) tarball that treads a middle
1048	   ground for the last question, where each update tarball contains both
1049	   a signature for the update and for the entire database, once the
1050	   update is applied?

1052	   Should we compress?  In some initial testing of databases with 1, 5,
1053	   and 10 million IPv4 EIDs and a random distribution of IPv4 RLOCs, the
1054	   current format in this document compresses down by a factor of
1055	   between 35% and 36%, using Burrows-Wheeler block sorting text
1056	   compression algorithm (bzip2).  The NERD used random EIDs with mask
1057	   lengths varying from 19-29, with probability weighted toward the
1058	   smaller masks.  This only very roughly reflects reality.  A better
1059	   test would be to start with the existing prefixes found in the DFZ.

1061	10.  Conclusions

1063	   This memo has specified a database format, an update format, a URI
1064	   convention, an update method, and a validation method for EID/RLOC
1065	   mappings.  We have shown that beyond the predictions of 10^8 EID-
1066	   prefix entries, the aggregate database size would likely be at most
1067	   17 GB.  We have considered the number of servers needed to distribute
1068	   that information and we have demonstrated the limitations of a simple
1069	   content distribution network and other well known mechanisms.  The
1070	   effort required to retrieve a database change amounts to between 3
1071	   and 30 seconds of processing time per hour at today's gigabit
1072	   speeds.  We conclude that there is no need for an off box query
1073	   mechanism today, and that there are distinct disadvantages for having
1074	   such a mechanism in the control plane.

1076	   Beyond this we have examined alternatives that allow for hybrid
1077	   models that do use query mechanisms, should our operating assumptions
1078	   prove overly optimistic.  Use of NERD today does not foreclose use of
1079	   such models in the future, and in fact both models can happily co-
1080	   exist.

1082	   We leave to future work how the list of databases is distributed, how
1083	   BGP can play a role in distributing knowledge of the databases, and
1084	   how DNS can play a role in aggregating information into these
1085	   databases.

1087	   We also leave to future work whether HTTP is the best protocol for
1088	   the job, and whether the scheme described in this document is the
1089	   most efficient.  One could easily envision that when applied in high
1090	   delay or high loss environments, a broadcast or multicast method may
1091	   prove more effective.

1093	   Speaking of multicast, we also leave to future work how multicast is
1094	   implemented, if at all, either in conjunction with this model or as
1095	   an extension to it.

1097	11.  IANA Considerations

1099	   This memo makes no requests of IANA.

1101	12.  Acknowledgments

1103	   Dino Farinacci, Patrik Faltstrom, Dave Meyer, Joel Halpern, Dave
1104	   Thaler, Mohamed Boucadair, Robin Whittle, Max Pritikin, and Scott
1105	   Brim were very helpful with their reviews of this work.  Thanks also
1106	   to the participants of the Routing Research Group and the IST-RING
1107	   workshop held in Madrid in December of 2007 for their incisive
1108	   comments.  The astute will notice a lengthy References section.  This
1109	   work stands on the shoulders of many others' efforts.

1111	13.  References

1113	13.1.  Normative References

1115	   [1]   Farinacci, D., Fuller, V., Oran, D., and D. Meyer, "Locator/ID
1116	         Separation Protocol (LISP)", draft-farinacci-lisp-07 (work in
1117	         progress), April 2008.

1119	   [2]   Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L.,
1120	         Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol --
1121	         HTTP/1.1", RFC 2616, June 1999.

1123	   [3]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
1124	         Levels", BCP 14, RFC 2119, March 1997.

1126	   [4]   Kaliski, B., "PKCS #7: Cryptographic Message Syntax Version
1127	         1.5", RFC 2315, March 1998.

1129	   [5]   International Telecommunications Union, "Information technology
1130	         - Open Systems Interconnection - The Directory: Public-key and
1131	         attribute certificate frameworks", ITU-T Recommendation X.509,
1132	         ISO Standard 9594-8, March 2000.

1134	   [6]   Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
1135	         Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986,
1136	         January 2005.

1138	13.2.  Informational References

1140	   [7]   Kantor, B. and P. Lapsley, "Network News Transfer Protocol",
1141	         RFC 977, February 1986.

1143	   [8]   Mockapetris, P., "Domain names - concepts and facilities",
1144	         STD 13, RFC 1034, November 1987.

1146	   [9]   Huitema, C., "An Experiment in DNS Based IP Routing", RFC 1383,
1147	         December 1992.

1149	   [10]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway Protocol 4
1150	         (BGP-4)", RFC 4271, January 2006.

1152	   [11]  Dierks, T. and E. Rescorla, "The Transport Layer Security (TLS)
1153	         Protocol Version 1.1", RFC 4346, April 2006.

1155	   [12]  Carpenter, B., "IETF Plenary Presentation: Routing and
1156	         Addressing: Where we are today", March 2007.

1158	   [13]  Grune, R., Baalbergen, E., Waage, M., Berliner, B., and J.
1159	         Polk, "CVS: Concurrent Versions System", November 1985.

1161	   [14]  Smith, R., Gottesman, Y., Hobbs, B., Lear, E., Kristofferson,
1162	         D., Benton, D., and P. Smith, "A mechanism for maintaining an
1163	         up-to-date GenBank database via Usenet", CABIOS , April 1991.

1165	   [15]  International Telephone and Telegraph
1166	         Consultative Committee, "Information Technology - Open Systems
1167	         Interconnection - The Directory: Authentication Framework",
1168	         CCITT Recommendation X.509, November 1988.

1170	   [16]  Maler, E., Paoli, J., Yergeau, F., Cowan, J., Bray, T., and C.
1171	         Sperberg-McQueen, "Extensible Markup Language (XML) 1.1", World
1172	         Wide Web Consortium FirstEdition REC-xml11-20040204,
1173	         February 2004, <http://www.w3.org/TR/2004/REC-xml11-20040204>.

1175	   [17]  Nielsen, H., Hadley, M., Karmarkar, A., Lafon, Y., Mendelsohn,
1176	         N., Moreau, J., and M. Gudgin, "SOAP Version 1.2 Part 1:
1177	         Messaging Framework (Second Edition)", World Wide Web
1178	         Consortium Recommendation REC-soap12-part1-20070427,
1179	         April 2007,
1180	         <http://www.w3.org/TR/2007/REC-soap12-part1-20070427>.

1182	   [18]  Mendelsohn, N., Karmarkar, A., Lafon, Y., Hadley, M., Gudgin,
1183	         M., Nielsen, H., and J. Moreau, "SOAP Version 1.2 Part 2:
1184	         Adjuncts (Second Edition)", World Wide Web Consortium
1185	         Recommendation REC-soap12-part2-20070427, April 2007,
1186	         <http://www.w3.org/TR/2007/REC-soap12-part2-20070427>.

1188	   [19]  Farinacci, D., "LISP Alternative Topology (LISP+ALT)",
1189	         draft-fuller-lisp-alt-02 (work in progress), April 2008.

1191	   [20]  Brim, S., "LISP-CONS: A Content distribution Overlay Network
1192	         Service for LISP", draft-meyer-lisp-cons-04 (work in progress),
1193	         April 2008.

1195	URIs

1197	   [21]  <http://www.openssl.org>

1199	   [22]  <http://en.wikipedia.org/wiki/Usenet>

1201	Appendix A.  Generating and verifying the database signature with
1202	             OpenSSL

1204	   As previously mentioned, one goal of NERD was to use off-the-shelf
1205	   tools to both generate and retrieve the database.  To many, PKI is
1206	   magic.  This section is meant to provide at least some clarification
1207	   as to both the generation and verification process, complete with
1208	   command line examples.  Not included is how you get the entries
1209	   themselves.  We'll assume they exist, and that you're just trying to
1210	   sign the database.

1212	   To sign the database, you first need a database file that
1213	   has the database header described in Section 3.  Block size should be
1214	   zero, and there should be no PKCS#7 block at this point.  You also
1215	   need a certificate and its private key with which you will sign the
1216	   database.

1218	   The OpenSSL "smime" command contains all the functions we need from
1219	   this point forth.  To sign the database, issue the following command:

1221	         openssl smime -binary -sign -outform DER -signer yourcert.crt \
1222	                 -inkey yourcert.key -in database-file -out signature

1224	   -binary states that no MIME canonicalization should be performed.
1225	   -sign indicates that you are signing the file that was given as the
1226	   argument to -in.  The output format (-outform) is binary DER, and
1227	   your public certificate is provided with -signer along with your key
1228	   with -inkey.  The file that receives the signature is given with -out.

1230	   The resulting file "signature" is then copied into the PKCS#7 block in
1231	   the database header, its size in bytes is recorded in the PKCS#7
1232	   block size field, and the resulting file is ready for distribution to
1233	   ITRs.
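
   The signing step above can be tried end to end with the sketch
   below.  It is only an illustration under stated assumptions: the
   database file is a placeholder rather than a real NERD database, the
   certificate is a throwaway self-signed one, and the subject name
   nerd.example.net is hypothetical.  Copying the signature into the
   header is not shown, because that depends on the header layout given
   in Section 3.

```shell
#!/bin/sh
# Sketch only: sign a placeholder file and measure the signature size.
set -e
workdir=$(mktemp -d)   # scratch directory; remove it when done
cd "$workdir"

# Placeholder standing in for a real database file whose header has a
# block size of zero and no PKCS#7 block (assumption for illustration).
printf 'placeholder NERD database contents\n' > database-file

# Throwaway self-signed certificate and key, for testing only.
# The subject name is hypothetical.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
        -subj "/CN=nerd.example.net" \
        -keyout yourcert.key -out yourcert.crt 2>/dev/null

# Sign with the same options as the command in the text.
openssl smime -binary -sign -outform DER -signer yourcert.crt \
        -inkey yourcert.key -in database-file -out signature

# Size in bytes, i.e. the value to record in the PKCS#7 block size field.
sig_size=$(wc -c < signature | tr -d ' ')
echo "PKCS#7 block size: $sig_size bytes"
```

   The size varies with the key length and with the certificates that
   are included in the PKCS#7 block, so it should be measured each time
   rather than assumed.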

1235	   To verify a database file, first retrieve the PKCS#7 block from the
1236	   file by copying the appropriate number of bytes into another file,
1237	   say "signature".  Next, zero this field, and set the block size field
1238	   to 0.  Next use the "smime" command to verify the signature as
1239	   follows:

1241	       openssl smime -binary -verify -inform DER -content database-file \
1242	               -out /dev/null -in signature

1244	   OpenSSL will report "Verification successful" if the signature is correct.
1245	   OpenSSL provides sufficiently rich libraries to accomplish the above
1246	   within the C programming language with a single pass.
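
   A round trip through signing and verification can be sketched as
   follows.  As before, the database file is a placeholder and the
   self-signed certificate and its subject name are hypothetical.  One
   addition to the verification command is worth noting: the "smime"
   command also validates the signer's certificate against a set of
   trusted certificates, so this sketch passes -CAfile to name the test
   certificate as its own trust anchor.  A real deployment would
   presumably point this at whatever trust anchors the ITR has been
   configured with.

```shell
#!/bin/sh
# Sketch only: sign a placeholder file, then verify it twice, once
# untouched and once after modification.
set -e
workdir=$(mktemp -d)   # scratch directory; remove it when done
cd "$workdir"

printf 'placeholder NERD database contents\n' > database-file

# Throwaway self-signed certificate and key (hypothetical subject name).
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
        -subj "/CN=nerd.example.net" \
        -keyout test.key -out test.crt 2>/dev/null

openssl smime -binary -sign -outform DER -signer test.crt \
        -inkey test.key -in database-file -out signature

# Verify the untouched file.  The exit status is zero on success.
if openssl smime -binary -verify -inform DER -content database-file \
        -out /dev/null -in signature -CAfile test.crt 2>/dev/null; then
    good=ok
else
    good=failed
fi

# Change one byte's worth of content and verify again.
cp database-file tampered-file
printf 'X' >> tampered-file
if openssl smime -binary -verify -inform DER -content tampered-file \
        -out /dev/null -in signature -CAfile test.crt 2>/dev/null; then
    bad=ok
else
    bad=failed
fi

echo "untouched: $good, tampered: $bad"
```

   Checking the exit status rather than parsing the message text keeps
   a script independent of the exact wording that a given OpenSSL
   version prints.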

1248	Appendix B.  Changes

1250	   This section to be removed prior to publication.

1252	   o  04: Analysis change: IPv6 RLOCs are 128 bits.  While they can be
1253	      shortened to 64 bits, that involves substantial ETR changes and
1254	      expenditure of IPv6 networks, which is probably unnecessary, and
1255	      can be left as a later optimization.  Added an option of
1256	      independent operators.  Processed all but two of Dino's comments.
1257	      Addressed Scott's comments.  Removed existing work analysis.
1258	      Saving that for another day.  Clarified OpenSSL Appendix.
1259	   o  05: clean DOWN. reinsert some text for historical purposes.
1260	   o  04: cleanup
1261	   o  03: Change dbname to a domain name, indicate that is what is in
1262	      the subject of the X.509 certificate, and list editorial changes,
1263	      update acknowledgments.
1264	   o  02: Incorporate some of Dave Thaler's comments.  Add
1265	      authentication block detail.  Modify analysis to take IPv6 into
1266	      account, along with a more realistic number of RLOCs per EID.  Add
1267	      some comments about potential risks of a cold start.  Add S/MIME
1268	      example as appendix A and take out old ToDo.  Provide some amount
1269	      of compression of IPv6 addresses by limiting their size to
1270	      significant bytes rounded to a four byte word boundary.
1271	   o  01: Massive spelling correction, URI example correction.
1272	   o  00: Initial Revision.

1274	Author's Address

1276	   Eliot Lear
1277	   Cisco Systems GmbH
1278	   Glatt-com
1279	   Glattzentrum, ZH  CH-8301
1280	   Switzerland

1282	   Phone: +41 44 878 7525
1283	   Email: lear@cisco.com