idnits 2.17.1 

draft-lear-lisp-nerd-09.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 2) being 60 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 2 instances of too long lines in the document, the longest one
     being 42 characters in excess of 72.

  == There are 1 instance of lines with non-RFC2606-compliant FQDNs in the
     document.

  == There are 2 instances of lines with private range IPv4 addresses in the
     document.  If these are generic example addresses, they should be changed
     to use any of the ranges defined in RFC 6890 (or successor): 192.0.2.x,
     198.51.100.x or 203.0.113.x.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 109 has weird spacing: '...instead  learn...'

  == Line 760 has weird spacing: '...thority  manag...'

  == Line 977 has weird spacing: '...nd user  may n...'

  == Line 1037 has weird spacing: '... of the  netwo...'

  == Unrecognized Status in 'Intended status: Experimental Protocol',
     assuming Proposed Standard

     (Expected one of 'Standards Track', 'Full Standard', 'Draft Standard',
     'Proposed Standard', 'Best Current Practice', 'Informational',
     'Experimental', 'Informational', 'Historic'.)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (April 20, 2012) is 4388 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '1' on line 816

  -- Looks like a reference, but probably isn't: '2' on line 929

  == Unused Reference: 'I-D.meyer-lisp-cons' is defined on line 1258, but no
     explicit reference was found in the text

  == Outdated reference: A later version (-24) exists of draft-ietf-lisp-22

  ** Downref: Normative reference to an Experimental draft: draft-ietf-lisp
     (ref. 'I-D.ietf-lisp')

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ITU.X509.2000'

  ** Obsolete normative reference: RFC 6125 (Obsoleted by RFC 9525)

  -- Obsolete informational reference (is this intentional?): RFC 2616
     (Obsoleted by RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235)

  -- Obsolete informational reference (is this intentional?): RFC 4346
     (Obsoleted by RFC 5246)

  == Outdated reference: A later version (-23) exists of
     draft-ietf-dane-protocol-19

  == Outdated reference: A later version (-16) exists of draft-ietf-lisp-ms-15


     Summary: 3 errors (**), 0 flaws (~~), 13 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                            E. Lear
3	Internet-Draft                                        Cisco Systems GmbH
4	Intended status: Experimental Protocol                    April 20, 2012
5	Expires: October 20, 2012

7	               NERD: A Not-so-novel EID to RLOC Database
8	                      draft-lear-lisp-nerd-09.txt

10	Abstract

12	   LISP is a protocol to encapsulate IP packets in order to allow end
13	   sites to route to one another without injecting routes from one end
14	   of the Internet to another.  This memo presents an experimental
15	   database and a discussion of methods to transport the mapping of EIDs
16	   to RLOCs to routers in a reliable, scalable, and secure manner.  Our
17	   analysis concludes that transport of all EID/RLOC mappings scales
18	   well to at least 10^8 entries.

20	Status of this Memo

22	   This Internet-Draft is submitted in full conformance with the
23	   provisions of BCP 78 and BCP 79.

25	   Internet-Drafts are working documents of the Internet Engineering
26	   Task Force (IETF).  Note that other groups may also distribute
27	   working documents as Internet-Drafts.  The list of current Internet-
28	   Drafts is at http://datatracker.ietf.org/drafts/current/.

30	   Internet-Drafts are draft documents valid for a maximum of six months
31	   and may be updated, replaced, or obsoleted by other documents at any
32	   time.  It is inappropriate to use Internet-Drafts as reference
33	   material or to cite them other than as "work in progress."

35	   This Internet-Draft will expire on October 20, 2012.

37	Copyright Notice

39	   Copyright (c) 2012 IETF Trust and the persons identified as the
40	   document authors.  All rights reserved.

42	   This document is subject to BCP 78 and the IETF Trust's Legal
43	   Provisions Relating to IETF Documents (http://trustee.ietf.org/
44	   license-info) in effect on the date of publication of this document.
45	   Please review these documents carefully, as they describe your rights
46	   and restrictions with respect to this document.  Code Components
47	   extracted from this document must include Simplified BSD License text
48	   as described in Section 4.e of the Trust Legal Provisions and are
49	   provided without warranty as described in the Simplified BSD License.

51	Table of Contents

53	   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  2
54	     1.1.  Applicability  . . . . . . . . . . . . . . . . . . . . . .  3
55	     1.2.  Base Assumptions . . . . . . . . . . . . . . . . . . . . .  3
56	     1.3.  What is NERD?  . . . . . . . . . . . . . . . . . . . . . .  4
57	     1.4.  Glossary . . . . . . . . . . . . . . . . . . . . . . . . .  4
58	   2.  Theory of Operation  . . . . . . . . . . . . . . . . . . . . .  5
59	     2.1.  Database Updates . . . . . . . . . . . . . . . . . . . . .  5
60	     2.2.  Communications between ITR and ETR . . . . . . . . . . . .  6
61	     2.3.  Who are database authorities?  . . . . . . . . . . . . . .  6
62	   3.  NERD Format  . . . . . . . . . . . . . . . . . . . . . . . . .  7
63	     3.1.  NERD Record Format . . . . . . . . . . . . . . . . . . . .  9
64	     3.2.  Database Update Format . . . . . . . . . . . . . . . . . . 10
65	   4.  NERD Distribution Mechanism  . . . . . . . . . . . . . . . . . 10
66	     4.1.  Initial Bootstrap  . . . . . . . . . . . . . . . . . . . . 10
67	     4.2.  Retrieving Changes . . . . . . . . . . . . . . . . . . . . 11
68	   5.  Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
69	     5.1.  Database Size  . . . . . . . . . . . . . . . . . . . . . . 12
70	     5.2.  Router Throughput Versus Time  . . . . . . . . . . . . . . 13
71	     5.3.  Number of Servers Required . . . . . . . . . . . . . . . . 14
72	     5.4.  Security Considerations  . . . . . . . . . . . . . . . . . 16
73	       5.4.1.  Use of Public Key Infrastructures (PKIs) . . . . . . . 17
74	       5.4.2.  Other Risks  . . . . . . . . . . . . . . . . . . . . . 19
75	   6.  Why not use XML? . . . . . . . . . . . . . . . . . . . . . . . 19
76	   7.  Other Distribution Mechanisms  . . . . . . . . . . . . . . . . 19
77	     7.1.  What About DNS as a mapping retrieval model? . . . . . . . 20
78	     7.2.  Use of BGP and LISP+ALT  . . . . . . . . . . . . . . . . . 21
79	     7.3.  Perhaps use a hybrid model?  . . . . . . . . . . . . . . . 22
80	   8.  Deployment Issues  . . . . . . . . . . . . . . . . . . . . . . 22
81	     8.1.  HTTP . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
82	   9.  Open Questions . . . . . . . . . . . . . . . . . . . . . . . . 22
83	   10. Conclusions  . . . . . . . . . . . . . . . . . . . . . . . . . 23
84	   11. IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 24
85	   12. Acknowledgments  . . . . . . . . . . . . . . . . . . . . . . . 24
86	   13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 24
87	     13.1.  Normative References  . . . . . . . . . . . . . . . . . . 24
88	     13.2.  Informative References  . . . . . . . . . . . . . . . . . 25
89	   Appendix A. Generating and verifying the database signature with
90	               OpenSSL  . . . . . . . . . . . . . . . . . . . . . . . 26
91	   Appendix B. Changes  . . . . . . . . . . . . . . . . . . . . . . . 27
92	   Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 28

94	1.  Introduction

96	   Locator/ID Separation Protocol (LISP) [I-D.ietf-lisp] separates an IP
97	   address used by a host and local routing system from the locators
98	   advertised by BGP participants on the Internet in general, and in the
99	   default free zone (DFZ) in particular.  It accomplishes this by
100	   establishing a mapping between globally unique endpoint identifiers
101	   (EIDs) and routing locators (RLOCs).  This reduces the amount of
102	   state change that occurs on routers within the default-free zone on
103	   the Internet, while enabling end sites to be multihomed.

105	   In some mapping distribution approaches to LISP the mapping is
106	   learned via data-triggered control messages between ingress tunnel
107	   routers (ITRs) and egress tunnel routers (ETRs) through an alternate
108	   routing topology [I-D.ietf-lisp-alt].  In other approaches to LISP,
109	   the mapping from EIDs to RLOCs is instead  learned through some other
110	   means.  This memo addresses different approaches to the problem, and
111	   specifies a Not-so-novel EID RLOC Database (NERD) and methods to both
112	   receive the database and to receive updates.

114	   NERD is offered primarily as a way to avoid dropping packets, the
115	   underlying assumption being that dropping packets is bad for
116	   applications and end users.  Those who do not agree with this
117	   underlying assumption may find that other approaches make more sense.

119	   NERD is specified in such a way that the methods used to distribute
120	   or retrieve it may vary over time.  Multiple databases are supported
121	   in order to allow for multiple data sources.  An effort has been made
122	   to divorce the database from access methods so that both can evolve
123	   independently through experimentation and operational validation.

125	1.1.  Applicability

127	   This memo is based on experiments performed in the 2007-2009 time
128	   frame.  At the time of its publication, the author is unaware of
129	   operational use of NERD.  Those wishing to pursue NERD should
130	   consider the substantial amount of work left for the future.  See
131	   Section 10 for more details.

133	1.2.  Base Assumptions

135	   In order to specify a mapping it is important to understand how it
136	   will be used, and the nature of the data being mapped.  In the case
137	   of LISP, the following assumptions are pertinent:

139	   o  The data contained within the mapping changes only on provisioning
140	      or configuration operations, and is not intended to change when a
141	      link either fails or is restored.  Some other mechanism such as
142	      the use of LISP Reachability Bits with mapping replies handles
143	      healing operations, particularly when a tail circuit within a
144	      service provider's aggregate goes down.  NERD can be used as a
145	      verification method to ensure that whatever operational mapping
146	      changes an ITR receives are authorized.

148	   o  While weight and priority are defined, these are not hop-by-hop
149	      metrics.  Hence the information contained within the mapping does
150	      not change based on where one sits within the topology.

152	   o  A purpose of LISP being to reduce control plane overhead by
153	      reducing "rate X state" complexity, updates to the mapping will be
154	      relatively rare.

156	   o  Because NERD is designed to ease interdomain routing, its use is
157	      intended within the inter-domain environment.  That is, NERD is
158	      best implemented at either the customer edge or provider edge, and
159	      there will be on the order of as many ITRs and EID Prefixes as
160	      there are connections to Internet Service Providers by end
161	      customers.

163	   o  As such, NERD cannot be the sole means to implement host mobility,
164	      although NERD may be used in conjunction with other mechanisms.

166	1.3.  What is NERD?

168	   NERD is a Not-so-novel EID to RLOC Database.  It consists of the
169	   following components:

171	   1.  a network database format;

173	   2.  a change distribution format;

175	   3.  a database retrieval/bootstrapping method;

177	   4.  a change distribution method.

179	   The network database format is compressible.  However, at this time
180	   we specify no compression method.  NERD will make use of potentially
181	   several transport methods, but most notably HTTP [RFC2616].  HTTP has
182	   restart and compression capabilities.  It is also widely deployed.

184	   There exist many methods to show differences between two versions of
185	   a database or a file, UNIX's "diff" being the classic example.  In
186	   this case, because the data is well structured and easily keyed, we
187	   can make use of a very simple format for version differences that
188	   simply provides a list of EID/RLOC mappings that have changed using
189	   the same record format as the database, and a list of EIDs that are
190	   to be removed.

192	1.4.  Glossary

194	   The reader is once again referred to [I-D.ietf-lisp] for a general
195	   glossary of terms related to LISP.  The following terms are specific
196	   to this memo.

198	   Base Distribution URI: An Absolute-URI as defined in Section 4.3 of
199	      [RFC3986] from which other references are relative.  The base
200	      distribution URI is used to construct a URI to an EID/RLOC mapping
201	      database.  If more than one NERD is known then there will be one
202	      or more base distribution URIs associated with each (although each
203	      such base distribution URI may have the same value).

205	   EID Database Authority: The authority that will sign database files
206	      and updates.  It is the source of both.

208	   The Authority: Shorthand for the EID Database Authority.

210	   NERD: (N)ot-so-novel (E)ID to (R)LOC (D)atabase.

212	   AFI: Address Family Identifier.

214	   Pull Model: An architecture where clients pull only the information
215	      they need at any given time, such as when a packet arrives for
216	      forwarding.

218	   Push Model: An architecture in which clients receive an entire
219	      dataset, containing data they may or may not require, such as
220	      mappings for EIDs that no host served is attempting to send to.

222	   Hybrid Model: An architecture in which some information is pushed
223	      toward the receiver from a source and some information is pulled
224	      by the receiver.

226	2.  Theory of Operation

228	   Operational functions are split into two components: database updates
229	   and state exchange between ITR and ETR during a communication.

231	2.1.  Database Updates

233	   What follows is a summary of how NERDs are generated and updated.
234	   Specifics can be found in Section 3. The general way in which NERD
235	   works is as follows:

237	   1.  A NERD is generated by an authority that allocates provider
238	       independent (PI) addresses (e.g., IANA or an RIR) which are used
239	       by sites as EIDs.  As part of this process the authority
240	       generates a digest for the database and signs it with a private
241	       key whose public key is part of an X.509 certificate
242	       [ITU.X509.2000].  That signature along with a copy of the
243	       authority's public key is included in the NERD.

245	   2.  The NERD is distributed to a group of well known servers.

247	   3.  ITRs retrieve an initial copy of the NERD via HTTP when they come
248	       into service.

250	   4.  ITRs are preconfigured with a group of certificates whose private
251	       keys are used by database authorities to sign the NERD.  This
252	       list of certificates should be configurable by administrators.

254	   5.  ITRs next verify both the validity of the public key and the
255	       signed digest.  If either fails validation, the ITR attempts to
256	       retrieve the NERD from a different source.  The process iterates
257	       until either a valid database is found or the list of sources is
258	       exhausted.

260	   6.  Once a valid NERD is retrieved, the ITR installs it into both
261	       non-volatile and local memory.

263	   7.  At some point the authority updates the NERD and increments the
264	       database version counter.  At the same time it generates a list
265	       of changes, which it also signs, as it does with the original
266	       database.

268	   8.  Periodically ITRs will poll from their list of servers to
269	       determine if a new version of the database exists.  When a new
270	       version is found, an ITR will attempt to retrieve a change file,
271	       using its list of preconfigured servers.

273	   9.  The ITR validates a change file just as it does the original
274	       database.  Assuming the change file passes validation, the ITR
275	       installs new entries, overwrites existing ones, and removes empty
276	       entries, based on the content of the change file.

278	   As time goes on it is quite possible that an ITR may probe a list of
279	   configured peers for a database or change file copy.  It is equally
280	   possible that peers might advertise to each other the version number
281	   of their database.  Such methods are not explored in depth in this
282	   memo, but are mentioned for future consideration.
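
   Steps 3 through 6 above amount to a loop over configured sources.
   The following is a minimal sketch under stated assumptions, not part
   of the specification: the names retrieve_valid_nerd, fetch, and
   verify are hypothetical, and the actual HTTP retrieval and the
   certificate and digest checks are left to the caller.

```python
from typing import Callable, Iterable, Optional


def retrieve_valid_nerd(
    sources: Iterable[str],
    fetch: Callable[[str], Optional[bytes]],
    verify: Callable[[bytes], bool],
) -> Optional[bytes]:
    """Return the first database that passes validation, or None.

    `fetch` and `verify` are hypothetical callables supplied by the
    caller: `fetch` retrieves a candidate database from one source (or
    returns None on failure), and `verify` checks the public key and
    the signed digest.
    """
    for source in sources:
        candidate = fetch(source)
        if candidate is not None and verify(candidate):
            return candidate  # the caller installs it (step 6)
    return None  # the list of sources is exhausted


# Usage with stand-in callables: the first source serves a bad copy.
copies = {"http://a.example/": b"tampered", "http://b.example/": b"good"}
result = retrieve_valid_nerd(copies, copies.get, lambda db: db == b"good")
```

   Passing the retrieval and validation steps in as callables keeps the
   loop independent of any particular transport or signature library.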

284	2.2.  Communications between ITR and ETR

286	   [I-D.ietf-lisp] describes the basic approach to what happens when a
287	   packet arrives at an ITR, and what communications between ITR and ETR
288	   take place.  NERD provides an optimistic approach to establishing
289	   communications with an ETR that is responsible for a given EID
290	   prefix.  State must be kept, however, on an ITR to determine whether
291	   that ETR is in fact reachable.  It is expected that this is a common
292	   requirement across LISP mapping systems, and will be handled in the
293	   core LISP architecture.

295	2.3.  Who are database authorities?

297	   This memo does not specify who the database authority is.  That is
298	   because there are several possible operational models.  In each case
299	   the number of database authorities is meant to be small so that ITRs
300	   need only keep a small list of authorities, similar to the way a name
301	   server might cache a list of root servers.

303	   o  A single database authority exists.  In this case all entries in
304	      the database are registered to a single entity, and that entity
305	      distributes the database.  Because the EID space is provider
306	      independent address space, there is no architectural requirement
307	      that address space be hierarchically distributed to anyone, as
308	      there is with provider-assigned address space.  Hence, there is a
309	      natural affinity between the IANA function and the database
310	      authority function.

312	   o  Each region runs a database authority.  In this case, provider
313	      independent address space is allocated to either Regional Internet
314	      Registries (RIRs) or to affiliates of such organizations of
315	      network operations guilds (NOGs).  The benefit of this approach is
316	      that there is no single organization that controls the database.
317	      It allows one database authority to back up another.  One could
318	      envision as many as ten database authorities in this scenario.
319	      One drawback to this approach, however, is that any reference to a
320	      region imposes a notion of locality, thus potentially diminishing
321	      the split between locator and identifier.

323	   o  Each country runs a database authority.  This could occur should
324	      countries decide to regulate this function.  While limiting the
325	      scope of any single database authority as the previous scenario
326	      describes, this approach would introduce some overhead as the list
327	      of database authorities would grow to as many as 200, and possibly
328	      more if jurisdictions within countries attempted to regulate the
329	      function.  There are two drawbacks to this approach.  First, as
330	      distribution of EIDs is driven to more local jurisdictions, an EID
331	      prefix is tied even tighter to a location.  Second, a large number
332	      of database authorities will demand some sort of discovery
333	      mechanism.

335	   o  Independent operators manage database authorities.  This has the
336	      appeal of being location independent, and enabling competition
337	      for good performance.  This method has the drawback of potentially
338	      requiring a discovery mechanism.

340	   The latter two approaches are not mutually exclusive.  While this
341	   specification allows for multiple databases, discovery mechanisms are
342	   left as future work.

344	3.  NERD Format

346	   The NERD consists of a header that contains a database version and a
347	   signature that is generated by ignoring the signature field and
348	   setting the authentication block length to 0 (NULL).  The
349	   authentication block itself consists of a signature and a certificate
350	   whose private key counterpart was used to generate the signature.

352	   Records are kept sorted in numeric order with AFI plus EID as primary
353	   key and prefix length as secondary.  This is so that after a database
354	   update it should be possible to reconstruct the database to verify
355	   the digest signature, which may be retrieved separately from the
356	   database for verification purposes.

358	        0                   1                   2                   3
359	        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
360	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
361	       | Schema Vers=1 |  DB Code      |     Database Name Size        |
362	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
363	       |                      Database Version                         |
364	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
365	       |                   Old Database Version or 0                   |
366	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
367	       |                                                               |
368	       |                        Database Name                          |
369	       |                                                               |
370	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
371	       |       PKCS#7 Block Size       |          Reserved             |
372	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
373	       |                                                               |
374	       |      PKCS#7 Block containing Certificate and Signature        |
375	       |                                                               |
376	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

378	   Database Header

380	   The DB Code indicates 0 if what follows is an entire database or 1 if
381	   what follows is an update.  The database file version is incremented
382	   each time the complete database is generated by the authority.  In
383	   the case of an update, the database file version indicates the new
384	   database file version, and the old database file version is indicated
385	   in the "old DB version" field.  The database file version is used by
386	   routers to determine whether or not they have the most current
387	   database.

389	   The database name is a DNS-ID, as specified in [RFC6125].  This is
390	   the name that will appear in the Subject field of the certificate
391	   used to verify the database.  The purpose of the database name is to
392	   allow for more than one database.  Such databases would be merged by
393	   the router.  It is important that an EID/RLOC mapping be listed in no
394	   more than one database, lest inconsistencies arise.  However, it may
395	   be possible to transition a mapping from one database to another.
396	   During the transition period, the mappings would be identical.  When
397	   they are not, the resultant behavior will be undefined.  The database
398	   name is padded with NULLs to the nearest fourth byte.

400	   The PKCS#7 [RFC2315] authentication block contains a DER encoded
401	   [ITU.X509.2000] signature and associated public key.  For purposes of
402	   this experiment all implementations will support the RSA encryption
403	   signature algorithm and SHA1 digest algorithm, and the standard
404	   attributes are expected to be present.

406	   N.B., it has been suggested that Cryptographic Message Syntax (CMS)
407	   [RFC5652] be used instead of PKCS#7.  At the time this experiment was
408	   performed, CMS was not yet widely deployed.  However, it is certainly
409	   the correct direction, and should be strongly considered in future
410	   related work.
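
   The header layout above can be read with fixed-width unpacking.
   What follows is a sketch only, and it rests on assumptions that this
   memo does not state: multi-byte fields are taken to be in network
   byte order, the Database Name Size is taken to count the name bytes
   before padding, and the PKCS#7 block is taken to be unpadded.  The
   names NerdHeader and parse_header are hypothetical.

```python
import struct
from typing import NamedTuple, Tuple


class NerdHeader(NamedTuple):
    schema_version: int
    db_code: int          # 0 = entire database, 1 = update
    db_version: int
    old_db_version: int   # 0 when there is no old version
    db_name: str
    pkcs7_block: bytes    # left undecoded here


def parse_header(data: bytes) -> Tuple[NerdHeader, int]:
    """Parse a database header; return it and the offset that follows it."""
    # Schema Vers (8), DB Code (8), Name Size (16), Version (32), Old (32).
    schema, db_code, name_size, version, old_version = struct.unpack_from(
        "!BBHII", data, 0
    )
    offset = 12
    name = data[offset:offset + name_size].decode("ascii")
    # Assumption: the name field is padded with NULLs to a multiple of 4.
    offset += (name_size + 3) // 4 * 4
    pkcs7_size, _reserved = struct.unpack_from("!HH", data, offset)
    offset += 4
    pkcs7 = data[offset:offset + pkcs7_size]
    header = NerdHeader(schema, db_code, version, old_version, name, pkcs7)
    return header, offset + pkcs7_size


# Usage: build a header by hand, with a placeholder 4-byte PKCS#7 block.
name = b"nerd.arin.net"  # 13 bytes, padded to 16
raw = (
    struct.pack("!BBHII", 1, 0, len(name), 1105503, 0)
    + name + b"\x00" * 3
    + struct.pack("!HH", 4, 0) + b"\xde\xad\xbe\xef"
)
header, first_record = parse_header(raw)
```

   Verifying the signature would additionally require decoding the
   PKCS#7 block, which this sketch leaves to a cryptographic library.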

412	3.1.  NERD Record Format

414	   As distributed over the network, NERD records appear as follows:

416	        0                   1                   2                   3
417	        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
418	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
419	       | Num. RLOCs    | EID Pref. Len  |           EID AFI            |
420	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
421	       |                       End point identifier                    |
422	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
423	       | Priority 1    |    Weight 1   |             AFI 1             |
424	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
425	       |                       Routing Locator 1                       |
426	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
427	       | Priority 2    |    Weight 2   |             AFI 2             |
428	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
429	       |                       Routing Locator 2                       |
430	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
431	       | Priority 3    |    Weight 3   |             AFI 3             |
432	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
433	       |                       Routing Locator 3...                    |
434	       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

436	   EID AFI is the AFI of the EID.  Priority N and Weight N, and AFI N
437	   are associated with Routing Locator N.  There will always be at least
438	   one routing locator.  The minimum record size for IPv4 is 16 bytes.
439	   Each additional IPv4 RLOC increases the record size by 8 bytes.  The
440	   purpose of this format is to keep the database compact, but somewhat
441	   easily read.  The meanings of weight and priority are described in
442	   [I-D.ietf-lisp].  The format of the AFI is specified by the IANA
443	   "Address Family Numbers" registry, with the exception of how IPv6 EID
444	   prefixes are stored.

446	   NERD assumes that EIDs stored in the database are prefixes, and
447	   therefore are accompanied with prefix lengths.  In order to reduce
448	   storage and transmission amounts for IPv6, only the necessary number
449	   of bytes of an EID as specified by the prefix length are kept in the
450	   record, rounded up to the next four-byte (word) boundary.  For
451	   instance, a /49 prefix occupies seven bytes, so rounding up to the
452	   word boundary means eight bytes are stored.  IPv6 RLOCs are
453	   represented as normal 128-bit IPv6 addresses.
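
   The record layout and the IPv6 EID truncation rule can be
   illustrated as follows.  This is a sketch under assumptions: fields
   are taken to be in network byte order, only AFI 1 (IPv4) and AFI 2
   (IPv6) from the IANA registry are handled, and the names Rloc,
   Record, stored_eid_len, and parse_record are hypothetical.

```python
import struct
from typing import List, NamedTuple, Tuple

AFI_IPV4 = 1   # IANA "Address Family Numbers" registry
AFI_IPV6 = 2
ADDR_LEN = {AFI_IPV4: 4, AFI_IPV6: 16}


class Rloc(NamedTuple):
    priority: int
    weight: int
    afi: int
    address: bytes


class Record(NamedTuple):
    eid_afi: int
    eid: bytes          # as stored; possibly truncated for IPv6
    prefix_len: int
    rlocs: List[Rloc]


def stored_eid_len(afi: int, prefix_len: int) -> int:
    """Number of bytes used to store an EID prefix in a record."""
    if afi == AFI_IPV6:
        needed = (prefix_len + 7) // 8    # bytes that cover the prefix
        return (needed + 3) // 4 * 4      # round up to a word boundary
    return ADDR_LEN[afi]


def parse_record(data: bytes, offset: int = 0) -> Tuple[Record, int]:
    """Parse one record at `offset`; return it and the next offset."""
    num_rlocs, prefix_len, eid_afi = struct.unpack_from("!BBH", data, offset)
    offset += 4
    eid_len = stored_eid_len(eid_afi, prefix_len)
    eid = data[offset:offset + eid_len]
    offset += eid_len
    rlocs = []
    for _ in range(num_rlocs):
        priority, weight, afi = struct.unpack_from("!BBH", data, offset)
        offset += 4
        addr_len = ADDR_LEN[afi]  # RLOCs are always full-length addresses
        rlocs.append(Rloc(priority, weight, afi, data[offset:offset + addr_len]))
        offset += addr_len
    return Record(eid_afi, eid, prefix_len, rlocs), offset


# Usage: one IPv4 EID prefix (192.0.2.0/24) with a single IPv4 RLOC.
raw = (
    struct.pack("!BBH", 1, 24, AFI_IPV4) + bytes([192, 0, 2, 0])
    + struct.pack("!BBH", 1, 100, AFI_IPV4) + bytes([198, 51, 100, 1])
)
record, end = parse_record(raw)
```

   The example record is 16 bytes long, matching the minimum IPv4
   record size given above, and stored_eid_len(AFI_IPV6, 49) yields 8.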

455	3.2.  Database Update Format

457	   A database update contains a set of changes to an existing database.
458	   Each AFI/EID/mask-length tuple may have zero or more RLOCs associated
459	   with it.  In the case where there are no RLOCs, the EID entry is
460	   removed from the database.  Records that contain EIDs and prefix
461	   lengths that were not previously listed are simply added.  Otherwise,
462	   the old record for the EID and prefix length is replaced by the more
463	   current information.  The record format used by a database update
464	   is the same as described in Section 3.1.
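
   The update rules above map naturally onto a keyed table.  The
   following sketch assumes an in-memory dictionary keyed by the
   AFI/EID/mask-length tuple; the names apply_update and sorted_keys
   are hypothetical.  Treating "numeric order" as a byte-wise
   comparison of the stored EID is also an assumption.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, bytes, int]                 # (AFI, EID, prefix length)
Rlocs = List[Tuple[int, int, int, bytes]]    # (priority, weight, AFI, locator)


def apply_update(db: Dict[Key, Rlocs], changes: Dict[Key, Rlocs]) -> None:
    """Apply change records to `db` in place."""
    for key, rlocs in changes.items():
        if rlocs:
            db[key] = rlocs      # add a new entry or replace the old one
        else:
            db.pop(key, None)    # no RLOCs: remove the EID entry


def sorted_keys(db: Dict[Key, Rlocs]) -> List[Key]:
    """Order keys by AFI plus EID, then by prefix length."""
    return sorted(db)


# Usage: one removal, one replacement, and one addition.
db = {
    (1, bytes([192, 0, 2, 0]), 24): [(1, 100, 1, bytes([198, 51, 100, 1]))],
    (1, bytes([203, 0, 113, 0]), 24): [(1, 100, 1, bytes([198, 51, 100, 2]))],
}
changes = {
    (1, bytes([192, 0, 2, 0]), 24): [],
    (1, bytes([203, 0, 113, 0]), 24): [(2, 50, 1, bytes([198, 51, 100, 3]))],
    (1, bytes([192, 0, 2, 128]), 25): [(1, 100, 1, bytes([198, 51, 100, 4]))],
}
apply_update(db, changes)
order = sorted_keys(db)
```

   Re-sorting after an update is what would allow a router to
   reconstruct the database and check it against the signed digest.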

466	4.  NERD Distribution Mechanism

468	4.1.  Initial Bootstrap

470	   Bootstrap occurs when a router needs to retrieve the entire database.
471	   It knows it needs to retrieve the entire database because either it
472	   has none or the pending update is too substantial to process, as
473	   might be the case if a router has been out of service for a lengthy
474	   period of time.

476	   To bootstrap the ITR appends the database name plus "/current/
477	   entiredb" to a Base Distribution URI and retrieves the file via HTTP.
478	   More formally (using ABNF from [RFC5234]):

480	      entire-db =    base-uri dbname "/current/entiredb"
481	      base-uri  =    uri ; From RFC 3986
482	      dbname    =    DNS-ID ; From RFC 6125

484	   For example, if the base distribution URI is "http://www.example.com/
485	   eiddb/", and assuming a database name of "nerd.arin.net", the ITR
486	   would request
487	   "http://www.example.com/eiddb/nerd.arin.net/current/entiredb".
488	   Routers check the signature on the database prior to installing it,
489	   and check that the database schema matches a schema they understand.
490	   Once a router has a valid database it stores that database in some
491	   sort of non-volatile memory (e.g., disk, flash memory, etc).

493	   N.B., the host component for such URIs should not resolve to a LISP
494	   EID, lest a circular dependency be created.

496	4.2.  Retrieving Changes

498	   In order to retrieve a set of database changes an ITR will have
499	   previously retrieved the entire database.  Hence it knows the current
500	   version of the database it has.  Its first step for retrieving
501	   changes is to retrieve the current version number of the database.
502	   It does so by appending "/current/version" to the base distribution
503	   URI and database name and retrieving the file.  Its format is text
504	   and it contains the integer value of the current database version.

506	   Once an ITR has retrieved the current version it compares it to the
507	   version of its local copy.  If the two match, the router is up
508	   to date and need take no further actions until it next checks.

510	   If the versions differ, the router next requests the appropriate
511	   change file by appending "/current/changes/" and the textual
512	   representation of the version of its local copy of the database to
513	   the base distribution URI and database name.  More formally:

515	      db-version    =    base-uri dbname "/current/version"
516	      db-curupdate  =    base-uri dbname "/current/changes/" old-version
517	      old-version   =    1*DIGIT

519	   For example, if the current version of the database is 1105503 and
520	   the router's version is 1105500, and the base URI and database name
521	   are the same as above, the router would first request "http://
522	   www.example.com/eiddb/nerd.arin.net/current/version" to determine
523	   that it is out of date, and to also learn the current version.  It
524	   would then attempt to retrieve "http://www.example.com/eiddb/
525	   nerd.arin.net/current/changes/1105500".
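
   As a rough sketch of the two requests just described, the following
   Python builds the db-version and db-curupdate URIs and decides
   whether a change file is needed.  The function names are
   hypothetical, and the version file is assumed to hold a decimal
   integer, possibly followed by whitespace.

```python
from typing import Optional


def version_uri(base_uri: str, dbname: str) -> str:
    """db-version = base-uri dbname "/current/version"."""
    return base_uri + dbname + "/current/version"


def changes_uri(base_uri: str, dbname: str, old_version: int) -> str:
    """db-curupdate = base-uri dbname "/current/changes/" old-version."""
    return base_uri + dbname + "/current/changes/" + str(old_version)


def next_request(base_uri: str, dbname: str, local_version: int,
                 version_body: str) -> Optional[str]:
    """Return the change-file URI to fetch, or None if up to date."""
    current = int(version_body.strip())
    if current == local_version:
        return None
    return changes_uri(base_uri, dbname, local_version)


base, name = "http://www.example.com/eiddb/", "nerd.arin.net"
print(version_uri(base, name))
print(next_request(base, name, 1105500, "1105503\n"))
```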

527	   The server may not have that change file, either because there are
528	   too many versions between what the router has and what is current, or
529	   because no such change file was generated.  If the server has changes
530	   from the router's version to any later version, the server issues an
531	   HTTP redirect to that change file, and the router retrieves and
532	   processes it.  More formally:

534	      db-incupdate    =    base-uri dbname "/" newer-version
535	                           "/changes/" old-version
536	      newer-version   =    1*DIGIT

538	   For example:

540	   "http://www.example.com/eiddb/nerd.arin.net/1105450/changes/1105401"
541	   would update a router from version 1105401 to 1105450. Once it has
542	   done so, the router should then repeat the process until it has
543	   brought itself up to date.

545	   This raises the question: how does a router know to retrieve version
546	   1105450 in our example above?  It cannot.  The server must redirect
547	   the router to that URI when the router requests the differences
548	   between its own version and the current version, say, 1105503.
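
   The repeat-until-current behavior can be sketched as a loop.  This
   is an illustration under stated assumptions: the two callables are
   hypothetical stand-ins for the HTTP requests (including any redirect
   the server issues), and falling back to a full retrieval when no
   change file exists is how this sketch reads the text.

```python
from typing import Callable, List, Optional


def catch_up(
    local_version: int,
    fetch_current_version: Callable[[], int],
    fetch_changes: Callable[[int], Optional[int]],
    max_rounds: int = 100,
) -> List[int]:
    """Apply change files until the local version is current.

    fetch_changes(v) stands in for requesting ".../current/changes/<v>"
    and following any redirect.  It returns the version reached after
    applying the change file, or None if the server has no such file.
    """
    applied = [local_version]
    for _ in range(max_rounds):
        if local_version == fetch_current_version():
            return applied
        reached = fetch_changes(local_version)
        if reached is None or reached == local_version:
            # No usable change file: fall back to the bootstrap step.
            raise LookupError("retrieve the entire database instead")
        local_version = reached
        applied.append(local_version)
    raise RuntimeError("did not converge within max_rounds")


# Simulated server: current version 1105503, two change files on offer.
change_files = {1105401: 1105450, 1105450: 1105503}
print(catch_up(1105401, lambda: 1105503, change_files.get))
# prints [1105401, 1105450, 1105503]
```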

550	   While it is unlikely that database versions would wrap, as they
551	   consist of 32-bit integers, should the event occur, ITRs should
552	   attempt first to retrieve a change file when their current version
553	   number is within 10,000 of 2^32 and they see a version available that
554	   is less than 10,000.  Barring the availability of a change file, the
555	   ITR can still assume that the database version has wrapped and
556	   retrieve a new copy.  It may be safer in future work to include
557	   additional wrap information or a larger field to avoid having to use
558	   any heuristics.
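
   One possible reading of the wrap heuristic, as a small Python
   predicate.  The names are hypothetical, and treating "within 10,000
   of 2^32" as an inclusive bound is an assumption.

```python
WRAP_WINDOW = 10_000   # window size taken from the text
MODULUS = 2 ** 32      # versions are 32-bit integers


def looks_wrapped(local_version: int, advertised_version: int) -> bool:
    """True if the advertised version appears to have wrapped past 2^32."""
    near_top = local_version >= MODULUS - WRAP_WINDOW
    near_zero = advertised_version < WRAP_WINDOW
    return near_top and near_zero


print(looks_wrapped(2 ** 32 - 5, 12))    # True
print(looks_wrapped(1105500, 1105503))   # False
```

   When the predicate holds, the ITR would first try for a change file
   and, failing that, retrieve a new copy of the database.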

560	5.  Analysis

562	   We will start our analysis by looking at how much data will be
563	   transferred to a router during bootstrap conditions.  We will then
564	   look at the bandwidth required.  Next we will turn our concerns to
565	   servers.  Finally we will ponder the effect of providing only
566	   changes.

568	   In the analysis below we treat the overhead of the database header as
569	   insignificant (because it is).  The analysis should be similar,
570	   whether a single database or multiple databases are employed, as we
571	   would assume that no entry would appear more than once.

573	5.1.  Database Size

575	   By its very nature the information to be transported is relatively
576	   static and is specifically designed to be topologically insensitive.
577	   That is, every ITR is intended to have the same set of RLOCs for a
578	   given EID.  While some processing power will be necessary to install
579	   a table, the amount required should be far less than that of a
580	   routing information database because the level of entropy is intended
581	   to be lower.

583	   For purposes of this analysis, we will assume that the world has
584	   migrated to IPv6, as this increases the size of the database, which
585	   would be our primary concern.  However, to mitigate the size
586	   increase, we have limited the size of the prefix transmitted.  For
587	   purposes of this analysis, we shall assume an average prefix length
588	   of 64 bits.

590	   Based on that assumption, Section 3.1 states that mapping information
591	   for each EID/Prefix includes a group of RLOCs, each with an
592	   associated priority and weight, and that a minimum record size with
593	   IPv6 EIDs with at least one RLOC is 30 bytes uncompressed.  Each
594	   additional IPv6 RLOC costs 20 bytes.

596	               +-----------+--------+--------+---------+
597	               | 10^n EIDs | 2 RLOC | 4 RLOC |  8 RLOC |
598	               +-----------+--------+--------+---------+
599	               |         4 | 500 KB | 900 KB | 1.70 MB |
600	               |         5 | 5.0 MB | 9.0 MB | 17.0 MB |
601	               |         6 |  50 MB |  90 MB |  170 MB |
602	               |         7 | 500 MB | 900 MB | 1.70 GB |
603	               |         8 | 5.0 GB | 9.0 GB | 17.0 GB |
604	               +-----------+--------+--------+---------+

606	    Database size for IPv6 routes with average prefix length = 64 bits

608	   Entries in the above table are derived as follows:

610	        E * (30 + 20 * (R - 1 ))

612	   where E = number of EIDs (10^n), R = number of RLOCs per EID.
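
   The formula above is easy to check mechanically.  The following
   Python sketch (function name hypothetical) reproduces the entries of
   the table in bytes, using decimal units.

```python
def database_size_bytes(eids: int, rlocs_per_eid: int) -> int:
    """E * (30 + 20 * (R - 1)): 30 bytes for the first RLOC, 20 for each more."""
    return eids * (30 + 20 * (rlocs_per_eid - 1))


for n in range(4, 9):
    row = [database_size_bytes(10 ** n, r) for r in (2, 4, 8)]
    print(n, row)
```

   For instance, 10^8 EIDs with 4 RLOCs each come to 9,000,000,000
   bytes, which is the 9.0 GB entry.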

614	   Our scaling target is to accommodate 10^8 multihomed systems, which
615	   is one order of magnitude greater than what is discussed in [CARP07].
616	   At 10^8 entries, a device could be expected to use between 5 and 17
617	   gigabytes of RAM for the mapping.  No matter the method of
618	   distribution, any router that sits in the core of the Internet would
619	   require near this amount of memory in order to perform the ITR
620	   function.  Large enterprise ETRs would be similarly strained, simply
621	   due to the diversity of sites that communicate with one another.
622	   The good news is that this is not our starting point, but rather our
623	   scaling target, a number that we intend to reach by the year 2050.
624	   Our starting point is more likely in the neighborhood of 10^4 or 10^5
625	   EIDs, thus requiring between 500 KB and 17 MB.

627	5.2.  Router Throughput Versus Time

629	       +-------------------+---------+--------+---------+-------+
630	       | Table Size (10^N) |   1Mb/s | 10Mb/s | 100Mb/s | 1Gb/s |
631	       +-------------------+---------+--------+---------+-------+
632	       |                 6 |       8 |    0.8 |    0.08 | 0.008 |
633	       |                 7 |      80 |      8 |     0.8 |  0.08 |
634	       |                 8 |     800 |     80 |       8 |   0.8 |
635	       |                 9 |   8,000 |    800 |      80 |     8 |
636	       |                10 |  80,000 |  8,000 |     800 |    80 |
637	       |                11 | 800,000 | 80,000 |   8,000 |   800 |
638	       +-------------------+---------+--------+---------+-------+

640	                     Number of seconds to process NERD

642	   The length of time it takes to receive the database is significant in
643	   models where the device acquires the entire table.  During this
644	   period of time, either the router will be unable to route packets
645	   using LISP or it must use some sort of query mechanism for specific
646	   EIDs as it populates the rest of its table through the transfer.
647	   Table 2 shows us that at our scaling target, the length of time it
648	   would take for a router using 1 Mb/s of bandwidth is about 80,000
649	   seconds.  We can measure the processing time in small numbers of
650	   hours for any transfer speed greater than that.  At the fastest
651	   rate shown, it takes 8 seconds to process an entire table of 10^9
652	   bytes and 80 seconds for 10^10 bytes.
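
   The entries in the throughput table follow from dividing the table
   size in bits by the link rate.  A minimal Python sketch, assuming
   the table sizes are in bytes, the rates are in bits per second, and
   overhead is ignored (the function name is hypothetical):

```python
def transfer_seconds(table_bytes: int, bits_per_second: int) -> float:
    """Time to move table_bytes over a link of the given rate."""
    return table_bytes * 8 / bits_per_second


print(transfer_seconds(10 ** 10, 10 ** 6))  # 10^10 bytes at 1 Mb/s
print(transfer_seconds(10 ** 9, 10 ** 9))   # 10^9 bytes at 1 Gb/s
```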

654	5.3.  Number of Servers Required

656	   As easy as it may be for a router to retrieve, the aggregate
657	   information may be difficult for servers to transmit, assuming the
658	   information is transmitted in aggregate (we'll revisit that
659	   assumption later).

661	   +----------------+--------------+-------------+----------+----------+
662	   | # Simultaneous |   10 Servers | 100 Servers |    1,000 |   10,000 |
663	   |       Requests |              |             |  Servers |  Servers |
664	   +----------------+--------------+-------------+----------+----------+
665	   |            100 |          720 |          72 |       72 |       72 |
666	   |          1,000 |        7,200 |         720 |       72 |       72 |
667	   |         10,000 |       72,000 |       7,200 |      720 |       72 |
668	   |        100,000 |      720,000 |      72,000 |    7,200 |      720 |
669	   |      1,000,000 |    7,200,000 |     720,000 |   72,000 |    7,200 |
670	   |     10,000,000 |   72,000,000 |   7,200,000 |  720,000 |   72,000 |
671	   +----------------+--------------+-------------+----------+----------+

673	     Retrieval time per number of servers in seconds.  Assumes average
674	   10^8 entries with 4 RLOCs per EID and that each server has access to
675	    1 Gb/s and 100% efficient use of that bandwidth and no compression.

677	   Entries in the above table were generated using the following method:

679	   For 10^8 entries with four RLOCs per EID, the table size is 9.0GB,
680	   per our previous table.  Assume 1 Gb/s transfer rates and 100%
681	   utilization.  Protocol overhead is ignored for this exercise.  Hence
682	   a single transfer X takes 72 seconds and can get no faster.

684	   With this in mind, each entry is as follows:

686	            max(1X,N*X/S)

688	     where N=number of transfers, X = 72 seconds,
689	     and S = number of servers.
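
   The formula can be exercised directly.  In this Python sketch the
   function name is hypothetical, and X is computed from the stated
   assumptions (a 9.0 GB table over a 1 Gb/s link) rather than
   hard-coded.

```python
TABLE_BYTES = 9 * 10 ** 9   # 10^8 entries with four RLOCs each
LINK_BPS = 10 ** 9          # 1 Gb/s per server, fully utilized
X = TABLE_BYTES * 8 / LINK_BPS   # seconds for a single transfer


def retrieval_seconds(transfers: int, servers: int) -> float:
    """max(1X, N*X/S): a transfer can never finish faster than X."""
    return max(X, transfers * X / servers)


print(X)                                  # 72.0
print(retrieval_seconds(10 ** 6, 1000))   # 72000.0
```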

691	   If we have a distribution model in which every device must retrieve
692	   the mapping information upon start, Table 3 shows the length of time
693	   in seconds it will take for a given number of servers to complete a
694	   transfer to a given number of devices.  This table says, as an
695	   example, that it would take 72,000 seconds (20 hours) for one million
696	   ITRs to simultaneously retrieve the database from one thousand
697	   servers, assuming equal load distribution.  Should a cold start
698	   scenario occur, this number should be of some concern.  Hence it is
699	   important to take some measures both to avoid such a scenario, and to
700	   ease the load should it occur.  The primary defense should be for
701	   ITRs to first attempt to retrieve their databases from their peers or
702	   upstream providers.  Secondary defenses could include data sanity
703	   checks within ITRs, with agreed norms for how much the database
704	   should change in any given update or over any given period of time.
705	   As we will see below, dissemination of changes involves considerably
706	   less volume.

708	   +----------------+-------------+---------------+----------------+
709	   | % Daily Change | 100 Servers | 1,000 Servers | 10,000 Servers |
710	   +----------------+-------------+---------------+----------------+
711	   |           0.1% |         300 |            30 |              3 |
712	   |           0.5% |       1,500 |           150 |             15 |
713	   |             1% |       3,000 |           300 |             30 |
714	   |             5% |      15,000 |         1,500 |            150 |
715	   |            10% |      30,000 |         3,000 |            300 |
716	   +----------------+-------------+---------------+----------------+

718	     Assuming 10 million routers and a database size of 9GB, resulting
719	   transfer times for hourly updates are shown in seconds, given number
720	     of servers and daily rate of change.  Note that when insufficient
721	    resources are devoted to servers, an unsustainable situation arises
722	   where updates for the next batch would begin prior to the completion
723	                           of the current batch.

725	   This table shows us that with 10,000 servers the average transfer
726	   time with 1Gb/s links for 10,000,000 routers will be 300 seconds with
727	   10% daily change spread over 24 hourly updates.  For a 0.1% daily
728	   change, that number is 3 seconds for a database of size 9.0GB.
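
   The text does not give a formula for this table.  The Python sketch
   below shows one reading that reproduces its entries: the daily
   change is split evenly over 24 hourly updates, each router fetches
   one update, and the load is spread evenly over the servers.  The
   function and parameter names are hypothetical.

```python
def hourly_update_seconds(
    daily_change_fraction: float,
    servers: int,
    routers: int = 10 ** 7,
    db_bytes: int = 9 * 10 ** 9,
    link_bps: int = 10 ** 9,
    updates_per_day: int = 24,
) -> float:
    """Seconds for all routers to fetch one hourly update."""
    bytes_per_update = db_bytes * daily_change_fraction / updates_per_day
    seconds_per_router = bytes_per_update * 8 / link_bps
    return routers * seconds_per_router / servers


# Rounded because the fractions are floating-point values.
print(round(hourly_update_seconds(0.10, 10_000)))   # 300
print(round(hourly_update_seconds(0.001, 10_000)))  # 3
```

   An entry above 3,600 seconds corresponds to the unsustainable case
   described in the caption, where the next hourly batch would begin
   before the current one completes.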

730	   The amount of change goes to the purpose of LISP.  If its purpose is
731	   to provide effective multihoming support to end customers, then we
732	   might anticipate relatively few changes.  If, on the other hand,
733	   service providers attempt to use LISP to provide some form of traffic
734	   engineering, we can expect the same data to change more often.  We
735	   can probably not conclude much in this regard without additional
736	   operational experience.  The one thing we can say is that different
737	   applications of the LISP protocol may require new and different
738	   distribution mechanisms.  Such optimization is left for another day.

740	5.4.  Security Considerations

742	   Whatever the answer to our previous question, we must consider the
743	   security of the information being transported.  If an attacker can
744	   forge an update or tamper with the database, he can in effect
745	   redirect traffic to end sites.  Hence, integrity and authenticity of
746	   the NERD is critical.  In addition, a means is required to determine
747	   whether a source is authorized to modify a given database.  No data
748	   privacy is required.  Quite to the contrary, this information will be
749	   necessary for any ITR.

751	   The first question one must ask is who to trust to provide the ITR a
752	   mapping.  Ultimately the owner of the EID prefix is most
753	   authoritative for the mapping to RLOCs.  However, were all owners to
754	   sign all such mappings, ITRs would need to know which owner is
755	   authorized to modify which mapping, creating a problem of O(N^2)
756	   complexity.

758	   We can reduce this problem substantially by investing some trust in a
759	   small number of entities that are allowed to sign entries.  If an
760	   authority manages EIDs much the same way a domain name registrar
761	   handles domains, then the owner of the EID would choose a database
762	   authority she or he trusts, and ITRs must trust each such authority
763	   in order to map the EIDs listed by that authority to RLOCs.  This
764	   reduces the amount of management complexity on the ETR to retaining
765	   knowledge of O(#authorities), but does require that each authority
766	   establish procedures for authenticating the owner of an EID.  Those
767	   procedures needn't be the same.

769	   There are two classic methods to ensure integrity of data:

771	   o  secure transport of the source of the data to the consumer, such
772	      as Transport Layer Security (TLS) [RFC4346]; and

774	   o  provide object level security.

776	   These methods are not mutually exclusive, although one can argue
777	   about the need for the former, given the latter.

779	   In the case of TLS, when it is properly implemented, the objects
780	   being transported cannot easily be modified by interlopers or so-
781	   called men in the middle.  When data objects are distributed to
782	   multiple servers, each of those servers must be trusted.  As we have
783	   seen above, we could have quite a large number of servers, thus
784	   providing an attacker a large number of targets.  We conclude that
785	   some form of object level security is required.

787	   Object level security involves an authority signing an object in a
788	   way that can easily be verified by a consumer, in this case a router.
789	   In this case, we would want the mapping table and any incremental
790	   update to be signed by the originator of the update.  This implies
791	   that we cannot simply make use of a tool like CVS [CVS].  Instead,
792	   the originator will want to generate diffs, sign them, and make them
793	   available either directly or through some sort of content
794	   distribution or peer to peer network.

796	5.4.1.  Use of Public Key Infrastructures (PKIs)

798	   X.509 provides a certificate hierarchy that has scaled to the size of
799	   the Internet.  The system is most manageable when there are few
800	   certificates to manage.  The model proposed in this memo makes use of
801	   one current certificate per database authority.  The two pieces of
802	   information necessary to verify a signature, therefore, are as
803	   follows:

805	   o  the certificate of the database authority, which can be provided
806	      along with the database; and

808	   o  the certificate authority's certificate.

810	   These two pieces of information must be very well known and must
811	   be configured on each ITR.  It is expected that both would change
812	   very rarely, and it would not be unreasonable for such updates to
813	   occur as part of a normal OS release process.

815	   The tools for both signing and verifying are readily available.
816	   OpenSSL [1] provides tools and libraries for both signing and
817	   verifying.  Other tools commonly exist.

819	   Use of PKIs is not without implementation and operational complexity
820	   or risk.  The following risks and mitigations are identified with
821	   NERD's use of PKIs:

823	   If a NERD database authority private key is exposed:

825	      In this case an attacker could sign a false database update,
826	      either redirecting traffic, or otherwise causing havoc.  In this
827	      case, the NERD database administrator must revoke its existing key
828	      and issue a new one.  The certificate is added to a certificate
829	      revocation list (CRL), which may be distributed with both this and
830	      other databases, as well as through other channels.  Because this
831	      event is expected to be rare, and the number of database
832	      authorities is expected to be small, a CRL will be small.  When a
833	      router receives a revocation, it checks it against its existing
834	      databases, and attempts to update the one that is revoked.  This
835	      implies that prior to issuing the revocation, the database
836	      authority would sign an update with the new key.  Routers would
837	      discard updates they have already received that were signed after
838	      the revocation was generated.  If a router cannot confirm
839	      whether the authority's certificate was revoked before or after a
840	      particular update, it will retrieve a fresh new copy of the
841	      database with a valid signature.

843	   The private key associated with a CA in the chain of trust of the Authority's certificate is compromised:

845	      In this case, it becomes possible for an attacker to masquerade as
846	      the database authority.  To ameliorate damage, the database
847	      authority revokes its certificate and gets a new certificate
848	      issued from a CA that is not compromised.  Once it has done so,
849	      the previous procedure is followed.  The compromised certificate
850	      can be removed during the normal operating system upgrade cycle.
851	      In the case of the root authority, the situation could be more
852	      serious.  Updates to the OS in the ITR need to be validated prior
853	      to installation.  One possible method of doing this is provided in
854	      [RFC4108].  Because trust anchors are assumed to be updated as
855	      part of an OS update, implementers should consider using a key
856	      other than the trust anchor for validating OS updates.

858	   An algorithm used in either the certificate or the signature is cracked:

860	      This is a catastrophic failure and the above forms of attack
861	      become possible.  The only mitigation is to make use of a new
862	      algorithm.  In theory this should be possible, but in practice has
863	      proved very difficult.  For this reason, additional work is
864	      recommended to make alternative algorithms available.

866	   The Database Authority loses its key or disappears:

868	      In this case nobody can update the existing database.  There are
869	      few programmatic mitigations.  If the database authority places
870	      its private keys and suitable amounts of information in escrow,
871	      then under agreed-upon circumstances, such as no updates for three
872	      days, the escrow agent would release the information to a party
873	      capable of generating a database update.

875	5.4.2.  Other Risks

877	   Because this specification does not require secure transport, if an
878	   attacker prevents updates to an ITR for the purposes of having that
879	   ITR continue to use a compromised ETR, the ITR could continue to use
880	   an old version of the database without realizing a new version has
881	   been made available.  If one is worried about such an attack, a
882	   secure channel such as SSL to a secure chain back to the database
883	   authority should be used.  It is possible that after some operational
884	   experience, later versions of this format will contain additional
885	   semantics to address this attack.  SSL would also prevent attempts
886	   to spoof false database versions on the server.

888	   As discussed above, one substantial risk is a cold start scenario.
889	   If an attacker found a bug in a common operating system that allowed
890	   it to erase an ITR's database, and was able to disseminate that bug,
891	   the collective ability of ITRs to retrieve new copies of the database
892	   could be taxed by collective demand.  The remedy to this is for
893	   devices to share copies of the database with their peers, thus making
894	   each potential requester a potential service.

896	6.  Why not use XML?

898	   Many objects these days are distributed as either XML pages or
899	   something derived from XML [W3C.REC-xml11-20040204], such as SOAP
900	   [W3C.REC-soap12-part1-20070427], [W3C.REC-soap12-part2-20070427].
901	   Use of such well known standards allows for high level tools and
902	   library reuse.  XML's strength is extensibility.  Without a doubt XML
903	   would be more extensible than a fixed field database.  Why not, then,
904	   use these standards in this case?  The greatest concern the author
905	   had was compactness of the data stream.  Inasmuch as this mechanism
906	   is used at all in the future, so long as that concern could be
907	   addressed, and so long as signatures of the database can be verified,
908	   XML probably should be considered.

910	7.  Other Distribution Mechanisms

912	   We now consider various different mechanisms.  The problem of
913	   distributing changes in various databases is as old as databases.
914	   The author is aware of two obvious approaches that have been well
915	   used in the past.  One approach would be the wide distribution of CVS
916	   repositories.  However, for reasons mentioned in Section 5.4, CVS
917	   is insufficient to the task.

919	   The other tried and true approach is the use of periodic updates in
920	   the form of messages.  Good old NNTP [RFC3977] itself provides two
921	   separate mechanisms (one push and another pull) to provide a coherent
922	   update process.  This was in fact used to update molecular biology
923	   databases [gb91] in the early 1990s.  Netnews offers a way to
924	   determine whether articles with specified Article-Ids have been
925	   received.  In the case where the mapping file source of authority
926	   wishes to transmit updates, it can sign a change file and then post
927	   it into the network.  Routers merely need to keep a record of article
928	   ids that they have received.  Years ago, Netnews systems handled far
929	   greater volumes of traffic than we envision.  [2] Initially this is
930	   probably overkill, but it may not be so later in this process.  Some
931	   consideration should be given to a mechanism known to widely
932	   distribute vast amounts of data, as instantaneously as either the
933	   sender or the receiver wishes.

935	   To attain an additional level of hierarchy in the distribution
936	   network, service providers could retrieve information to their own
937	   local servers, and configure their routers with the host portion of
938	   the above URI.

940	   Another possibility would be for providers to establish an agreement
941	   on a small set of anycast addresses for use for this purpose.  There
942	   are limitations to the use of anycast, particularly with TCP.  In the
943	   midst of a routing flap, an anycast address can become all but
944	   unusable.  Careful study of such a use as well as appropriate use of
945	   HTTP redirects is expected.

947	7.1.  What About DNS as a mapping retrieval model?

949	   It has been proposed that a query/response mechanism be used for this
950	   information, and that specifically the domain name system (DNS)
951	   [RFC1034] be used.  The previous models do not preclude the DNS.  DNS
952	   has the advantage that the administrative lines are well drawn, and
953	   that the ID/RLOC mapping is likely to appear very close to these
954	   boundaries.  DNS also has the added benefit that an entire
955	   distribution infrastructure already exists.  There are, however, some
956	   problems that could impact end hosts when intermediate routers make
957	   queries, some of which were first pointed out in [RFC1383]:

959	   o  Any query mechanism offers an opportunity for a resource attack if
960	      an attacker can force the ITR to query for information.  In this
961	      case, all that would be necessary would be for a "botnet" (a group
962	      of computers that have been compromised and used as vehicles to
963	      attack others) to ping or otherwise contact via some normal
964	      service hosts that sit behind the ETR.  If the botnet hosts
965	      themselves are behind ETRs, the victim's ITR will need to query
966	      for each and every one of them, thus becoming part of a classic
967	      reflector attack.

969	   o  Packets will be delayed at the very least, and probably dropped in
970	      the process of a mapping query.  This could be at the beginning of
971	      a communication, but it will be impossible for a router to
972	      conclude with certainty that this is the case.

974	   o  The DNS has a backoff algorithm that presumes that applications
975	      are making queries prior to the beginning of a communication.
976	      This is appropriate for end hosts who know in fact when a
977	      communication begins.  An end user may not enjoy a router waiting
978	      seconds for a retry.

980	   o  While the administrative lines may appear to be correct, the
981	      location of name servers may not be.  If name servers sit within
982	      PI address space, thus requiring LISP to reach, a circular
983	      dependency is created.  This is precisely where many enterprise
984	      name servers sit.  The LISP experiment should not predicate its
985	      success on relocation of such name servers.

987	   Nevertheless, DNS may be able to play a role in providing the
988	   enterprise control over the mapping of its EIDs to RLOCs.  Posit a
989	   new DNS record "EID2RLOC".  This record is used by the authority to
990	   collect and aggregate mapping information so that it may be
991	   distributed through one of the other mechanisms.  As an example:

993	      $ORIGIN 0.10.PI-SPACE.
994	       128   EID2RLOC   mask 23 priority 10 weight 5 172.16.5.60
995	             EID2RLOC   mask 23 priority 15 weight 5 192.168.1.5

997	   In the above figure, network 10.0.128/23 would be delegated to some
998	   end system, say EXAMPLE.COM.  They would manage the above zone
999	   information.  This would allow a DNS mechanism to work, but it would
1000	   also allow someone to aggregate the information and distribute a
1001	   table.

1003	7.2.  Use of BGP and LISP+ALT

1005	   Border Gateway Protocol (BGP) [RFC4271] is currently used to
1006	   distribute inter-domain routing throughout the Internet.  Why not,
1007	   then, use BGP to distribute mapping entries, or provide a rendezvous
1008	   mechanism to initialize mapping entries?  In fact this is precisely
1009	   what LISP+ALT [I-D.ietf-lisp-alt] accomplishes, using a completely
1010	   separate topology from the normal DFZ.  It does so using existing
1011	   code paths and expertise.  The alternate topology also provides an
1012	   extremely accurate control path from ITRs to ETRs, whereas NERD's
1013	   operational model requires an optimistic assumption and control plane
1014	   functionality to cycle through unresponsive ETRs in an EID prefix's
1015	   mapping entry.  The memory scaling characteristics of LISP+ALT are
1016	   extremely attractive because of expected strong aggregation, whereas
1017	   NERD makes almost no attempt at aggregation.

1019	   A number of key deployment issues are left open.  The principal issue
1020	   is whether it is deemed acceptable for routers to drop packets
1021	   occasionally while mapping information is being gathered.  This
1022	   should be the subject of future research for ALT, as it was a key
1023	   design goal of NERD to avoid such a situation.

1025	7.3.  Perhaps use a hybrid model?

1027	   Perhaps it would be useful to use both a prepopulated database such
1028	   as NERD and a query mechanism (perhaps LISP+ALT, LISP-CONS [I-D
1029	   .meyer-lisp-cons], or DNS) to determine an EID/RLOC mapping.  One
1030	   idea would be to receive a subset of the mappings, say, by taking
1031	   only the NERD for certain regions.  This alleviates the need to drop
1032	   packets for some subset of destinations under the assumption that
1033	   one's business is localized to a particular region.  If one did not
1034	   have a local entry for a particular EID one would then make a query.
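
   A hybrid lookup of this kind might be sketched as follows in Python.
   Everything here is hypothetical: the table layout, the function
   names, and the fallback callable, which stands in for whichever
   query mechanism (LISP+ALT, LISP-CONS, or DNS) is deployed.  The
   linear scan is for clarity only; a real ITR would use a trie or
   similar structure.

```python
import ipaddress
from typing import Callable, Dict, List


def lookup_rlocs(
    eid: str,
    local_table: Dict[str, List[str]],
    query: Callable[[str], List[str]],
) -> List[str]:
    """Longest-prefix match in the local table, else ask the query system."""
    addr = ipaddress.ip_address(eid)
    best_len, best_rlocs = -1, None
    for prefix, rlocs in local_table.items():
        net = ipaddress.ip_network(prefix)
        if net.version == addr.version and addr in net:
            if net.prefixlen > best_len:
                best_len, best_rlocs = net.prefixlen, rlocs
    if best_rlocs is not None:
        return best_rlocs
    return query(eid)   # no local entry: make a query


regional_nerd = {
    "2001:db8:1::/48": ["192.0.2.1"],
    "2001:db8:1:2::/64": ["198.51.100.7"],
}
print(lookup_rlocs("2001:db8:1:2::5", regional_nerd, lambda e: []))
# prints ['198.51.100.7']
```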

1036	   One approach to using DNS to query live would be to periodically walk
1037	   "interesting" portions of the network, in search of relevant
1038	   records, and cache them to non-volatile storage.  While preventing
1039	   resource attacks, the walk itself could be viewed as an attack, if
1040	   the algorithm was not selective enough about what it thought was
1041	   interesting.  A similar approach could be applied to LISP+ALT or
1042	   LISP-CONS by forcing a data-driven Map Reply for certain sites.

1044	8.  Deployment Issues

1046	   While LISP and NERD are intended as experiments at this point, it is
1047	   already obvious one must give serious consideration to circular
1048	   dependencies with regard to the protocols used and the elements
1049	   within them.

1051	8.1.  HTTP

1053	   Inasmuch as HTTP depends on DNS, either due to the authority
1054	   section of a URI, or due to the configured base distribution URI,
1055	   these same concerns apply.  In addition, any HTTP server that itself
1056	   makes use of provider independent addresses would be a poor choice to
1057	   distribute the database for these exact same reasons.

1059	   One issue with using HTTP is that it is possible that a middlebox of
1060	   some form, such as a cache, may intercept and process requests.  In
1061	   some cases this might be a good thing.  For instance, if a cache
1062	   correctly returns a database, some amount of bandwidth is conserved.
1063	   On the other hand, if the cache itself fails to function properly for
1064	   whatever reason, end to end connectivity could be impaired.  For
1065	   example, if the cache itself depended on the mapping being in place
1066	   and functional, a cold start scenario might leave the cache
1067	   functioning improperly, in turn providing routers no means to update
1068	   their databases.  Some care must be given to avoid such
1069	   circumstances.

1071	9.  Open Questions
1072	   Do we need to discuss reachability in more detail?  This was clearly
1073	   an issue at the IST-RING workshop.  There are two key issues.  First,
1074	   what is the appropriate architectural separation between the data
1075	   plane and the control plane?  Second, is there some specific way in
1076	   which NERD impacts the data plane?

1078	   Should we specify a (perhaps compressed) tarball that treads a middle
1079	   ground for the last question, where each update tarball contains a
1080	   signature both for the update and for the entire database, once the
1081	   update is applied?

1083	   Should we compress?  In some initial testing of databases with 1, 5,
1084	   and 10 million IPv4 EIDs and a random distribution of IPv4 RLOCs, the
1085	   current format in this document compresses down by a factor of
1086	   between 35% and 36%, using the Burrows-Wheeler block-sorting text
1087	   compression algorithm (bzip2).  The NERD used random EIDs with prefix
1088	   lengths varying from 19 to 29, with probability weighted toward the
1089	   smaller masks.  This only very roughly reflects reality.  A better
1090	   test would be to start with the existing prefixes found in the DFZ.

1092	10.  Conclusions

1094	   This memo has specified a database format, an update format, a URI
1095	   convention, an update method, and a validation method for EID/RLOC
1096	   mappings.  We have shown that beyond the predictions of 10^8 EID-
1097	   prefix entries, the aggregate database size would likely be at most
1098	   17GB.  We have considered the number of servers needed to distribute
1099	   that information, and we have demonstrated the limitations of a
1100	   simple content distribution network and other well-known mechanisms.
1101	   The effort required to retrieve a database change amounts to between
1102	   3 and 30 seconds of processing time per hour at today's gigabit
1103	   speeds.  We conclude that there is no need for an off-box query
1104	   mechanism today, and that there are distinct disadvantages to having
1105	   such a mechanism in the control plane.

1107	   Beyond this we have examined alternatives that allow for hybrid
1108	   models that do use query mechanisms, should our operating assumptions
1109	   prove overly optimistic.  Use of NERD today does not foreclose use of
1110	   such models in the future, and in fact both models can happily co-
1111	   exist.

1113	   Since the first draft of this document in 2007, portions of this work
1114	   have been implemented.  Future work should consider the size of
1115	   fields, such as the version field, as well as key roll-over and
1116	   revocation issues.  As previously noted, CMS is now widely deployed.
1117	   Current work on DNS-based Authentication of Named Entities
1118	   [I-D.ietf-dane-protocol] may provide a means to test authorization
1119	   of a NERD provider to carry a specific prefix.

1121	   We leave to future work how the list of databases is distributed, how
1122	   BGP can play a role in distributing knowledge of the databases, and
1123	   how DNS can play a role in aggregating information into these
1124	   databases.

1126	   We also leave to future work whether HTTP is the best protocol for
1127	   the job, and whether the scheme described in this document is the
1128	   most efficient.  One could easily envision that when applied in high
1129	   delay or high loss environments, a broadcast or multicast method may
1130	   prove more effective.

1132	   Speaking of multicast, we also leave to future work how multicast is
1133	   implemented, if at all, either in conjunction with or as an
1134	   extension to this model.

1136	   Finally, perhaps the most interesting future work would be to
1137	   understand if and how NERD could be integrated with the LISP mapping
1138	   server [I-D.ietf-lisp-ms].

1140	11.  IANA Considerations

1142	   This memo makes no requests of IANA.

1144	12.  Acknowledgments

1146	   Dino Farinacci, Patrik Faltstrom, Dave Meyer, Joel Halpern, Jim
1147	   Schaad, Dave Thaler, Mohamed Boucadair, Robin Whittle, Max Pritikin,
1148	   Scott Brim, S. Moonesamy, and Stephen Farrell were very helpful with
1149	   their reviews of this work.  Thanks also to the participants of the
1150	   Routing Research Group and the IST-RING workshop held in Madrid in
1151	   December of 2007 for their incisive comments.  The astute will notice
1152	   a lengthy References section.  This work stands on the shoulders of
1153	   many others' efforts.

1155	13.  References

1157	13.1.  Normative References

1159	   [I-D.ietf-lisp]
1160	              Farinacci, D., Fuller, V., Meyer, D. and D. Lewis,
1161	              "Locator/ID Separation Protocol (LISP)", Internet-Draft
1162	              draft-ietf-lisp-22, February 2012.

1164	   [ITU.X509.2000]
1165	              International Telecommunications Union, "Information
1166	              technology - Open Systems Interconnection - The Directory:
1167	              Public-key and attribute certificate frameworks", ITU-T
1168	              Recommendation X.509, ISO Standard 9594-8, March 2000.

1170	   [RFC3986]  Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform
1171	              Resource Identifier (URI): Generic Syntax", STD 66, RFC
1172	              3986, January 2005.

1174	   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
1175	              Specifications: ABNF", STD 68, RFC 5234, January 2008.

1177	   [RFC6125]  Saint-Andre, P. and J. Hodges, "Representation and
1178	              Verification of Domain-Based Application Service Identity
1179	              within Internet Public Key Infrastructure Using X.509
1180	              (PKIX) Certificates in the Context of Transport Layer
1181	              Security (TLS)", RFC 6125, March 2011.

1183	13.2.  Informative References

1185	   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
1186	              Masinter, L., Leach, P. and T. Berners-Lee, "Hypertext
1187	              Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

1189	   [RFC2315]  Kaliski, B., "PKCS #7: Cryptographic Message Syntax
1190	              Version 1.5", RFC 2315, March 1998.

1192	   [RFC5652]  Housley, R., "Cryptographic Message Syntax (CMS)", RFC
1193	              5652, September 2009.

1195	   [RFC3977]  Feather, C., "Network News Transfer Protocol (NNTP)", RFC
1196	              3977, October 2006.

1198	   [RFC1034]  Mockapetris, P., "Domain names - concepts and facilities",
1199	              STD 13, RFC 1034, November 1987.

1201	   [RFC1383]  Huitema, C., "An Experiment in DNS Based IP Routing", RFC
1202	              1383, December 1992.

1204	   [RFC4271]  Rekhter, Y., Li, T. and S. Hares, "A Border Gateway
1205	              Protocol 4 (BGP-4)", RFC 4271, January 2006.

1207	   [RFC4108]  Housley, R., "Using Cryptographic Message Syntax (CMS) to
1208	              Protect Firmware Packages", RFC 4108, August 2005.

1210	   [RFC4346]  Dierks, T. and E. Rescorla, "The Transport Layer Security
1211	              (TLS) Protocol Version 1.1", RFC 4346, April 2006.

1213	   [CARP07]   Carpenter, B. R., "IETF Plenary Presentation: Routing and
1214	              Addressing: Where we are today", March 2007.

1216	   [CVS]      Grune, R., Baalbergen, E., Waage, M., Berliner, B. and J.
1217	              Polk, "CVS: Concurrent Versions System", November 1985.

1219	   [gb91]     Smith, R.H., Gottesman, Y., Hobbs, B., Lear, E.,
1220	              Kristofferson, D., Benton, D. and P.R. Smith, "A mechanism
1221	              for maintaining an up-to-date GenBank database via
1222	              Usenet", CABIOS, April 1991.

1224	   [W3C.REC-xml11-20040204]
1225	              Paoli, J., Maler, E., Yergeau, F., Cowan, J., Bray, T. and
1226	              C. Sperberg-McQueen, "Extensible Markup Language (XML)
1227	              1.1", World Wide Web Consortium First Edition REC-
1228	              xml11-20040204, February 2004, <http://www.w3.org/TR/2004/
1229	              REC-xml11-20040204>.

1231	   [W3C.REC-soap12-part1-20070427]
1232	              Hadley, M., Mendelsohn, N., Moreau, J., Karmarkar, A.,
1233	              Nielsen, H., Lafon, Y. and M. Gudgin, "SOAP Version 1.2
1234	              Part 1: Messaging Framework (Second Edition)", World Wide
1235	              Web Consortium Recommendation REC-soap12-part1-20070427,
1236	              April 2007, <http://www.w3.org/TR/2007/REC-
1237	              soap12-part1-20070427>.

1239	   [W3C.REC-soap12-part2-20070427]
1240	              Mendelsohn, N., Gudgin, M., Nielsen, H., Lafon, Y.,
1241	              Moreau, J., Hadley, M. and A. Karmarkar, "SOAP Version 1.2
1242	              Part 2: Adjuncts (Second Edition)", World Wide Web
1243	              Consortium Recommendation REC-soap12-part2-20070427, April
1244	              2007, <http://www.w3.org/TR/2007/REC-
1245	              soap12-part2-20070427>.

1247	   [I-D.ietf-lisp-alt]
1248	              Fuller, V., Farinacci, D., Meyer, D. and D. Lewis, "LISP
1249	              Alternative Topology (LISP+ALT)", Internet-Draft draft-
1250	              ietf-lisp-alt-10, December 2011.

1252	   [I-D.ietf-dane-protocol]
1253	              Hoffman, P. and J. Schlyter, "The DNS-Based Authentication
1254	              of Named Entities (DANE) Protocol for Transport Layer
1255	              Security (TLS)", Internet-Draft draft-ietf-dane-
1256	              protocol-19, April 2012.

1258	   [I-D.meyer-lisp-cons]
1259	              Brim, S., "LISP-CONS: A Content distribution Overlay
1260	              Network Service for LISP", Internet-Draft draft-meyer-
1261	              lisp-cons-04, April 2008.

1263	   [I-D.ietf-lisp-ms]
1264	              Fuller, V. and D. Farinacci, "LISP Map Server Interface",
1265	              Internet-Draft draft-ietf-lisp-ms-15, January 2012.

1267	Appendix A.  Generating and verifying the database signature with
1268	             OpenSSL

1270	   As previously mentioned, one goal of NERD was to use off-the-shelf
1271	   tools to both generate and retrieve the database.  To many, PKI is
1272	   magic.  This section is meant to provide at least some clarification
1273	   as to both the generation and verification process, complete with
1274	   command line examples.  Not included is how you get the entries
1275	   themselves.  We'll assume they exist, and that you're just trying to
1276	   sign the database.

1278	   To sign the database, you first need a database file that has the
1279	   database header described in Section 3.  The PKCS#7 block size field
1280	   should be zero, and there should be no PKCS#7 block at this point.
1281	   You also need a certificate and its private key with which you will
1282	   sign the database.

1284	   The OpenSSL "smime" command contains all the functions we need from
1285	   this point forth.  To sign the database, issue the following command:

1287	         openssl smime -binary -sign -outform DER -signer yourcert.crt \
1288	                 -inkey yourcert.key -in database-file -out signature

1290	   -binary states that no MIME canonicalization should be performed.
1291	   -sign indicates that you are signing the file that was given as the
1292	   argument to -in.  The output format (-outform) is binary DER, and
1293	   your public certificate is provided with -signer along with your key
1294	   with -inkey.  The signature is written to the file named by -out.

1296	   The resulting file "signature" is then copied into the PKCS#7 block
1297	   in the database header, its size in bytes is recorded in the PKCS#7
1298	   block size field, and the resulting file is ready for distribution to
1299	   ITRs.

1301	   To verify a database file, first retrieve the PKCS#7 block from the
1302	   file by copying the appropriate number of bytes into another file,
1303	   say "signature".  Next, zero this field and set the block size field
1304	   to 0.  Then use the "smime" command to verify the signature as
1305	   follows:

1307	       openssl smime -binary -verify -inform DER \
1308	               -content database-file -out /dev/null -in signature

1310	   OpenSSL will report "Verification successful" if the signature is
1311	   correct.  OpenSSL provides sufficiently rich libraries to accomplish
1312	   the above within the C programming language with a single pass.
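
	   The signing and verification steps above can be exercised end to
	   end with a throwaway key.  The following is only a sketch under
	   stated assumptions: the certificate subject, the self-signed test
	   certificate, the placeholder file contents, and the use of -CAfile
	   to trust that certificate are illustrative choices that do not
	   come from this memo, and the stand-in file does not follow the
	   database header layout of Section 3.

```shell
#!/bin/sh
# Round-trip sketch: sign a stand-in database file, then verify it.
# All file names and the certificate subject are illustrative only.
set -e
work=$(mktemp -d)
cd "$work"

# Stand-in for a NERD database with no PKCS#7 block and a zero block
# size field (assumption: real files carry the Section 3 header).
printf 'placeholder database contents\n' > database-file

# Throwaway self-signed certificate and key, for testing only.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
        -subj "/CN=nerd.example.com" \
        -keyout yourcert.key -out yourcert.crt 2>/dev/null

# Sign: produces a detached, DER-encoded signature.
openssl smime -binary -sign -outform DER -signer yourcert.crt \
        -inkey yourcert.key -in database-file -out signature

# Verify the unmodified content, trusting only the test certificate.
if openssl smime -binary -verify -inform DER \
        -content database-file -CAfile yourcert.crt \
        -out /dev/null -in signature 2>/dev/null
then
    good=ok
else
    good=failed
fi

# A modified database should not verify against the same signature.
printf 'tampered\n' >> database-file
if openssl smime -binary -verify -inform DER \
        -content database-file -CAfile yourcert.crt \
        -out /dev/null -in signature 2>/dev/null
then
    tampered=ok
else
    tampered=failed
fi

echo "original: $good, tampered: $tampered"
```

	   If the tools behave as described above, the unmodified file
	   verifies and the altered one does not.  A deployment would
	   validate the signer against its configured trust anchors rather
	   than a self-signed test certificate.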

1314	Appendix B.  Changes

1316	   This section to be removed prior to publication.

1318	   o  06-08: editorial.  Clarify sending diffs.

1320	   o  05: Fix normative/informative references.  Wordsmithing.

1322	   o  04: Analysis change: IPv6 RLOCs are 128 bits.  While they can be
1323	      shortened to 64 bits, that involves substantial ETR changes and
1324	      expenditure of IPv6 networks, which is probably unnecessary, and
1325	      can be left as a later optimization.  Added an option of
1326	      independent operators.  Processed all but two of Dino's comments.
1327	      Addressed Scott's comments.  Removed existing work analysis.
1328	      Saving that for another day.  Clarified OpenSSL Appendix.

1330	   o  05: clean DOWN.  reinsert some text for historical purposes.

1332	   o  04: cleanup

1334	   o  03: Change dbname to a domain name, indicate that is what is in
1335	      the subject of the X.509 certificate, and list editorial changes,
1336	      update acknowledgments.

1338	   o  02: Incorporate some of Dave Thaler's comments.  Add
1339	      authentication block detail.  Modify analysis to take IPv6 into
1340	      account, along with a more realistic number of RLOCs per EID.  Add
1341	      some comments about potential risks of a cold start.  Add S/MIME
1342	      example as appendix A and take out old ToDo.  Provide some amount
1343	      of compression of IPv6 addresses by limiting their size to
1344	      significant bytes rounded to a four byte word boundary.

1346	   o  01: Massive spelling correction, URI example correction.

1348	   o  00: Initial Revision.

1350	Author's Address

1352	   Eliot Lear
1353	   Cisco Systems GmbH
1354	   Richtistrasse 7
1355	   Wallisellen, CH-8304
1356	   Switzerland

1358	   Phone: +41 44 878 9200
1359	   Email: lear@cisco.com