Network Working Group                                            E. Lear
Internet-Draft                                        Cisco Systems GmbH
Intended status: Experimental                              March 6, 2010
Expires: September 7, 2010


               NERD: A Not-so-novel EID to RLOC Database
                      draft-lear-lisp-nerd-08.txt

Abstract

   LISP is a protocol to encapsulate IP packets in order to allow end
   sites to multihome without injecting routes from one end of the
   Internet to another.  This memo presents an experimental database and
   a discussion of methods to transport the mapping of EIDs to RLOCs to
   routers in a reliable, scalable, and secure manner.  Our analysis
   concludes that transport of all EID/RLOC mappings scales well to
   at least 10^8 entries.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 7, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the BSD License.

Table of Contents

   1.  Introduction
     1.1.  Base Assumptions
     1.2.  What is NERD?
     1.3.  Glossary
   2.  Theory of Operation
     2.1.  Database Updates
     2.2.  Communications between ITR and ETR
     2.3.  Who are database authorities?
   3.  NERD Format
     3.1.  NERD Record Format
     3.2.  Database Update Format
   4.  NERD Distribution Mechanism
     4.1.  Initial Bootstrap
     4.2.  Retrieving Changes
   5.  Analysis
     5.1.  Database Size
     5.2.  Router Throughput Versus Time
     5.3.  Number of Servers Required
     5.4.  Security Considerations
       5.4.1.  Use of Public Key Infrastructures (PKIs)
       5.4.2.  Other Risks
   6.  Why not use XML?
   7.  Other Distribution Mechanisms
     7.1.  What About DNS as a retrieval model?
     7.2.  Use of BGP and LISP+ALT
     7.3.  Perhaps use a hybrid model?
   8.  Deployment Issues
     8.1.  HTTP
   9.  Open Questions
   10. Conclusions
   11. IANA Considerations
   12. Acknowledgments
   13. References
     13.1.  Normative References
     13.2.  Informative References
   Appendix A.  Generating and verifying the database signature
                with OpenSSL
   Appendix B.  Changes
   Author's Address

1.  Introduction

   Locator/ID Separation Protocol (LISP) [I-D.ietf-lisp] separates an IP
   address used by a host and local routing system from the locators
   advertised by BGP participants on the Internet in general, and in the
   default-free zone (DFZ) in particular.  It accomplishes this by
   establishing a mapping between globally unique endpoint identifiers
   (EIDs) and routing locators (RLOCs).  This reduces the amount of
   state change that occurs on routers within the default-free zone on
   the Internet, while enabling end sites to be multihomed.

   In some mapping distribution approaches to LISP the mapping is
   learned via data-triggered control messages between ingress tunnel
   routers (ITRs) and egress tunnel routers (ETRs) through an alternate
   routing topology [I-D.ietf-lisp-alt].  In other approaches to LISP,
   the mapping from EIDs to RLOCs is instead learned through some other
   means.  This memo addresses different approaches to the problem, and
   specifies a Not-so-novel EID to RLOC Database (NERD) and methods both
   to receive the database and to receive updates.

   NERD is offered primarily as a way to avoid dropping packets, the
   underlying assumption being that dropping packets is bad for
   applications and end users.  Those who do not agree with this
   underlying assumption may find that other approaches make more sense.

   NERD is specified in such a way that the methods used to distribute
   or retrieve it may vary over time.  Multiple databases are supported
   in order to allow for multiple data sources.  An effort has been made
   to divorce the database from access methods so that both can evolve
   independently through experimentation and operational validation.

1.1.  Base Assumptions

   In order to specify a mapping it is important to understand how it
   will be used, and the nature of the data being mapped.
   In the case of LISP, the following assumptions are pertinent:

   o  The data contained within the mapping changes only on provisioning
      or configuration operations, and is not intended to change when a
      link either fails or is restored.  Some other mechanism such as
      the use of LISP Reachability Bits with mapping replies handles
      healing operations, particularly when a tail circuit within a
      service provider's aggregate goes down.  NERD can be used as a
      verification method to ensure that whatever operational mapping
      changes an ITR receives are authorized.
   o  While weight and priority are defined, these are not hop-by-hop
      metrics.  Hence the information contained within the mapping does
      not change based on where one sits within the topology.
   o  Because a purpose of LISP is to reduce control plane overhead by
      reducing "rate X state" complexity, updates to the mapping will be
      relatively rare.
   o  Because NERD is designed to ease inter-domain routing, its use is
      intended within the inter-domain environment.  That is, NERD is
      best implemented at either the customer edge or provider edge, and
      there will be on the order of as many ITRs and EID Prefixes as
      there are connections to Internet Service Providers by end
      customers.
   o  As such, NERD cannot be the sole means to implement host mobility,
      although NERD may be used in conjunction with other mechanisms.

1.2.  What is NERD?

   NERD is a Not-so-novel EID to RLOC Database.  It consists of the
   following components:

   1.  a network database format;
   2.  a change distribution format;
   3.  a database retrieval/bootstrapping method;
   4.  a change distribution method.

   The network database format is compressible.  However, at this time
   we specify no compression method.  NERD will make use of potentially
   several transport methods, but most notably HTTP [RFC2616].  HTTP has
   restart and compression capabilities.
   It is also widely deployed.

   There exist many methods to show differences between two versions of
   a database or a file, UNIX's "diff" being the classic example.  In
   this case, because the data is well structured and easily keyed, we
   can make use of a very simple format for version differences that
   simply provides a list of EID/RLOC mappings that have changed using
   the same record format as the database, and a list of EIDs that are
   to be removed.

1.3.  Glossary

   The reader is referred to [I-D.ietf-lisp] for a general glossary of
   terms related to LISP.  The following terms are specific to this
   memo.

   Base Distribution URI:  An Absolute-URI as defined in Section 4.3 of
      [RFC3986] from which other references are relative.  The base
      distribution URI is used to construct a URI to an EID/RLOC mapping
      database.  If more than one NERD is known then there will be one
      or more base distribution URIs associated with each (although each
      such base distribution URI may have the same value).

   EID Database Authority:  The authority that will sign database files
      and updates.  It is the source of both.

   The Authority:  Shorthand for the EID Database Authority.

   NERD:  (N)ot-so-novel (E)ID to (R)LOC (D)atabase.

   AFI:  Address Family Identifier.

   Pull Model:  An architecture where clients pull only the information
      they need at any given time, such as when a packet arrives for
      forwarding.

   Push Model:  An architecture in which clients receive an entire
      dataset, containing data they may or may not require, such as
      mappings for EIDs that no served host is attempting to send to.

   Hybrid Model:  An architecture in which some information is pushed
      toward the receiver from a source and some information is pulled
      by the receiver.

2.  Theory of Operation

   Operational functions are split into two components: database updates
   and state exchange between ITR and ETR during a communication.

2.1.  Database Updates

   What follows is a summary of how NERDs are generated and updated.
   Specifics can be found in Section 3.  The general way in which NERD
   works is as follows:

   1.  A NERD is generated by an authority that allocates provider
       independent (PI) addresses (e.g., IANA or an RIR) which are used
       by sites as EIDs.  As part of this process the authority
       generates a digest for the database and signs it with a private
       key whose public key is part of an X.509 certificate
       [ITU.X509.2000].  That signature along with a copy of the
       authority's public key is included in the NERD.
   2.  The NERD is distributed to a group of well known servers.
   3.  ITRs retrieve an initial copy of the NERD via HTTP when they come
       into service.
   4.  ITRs are preconfigured with a group of certificates whose private
       keys are used by database authorities to sign the NERD.  This
       list of certificates should be configurable by administrators.
   5.  ITRs next verify both the validity of the public key and the
       signed digest.  If either fails validation, the ITR attempts to
       retrieve the NERD from a different source.  The process iterates
       until either a valid database is found or the list of sources is
       exhausted.
   6.  Once a valid NERD is retrieved, the ITR installs it into both
       non-volatile and local memory.
   7.  At some point the authority updates the NERD and increments the
       database version counter.  At the same time it generates a list
       of changes, which it also signs, as it does with the original
       database.
   8.  Periodically ITRs will poll from their list of servers to
       determine if a new version of the database exists.
       When a new version is found, an ITR will attempt to retrieve a
       change file, using its list of preconfigured servers.
   9.  The ITR validates a change file just as it does the original
       database.  Assuming the change file passes validation, the ITR
       installs new entries, overwrites existing ones, and removes empty
       entries, based on the content of the change file.

   As time goes on it is quite possible that an ITR may probe a list of
   configured peers for a database or change file copy.  It is equally
   possible that peers might advertise to each other the version number
   of their database.  Such methods are not explored in depth in this
   memo, but are mentioned for future consideration.

2.2.  Communications between ITR and ETR

   [I-D.ietf-lisp] describes the basic approach to what happens when a
   packet arrives at an ITR, and what communications between ITR and ETR
   take place.  NERD provides an optimistic approach to establishing
   communications with an ETR that is responsible for a given EID
   prefix.  State must be kept, however, on an ITR to determine whether
   that ETR is in fact reachable.  It is expected that this is a common
   requirement across LISP mapping systems, and will be handled in the
   core LISP architecture.

2.3.  Who are database authorities?

   This memo does not specify who the database authority is.  That is
   because there are several possible operational models.  In each case
   the number of database authorities is meant to be small so that ITRs
   need only keep a small list of authorities, similar to the way a name
   server might cache a list of root servers.

   o  A single database authority exists.  In this case all entries in
      the database are registered to a single entity, and that entity
      distributes the database.
      Because the EID space is provider independent address space,
      there is no architectural requirement that address space be
      hierarchically distributed to anyone, as there is with
      provider-assigned address space.  Hence, there is a natural
      affinity between the IANA function and the database authority
      function.
   o  Each region runs a database authority.  In this case, provider
      independent address space is allocated to either Regional Internet
      Registries (RIRs) or to affiliates of such organizations of
      network operations guilds (NOGs).  The benefit of this approach is
      that there is no single organization that controls the database.
      It allows one database authority to back up another.  One could
      envision as many as ten database authorities in this scenario.
      One drawback to this approach, however, is that any reference to a
      region imposes a notion of locality, thus potentially diminishing
      the split between locator and identifier.
   o  Each country runs a database authority.  This could occur should
      countries decide to regulate this function.  While limiting the
      scope of any single database authority as the previous scenario
      describes, this approach would introduce some overhead as the list
      of database authorities would grow to as many as 200, and possibly
      more if jurisdictions within countries attempted to regulate the
      function.  There are two drawbacks to this approach.  First, as
      distribution of EIDs is driven to more local jurisdictions, an EID
      prefix is tied even more tightly to a location.  Second, a large
      number of database authorities will demand some sort of discovery
      mechanism.
   o  Independent operators manage database authorities.  This has the
      appeal of being location independent and of enabling competition
      for good performance.  This method has the drawback of potentially
      requiring a discovery mechanism.

   The latter two approaches are not mutually exclusive.
   While this specification allows for multiple databases, discovery
   mechanisms are left as future work.

3.  NERD Format

   The NERD consists of a header that contains a database version and a
   signature that is generated by ignoring the signature field and
   setting the authentication block length to 0 (NULL).  The
   authentication block itself consists of a signature and a certificate
   whose private key counterpart was used to generate the signature.

   Records are kept sorted in numeric order with AFI plus EID as primary
   key and prefix length as secondary.  This is so that after a database
   update it should be possible to reconstruct the database to verify
   the digest signature, which may be retrieved separately from the
   database for verification purposes.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Schema Vers=1 |    DB Code    |      Database Name Size       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Database Version                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Old Database Version or 0                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                         Database Name                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       PKCS#7 Block Size       |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |       PKCS#7 Block containing Certificate and Signature       |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                            Database Header

   The DB Code is 0 if what follows is an entire database or 1 if what
   follows is an update.  The database file version is incremented
   each time the complete database is generated by the authority.
   In the case of an update, the database file version indicates the new
   database file version, and the old database file version is indicated
   in the "Old Database Version" field.  The database file version is
   used by routers to determine whether or not they have the most
   current database.

   The database name is an ASCII-encoded domain name, as specified in
   [RFC5321].  This is the name that will appear in the Subject field of
   the certificate used to verify the database.  The purpose of the
   database name is to allow for more than one database.  Such databases
   would be merged by the router.  It is important that an EID/RLOC
   mapping be listed in no more than one database, lest inconsistencies
   arise.  However, it may be possible to transition a mapping from one
   database to another.  During the transition period, the mappings
   would be identical.  When they are not, the resultant behavior will
   be undefined.  The database name is padded with NULLs to the next
   four-byte boundary.

   The PKCS#7 [RFC2315] authentication block contains a DER-encoded
   [ITU.X509.2000] signature and associated public key.  For purposes of
   this experiment all implementations will support the RSA encryption
   signature algorithm and SHA1 digest algorithm, and the standard
   attributes are expected to be present.

   N.B., it has been suggested that Cryptographic Message Syntax (CMS)
   [RFC5652] be used instead of PKCS#7.  At the time of this writing,
   CMS is not yet widely deployed.  However, it is certainly the correct
   direction, and should be strongly considered should NERD be
   standardized.

3.1.  NERD Record Format

   As distributed over the network, NERD records appear as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Num. RLOCs   | EID Pref. Len |            EID AFI            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     End point identifier                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Priority 1   |   Weight 1    |             AFI 1             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Routing Locator 1                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Priority 2   |   Weight 2    |             AFI 2             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Routing Locator 2                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Priority 3   |   Weight 3    |             AFI 3             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Routing Locator 3...                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   EID AFI is the AFI of the EID.  Priority N, Weight N, and AFI N
   are associated with Routing Locator N.  There will always be at least
   one routing locator.  The minimum record size for IPv4 is 16 bytes.

   Each additional IPv4 RLOC increases the record size by 8 bytes.  The
   purpose of this format is to keep the database compact, but somewhat
   easily read.  The meanings of weight and priority are described in
   [I-D.ietf-lisp].  The format of the AFI is specified by IANA as
   "Address Family Numbers", with the exception of how IPv6 EID prefixes
   are stored.

   In order to reduce storage and transmission amounts for IPv6, only
   the necessary number of bytes of an EID as specified by the prefix
   length are kept in the record, rounded up to the next four-byte
   (word) boundary.  For instance, if the prefix length is /49, seven
   bytes are needed to hold the prefix, so rounding up to a four-byte
   word boundary requires that eight bytes be stored.  IPv6 RLOCs are
   represented as normal 128-bit IPv6 addresses.

3.2.  Database Update Format

   A database update contains a set of changes to an existing database.
   Each AFI/EID/mask-length tuple may have zero or more RLOCs associated
   with it.
   In the case where there are no RLOCs, the EID entry is
   removed from the database.  Records that contain EIDs and prefix
   lengths that were not previously listed are simply added.  Otherwise,
   the old record for the EID and prefix length is replaced by the more
   current information.  The record format used by a database update
   is the same as described in Section 3.1.

4.  NERD Distribution Mechanism

4.1.  Initial Bootstrap

   Bootstrap occurs when a router needs to retrieve the entire database.
   It knows it needs to retrieve the entire database because either it
   has none or the pending update is too substantial to process, as
   might be the case if a router has been out of service for a lengthy
   period of time.

   To bootstrap, the ITR appends the database name plus
   "/current/entiredb" to a Base Distribution URI and retrieves the file
   via HTTP.  More formally (using ABNF from [RFC5234]):

   entire-db    = base-uri dbname "/current/entiredb"
   base-uri     = uri    ; from RFC 3986
   dbname       = Domain ; from RFC 5321

   For example, if the base distribution URI is
   "http://www.example.com/eiddb/", and assuming a database name of
   "nerd.arin.net", the ITR would request
   "http://www.example.com/eiddb/nerd.arin.net/current/entiredb".

   Routers check the signature on the database prior to installing it,
   and check that the database schema matches a schema they understand.
   Once a router has a valid database it stores that database in some
   sort of non-volatile memory (e.g., disk or flash memory).

   N.B., the host component for such URIs should not resolve to a LISP
   EID, lest a circular dependency be created.

4.2.  Retrieving Changes

   In order to retrieve a set of database changes an ITR will have
   previously retrieved the entire database.  Hence it knows the current
   version of the database it has.  Its first step for retrieving
   changes is to retrieve the current version of the database.
   It does so by appending "/current/version" to the base distribution
   URI and database name and retrieving the file.  Its format is text
   and it contains the integer value of the current database version.

   Once an ITR has retrieved the current version it compares it with the
   version of its local copy.  If there is no difference, then the
   router is up to date and need take no further actions until it next
   checks.

   If the versions differ, the router next sends a request for the
   appropriate change file by appending "/current/changes/" and the
   textual representation of the version of its local copy of the
   database to the base distribution URI and database name.  More
   formally:

   db-version   = base-uri dbname "/current/version"
   db-curupdate = base-uri dbname "/current/changes/" old-version
   old-version  = 1*DIGIT

   For example, if the current version of the database is 1105503 and
   the router's version is 1105500, and the base URI and database name
   are the same as above, the router would first request
   "http://www.example.com/eiddb/nerd.arin.net/current/version" to
   determine that it is out of date, and to also learn the current
   version.  It would then attempt to retrieve
   "http://www.example.com/eiddb/nerd.arin.net/current/changes/1105500".

   The server may not have that change file, either because there are
   too many versions between what the router has and what is current, or
   because no such change file was generated.  If the server has changes
   from the router's version to any later version, the server issues an
   HTTP redirect to that change file, and the router retrieves and
   processes it.  More formally:

   db-incupdate  = base-uri dbname "/" newer-version
                   "/changes/" old-version
   newer-version = 1*DIGIT

   For example:

   "http://www.example.com/eiddb/nerd.arin.net/1105450/changes/1105401"
   would update a router from version 1105401 to 1105450.
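
   As an illustration only, the URI constructions given by the ABNF
   rules in Sections 4.1 and 4.2 can be sketched in Python.  The
   function names are hypothetical and not part of this specification,
   and the sketch assumes that the base distribution URI ends with "/",
   as it does in the examples.

```python
# Illustrative sketch only. Function names are hypothetical, and the
# base distribution URI is assumed to end with "/".


def entire_db_uri(base_uri: str, dbname: str) -> str:
    # entire-db = base-uri dbname "/current/entiredb"
    return f"{base_uri}{dbname}/current/entiredb"


def version_uri(base_uri: str, dbname: str) -> str:
    # db-version = base-uri dbname "/current/version"
    return f"{base_uri}{dbname}/current/version"


def current_changes_uri(base_uri: str, dbname: str, old_version: int) -> str:
    # db-curupdate = base-uri dbname "/current/changes/" old-version
    return f"{base_uri}{dbname}/current/changes/{old_version}"


def incremental_changes_uri(base_uri: str, dbname: str,
                            newer_version: int, old_version: int) -> str:
    # db-incupdate = base-uri dbname "/" newer-version
    #                "/changes/" old-version
    return f"{base_uri}{dbname}/{newer_version}/changes/{old_version}"


BASE = "http://www.example.com/eiddb/"
NAME = "nerd.arin.net"

print(version_uri(BASE, NAME))
print(current_changes_uri(BASE, NAME, 1105500))
print(incremental_changes_uri(BASE, NAME, 1105450, 1105401))
```

   A router would only ever construct the first three forms itself; the
   last form is the kind of URI it would expect to learn from a server
   redirect.
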
   Once it has done so, the router should then repeat the process until
   it has brought itself up to date.

   This raises the question: how does a router know to retrieve version
   1105450 in our example above?  It cannot.  A redirect must be given
   by the server to that URI when the router attempts to retrieve
   differences from the current version, say, 1105503.

   While it is unlikely that database versions would wrap, as they
   consist of 32-bit integers, should the event occur, ITRs should
   attempt first to retrieve a change file when their current version
   number is within 10,000 of 2^32 and they see a version available that
   is less than 10,000.  Barring the availability of a change file, the
   ITR can still assume that the database version has wrapped and
   retrieve a new copy.  It may be safer in future work to include
   additional wrap information or a larger field to avoid having to use
   any heuristics.

5.  Analysis

   We will start our analysis by looking at how much data will be
   transferred to a router during bootstrap conditions.  We will then
   look at the bandwidth required.  Next we will turn our concerns to
   servers.  Finally we will ponder the effect of providing only
   changes.

   In the analysis below we treat the overhead of the database header as
   insignificant (because it is).  The analysis should be similar,
   whether a single database or multiple databases are employed, as we
   would assume that no entry would appear more than once.

5.1.  Database Size

   By its very nature the information to be transported is relatively
   static and is specifically designed to be topologically insensitive.
   That is, every ITR is intended to have the same set of RLOCs for a
   given EID.  While some processing power will be necessary to install
   a table, the amount required should be far less than that of a
   routing information database because the level of entropy is intended
   to be lower.

   For purposes of this analysis, we will assume that the world has
   migrated to IPv6, as this increases the size of the database, which
   would be our primary concern.  However, to mitigate the size
   increase, we have limited the size of the prefix transmitted.  For
   purposes of this analysis, we shall assume an average prefix length
   of 64 bits.

   Based on that assumption and the record format of Section 3.1,
   mapping information for each EID prefix includes a group of RLOCs,
   each with an associated priority and weight, and the minimum record
   size for an IPv6 EID with one RLOC is 30 bytes uncompressed.  Each
   additional IPv6 RLOC costs 20 bytes.

                +-----------+--------+--------+---------+
                | 10^n EIDs | 2 RLOC | 4 RLOC | 8 RLOC  |
                +-----------+--------+--------+---------+
                | 4         | 500 KB | 900 KB | 1.70 MB |
                | 5         | 5.0 MB | 9.0 MB | 17.0 MB |
                | 6         | 50 MB  | 90 MB  | 170 MB  |
                | 7         | 500 MB | 900 MB | 1.70 GB |
                | 8         | 5.0 GB | 9.0 GB | 17.0 GB |
                +-----------+--------+--------+---------+

   Database size for IPv6 routes with average prefix length = 64 bits

                                Table 1

   Entries in the above table are derived as follows:

      E * (30 + 20 * (R - 1))

   where E = number of EIDs (10^n), R = number of RLOCs per EID.

   Our scaling target is to accommodate 10^8 multihomed systems, which
   is one order of magnitude greater than what is discussed in [CARP07].
   At 10^8 entries, a device could be expected to use between 5 and 17
   gigabytes of RAM for the mapping.  No matter the method of
   distribution, any router that sits in the core of the Internet would
   require near this amount of memory in order to perform the ITR
   function.  Large enterprise ETRs would be similarly strained, simply
   due to the diversity of sites that communicate with one another.
   The good news is that this is not our starting point, but rather our
   scaling target, a number that we intend to reach by the year 2050.
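
   The entries of Table 1 can be reproduced with a few lines of Python.
   This is a sketch for checking the arithmetic only: the function and
   parameter names are hypothetical, while the constants 30 and 20 are
   the per-record sizes stated above.

```python
# Illustrative sketch only; names are hypothetical. The defaults are the
# sizes assumed in this section: 30 bytes for a record with one IPv6
# RLOC and 20 bytes for each additional IPv6 RLOC.


def nerd_db_size_bytes(num_eids: int, rlocs_per_eid: int,
                       base_record: int = 30,
                       per_extra_rloc: int = 20) -> int:
    # E * (30 + 20 * (R - 1))
    if rlocs_per_eid < 1:
        raise ValueError("a record always has at least one RLOC")
    return num_eids * (base_record + per_extra_rloc * (rlocs_per_eid - 1))


# Print one line per row of Table 1 (sizes in bytes).
for n in range(4, 9):
    sizes = [nerd_db_size_bytes(10 ** n, r) for r in (2, 4, 8)]
    print(n, sizes)
```
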
Our starting point is more likely in the neighborhood of 10^4 or 10^5
EIDs, thus requiring between 500 KB and 17 MB.

5.2.  Router Throughput Versus Time

   +-------------------------+---------+---------+----------+--------+
   | Table Size (10^N bytes) | 1 Mb/s  | 10 Mb/s | 100 Mb/s | 1 Gb/s |
   +-------------------------+---------+---------+----------+--------+
   | 6                       | 8       | 0.8     | 0.08     | 0.008  |
   | 7                       | 80      | 8       | 0.8      | 0.08   |
   | 8                       | 800     | 80      | 8        | 0.8    |
   | 9                       | 8,000   | 800     | 80       | 8      |
   | 10                      | 80,000  | 8,000   | 800      | 80     |
   | 11                      | 800,000 | 80,000  | 8,000    | 800    |
   +-------------------------+---------+---------+----------+--------+

                  Number of seconds to process NERD

                               Table 2

The length of time it takes to process the database is significant in
models where the device acquires the entire table.  During this
period of time, either the router will be unable to route packets
using LISP or it must use some sort of query mechanism for specific
EIDs while it populates the rest of its table through the transfer.
Table 2 shows us that at our scaling target, a table on the order of
10^10 bytes, the length of time it would take for a router using
1 Mb/s of bandwidth is about 80,000 seconds (roughly 22 hours).  We
can measure the processing time in small numbers of hours for any
transfer speed greater than that.  At the fastest rate shown, 1 Gb/s,
it takes 8 seconds to process an entire table of 10^9 bytes and 80
seconds for 10^10 bytes.

5.3.  Number of Servers Required

As easy as it may be for a router to retrieve, the aggregate
information may be difficult for servers to transmit, assuming the
information is transmitted in aggregate (we'll revisit that
assumption later).
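For reference, the router-side figures in Table 2 follow from a single
division.  The sketch below assumes, as the table appears to, that the
table size is given in bytes, that the link is used at 100% efficiency,
and that protocol overhead is ignored; the function name is invented
for this illustration.

```python
def transfer_seconds(table_bytes: int, link_bits_per_second: int) -> float:
    """Seconds to move table_bytes over a fully utilized link."""
    if link_bits_per_second <= 0:
        raise ValueError("link speed must be positive")
    return table_bytes * 8 / link_bits_per_second


if __name__ == "__main__":
    # Link speeds used as the columns of Table 2, in bits per second.
    links = {"1 Mb/s": 10**6, "10 Mb/s": 10**7,
             "100 Mb/s": 10**8, "1 Gb/s": 10**9}
    for exponent in range(6, 12):
        row = {name: transfer_seconds(10**exponent, bps)
               for name, bps in links.items()}
        print(f"10^{exponent} bytes:", row)
```

Under these assumptions a 10^10-byte table takes 80,000 seconds at
1 Mb/s and 80 seconds at 1 Gb/s.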
   +----------------+------------+-----------+------------+------------+
   | # Simultaneous | 10 Servers | 100       | 1,000      | 10,000     |
   | Requests       |            | Servers   | Servers    | Servers    |
   +----------------+------------+-----------+------------+------------+
   | 100            | 720        | 72        | 72         | 72         |
   | 1,000          | 7,200      | 720       | 72         | 72         |
   | 10,000         | 72,000     | 7,200     | 720        | 72         |
   | 100,000        | 720,000    | 72,000    | 7,200      | 720        |
   | 1,000,000      | 7,200,000  | 720,000   | 72,000     | 7,200      |
   | 10,000,000     | 72,000,000 | 7,200,000 | 720,000    | 72,000     |
   +----------------+------------+-----------+------------+------------+

   Retrieval time per number of servers in seconds.  Assumes average
   10^8 entries with 4 RLOCs per EID and that each server has access to
   1 Gb/s and 100% efficient use of that bandwidth and no compression.

                               Table 3

Entries in the above table were generated using the following method:

For 10^8 entries with four RLOCs per EID, the table size is 9.0 GB,
per our previous table.  Assume 1 Gb/s transfer rates and 100%
utilization.  Protocol overhead is ignored for this exercise.  Hence
a single transfer X takes 72 seconds (9.0 GB is 72 gigabits) and can
get no faster.

With this in mind, each entry is as follows:

   max(1X, N*X/S)

where N = number of transfers, X = 72 seconds,
and S = number of servers.

If we have a distribution model in which every device must retrieve
the mapping information upon start, Table 3 shows the length of time
in seconds it will take for a given number of servers to complete a
transfer to a given number of devices.  This table says, as an
example, that it would take 72,000 seconds (20 hours) for one million
ITRs to simultaneously retrieve the database from one thousand
servers, assuming equal load distribution.  Should a cold start
scenario occur, this number should be of some concern.  Hence it is
important to take some measures both to avoid such a scenario, and to
ease the load should it occur.
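The entries of Table 3 can be reproduced with a direct transcription
of max(1X, N*X/S).  This is a sketch under the table's own assumptions
(a 9.0 GB database, 1 Gb/s per server, perfect load distribution); the
function and parameter names are chosen here for illustration only.

```python
def retrieval_seconds(num_requests: int, num_servers: int,
                      single_transfer_seconds: float = 72.0) -> float:
    """Time for all requesters to finish: max(X, N * X / S).

    X defaults to 72 seconds, the time to send 9.0 GB at 1 Gb/s.
    """
    if num_requests < 1 or num_servers < 1:
        raise ValueError("need at least one request and one server")
    shared = num_requests * single_transfer_seconds / num_servers
    # A transfer can never complete faster than one full copy takes.
    return max(single_transfer_seconds, shared)


if __name__ == "__main__":
    for requests in (100, 1_000, 10_000, 100_000, 1_000_000, 10_000_000):
        row = [retrieval_seconds(requests, s)
               for s in (10, 100, 1_000, 10_000)]
        print(f"{requests:>10,} requests:", row)
```

With one million requesters and one thousand servers this gives
72,000 seconds, matching the 20-hour example in the text.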
The primary defense should be for ITRs to first attempt to retrieve
their databases from their peers or upstream providers.  Secondary
defenses could include data sanity checks within ITRs, with agreed
norms for how much the database should change in any given update or
over any given period of time.

As we will see below, dissemination of changes involves considerably
less volume.

   +----------------+-------------+---------------+----------------+
   | % Daily Change | 100 Servers | 1,000 Servers | 10,000 Servers |
   +----------------+-------------+---------------+----------------+
   | 0.1%           | 300         | 30            | 3              |
   | 0.5%           | 1,500       | 150           | 15             |
   | 1%             | 3,000       | 300           | 30             |
   | 5%             | 15,000      | 1,500         | 150            |
   | 10%            | 30,000      | 3,000         | 300            |
   +----------------+-------------+---------------+----------------+

   Assuming 10 million routers and a database size of 9 GB, resulting
   transfer times for hourly updates are shown in seconds, given number
   of servers and daily rate of change.  Note that when insufficient
   resources are devoted to servers, an unsustainable situation arises
   where updates for the next batch would begin prior to the completion
   of the current batch.

                               Table 4

This table shows us that with 10,000 servers the average transfer
time with 1 Gb/s links for 10,000,000 routers will be 300 seconds with
10% daily change spread over 24 hourly updates.  For a 0.1% daily
change, that number is 3 seconds for a database of size 9.0 GB.

The amount of change goes to the purpose of LISP.  If its purpose is
to provide effective multihoming support to end customers, then we
might anticipate relatively few changes.  If, on the other hand,
service providers attempt to make use of LISP to provide some form of
traffic engineering, we can expect the same data to change more
often.  We can probably not conclude much in this regard without
additional operational experience.
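To explore how sensitive Table 4 is to the rate of change, the sketch
below gives one reading of the calculation that reproduces the
published numbers: the daily change is spread evenly over 24 hourly
updates, every router fetches every update, and servers share the load
perfectly at 1 Gb/s each.  The memo does not spell this derivation out,
so the function and its parameters are assumptions made here.

```python
def hourly_update_seconds(daily_change_fraction: float, num_servers: int,
                          num_routers: int = 10_000_000,
                          db_bytes: float = 9e9,
                          updates_per_day: int = 24,
                          server_bits_per_second: float = 1e9) -> float:
    """Seconds needed to push one hourly change file to every router."""
    bytes_per_update = db_bytes * daily_change_fraction / updates_per_day
    total_bits = bytes_per_update * 8 * num_routers
    return total_bits / (num_servers * server_bits_per_second)


def is_sustainable(seconds_per_update: float,
                   update_interval_seconds: float = 3600.0) -> bool:
    """An update must finish before the next hourly batch begins."""
    return seconds_per_update < update_interval_seconds


if __name__ == "__main__":
    for change in (0.001, 0.005, 0.01, 0.05, 0.10):
        for servers in (100, 1_000, 10_000):
            t = hourly_update_seconds(change, servers)
            print(f"{change:.1%} daily, {servers:>6,} servers: "
                  f"{t:8.0f} s  sustainable={is_sustainable(t)}")
```

Under this reading, 100 servers cannot keep up with a 5% or 10% daily
change, since each hourly batch would take longer than an hour.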
The one thing we can say is that different applications of the LISP
protocol may require new and different distribution mechanisms.  Such
optimization is left for another day.

5.4.  Security Considerations

Whatever the answer to our previous question, we must consider the
security of the information being transported.  If an attacker can
forge an update or tamper with the database, he can in effect
redirect traffic to end sites.  Hence, integrity and authenticity of
the NERD are critical.  In addition, a means is required to determine
whether a source is authorized to modify a given database.  No data
privacy is required.  Quite to the contrary, this information will be
necessary for any ITR.

The first question one must ask is whom to trust to provide the ITR a
mapping.  Ultimately the owner of the EID prefix is most
authoritative for the mapping to RLOCs.  However, were all owners to
sign all such mappings, ITRs would need to know which owner is
authorized to modify which mapping, creating a problem of O(N^2)
complexity.

We can reduce this problem substantially by investing some trust in a
small number of entities that are allowed to sign entries.  If an
authority manages EIDs much the same way a domain name registrar
handles domains, then the owner of the EID would choose a database
authority she or he trusts, and ITRs must trust each such authority
in order to map the EIDs listed by that authority to RLOCs.  This
reduces the amount of management complexity on the ETR to retaining
knowledge of O(#authorities), but does require that each authority
establish procedures for authenticating the owner of an EID.  Those
procedures needn't be the same.

There are two classic methods to ensure integrity of data:

o  secure transport of the source of the data to the consumer, such
   as Transport Layer Security (TLS) [RFC4346]; and
o  provide object level security.
These methods are not mutually exclusive, although one can argue
about the need for the former, given the latter.

In the case of TLS, when it is properly implemented, the objects
being transported cannot easily be modified by interlopers or
so-called men in the middle.  When data objects are distributed to
multiple servers, each of those servers must be trusted.  As we have
seen above, we could have quite a large number of servers, thus
providing an attacker a large number of targets.  We conclude that
some form of object level security is required.

Object level security involves an authority signing an object in a
way that can easily be verified by a consumer, in this case a router.
In this case, we would want the mapping table and any incremental
update to be signed by the originator of the update.  This implies
that we cannot simply make use of a tool like CVS [CVS].  Instead,
the originator will want to generate diffs, sign them, and make them
available either directly or through some sort of content
distribution or peer to peer network.

5.4.1.  Use of Public Key Infrastructures (PKIs)

X.509 provides a certificate hierarchy that has scaled to the size of
the Internet.  The system is most manageable when there are few
certificates to manage.  The model proposed in this memo makes use of
one current certificate per database authority.  The two pieces of
information necessary to verify a signature, therefore, are as
follows:

o  the certificate of the database authority, which can be provided
   along with the database; and
o  the certificate authority's certificate.

The latter two pieces of information must be very well known and must
be configured on each ITR.  It is expected that both would change
very rarely, and it would not be unreasonable for such updates to
occur as part of a normal OS release process.
The tools for both signing and verifying are readily available.
OpenSSL [1] provides tools and libraries for both signing and
verifying.  Other tools commonly exist.

Use of PKIs is not without implementation complexity, operational
complexity, or risk.  The following risks and mitigations are
identified with NERD's use of PKIs:

If a NERD database authority private key is exposed:

   In this case an attacker could sign a false database update,
   either redirecting traffic, or otherwise causing havoc.  In this
   case, the NERD database administrator must revoke its existing key
   and issue a new one.  The certificate is added to a certificate
   revocation list (CRL), which may be distributed with both this and
   other databases, as well as through other channels.  Because this
   event is expected to be rare, and the number of database
   authorities is expected to be small, a CRL will be small.  When a
   router receives a revocation, it checks it against its existing
   databases, and attempts to update the one that is revoked.  This
   implies that prior to issuing the revocation, the database
   authority would sign an update with the new key.  Routers would
   discard updates they have already received that were signed after
   the revocation was generated.  If a router cannot confirm whether
   the authority's certificate was revoked before or after a
   particular update, it will retrieve a fresh copy of the database
   with a valid signature.

The private key associated with a CA in the chain of trust of the
Authority's certificate is compromised:

   In this case, it becomes possible for an attacker to masquerade as
   the database authority.  To ameliorate damage, the database
   authority revokes its certificate and gets a new certificate
   issued from a CA that is not compromised.  Once it has done so,
   the previous procedure is followed.
   The compromised certificate can be removed during the normal
   operating system upgrade cycle.  In the case of the root
   authority, the situation could be more serious.  Updates to the OS
   in the ITR need to be validated prior to installation.  One
   possible method of doing this is provided in [RFC4108].  Because
   Trust Anchors are assumed to be updated as part of an OS update,
   implementers should consider using a key other than the trust
   anchor for validating OS updates.

An algorithm used in either the certificate or the signature is
cracked:

   This is a catastrophic failure and the above forms of attack
   become possible.  The only mitigation is to make use of a new
   algorithm.  In theory this should be possible, but in practice it
   has proved very difficult.  For this reason, additional work is
   recommended to make alternative algorithms available.

The Database Authority loses its key or disappears:

   In this case nobody can update the existing database.  There are
   few programmatic mitigations.  If the database authority places
   its private keys and suitable amounts of information in escrow,
   then under agreed-upon circumstances, such as no updates for three
   days, the escrow agent would release the information to a party
   capable of generating a database update.

5.4.2.  Other Risks

Because this specification does not require secure transport, if an
attacker prevents updates to an ITR for the purposes of having that
ITR continue to use a compromised ETR, the ITR could continue to use
an old version of the database without realizing a new version has
been made available.  If one is worried about such an attack, a
secure channel such as SSL to a secure chain back to the database
authority should be used.  It is possible that after some operational
experience, later versions of this format will contain additional
semantics to address this attack.
SSL would also prevent attempts to spoof false database versions on
the server.

As discussed above, a substantial risk would be a cold start
scenario.  If an attacker found a bug in a common operating system
that allowed it to erase an ITR's database, and was able to
disseminate that bug, the collective ability of ITRs to retrieve new
copies of the database could be taxed by collective demand.  The
remedy to this is for devices to share copies of the database with
their peers, thus making each potential requester a potential
service.

6.  Why not use XML?

Many objects these days are distributed as either XML pages or
something derived from XML [W3C.REC-xml11-20040204], such as SOAP
[W3C.REC-soap12-part1-20070427], [W3C.REC-soap12-part2-20070427].
Use of such well known standards allows for high level tools and
library reuse.  XML's strength is extensibility.  Without a doubt XML
would be more extensible than a fixed field database.  Why not, then,
use these standards in this case?  The greatest concern the author
had was compactness of the data stream.  Inasmuch as this mechanism
is used at all in the future, so long as that concern could be
addressed, and so long as signatures of the database can be verified,
XML probably should be considered.

7.  Other Distribution Mechanisms

We now consider various different mechanisms.  The problem of
distributing changes in various databases is as old as databases.
The author is aware of two obvious approaches that have been well
used in the past.  One approach would be the wide distribution of CVS
repositories.  However, for reasons mentioned in the previous
section, CVS is insufficient to the task.

The other tried and true approach is the use of periodic updates in
the form of messages.  Good old NNTP [RFC0977] itself provides two
separate mechanisms (one push and another pull) to provide a coherent
update process.
This was in fact used to update molecular biology databases [gb91] in
the early 1990s.  Netnews offers a way to determine whether articles
with specified Article-Ids have been received.  In the case where the
mapping file source of authority wishes to transmit updates, it can
sign a change file and then post it into the network.  Routers merely
need to keep a record of the article ids that they have received.
Years ago, Netnews systems handled a far greater volume of traffic
than we envision. [2]  Initially this is probably overkill, but it
may not be so later in this process.  Some consideration should be
given to a mechanism known to widely distribute vast amounts of data,
as instantaneously as either the sender or the receiver wishes.

To attain an additional level of hierarchy in the distribution
network, service providers could retrieve information to their own
local servers, and configure their routers with the host portion of
the above URI.

Another possibility would be for providers to establish an agreement
on a small set of anycast addresses for use for this purpose.  There
are limitations to the use of anycast, particularly with TCP.  In the
midst of a routing flap, an anycast address can become all but
unusable.  Careful study of such a use as well as appropriate use of
HTTP redirects is expected.

7.1.  What About DNS as a retrieval model?

It has been proposed that a query/response mechanism be used for this
information, and that specifically the domain name system (DNS)
[RFC1034] be used.  The previous models do not preclude the DNS.  DNS
has the advantage that the administrative lines are well drawn, and
that the ID/RLOC mapping is likely to appear very close to these
boundaries.  DNS also has the added benefit that an entire
distribution infrastructure already exists.
There are, however, some problems that could impact end hosts when
intermediate routers make queries, some of which were first pointed
out in [RFC1383]:

o  Any query mechanism offers an opportunity for a resource attack if
   an attacker can force the ITR to query for information.  In this
   case, all that would be necessary would be for a "botnet" (a group
   of computers that have been compromised and used as vehicles to
   attack others) to ping or otherwise contact via some normal
   service hosts that sit behind the ETR.  If the botnet hosts
   themselves are behind ETRs, the victim's ITR will need to query
   for each and every one of them, thus becoming part of a classic
   reflector attack.
o  Packets will be delayed at the very least, and probably dropped in
   the process of a mapping query.  This could be at the beginning of
   a communication, but it will be impossible for a router to
   conclude with certainty that this is the case.
o  The DNS has a backoff algorithm that presumes that applications
   are making queries prior to the beginning of a communication.
   This is appropriate for end hosts, which know in fact when a
   communication begins.  An end user may not enjoy a router waiting
   seconds for a retry.
o  While the administrative lines may appear to be correct, the
   location of name servers may not be.  If name servers sit within
   PI address space, thus requiring LISP to reach, a circular
   dependency is created.  This is precisely where many enterprise
   name servers sit.  The LISP experiment should not predicate its
   success on relocation of such name servers.

Nevertheless, DNS may be able to play a role in providing the
enterprise control over the mapping of its EIDs to RLOCs.  Posit a
new DNS record "EID2RLOC".  This record is used by the authority to
collect and aggregate mapping information so that it may be
distributed through one of the other mechanisms.
As an example:

   $ORIGIN 0.10.PI-SPACE.
   128   EID2RLOC   mask 23 priority 10 weight 5 172.16.5.60
         EID2RLOC   mask 23 priority 15 weight 5 192.168.1.5

In the above figure, network 10.0.128/23 would be delegated to some
end system, say EXAMPLE.COM.  They would manage the above zone
information.  This would allow a DNS mechanism to work, but it would
also allow someone to aggregate the information and distribute a
table.

7.2.  Use of BGP and LISP+ALT

Border Gateway Protocol (BGP) [RFC4271] is currently used to
distribute inter-domain routing throughout the Internet.  Why not,
then, use BGP to distribute mapping entries, or provide a rendezvous
mechanism to initialize mapping entries?  In fact this is precisely
what LISP+ALT [I-D.ietf-lisp-alt] accomplishes, using a completely
separate topology from the normal DFZ.  It does so using existing
code paths and expertise.  The alternate topology also provides an
extremely accurate control path from ITRs to ETRs, whereas NERD's
operational model requires an optimistic assumption and control plane
functionality to cycle through unresponsive ETRs in an EID prefix's
mapping entry.  The memory scaling characteristics of LISP+ALT are
extremely attractive because of expected strong aggregation, whereas
NERD makes almost no attempt at aggregation.

A number of key deployment issues are left open.  The principal issue
is whether it is deemed acceptable for routers to drop packets
occasionally while mapping information is being gathered.  This
should be the subject of future research for ALT, as it was a key
design goal of NERD to avoid such a situation.

7.3.  Perhaps use a hybrid model?

Perhaps it would be useful to use both a prepopulated database such
as NERD and a query mechanism (perhaps LISP+ALT, LISP-CONS
[I-D.meyer-lisp-cons], or DNS) to determine an EID/RLOC mapping.
One idea would be to receive a subset of the mappings, say, by taking
only the NERD for certain regions.  This alleviates the need to drop
packets for some subset of destinations under the assumption that
one's business is localized to a particular region.  If one did not
have a local entry for a particular EID, one would then make a query.

One approach to using DNS to query live would be to periodically walk
"interesting" portions of the network in search of relevant records,
and to cache them in non-volatile storage.  While preventing resource
attacks, the walk itself could be viewed as an attack if the
algorithm were not selective enough about what it thought was
interesting.  A similar approach could be applied to LISP+ALT or
LISP-CONS by forcing a data-driven Map Reply for certain sites.

8.  Deployment Issues

While LISP and NERD are intended as experiments at this point, it is
already obvious one must give serious consideration to circular
dependencies with regard to the protocols used and the elements
within them.

8.1.  HTTP

Inasmuch as HTTP depends on DNS, either due to the authority section
of a URI, or due to the configured base distribution URI, these same
concerns apply.  In addition, any HTTP server that itself makes use
of provider independent addresses would be a poor choice to
distribute the database for these exact same reasons.

One issue with using HTTP is that it is possible that a middlebox of
some form, such as a cache, may intercept and process requests.  In
some cases this might be a good thing.  For instance, if a cache
correctly returns a database, some amount of bandwidth is conserved.
On the other hand, if the cache itself fails to function properly for
whatever reason, end to end connectivity could be impaired.
For example, if the cache itself depended on the mapping being in
place and functional, a cold start scenario might leave the cache
functioning improperly, in turn providing routers no means to update
their databases.  Some care must be given to avoid such
circumstances.

9.  Open Questions

Do we need to discuss reachability in more detail?  This was clearly
an issue at the IST-RING workshop.  There are two key issues.  First,
what is the appropriate architectural separation between the data
plane and the control plane?  Second, is there some specific way in
which NERD impacts the data plane?

Should we specify a (perhaps compressed) tarball that treads a middle
ground for the last question, where each update tarball contains both
a signature for the update and for the entire database, once the
update is applied?

Should we compress?  In some initial testing of databases with 1, 5,
and 10 million IPv4 EIDs and a random distribution of IPv4 RLOCs, the
current format in this document compresses down by between 35% and
36%, using the Burrows-Wheeler block sorting text compression
algorithm (bzip2).  The NERD used random EIDs with prefix lengths
varying from 19 to 29, with probability weighted toward the smaller
masks.  This only very roughly reflects reality.  A better test would
be to start with the existing prefixes found in the DFZ.

10.  Conclusions

This memo has specified a database format, an update format, a URI
convention, an update method, and a validation method for EID/RLOC
mappings.  We have shown that beyond the predictions of 10^8
EID-prefix entries, the aggregate database size would likely be at
most 17 GB.  We have considered the number of servers needed to
distribute that information and we have demonstrated the limitations
of a simple content distribution network and other well known
mechanisms.
The effort required to retrieve a database change amounts to between
3 and 30 seconds of processing time per hour at today's gigabit
speeds.  We conclude that there is no need for an off-box query
mechanism today, and that there are distinct disadvantages to having
such a mechanism in the control plane.

Beyond this we have examined alternatives that allow for hybrid
models that do use query mechanisms, should our operating assumptions
prove overly optimistic.  Use of NERD today does not foreclose use of
such models in the future, and in fact both models can happily
coexist.

We leave to future work how the list of databases is distributed, how
BGP can play a role in distributing knowledge of the databases, and
how DNS can play a role in aggregating information into these
databases.

We also leave to future work whether HTTP is the best protocol for
the job, and whether the scheme described in this document is the
most efficient.  One could easily envision that when applied in high
delay or high loss environments, a broadcast or multicast method may
prove more effective.

Speaking of multicast, we also leave to future work how multicast is
implemented, if at all, either in conjunction with or as an extension
to this model.

Finally, perhaps the most interesting future work would be to
understand if and how NERD could be integrated with the LISP mapping
server [I-D.ietf-lisp-ms].

11.  IANA Considerations

This memo makes no requests of IANA.

12.  Acknowledgments

Dino Farinacci, Patrik Faltstrom, Dave Meyer, Joel Halpern, Jim
Schaad, Dave Thaler, Mohamed Boucadair, Robin Whittle, Max Pritikin,
and Scott Brim were very helpful with their reviews of this work.
Thanks also to the participants of the Routing Research Group and the
IST-RING workshop held in Madrid in December of 2007 for their
incisive comments.
The astute will notice a lengthy References 1134 section. This work stands on the shoulders of many others' efforts. 1136 13. References 1138 13.1. Normative References 1140 [I-D.ietf-lisp] 1141 Farinacci, D., Fuller, V., Meyer, D., and D. Lewis, 1142 "Locator/ID Separation Protocol (LISP)", 1143 draft-ietf-lisp-06 (work in progress), January 2010. 1145 [ITU.X509.2000] 1146 International Telecommunications Union, "Information 1147 technology - Open Systems Interconnection - The Directory: 1148 Public-key and attribute certificate frameworks", ITU- 1149 T Recommendation X.509, ISO Standard 9594-8, March 2000. 1151 [RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform 1152 Resource Identifier (URI): Generic Syntax", STD 66, 1153 RFC 3986, January 2005. 1155 [RFC5321] Klensin, J., "Simple Mail Transfer Protocol", RFC 5321, 1156 October 2008. 1158 [RFC5234] Crocker, D. and P. Overell, "Augmented BNF for Syntax 1159 Specifications: ABNF", STD 68, RFC 5234, January 2008. 1161 13.2. Informative References 1163 [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., 1164 Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext 1165 Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999. 1167 [RFC2315] Kaliski, B., "PKCS #7: Cryptographic Message Syntax 1168 Version 1.5", RFC 2315, March 1998. 1170 [RFC5652] Housley, R., "Cryptographic Message Syntax (CMS)", 1171 RFC 5652, September 2009. 1173 [RFC0977] Kantor, B. and P. Lapsley, "Network News Transfer 1174 Protocol", RFC 977, February 1986. 1176 [RFC1034] Mockapetris, P., "Domain names - concepts and facilities", 1177 STD 13, RFC 1034, November 1987. 1179 [RFC1383] Huitema, C., "An Experiment in DNS Based IP Routing", 1180 RFC 1383, December 1992. 1182 [RFC4271] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway 1183 Protocol 4 (BGP-4)", RFC 4271, January 2006. 1185 [RFC4108] Housley, R., "Using Cryptographic Message Syntax (CMS) to 1186 Protect Firmware Packages", RFC 4108, August 2005. 
1188 [RFC4346] Dierks, T. and E. Rescorla, "The Transport Layer Security 1189 (TLS) Protocol Version 1.1", RFC 4346, April 2006. 1191 [CARP07] Carpenter, B., "IETF Plenary Presentation: Routing and 1192 Addressing: Where we are today", March 2007. 1194 [CVS] Grune, R., Baalbergen, E., Waage, M., Berliner, B., and J. 1195 Polk, "CVS: Concurrent Versions System", November 1985. 1197 [gb91] Smith, R., Gottesman, Y., Hobbs, B., Lear, E., 1198 Kristofferson, D., Benton, D., and P. Smith, "A mechanism 1199 for maintaining an up-to-date GenBank database via 1200 Usenet", CABIOS , April 1991. 1202 [W3C.REC-xml11-20040204] 1203 Yergeau, F., Maler, E., Paoli, J., Cowan, J., Bray, T., 1204 and C. Sperberg-McQueen, "Extensible Markup Language (XML) 1205 1.1", World Wide Web Consortium FirstEdition REC-xml11- 1206 20040204, February 2004, 1207 . 1209 [W3C.REC-soap12-part1-20070427] 1210 Gudgin, M., Karmarkar, A., Nielsen, H., Mendelsohn, N., 1211 Hadley, M., Lafon, Y., and J. Moreau, "SOAP Version 1.2 1212 Part 1: Messaging Framework (Second Edition)", World Wide 1213 Web Consortium Recommendation REC-soap12-part1-20070427, 1214 April 2007, 1215 . 1217 [W3C.REC-soap12-part2-20070427] 1218 Mendelsohn, N., Karmarkar, A., Moreau, J., Lafon, Y., 1219 Gudgin, M., Hadley, M., and H. Nielsen, "SOAP Version 1.2 1220 Part 2: Adjuncts (Second Edition)", World Wide Web 1221 Consortium Recommendation REC-soap12-part2-20070427, 1222 April 2007, 1223 . 1225 [I-D.ietf-lisp-alt] 1226 Fuller, V., Farinacci, D., Meyer, D., and D. Lewis, "LISP 1227 Alternative Topology (LISP+ALT)", draft-ietf-lisp-alt-02 1228 (work in progress), January 2010. 1230 [I-D.meyer-lisp-cons] 1231 Brim, S., "LISP-CONS: A Content distribution Overlay 1232 Network Service for LISP", draft-meyer-lisp-cons-04 (work 1233 in progress), April 2008. 1235 [I-D.ietf-lisp-ms] 1236 Fuller, V. and D. Farinacci, "LISP Map Server", 1237 draft-ietf-lisp-ms-04 (work in progress), October 2009. 
URIs

[1]

[2]

Appendix A.  Generating and verifying the database signature with
             OpenSSL

As previously mentioned, one goal of NERD was to use off-the-shelf
tools to both generate and retrieve the database.  To many, PKI is
magic.  This section is meant to provide at least some clarification
as to both the generation and verification process, complete with
command line examples.  Not included is how you get the entries
themselves.  We'll assume they exist, and that you're just trying to
sign the database.

To sign the database, you first need a database file that has the
database header described in Section 3.  Block size should be zero,
and there should be no PKCS#7 block at this point.  You also need a
certificate and its private key with which you will sign the
database.

The OpenSSL "smime" command contains all the functions we need from
this point forth.  To sign the database, issue the following command:

   openssl smime -binary -sign -outform DER -signer yourcert.crt \
           -inkey yourcert.key -in database-file -out signature

-binary states that no MIME canonicalization should be performed.
-sign indicates that you are signing the file that was given as the
argument to -in.  The output format (-outform) is binary DER, and
your public certificate is provided with -signer along with your key
with -inkey.  The file that receives the signature is specified with
-out.

The resulting file "signature" is then copied into the PKCS#7 block
in the database header, its size in bytes is recorded in the PKCS#7
block size field, and the resulting file is ready for distribution to
ITRs.

To verify a database file, first retrieve the PKCS#7 block from the
file by copying the appropriate number of bytes into another file,
say "signature".  Next, zero this field, and set the block size field
to 0.
   Next, use the "smime" command to verify the signature as follows:

      openssl smime -binary -verify -inform DER -content database-file \
         -out /dev/null -in signature

   OpenSSL will report "Verification successful" if the signature is
   correct.  OpenSSL provides sufficiently rich libraries to accomplish
   the above within the C programming language in a single pass.

Appendix B.  Changes

   This section is to be removed prior to publication.

   o  06-08: editorial.  Clarify sending diffs.
   o  05: Fix normative/informative references.  Wordsmithing.
   o  04: Analysis change: IPv6 RLOCs are 128 bits.  While they can be
      shortened to 64 bits, that involves substantial ETR changes and
      expenditure of IPv6 networks, which is probably unnecessary, and
      can be left as a later optimization.  Added an option of
      independent operators.  Processed all but two of Dino's comments.
      Addressed Scott's comments.  Removed existing work analysis.
      Saving that for another day.  Clarified OpenSSL Appendix.
   o  05: clean DOWN.  Reinsert some text for historical purposes.
   o  04: cleanup
   o  03: Change dbname to a domain name, indicate that is what is in
      the subject of the X.509 certificate, and list editorial changes,
      update acknowledgments.
   o  02: Incorporate some of Dave Thaler's comments.  Add
      authentication block detail.  Modify analysis to take IPv6 into
      account, along with a more realistic number of RLOCs per EID.  Add
      some comments about potential risks of a cold start.  Add S/MIME
      example as appendix A and take out old ToDo.  Provide some amount
      of compression of IPv6 addresses by limiting their size to
      significant bytes rounded to a four byte word boundary.
   o  01: Massive spelling correction, URI example correction.
   o  00: Initial Revision.
Author's Address

   Eliot Lear
   Cisco Systems GmbH
   Glatt-com
   Glattzentrum, ZH  CH-8301
   Switzerland

   Phone: +41 44 878 7525
   Email: lear@cisco.com