Network Working Group                                            E. Lear
Internet-Draft                                        Cisco Systems GmbH
Intended status: Experimental                          December 14, 2009
Expires: June 17, 2010


               NERD: A Not-so-novel EID to RLOC Database
                      draft-lear-lisp-nerd-05.txt

Abstract

   LISP is a protocol to encapsulate IP packets in order to allow end
   sites to multihome without injecting routes from one end of the
   Internet to another. This memo specifies a database and a method to
   transport the mapping of EIDs to RLOCs to routers in a reliable,
   scalable, and secure manner. Our analysis concludes that transport
   of all EID/RLOC mappings scales well to at least 10^8 entries.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on June 17, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the BSD License.

Table of Contents

   1.  Introduction
       1.1.  Base Assumptions
       1.2.  What is NERD?
       1.3.  Glossary
   2.  Theory of Operation
       2.1.  Database Updates
       2.2.  Communications between ITR and ETR
       2.3.  Who are database authorities?
   3.  NERD Format
       3.1.  NERD Record Format
       3.2.  Database Update Format
   4.  NERD Distribution Mechanism
       4.1.  Initial Bootstrap
       4.2.  Retrieving Changes
   5.  Analysis
       5.1.  Database Size
       5.2.  Router Throughput Versus Time
       5.3.  Number of Servers Required
       5.4.  Security Considerations
             5.4.1.  Use of Public Key Infrastructures (PKIs)
             5.4.2.  Other Risks
   6.  Why not use XML?
   7.  Other Distribution Mechanisms
       7.1.  What About DNS as a retrieval model?
       7.2.  Use of BGP and LISP+ALT
       7.3.  Perhaps use a hybrid model?
   8.  Deployment Issues
       8.1.  HTTP
   9.  Open Questions
   10. Conclusions
   11. IANA Considerations
   12. Acknowledgments
   13. References
       13.1.  Normative References
       13.2.  Informational References
   Appendix A.  Generating and verifying the database signature
                with OpenSSL
   Appendix B.  Changes
   Author's Address

1.  Introduction

   Locator/ID Separation Protocol (LISP) [1] separates an IP address
   used by a host and local routing system from the locators advertised
   by BGP participants on the Internet in general, and in the
   default-free zone (DFZ) in particular.
   It accomplishes this by establishing
   a mapping between globally unique endpoint identifiers (EIDs) and
   routing locators (RLOCs). This reduces the amount of state change
   that occurs on routers within the default-free zone on the Internet,
   while enabling end sites to be multihomed.

   In some mapping distribution approaches to LISP the mapping is
   learned via data-triggered control messages between ingress tunnel
   routers (ITRs) and egress tunnel routers (ETRs) through an alternate
   routing topology [19]. In other approaches to LISP, the mapping from
   EIDs to RLOCs is instead learned through some other means. This memo
   addresses different approaches to the problem, and specifies a Not-
   so-novel EID RLOC Database (NERD) and methods to both receive the
   database and to receive updates.

   NERD is offered primarily as a way to avoid dropping packets, the
   underlying assumption being that dropping packets is bad for
   applications and end users. Those who do not agree with this
   underlying assumption may find that other approaches make more sense.

   LISP and NERD are both currently experimental protocols. The NERD
   database is specified in such a way that the methods used to
   distribute or retrieve it may vary over time. Multiple databases are
   supported in order to allow for multiple data sources. An effort has
   been made to divorce the database from access methods so that both
   can evolve independently through experimentation and operational
   validation.

1.1.  Base Assumptions

   In order to specify a mapping it is important to understand how it
   will be used, and the nature of the data being mapped. In the case
   of LISP, the following assumptions are pertinent:

   o  The data contained within the mapping changes only on provisioning
      or configuration operations, and is not intended to change when a
      link either fails or is restored.
      Some other mechanism such as
      the use of LISP Reachability Bits with mapping replies handles
      healing operations, particularly when a tail circuit within a
      service provider's aggregate goes down. NERD can be used as a
      verification method to ensure that whatever operational mapping
      changes an ITR receives are authorized.

   o  While weight and priority are defined, these are not hop-by-hop
      metrics. Hence the information contained within the mapping does
      not change based on where one sits within the topology.

   o  A purpose of LISP being to reduce control plane overhead by
      reducing "rate X state" complexity, updates to the mapping will be
      relatively rare.

   o  Because LISP and NERD are designed to ease interdomain routing,
      their use is intended within the interdomain environment. That
      is, LISP is best implemented at either the customer edge or
      provider edge, and there will be on the order of as many ITRs and
      EID Prefixes as there are connections to Internet Service
      Providers by end customers.

   o  As such, NERD cannot be the sole means to implement host mobility,
      although NERD may be used in conjunction with other mechanisms.
      For instance, it would be possible for a mobile node to receive a
      local address that is an EID and pass that to the correspondent
      node, who could also make use of an EID. As such, use of LISP in
      this case would be transparent, and no mapping entries are changed
      for mobility.

1.2.  What is NERD?

   NERD is a Not-so-novel EID to RLOC Database. It consists of the
   following components:

   1.  a network database format;
   2.  a change distribution format;
   3.  a database retrieval/bootstrapping method;
   4.  a change distribution method.

   The network database format is compressible. However, at this time
   we specify no compression method. NERD will make use of potentially
   several transport methods, but most notably HTTP [2].
   HTTP has
   restart and compression capabilities. It is also widely deployed.

   There exist many methods to show differences between two versions of
   a database or a file, UNIX's "diff" being the classic example. In
   this case, because the data is well structured and easily keyed, we
   can make use of a very simple format for version differences that
   simply provides a list of EID/RLOC mappings that have changed using
   the same record format as the database, and a list of EIDs that are
   to be removed.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [3].

1.3.  Glossary

   The reader is once again referred to [1] for a general glossary of
   terms related to LISP. The following terms are specific to this
   memo.

   Base Distribution URI: An Absolute-URI as defined in Section 4.3 of
      [6] from which other references are relative. The base
      distribution URI is used to construct a URI to an EID/RLOC mapping
      database. If more than one NERD is known then there will be one
      or more base distribution URIs associated with each (although each
      such base distribution URI may have the same value).

   EID Database Authority: The authority that will sign database files
      and updates. It is the source of both.

   The Authority: Shorthand for the EID Database Authority.

   NERD: (N)ot-so-novel (E)ID to (R)LOC (D)atabase.

   AFI: Address Family Identifier.

   Pull Model: An architecture where clients pull only the information
      they need at any given time, such as when a packet arrives for
      forwarding.

   Push Model: An architecture in which clients receive an entire
      dataset, containing data they may or may not require, such as
      mappings for EIDs to which no host they serve is attempting to
      send.

   Hybrid Model: An architecture in which some information is pushed
      toward the receiver from a source and some information is pulled
      by the receiver.

2.  Theory of Operation

   Operational functions are split into two components: database updates
   and state exchange between ITR and ETR during a communication.

2.1.  Database Updates

   What follows is a summary of how NERDs are generated and updated.
   Specifics can be found in Section 3. The general way in which NERD
   works is as follows:

   1.  A NERD is generated by an authority that allocates
       provider-independent (PI) addresses (e.g., IANA or an RIR) which
       are used by sites as EIDs. As part of this process the authority
       generates a digest for the database and signs it with a private
       key whose public key is part of an X.509 certificate [15]. That
       signature along with a copy of the authority's public key is
       included in the NERD.

   2.  The NERD is distributed to a group of well known servers.

   3.  ITRs retrieve an initial copy of the NERD via HTTP when they come
       into service.

   4.  ITRs are preconfigured with a group of certificates whose private
       keys are used by database authorities to sign the NERD. This
       list of certificates should be configurable by administrators.

   5.  ITRs next verify both the validity of the public key and the
       signed digest. If either fails validation, the ITR attempts to
       retrieve the NERD from a different source. The process iterates
       until either a valid database is found or the list of sources is
       exhausted.

   6.  Once a valid NERD is retrieved, the ITR installs it into both
       non-volatile and local memory.

   7.  At some point the authority updates the NERD and increments the
       database version counter. At the same time it generates a list
       of changes, which it also signs, as it does with the original
       database.

   8.  Periodically ITRs will poll from their list of servers to
       determine if a new version of the database exists. When a new
       version is found, an ITR will attempt to retrieve a change file,
       using its list of preconfigured servers.

   9.  The ITR validates a change file just as it does the original
       database. Assuming the change file passes validation, the ITR
       installs new entries, overwrites existing ones, and removes empty
       entries, based on the content of the change file.

   As time goes on it is quite possible that an ITR may probe a list of
   configured neighbors for a database or change file copy. It is
   equally possible that neighbors might advertise to each other the
   version number of their database. Such methods are not explored in
   depth in this memo, but are mentioned for future consideration.

2.2.  Communications between ITR and ETR

   [1] describes the basic approach to what happens when a packet
   arrives at an ITR, and what communications between ITR and ETR take
   place. NERD provides an optimistic approach to establishing
   communications with an ETR that is responsible for a given EID
   prefix. State must be kept, however, on an ITR to determine whether
   that ETR is in fact reachable. It is expected that this is a common
   requirement across LISP mapping systems, and will be handled in the
   core LISP architecture.

2.3.  Who are database authorities?

   This memo does not specify who the database authority is. That is
   because there are several possible operational models. In each case
   the number of database authorities is meant to be small so that ITRs
   need only keep a small list of authorities, similar to the way a name
   server might cache a list of root servers.

   o  A single database authority exists. In this case all entries in
      the database are registered to a single entity, and that entity
      distributes the database.
      Because the EID space is
      provider-independent address space, there is no architectural
      requirement that address space be hierarchically distributed to
      anyone, as there is with provider-assigned address space. Hence,
      there is a natural affinity between the IANA function and the
      database authority function.

   o  Each region runs a database authority. In this case,
      provider-independent address space is allocated to either Regional
      Internet Registries (RIRs) or to affiliates of such organizations
      of network operations guilds (NOGs). The benefit of this approach
      is that there is no single organization that controls the
      database. It allows one database authority to back up another.
      One could envision as many as ten database authorities in this
      scenario. One drawback to this approach, however, is that any
      reference to a region imposes a notion of locality, thus
      potentially diminishing the split between locator and identifier.

   o  Each country runs a database authority. This could occur should
      countries decide to regulate this function. While limiting the
      scope of any single database authority as the previous scenario
      describes, this approach would introduce some overhead as the list
      of database authorities would grow to as many as 200, and possibly
      more if jurisdictions within countries attempted to regulate the
      function. There are two drawbacks to this approach. First, as
      distribution of EIDs is driven to more local jurisdictions, an EID
      prefix is tied even more tightly to a location. Second, a large
      number of database authorities will demand some sort of discovery
      mechanism.

   o  Independent operators manage database authorities. This has the
      appeal of being location independent, and of enabling competition
      for good performance. This method has the drawback of potentially
      requiring a discovery mechanism.

   The latter two approaches are not mutually exclusive.
   While this
   specification allows for multiple databases, discovery mechanisms are
   left as future work.

3.  NERD Format

   The NERD consists of a header that contains a database version and a
   signature that is generated by ignoring the signature field and
   setting the authentication block length to 0 (NULL). The
   authentication block itself consists of a signature and a certificate
   whose private key counterpart was used to generate the signature.

   Records are kept sorted in numeric order with AFI plus EID as primary
   key and mask length as secondary. This is so that after a database
   update it should be possible to reconstruct the database to verify
   the digest signature, which may be retrieved separately from the
   database for verification purposes.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Schema Vers=1 | DB Code       |      Database Name Size       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Database Version                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Old Database Version or 0                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                         Database Name                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       PKCS#7 Block Size       |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |       PKCS#7 Block containing Certificate and Signature       |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                            Database Header

   The DB Code is 0 if what follows is an entire database or 1 if
   what follows is an update. The database file version is incremented
   each time the complete database is generated by the authority.
   In
   the case of an update, the database file version indicates the new
   database file version, and the old database file version is indicated
   in the "Old Database Version" field. The database file version is
   used by routers to determine whether or not they have the most
   current database.

   The database name is a domain name. This is the name that will
   appear in the Subject field of the certificate used to verify the
   database. The purpose of the database name is to allow for more than
   one database. Such databases would be merged by the router. It is
   important that an EID/RLOC mapping be listed in no more than one
   database, lest inconsistencies arise. However, it may be possible to
   transition a mapping from one database to another. During the
   transition period, the mappings MUST be identical. When they are
   not, the resultant behavior will be undefined.

   The PKCS#7 [4] authentication block contains a DER-encoded [5]
   signature and associated public key.

3.1.  NERD Record Format

   As distributed over the network, NERD records appear as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Num. RLOCs    | EID Mask Len  |            EID AFI            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     End point identifier                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Priority 1   |   Weight 1    |             AFI 1             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Routing Locator 1                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Priority 2   |   Weight 2    |             AFI 2             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Routing Locator 2                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Priority 3   |   Weight 3    |             AFI 3             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Routing Locator 3...                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Priority N, Weight N, and AFI N are associated with Routing
   Locator N. There will always be at least one routing locator. The
   minimum record size for IPv4 is 16 bytes. Each additional IPv4 RLOC
   increases the record size by 8 bytes. The purpose of this format is
   to keep the database compact, but somewhat easily read. The meanings
   of weight and priority are described in [1]. The format of the AFI
   is specified by IANA as "Address Family Numbers", with the exception
   of how IPv6 EID prefixes are stored.

   In order to reduce storage and transmission amounts for IPv6, only
   the necessary number of bytes as specified by the prefix length are
   kept in the record, rounded up to the next four-byte (word) boundary.
   For instance, if the prefix length is /49, rounding up to the next
   four-byte word boundary would require that eight bytes be stored.
   IPv6 RLOCs are represented as normal 128-bit IPv6 addresses.

3.2.  Database Update Format

   A database update contains a set of changes to an existing database.
   Each AFI/EID/mask-length tuple may have zero or more RLOCs associated
   with it.
   In the case where there are no RLOCs, the EID entry is
   removed from the database. Records that contain EIDs and mask
   lengths that were not previously listed are simply added. Otherwise,
   the old record for the EID and mask length is replaced by the more
   current information. The record format used by a database update
   is the same as described in Section 3.1.

4.  NERD Distribution Mechanism

4.1.  Initial Bootstrap

   Bootstrap occurs when a router needs to retrieve the entire database.
   A router needs the entire database either because it has none or
   because the update it would need is too substantial to process, as
   might be the case if the router has been out of service for a lengthy
   period of time.

   To bootstrap, the ITR appends the database name plus
   "/current/entiredb" to a Base Distribution URI and retrieves the file
   via HTTP. For example, if the configured URI is
   "http://www.example.com/eiddb/", and assuming a database name of
   "nerd.arin.net", the ITR would request
   "http://www.example.com/eiddb/nerd.arin.net/current/entiredb".
   Routers MUST check the signature on the database prior to installing
   it, and MUST check that the database schema matches a schema they
   understand. Once a router has a valid database it MUST store that
   database in some sort of non-volatile memory (e.g., disk, flash
   memory, etc).

   N.B., the host component for such URIs MUST NOT resolve to a LISP
   EID, lest a circular dependency be created.

4.2.  Retrieving Changes

   In order to retrieve a set of database changes an ITR will have
   previously retrieved the entire database. Hence it knows the current
   version of the database it has. Its first step for retrieving
   changes is to retrieve the current version of the database. It does
   so by appending "current/version" to the base distribution URI and
   retrieving the file.
   Its format is text and it contains the integer
   value of the current database version.

   Once an ITR has retrieved the current version it compares that
   version with the version of its local copy. If there is no
   difference, then the router is up to date and need take no further
   actions until it next checks.

   If the versions differ, the router next sends a request for the
   appropriate change file by appending "current/changes/" and the
   textual representation of the version of its local copy of the
   database to the base distribution URI. For example, if the current
   version of the database is 1105503 and the router's version is
   1105500, and the base URI and database name are the same as above,
   the router would request
   "http://www.example.com/eiddb/nerd.arin.net/current/changes/1105500".

   The server may not have that change file, either because there are
   too many versions between what the router has and what is current, or
   because no such change file was generated. If the server has changes
   from the router's version to any later version, the server SHOULD
   issue an HTTP redirect to that change file, and the router SHOULD
   retrieve and process it. Once it has done so, the router should then
   repeat the process until it has brought itself up to date. It is
   thus important for servers to expire old change files in the order in
   which they were generated.

   By way of convention, it is suggested that the URIs issued in
   redirects be of the following form:

   {base dist. URI}/{dbname}/{more-recent-version}/changes/
   {older-version}

   where "base dist. URI" is the base distribution URI, "dbname" is the
   name of the database, and each version is the textual representation
   of the integer version value.
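
   As a non-normative illustration, the change-file and redirect URI
   forms described above can be sketched in Python. The function names
   are hypothetical and not part of this specification, and the sketch
   assumes that the database name is placed directly after the base
   distribution URI, as in the examples in this section.

```python
def change_uri(base_uri: str, dbname: str, local_version: int) -> str:
    """URI an ITR requests to move forward from its local database version."""
    return f"{base_uri.rstrip('/')}/{dbname}/current/changes/{local_version}"


def redirect_uri(base_uri: str, dbname: str, newer: int, older: int) -> str:
    """Suggested form of the URI a server issues in a redirect."""
    return f"{base_uri.rstrip('/')}/{dbname}/{newer}/changes/{older}"


def next_request(base_uri: str, dbname: str, local_version: int,
                 current_version: int):
    """Return the change-file URI to fetch, or None if the ITR is up to date."""
    if local_version == current_version:
        return None
    return change_uri(base_uri, dbname, local_version)


# Assumed example values, taken from the examples in this section.
BASE = "http://www.example.com/eiddb/"
DB = "nerd.arin.net"

print(next_request(BASE, DB, 1105500, 1105503))
print(next_request(BASE, DB, 1105503, 1105503))
```

   The first call prints the change-file URI for a router at version
   1105500; the second prints None because the versions match.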

   For example, if the current database version was 1105503 and a router
   made a request for
   "http://www.example.com/eiddb/nerd.arin.net/current/changes/1105400"
   but there was no change file from 1105400 to 1105503, and the server
   had a group of change files to make the router current, it would
   issue a redirect to
   "http://www.example.com/eiddb/nerd.arin.net/1105450/changes/1105400"
   that the router would then process. The router would then make a
   request for
   "http://www.example.com/eiddb/nerd.arin.net/current/changes/1105450"
   that the server would have.

   While it is unlikely that database versions would wrap, as they
   consist of 32-bit integers, should the event occur, ITRs MUST
   attempt first to retrieve a change file when their current version
   number is within 10,000 of 2^32 and they see a version available that
   is less than 10,000. Barring the availability of a change file, the
   ITR MUST still assume that the database version has wrapped and
   retrieve a new copy.

5.  Analysis

   We will start our analysis by looking at how much data will be
   transferred to a router during bootstrap conditions. We will then
   look at the bandwidth required. Next we will turn our concerns to
   servers. Finally we will ponder the effect of providing only
   changes.

   In the analysis below we treat the overhead of the database header as
   insignificant (because it is). The analysis should be similar,
   whether a single database or multiple databases are employed, as we
   would assume that no entry would appear more than once.

5.1.  Database Size

   By its very nature the information to be transported is relatively
   static and is specifically designed to be topologically insensitive.
   That is, every ITR is intended to have the same set of RLOCs for a
   given EID.
   While some processing power will be necessary to install
   a table, the amount required should be far less than that of a
   routing information database because the level of entropy is intended
   to be lower.

   For purposes of this analysis, we will assume that the world has
   migrated to IPv6, as this increases the size of the database, which
   would be our primary concern. However, to mitigate the size
   increase, we have limited the size of the prefix transmitted. For
   purposes of this analysis, we shall assume an average prefix length
   of 64 bits.

   Based on that assumption, Section 3.1 states that mapping information
   for each EID/Prefix includes a group of RLOCs, each with an
   associated priority and weight, and that a minimum record size with
   IPv6 EIDs with at least one RLOC is 30 bytes uncompressed. Each
   additional IPv6 RLOC costs 20 bytes.

   +-----------+--------+--------+---------+
   | 10^n EIDs | 2 RLOC | 4 RLOC | 8 RLOC  |
   +-----------+--------+--------+---------+
   | 4         | 500 KB | 900 KB | 1.70 MB |
   | 5         | 5.0 MB | 9.0 MB | 17.0 MB |
   | 6         | 50 MB  | 90 MB  | 170 MB  |
   | 7         | 500 MB | 900 MB | 1.70 GB |
   | 8         | 5.0 GB | 9.0 GB | 17.0 GB |
   +-----------+--------+--------+---------+

   Database size for IPv6 routes with average prefix length = 64 bits

                                Table 1

   Entries in the above table are derived as follows:

      E * (30 + 20 * (R - 1))

   where E = number of EIDs (10^n), R = number of RLOCs per EID.

   Our scaling target is to accommodate 10^8 multihomed systems, which
   is one order of magnitude greater than what is discussed in [12]. At
   10^8 entries, a device could be expected to use between 5 and 17
   gigabytes of RAM for the mapping. No matter the method of
   distribution, any router that sits in the core of the Internet would
   require near this amount of memory in order to perform the ITR
   function.
   Large enterprise ETRs would be similarly strained, simply
   due to the diversity of sites that communicate with one another.
   The good news is that this is not our starting point, but rather our
   scaling target, a number that we intend to reach by the year 2050.
   Our starting point is more likely in the neighborhood of 10^4 or 10^5
   EIDs, thus requiring between 500 KB and 17 MB.

5.2.  Router Throughput Versus Time

   +-------------------+---------+---------+----------+--------+
   | Size (10^N bytes) | 1 Mb/s  | 10 Mb/s | 100 Mb/s | 1 Gb/s |
   +-------------------+---------+---------+----------+--------+
   | 6                 | 8       | 0.8     | 0.08     | 0.008  |
   | 7                 | 80      | 8       | 0.8      | 0.08   |
   | 8                 | 800     | 80      | 8        | 0.8    |
   | 9                 | 8,000   | 800     | 80       | 8      |
   | 10                | 80,000  | 8,000   | 800      | 80     |
   | 11                | 800,000 | 80,000  | 8,000    | 800    |
   +-------------------+---------+---------+----------+--------+

                     Number of seconds to process NERD

                                  Table 2

   The length of time it takes to process the database is significant in
   models where the device acquires the entire table.  During this
   period of time, either the router will be unable to route packets
   using LISP or it must use some sort of query mechanism for specific
   EIDs as it populates the rest of its table through the transfer.
   Table 2 shows us that at our scaling target (on the order of 10^10
   bytes), the length of time it would take for a router using 1 Mb/s of
   bandwidth is about 80,000 seconds, or roughly 22 hours.  We can
   measure the processing time in small numbers of hours for any
   transfer speed greater than that.  At the fastest rate shown, 1 Gb/s,
   it takes 8 seconds to process an entire table of 10^9 bytes and 80
   seconds for 10^10 bytes.

5.3.  Number of Servers Required

   As easy as it may be for a router to retrieve, the aggregate
   information may be difficult for servers to transmit, assuming the
   information is transmitted in aggregate (we'll revisit that
   assumption later).

   +----------------+------------+-----------+------------+------------+
   | # Simultaneous | 10 Servers | 100       | 1,000      | 10,000     |
   | Requests       |            | Servers   | Servers    | Servers    |
   +----------------+------------+-----------+------------+------------+
   | 100            | 720        | 72        | 72         | 72         |
   | 1,000          | 7,200      | 720       | 72         | 72         |
   | 10,000         | 72,000     | 7,200     | 720        | 72         |
   | 100,000        | 720,000    | 72,000    | 7,200      | 720        |
   | 1,000,000      | 7,200,000  | 720,000   | 72,000     | 7,200      |
   | 10,000,000     | 72,000,000 | 7,200,000 | 720,000    | 72,000     |
   +----------------+------------+-----------+------------+------------+

    Retrieval time per number of servers in seconds.  Assumes average
   10^8 entries with 4 RLOCs per EID and that each server has access to
    1 Gb/s and 100% efficient use of that bandwidth and no compression.

                                  Table 3

   Entries in the above table were generated using the following method:

   For 10^8 entries with four RLOCs per EID, the table size is 9.0 GB,
   per our previous table.  Assume 1 Gb/s transfer rates and 100%
   utilization.  Protocol overhead is ignored for this exercise.  Hence
   a single transfer X takes 72 seconds (9.0 GB * 8 bits per byte at
   1 Gb/s) and can get no faster.

   With this in mind, each entry is as follows:

      max(X, N*X/S)

   where N = number of transfers, X = 72 seconds,
   and S = number of servers.

   If we have a distribution model in which every device must retrieve
   the mapping information upon start, Table 3 shows the length of time
   in seconds it will take for a given number of servers to complete a
   transfer to a given number of devices.  This table says, as an
   example, that it would take 72,000 seconds (20 hours) for one million
   ITRs to simultaneously retrieve the database from one thousand
   servers.  Should a cold start scenario occur, this number should be
   of some concern.  Hence it is important to take some measures both to
   avoid such a scenario, and to ease the load should it occur.
   The primary defense should be for ITRs to first attempt to retrieve
   their databases from their peers or upstream providers.  Secondary
   defenses could include data sanity checks within ITRs, with agreed
   norms for how much the database should change in any given update or
   over any given period of time.  As we will see below, dissemination
   of changes involves considerably less volume.

   +----------------+-------------+---------------+----------------+
   | % Daily Change | 100 Servers | 1,000 Servers | 10,000 Servers |
   +----------------+-------------+---------------+----------------+
   | 0.1%           | 300         | 30            | 3              |
   | 0.5%           | 1,500       | 150           | 15             |
   | 1%             | 3,000       | 300           | 30             |
   | 5%             | 15,000      | 1,500         | 150            |
   | 10%            | 30,000      | 3,000         | 300            |
   +----------------+-------------+---------------+----------------+

     Assuming 10 million routers and a database size of 9 GB, resulting
    hourly transfer times are shown in seconds, given number of servers
                         and daily rate of change.

                                  Table 4

   This table shows us that with 10,000 servers the average transfer
   time with 1 Gb/s links for 10,000,000 routers will be 300 seconds
   with 10% daily change spread over 24 hourly updates.  For a 0.1%
   daily change, that number is 3 seconds for a database of size 9.0 GB.

   The amount of change goes to the purpose of LISP.  If its purpose is
   to provide effective multihoming support to end customers, then we
   might anticipate relatively few changes.  If, on the other hand,
   service providers attempt to make use of LISP to provide some form of
   traffic engineering, we can expect the same data to change more
   often.  We can probably not conclude much in this regard without
   additional operational experience.  The one thing we can say is that
   different applications of the LISP protocol may require new and
   different distribution mechanisms.  Such optimization is left for
   another day.

5.4.  Security Considerations

   Whatever the answer to our previous question, we must consider the
   security of the information being transported.  If an attacker can
   forge an update or tamper with the database, he can in effect
   redirect traffic to end sites.  Hence, integrity and authenticity of
   the NERD is critical.  In addition, a means is required to determine
   whether a source is authorized to modify a given database.  No data
   privacy is required.  Quite to the contrary, this information will be
   necessary for any ITR.

   The first question one must ask is who to trust to provide the ITR a
   mapping.  Ultimately the owner of the EID prefix is most
   authoritative for the mapping to RLOCs.  However, were all owners to
   sign all such mappings, ITRs would need to know which owner is
   authorized to modify which mapping, creating a problem of O(N^2)
   complexity.

   We can reduce this problem substantially by investing some trust in a
   small number of entities that are allowed to sign entries.  If an
   authority manages EIDs much the same way a domain name registrar
   handles domains, then the owner of the EID would choose a database
   authority she or he trusts, and ITRs must trust each such authority
   in order to map the EIDs listed by that authority to RLOCs.  This
   reduces the amount of management complexity on the ETR to retaining
   knowledge of O(#authorities), but does require that each authority
   establish procedures for authenticating the owner of an EID.  Those
   procedures needn't be the same.

   There are two classic methods to ensure integrity of data:

   o  secure transport of the source of the data to the consumer, such
      as Transport Layer Security (TLS) [11]; and
   o  provide object level security.

   These methods are not mutually exclusive, although one can argue
   about the need for the former, given the latter.

   In the case of TLS, when it is properly implemented, the objects
   being transported cannot easily be modified by interlopers or
   so-called men in the middle.  When data objects are distributed to
   multiple servers, each of those servers must be trusted.  As we have
   seen above, we could have quite a large number of servers, thus
   providing an attacker a large number of targets.  We conclude that
   some form of object level security is required.

   Object level security involves an authority signing an object in a
   way that can easily be verified by a consumer, in this case a router.
   Here we would want the mapping table and any incremental
   update to be signed by the originator of the update.  This implies
   that we cannot simply make use of a tool like CVS [13].  Instead, the
   originator will want to generate diffs, sign them, and make them
   available either directly or through some sort of content
   distribution or peer-to-peer network.

5.4.1.  Use of Public Key Infrastructures (PKIs)

   X.509 provides a certificate hierarchy that has scaled to the size of
   the Internet.  The system is most manageable when there are few
   certificates to manage.  The model proposed in this memo makes use of
   one current certificate per database authority.  The three pieces of
   information necessary to verify a signature, therefore, are as
   follows:

   o  the certificate of the database authority, which can be provided
      along with the database;
   o  the certificate authority's certificate; and
   o  a table of database names and distinguished names (DNs) that are
      allowed to update them.

   The latter two pieces of information must be very well known and must
   be configured on each ITR.  It is expected that both would change
   very rarely, and it would not be unreasonable for such updates to
   occur as part of a normal OS release process.

   The tools for both signing and verifying are readily available.
   OpenSSL [21] provides tools and libraries for both signing and
   verifying.  Other tools commonly exist.

   Use of PKIs is not without implementation complexity, operational
   complexity, or risk.  The following risks and mitigations are
   identified with NERD's use of PKIs:

   If a NERD database authority private key is exposed:

      In this case an attacker could sign a false database update,
      either redirecting traffic, or otherwise causing havoc.  In
      response, the NERD database administrator must revoke its existing
      key and issue a new one.  The certificate is added to a
      certificate revocation list (CRL), which may be distributed with
      both this and other databases, as well as through other channels.
      Because this event is expected to be rare, and the number of
      database authorities is expected to be small, a CRL will be small.
      When a router receives a revocation, it checks it against its
      existing databases, and attempts to update the one that is
      revoked.  This implies that prior to issuing the revocation, the
      database authority MUST sign an update with the new key.  Routers
      SHOULD discard updates they have already received that were signed
      after the revocation was generated.  If a router cannot confirm
      whether the authority's certificate was revoked before or after a
      particular update, it MUST retrieve a fresh copy of the database
      with a valid signature.

   The private key associated with the CA that signed the Authority's
   certificate is compromised:

      In this case, it becomes possible for an attacker to masquerade as
      the database authority.  To ameliorate damage, the database
      authority SHOULD revoke its certificate and get a new certificate
      issued from a CA that is not compromised.  Once it has done so,
      the previous procedure is followed.  The compromised certificate
      can be removed during the normal operating system upgrade cycle.

   An algorithm used in either the certificate or the signature is
   cracked:

      This is a catastrophic failure and the above forms of attack
      become possible.  The only mitigation is to make use of a new
      algorithm.  In theory this should be possible, but in practice it
      has proved very difficult.  For this reason, additional work is
      recommended to make alternative algorithms available.

   The Database Authority loses its key or disappears:

      In this case nobody can update the existing database.  There are
      few programmatic mitigations.  If the database authority places
      its private keys and suitable amounts of information in escrow,
      then under agreed upon circumstances, such as no updates for three
      days, the escrow agent would release the information to a party
      capable of generating a database update.

5.4.2.  Other Risks

   Because this specification does not require secure transport, if an
   attacker prevents updates to an ITR for the purposes of having that
   ITR continue to use a compromised ETR, the ITR could continue to use
   an old version of the database without realizing a new version has
   been made available.  If one is worried about such an attack, a
   secure channel such as SSL to a secure chain back to the database
   authority should be used.  It is possible that after some operational
   experience, later versions of this format will contain additional
   semantics to address this attack.

   As discussed above, a substantial risk would be a cold start
   scenario.  If an attacker found a bug in a common operating system
   that allowed it to erase an ITR's database, and was able to
   disseminate that bug, the collective ability of ITRs to retrieve new
   copies of the database could be taxed by collective demand.  The
   remedy to this is for devices to share copies of the database with
   their neighbors, thus making each potential requester a potential
   service.

6.  Why not use XML?

   Many objects these days are distributed as either XML pages or
   something derived from XML [16], such as SOAP [17],[18].  Use of such
   well known standards allows for high level tools and library reuse.
   XML's strength is extensibility.  Without a doubt XML would be more
   extensible than a fixed field database.  Why not, then, use these
   standards in this case?  The greatest concern the author had was
   compactness of the data stream.  Inasmuch as this mechanism is used
   at all in the future, so long as that concern could be addressed, and
   so long as signatures of the database can be verified, XML probably
   should be considered.

7.  Other Distribution Mechanisms

   We now consider various different mechanisms.  The problem of
   distributing changes in various databases is as old as databases.
   The author is aware of two obvious approaches that have been well
   used in the past.  One approach would be the wide distribution of CVS
   repositories.  However, for reasons mentioned in the previous
   section, CVS is insufficient to the task.

   The other tried and true approach is the use of periodic updates in
   the form of messages.  Good old NNTP [7] itself provides two separate
   mechanisms (one push and another pull) to provide a coherent update
   process.  This was in fact used to update molecular biology databases
   [14] in the early 1990s.  Netnews offers a way to determine whether
   articles with specified Article-Ids have been received.  In the case
   where the mapping file source of authority wishes to transmit
   updates, it can sign a change file and then post it into the network.
   Routers merely need to keep a record of the article ids that they
   have received.  Years ago, Netnews systems already handled a far
   greater volume of traffic than we envision [22].  Initially this is
   probably overkill, but it may not be so later in this process.
   Some consideration should be given to a mechanism known to widely
   distribute vast amounts of data, as instantaneously as either the
   sender or the receiver wishes.

   To attain an additional level of hierarchy in the distribution
   network, service providers could retrieve information to their own
   local servers, and configure their routers with the host portion of
   the above URI.

   Another possibility would be for providers to establish an agreement
   on a small set of anycast addresses for use for this purpose.  There
   are limitations to the use of anycast, particularly with TCP.  In the
   midst of a routing flap an anycast address can become all but
   unusable.  Careful study of such a use as well as appropriate use of
   HTTP redirects is expected.

7.1.  What About DNS as a retrieval model?

   It has been proposed that a query/response mechanism be used for this
   information, and that specifically the domain name system (DNS) [8]
   be used.  The previous models do not preclude the DNS.  DNS has the
   advantage that the administrative lines are well drawn, and that the
   ID/RLOC mapping is likely to appear very close to these boundaries.
   DNS also has the added benefit that an entire distribution
   infrastructure already exists.  There are, however, some problems
   that could impact end hosts when intermediate routers make queries,
   some of which were first pointed out in [9]:

   o  Any query mechanism offers an opportunity for a resource attack if
      an attacker can force the ITR to query for information.  In this
      case, all that would be necessary would be for a "botnet" (a group
      of computers that have been compromised and used as vehicles to
      attack others) to ping or otherwise contact via some normal
      service hosts that sit behind the ETR.
      If the botnet hosts
      themselves are behind ETRs, the victim's ITR will need to query
      for each and every one of them, thus becoming part of a classic
      reflector attack.
   o  Packets will be delayed at the very least, and probably dropped in
      the process of a mapping query.  This could be at the beginning of
      a communication, but it will be impossible for a router to
      conclude with certainty that this is the case.
   o  The DNS has a backoff algorithm that presumes that applications
      are making queries prior to the beginning of a communication.
      This is appropriate for end hosts who know in fact when a
      communication begins.  An end user may not enjoy a router waiting
      seconds for a retry.
   o  While the administrative lines may appear to be correct, the
      location of name servers may not be.  If name servers sit within
      PI address space, thus requiring LISP to reach, a circular
      dependency is created.  This is precisely where many enterprise
      name servers sit.  The LISP experiment should not predicate its
      success on relocation of such name servers.

   Nevertheless, DNS may be able to play a role in providing the
   enterprise control over the mapping of its EIDs to RLOCs.  Posit a
   new DNS record "EID2RLOC".  This record is used by the authority to
   collect and aggregate mapping information so that it may be
   distributed through one of the other mechanisms.  As an example:

      $ORIGIN 0.10.PI-SPACE.
      128    EID2RLOC mask 23 priority 10 weight 5 172.16.5.60
             EID2RLOC mask 23 priority 15 weight 5 192.168.1.5

   In the above figure network 10.0.128/23 would be delegated to some
   end system, say EXAMPLE.COM.  They would manage the above zone
   information.  This would allow a DNS mechanism to work, but it would
   also allow someone to aggregate the information and distribute a
   table.

7.2.  Use of BGP and LISP+ALT

   Border Gateway Protocol (BGP) [10] is currently used to distribute
   inter-domain routing throughout the Internet.  Why not, then, use BGP
   to distribute mapping entries, or provide a rendezvous mechanism to
   initialize mapping entries?  In fact this is precisely what LISP+ALT
   [19] accomplishes, using a completely separate topology from the
   normal DFZ.  It does so using existing code paths and expertise.  The
   alternate topology also provides an extremely accurate control path
   from ITRs to ETRs, whereas NERD's operational model requires an
   optimistic assumption and control plane functionality to cycle
   through unresponsive ETRs in an EID prefix's mapping entry.  The
   memory scaling characteristics of LISP+ALT are extremely attractive
   because of expected strong aggregation, whereas NERD makes almost no
   attempt at aggregation.

   A number of key deployment issues are left open.  The principal issue
   is whether it is deemed acceptable for routers to drop packets
   occasionally while mapping information is being gathered.  This
   should be the subject of future research for ALT, as it was a key
   design goal of NERD to avoid such a situation.

7.3.  Perhaps use a hybrid model?

   Perhaps it would be useful to use both a prepopulated database such
   as NERD and a query mechanism (perhaps LISP+ALT, LISP-CONS [20], or
   DNS) to determine an EID/RLOC mapping.  One idea would be to receive
   a subset of the mappings, say, by taking only the NERD for certain
   regions.  This alleviates the need to drop packets for some subset of
   destinations under the assumption that one's business is localized to
   a particular region.  If one did not have a local entry for a
   particular EID one would then make a query.
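
   The lookup order just described (local NERD entry first, query only
   on a miss) can be sketched as follows.  This is a minimal
   illustration and not part of the specification: the class name
   HybridMapper, the query callback, and the use of documentation
   addresses are all assumptions made for the example, and a real ITR
   would use a proper longest-prefix-match structure rather than a
   linear scan.

```python
import ipaddress


class HybridMapper:
    """Hypothetical EID-to-RLOC resolver: local NERD subset first, then a query."""

    def __init__(self, local_table, query):
        # local_table maps EID prefix strings to lists of RLOC strings,
        # standing in for a regional NERD that has already been verified.
        self.local = {
            ipaddress.ip_network(prefix): rlocs
            for prefix, rlocs in local_table.items()
        }
        # query is any callable taking an EID string and returning RLOCs;
        # it stands in for LISP+ALT, LISP-CONS, or DNS.
        self.query = query

    def lookup(self, eid):
        addr = ipaddress.ip_address(eid)
        # Collect every local prefix of the same address family covering the EID.
        matches = [
            net for net in self.local
            if net.version == addr.version and addr in net
        ]
        if matches:
            # The most specific (longest) prefix wins.
            best = max(matches, key=lambda net: net.prefixlen)
            return self.local[best], "nerd"
        # No local entry: fall back to the query mechanism.
        return self.query(eid), "query"
```

   A caller would construct the mapper once per database load and call
   lookup() per destination; the second element of the result only
   records which path answered, which may be useful when measuring how
   often the query mechanism is actually needed.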

   One approach to using DNS to query live would be to periodically walk
   "interesting" portions of the network, in search of relevant records,
   and caching them to non-volatile storage.  While preventing resource
   attacks, the walk itself could be viewed as an attack, if the
   algorithm was not selective enough about what it thought was
   interesting.  A similar approach could be applied to LISP+ALT or
   LISP-CONS by forcing a data-driven Map Reply for certain sites.

8.  Deployment Issues

   While LISP and NERD are intended as experiments at this point, it is
   already obvious one must give serious consideration to circular
   dependencies with regard to the protocols used and the elements
   within them.

8.1.  HTTP

   In as much as HTTP depends on DNS, either due to the authority
   section of a URI, or due to the configured base distribution URI,
   these same concerns apply.  In addition, any HTTP server that itself
   makes use of provider independent addresses would be a poor choice to
   distribute the database for these exact same reasons.

   One issue with using HTTP is that it is possible that a middlebox of
   some form, such as a cache, may intercept and process requests.  In
   some cases this might be a good thing.  For instance, if a cache
   correctly returns a database, some amount of bandwidth is conserved.
   On the other hand, if the cache itself fails to function properly for
   whatever reason, end to end connectivity could be impaired.  For
   example, if the cache itself depended on the mapping being in place
   and functional, a cold start scenario might leave the cache
   functioning improperly, in turn providing routers no means to update
   their databases.  Some care must be given to avoid such
   circumstances.

9.  Open Questions

   Do we need to discuss reachability in more detail?  This was clearly
   an issue at the IST-RING workshop.  There are two key issues.
   First, what is the appropriate architectural separation between the
   data plane and the control plane?  Second, is there some specific way
   in which NERD impacts the data plane?

   Should we specify a (perhaps compressed) tarball that treads a middle
   ground for the last question, where each update tarball contains both
   a signature for the update and for the entire database, once the
   update is applied?

   Should we compress?  In some initial testing of databases with 1, 5,
   and 10 million IPv4 EIDs and a random distribution of IPv4 RLOCs, the
   current format in this document compresses by between 35% and 36%
   using the Burrows-Wheeler block sorting text compression algorithm
   (bzip2).  The NERD used random EIDs with mask lengths varying from
   19-29, with probability weighted toward the smaller masks.  This only
   very roughly reflects reality.  A better test would be to start with
   the existing prefixes found in the DFZ.

10.  Conclusions

   This memo has specified a database format, an update format, a URI
   convention, an update method, and a validation method for EID/RLOC
   mappings.  We have shown that even at the predicted 10^8 EID-prefix
   entries, the aggregate database size would likely be at most 17 GB.
   We have considered the number of servers needed to distribute that
   information and we have demonstrated the limitations of a simple
   content distribution network and other well known mechanisms.  The
   effort required to retrieve a database change amounts to between 3
   and 30 seconds of processing time per hour at today's gigabit
   speeds.  We conclude that there is no need for an off box query
   mechanism today, and that there are distinct disadvantages for having
   such a mechanism in the control plane.
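
   The figures summarized above can be rechecked with a few lines of
   arithmetic.  The sketch below simply restates the formulas given in
   Section 5; the function names are invented for this example, decimal
   units are assumed (1 GB = 10^9 bytes, 1 Gb/s = 10^9 bits per second),
   and protocol overhead and compression are ignored, as in the
   analysis.

```python
def database_bytes(eids, rlocs_per_eid):
    """Table 1: 30 bytes for an IPv6 EID with one RLOC, 20 per extra RLOC."""
    return eids * (30 + 20 * (rlocs_per_eid - 1))


def transfer_seconds(size_bytes, link_bits_per_second):
    """Time to move size_bytes over a fully utilized link."""
    return size_bytes * 8 / link_bits_per_second


def cold_start_seconds(requests, servers, single_transfer):
    """Table 3: max(X, N*X/S), where X is the time for one full transfer."""
    return max(single_transfer, requests * single_transfer / servers)


def hourly_update_seconds(daily_change, routers, servers, size_bytes, link_bps):
    """Table 4: the daily change is spread evenly over 24 hourly updates."""
    per_router = transfer_seconds(size_bytes * daily_change / 24, link_bps)
    return per_router * routers / servers


if __name__ == "__main__":
    GBIT = 10**9
    size = database_bytes(10**8, 4)        # scaling target, 4 RLOCs per EID
    x = transfer_seconds(size, GBIT)       # one full transfer at 1 Gb/s
    print("database size (bytes):", size)
    print("single transfer (s):", x)
    print("1,000,000 ITRs, 1,000 servers (s):",
          cold_start_seconds(10**6, 1000, x))
    print("10% daily change, 10,000 servers (s):",
          round(hourly_update_seconds(0.10, 10**7, 10**4, size, GBIT), 3))
```

   Keeping the formulas in one place makes it straightforward to vary
   the assumptions, for example the number of RLOCs per EID or the
   link speed, without rebuilding each table by hand.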

   Beyond this we have examined alternatives that allow for hybrid
   models that do use query mechanisms, should our operating assumptions
   prove overly optimistic.  Use of NERD today does not foreclose use of
   such models in the future, and in fact both models can happily
   co-exist.

   We leave to future work how the list of databases is distributed, how
   BGP can play a role in distributing knowledge of the databases, and
   how DNS can play a role in aggregating information into these
   databases.

   We also leave to future work whether HTTP is the best protocol for
   the job, and whether the scheme described in this document is the
   most efficient.  One could easily envision that when applied in high
   delay or high loss environments, a broadcast or multicast method may
   prove more effective.

   Speaking of multicast, we also leave to future work how multicast is
   implemented, if at all, either in conjunction or as an extension to
   this model.

11.  IANA Considerations

   This memo makes no requests of IANA.

12.  Acknowledgments

   Dino Farinacci, Patrik Faltstrom, Dave Meyer, Joel Halpern, Dave
   Thaler, Mohamed Boucadair, Robin Whittle, Max Pritikin, and Scott
   Brim were very helpful with their reviews of this work.  Thanks also
   to the participants of the Routing Research Group and the IST-RING
   workshop held in Madrid in December of 2007 for their incisive
   comments.  The astute will notice a lengthy References section.  This
   work stands on the shoulders of many others' efforts.

13.  References

13.1.  Normative References

   [1]   Farinacci, D., Fuller, V., Oran, D., and D. Meyer, "Locator/ID
         Separation Protocol (LISP)", draft-farinacci-lisp-07 (work in
         progress), April 2008.

   [2]   Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L.,
         Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol --
         HTTP/1.1", RFC 2616, June 1999.

   [3]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
         Levels", BCP 14, RFC 2119, March 1997.

   [4]   Kaliski, B., "PKCS #7: Cryptographic Message Syntax Version
         1.5", RFC 2315, March 1998.

   [5]   International Telecommunications Union, "Information technology
         - Open Systems Interconnection - The Directory: Public-key and
         attribute certificate frameworks", ITU-T Recommendation X.509,
         ISO Standard 9594-8, March 2000.

   [6]   Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
         Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986,
         January 2005.

13.2.  Informational References

   [7]   Kantor, B. and P. Lapsley, "Network News Transfer Protocol",
         RFC 977, February 1986.

   [8]   Mockapetris, P., "Domain names - concepts and facilities",
         STD 13, RFC 1034, November 1987.

   [9]   Huitema, C., "An Experiment in DNS Based IP Routing", RFC 1383,
         December 1992.

   [10]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway Protocol 4
         (BGP-4)", RFC 4271, January 2006.

   [11]  Dierks, T. and E. Rescorla, "The Transport Layer Security (TLS)
         Protocol Version 1.1", RFC 4346, April 2006.

   [12]  Carpenter, B., "IETF Plenary Presentation: Routing and
         Addressing: Where we are today", March 2007.

   [13]  Grune, R., Baalbergen, E., Waage, M., Berliner, B., and J.
         Polk, "CVS: Concurrent Versions System", November 1985.

   [14]  Smith, R., Gottesman, Y., Hobbs, B., Lear, E., Kristofferson,
         D., Benton, D., and P. Smith, "A mechanism for maintaining an
         up-to-date GenBank database via Usenet", CABIOS, April 1991.

   [15]  International Telephone and Telegraph Consultative Committee,
         "Information Technology - Open Systems Interconnection - The
         Directory: Authentication Framework", CCITT Recommendation
         X.509, November 1988.

   [16]  Maler, E., Paoli, J., Yergeau, F., Cowan, J., Bray, T., and C.
         Sperberg-McQueen, "Extensible Markup Language (XML) 1.1", World
         Wide Web Consortium FirstEdition REC-xml11-20040204,
         February 2004.

   [17]  Nielsen, H., Hadley, M., Karmarkar, A., Lafon, Y., Mendelsohn,
         N., Moreau, J., and M. Gudgin, "SOAP Version 1.2 Part 1:
         Messaging Framework (Second Edition)", World Wide Web
         Consortium Recommendation REC-soap12-part1-20070427,
         April 2007.

   [18]  Mendelsohn, N., Karmarkar, A., Lafon, Y., Hadley, M., Gudgin,
         M., Nielsen, H., and J. Moreau, "SOAP Version 1.2 Part 2:
         Adjuncts (Second Edition)", World Wide Web Consortium
         Recommendation REC-soap12-part2-20070427, April 2007.

   [19]  Farinacci, D., "LISP Alternative Topology (LISP+ALT)",
         draft-fuller-lisp-alt-02 (work in progress), April 2008.

   [20]  Brim, S., "LISP-CONS: A Content distribution Overlay Network
         Service for LISP", draft-meyer-lisp-cons-04 (work in progress),
         April 2008.

URIs

   [21]

   [22]

Appendix A.  Generating and verifying the database signature with
             OpenSSL

   As previously mentioned, one goal of NERD was to use off-the-shelf
   tools to both generate and retrieve the database.  To many, PKI is
   magic.  This section is meant to provide at least some clarification
   as to both the generation and verification process, complete with
   command line examples.  Not included is how you get the entries
   themselves.  We'll assume they exist, and that you're just trying to
   sign the database.

   To sign the database, you first need a database file that has the
   database header described in Section 3.  Block size should be
   zero, and there should be no PKCS#7 block at this point.  You also
   need a certificate and its private key with which you will sign the
   database.

   The OpenSSL "smime" command contains all the functions we need from
   this point forth.
   To sign the database, issue the following command:

      openssl smime -binary -sign -outform DER -signer yourcert.crt \
              -inkey yourcert.key -in database-file -out signature

   -binary states that no MIME canonicalization should be performed.
   -sign indicates that you are signing the file that was given as the
   argument to -in.  The output format (-outform) is binary DER, and
   your public certificate is provided with -signer along with your key
   with -inkey.  The file that receives the signature is specified with
   -out.

   The resulting file "signature" is then copied into the PKCS#7 block
   in the database header, its size in bytes is recorded in the PKCS#7
   block size field, and the resulting file is ready for distribution to
   ITRs.

   To verify a database file, first retrieve the PKCS#7 block from the
   file by copying the appropriate number of bytes into another file,
   say "signature".  Next, zero this field, and set the block size field
   to 0.  Next use the "smime" command to verify the signature as
   follows:

      openssl smime -binary -verify -inform DER -content database-file \
              -out /dev/null -in signature

   OpenSSL will return "Verification OK" if the signature is correct.
   OpenSSL provides sufficiently rich libraries to accomplish the above
   within the C programming language with a single pass.

Appendix B.  Changes

   This section to be removed prior to publication.

   o  04: Analysis change: IPv6 RLOCs are 128 bits.  While they can be
      shortened to 64 bits, that involves substantial ETR changes and
      expenditure of IPv6 networks, which is probably unnecessary, and
      can be left as a later optimization.  Added an option of
      independent operators.  Processed all but two of Dino's comments.
      Addressed Scott's comments.  Removed existing work analysis.
      Saving that for another day.  Clarified OpenSSL Appendix.
   o  05: clean DOWN. reinsert some text for historical purposes.
   o  04: cleanup
   o  03: Change dbname to a domain name, indicate that is what is in
      the subject of the X.509 certificate, and list editorial changes,
      update acknowledgments.
   o  02: Incorporate some of Dave Thaler's comments.  Add
      authentication block detail.  Modify analysis to take IPv6 into
      account, along with a more realistic number of RLOCs per EID.  Add
      some comments about potential risks of a cold start.  Add S/MIME
      example as appendix A and take out old ToDo.  Provide some amount
      of compression of IPv6 addresses by limiting their size to
      significant bytes rounded to a four byte word boundary.
   o  01: Massive spelling correction, URI example correction.
   o  00: Initial Revision.

Author's Address

   Eliot Lear
   Cisco Systems GmbH
   Glatt-com
   Glattzentrum, ZH  CH-8301
   Switzerland

   Phone: +41 44 878 7525
   Email: lear@cisco.com