idnits 2.17.1

draft-lear-lisp-nerd-09.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 2) being 60 lines

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** There are 2 instances of too long lines in the document, the longest one
     being 42 characters in excess of 72.

  == There are 1 instance of lines with non-RFC2606-compliant FQDNs in the
     document.

  == There are 2 instances of lines with private range IPv4 addresses in the
     document. If these are generic example addresses, they should be changed
     to use any of the ranges defined in RFC 6890 (or successor): 192.0.2.x,
     198.51.100.x or 203.0.113.x.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 109 has weird spacing: '...instead learn...'

  == Line 760 has weird spacing: '...thority manag...'

  == Line 977 has weird spacing: '...nd user may n...'

  == Line 1037 has weird spacing: '... of the netwo...'

  == Unrecognized Status in 'Intended status: Experimental Protocol', assuming
     Proposed Standard (Expected one of 'Standards Track', 'Full Standard',
     'Draft Standard', 'Proposed Standard', 'Best Current Practice',
     'Informational', 'Experimental', 'Informational', 'Historic'.)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (April 20, 2012) is 4388 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: '1' on line 816

  -- Looks like a reference, but probably isn't: '2' on line 929

  == Unused Reference: 'I-D.meyer-lisp-cons' is defined on line 1258, but no
     explicit reference was found in the text

  == Outdated reference: A later version (-24) exists of draft-ietf-lisp-22

  ** Downref: Normative reference to an Experimental draft: draft-ietf-lisp
     (ref. 'I-D.ietf-lisp')

  -- Possible downref: Non-RFC (?) normative reference: ref. 'ITU.X509.2000'

  ** Obsolete normative reference: RFC 6125 (Obsoleted by RFC 9525)

  -- Obsolete informational reference (is this intentional?): RFC 2616
     (Obsoleted by RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235)

  -- Obsolete informational reference (is this intentional?): RFC 4346
     (Obsoleted by RFC 5246)

  == Outdated reference: A later version (-23) exists of
     draft-ietf-dane-protocol-19

  == Outdated reference: A later version (-16) exists of draft-ietf-lisp-ms-15

     Summary: 3 errors (**), 0 flaws (~~), 13 warnings (==), 7 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2 Network Working Group                                            E. Lear
3 Internet-Draft                                        Cisco Systems GmbH
4 Intended status: Experimental Protocol                    April 20, 2012
5 Expires: October 20, 2012

7                NERD: A Not-so-novel EID to RLOC Database
8                       draft-lear-lisp-nerd-09.txt

10 Abstract

12    LISP is a protocol to encapsulate IP packets in order to allow end
13    sites to route to one another without injecting routes from one end
14    of the Internet to another. This memo presents an experimental
15    database and a discussion of methods to transport the mapping of EIDs
16    to RLOCs to routers in a reliable, scalable, and secure manner. Our
17    analysis concludes that transport of all EID/RLOC mappings scales
18    well to at least 10^8 entries.

20 Status of this Memo

22    This Internet-Draft is submitted in full conformance with the
23    provisions of BCP 78 and BCP 79.

25    Internet-Drafts are working documents of the Internet Engineering
26    Task Force (IETF). Note that other groups may also distribute
27    working documents as Internet-Drafts. The list of current Internet-
28    Drafts is at http://datatracker.ietf.org/drafts/current/.

30    Internet-Drafts are draft documents valid for a maximum of six months
31    and may be updated, replaced, or obsoleted by other documents at any
32    time. It is inappropriate to use Internet-Drafts as reference
33    material or to cite them other than as "work in progress."

35    This Internet-Draft will expire on October 20, 2012.

37 Copyright Notice

39    Copyright (c) 2012 IETF Trust and the persons identified as the
40    document authors. All rights reserved.

42    This document is subject to BCP 78 and the IETF Trust's Legal
43    Provisions Relating to IETF Documents (http://trustee.ietf.org/
44    license-info) in effect on the date of publication of this document.
45    Please review these documents carefully, as they describe your rights
46    and restrictions with respect to this document. Code Components
47    extracted from this document must include Simplified BSD License text
48    as described in Section 4.e of the Trust Legal Provisions and are
49    provided without warranty as described in the Simplified BSD License.

51 Table of Contents

53    1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 2
54      1.1. Applicability . . . . . . . . . . . . . . . . . . . . . . 3
55      1.2. Base Assumptions . . . . . . . . . . . . . . . . . . . . . 3
56      1.3. What is NERD? . . . . . . . . . . . . . . . . . . . . . . 4
57      1.4. Glossary . . . . . . . . . . . . . . . . . . . . . . . . . 4
58    2. Theory of Operation . . . . . . . . . . . . . . . . . . . . . 5
59      2.1. Database Updates . . . . . . . . . . . . . . . . . . . . . 5
60      2.2. Communications between ITR and ETR . . . . . . . . . . . . 6
61      2.3. Who are database authorities? . . . . . . . . . . . . . . 6
62    3. NERD Format . . . . . . . . . . . . . . . . . . . . . . . . . 7
63      3.1. NERD Record Format . . . . . . . . . . . . . . . . . . . . 9
64      3.2. Database Update Format . . . . . . . . . . . . . . . . . . 10
65    4. NERD Distribution Mechanism . . . . . . . . . . . . . . . . . 10
66      4.1. Initial Bootstrap . . . . . . . . . . . . . . . . . . . . 10
67      4.2. Retrieving Changes . . . . . . . . . . . . . . . . . . . . 11
68    5. Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
69      5.1. Database Size . . . . . . . . . . . . . . . . . . . . . . 12
70      5.2. Router Throughput Versus Time . . . . . . . . . . . . . . 13
71      5.3. Number of Servers Required . . . . . . . . . . . . . . . . 14
72      5.4. Security Considerations . . . . . . . . . . . . . . . . . 16
73        5.4.1. Use of Public Key Infrastructures (PKIs) . . . . . . . 17
74        5.4.2. Other Risks . . . . . . . . . . . . . . . . . . . . . 19
75    6. Why not use XML? . . . . . . . . . . . . . . . . . . . . . . . 19
76    7. Other Distribution Mechanisms . . . . . . . . . . . . . . . . 19
77      7.1. What About DNS as a mapping retrieval model? . . . . . . . 20
78      7.2. Use of BGP and LISP+ALT . . . . . . . . . . . . . . . . . 21
79      7.3. Perhaps use a hybrid model? . . . . . . . . . . . . . . . 22
80    8. Deployment Issues . . . . . . . . . . . . . . . . . . . . . . 22
81      8.1. HTTP . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
82    9. Open Questions . . . . . . . . . . . . . . . . . . . . . . . . 22
83    10. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 23
84    11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 24
85    12. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 24
86    13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 24
87      13.1. Normative References . . . . . . . . . . . . . . . . . . 24
88      13.2. Informative References . . . . . . . . . . . . . . . . . 25
89    Appendix A. Generating and verifying the database signature with
90                OpenSSL . . . . . . . . . . . . . . . . . . . . . . . 26
91    Appendix B. Changes . . . . . . . . . . . . . . . . . . . . . . . 27
92    Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 28

94 1. Introduction

96    Locator/ID Separation Protocol (LISP) [I-D.ietf-lisp] separates an IP
97    address used by a host and local routing system from the locators
98    advertised by BGP participants on the Internet in general, and in the
99    default-free zone (DFZ) in particular. It accomplishes this by
100    establishing a mapping between globally unique endpoint identifiers
101    (EIDs) and routing locators (RLOCs). This reduces the amount of
102    state change that occurs on routers within the default-free zone on
103    the Internet, while enabling end sites to be multihomed.

105    In some mapping distribution approaches to LISP the mapping is
106    learned via data-triggered control messages between ingress tunnel
107    routers (ITRs) and egress tunnel routers (ETRs) through an alternate
108    routing topology [I-D.ietf-lisp-alt]. In other approaches to LISP,
109    the mapping from EIDs to RLOCs is instead learned through some other
110    means. This memo addresses different approaches to the problem, and
111    specifies a Not-so-novel EID to RLOC Database (NERD) and methods both
112    to receive the database and to receive updates.

114    NERD is offered primarily as a way to avoid dropping packets, the
115    underlying assumption being that dropping packets is bad for
116    applications and end users. Those who do not agree with this
117    underlying assumption may find that other approaches make more sense.

119    NERD is specified in such a way that the methods used to distribute
120    or retrieve it may vary over time. Multiple databases are supported
121    in order to allow for multiple data sources. An effort has been made
122    to divorce the database from access methods so that both can evolve
123    independently through experimentation and operational validation.

125 1.1. Applicability

127    This memo is based on experiments performed in the 2007-2009 time
128    frame. At the time of its publication, the author is unaware of
129    operational use of NERD. Those wishing to pursue NERD should
130    consider the substantial amount of work left for the future. See
131    Section 10 for more details.

133 1.2. Base Assumptions

135    In order to specify a mapping it is important to understand how it
136    will be used, and the nature of the data being mapped. In the case
137    of LISP, the following assumptions are pertinent:

139    o  The data contained within the mapping changes only on provisioning
140       or configuration operations, and is not intended to change when a
141       link either fails or is restored. Some other mechanism such as
142       the use of LISP Reachability Bits with mapping replies handles
143       healing operations, particularly when a tail circuit within a
144       service provider's aggregate goes down. NERD can be used as a
145       verification method to ensure that whatever operational mapping
146       changes an ITR receives are authorized.

148    o  While weight and priority are defined, these are not hop-by-hop
149       metrics. Hence the information contained within the mapping does
150       not change based on where one sits within the topology.

152    o  Because one purpose of LISP is to reduce control plane overhead by
153       reducing "rate X state" complexity, updates to the mapping will be
154       relatively rare.

156    o  Because NERD is designed to ease interdomain routing, its use is
157       intended within the interdomain environment. That is, NERD is
158       best implemented at either the customer edge or provider edge, and
159       there will be on the order of as many ITRs and EID Prefixes as
160       there are connections to Internet Service Providers by end
161       customers.

163    o  As such, NERD cannot be the sole means to implement host mobility,
164       although NERD may be used in conjunction with other mechanisms.

166 1.3. What is NERD?

168    NERD is a Not-so-novel EID to RLOC Database. It consists of the
169    following components:

171    1. a network database format;

173    2. a change distribution format;

175    3. a database retrieval/bootstrapping method;

177    4. a change distribution method.

179    The network database format is compressible. However, at this time
180    we specify no compression method. NERD will make use of potentially
181    several transport methods, but most notably HTTP [RFC2616]. HTTP has
182    restart and compression capabilities. It is also widely deployed.

184    There exist many methods to show differences between two versions of
185    a database or a file, UNIX's "diff" being the classic example. In
186    this case, because the data is well structured and easily keyed, we
187    can make use of a very simple format for version differences that
188    simply provides a list of EID/RLOC mappings that have changed using
189    the same record format as the database, and a list of EIDs that are
190    to be removed.

192 1.4. Glossary

194    The reader is once again referred to [I-D.ietf-lisp] for a general
195    glossary of terms related to LISP. The following terms are specific
196    to this memo.

198    Base Distribution URI: An Absolute-URI as defined in Section 4.3 of
199       [RFC3986] from which other references are relative. The base
200       distribution URI is used to construct a URI to an EID/RLOC mapping
201       database. If more than one NERD is known then there will be one
202       or more base distribution URIs associated with each (although each
203       such base distribution URI may have the same value).

205    EID Database Authority: The authority that will sign database files
206       and updates. It is the source of both.

208    The Authority: Shorthand for the EID Database Authority.

210    NERD: (N)ot-so-novel (E)ID to (R)LOC (D)atabase.

212    AFI: Address Family Identifier.

214    Pull Model: An architecture where clients pull only the information
215       they need at any given time, such as when a packet arrives for
216       forwarding.

218    Push Model: An architecture in which clients receive an entire
219       dataset, containing data they may or may not require, such as
220       mappings for EIDs that no served host is attempting to send to.

222    Hybrid Model: An architecture in which some information is pushed
223       toward the receiver from a source and some information is pulled
224       by the receiver.

226 2. Theory of Operation

228    Operational functions are split into two components: database updates
229    and state exchange between ITR and ETR during a communication.

231 2.1. Database Updates

233    What follows is a summary of how NERDs are generated and updated.
234    Specifics can be found in Section 3. The general way in which NERD
235    works is as follows:

237    1. A NERD is generated by an authority that allocates provider-
238       independent (PI) addresses (e.g., IANA or an RIR) which are used
239       by sites as EIDs. As part of this process the authority
240       generates a digest for the database and signs it with a private
241       key whose public key is part of an X.509 certificate
242       [ITU.X509.2000]. That signature, along with a copy of the
243       authority's public key, is included in the NERD.

245    2. The NERD is distributed to a group of well known servers.

247    3. ITRs retrieve an initial copy of the NERD via HTTP when they come
248       into service.

250    4. ITRs are preconfigured with a group of certificates whose private
251       keys are used by database authorities to sign the NERD. This
252       list of certificates should be configurable by administrators.

254    5. ITRs next verify both the validity of the public key and the
255       signed digest. If either fails validation, the ITR attempts to
256       retrieve the NERD from a different source. The process iterates
257       until either a valid database is found or the list of sources is
258       exhausted.

260    6. Once a valid NERD is retrieved, the ITR installs it into both
261       non-volatile and local memory.

263    7. At some point the authority updates the NERD and increments the
264       database version counter. At the same time it generates a list
265       of changes, which it also signs, as it does with the original
266       database.

268    8. Periodically ITRs will poll from their list of servers to
269       determine if a new version of the database exists. When a new
270       version is found, an ITR will attempt to retrieve a change file,
271       using its list of preconfigured servers.

273    9. The ITR validates a change file just as it does the original
274       database. Assuming the change file passes validation, the ITR
275       installs new entries, overwrites existing ones, and removes empty
276       entries, based on the content of the change file.

278    As time goes on it is quite possible that an ITR may probe a list of
279    configured peers for a database or change file copy. It is equally
280    possible that peers might advertise to each other the version number
281    of their database. Such methods are not explored in depth in this
282    memo, but are mentioned for future consideration.

284 2.2. Communications between ITR and ETR

286    [I-D.ietf-lisp] describes the basic approach to what happens when a
287    packet arrives at an ITR, and what communications between ITR and ETR
288    take place. NERD provides an optimistic approach to establishing
289    communications with an ETR that is responsible for a given EID
290    prefix. State must be kept, however, on an ITR to determine whether
291    that ETR is in fact reachable. It is expected that this is a common
292    requirement across LISP mapping systems, and will be handled in the
293    core LISP architecture.

295 2.3. Who are database authorities?

297    This memo does not specify who the database authority is. That is
298    because there are several possible operational models. In each case
299    the number of database authorities is meant to be small so that ITRs
300    need only keep a small list of authorities, similar to the way a name
301    server might cache a list of root servers.

303    o  A single database authority exists. In this case all entries in
304       the database are registered to a single entity, and that entity
305       distributes the database. Because the EID space is provider-
306       independent address space, there is no architectural requirement
307       that address space be hierarchically distributed to anyone, as
308       there is with provider-assigned address space. Hence, there is a
309       natural affinity between the IANA function and the database
310       authority function.

312    o  Each region runs a database authority. In this case, provider-
313       independent address space is allocated to either Regional Internet
314       Registries (RIRs) or to affiliates of such organizations of
315       network operations guilds (NOGs). The benefit of this approach is
316       that there is no single organization that controls the database.
317       It allows one database authority to back up another. One could
318       envision as many as ten database authorities in this scenario.
319       One drawback to this approach, however, is that any reference to a
320       region imposes a notion of locality, thus potentially diminishing
321       the split between locator and identifier.

323    o  Each country runs a database authority. This could occur should
324       countries decide to regulate this function. While limiting the
325       scope of any single database authority as the previous scenario
326       describes, this approach would introduce some overhead as the list
327       of database authorities would grow to as many as 200, and possibly
328       more if jurisdictions within countries attempted to regulate the
329       function. There are two drawbacks to this approach. First, as
330       distribution of EIDs is driven to more local jurisdictions, an EID
331       prefix is tied even tighter to a location. Second, a large number
332       of database authorities will demand some sort of discovery
333       mechanism.

335    o  Independent operators manage database authorities. This has the
336       appeal of being location independent and of enabling competition
337       for good performance. This method has the drawback of potentially
338       requiring a discovery mechanism.

340    The latter two approaches are not mutually exclusive. While this
341    specification allows for multiple databases, discovery mechanisms are
342    left as future work.

344 3. NERD Format

346    The NERD consists of a header that contains a database version and a
347    signature that is generated by ignoring the signature field and
348    setting the authentication block length to 0 (NULL). The
349    authentication block itself consists of a signature and a certificate
350    whose private key counterpart was used to generate the signature.

352    Records are kept sorted in numeric order with AFI plus EID as primary
353    key and prefix length as secondary. This is so that after a database
354    update it should be possible to reconstruct the database to verify
355    the digest signature, which may be retrieved separately from the
356    database for verification purposes.
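
As a rough illustration of the ordering rule above, the following Python sketch sorts a handful of entries by AFI and numeric EID first, then by prefix length. The (AFI, prefix string) tuple shape and the name `nerd_sort_key` are assumptions made for this example only and are not part of the NERD format; AFI values 1 and 2 are the IANA address family numbers for IPv4 and IPv6.

```python
import ipaddress

AFI_IPV4 = 1  # IANA Address Family Number for IPv4
AFI_IPV6 = 2  # IANA Address Family Number for IPv6


def nerd_sort_key(record):
    """Return (AFI, numeric EID, prefix length) for a hypothetical record.

    `record` is assumed to be an (afi, "prefix/len") tuple; this shape is
    chosen for the example and is not defined by the memo.
    """
    afi, prefix = record
    network = ipaddress.ip_network(prefix)
    return (afi, int(network.network_address), network.prefixlen)


records = [
    (AFI_IPV6, "2001:db8:1::/48"),
    (AFI_IPV4, "198.51.100.0/24"),
    (AFI_IPV4, "192.0.2.0/25"),
    (AFI_IPV4, "192.0.2.0/24"),
]

# IPv4 entries sort before IPv6; equal EIDs are ordered by prefix length.
ordered = sorted(records, key=nerd_sort_key)
for afi, prefix in ordered:
    print(afi, prefix)
```

Keeping the order deterministic in this way is what would let two parties that apply the same updates arrive at byte-identical databases, and therefore at the same digest.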

358     0                   1                   2                   3
359     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
360    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
361    | Schema Vers=1 | DB Code       |      Database Name Size       |
362    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
363    |                       Database Version                        |
364    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
365    |                   Old Database Version or 0                   |
366    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
367    |                                                               |
368    |                         Database Name                         |
369    |                                                               |
370    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
371    |       PKCS#7 Block Size       |           Reserved            |
372    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
373    |                                                               |
374    |       PKCS#7 Block containing Certificate and Signature       |
375    |                                                               |
376    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

378                             Database Header

380    The DB Code is 0 if what follows is an entire database or 1 if
381    what follows is an update. The database file version is incremented
382    each time the complete database is generated by the authority. In
383    the case of an update, the database file version indicates the new
384    database file version, and the old database file version is indicated
385    in the "Old Database Version" field. The database file version is
386    used by routers to determine whether or not they have the most
387    current database.

389    The database name is a DNS-ID, as specified in [RFC6125]. This is
390    the name that will appear in the Subject field of the certificate
391    used to verify the database. The purpose of the database name is to
392    allow for more than one database. Such databases would be merged by
393    the router. It is important that an EID/RLOC mapping be listed in no
394    more than one database, lest inconsistencies arise. However, it may
395    be possible to transition a mapping from one database to another.
396    During the transition period, the mappings would be identical. When
397    they are not, the resultant behavior will be undefined. The database
398    name is padded with NULLs to the next four-byte boundary.

400    The PKCS#7 [RFC2315] authentication block contains a DER encoded
401    [ITU.X509.2000] signature and associated public key. For purposes of
402    this experiment all implementations will support the RSA encryption
403    signature algorithm and SHA1 digest algorithm, and the standard
404    attributes are expected to be present.

406    N.B., it has been suggested that Cryptographic Message Syntax (CMS)
407    [RFC5652] be used instead of PKCS#7. At the time this experiment was
408    performed, CMS was not yet widely deployed. However, it is certainly
409    the correct direction, and should be strongly considered in future
410    related work.

412 3.1. NERD Record Format

414    As distributed over the network, NERD records appear as follows:

416     0                   1                   2                   3
417     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
418    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
419    | Num. RLOCs    | EID Pref. Len |            EID AFI            |
420    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
421    |                     End point identifier                      |
422    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
423    | Priority 1    | Weight 1      |             AFI 1             |
424    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
425    |                       Routing Locator 1                       |
426    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
427    | Priority 2    | Weight 2      |             AFI 2             |
428    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
429    |                       Routing Locator 2                       |
430    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
431    | Priority 3    | Weight 3      |             AFI 3             |
432    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
433    |                     Routing Locator 3...                      |
434    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

436    EID AFI is the AFI of the EID. Priority N, Weight N, and AFI N
437    are associated with Routing Locator N. There will always be at least
438    one routing locator. The minimum record size for IPv4 is 16 bytes.
439    Each additional IPv4 RLOC increases the record size by 8 bytes. The
440    purpose of this format is to keep the database compact, but somewhat
441    easily read. The meanings of weight and priority are described in
442    [I-D.ietf-lisp]. The format of the AFI is specified by IANA in the
443    "Address Family Numbers" registry, with the exception of how IPv6 EID
444    prefixes are stored.

446    NERD assumes that EIDs stored in the database are prefixes, and
447    therefore are accompanied by prefix lengths. In order to reduce
448    storage and transmission amounts for IPv6, only the necessary number
449    of bytes of an EID as specified by the prefix length are kept in the
450    record, rounded up to the next four-byte (word) boundary. For
451    instance, if the prefix length is /49, the next four-byte word
452    boundary would require that eight bytes are stored. IPv6 RLOCs are
453    represented as normal 128-bit IPv6 addresses.

455 3.2. Database Update Format

457    A database update contains a set of changes to an existing database.
458    Each AFI/EID/mask-length tuple may have zero or more RLOCs associated
459    with it. In the case where there are no RLOCs, the EID entry is
460    removed from the database. Records that contain EIDs and prefix
461    lengths that were not previously listed are simply added. Otherwise,
462    the old record for the EID and prefix length is replaced by the more
463    current information. The record format used by a database update
464    is the same as described in Section 3.1.

466 4. NERD Distribution Mechanism

468 4.1. Initial Bootstrap

470    Bootstrap occurs when a router needs to retrieve the entire database.
471    It knows it needs to retrieve the entire database because either it
472    has none or an update too substantial to process, as might be the
473    case if a router has been out of service for a substantially lengthy
474    period of time.
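
Before turning to retrieval, the IPv6 EID storage rule of Section 3.1 and the update rules of Section 3.2 can be made concrete with a short Python sketch. Everything named here is hypothetical: the memo does not define an in-memory representation, so the dictionary keyed by (AFI, EID, prefix length) and the function names are assumptions for illustration.

```python
def ipv6_eid_stored_bytes(prefix_len):
    """Bytes of an IPv6 EID kept in a record for a given prefix length.

    Only the bits covered by the prefix are stored, rounded up to the
    next four-byte word (Section 3.1), so /49 occupies eight bytes.
    """
    words = (prefix_len + 31) // 32
    return words * 4


def apply_update(database, changes):
    """Return a new table with Section 3.2 update semantics applied.

    `database` and `changes` are hypothetical dicts keyed by
    (afi, eid, prefix_len) whose values are lists of RLOC entries.
    """
    updated = dict(database)
    for key, rlocs in changes.items():
        if not rlocs:
            # A record with zero RLOCs removes the EID entry.
            updated.pop(key, None)
        else:
            # A new key is added; an existing key is replaced outright.
            updated[key] = list(rlocs)
    return updated


database = {
    (1, "192.0.2.0", 24): ["198.51.100.1"],
    (1, "203.0.113.0", 24): ["198.51.100.2"],
}
changes = {
    (1, "192.0.2.0", 24): [],                      # removal
    (1, "203.0.113.0", 24): ["198.51.100.9"],      # replacement
    (2, "2001:db8::", 32): ["2001:db8:ffff::1"],   # addition
}
result = apply_update(database, changes)
```

After applying an update, a router would still need to keep the records in the sorted order described in Section 3 in order to recompute the digest; that step is omitted here.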

476    To bootstrap, the ITR appends the database name plus "/current/
477    entiredb" to a Base Distribution URI and retrieves the file via HTTP.
478    More formally (using ABNF from [RFC5234]):

480       entire-db = base-uri dbname "/current/entiredb"
481       base-uri  = uri     ; from RFC 3986
482       dbname    = DNS-ID  ; from RFC 6125

484    For example, if the base distribution URI is
485    "http://www.example.com/eiddb/", and assuming a database name of
486    "nerd.arin.net", the ITR would request
487    "http://www.example.com/eiddb/nerd.arin.net/current/entiredb".
488    Routers check the signature on the database prior to installing it,
489    and check that the database schema matches a schema they understand.
490    Once a router has a valid database it stores that database in some
491    sort of non-volatile memory (e.g., disk, flash memory, etc).

493    N.B., the host component for such URIs should not resolve to a LISP
494    EID, lest a circular dependency be created.

496 4.2. Retrieving Changes

498    In order to retrieve a set of database changes an ITR will have
499    previously retrieved the entire database. Hence it knows the current
500    version of the database it has. Its first step for retrieving
501    changes is to retrieve the current version number of the database.
502    It does so by appending "/current/version" to the base distribution
503    URI and database name and retrieving the file. Its format is text
504    and it contains the integer value of the current database version.

506    Once an ITR has retrieved the current version it compares it with the
507    version of its local copy. If there is no difference, then the router
508    is up to date and need take no further action until it next checks.

510    If the versions differ, the router next sends a request for the
511    appropriate change file by appending "/current/changes/" and the
512    textual representation of the version of its local copy of the
513    database to the base distribution URI and database name. More
514    formally:

515       db-version   = base-uri dbname "/current/version"
516       db-curupdate = base-uri dbname "/current/changes/" old-version
517       old-version  = 1*DIGIT

519    For example, if the current version of the database is 1105503 and
520    the router's version is 1105500, and the base URI and database name
521    are the same as above, the router would first request
522    "http://www.example.com/eiddb/nerd.arin.net/current/version" to
523    determine that it is out of date, and to also learn the current
524    version. It would then attempt to retrieve
525    "http://www.example.com/eiddb/nerd.arin.net/current/changes/1105500".

527    The server may not have that change file, either because there are
528    too many versions between what the router has and what is current, or
529    because no such change file was generated. If the server has changes
530    from the router's version to any later version, the server issues an
531    HTTP redirect to that change file, and the router retrieves and
532    processes it. More formally:

534       db-incupdate  = base-uri dbname "/" newer-version
535                       "/changes/" old-version
536       newer-version = 1*DIGIT

538    For example:

540    "http://www.example.com/eiddb/nerd.arin.net/1105450/changes/1105401"
541    would update a router from version 1105401 to 1105450. Once it has
542    done so, the router should then repeat the process until it has
543    brought itself up to date.

545    This raises the question: how does a router know to retrieve version
546    1105450 in our example above? It cannot know this on its own. When
547    the router requests its differences from the current version (say,
548    1105503), the server must answer with a redirect to that URI.

550    While it is unlikely that database versions would wrap, as they
551    consist of 32-bit integers, should the event occur, ITRs should
552    attempt first to retrieve a change file when their current version
553    number is within 10,000 of 2^32 and they see a version available that
554    is less than 10,000. Barring the availability of a change file, the
555    ITR can still assume that the database version has wrapped and
556    retrieve a new copy. It may be safer in future work to include
557    additional wrap information or a larger field to avoid having to use
558    any heuristics.

560 5. Analysis

562    We will start our analysis by looking at how much data will be
563    transferred to a router during bootstrap conditions. We will then
564    look at the bandwidth required. Next we will turn our concerns to
565    servers. Finally we will ponder the effect of providing only
566    changes.

568    In the analysis below we treat the overhead of the database header as
569    insignificant (because it is). The analysis should be similar,
570    whether a single database or multiple databases are employed, as we
571    would assume that no entry would appear more than once.

573 5.1. Database Size

575    By its very nature the information to be transported is relatively
576    static and is specifically designed to be topologically insensitive.
577    That is, every ITR is intended to have the same set of RLOCs for a
578    given EID. While some processing power will be necessary to install
579    a table, the amount required should be far less than that of a
580    routing information database because the level of entropy is intended
581    to be lower.

583    For purposes of this analysis, we will assume that the world has
584    migrated to IPv6, as this increases the size of the database, which
585    would be our primary concern. However, to mitigate the size
586    increase, we have limited the size of the prefix transmitted. For
587    purposes of this analysis, we shall assume an average prefix length
588    of 64 bits.

590    Based on that assumption, Section 3.1 states that mapping information
591    for each EID/Prefix includes a group of RLOCs, each with an
592    associated priority and weight, and that a minimum record size with
593    IPv6 EIDs with at least one RLOC is 30 bytes uncompressed. Each
594    additional IPv6 RLOC costs 20 bytes.
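
The per-record costs just stated lend themselves to a quick size estimate. The following Python sketch is illustrative only: the function name is hypothetical, and the 30-byte base and 20-byte increment are simply the figures given in this section for IPv6 EIDs with an average prefix length of 64 bits.

```python
def nerd_size_bytes(num_eids, rlocs_per_eid, base=30, per_extra_rloc=20):
    """Estimate uncompressed database size in bytes.

    Each EID costs `base` bytes with one RLOC, plus `per_extra_rloc`
    bytes for every additional RLOC. The database header is ignored.
    """
    if rlocs_per_eid < 1:
        raise ValueError("a record always has at least one RLOC")
    return num_eids * (base + per_extra_rloc * (rlocs_per_eid - 1))


# A few sample points, in bytes.
small = nerd_size_bytes(10**4, 2)   # 10^4 EIDs with 2 RLOCs each
large = nerd_size_bytes(10**8, 8)   # 10^8 EIDs with 8 RLOCs each
print(small, large)
```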
596 +-----------+--------+--------+---------+
597 | 10^n EIDs | 2 RLOC | 4 RLOC | 8 RLOC  |
598 +-----------+--------+--------+---------+
599 | 4         | 500 KB | 900 KB | 1.70 MB |
600 | 5         | 5.0 MB | 9.0 MB | 17.0 MB |
601 | 6         | 50 MB  | 90 MB  | 170 MB  |
602 | 7         | 500 MB | 900 MB | 1.70 GB |
603 | 8         | 5.0 GB | 9.0 GB | 17.0 GB |
604 +-----------+--------+--------+---------+

606 Database size for IPv6 routes with average prefix length = 64 bits

608 Entries in the above table are derived as follows:

610 E * (30 + 20 * (R - 1))

612 where E = number of EIDs (10^n), R = number of RLOCs per EID.

614 Our scaling target is to accommodate 10^8 multihomed systems, which 615 is one order of magnitude greater than what is discussed in [CARP07]. 616 At 10^8 entries, a device could be expected to use between 5 and 17 617 gigabytes of RAM for the mapping. No matter the method of 618 distribution, any router that sits in the core of the Internet would 619 require near this amount of memory in order to perform the ITR 620 function. Large enterprise ETRs would be similarly strained, simply 621 due to the diversity of sites that communicate with one another. 622 The good news is that this is not our starting point, but rather our 623 scaling target, a number that we intend to reach by the year 2050. 624 Our starting point is more likely in the neighborhood of 10^4 or 10^5 625 EIDs, thus requiring between 500 KB and 17 MB.

627 5.2.
Router Throughput Versus Time

629 +-------------------+---------+--------+---------+-------+
630 | Table Size (10^N) | 1Mb/s   | 10Mb/s | 100Mb/s | 1Gb/s |
631 +-------------------+---------+--------+---------+-------+
632 | 6                 | 8       | 0.8    | 0.08    | 0.008 |
633 | 7                 | 80      | 8      | 0.8     | 0.08  |
634 | 8                 | 800     | 80     | 8       | 0.8   |
635 | 9                 | 8,000   | 800    | 80      | 8     |
636 | 10                | 80,000  | 8,000  | 800     | 80    |
637 | 11                | 800,000 | 80,000 | 8,000   | 800   |
638 +-------------------+---------+--------+---------+-------+

640 Number of seconds to process NERD, for table sizes of 10^N bytes

642 The length of time it takes to receive the database is significant in 643 models where the device acquires the entire table. During this 644 period of time, either the router will be unable to route packets 645 using LISP or it must use some sort of query mechanism for specific 646 EIDs as it populates the rest of its table through the transfer. Table 647 2 shows us that at our scaling target (a table of roughly 10^10 bytes), the length of time it would 648 take for a router using 1 Mb/s of bandwidth is about 80,000 seconds. We 649 can measure the processing time in small numbers of hours for any 650 transfer speed greater than that. At the fastest rate shown, it takes 651 8 seconds to process an entire table of 10^9 bytes and 652 80 seconds for 10^10 bytes.

654 5.3. Number of Servers Required

656 As easy as it may be for a router to retrieve, the aggregate 657 information may be difficult for servers to transmit, assuming the 658 information is transmitted in aggregate (we'll revisit that 659 assumption later).
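The timing figures in this analysis come from simple arithmetic: a table of B bytes takes 8 * B / rate seconds on a link of the given rate in bits per second, and the per-server estimate used for the table that follows is max(X, N*X/S). The Python sketch below restates both calculations. The function names are hypothetical, and the sketch assumes, as the text does, 100% link utilization, no protocol overhead, equal load distribution, and decimal units.

```python
# Sketch only: idealized arithmetic, not a model of real transfer behavior.

def transfer_seconds(size_bytes: int, link_bits_per_second: int) -> float:
    """Seconds to move `size_bytes` over one fully utilized link."""
    return size_bytes * 8 / link_bits_per_second


def retrieval_seconds(requests: int, servers: int, single_transfer: float) -> float:
    """Estimate max(X, N*X/S): N transfers of X seconds spread over S servers.

    The result can never be lower than one transfer time X.
    """
    return max(single_transfer, requests * single_transfer / servers)


if __name__ == "__main__":
    # A 9.0 GB table on a 1 Gb/s link is one transfer time X.
    x = transfer_seconds(9 * 10**9, 10**9)
    print("single transfer:", x, "seconds")
    print("10^6 requests on 10^3 servers:",
          retrieval_seconds(10**6, 10**3, x), "seconds")
```

For example, `transfer_seconds(9 * 10**9, 10**9)` gives 72.0, which matches X = 72 seconds for a 9.0 GB table at 1 Gb/s.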
661 +----------------+--------------+-------------+----------+----------+
662 | # Simultaneous | 10 Servers   | 100 Servers | 1,000    | 10,000   |
663 | Requests       |              |             | Servers  | Servers  |
664 +----------------+--------------+-------------+----------+----------+
665 | 100            | 720          | 72          | 72       | 72       |
666 | 1,000          | 7,200        | 720         | 72       | 72       |
667 | 10,000         | 72,000       | 7,200       | 720      | 72       |
668 | 100,000        | 720,000      | 72,000      | 7,200    | 720      |
669 | 1,000,000      | 7,200,000    | 720,000     | 72,000   | 7,200    |
670 | 10,000,000     | 72,000,000   | 7,200,000   | 720,000  | 72,000   |
671 +----------------+--------------+-------------+----------+----------+

673 Retrieval time per number of servers in seconds. Assumes an average of 674 10^8 entries with 4 RLOCs per EID, that each server has access to 675 1 Gb/s with 100% efficient use of that bandwidth, and no compression.

677 Entries in the above table were generated using the following method:

679 For 10^8 entries with four RLOCs per EID, the table size is 9.0 GB, 680 per our previous table. Assume 1 Gb/s transfer rates and 100% 681 utilization. Protocol overhead is ignored for this exercise. Hence 682 a single transfer X takes 72 seconds and can get no faster.

684 With this in mind, each entry is as follows:

686 max(X, N*X/S)

688 where N = number of transfers, X = 72 seconds, 689 and S = number of servers.

691 If we have a distribution model in which every device must retrieve the 692 mapping information upon start, Table 3 shows the length of time in 693 seconds it will take for a given number of servers to complete a 694 transfer to a given number of devices. This table says, as an 695 example, that it would take 72,000 seconds (20 hours) for one million 696 ITRs to simultaneously retrieve the database from one thousand 697 servers, assuming equal load distribution. Should a cold start 698 scenario occur, this number should be of some concern. Hence it is 699 important to take some measures both to avoid such a scenario, and to 700 ease the load should it occur.
The primary defense should be for 701 ITRs to first attempt to retrieve their databases from their peers or 702 upstream providers. Secondary defenses could include data sanity 703 checks within ITRs, with agreed norms for how much the database 704 should change in any given update or over any given period of time. 705 As we will see below, dissemination of changes involves considerably less 706 volume.

708 +----------------+-------------+---------------+----------------+
709 | % Daily Change | 100 Servers | 1,000 Servers | 10,000 Servers |
710 +----------------+-------------+---------------+----------------+
711 | 0.1%           | 300         | 30            | 3              |
712 | 0.5%           | 1,500       | 150           | 15             |
713 | 1%             | 3,000       | 300           | 30             |
714 | 5%             | 15,000      | 1,500         | 150            |
715 | 10%            | 30,000      | 3,000         | 300            |
716 +----------------+-------------+---------------+----------------+

718 Assuming 10 million routers and a database size of 9 GB, resulting 719 transfer times for hourly updates are shown in seconds, given the number 720 of servers and the daily rate of change. Note that when insufficient 721 resources are devoted to servers, an unsustainable situation arises 722 where updates for the next batch would begin prior to the completion 723 of the current batch.

725 This table shows us that with 10,000 servers the average transfer 726 time with 1 Gb/s links for 10,000,000 routers will be 300 seconds with 727 10% daily change spread over 24 hourly updates. For a 0.1% daily 728 change, that number is 3 seconds for a database of size 9.0 GB.

730 The amount of change depends on the purpose to which LISP is put. If its purpose is 731 to provide effective multihoming support to end customers, then we 732 might anticipate relatively few changes. If, on the other hand, service 733 providers attempt to make use of LISP to provide some form of traffic 734 engineering, we can expect the same data to change more often. We 735 can probably not conclude much in this regard without additional 736 operational experience.
The one thing we can say is that different 737 applications of the LISP protocol may require new and different 738 distribution mechanisms. Such optimization is left for another day.

740 5.4. Security Considerations

742 Whatever the answer to our previous question, we must consider the 743 security of the information being transported. If an attacker can 744 forge an update or tamper with the database, he can in effect 745 redirect traffic to end sites. Hence, integrity and authenticity of 746 the NERD are critical. In addition, a means is required to determine 747 whether a source is authorized to modify a given database. No data 748 privacy is required. Quite to the contrary, this information will be 749 necessary for any ITR.

751 The first question one must ask is who to trust to provide the ITR a 752 mapping. Ultimately the owner of the EID prefix is most 753 authoritative for the mapping to RLOCs. However, were all owners to 754 sign all such mappings, ITRs would need to know which owner is 755 authorized to modify which mapping, creating a problem of O(N^2) 756 complexity.

758 We can reduce this problem substantially by investing some trust in a 759 small number of entities that are allowed to sign entries. If an 760 authority manages EIDs much the same way a domain name registrar 761 handles domains, then the owner of the EID would choose a database 762 authority she or he trusts, and ITRs must trust each such authority 763 in order to map the EIDs listed by that authority to RLOCs. This 764 reduces the amount of management complexity on the ETR to retaining 765 knowledge of O(#authorities), but does require that each authority 766 establish procedures for authenticating the owner of an EID. Those 767 procedures needn't be the same.

769 There are two classic methods to ensure integrity of data:

771 o secure transport of the data from the source to the consumer, such 772 as Transport Layer Security (TLS) [RFC4346]; and

774 o object level security.
776 These methods are not mutually exclusive, although one can argue 777 about the need for the former, given the latter.

779 In the case of TLS, when it is properly implemented, the objects 780 being transported cannot easily be modified by interlopers or 781 so-called men in the middle. When data objects are distributed to 782 multiple servers, each of those servers must be trusted. As we have 783 seen above, we could have quite a large number of servers, thus 784 providing an attacker a large number of targets. We conclude that 785 some form of object level security is required.

787 Object level security involves an authority signing an object in a 788 way that can easily be verified by a consumer, in this case a router. 789 In this case, we would want the mapping table and any incremental 790 update to be signed by the originator of the update. This implies 791 that we cannot simply make use of a tool like CVS [CVS]. Instead, 792 the originator will want to generate diffs, sign them, and make them 793 available either directly or through some sort of content 794 distribution or peer-to-peer network.

796 5.4.1. Use of Public Key Infrastructures (PKIs)

798 X.509 provides a certificate hierarchy that has scaled to the size of 799 the Internet. The system is most manageable when there are few 800 certificates to manage. The model proposed in this memo makes use of 801 one current certificate per database authority. The two pieces of 802 information necessary to verify a signature, therefore, are as 803 follows:

805 o the certificate of the database authority, which can be provided 806 along with the database; and

808 o the certificate authority's certificate.

810 The latter two pieces of information must be very well known and must 811 be configured on each ITR. It is expected that both would change 812 very rarely, and it would not be unreasonable for such updates to 813 occur as part of a normal OS release process.
815 The tools for both signing and verifying are readily available. 816 OpenSSL [1] provides tools and libraries for both signing and 817 verifying. Other tools commonly exist.

819 Use of PKIs is not without implementation complexity, operational complexity, or 820 risk. The following risks and mitigations are identified with NERD's 821 use of PKIs:

823 If a NERD database authority private key is exposed:

825 In this case an attacker could sign a false database update, 826 either redirecting traffic, or otherwise causing havoc. In this 827 case, the NERD database administrator must revoke its existing key 828 and issue a new one. The certificate is added to a certificate 829 revocation list (CRL), which may be distributed with both this and 830 other databases, as well as through other channels. Because this 831 event is expected to be rare, and the number of database 832 authorities is expected to be small, a CRL will be small. When a 833 router receives a revocation, it checks it against its existing 834 databases, and attempts to update the one that is revoked. This 835 implies that prior to issuing the revocation, the database 836 authority would sign an update with the new key. Routers would 837 discard updates they have already received that were signed after 838 the revocation was generated. If a router cannot confirm 839 whether the authority's certificate was revoked before or after a 840 particular update, it will retrieve a fresh copy of the 841 database with a valid signature.

843 The private key associated with a CA in the chain of trust of the Authority's certificate is compromised:

845 In this case, it becomes possible for an attacker to masquerade as 846 the database authority. To ameliorate damage, the database 847 authority revokes its certificate and gets a new certificate issued 848 from a CA that is not compromised. Once it has done so, the 849 previous procedure is followed.
The compromised certificate can 850 be removed during the normal operating system upgrade cycle. In 851 the case of the root authority, the situation could be more 852 serious. Updates to the OS in the ITR need to be validated prior 853 to installation. One possible method of doing this is provided in 854 [RFC4108]. Because trust anchors are assumed to be updated as part of an 855 OS update, implementers should consider using a key other than the 856 trust anchor for validating OS updates.

858 An algorithm used in either the certificate or the signature is cracked:

860 This is a catastrophic failure and the above forms of attack 861 become possible. The only mitigation is to make use of a new 862 algorithm. In theory this should be possible, but in practice it has 863 proved very difficult. For this reason, additional work is 864 recommended to make alternative algorithms available.

866 The Database Authority loses its key or disappears:

868 In this case nobody can update the existing database. There are 869 few programmatic mitigations. If the database authority places 870 its private keys and suitable amounts of information in escrow, then under 871 agreed-upon circumstances, such as no updates for three days, 872 the escrow agent would release the information to a party 873 capable of generating a database update.

875 5.4.2. Other Risks

877 Because this specification does not require secure transport, if an 878 attacker prevents updates to an ITR for the purposes of having that 879 ITR continue to use a compromised ETR, the ITR could continue to use 880 an old version of the database without realizing a new version has 881 been made available. If one is worried about such an attack, a 882 secure channel such as SSL to a secure chain back to the database 883 authority should be used. It is possible that after some operational 884 experience, later versions of this format will contain additional 885 semantics to address this attack.
SSL would also prevent attempts to 886 spoof false database versions on the server.

888 As discussed above, a substantial risk would be a cold start scenario. 889 If an attacker found a bug in a common operating system that allowed 890 it to erase an ITR's database, and was able to disseminate that bug, 891 the collective ability of ITRs to retrieve new copies of the database 892 could be taxed by collective demand. The remedy to this is for 893 devices to share copies of the database with their peers, thus making 894 each potential requester a potential service.

896 6. Why not use XML?

898 Many objects these days are distributed as either XML pages or 899 something derived from XML [W3C.REC-xml11-20040204], such as SOAP 900 [W3C.REC-soap12-part1-20070427], [W3C.REC-soap12-part2-20070427]. Use of 901 such well known standards allows for high level tools and library 902 reuse. XML's strength is extensibility. Without a doubt XML would 903 be more extensible than a fixed field database. Why not, then, use 904 these standards in this case? The greatest concern the author had 905 was compactness of the data stream. Inasmuch as this mechanism is 906 used at all in the future, so long as that concern could be 907 addressed, and so long as signatures of the database can be verified, 908 XML probably should be considered.

910 7. Other Distribution Mechanisms

912 We now consider various different mechanisms. The problem of 913 distributing changes in various databases is as old as databases. 914 The author is aware of two obvious approaches that have been well 915 used in the past. One approach would be the wide distribution of CVS 916 repositories. However, for reasons mentioned in Section 5.4, CVS 917 is insufficient to the task.

919 The other tried and true approach is the use of periodic updates in 920 the form of messages. Good old NNTP [RFC3977] itself provides two 921 separate mechanisms (one push and another pull) to provide a coherent 922 update process.
This was in fact used to update molecular biology 923 databases [gb91] in the early 1990s. Netnews offers a way to 924 determine whether articles with specified Article-Ids have been 925 received. In the case where the mapping file source of authority 926 wishes to transmit updates, it can sign a change file and then post 927 it into the network. Routers merely need to keep a record of the article 928 ids that they have received. Years ago, Netnews systems handled far 929 greater volumes of traffic than we envision. [2] Initially this is 930 probably overkill, but it may not be so later in this process. Some 931 consideration should be given to a mechanism known to widely 932 distribute vast amounts of data, as instantaneously as either the sender 933 or the receiver wishes.

935 To attain an additional level of hierarchy in the distribution 936 network, service providers could retrieve information to their own 937 local servers, and configure their routers with the host portion of 938 the above URI.

940 Another possibility would be for providers to establish an agreement 941 on a small set of anycast addresses for use for this purpose. There 942 are limitations to the use of anycast, particularly with TCP. In the 943 midst of a routing flap, an anycast address can become all but unusable. 944 Careful study of such a use as well as appropriate use of HTTP 945 redirects is expected.

947 7.1. What About DNS as a mapping retrieval model?

949 It has been proposed that a query/response mechanism be used for this 950 information, and that specifically the domain name system (DNS) 951 [RFC1034] be used. The previous models do not preclude the DNS. DNS 952 has the advantage that the administrative lines are well drawn, and 953 that the ID/RLOC mapping is likely to appear very close to these 954 boundaries. DNS also has the added benefit that an entire 955 distribution infrastructure already exists.
There are, however, some 956 problems that could impact end hosts when intermediate routers make 957 queries, some of which were first pointed out in [RFC1383]:

959 o Any query mechanism offers an opportunity for a resource attack if 960 an attacker can force the ITR to query for information. In this 961 case, all that would be necessary would be for a "botnet" (a group 962 of computers that have been compromised and used as vehicles to 963 attack others) to ping or otherwise contact via some normal 964 service hosts that sit behind the ETR. If the botnet hosts 965 themselves are behind ETRs, the victim's ITR will need to query 966 for each and every one of them, thus becoming part of a classic 967 reflector attack.

969 o Packets will be delayed at the very least, and probably dropped in 970 the process of a mapping query. This could be at the beginning of 971 a communication, but it will be impossible for a router to 972 conclude with certainty that this is the case.

974 o The DNS has a backoff algorithm that presumes that applications 975 are making queries prior to the beginning of a communication. 976 This is appropriate for end hosts who know in fact when a 977 communication begins. An end user may not enjoy a router waiting 978 seconds for a retry.

980 o While the administrative lines may appear to be correct, the 981 location of name servers may not be. If name servers sit within 982 PI address space, thus requiring LISP to reach, a circular 983 dependency is created. This is precisely where many enterprise 984 name servers sit. The LISP experiment should not predicate its 985 success on relocation of such name servers.

987 Nevertheless, DNS may be able to play a role in providing the 988 enterprise control over the mapping of its EIDs to RLOCs. Posit a 989 new DNS record "EID2RLOC". This record is used by the authority to 990 collect and aggregate mapping information so that it may be 991 distributed through one of the other mechanisms.
As an example:

993 $ORIGIN 0.10.PI-SPACE.
994 128 EID2RLOC mask 23 priority 10 weight 5 172.16.5.60
995     EID2RLOC mask 23 priority 15 weight 5 192.168.1.5

997 In the above figure, network 10.0.128/23 would be delegated to some end 998 system, say EXAMPLE.COM. They would manage the above zone 999 information. This would allow a DNS mechanism to work, but it would 1000 also allow someone to aggregate the information and distribute a 1001 table.

1003 7.2. Use of BGP and LISP+ALT

1005 Border Gateway Protocol (BGP) [RFC4271] is currently used to 1006 distribute inter-domain routing throughout the Internet. Why not, 1007 then, use BGP to distribute mapping entries, or provide a rendezvous 1008 mechanism to initialize mapping entries? In fact this is precisely 1009 what LISP+ALT [I-D.ietf-lisp-alt] accomplishes, using a completely 1010 separate topology from the normal DFZ. It does so using existing 1011 code paths and expertise. The alternate topology also provides an 1012 extremely accurate control path from ITRs to ETRs, whereas NERD's 1013 operational model requires an optimistic assumption and control plane 1014 functionality to cycle through unresponsive ETRs in an EID prefix's 1015 mapping entry. The memory scaling characteristics of LISP+ALT are 1016 extremely attractive because of expected strong aggregation, whereas 1017 NERD makes almost no attempt at aggregation.

1019 A number of key deployment issues are left open. The principal issue 1020 is whether it is deemed acceptable for routers to drop packets 1021 occasionally while mapping information is being gathered. This 1022 should be the subject of future research for ALT, as it was a key 1023 design goal of NERD to avoid such a situation.

1025 7.3. Perhaps use a hybrid model?

1027 Perhaps it would be useful to use both a prepopulated database such 1028 as NERD and a query mechanism (perhaps LISP+ALT, LISP-CONS 1029 [I-D.meyer-lisp-cons], or DNS) to determine an EID/RLOC mapping.
One 1030 idea would be to receive a subset of the mappings, say, by taking 1031 only the NERD for certain regions. This alleviates the need to drop 1032 packets for some subset of destinations under the assumption that 1033 one's business is localized to a particular region. If one did not 1034 have a local entry for a particular EID one would then make a query. 1036 One approach to using DNS to query live would be to periodically walk 1037 "interesting" portions of the network, in search of relevant 1038 records, and caching them to non-volatile storage. While preventing 1039 resource attacks, the walk itself could be viewed as an attack, if 1040 the algorithm was not selective enough about what it thought was 1041 interesting. A similar approach could be applied to LISP+ALT or 1042 LISP-CONS by forcing a data-driven Map Reply for certain sites. 1044 8. Deployment Issues 1046 While LISP and NERD are intended as experiments at this point, it is 1047 already obvious one must give serious consideration to circular 1048 dependencies with regard to the protocols used and the elements 1049 within them. 1051 8.1. HTTP 1053 In as much as HTTP depends on DNS, either due to the authority 1054 section of a URI, or due to the configured base distribution URI, 1055 these same concerns apply. In addition, any HTTP server that itself 1056 makes use of provider independent addresses would be a poor choice to 1057 distribute the database for these exact same reasons. 1059 One issue with using HTTP is that it is possible that a middlebox of 1060 some form, such as a cache, may intercept and process requests. In 1061 some cases this might be a good thing. For instance, if a cache 1062 correctly returns a database, some amount of bandwidth is conserved. 1063 On the other hand, if the cache itself fails to function properly for 1064 whatever reason, end to end connectivity could be impaired. 
For 1065 example, if the cache itself depended on the mapping being in place 1066 and functional, a cold start scenario might leave the cache 1067 functioning improperly, in turn providing routers no means to update 1068 their databases. Some care must be taken to avoid such 1069 circumstances.

1071 9. Open Questions

1072 Do we need to discuss reachability in more detail? This was clearly 1073 an issue at the IST-RING workshop. There are two key issues. First, 1074 what is the appropriate architectural separation between the data 1075 plane and the control plane? Second, is there some specific way in 1076 which NERD impacts the data plane?

1078 Should we specify a (perhaps compressed) tarball that treads a middle 1079 ground for the last question, where each update tarball contains both 1080 a signature for the update and for the entire database, once the 1081 update is applied?

1083 Should we compress? In some initial testing of databases with 1, 5, 1084 and 10 million IPv4 EIDs and a random distribution of IPv4 RLOCs, the 1085 current format in this document compresses down by a factor of 1086 between 35% and 36%, using the Burrows-Wheeler block sorting text 1087 compression algorithm (bzip2). The NERD used random EIDs with prefix 1088 lengths varying from 19-29, with probability weighted toward the 1089 smaller masks. This only very roughly reflects reality. A better 1090 test would be to start with the existing prefixes found in the DFZ.

1092 10. Conclusions

1094 This memo has specified a database format, an update format, a URI 1095 convention, an update method, and a validation method for EID/RLOC 1096 mappings. We have shown that beyond the predictions of 10^8 1097 EID-prefix entries, the aggregate database size would likely be at most 1098 17 GB. We have considered the number of servers needed to distribute that 1099 information and we have demonstrated the limitations of a simple 1100 content distribution network and other well known mechanisms.
The 1101 effort required to retrieve a database change amounts to between 3 1102 and 30 seconds of processing time per hour at today's gigabit 1103 speeds. We conclude that there is no need for an off box query 1104 mechanism today, and that there are distinct disadvantages for having 1105 such a mechanism in the control plane.

1107 Beyond this we have examined alternatives that allow for hybrid 1108 models that do use query mechanisms, should our operating assumptions 1109 prove overly optimistic. Use of NERD today does not foreclose use of 1110 such models in the future, and in fact both models can happily 1111 coexist.

1113 Since the first draft of this document in 2007, portions of this work 1114 have been implemented. Future work should consider the size of 1115 fields, such as the version field, as well as key roll-over and 1116 revocation issues. As previously noted, CMS is now widely deployed. 1117 Current work on DNS-based Authentication of Named Entities may 1118 provide a means to test authorization of a NERD provider to carry a 1119 specific prefix [I-D.ietf-dane-protocol].

1121 We leave to future work how the list of databases is distributed, how 1122 BGP can play a role in distributing knowledge of the databases, and 1123 how DNS can play a role in aggregating information into these 1124 databases.

1126 We also leave to future work whether HTTP is the best protocol for 1127 the job, and whether the scheme described in this document is the 1128 most efficient. One could easily envision that when applied in high 1129 delay or high loss environments, a broadcast or multicast method may 1130 prove more effective.

1132 Speaking of multicast, we also leave to future work how multicast is 1133 implemented, if at all, either in conjunction with or as an extension to 1134 this model.

1136 Finally, perhaps the most interesting future work would be to 1137 understand if and how NERD could be integrated with the LISP mapping 1138 server.
[I-D.ietf-lisp-ms]

1140 11. IANA Considerations

1142 This memo makes no requests of IANA.

1144 12. Acknowledgments

1146 Dino Farinacci, Patrik Faltstrom, Dave Meyer, Joel Halpern, Jim 1147 Schaad, Dave Thaler, Mohamed Boucadair, Robin Whittle, Max Pritikin, 1148 Scott Brim, S. Moonesamy, and Stephen Farrell were very helpful with 1149 their reviews of this work. Thanks also to the participants of the 1150 Routing Research Group and the IST-RING workshop held in Madrid in 1151 December of 2007 for their incisive comments. The astute will notice 1152 a lengthy References section. This work stands on the shoulders of 1153 many others' efforts.

1155 13. References

1157 13.1. Normative References

1159 [I-D.ietf-lisp] 1160 Farinacci, D., Fuller, V., Meyer, D. and D. Lewis, 1161 "Locator/ID Separation Protocol (LISP)", Internet-Draft 1162 draft-ietf-lisp-22, February 2012.

1164 [ITU.X509.2000] 1165 International Telecommunications Union, "Information 1166 technology - Open Systems Interconnection - The Directory: 1167 Public-key and attribute certificate frameworks", ITU-T 1168 Recommendation X.509, ISO Standard 9594-8, March 2000.

1170 [RFC3986] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform 1171 Resource Identifier (URI): Generic Syntax", STD 66, RFC 1172 3986, January 2005.

1174 [RFC5234] Crocker, D. and P. Overell, "Augmented BNF for Syntax 1175 Specifications: ABNF", STD 68, RFC 5234, January 2008.

1177 [RFC6125] Saint-Andre, P. and J. Hodges, "Representation and 1178 Verification of Domain-Based Application Service Identity 1179 within Internet Public Key Infrastructure Using X.509 1180 (PKIX) Certificates in the Context of Transport Layer 1181 Security (TLS)", RFC 6125, March 2011.

1183 13.2. Informative References

1185 [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., 1186 Masinter, L., Leach, P. and T. Berners-Lee, "Hypertext 1187 Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.
1189 [RFC2315] Kaliski, B., "PKCS #7: Cryptographic Message Syntax 1190 Version 1.5", RFC 2315, March 1998.

1192 [RFC5652] Housley, R., "Cryptographic Message Syntax (CMS)", RFC 1193 5652, September 2009.

1195 [RFC3977] Feather, C., "Network News Transfer Protocol (NNTP)", RFC 1196 3977, October 2006.

1198 [RFC1034] Mockapetris, P., "Domain names - concepts and facilities", 1199 STD 13, RFC 1034, November 1987.

1201 [RFC1383] Huitema, C., "An Experiment in DNS Based IP Routing", RFC 1202 1383, December 1992.

1204 [RFC4271] Rekhter, Y., Li, T. and S. Hares, "A Border Gateway 1205 Protocol 4 (BGP-4)", RFC 4271, January 2006.

1207 [RFC4108] Housley, R., "Using Cryptographic Message Syntax (CMS) to 1208 Protect Firmware Packages", RFC 4108, August 2005.

1210 [RFC4346] Dierks, T. and E. Rescorla, "The Transport Layer Security 1211 (TLS) Protocol Version 1.1", RFC 4346, April 2006.

1213 [CARP07] Carpenter, B. R., "IETF Plenary Presentation: Routing and 1214 Addressing: Where we are today", March 2007.

1216 [CVS] Grune, R., Baalbergen, E., Waage, M., Berliner, B. and J. 1217 Polk, "CVS: Concurrent Versions System", November 1985.

1219 [gb91] Smith, R.H., Gottesman, Y., Hobbs, B., Lear, E., 1220 Kristofferson, D., Benton, D. and P.R. Smith, "A mechanism 1221 for maintaining an up-to-date GenBank database via 1222 Usenet", CABIOS, April 1991.

1224 [W3C.REC-xml11-20040204] 1225 Paoli, J., Maler, E., Yergeau, F., Cowan, J., Bray, T. and 1226 C. Sperberg-McQueen, "Extensible Markup Language (XML) 1227 1.1", World Wide Web Consortium FirstEdition 1228 REC-xml11-20040204, February 2004.

1231 [W3C.REC-soap12-part1-20070427] 1232 Hadley, M., Mendelsohn, N., Moreau, J., Karmarkar, A., 1233 Nielsen, H., Lafon, Y. and M. Gudgin, "SOAP Version 1.2 1234 Part 1: Messaging Framework (Second Edition)", World Wide 1235 Web Consortium Recommendation REC-soap12-part1-20070427, 1236 April 2007.

1239     [W3C.REC-soap12-part2-20070427]
1240                Mendelsohn, N., Gudgin, M., Nielsen, H., Lafon, Y.,
1241                Moreau, J., Hadley, M. and A. Karmarkar, "SOAP Version 1.2
1242                Part 2: Adjuncts (Second Edition)", World Wide Web
1243                Consortium Recommendation REC-soap12-part2-20070427, April
1244                2007, .

1247     [I-D.ietf-lisp-alt]
1248                Fuller, V., Farinacci, D., Meyer, D. and D. Lewis, "LISP
1249                Alternative Topology (LISP+ALT)", Internet-Draft draft-
1250                ietf-lisp-alt-10, December 2011.

1252     [I-D.ietf-dane-protocol]
1253                Hoffman, P. and J. Schlyter, "The DNS-Based Authentication
1254                of Named Entities (DANE) Protocol for Transport Layer
1255                Security (TLS)", Internet-Draft draft-ietf-dane-
1256                protocol-19, April 2012.

1258     [I-D.meyer-lisp-cons]
1259                Brim, S., "LISP-CONS: A Content distribution Overlay
1260                Network Service for LISP", Internet-Draft draft-meyer-
1261                lisp-cons-04, April 2008.

1263     [I-D.ietf-lisp-ms]
1264                Fuller, V. and D. Farinacci, "LISP Map Server Interface",
1265                Internet-Draft draft-ietf-lisp-ms-15, January 2012.

1267  Appendix A. Generating and verifying the database signature with
1268              OpenSSL

1270     As previously mentioned, one goal of NERD was to use off-the-shelf
1271     tools to both generate and retrieve the database. To many, PKI is
1272     magic. This section is meant to provide at least some clarification
1273     as to both the generation and verification process, complete with
1274     command line examples. Not included is how you get the entries
1275     themselves. We'll assume they exist, and that you're just trying to
1276     sign the database.

1278     To sign the database, you first need a database file that has the
1279     database header described in Section 3. Block size should be
1280     zero, and there should be no PKCS#7 block at this point. You also
1281     need a certificate and its private key with which you will sign the
1282     database.

1284     The OpenSSL "smime" command contains all the functions we need from
1285     this point forth.
To sign the database, issue the following command:

1287       openssl smime -binary -sign -outform DER -signer yourcert.crt \
1288               -inkey yourcert.key -in database-file -out signature

1290     -binary states that no MIME canonicalization should be performed.
1291     -sign indicates that you are signing the file that was given as the
1292     argument to -in. The output format (-outform) is binary DER, and
1293     your public certificate is provided with -signer along with your key
1294     with -inkey. The file that receives the signature is named with -out.

1296     The resulting file "signature" is then copied into the PKCS#7 block in
1297     the database header, its size in bytes is recorded in the PKCS#7
1298     block size field, and the resulting file is ready for distribution to
1299     ITRs.

1301     To verify a database file, first retrieve the PKCS#7 block from the
1302     file by copying the appropriate number of bytes into another file,
1303     say "signature". Next, zero this field, and set the block size field
1304     to 0. Then use the "smime" command to verify the signature as
1305     follows:

1307       openssl smime -binary -verify -inform DER -content database-file \
1308               -out /dev/null -in signature

1310     OpenSSL will print "Verification successful" if the signature is
1311     correct. OpenSSL provides sufficiently rich libraries to accomplish
1312     the above within the C programming language with a single pass.

1314  Appendix B. Changes

1316     This section to be removed prior to publication.

1318     o  06-08: editorial. Clarify sending diffs.

1320     o  05: Fix normative/informative references. Wordsmithing.

1322     o  04: Analysis change: IPv6 RLOCs are 128 bits. While they can be
1323        shortened to 64 bits, that involves substantial ETR changes and
1324        expenditure of IPv6 networks, which is probably unnecessary, and
1325        can be left as a later optimization. Added an option of
1326        independent operators. Processed all but two of Dino's comments.
1327        Addressed Scott's comments. Removed existing work analysis.
1328        Saving that for another day.
Clarified OpenSSL Appendix.

1330     o  05: clean DOWN. reinsert some text for historical purposes.

1332     o  04: cleanup

1334     o  03: Change dbname to a domain name, indicate that is what is in
1335        the subject of the X.509 certificate, and list editorial changes,
1336        update acknowledgments.

1338     o  02: Incorporate some of Dave Thaler's comments. Add
1339        authentication block detail. Modify analysis to take IPv6 into
1340        account, along with a more realistic number of RLOCs per EID. Add
1341        some comments about potential risks of a cold start. Add S/MIME
1342        example as appendix A and take out old ToDo. Provide some amount
1343        of compression of IPv6 addresses by limiting their size to
1344        significant bytes rounded to a four byte word boundary.

1346     o  01: Massive spelling correction, URI example correction.

1348     o  00: Initial Revision.

1350  Author's Address

1352     Eliot Lear
1353     Cisco Systems GmbH
1354     Richtistrasse 7
1355     Wallisellen, CH-8304
1356     Switzerland

1358     Phone: +41 44 878 9200
1359     Email: lear@cisco.com