Network Working Group                                             D. Jen
Internet-Draft                                                 M. Meisel
Intended status: Informational                                 D. Massey
Expires: May 21, 2008                                            L. Wang
                                                                B. Zhang
                                                                L. Zhang
                                                       November 18, 2007


                APT: A Practical Transit Mapping Service
                          draft-jen-apt-01.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on May 21, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   The size of the global routing table is a rapidly growing problem.
   Several solutions have been proposed.
   These solutions commonly
   divide the Internet into two address spaces, one for determining the
   delivery location, and one to use during transit. Packets destined
   for delivery addresses are tunneled through the default-free zone
   (DFZ), which uses only transit addresses. For this process to work,
   there must be a mapping service that can supply an appropriate
   destination transit address for any given delivery address. We
   present a design for such a mapping service. We adhere to a "do no
   harm" design philosophy: maintain all desirable features of the
   current architecture without negatively affecting its security or
   reliability. Our design aims to minimize delay and prevent loss in
   packet encapsulation, minimize the number of modifications to
   existing hardware, minimize the number of new devices, and keep the
   level of control traffic manageable.

Table of Contents

   1.  Requirements Notation
   2.  Problem Statement
   3.  Terminology
   4.  APT Overview and Requirements
   5.  The Mapping Service
     5.1.  A Mapping Example
   6.  Multihoming Support
     6.1.  Using Alternate ETRs During Failures
       6.1.1.  Handling Taddr Prefix Failures
       6.1.2.  Handling Single-ETR Failures
       6.1.3.  Handling TR-to-DN Link Failures
   7.  Exchanging MapSets Between TNs
     7.1.  MapSet Dissemination via DM-BGP
     7.2.  Regular MapSet Refresh
   8.  Security and Reliability
     8.1.  Authenticating the Originator of Mapping Updates
     8.2.  Detecting MapSet Misconfigurations
     8.3.  APT Control Messages
   9.  Scalability through Recursion
   10. Mapping Announcements
   11. APT Header and Control Messages
     11.1.  APT Header Fields
     11.2.  Cache Add Messages
     11.3.  Cache Drop Messages
     11.4.  ETR Unreachable Messages
     11.5.  DN Unreachable Messages
     11.6.  The ETR-to-DN Link Failure Message Type
   12. Incremental Deployment
   13. IANA Considerations
   14. Security Considerations
   15. References
     15.1.  Normative References
     15.2.  Informative References
   Appendix A.  Open Issues
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Requirements Notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Problem Statement

   The unexpected, explosive growth of the Internet is causing a greater
   and greater strain on its infrastructure. This problem has been
   well-documented in [RAWS][AddrAlloc].
   Several solutions have been
   proposed to address this problem [CRIO][EFIT][EFIT-ID][LISP][SixOne],
   the majority of which involve separating the Internet into two parts,
   one for determining the delivery location, and one to use during
   transit. Routers in transit space would only need to know how to
   route to transit prefixes, which are stable and conducive to
   topological aggregation. When a packet is sent from source delivery
   address A to destination delivery address B, A's provider-edge router
   (the ingress tunnel router, or "ITR", as defined in [LISP])
   encapsulates the packet and sends it through transit space to B's
   provider-edge router (the egress tunnel router, or "ETR"). B's ETR
   decapsulates the packet and forwards it to the appropriate recipient,
   B.

   When encapsulating a packet, A's ITR must somehow determine B's ETR's
   transit address and include it in the outer header. In general, any
   ITR must be able to map any given delivery address to a corresponding
   ETR transit address for proper tunneling through transit space. This
   illustrates the need for a mapping service that can provide this
   address. The design details of this mapping service will play a
   large part in determining the effectiveness of any proposed
   implementation of a delivery/transit address space separation. The
   mapping service also presents a new opportunity to enhance the
   services currently offered by the Internet, which is further reason
   to carefully consider how this service should be implemented. Should
   mapping information be distributed via a push or a pull model? What
   additional information, if any, should be distributed along with the
   mapping information? Can we satisfy the mapping requirement without
   impacting packet delivery quality?

   Our answers to these questions are rooted in a "do no harm" design
   philosophy: improve routing scalability without sacrificing any
   desirable features in the current architecture or negatively
   affecting its security and reliability. To this end, we present APT,
   a practical transit mapping service designed with the following goals
   in mind.

   o  Minimize delay and prevent loss in packet encapsulation.

   o  Minimize the number of devices that need to be modified to support
      APT.

   o  Minimize the number of devices that will require additional
      resources or complexity.

   o  Keep the design modular so that the method used to propagate
      mapping information is independent from the method used to
      retrieve mapping information for tunneling.

3.  Terminology

   Transit Network (TN) - An AS whose business is to provide packet
   transport services for its customers. Transit networks provide
   packet forwarding services for delivery networks (see definition
   below). As a rule of thumb, if the AS appears in the middle of any
   ASPATH in a BGP route today, it is considered a transit network.

   Delivery Network (DN) - A network that is a source or destination of
   IP packets, but does not forward packets between TNs or other
   delivery networks.

   Transit Space - The IP address space used by transit networks. We
   will also use the term "transit space" to refer to the topological
   area of the Internet where transit addresses are routable.

   Delivery Space - The set of all IP address spaces used by delivery
   networks. We will also use the term "delivery space" to refer to the
   topological area of the Internet outside of transit space -- that is,
   where only delivery addresses are routable.

   Transit Address (Taddr) - A Taddr is an address in transit space.

   Delivery Address (Daddr) - A Daddr is an address in delivery space.

   Default Mapper - A new device required by APT.
   Each transit network
   MUST have at least one default mapper. A default mapper maintains a
   complete mapping table. In other words, given any Daddr, default
   mappers can return a corresponding Taddr. To support the growing
   trend towards multihoming, the mappings stored in default mappers
   will map a Daddr prefix to a non-empty SET of destination Taddrs, all
   of which are expected to have a direct connection to the DN.

   Tunnel Router (TR) - All edge routers in a TN will become TRs. Like
   ITRs and ETRs in LISP [LISP], TRs provide the encapsulation and
   decapsulation services required for tunneling packets through transit
   space. A TR has both ITR and ETR functionality, meaning that any TR
   can perform both encapsulation and decapsulation of packets. To
   properly encapsulate any given packet, TRs can query the default
   mappers for mapping information. TRs also cache commonly used
   MapRecs locally. Note that TR cache entries are NOT identical to the
   mappings stored at default mappers (see the definitions of "MapSet"
   and "MapRec" below).

   APT Node - A general term referring to any device type introduced by
   APT. This includes both default mappers and TRs.

   MapSet - A MapSet contains a Daddr prefix and a non-empty SET of ETR
   Taddrs associated with the prefix. MapSets also include related
   information such as priority rankings for each of the ETRs in the
   set. Default mappers store MapSets.

   MapRec - A MapRec contains a Daddr prefix and any SINGLE ETR Taddr
   associated with that prefix. Any MapRec is a subset of the complete
   MapSet for its Daddr prefix. TRs store MapRecs along with an
   associated TTL. A MapRec is removed from a TR's cache once its TTL
   expires.

4.  APT Overview and Requirements

   This section is a comprehensive overview of the devices and protocols
   introduced by APT. For explanations and justifications, see the
   corresponding referenced sections.
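
   As a reading aid for the MapSet and MapRec terms defined in
   Section 3, the following Python sketch shows one possible in-memory
   shape for the two records. It is an illustration only, not part of
   the APT design: the class names, field names, the helper
   maprec_from(), and the example addresses are all assumptions made
   for this sketch.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EtrEntry:
    """One ETR inside a MapSet (hypothetical layout)."""
    taddr: str      # transit address of the ETR
    priority: int   # priority ranking carried with the MapSet


@dataclass(frozen=True)
class MapSet:
    """Daddr prefix plus a non-empty set of ETR Taddrs; held by default mappers."""
    daddr_prefix: str
    etrs: Tuple[EtrEntry, ...]

    def __post_init__(self):
        # The definition requires the set of ETR Taddrs to be non-empty.
        if not self.etrs:
            raise ValueError("a MapSet needs at least one ETR Taddr")


@dataclass(frozen=True)
class MapRec:
    """Daddr prefix plus a SINGLE ETR Taddr and a TTL; cached by TRs."""
    daddr_prefix: str
    etr_taddr: str
    ttl_seconds: int


def maprec_from(mapset: MapSet, taddr: str, ttl_seconds: int) -> MapRec:
    """Build a MapRec that is a subset of the given MapSet."""
    if taddr not in {entry.taddr for entry in mapset.etrs}:
        raise ValueError("Taddr is not part of this MapSet")
    return MapRec(mapset.daddr_prefix, taddr, ttl_seconds)
```

   For example, a MapSet for the documentation prefix 192.0.2.0/24 with
   two ETR entries yields a MapRec for either entry, while a request for
   a Taddr outside the set is refused.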

   Default Mapper Requirements (see Section 5)

   o  Default mappers must have enough storage space to store the full,
      global mapping table and associated metadata.

   o  Every destination Taddr in a MapSet MUST have an associated time
      before retry (TBR, see Section 6.1).

   o  Default mappers MUST keep track of the Taddrs of the TRs they
      serve.

   o  Default mappers MUST examine the destination Taddr of incoming
      packets for addresses other than their own.

   TR Requirements (see Section 5)

   o  TRs MUST keep a small cache to hold recently-used MapRecs and
      their TTLs.

   o  TRs MUST have a default route to their default mapper.

   o  TRs MUST be able to encapsulate and decapsulate IP-in-UDP packets
      with an APT header (see Section 11).

   Failover for Multihomed DNs (see Section 6.1)

   o  When a Taddr prefix is withdrawn via BGP (see Section 6.1.1)

      *  ITRs forward packets destined for unroutable Taddrs to their
         default mapper.

      *  The default mapper forwards the packet to an alternate ETR if
         one is available.

      *  The default mapper sends a Cache Add Message to the originating
         ITR.

   o  When a TR becomes unreachable (see Section 6.1.2)

      *  Packets destined for the TR are intercepted by its default
         mapper.

      *  The default mapper sets the TBR for the appropriate MapRec.

      *  The default mapper forwards TR-addressed packets to an
         alternate ETR if one is available.

      *  The default mapper sends an ETR Unreachable packet to the ITR's
         default mapper.

      *  The default mapper broadcasts a Cache Drop Message to its TRs.

      *  The ITR's default mapper sets the TBR for the appropriate
         MapRec.

      *  The ITR's default mapper broadcasts a Cache Drop Message to its
         TRs.

   o  When a DN becomes unreachable from its TR (see Section 6.1.3)

      *  The TR forwards packets destined for the DN to its default
         mapper, setting the APT packet type to ETR-to-DN link failure
         (see Section 11.1).

      *  The default mapper sets the TBR for the appropriate MapRec.

      *  The default mapper forwards the packet to an alternate ETR if
         one is available.

      *  The default mapper sends a Delivery Network Unreachable packet
         to the ITR's default mapper.

      *  The default mapper broadcasts a Cache Drop Message to its TRs.

      *  The ITR's default mapper sets the TBR for the appropriate
         MapRec.

      *  The ITR's default mapper broadcasts a Cache Drop Message to its
         TRs.

   Mapping Dissemination

   o  Default mappers MUST sign updates with their TN's private key.

   o  Default mappers MUST verify the signature before processing or
      forwarding MapSet updates (see Section 8).

   o  Default mappers MUST NOT remove or alter the signature when
      forwarding the update.

   o  Default mappers MUST cryptographically sign control messages that
      may need to travel between ASes.

   o  Default mappers MUST speak DM-BGP and peer with other default
      mappers (see Section 7.1).

      *  DM-BGP is a separate instance of standard BGP that runs on a
         different TCP port.

      *  Only default mappers speak DM-BGP.

      *  DM-BGP updates carry mapping updates in a new attribute type.

5.  The Mapping Service

   TRs serve as the gateway between delivery and transit space. When a
   TR receives a packet from a DN that needs to be routed through
   transit space, it maps the packet's destination Daddr to an
   appropriate destination Taddr (the mapping lookup details are
   presented below). The TR will then encapsulate the packet with a UDP
   header containing an APT header followed by the original layer-3
   packet as the UDP payload (see Section 11). The packet can then be
   routed through transit space.

   To minimize the latency introduced by encapsulation, APT seeks to
   store mapping information as close to the ITR as possible. However,
   the global mapping table is likely to grow very large over time.
   To
   avoid undue memory requirements for ITRs while still keeping mapping
   information within reach, we introduce the concept of default
   mappers.

   A TR does not need to store the entire global mapping table.
   Instead, it queries a default mapper for mapping information and
   caches recently used MapRecs.

   Default mappers are the only devices in the network that need to
   store the complete global mapping table. As we will see in the
   following example, TRs only make use of default mappers in the event
   of a cache miss. This means that, given sufficiently sized caches at
   the TRs, network latency will not heavily depend upon default mapper
   performance. Note that each TN need only have a single default
   mapper, but may choose to deploy more to avoid a single point of
   failure and to enhance overall performance. In the latter case, a TN
   MAY choose to use anycast to reach one of the default mappers or use
   multicast to reach all of them.

5.1.  A Mapping Example

   Below is a simple topology for demonstrative purposes. A and B are
   DNs, each addressable via a single Daddr prefix, TN1 and TN2 are TNs,
   ITR1, ETR1, and ETR2 are TRs, any node labeled "X" is a router, and
   M1 and M2 are default mappers. A portion of the mapping table for M1
   is shown.

           ___                               ___
          / A \                             / B \_________
          \___/                             \___/        |  Delivery Space
   - - - - -|- - - - - - - - - - - - - - - - -| - - - - -|- - - - - - - - -
         .--+---.                          .--+---.      |  Transit Space
      __-| ITR1 |-__                    __-| ETR1 |-__   |
     /   '------'  .`--.            .--'.  '------'  .`--+--.
    |  T    ____   | X |------------| X |  T   ____  | ETR2 |
    |  N   | M1 |  '-;-'            '-:-'  N  | M2 | '-;----'
     \ 1   '-/\-'   /                  \   2  '----'  /
      \_____/  \___/                    \____________/
    _______/    \___________________
   |    DN    | TS Addr  | Priority |
   |----------|----------|----------|
   |   ...    |   ...    |   ...    |
   |----------|----------|----------|
   |    B     |   ETR1   |    10    |
   |          |   ETR2   |    20    |
   |----------|----------|----------|
   |   ...    |   ...    |   ...    |
   '--------------------------------'

                                Figure 1

   In this section, we illustrate how TRs and default mappers interact
   within a TN to properly tunnel packets through transit space.

   In Figure 1, a node in network A sends a packet to a Daddr in network
   B. When this packet arrives at ITR1, ITR1 looks up the destination
   Daddr in its MapRec cache. If a matching prefix is present in its
   cache, ITR1 simply encapsulates the packet with the corresponding
   destination Taddr and sends it across transit space. If a matching
   prefix is not present, ITR1 will send the packet through its default
   mapper, M1. It does this by encapsulating the packet with the
   (possibly anycast) address for its default mapper(s) as the
   destination Taddr.

   This packet will arrive at M1, the only default mapper in TN1. When
   M1 receives the packet, it decapsulates the packet and examines the
   destination Daddr. Since default mappers store the full, global
   mapping table, a default mapper will always be able to encapsulate
   the packet with a valid destination Taddr. All packets encapsulated
   by a default mapper MUST contain the default mapper's Taddr as the
   source address.

   In addition to forwarding the packet to an appropriate TR (ETR1, in
   this case), M1 also treats the incoming packet as an implicit request
   from ITR1 for mapping information. M1 responds to ITR1 with a Cache
   Add Message (see Section 11.2) containing a MapRec that maps B to
   ETR1. This allows ITR1 to add this MapRec to its cache so that ITR1
   can tunnel further packets destined for B directly to ETR1. The
   MapRec also has an associated time to live (TTL) that is set by M1.
   The TTL ensures that ITR1 will occasionally re-request this mapping
   information from M1.
   At this time, if the mapping information has
   changed in any way since ITR1's prior request, M1 can respond with an
   updated MapRec. Without this TTL, ITR1's cached information may
   become stale over time.

6.  Multihoming Support

   In the example above, the observant reader may have noted that B is
   multihomed. That is, B can be reached through both ETR1 and ETR2.
   Multihoming provides B with both enhanced reliability in case of a
   connectivity failure and the flexibility to distribute incoming
   traffic across different tunnel endpoints.

   In accordance with our design goals, all of the logic for selecting a
   tunnel endpoint for a multihomed DN is contained within default
   mappers. Default mappers store full MapSets containing the addresses
   of all ETRs for a given Daddr prefix, while TRs only store a single
   MapRec per Daddr prefix. When a TR requests a MapRec for a
   multihomed DN, it is up to the default mapper to decide which one to
   return.

   Many DNs will want to have some control over which tunnel endpoint is
   used for incoming traffic. Therefore, each MapRec in a MapSet has an
   associated priority value, which is made available to all default
   mappers throughout the transit space (see Section 7). The number is
   to be treated like a ranking -- an ETR with a lower priority value is
   preferred over one with a higher value.

   At the same time, a sending TN may have its own preference regarding
   which of the ETRs to use for a given Daddr prefix. Default mappers
   can use a combination of locally configured routing policies and
   MapSet priority information to choose from the set of valid ETR
   addresses. Going back to Figure 1, assume that ITR1 does not have a
   MapRec for B in its cache. When A addresses a packet to B, ITR1 will
   send the packet to M1. If M1 has no preference between ETR1 and
   ETR2, it will examine the priority values in B's MapSet and select
   ETR1, B's most preferred ETR.
   M1 forwards the packet to ETR1 and
   returns the corresponding MapRec to ITR1, which stores the MapRec in
   its cache.

   In the case of a priority value tie, the default mapper can break the
   tie by picking the ETR to which it has the shortest path. If some
   ETRs are tied in terms of both lowest priority value and shortest
   path, the default mapper is free to break the tie arbitrarily. The
   address of the selected ETR will be used as the destination address
   when encapsulating the packet.

   We envision that DNs will be able to manipulate their incoming
   traffic load by setting appropriate priority values in their MapSet.
   A DN who wants load balancing can assign the same priority value to
   all of his MapRecs. A DN who wants to have one TN as a primary
   provider and another only as a backup can simply assign a higher
   priority value to his ETR at his backup provider.

6.1.  Using Alternate ETRs During Failures

   When a network failure has rendered an ETR unable to perform its
   duties, an affected multihomed user will expect his traffic to be
   temporarily routed through an alternate ETR. There are three general
   types of failures that would require an ITR to use an alternate ETR:
   (1) an ITR may have discovered via BGP that it can no longer reach
   the Taddr prefix containing the address of the intended ETR, (2) the
   ETR itself may go down or lose connectivity, and (3) the link between
   a DN and its TR may be down, a new problem introduced by the
   tunneling architecture. This section will explain how each type of
   failure is handled, using Figure 1 as a reference. We assume that,
   at the time of failure, all TNs are using ETR1 to reach B.

   To assist in handling these failures, default mappers store a time
   before retry (TBR) for each MapRec. Normally, the TBR for each
   MapRec is set to zero, indicating that it is usable. Any MapRec with
   a non-zero TBR value is considered invalid.
   We will refer to the
   action of setting a MapRec's TBR to a non-zero value as "invalidating
   a MapRec." MapRecs that map to unroutable destinations are also
   considered invalid. So long as a MapRec is invalid, default mappers
   will not use this entry as a destination address or include it in
   mapping responses. The role of the TBR in handling failures will
   become clear in the explanations below.

6.1.1.  Handling Taddr Prefix Failures

   For failures of type (1), ITR1 has no route to ETR1. Assume a host
   in network A attempts to send a packet to a host in network B. If
   ITR1 does not have a MapRec for B in its cache, it will forward the
   packet to M1 (see Section 5.1). If ITR1 does have a MapRec for B in
   its cache, it will see that it has no route to ETR1, and forward the
   packet to its default mapper, M1. M1 will also see that it has no
   route to ETR1, and thus select the next-most-preferred ETR for B,
   ETR2. If it has a route to ETR2, it sends the packet with ETR2 as
   the destination Taddr and replies to ITR1 with the corresponding
   MapRec. M1 can assign a relatively short TTL to the MapRec in its
   response. Once this TTL expires, ITR1 will forward the next packet
   for B to the default mapper, which will respond with the
   most-preferred MapRec that is routable at that time. This allows
   ITRs to quickly revert to using ETR1 once it becomes reachable again.

6.1.2.  Handling Single-ETR Failures

   In the second case, the Taddr prefix containing ETR1 is still
   routable from ITR1, but ETR1 has failed or is otherwise unreachable.
   Since this failure is confined to TN2, all routers in TN2 should be
   able to detect that ETR1 is unreachable via TN2's IGP. In order to
   prepare for this situation, M2 announces a very high-cost link to all
   of the TRs it serves (in this case, ETR1 and ETR2) via IGP.
   When
   ETR1 fails, since the normal IGP path to ETR1 will no longer be
   valid, all packets addressed to ETR1 will be forwarded to M2 instead.

   When M2 receives a data packet addressed to one of the TRs it serves
   (ETR1, in this case), it will assume the TR is unreachable,
   invalidate the corresponding MapRec, and broadcast a Cache Drop
   Message (see Section 11.3) to all of the TRs it serves. Using the
   default mapper address in the APT header (see Section 11), it will
   also reply to the sender's default mapper (in this case, M1) with an
   ETR Unreachable Message (see Section 11.4). M1 can then also
   invalidate the corresponding MapRec and broadcast a Cache Drop
   Message to its TRs.

   In order to minimize packet losses, M2 should not simply drop data
   packets addressed to ETR1. Instead, M2 should attempt to reroute the
   packet to an alternate ETR, even if that ETR is in a different TN.
   It can do this by simply decapsulating the packet, looking up the
   MapSet for the Daddr prefix, and re-encapsulating the packet with a
   valid ETR as the destination Taddr according to the normal
   ETR-selection guidelines.

6.1.3.  Handling TR-to-DN Link Failures

   The final case involves a failure of the link connecting ETR1 to B.
   When ETR1 discovers it cannot reach B, it will send packets destined
   for B to its default mapper, M2, setting the APT message type to
   ETR-to-DN Link Failure (see Section 11.6) when encapsulating the
   packet. M2 will see that the packet's APT message type is ETR-to-DN
   Link Failure, and handle this situation in the same way as
   situation 2 (see Section 6.1.2), except that the message it sends to
   M1 will be a DN Unreachable Message (see Section 11.5) instead of an
   ETR Unreachable Message.

   DN Unreachable and ETR Unreachable Messages are handled the same way.
   However, we have kept them as separate notification types in order to
   allow for divergent behavior in the future.

7.  Exchanging MapSets Between TNs

   To avoid introducing latency or packet loss when encapsulating
   packets, the default mappers must have all MapSets available locally.
   In order for default mappers to store a full, global mapping table,
   there must be some way for them to receive MapSets from other TNs.
   However, only default mappers should receive MapSets. In this
   section, we propose a method for MapSet dissemination. The APT
   design in general does not depend on this particular method; it only
   requires that SOME method exists for secure, up-to-date, lightweight
   MapSet dissemination.

7.1.  MapSet Dissemination via DM-BGP

   MapSet dissemination can be accomplished using a separate BGP
   instance that is only run between default mappers. We refer to this
   new BGP instance as 'DM-BGP'. As a protocol, DM-BGP is identical to
   BGP, but it serves a different purpose. DM-BGP is used to
   disseminate MapSets, not as a reachability protocol. It is simply
   run on a different TCP port and is only used by default mappers so as
   not to affect the RIB-In of other nodes.

   When a default mapper wishes to distribute its TN's mapping
   information to other default mappers, it sends out a DM-BGP update
   with the mapping information included as an optional, transitive BGP
   attribute with a new type. The NLRI included MUST be a prefix that
   uniquely identifies the source TN. When other default mappers
   receive DM-BGP updates, they store this information in their MapSet
   tables, replacing any existing MapSets. BGP policy knobs can still
   be tuned as desired by each TN. Upon receiving mapping updates, TNs
   can choose whether to forward the update to each of their peers, so
   long as their actions are in accordance with the BGP protocol.
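
   The receive-side behavior described above (store the MapSets carried
   in an update, replacing any existing MapSets, and only after the
   signature check required in Section 4) can be sketched as follows.
   This is a minimal Python illustration under stated assumptions, not
   a DM-BGP implementation: the function name apply_dm_bgp_update, the
   tuple layout of a MapSet, and the signature_ok callback are all
   hypothetical, and no BGP parsing or real cryptography is shown.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical shape: (daddr_prefix, ((etr_taddr, priority), ...)).
MapSet = Tuple[str, Tuple[Tuple[str, int], ...]]


def apply_dm_bgp_update(
    table: Dict[str, MapSet],
    mapsets: Iterable[MapSet],
    signature_ok: Callable[[], bool],
) -> bool:
    """Store the MapSets from one update in a default mapper's table.

    Returns False and leaves the table untouched when the (assumed)
    signature check fails; otherwise replaces existing entries that
    share a Daddr prefix and returns True.
    """
    if not signature_ok():
        return False
    for mapset in mapsets:
        daddr_prefix = mapset[0]
        table[daddr_prefix] = mapset  # replaces any existing MapSet
    return True
```

   Keying the table by Daddr prefix is a choice made for this sketch; it
   makes "replacing any existing MapSets" a plain overwrite.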

   A default mapper may receive the same mapping update more than once.
   This will occur when there is more than one DM-BGP path from the
   source default mapper's TN to the receiving default mapper's TN.
   Along with the mapping information, the new attribute should include
   a sequence number to allow receivers to detect duplicate mapping
   updates. Default mappers MUST regularly announce MapSets to the rest
   of the network for all of the DNs to which their TN connects. As a
   precaution, however, these DM-BGP updates should be infrequent and
   rate-limited.

7.2.  Regular MapSet Refresh

   Regardless of the protocol used to disseminate MapSets, MapSets are
   not transient data. In order for default mappers to prevent their
   MapSet tables from strictly increasing in size without bound, they
   must be able to remove stale MapSets. For this reason, each MapSet
   entry MUST contain a time to live (TTL). A default mapper MAY remove
   a MapSet from its table at any time after this TTL has expired. In
   order to avoid premature removal from the global mapping table,
   default mappers MUST (1) regularly re-announce all MapSets for DNs
   they connect to and (2) set the TTL for each MapSet to no less than
   three times their refresh interval.

8.  Security and Reliability

   Using DM-BGP to distribute mapping announcements guarantees that they
   are only accepted from manually configured DM-BGP peers. This
   ensures that mapping updates are no less secure than routing updates
   are today. However, mapping updates have the potential to cause far
   more damage; with no security measures in place, a mapping update
   could direct ALL traffic for an entire Daddr prefix to an arbitrary
   Taddr. APT strives to prevent attacks and misconfigurations from
   having adverse effects outside of the TN in which they occur.
   Therefore, mapping updates will require some level of security.

8.1.  Authenticating the Originator of Mapping Updates

   Our first step towards authenticating mapping updates is to
   authenticate an update's originator. For this reason, each default
   mapper MUST cryptographically sign the mapping data in any update it
   originates. All default mappers within a single TN SHOULD use the
   same key pair, but default mappers in different TNs MUST use
   different key pairs. When a default mapper receives a mapping
   update, it MUST verify this signature before processing or forwarding
   the update. Default mappers MUST NOT remove or alter this signature
   when forwarding the update.

   Clearly, this scheme can only work if there is a secure way to
   distribute all public keys to all default mappers. This should be a
   relatively straightforward problem to solve. We describe one simple,
   appropriate method for secure key distribution in a network of
   manually configured peers in a separate document (forthcoming).

8.2.  Detecting MapSet Misconfigurations

   Though the scheme outlined in Section 8.1 allows for secure
   authentication of the originator of a mapping update, it does not
   guarantee the correctness of the data. Since DM-BGP peerings are
   manually configured and therefore form a relatively closed network,
   misconfigurations are far more likely than attacks to be the cause of
   inaccurate mapping data.

   The types of misconfigurations that could potentially be harmful are
   those that result in one TN accidentally interfering with the MapSet
   for a DN that it is not connected to. This can happen whenever a
   provider accidentally announces a MapSet for the wrong Daddr prefix.
658 These types of accidental conflicts fall into three categories: (1) a 659 TN announces a MapSet for the wrong Daddr prefix when that prefix 660 already has a MapSet in the global mapping table, (2) a TN announces 661 a MapSet for a Daddr prefix that subsumes a longer Daddr prefix that 662 already has a MapSet, and (3) a TN announces a MapSet for a Daddr 663 prefix that is a subset of a shorter Daddr prefix that already has a 664 MapSet. 666 The first category of conflicts is the only one that we intend to 667 actively prevent. Clearly, the DN that owns a particular Daddr 668 prefix should be the ultimate authority for its mapping information. 669 However, DNs do not announce their MapSets to the network directly, 670 but rather through the TNs they connect to. In order to ensure a 671 mapping update for a Daddr prefix is approved by its rightful owner, 672 we must first include some sort of prefix owner identification in 673 each MapSet. To this end, we introduce a DN key field into each 674 mapping. This field SHOULD contain a cryptographically valid public 675 key, but it is not currently used as such. When a default mapper 676 receives a new MapSet that would replace an existing one, it only 677 needs to ensure that the DN key has not changed. (This scheme is 678 similar in spirit to the way that OpenSSH uses its 'known_hosts' 679 file.) Note that DN keys are different from the keys used by default 680 mappers to authenticate DM-BGP updates. 682 For the other two categories, it is less clear that such an 683 announcement is the result of a misconfiguration. It is possible, 684 for example, that the owner of a /16 Daddr prefix has resold some of 685 the /24 prefixes it contains to other DNs. In such a case, only the 686 administrators will know if the announcement is valid. It is for 687 this reason that (in the spirit of PHAS [PHAS]) we do not attempt to 688 prevent such changes, but only to detect them and notify interested parties.
689 Since legitimate MapSet changes are infrequent, notifying interested 690 parties of MapSet changes via e-mail is a perfectly viable option. 691 These notifications could also prove useful in debugging the mapping 692 service or a particular TN's configuration. 694 8.3. APT Control Messages 696 APT never requires Cache Drop and Cache Add Messages to traverse AS 697 boundaries. Any such message that does traverse an AS boundary must 698 be an error or an attack. Therefore, TRs MUST ignore Cache Drop and 699 Cache Add Messages with a source Taddr outside of their TN. Since 700 ISPs already generally drop packets from an external source when they 701 contain a local source address, this simple policy should be 702 sufficient to prevent TR cache poisoning, whether accidental or 703 intentional. 705 Since any APT control message that may need to travel between ASes 706 can also affect traffic flow, such control messages MUST be 707 cryptographically signed. This currently includes ETR Unreachable 708 Messages (see Section 11.4) and DN Unreachable Messages (see 709 Section 11.5). Recall that the infrastructure required to generate 710 and verify cryptographic signatures is already required for mapping 711 update dissemination (see Section 8.1). When a default mapper 712 receives such a control message, it MAY choose to verify this 713 signature. 715 9. Scalability through Recursion 717 It is conceivable that the global mapping table could eventually grow 718 large enough that it would no longer be possible to store it in a 719 single default mapper. Theoretically, the global mapping table could 720 grow to contain a separate MapSet for every Daddr prefix. In the 721 case of IPv6 prefixes, the total number of MapSets would be on the 722 order of 10^18, far more than we can expect to be able to store on a 723 single device. If the global mapping table were to approach such 724 gargantuan proportions, APT could simply be applied recursively.
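To give a rough sense of scale for the figure quoted above, the following back-of-the-envelope sketch estimates the storage such a table would need. The per-MapSet size of 32 bytes is purely an assumption chosen for illustration; this document does not define an encoding size, and real MapSets with several ETRs and a DN key would likely be larger.

```python
# Order of magnitude quoted in the text for a worst-case IPv6 mapping table.
MAPSET_COUNT = 10**18

# Assumption for illustration only: a hypothetical compact 32-byte encoding.
BYTES_PER_MAPSET = 32


def table_size_bytes(count, bytes_per_entry):
    """Total bytes needed to hold `count` entries of `bytes_per_entry` each."""
    return count * bytes_per_entry


size = table_size_bytes(MAPSET_COUNT, BYTES_PER_MAPSET)
exabytes = size / 10**18  # 1 EB = 10^18 bytes
print(f"{exabytes:.0f} EB")  # prints "32 EB"
```

Even under this optimistic assumption the table would occupy tens of exabytes, which is consistent with the claim that a single device could not hold it.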
726 In the recursive case, the terms "transit" and "delivery" are only 727 meaningful relative to a particular depth of recursion, or number of 728 times the packet has been encapsulated. We will refer to the 729 non-recursive deployment of APT as the global level (G). What we have up 730 until now referred to as delivery space and transit space are in fact 731 G delivery space and G transit space. 733 At one level of recursion, G transit space is split into two address 734 spaces: recursion depth 1 (R1) delivery space and R1 transit space. 735 R1 delivery space is just another name for G transit space. Which 736 name is used will depend on the context. R1 transit space can be 737 further split into two R2 spaces, and so on. Using this terminology, 738 all protocols and concepts in APT can be understood to apply 739 generally at any level of recursion. 741 This figure shows the layout of a packet while being tunneled at an 742 APT recursion depth of two.
744 ________________________________________
745 |          R2 transit header           |
746 |--------------------------------------|
747 | R2 delivery a.k.a. R1 transit header |
748 |--------------------------------------|
749 | R1 delivery a.k.a. G transit header  |
750 |--------------------------------------|
751 |          G delivery header           |
752 |--------------------------------------|
753 |                                      |
754 |               payload                |
755 |                                      |
756 |______________________________________|
758 Figure 2 760 10. Mapping Announcements 762 Each mapping announcement has the following fields: 764 o Address Type - This field specifies the type of Daddrs used in the 765 announcement. All Daddr prefixes in a single mapping announcement 766 MUST be of the same address type. Currently, this is expected to 767 be either IPv4 or IPv6, but other address types are also allowed. 769 o Total Length - This field specifies the total number of bytes used 770 by all MapSets in the announcement.
Each mapping announcement can 771 contain MapSets for multiple prefixes, each with multiple MapRecs. 773 o Sequence Number - This field reflects the freshness of an update. 774 Default mappers can avoid processing updates with old sequence 775 numbers. 777 o Signature - The message should be cryptographically signed using 778 the private key of the sending default mapper. 780 These fields are followed by one or more MapSets. Each MapSet in the 781 announcement is described by the following fields: 783 o Daddr Prefix - This is the Daddr prefix for the MapSet. 785 o Time To Live (TTL) - This is the amount of time in hours that this 786 MapSet should persist in default mappers before being considered 787 obsolete and erased. This value MUST be set to at least three 788 times the sender's regular refresh interval. The TTL is specified 789 in hours to prevent misconfigurations from causing excessive 790 mapping updates. 792 o ETR Count - This is the total number of ETRs that the 793 corresponding Daddr prefix maps to. 795 o Each ETR in a MapSet is described by the following fields: 797 * Taddr - The address of this ETR. 799 * Priority - Priorities are arbitrary integers that only have 800 meaning in reference to each other. Taddrs with lower priority 801 values are preferred. 803 * DN Public Key - This public key SHOULD uniquely identify the DN 804 that owns this MapSet. It can be used to help identify 805 configuration errors, and possibly for authoritative, 806 cryptographic authentication of MapSet data in the future. 808 11. APT Header and Control Messages 810 Delivery space packets are encapsulated with a UDP header by an ITR. 811 The UDP header should specify a well-known port reserved for APT, and 812 the UDP payload MUST begin with an APT header. For regular data, a 813 layer-3 header immediately follows the APT header. For other message 814 types, we describe the fields that follow below. 816 11.1.
APT Header Fields 818 The APT header contains the following fields: 820 o Version - The version of APT that should be used to interpret the 821 header information. 823 o Tag - Extra field reserved for future use. 825 o Type - Determines the type of message being sent. Appropriate 826 values are as follows: 828 0: Regular Data 830 1: Cache Add (Section 11.2) 832 2: Cache Drop (Section 11.3) 833 3: ETR Unreachable (Section 11.4) 835 4: DN Unreachable (Section 11.5) 837 5: ETR-to-DN link failure (Section 11.6) 839 o Default Mapper Taddr - The address of the default mapper for the 840 ITR that generated this header. This is the Taddr where any 841 failure notifications from the destination TN will be sent. If 842 this header was generated by a default mapper, this field SHOULD 843 contain the same address as the source address in the 844 encapsulating IP header. 846 11.2. Cache Add Messages 848 Cache Add Messages are only sent by default mappers to TRs within 849 their own TNs, most notably in response to data packets. When a TR 850 receives a Cache Add Message, it simply adds the enclosed MapRec to 851 its cache, replacing any existing cache entry. 853 o Daddr Prefix - This is the Daddr prefix portion of the MapRec to 854 be added to the receiving TR's cache. 856 o ETR Taddr - This is the Taddr portion of the MapRec to be added to 857 the receiving TR's cache. It is the address of the ETR that can 858 reach the Daddr prefix in the previous field. 860 o TTL - The TTL specifies the amount of time in seconds before the 861 added cache entry should expire. Expired cache entries should be 862 deleted from the TR's cache. 864 11.3. Cache Drop Messages 866 Cache Drop Messages are only sent by default mappers to TRs within 867 their own TNs. When a TR receives a Cache Drop Message, it simply 868 removes the cache entry corresponding to the enclosed Daddr prefix 869 from its cache, if such an entry exists. 
871 o Daddr Prefix - This is the Daddr prefix of the MapRec to be 872 dropped. 874 11.4. ETR Unreachable Messages 876 ETR Unreachable Messages are sent by default mappers to other default 877 mappers to notify them of failures. 879 o Transit Address - This is the Taddr of the ETR that cannot be 880 reached. 882 o Signature - The message should be cryptographically signed using 883 the private key of the sending default mapper. 885 11.5. DN Unreachable Messages 887 DN Unreachable Messages are sent by default mappers to other default 888 mappers to notify them of failures. 890 o Daddr Prefix - This is the Daddr prefix of the DN that cannot be 891 reached. 893 o Signature - The message should be cryptographically signed using 894 the private key of the sending default mapper. 896 11.6. The ETR-to-DN Link Failure Message Type 898 This message type is used by an ETR for two purposes: (1) to indicate 899 to its default mapper that its direct link to the DN for the enclosed 900 data packet is down, and (2) to preserve that data packet so that the 901 ETR's default mapper might deliver it to the DN by way of a different 902 ETR. 904 12. Incremental Deployment 906 Incremental deployment methods and incentives for APT will be 907 discussed in a separate draft (forthcoming). 909 13. IANA Considerations 911 This memo includes no request to IANA. 913 14. Security Considerations 915 Security considerations for APT are discussed in Section 8. 917 15. References 919 15.1. Normative References 921 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 922 Requirement Levels", BCP 14, RFC 2119, March 1997. 924 15.2. Informative References 926 [AddrAlloc] 927 Meng, X., Xu, Z., Zhang, B., Huston, G., Lu, S., and L. 928 Zhang, "IPv4 Address Allocation and BGP Routing Table 929 Evolution", ACM SIGCOMM Computer Communication Review 930 (CCR) special issue on Internet Vital Statistics, Volume 931 35, Issue 1, p71-80. 933 [CRIO] Zhang, X., Francis, P., Wang, J., and K. 
Yoshida, "Scaling 934 IP Routing with the Core Router-Integrated Overlay", Proc. 935 International Conference on Network Protocols, November 2005. 937 [EFIT] Massey, D., Wang, L., Zhang, B., and L. Zhang, "A Scalable 938 Routing System Design for Future Internet", SIGCOMM IPv6 939 Workshop, August 2007. 941 [EFIT-ID] Massey, D., Wang, L., Zhang, B., and L. Zhang, "A Proposal 942 for Scalable Internet Routing and Addressing", Internet Draft, 943 http://www.ietf.org/internet-drafts/draft-wang-ietf-efit-00.txt, 944 February 2007. 946 [LISP] Farinacci, D., Fuller, V., Oran, D., and D. Meyer, 947 "Locator/ID Separation Protocol (LISP)", Internet Draft, 948 http://www.ietf.org/internet-drafts/draft-farinacci-lisp-05.txt, 949 2007. 951 [PHAS] Lad, M., Massey, D., Pei, D., Wu, Y., Zhang, B., and L. 952 Zhang, "PHAS: A Prefix Hijack Alert System", USENIX 953 Security. 955 [RAWS] Meyer, D., Zhang, L., and K. Fall, "Report from the IAB 956 Workshop on Routing and Addressing", Internet Draft, 957 http://www.ietf.org/internet-drafts/draft-iab-raws-report-02.txt, 958 2007. 960 [SixOne] Vogt, C., "Six/One: A Solution for Routing and Addressing 961 in IPv6", Internet Draft, 962 http://www.ietf.org/internet-drafts/draft-vogt-rrg-six-one-00.txt. 964 Appendix A. Open Issues 966 MapSets contain a priority field for each ETR, but this does not 967 allow for uneven distribution of traffic across ETRs with the same 968 priority, e.g. a 75/25 split. To provide a mechanism for DNs to 969 request such traffic distributions, we should also include a weight 970 field for each ETR. 972 If a TN sends out inaccurate mapping announcements, other TNs can 973 identify and respond to the misbehaving source TN. However, there 974 are no preventative security measures in place. Is detection and 975 response enough of a security measure? 977 We are considering automating customer-DN-to-provider-TN mapping 978 updates.
Under our current design, whenever a DN needs to update its 979 mapping information (it may add, subtract, or change providers, or 980 change its priority values), the DN must contact its provider TNs 981 offline and request that they announce the updated mapping 982 information. It is then up to the provider TNs to update the mapping 983 information. As we have seen with DNS updates, human involvement 984 introduces the possibility of human error and delay. We hope to 985 provide DNs with an automated way to manage their mapping 986 information. 988 Is it too much to ask ISPs to change all of their PE routers into 989 TRs? We suspect that TR implementation should involve only software 990 changes. Existing router hardware can do everything required by a 991 TR. Thus, we suspect the cost should be reasonable. 993 Authors' Addresses 995 Dan Jen 997 Email: jenster@cs.ucla.edu 999 Michael Meisel 1001 Email: meisel@cs.ucla.edu 1003 Dan Massey 1005 Email: massey@cs.colostate.edu 1007 Lan Wang 1009 Email: lanwang@memphis.edu 1011 Beichuan Zhang 1013 Email: bzhang@cs.arizona.edu 1014 Lixia Zhang 1016 Email: lixia@cs.ucla.edu 1018 Full Copyright Statement 1020 Copyright (C) The IETF Trust (2007). 1022 This document is subject to the rights, licenses and restrictions 1023 contained in BCP 78, and except as set forth therein, the authors 1024 retain all their rights. 1026 This document and the information contained herein are provided on an 1027 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 1028 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND 1029 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS 1030 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 1031 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 1032 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 
1034 Intellectual Property 1036 The IETF takes no position regarding the validity or scope of any 1037 Intellectual Property Rights or other rights that might be claimed to 1038 pertain to the implementation or use of the technology described in 1039 this document or the extent to which any license under such rights 1040 might or might not be available; nor does it represent that it has 1041 made any independent effort to identify any such rights. Information 1042 on the procedures with respect to rights in RFC documents can be 1043 found in BCP 78 and BCP 79. 1045 Copies of IPR disclosures made to the IETF Secretariat and any 1046 assurances of licenses to be made available, or the result of an 1047 attempt made to obtain a general license or permission for the use of 1048 such proprietary rights by implementers or users of this 1049 specification can be obtained from the IETF on-line IPR repository at 1050 http://www.ietf.org/ipr. 1052 The IETF invites any interested party to bring to its attention any 1053 copyrights, patents or patent applications, or other proprietary 1054 rights that may cover technology that may be required to implement 1055 this standard. Please address the information to the IETF at 1056 ietf-ipr@ietf.org. 1058 Acknowledgment 1060 Funding for the RFC Editor function is provided by the IETF 1061 Administrative Support Activity (IASA).