Internet Engineering Task Force                          Francis Dupont
INTERNET DRAFT                                                GIE DYADE
Expires in December 1999                                  June 25, 1999

     Multihomed routing domain issues for IPv6 aggregatable scheme

                 draft-ietf-ngtrans-6bone-multi-01.txt

Status of this Memo

   This document is an Internet Draft and is in full conformance with
   all provisions of Section 10 of RFC 2026.

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its
   areas, and its working groups. Note that other groups may also
   distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as
   "work in progress."

   The list of current Internet Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   Distribution of this memo is unlimited.

Abstract

   This document presents some issues for multihomed routing domains
   using the aggregatable addressing and routing scheme. A routing
   domain is multihomed when it uses two or more providers of the
   upper level. Most of these issues are not specific to IPv6 but are
   consequences of the addressing and routing scheme.

1. Introduction

   The aggregatable addressing and routing scheme [AGGR] defines an
   IPv6 aggregatable global unicast address format for use in the
   Internet and the associated routing.

   The address assignment and allocation mechanism is fully
   hierarchical: a prefix of a given level (i.e. of a given length)
   denotes all the destinations in the prefix, i.e. it aggregates
   them. The customers of an Internet service provider are in its
   prefix (as a consequence a multihomed routing domain has several
   prefixes).

   The routing is standard datagram routing, hop by hop, on
   destination address only (as in IPv4). But it is prefix routing,
   i.e. forwarding decisions are based on a "longest prefix match"
   algorithm on arbitrary bit boundaries without any knowledge of the
   internal structure of addresses.

   When there are two routes for the same prefix with the same length,
   the inter-domain routing protocol [BGP] chooses the best one
   according to:

   o  policy rules;

   o  shortest path, the path being the list of routing domains
      to cross;

   o  protocol metric.

   The idea of aggregation is the bet that, in most cases, a
   single-homed Internet service provider at a given level needs to
   know (i.e. have routes to) only:

   o  its upper provider (i.e. a shorter prefix, used as a default)
      if it is not a top-level provider;

   o  its customers (i.e. longer routes in its prefix);

   o  some routes to other customers of its upper provider (i.e.
      sibling prefixes, at the same level).

   With addresses this gives (with P1:P2/x for the concatenation of
   prefixes P1 and P2 with the length x):

   o  T/t for the upper provider;

   o  T:P/t+p for the provider itself;

   o  T:P1/t+p1, T:P2/t+p2, ..., T:Pn/t+pn for siblings;

   o  T:P:C1/t+p+c1, T:P:C2/t+p+c2, ..., T:P:Cn/t+p+cn for customers.

   Routing information for siblings is needed only by top-level
   providers. For any other provider it is only an optimization
   (i.e. a backdoor), because any destination outside its own prefix,
   siblings included, is reachable through the upper provider.

   Usual routing exchanges for P at prefix T:P/t+p are:

   o  from the upper provider the route to T/t which can be used as
      a default (i.e. <>/0);

   o  from a customer the route to T:P:C/t+p+c;

   o  from a sibling the route to T:Q/t+q;

   o  to anybody the route for T:P/t+p (and nothing else).

   The scheme, with arrows for route (and traffic) exchange, is:

                    +-----+
   Upper Level      |  T  |
                    +-----+
                      | ^
                  T/t | | T:P/t+p
                      V |
                   +-------+                   +-----+
                   |       |------ T:P/t+p --->|     |
   Siblings        |   P   |                   |  Q  |
                   |       |<---- T:Q/t+q -----|     |
                   +-------+                   +-----+
                    ^ | ^ |
                    | | | |
                    | | | +-------- T:P/t+p ----+
                    | | |                       |
                    | | +---- T:P:Cn/t+p+cn --+ |
                    | |                       | |
      T:P:C1/t+p+c1 | |                       | |
                    | | T:P/t+p               | |
                    | V                       | V
                  +-----+                   +-----+
                  |     |                   |     |
   Customers      | C 1 |                   | C n |
                  |     |                   |     |
                  +-----+                   +-----+

   Aggregation shows in the fact that each provider announces only the
   route to its own "aggregated" prefix and masks routes to longer
   prefixes. Upper levels should not know the details of lower levels;
   this transparency property should be kept.

   A top-level provider has no upper provider (i.e. no default) and
   must exchange routes with all the other top-level providers (i.e.
   full routing with its siblings is mandatory). In order to avoid
   routing table explosion, the length of top-level prefixes is
   bounded (therefore the number of top-level providers is bounded
   too).

2. Multihomed Routing Domains

   A multihomed routing domain has more than one provider, so it has
   more than one prefix (usually one prefix per provider).
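   As a rough illustration of the prefix routing described in the
   Introduction, the following Python sketch performs a "longest
   prefix match" over a small routing table held by a provider P. The
   prefixes come from the 2001:db8::/32 documentation range and the
   next-hop names are hypothetical; both are assumptions made for the
   example and are not part of the aggregatable scheme itself.

```python
import ipaddress

# Hypothetical routing table of provider P, whose own prefix T:P/t+p
# is assumed to be 2001:db8:10::/44 for the purpose of this example.
ROUTES = {
    "::/0": "upper provider T",          # T/t, used as a default
    "2001:db8:10::/48": "customer C1",   # T:P:C1/t+p+c1
    "2001:db8:11::/48": "customer Cn",   # T:P:Cn/t+p+cn
    "2001:db8:20::/44": "sibling Q",     # T:Q/t+q (optional backdoor)
}


def lookup(table, destination):
    """Return the next hop of the longest prefix containing destination."""
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        # Keep the matching route with the greatest prefix length.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None
```

   With this table, a destination inside a customer prefix is sent to
   that customer, a destination in the sibling prefix uses the
   backdoor, and everything else follows the default towards the upper
   provider.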

   There are several reasons to be multihomed:

   o  the "two coasts" case, where the routing domain is split into
      sub-domains in different locations, each sub-domain using a
      local provider:

                  +-----+                +-----+
                  |     |                |     |
                  | T w |                | T e |
                  |     |                |     |
                  +-----+                +-----+
                    ^ |                    ^ |
                    | |                    | |
          +---------|-|--------------------|-|--------+
          |  S      | V                    | V        |
          |       +-----+                +-----+      |
          |       |     |--------------->|     |      |
          |       | S w |                | S e |      |
          |       |     |<---------------|     |      |
          |       +-----+                +-----+      |
          |                                           |
          +-------------------------------------------+

      In fact this comes down to two routing domains with a backdoor
      between them. The extra routes can be hidden and there is no
      further issue.

   o  reliable service: to be able to use another provider in
      case of a connectivity problem. The purpose, of course, is to
      confine trouble to the case where all the providers fail (and
      NOT to the case where at least one fails!).

            +-----+                 +-----+
            |     |                 |     |
            | T 1 |                 | T 2 |
            |     |                 |     |
            +-----+                 +-----+
              ^ |                     ^ |
              | |                     | |
              | +--------+   +--------+ |
              |          |   |          |
              +--------+ |   | +--------+
                       | |   | |
                       | V   | V
                      +--------+
                      |        |
                      |   S    |
                      |        |
                      +--------+

      A given host of such a routing domain may (and should if
      reliable connectivity is needed) have two different addresses,
      one for each prefix (T1:S1:H in T1:S1/t1+s1 and T2:S2:H in
      T2:S2/t2+s2).

      This document mainly covers this case.

3. The Transparency Issue

   If a domain prefix is announced at an upper level, it has to be
   announced to this whole level.

            ^ A/x                   ^ B/x and A:S/x+y
            |                       |
         +-----+                 +-----+
         |     |                 |     |
         |  A  |                 |  B  |
         |     |                 |     |
         +-----+                 +-----+
           ^ |                     ^ |
           | |                     | |
           | +--------+   +--------+ |
           |          |   |          |
           +--------+ |   | +--------+
                    | |   | |
                    | V   | V
                   +--------+
                   |        |
                   |   S    |
                   |        |
                   +--------+

   If provider B tries to announce the prefix A:S/x+y, in order to be
   able to route the traffic for S under both prefixes A:S/x+y and
   B:S/x+y, then B will attract all the traffic for S, because the
   prefix A:S/x+y is longer than the prefix A/x (x+y > x) and so is a
   better match.

   In this case the only solution is that both A and B announce routes
   to prefixes A:S/x+y and B:S/x+y, which breaks the transparency
   property and obviously does not scale.

   [MULT] proposes that B announce the prefix A:S/x+y only when the
   path through A (and hence the announcement by A) is not available.
   This makes the transparency problem less severe, but a route for a
   long prefix is liable to be filtered or hit by flap damping
   mechanisms and should be avoided.

   A second solution proposed by [MULT] is to use tunnels in order to
   keep connectivity even when a path is not available:

                 ttttttttttttttttttttt
                 t                   t
         +-----+ t               +-----+
         |     | t               |     |
         |  A  | t               |  B  |
         |     | t               |     |
         +-----+ t               +-----+
           ^ |   tttttttt    X     ^ |
           | |          t     X    | |
           | +--------+ t +----X---+ |
           |          | t |     X    |
           +--------+ | t | +----X---+
                    | | t | |     X
                    | V t | V      X
                  +----------+
                  |    S     |
                  +----------+

   This requires an intricate EBGP configuration and is limited by the
   tunnel technology. We shall try to explore other kinds of
   solutions.

4. Upper Level Routing

   At upper levels the structure looks like:

                       +--------+
                       |  NLAx  |
                       +--------+
                        |      |
                       /        \
                      /          \
                     /            \
                +-------+      +-------+
                | NLAy1 |      | NLAy2 |
                +-------+      +-------+
                    |              |
                    .              .
                    .              .
                    .              .
                    |              |
                +-------+      +-------+
                | NLAz1 |      | NLAz2 |
                +-------+      +-------+
                     \            /
                      \          /
                       \        /
                        |      |
                       +--------+
                       |   S    |
                       +--------+

   For optimal routing, S should have routes for every NLAi1 and NLAi2
   up to NLAx, the first common upper provider. For destinations
   outside the diagram either provider (NLAz1 or NLAz2) can be used;
   usually the choice of provider is governed by internal policy
   rules.

   Source address selection for nodes in S should be consistent with
   the upper level routing and with the policy, in order to avoid
   asymmetric routing. There are some proposals [SRCA] for source
   address selection (and for the dual problem, destination address
   selection), but a selection service should:

   o  be synchronized with (external) routing, i.e. there should be an
      interaction between border routers and the service;

   o  be usable by applications, which can have more information;

   o  be used at the same time as DNS resolution, which makes
      destination address selection easy to integrate into the
      service, i.e. the list of addresses returned by the resolver can
      be converted into a partially ordered list of source /
      destination address pairs.

   The address selection problem should be addressed in other
   documents.

5. Mutual Backup

   There is a case where the transparency property is kept, routing
   is as reliable as possible and is optimal in almost all cases.

            ^ A/x and B/x          ^ B/x and A/x
            |                      |
         +-----+                +-----+
         |     |------ A/x ---->|     |
         |  A  |                |  B  |
         |     |<------ B/x ----|     |
         +-----+                +-----+
           ^ |            B:S/x+y ^ |
           | |            A:S/x+y | |
           | +-- A/x -+  +--------+ |
           |          |  |          |
           +--------+ |  | +- B/x --+
            A:S/x+y | |  | |
            B:S/x+y | V  | V
                   +--------+
                   |        |
                   |   S    |
                   |        |
                   +--------+

   For a provider T at an upper level, or at the same level as
   providers A and B, the routes for the prefix A/x are not
   equivalent: the prefix A/x announced by A is direct (one element
   (A) in the path) while the prefix A/x announced by B is indirect
   (two elements (B and A) in the path). Traffic for A will therefore
   go to A directly. The same applies to B.

   The prefix A:S/x+y is longer (i.e. better) than the prefix A/x, so
   A sends all the traffic for S directly to S; the same holds for B.

   If the path through A is not available, then all the traffic for S,
   including traffic to or from addresses in the prefix A:S/x+y, will
   go through B.

   This case presupposes a mutual backup agreement between A and B,
   which is possible if A and B are not competitors, for instance if A
   is a mission provider and B a geographical one. But it is a real
   constraint.

   This still works if the announcements between A and B do not carry
   the full prefixes (but they must include (i.e. be shorter than) the
   prefix *:S/x+y). The backup then works only for a part of A and B
   (with a black hole in case of failure for customers not covered by
   the backup agreement).
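
   The selection logic behind the mutual backup can be sketched in a
   few lines of Python. This is a simplified model, not a BGP
   implementation: it only applies "longest prefix first, then
   shortest path" and ignores policy rules and protocol metrics. The
   concrete prefixes (taken from the 2001:db8::/32 documentation
   range) and the table contents are assumptions chosen to mirror the
   diagram above.

```python
import ipaddress


def best_route(routes, destination):
    """Pick the longest matching prefix, then the shortest path.

    routes is a list of (prefix, path) pairs, where path is the list of
    routing domains to cross. Policy and metrics are deliberately ignored.
    """
    addr = ipaddress.ip_address(destination)
    candidates = [
        (ipaddress.ip_network(prefix), path)
        for prefix, path in routes
        if addr in ipaddress.ip_network(prefix)
    ]
    if not candidates:
        return None
    net, path = max(candidates, key=lambda c: (c[0].prefixlen, -len(c[1])))
    return str(net), path


# Assumed prefixes: A/x = 2001:db8:a000::/36, B/x = 2001:db8:b000::/36,
# A:S/x+y = 2001:db8:a001::/48, B:S/x+y = 2001:db8:b001::/48.
T_ROUTES = [
    ("2001:db8:a000::/36", ["A"]),       # A/x announced by A (direct)
    ("2001:db8:a000::/36", ["B", "A"]),  # A/x announced by B (indirect)
    ("2001:db8:b000::/36", ["B"]),       # B/x announced by B (direct)
    ("2001:db8:b000::/36", ["A", "B"]),  # B/x announced by A (indirect)
]
B_ROUTES = [
    ("2001:db8:a000::/36", ["A"]),       # A/x learned from A
    ("2001:db8:a001::/48", ["S"]),       # A:S/x+y learned from S
    ("2001:db8:b001::/48", ["S"]),       # B:S/x+y learned from S
]

HOST = "2001:db8:a001::1"                # an address A:S:H

# Normal case: T reaches A:S:H through A.
normal = best_route(T_ROUTES, HOST)
# Path through A unavailable: T keeps only the routes announced by B.
backup = best_route([r for r in T_ROUTES if r[1][0] != "A"], HOST)
# Once the traffic is at B, the longer prefix A:S/x+y leads straight to S.
at_b = best_route(B_ROUTES, HOST)
```

   In this model T normally prefers the direct route to A/x, falls
   back to the route announced by B when the one from A disappears,
   and B then delivers the traffic to S using the longer prefix.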

   Unfortunately this does not work in more complex cases:

      ^ A/x and B/x           ^ B/x, A/x and C/x       ^ C/x and B/x
      |                       |                        |
   +-----+                +--------+                +-----+
   |     |--- A:S/x+y --->|        |--- B:R/x+y --->|     |
   |  A  |                |   B    |                |  C  |
   |     |<--- B:S/x+y ---|        |<--- C:R/x+y ---|     |
   +-----+                +--------+                +-----+
     ^ |            B:S/x+y ^ | ^ |           C:R/x+y ^ |
     | |            A:S/x+y | | | |           B:R/x+y | |
     | +-- A/x -+  +--------+ | | +-- B/x -+ +--------+ |
     |          |  |          | |          | |          |
     +--------+ |  | +- B/x --+ +--------+ | | +- C/x --+
      A:S/x+y | |  | |           B:R/x+y | | | |
      B:S/x+y | V  | V           C:R/x+y | V | V
             +--------+                 +--------+
             |        |                 |        |
             |   S    |                 |   R    |
             |        |                 |        |
             +--------+                 +--------+

   The backup is not transitive in this case: if something goes wrong
   on the B path for S, traffic may try to cross C, which knows
   nothing about S and will drop the packets.

6. Broken Path

   Consider the standard multihomed case when a link is broken:

         +-----+                 +-----+
         |     |                 |     |
         |  A  |                 |  B  |
         |     |                 |     |
         +-----+                 +-----+
           ^ |              X      ^ |
           | |               X     | |
           | +--------+   +---X----+ |
           |          |   |    X     |
           +--------+ |   | +---X----+
                    | |   | |    X
                    | V   | V     X
                   +--------+
                   |        |
                   |   S    |
                   |        |
                   +--------+

   If we look inside the routing domain S:

            +-----+                     +-----+
            |     |                     |     |
            |  A  |                     |  B  |
            |     |                     |     |
            +-----+                     +-----+
      +---+   ^ |                  X      ^ |
      | X |   | |                   X     | |
      +---+   | +--------+       +---X----+ |
              |          |       |    X     |
              +--------+ |       | +---X----+
                       | |       | |    X
                       | V       | V     X
                     +-----+   +-----+
                 +---| BRA |---| BRB |---+
                 |   +-----+   +-----+   |
                 |      |         |      |
                 |  -------------------  |
                 |           |           |
                 |         +---+         |
                 |         | R |         |
                 |         +---+         |
                 |           |           |
                 |        -------        |
                 |           |           |
                 |         +---+         |
                 |         | H |         |
                 | S       +---+         |
                 |                       |
                 +-----------------------+

   The host H has two addresses, A:S:H and B:S:H, and the path through
   B is broken.

   An external host X will use A:S:H because B:S:H does not work.
   The DNS will return both addresses, and applications should try all
   of them (on BSD 4.4 derived Unixes we found only one standard
   application that tries only the first returned address). One could
   try to play on the order of addresses in the DNS, but the DNS
   caching mechanism makes this difficult (and it is not necessary).
   In conclusion, new connections from X to H will work.

   For new connections from H to X, the problem is to make H choose
   the right source address (A:S:H). The proposal is to encode the
   "broken path" state in the prefix information of router
   advertisements, in order to inform nodes that addresses in a given
   prefix should not be used. The border router BRB knows there is a
   problem and should send this information to all the routers of S,
   using for instance the router renumbering protocol [RENUM].

   The best choice for signaling a "broken path" is to set the
   preferred lifetime of all the prefixes associated with the
   "broken path" to zero. This condition is very easy to recognize and
   its standard effect is to deprecate the associated source addresses
   (i.e. source addresses using the "broken path" are still valid but
   should not be used in new communications [ADDRC], which is exactly
   the intended behavior).

   The last case, existing connections (i.e. those established before
   the failure) between H (using B:S:H) and X, is dealt with in the
   next section.

7. Use Of Mobility Mechanisms

   The idea is to use some mechanisms of IPv6 mobility [MOB] (home
   address and binding update, but neither home agent nor (in fact)
   true mobility) in order to make critical connections resilient to
   provider failures.

   Suppose there is a connection between H and X (using addresses
   B:S:H and X) with a security association for authentication
   (necessary for mobility, and not a real constraint for a critical
   connection because an unauthenticated connection is easy to
   disrupt, for instance with junk TCP RST packets).

                               +---+
            aaaaaaaaaaaaaaaaaaa| X |
            a                  +---+
            a                  b
            a                  b
         +-----+               b +-----+
         |     |               b |     |
         |  A  |               bb|  B  |
         |     |                 |     |
         +-----+                 +-----+
           ^ |              X      ^ |
           | |               X     | |
           | +--------+   +---X----+ |
           |          |   |    X     |
           +--------+ |   | +---X----+
                    | |   | |    X
                    | V   | V     X
                 +--------------+
                 |   a      b   |
                 |   a      b   |
                 |   a    +---+ |
                 |   aaaaa| H | |
                 |        +---+ |
                 | S            |
                 +--------------+

   After the path in use through B fails, the broken path condition is
   set in the information for prefix B:S in router advertisements, so
   H is informed of the problem.

   H puts a home address destination option carrying B:S:H in each
   packet for X, so that it can use A:S:H as the source address: to
   every router the source is in A's prefix, and only X replaces the
   source address with B:S:H before looking up the PCB (protocol
   control block) of the connection.

   H sends a binding update with A:S:H as the care-of address to X in
   a packet with an Authentication Header. X receives and processes
   it, sends a binding acknowledgement and uses a routing header with
   A:S:H as the (first) destination and B:S:H as the final
   destination.

   Summary:

   o  packets from H to X:
         source = A:S:H
         destination = X
         home-address = B:S:H
         binding-update (in first packets, should be acknowledged):
            care-of = A:S:H

   o  packets from X to H:
         source = X
         destination = A:S:H
         routing-header: one address = B:S:H

   While X must implement full mobile correspondent node operation,
   H needs to implement only binding management (no movement
   detection, no new care-of address acquisition, no operation with a
   home agent).
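
   The packet summary above can be modelled with a small Python
   sketch. It is a toy model under stated assumptions, not an
   implementation of [MOB]: the Packet class, its field names and the
   binding table are hypothetical, and addresses are kept as the
   symbolic strings used in this document.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Packet:
    source: str
    destination: str
    home_address: Optional[str] = None   # home address destination option
    routing_header: List[str] = field(default_factory=list)


HOME = "B:S:H"      # address the connection was opened with
CARE_OF = "A:S:H"   # address that still works after the failure
PEER = "X"


def from_h_to_x() -> Packet:
    """H sends from A:S:H and carries B:S:H in the home address option."""
    return Packet(source=CARE_OF, destination=PEER, home_address=HOME)


def connection_key_at_x(packet: Packet) -> tuple:
    """X substitutes the home address for the source before the PCB lookup."""
    source = packet.home_address or packet.source
    return (source, packet.destination)


def from_x_to_h(bindings: dict) -> Packet:
    """X sends to the care-of address, with the home address as final hop."""
    care_of = bindings.get(HOME)
    if care_of is None:
        # No binding yet: X still uses the original address.
        return Packet(source=PEER, destination=HOME)
    return Packet(source=PEER, destination=care_of, routing_header=[HOME])


bindings = {HOME: CARE_OF}   # state installed at X by the binding update
outbound = from_h_to_x()
inbound = from_x_to_h(bindings)
```

   The point of the model is that the connection at X stays keyed on
   B:S:H in both directions, while every packet on the wire carries
   A:S:H as the routable address.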
   In fact H does not move; it only changes its choice of address.

8. Security Considerations

   Better reliability of Internet connectivity can only improve
   security. Critical connections should be authenticated, and binding
   updates must be carried in authenticated packets (see [MOB] for the
   discussion). IPsec is mandatory for compliant IPv6 implementations.

9. Acknowledgements

   All these ideas were discussed or found at the 40th IETF meeting in
   Washington during lunch-time 6bone BOFs. The transparency issue was
   well known (and presented by Ben Crosby). The mutual backup scheme
   was built by the author for a regional/organization dual-homing at
   a G6 meeting. The non-transitivity issue was presented by Alain
   Durand. The diversion of mobility mechanisms appeared in the
   discussion between the author and Matt Crawford, who proposed the
   broken path idea. Erik Nordmark proposed to implement the "broken
   path" condition as the instantaneous deprecation of addresses using
   the "broken path". The author would like to acknowledge the input
   of the G6, the 6bone and the RIPE communities.

10. Changes From Draft -00

   -  Update the "Status" section (add a reference to RFC 2026, ...).

   -  Add a reference about the address selection problem.

   -  Change the "broken bit" into the "broken path" condition.

   -  Update the "Acknowledgements" and "References" sections.

11. References

   [AGGR]   Hinden, R., O'Dell, M. and Deering, S., "An IPv6
            Aggregatable Global Unicast Address Format", RFC 2374,
            July 1998.

   [BGP]    Rekhter, Y. and Li, T., "A Border Gateway Protocol 4
            (BGP-4)", RFC 1771, March 1995.

   [MULT]   Bates, T. and Rekhter, Y., "Scalable Support for
            Multi-homed Multi-provider Connectivity", RFC 2260,
            January 1998.

   [SRCA]   Draves, R., "Simple Source Address Selection for IPv6",
            Internet Draft, , April 1999.

   [RENUM]  Crawford, M., "Router Renumbering for IPv6", Internet
            Draft, , February 1999.

   [ADDRC]  Thomson, S. and Narten, T., "IPv6 Stateless Address
            Autoconfiguration", RFC 2462, December 1998.

   [MOB]    Johnson, D. B. and Perkins, C., "Mobility Support in
            IPv6", Internet Draft, , November 1998.

12. Author's Address

   Francis Dupont
   GIE DYADE
   INRIA Rocquencourt
   Domaine de Voluceau
   B.P. 105
   78153 Le Chesnay CEDEX
   FRANCE

   Fax: +33 1 39 63 58 66
   EMail: Francis.Dupont@inria.fr

Expires in 6 months (December 25, 1999)