1 Network Working Group                                 A. Bashandy, Ed.
2 Internet Draft                                             C. Filsfils
3 Intended status: Informational                           Cisco Systems
4 Expires: May 2016                                          P. Mohapatra
5                                                        Sproute Networks
6                                                        November 9, 2015

7                   BGP Prefix Independent Convergence
8                   draft-bashandy-rtgwg-bgp-pic-02.txt

10 Abstract

12    In a network comprising thousands of iBGP peers exchanging millions
13    of routes, many routes are reachable via more than one path. Given
14    the large scaling targets, it is desirable to restore traffic after
15    a failure in a time period that does not depend on the number of BGP
16    prefixes. In this document we propose an architecture by which
17    traffic can be re-routed to ECMP or pre-calculated backup paths in a
18    timeframe that does not depend on the number of BGP prefixes. The
19    objective is achieved through organizing the forwarding chains in a
20    hierarchical manner and sharing forwarding elements among the maximum
21    possible number of routes. The proposed technique achieves prefix
22    independent convergence while ensuring incremental deployment,
23    complete transparency and automation, and zero management and
24    provisioning effort. Note that the benefits of BGP-PIC hinge on the
25    existence of more than one path, whether as
26    ECMP or primary-backup.
28 Status of this Memo 30 This Internet-Draft is submitted in full conformance with the 31 provisions of BCP 78 and BCP 79. 33 This document may contain material from IETF Documents or IETF 34 Contributions published or made publicly available before November 35 10, 2008. The person(s) controlling the copyright in some of this 36 material may not have granted the IETF Trust the right to allow 37 modifications of such material outside the IETF Standards Process. 38 Without obtaining an adequate license from the person(s) 39 controlling the copyright in such materials, this document may not 40 be modified outside the IETF Standards Process, and derivative 41 works of it may not be created outside the IETF Standards Process, 42 except to format it for publication as an RFC or to translate it 43 into languages other than English. 45 Internet-Drafts are working documents of the Internet Engineering 46 Task Force (IETF), its areas, and its working groups. Note that 47 other groups may also distribute working documents as Internet- 48 Drafts. 50 Internet-Drafts are draft documents valid for a maximum of six 51 months and may be updated, replaced, or obsoleted by other 52 documents at any time. It is inappropriate to use Internet-Drafts 53 as reference material or to cite them other than as "work in 54 progress." 56 The list of current Internet-Drafts can be accessed at 57 http://www.ietf.org/ietf/1id-abstracts.txt 59 The list of Internet-Draft Shadow Directories can be accessed at 60 http://www.ietf.org/shadow.html 62 This Internet-Draft will expire on May 9, 2016. 64 Copyright Notice 66 Copyright (c) 2015 IETF Trust and the persons identified as the 67 document authors. All rights reserved. 69 This document is subject to BCP 78 and the IETF Trust's Legal 70 Provisions Relating to IETF Documents 71 (http://trustee.ietf.org/license-info) in effect on the date of 72 publication of this document. Please review these documents 73 carefully, as they describe your rights and restrictions with 74 respect to this document. Code Components extracted from this 75 document must include Simplified BSD License text as described in 76 Section 4.e of the Trust Legal Provisions and are provided without 77 warranty as described in the Simplified BSD License. 79 Table of Contents 81 1. Introduction...................................................3 82 1.1. Conventions used in this document.........................3 83 1.2. Terminology...............................................4 84 2. Constructing the Shared Hierarchical Forwarding Chain..........5 85 2.1. Databases.................................................5 86 2.2. Constructing the forwarding chain from a downloaded route.6 87 2.3. Examples..................................................7 88 2.3.1. Example 1: Forwarding Chain for iBGP ECMP............7 89 2.3.2. Example 2: Primary Backup Paths.....................10 90 2.3.3. Example 3: Platforms with Limited Levels of Hierarchy10 91 3. Forwarding Behavior...........................................15 92 4. Forwarding Chain Adjustment at a Failure......................17 93 4.1. BGP-PIC core.............................................17 94 4.2. BGP-PIC edge.............................................18 95 4.2.1. Adjusting forwarding Chain in egress node failure...19 96 4.2.2. Adjusting Forwarding Chain on PE-CE link Failure....19 97 4.3. Handling Failures for Flattended Forwarding Chains.......20 99 5. Properties....................................................21 100 6. 
    Dependency....................................................23
101    7. Security Considerations.......................................24
102    8. IANA Considerations...........................................24
103    9. Conclusions...................................................25
104    10. References...................................................25
105       10.1. Normative References....................................25
106       10.2. Informative References..................................25
107    11. Acknowledgments..............................................26

109 1. Introduction

111    As a path vector protocol, BGP is inherently slow due to the
112    serial nature of reachability propagation. BGP speakers exchange
113    reachability information about prefixes [2][3] and, for labeled
114    address families, namely AFI/SAFI 1/4, 2/4, 1/128, and 2/128, an
115    edge router assigns local labels to prefixes and associates the
116    local label with each advertised prefix, as is done for L3VPN [8],
117    6PE [9], and Softwire [7] using the BGP labeled unicast technique [4].
118    A BGP speaker then applies the path selection steps to choose the best
119    path. In modern networks, it is not uncommon to have a prefix
120    reachable via multiple edge routers. In addition to proprietary
121    techniques, several mechanisms have been proposed to allow for
122    more than one path for a given prefix [6][11][12], whether in the
123    form of equal cost multipath or primary-backup. Another more
124    common and widely deployed scenario is L3VPN with multi-homed VPN
125    sites.

127    This document proposes a hierarchical and shared forwarding chain
128    organization that allows traffic to be restored to a pre-calculated
129    alternative equal-cost primary path or backup path in a time
130    period that does not depend on the number of BGP prefixes. The
131    technique relies on internal router behavior that is completely
132    transparent to the operator and can be incrementally deployed and
133    enabled with zero operator intervention.

135 1.1. Conventions used in this document

137    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
138    NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
139    in this document are to be interpreted as described in RFC-2119
140    [1].

142    In this document, these words will appear with that interpretation
143    only when in ALL CAPS. Lower case uses of these words are not to
144    be interpreted as carrying RFC-2119 significance.

146 1.2. Terminology

148    This section defines the terms used in this document. For ease of
149    use, we will use terms similar to those used by L3VPN [8].

151    o BGP prefix: It is a prefix P/m (of any AFI/SAFI) that a BGP
152      speaker has a path for.

154    o IGP prefix: It is a prefix P/m (of any AFI/SAFI) for which a
155      path is learnt via an Interior Gateway Protocol, such as OSPF or
156      IS-IS. The prefix may be learnt directly through the IGP or
157      redistributed from other protocol(s).

159    o CE: It is an external router through which an egress PE can
160      reach a prefix P/m.

162    o Ingress PE, "iPE": It is a BGP speaker that learns about a
163      prefix through another iBGP peer and chooses that iBGP peer as
164      the next-hop for the prefix.

166    o Path: It is the next-hop in a sequence of unique connected
167      nodes starting from the current node and ending with the
168      destination node or network identified by the prefix.

170    o Recursive path: It is a path consisting only of the IP address
171      of the next-hop without the outgoing interface.
Subsequent 172 lookups are needed to determine the outgoing interface. 174 o Non-recursive path: It is a path consisting of the IP address 175 of the next-hop and one outgoing interface 177 o Primary path: It is a recursive or non-recursive path that can 178 be used all the time. A prefix can have more than one primary 179 path 181 o Backup path: It is a recursive or non-recursive path that can 182 be used only after some or all primary paths become unreachable 184 o Leaf: A leaf is container data structure for a prefix or local 185 label. Alternatively, it is the data structure that contains 186 prefix specific information. 188 o IP leaf: Is the leaf corresponding to an IPv4 or IPv6 prefix 190 o Label leaf. It is the leaf corresponding to a locally allocated 191 label such as the VPN label on an egress PE [8]. 193 o Pathlist: It is an array of paths used by one or more prefix to 194 forward traffic to destination(s) covered by a IP prefix. Each 195 path in the pathlist carries its "path-index" that identifies 196 its position in the array of paths. A pathlist may contain a 197 mix of primary and backup paths 199 o OutLabel-Array: Each labeled prefix is associated with an 200 OutLabel-Array. The OutLabel-Array is a list of one or more 201 outgoing labels and/or label actions where each label or label 202 action has 1-to-1 correspondence to a path in the pathlist. It 203 is possible that the number of entries in the OutLabel-array is 204 different from the number of paths in the pathlist and the ith 205 Outlabel-Array entry is associated with the path whose path- 206 index is "i". Label actions are: push the label, pop the label, 207 or swap the incoming label with the label in the Outlabel-Array 208 entry. The prefix may be an IGP or BGP prefix 210 o Adjacency: It is the layer 2 encapsulation leading to the layer 211 3 directly connected next-hop 213 o Dependency: An object X is said to be a dependent or Child of 214 object Y if Object Y cannot be deleted unless object X is no 215 longer a dependent/child of object Y 217 o Route: It is a prefix with one or more paths associated with 218 it. Hence the minimum set of objects needed to construct a 219 route is a leaf and a pathlist. 221 2. Constructing the Shared Hierarchical Forwarding Chain 223 2.1. Databases 225 The Forwarding Information Base (FIB) on a router maintains 3 basic 226 databases 228 o Pathlist-DB: A pathlist is uniquely identified by the list of 229 paths. The Pathlist DB contains the set of all shared pathlists 231 o Leaf-DB: A leaf is uniquely identified by the prefix or the label 233 o Adjacency-DB: An adjacency is uniquely identified by the outgoing 234 layer 3 interface and the IP address of the next-hop directly 235 connected to the layer 3 interface. Adjacency DB contains the 236 list of all adjacencies 238 2.2. Constructing the forwarding chain from a downloaded route 240 1. A prefix with a list of paths is downloaded to FIB from BGP. For 241 labeled prefixes, an OutLabel-Array and possibly a local label 242 (e.g. for a VPN [8] prefix on an egress PE) are also downloaded 244 2. If the prefix does not exist, construct a new IP leaf from the 245 downloaded prefix. If a local label is allocated, construct a 246 label leaf from the local label 248 3. Construct an OutLabel-Array and attach the Outlabel array to the 249 IP and label leaf 251 4. The list of paths attached to the route is looked up in the 252 pathlist-DB 254 5. If a pathlist PL is found 256 a. Retrieve the pathlist 258 6. Else 260 a. 
Construct a new pathlist 262 b. Insert the new pathlist in the pathlist-DB 264 c. Resolve the paths of the pathlist as follows 266 d. Recursive path: 268 i. Lookup the next-hop in the leaf-DB 270 ii. If a leaf with at least one reachable path is found, add 271 the path to the dependency list of the leaf 273 iii. Otherwise the path remains unresolved and cannot be used 274 for forwarding 276 e. Non-recursive path 278 i. Lookup the next-hop and outgoing interface in the 279 adjacency-DB 281 ii. If an adjacency is found, add the path to the dependency 282 list of adjacency 284 iii. Otherwise, create a new adjacency and add the path to 285 its dependency list 287 7. Attach the leaf(s) as (a) dependent(s) of the pathlist 288 As a result of the above steps, a forwarding chain starting with a 289 leaf and ending with one or more adjacency is constructed. It is 290 noteworthy to mention that the forwarding chain is constructed 291 without any operator intervention at all. 293 2.3. Examples 295 This section outlines three examples that we will use for 296 illustration for the rest of the document. The first two examples 297 use a standard multihomed VPN [8] prefix in a BGP-free core running 298 LDP [5] or segment routing on MPLS [14]. The third example uses 299 inter-AS option C [8] with 2 domains running segment routing [14] or 300 LDP [5] in the core 302 The topology for the first two examples is depicted in Figure 1. 304 +-----------------------------------+ 305 | | 306 | LDP/Segment-Routing Core | 307 | | 308 | ePE2 309 | |\ 310 | | \ 311 | | \ 312 | | \ 313 iPE | CE.......VRF "Blue" 314 | | / (VPN-P1) 315 | | / (VPN-P2) 316 | | / 317 | |/ 318 | ePE1 319 | | 320 | | 321 | | 322 +-----------------------------------+ 323 Figure 1 VPN prefix reachable via multiple PEs 325 The first example is an illustration of ECMP while the second 326 example is an illustration of primary-backup paths. The third 327 example illustrate how to handle limited hardware capability. 329 2.3.1. Example 1: Forwarding Chain for iBGP ECMP 331 Consider the case of the ingress PE (iPE) in the multi-homed VPN 332 prefixes depicted in Figure 1. Suppose the iPE receives route 333 advertisements for the VPN prefixes VPN-P1 and VPN-P2 from two 334 egress PEs, ePE1 and ePE2 with next-hop BGP-NH1 and BGP-NH2, 335 respectively. Assume that ePE1 advertise the VPN labels VPN-L11 and 336 VPN-L12 while ePE2 advertise the VPN labels VPN-L21 and VPN-L22 for 337 VPN-P1 and VPN-P2, respectively. Suppose that BGP-NH1 and BGP-NH2 338 are resolved via the IGP prefixes IGP-P1 and IGP-P2, which also 339 happen to have 2 ECMP paths with IGP-NH1 and IGP-NH2 reachable via 340 the interfaces I1 and I2. Suppose that local labels (whether LDP[5] 341 or segment routing [14]) on the downstream LSRs for IGP-P1 and IGP- 342 P2 are assign the LDP labels LDP-L1 and LDP-L2 to the prefixes IGP- 343 P1 and IGP-P2. The forwarding chain on the ingress PE "iPE" for the 344 VPN prefixes is depicted in Figure 2. 
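
   To make the sharing and hierarchy of this example concrete before
   looking at Figure 2, the following short sketch models the same chain
   in Python. It is an illustrative sketch only, not part of the draft's
   specification: the class names (Adjacency, Path, Pathlist, Leaf) are
   hypothetical, while the prefix, next-hop, interface, and label names
   are taken from the example above and from Figure 2 below.

      # Illustrative sketch only: a minimal model of the shared, hierarchical
      # forwarding chain of Example 1.  Class names are hypothetical; prefix,
      # next-hop, and label names follow the example text and Figure 2.

      class Adjacency:
          """Layer-2 rewrite towards a directly connected next-hop."""
          def __init__(self, next_hop, interface):
              self.next_hop, self.interface = next_hop, interface

      class Path:
          """One pathlist entry; carries its path-index and its parent object."""
          def __init__(self, index, parent):
              self.index, self.parent = index, parent  # parent: Adjacency or Leaf

      class Pathlist:
          """Shared by every leaf that has the same set of paths."""
          def __init__(self, paths):
              self.paths = paths

      class Leaf:
          """Prefix- (or label-) specific state: parent pathlist + OutLabel-Array."""
          def __init__(self, name, pathlist, out_labels):
              self.name, self.pathlist, self.out_labels = name, pathlist, out_labels

      # IGP level: one ECMP pathlist shared by the leaves of IGP-P1 and IGP-P2.
      adj1 = Adjacency("IGP-NH1", "I1")
      adj2 = Adjacency("IGP-NH2", "I2")
      igp_pathlist = Pathlist([Path(0, adj1), Path(1, adj2)])
      igp_p1 = Leaf("IGP-P1 (BGP-NH1)", igp_pathlist, ["LDP-L11", "LDP-L21"])
      igp_p2 = Leaf("IGP-P2 (BGP-NH2)", igp_pathlist, ["LDP-L12", "LDP-L22"])

      # BGP level: one recursive pathlist shared by all VPN prefixes reachable
      # via ePE1 and ePE2; only the per-prefix OutLabel-Arrays differ.
      bgp_pathlist = Pathlist([Path(0, igp_p1), Path(1, igp_p2)])
      vpn_p1 = Leaf("VPN-P1", bgp_pathlist, ["VPN-L11", "VPN-L21"])
      vpn_p2 = Leaf("VPN-P2", bgp_pathlist, ["VPN-L12", "VPN-L22"])

      assert vpn_p1.pathlist is vpn_p2.pathlist   # sharing at the BGP level
      assert igp_p1.pathlist is igp_p2.pathlist   # sharing at the IGP level

   Because the pathlists are shared objects, an IGP or BGP event only has
   to touch one shared pathlist (or one IGP leaf); the per-prefix leaves
   and OutLabel-Arrays are untouched, which is the basis of the prefix
   independent behavior described in Section 4.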
346 BGP OutLabel Array 347 +---------+ 348 | VPN-L11 | 349 +--->+---------+ 350 | | VPN-L21 | 351 | +---------+ IGP OutLabel Array 352 | +---------+ 353 | | LDP-L11 | 354 | +-->+---------+ 355 | | | LDP-L21 | 356 VPN-P1------+ | +---------+ 357 | | 358 | | 359 | IGP-P1-----+ 360 | ^ | 361 | | | 362 V | V IGP Pathlist 363 +--------+ | +-------------+ 364 |BGP-NH1 |---------------+ | IGP-NH1, I1 |------>adj1 365 BGP +--------+ +-------------+ 366 Pathlist |BGP-NH2 |----+ | IGP-NH2, I2 |------>adj2 367 +--------+ | +-------------+ 368 ^ | ^ 369 | | | 370 | | | 371 | IGP-P2----------------+ 372 | | 373 | | 374 VPN-P2------+ | +---------+ 375 | | | LDP-L12 | 376 | +--->+---------+ 377 | | LDP-L22 | 378 | +---------+ 379 | +---------+ IGP OutLabel Array 380 | | VPN-L12 | 381 +--->+---------+ 382 | VPN-L22 | 383 +---------+ 384 BGP OutLabel Array 386 Figure 2 Forwarding Chain for VPN Prefixes with iBGP ECMP 388 The structure depicted in Figure 2 illustrates the two important 389 properties discussed in this memo: sharing and hierarchy. We can 390 see that the both the BGP and IGP pathlists are shared among 391 multiple BGP and IGP prefixes, respectively. At the same time, the 392 forwarding chain objects depend on each other in a child-parent 393 relation instead of being collapsed into a single level. 395 2.3.2. Example 2: Primary Backup Paths 397 Consider the egress PE ePE1 in the case of the multi-homed VPN 398 prefixes in the BGP-free LDP core depicted in Figure 1. Suppose ePE1 399 determines that the primary path is the external path but the backup 400 path is the iBGP path to the other PE ePE2 with next-hop BGP-NH2. 401 ePE2 constructs the forwarding chain depicted in Figure 1. We are 402 only showing a single VPN prefix for simplicity. But all prefixes 403 that are multihomed to ePE1 and ePE2 share the BGP pathlist 405 BGP OutLabel Array 406 VPL-L11 +---------+ 407 (Label-leaf)---+---->|Unlabeled| 408 | +---------+ 409 | | VPN-L21 | 410 | | (swap) | 411 | +---------+ 412 | ^ 413 | | BGP Pathlist 414 | | +------------+ Connected route 415 | | | CE-NH |------>(to the CE) 416 | | |path-index=0| 417 | | +------------+ 418 V | | VPN-NH2 | 419 VPN-P1 ------------------+------>| (backup) |------>IGP Leaf 420 (IP prefix leaf) |path-index=1| (Towards ePE2) 421 +-----+------+ 423 Figure 3 : VPN Prefix Forwarding Chain with eiBGP paths on egress PE 425 The example depicted in Figure 3 differs from the example in Figure 426 2 in two main aspects. First as long as the primary path towards the 427 CE (external path) is useable, it will be the only path used for 428 forwarding while the OutLabel-Array contains both the unlabeled 429 label (primary path) and the VPN label (backup path) advertised by 430 the backup path ePE2. The second aspect is presence of the label 431 leaf corresponding to the VPN prefix. This label leaf is used to 432 match VPN traffic arriving from the core. Note that the label leaf 433 shares the OutLabel-Array and the pathlist with the IP prefix. 435 2.3.3. Example 3: Platforms with Limited Levels of Hierarchy 437 This example uses a case of inter-AS option C [8] where there are 3 438 levels of hierarchy. Figure 4 illustrates the sample topology. To 439 force 3 levels of hierarchy, the ASBRs on the ingress domain (domain 440 1) advertise the core routers of the egress domain (domain 2) to the 441 ingress PE (iPE) via BGP-LU [4] instead of redistributing then into 442 the IGP of domain 1. 
The end result is that the ingress PE (iPE) has 443 2 levels of recursion for the VPN prefix VPN-P1 and VPN2-P2. 445 Domain 1 Domain 2 446 +----------------+ +-------------+ 447 | | | | 448 | LDP/SR Core | | LDP/SR core | 449 | | | | 450 | ASBR11------ASBR21.......PE21\ 451 | | \ / | . . | \ 452 | | \ / | . . | \ 453 | | \/ | .. | \VPN-P1 454 | | /\ | . . | / 455 | | / \ | . . | / 456 | | / \ | . . | / 457 iPE ASBR12------ASBR22.......PE22 458 | | | | \ 459 | | | | \ 460 | | | | \ 461 | | | | /VPN-P2 462 | | | | / 463 | | | | / 464 | ASBR13------ASBR23.......PE23/ 465 | | | | 466 | | | | 467 +----------------+ +-------------+ 468 <============== <========= <============ 469 Advertise PE2x Advertise Redistribute 470 Using iBGP-LU PE2x Using IGP into 471 eBGP-LU BGP 473 Figure 4 Sample 3-level hierarchy topology 475 We will make the following assumptions about connectivity 477 o In "domain 2", both ASBR21 and ASBR22 can reach both PE21 and 478 PE22 using the same distance 480 o In "domain 2", only ASBR23 can reach PE23 482 o In "domain 1", iPE (the ingress PE) can reach ASBR1, ASBR12, and 483 ASBR13 via IGP using the same distance 485 We will make the following assumptions about the labels 487 o The VPN labels advertised by PE21 and PE22 for prefix VPN-P1 are 488 VPN-PE21(P1) and VPN-PE22(P1), respectively 490 o The VPN labels advertised byPE22 and PE23 for prefix VPN-P2 are 491 VPN-PE22(P2) and VPN-PE23(P2), respectively 493 o The labels for advertised to iPE by ASBR11 using BGP-LU [4] for 494 the egress PEs PE21 and PE22 are LASBR11(PE21) and LASBR11(PE22), 495 respectively. 497 o The labels for advertised by ASBR12 to iPE using BGP-LU [4] for 498 the egress PEs PE21 and PE22 are LASBR12(PE21) and LASBR12(PE22), 499 respectively 501 o The label for advertised by ASBR11 to iPE using BGP-LU [4] for 502 the egress PE PE23 is LASBR13(PE23) 504 o The local labels of the next hops from the ingress PE iPE towards 505 ASBR11, ASBR12, and ASBR13 in the core of domain 1 are L11, L12, 506 and L13, respectively. 508 The diagram in Figure 5 illustrates the forwarding chain assuming 509 that the forwarding hardware in iPE supports 3 levels of hierarchy. 510 The leaves corresponding to the ABSRs on domain 1 (ASBR11, ASBR12, 511 and ASBR13) are at the bottom of the hierarchy. There are few 512 important points 514 o Because the hardware supports the required depth of hierarchy, 515 the sizes of a pathlist equal the size of the label array 516 associated with the leaves using this pathlist 518 o The index inside the pathlist entry indicates the label that will 519 be picked from the Outlabel-array if that path is chosen by the 520 forwarding engine hashing function. 
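
   As an illustration of the second point, the per-path index can be
   modelled as plain data. The sketch below is hypothetical and not part
   of the draft; the label and node names are the ones defined in the
   assumptions above, and Figure 5 depicts the same chain graphically.

      # Illustrative sketch only: index-driven label selection for VPN-P2 when
      # the platform supports the full 3-level hierarchy (compare Figure 5).

      vpn_p2_out_labels = ["VPN-PE22(P2)", "VPN-PE23(P2)"]  # OutLabel-Array

      # BGP pathlist for VPN-P2: entry -> (path-index, egress-PE leaf it uses)
      vpn_p2_pathlist = [(0, "PE22"), (1, "PE23")]

      def vpn_label_for(entry):
          """Pick the VPN label dictated by the path-index of the chosen entry."""
          index, egress_pe = vpn_p2_pathlist[entry]
          return vpn_p2_out_labels[index], egress_pe

      # If the hash picks the first entry, PE22's label is pushed; the second
      # entry yields PE23's label.  The egress-PE leaves (PE22, PE23) then
      # resolve further over their own (shared) ASBR pathlists as in Figure 5.
      assert vpn_label_for(0) == ("VPN-PE22(P2)", "PE22")
      assert vpn_label_for(1) == ("VPN-PE23(P2)", "PE23")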
522 Outlabel Array Outlabel Array 523 For VPN-P1 For VPN-P2 524 +------------+ +-------+ +-------+ +------------+ 525 |VPN-PE21(P1)|<---| VPN-P1| | VPN-P2|-->|VPN-PE22(P2)| 526 +------------+ +---+---+ +---+---+ +------------+ 527 |VPN-PE22(P1)| | | |VPN-PE23(P2)| 528 +------------+ | | +------------+ 529 | | 530 V V 531 +---+---+ +---+---+ 532 | 0 | 1 | | 0 | 1 | 533 +-|-+-\-+ +-/-+-\-+ 534 | \ / \ 535 | \ / \ 536 | \ / \ 537 | \ / \ 538 v \ / \ 539 +-----+ +-----+ +-----+ 540 +----+ PE21| |PE22 +-----+ | PE23+-----+ 541 | +--+--+ +-----+ | +--+--+ | 542 v | / v | v 543 +-------------+ | / +-------------+ | +-------------+ 544 |LASBR11(PE21)| | / |LASBR11(PE22)| | |LASBR13(PE23)| 545 +-------------+ | / +-------------+ | +-------------+ 546 |LASBR12(PE21)| | / |LASBR12(PE22)| | Outlabel Array 547 +-------------+ | / +-------------+ | For PE23 548 Outlabel Array | / Outlabel Array | 549 For PE21 | / For PE22 | 550 | / | 551 | / | 552 | / | 553 v / v 554 +---+---+ Shared Pathlist +---+ Pathlist 555 | 0 | 1 | For PE21 and PE22 | 0 | For PE23 556 +-|-+-\-+ +-|-+ 557 | \ | 558 | \ | 559 | \ | 560 | \ | 561 v \ v 562 +---+ +------+ +------+ +---+ +------+ +---+ 563 |L11|<--->|ASBR11| |ASBR12+--->|L12| |ASBR13+--->|L13| 564 +---+ +------+ +------+ +---+ +------+ +---+ 566 Figure 5 : Forwarding Chain for hardware supporting 3 Levels 568 Now suppose the hardware on iPE (the ingress PE) supports 2 levels 569 of hierarchy only. In that case, the 3-levels forwarding chain in 570 Figure 5 needs to be "flattended" into 2 levels only. 572 Outlabel Array Outlabel Array 573 For VPN-P1 For VPN-P2 574 +------------+ +-------+ +-------+ +------------+ 575 |VPN-PE21(P1)|<---| VPN-P1| | VPN-P2|--->|VPN-PE22(P2)| 576 +------------+ +---+---+ +---+---+ +------------+ 577 |VPN-PE22(P1)| | | |VPN-PE23(P2)| 578 +------------+ | | +------------+ 579 | | 580 | | 581 | | 582 Flattened | | Flattened 583 pathlist V V pathlist 584 +===+===+ +===+===+===+ +=============+ 585 +--------+ 0 | 1 | | 0 | 0 | 1 +---->|LASBR11(PE22)| 586 | +=|=+=\=+ +=/=+=/=+=\=+ +=============+ 587 v | \ / / \ |LASBR12(PE22)| 588 +=============+ | \ +-----+ / \ +=============+ 589 |LASBR11(PE21)| | \/ / \ |LASBR13(PE23)| 590 +=============+ | /\ / \ +=============+ 591 |LASBR12(PE21)| | / \ / \ 592 +=============+ | / \ / \ 593 | / \ / \ 594 | / + + \ 595 | + | | \ 596 | | | | \ 597 v v v v \ 598 +---+ +------+ +------+ +---+ +------+ +---+ 599 |L11|<--->|ASBR11| |ASBR12+--->|L12| |ASBR13+--->|L13| 600 +---+ +------+ +------+ +---+ +------+ +---+ 602 Figure 6 : Flattening 3 levels to 2 levels of Hierarchy on iPE 604 Figure 6 represents one way to "flatten" a 3 levels hierarchy into 605 two levels. There are few important points. 607 o The flattened pathlists have label arrays associated with them. 608 The size of the label array associated with the flattened 609 pathlist equals the size of the pathlist. Hence it is possible 610 that an implementation includes these label arrays in the 611 flattened pathlist itself 613 o Because of "flattening", the size of a flattened pathlist may not 614 be equal to the size of the label arrays of leaves using the 615 flattened pathlist. 617 o The indices inside a flattened pathlist still indicate the label 618 index in the Outlabel-Arrays of the leaves using that pathlist. 
619      Because the size of the flattened pathlist may be different from
620      the size of the label arrays of the leaves, the indices may be
621      repeated.

623    o Let's take a look at the flattened pathlist used by the prefix
624      "VPN-P2". The pathlist associated with the prefix "VPN-P2" has
625      three entries.

627      o The first and second entries have index "0". This is because
628        both entries correspond to PE22. Hence, when the hashing
629        performed by the forwarding engine results in using the first
630        or the second entry in the pathlist, the forwarding engine
631        picks the correct VPN label "VPN-PE22(P2)", which is the label
632        advertised by PE22 for the prefix "VPN-P2".

634      o The third entry has the index "1". This is because the third
635        entry corresponds to PE23. Hence, when the hashing performed
636        by the forwarding engine results in using the third entry in
637        the flattened pathlist, the forwarding engine picks the
638        correct VPN label "VPN-PE23(P2)", which is the label
639        advertised by PE23 for the prefix "VPN-P2".

641 3. Forwarding Behavior

643    When a packet arrives at a router, it matches a leaf. A labeled
644    packet matches a label leaf while an IP packet matches an IP prefix
645    leaf. The forwarding engine walks the forwarding chain starting
646    from the leaf until the walk terminates on an adjacency. Thus when a
647    packet arrives, the chain is walked as follows:

649    1. Lookup the leaf based on the destination address or the label at
650       the top of the packet

652    2. Retrieve the parent pathlist of the leaf

654    3. Pick the outgoing path from the list of resolved paths in the
655       pathlist. The method by which the outgoing path is picked is
656       beyond the scope of this document (e.g. a flow-preserving hash
657       exploiting entropy within the MPLS stack and IP header). Let the
658       "path-index" of the outgoing path be "i".

660    4. If the prefix is labeled, use the "path-index" "i" to retrieve
661       the ith label "Li" stored in the ith entry of the OutLabel-Array
662       and apply the label action of that label to the packet (e.g. for
663       a VPN label on the ingress PE, the label action is "push").

665    5. Move to the parent of the chosen path "i"

667    6. If the chosen path "i" is recursive, move to its parent prefix
668       and go to step 2

670    7. If the chosen path "i" is non-recursive, move to its parent
671       adjacency

673    8. Encapsulate the packet in the L2 string specified by the
674       adjacency and send the packet out.

676    Let's apply the above forwarding steps to the example described in
677    Section 2.3.1 (Figures 1 and 2). Suppose a packet arrives at ingress
678    PE iPE from an external neighbor. Assume the packet matches the VPN
679    prefix VPN-P1. While walking the forwarding chain, the forwarding
680    engine applies a hashing algorithm to choose the path; suppose the
681    hashing at the BGP level yields path 1 while the hashing at the IGP
682    level yields path 0. In that case, the packet will be sent out of
683    interface I1 with the label stack "LDP-L12,VPN-L21".

685    Now let's apply the above steps to the flattened forwarding
686    chain illustrated in Figure 6.

688    o Suppose a packet arrives at "iPE" and matches the VPN prefix
689      "VPN-P2"

691    o The forwarding engine walks to the parent of "VPN-P2", which is
692      the flattened pathlist, and applies a hashing algorithm to pick
693      a path

695    o Suppose the hashing by the forwarding engine picks the second
696      entry in the flattened pathlist associated with the leaf "VPN-
697      P2".
699    o Because the second entry has the index "0", the label "VPN-
700      PE22(P2)" is pushed on the packet

702    o At the same time, the forwarding engine picks the second label
703      from the Outlabel-Array associated with the flattened pathlist.
704      Hence the next label that is pushed is "LASBR12(PE22)"

706    o The forwarding engine now moves to the parent of the flattened
707      pathlist corresponding to the second entry. The parent is the
708      IGP label leaf corresponding to "ASBR12"

710    o So the packet is forwarded towards the ASBR "ASBR12" and the
711      SR/LDP label at the top will be "L12"

713    The packet arriving at iPE reaches its destination as follows:

715    o iPE sends the packet along the shortest path towards ASBR12 with
716      the following label stack starting from the top: {L12,
717      LASBR12(PE22), VPN-PE22(P2)}.

719    o The penultimate hop of ASBR12 pops the top label "L12". Hence the
720      packet arrives at ASBR12 with the label stack {LASBR12(PE22),
721      VPN-PE22(P2)} where "LASBR12(PE22)" is the top label.

723    o ASBR12 swaps "LASBR12(PE22)" with the label "LASBR22(PE22)",
724      which is the label advertised by ASBR22 for PE22 (the egress
725      PE).

727    o ASBR22 receives the packet with "LASBR22(PE22)" at the top.

729    o Hence ASBR22 swaps "LASBR22(PE22)" with the LDP/SR label of PE22,
730      pushes the label of the next-hop towards PE22 in domain 2, and
731      sends the packet along the shortest path towards PE22.

733    o The penultimate hop of PE22 pops the top label. Hence PE22
734      receives the packet with the label VPN-PE22(P2) at the top.

736    o PE22 pops "VPN-PE22(P2)" and sends the packet as a pure IP packet
737      towards the destination covered by the VPN prefix VPN-P2.

739 4. Forwarding Chain Adjustment at a Failure

741    The hierarchical and shared structure of the forwarding chain
742    explained in Section 2 allows modifying a small number of
743    forwarding chain objects to re-route traffic to a pre-calculated
744    equal-cost or backup path without the need to modify the possibly
745    very large number of BGP prefixes. In this section, we go over
746    various core and edge failure scenarios to illustrate how the FIB
747    manager can utilize the forwarding chain structure to achieve prefix
748    independent convergence.

750 4.1. BGP-PIC core

752    This section describes the adjustments to the forwarding chain when
753    a core link or node fails but the BGP next-hop remains reachable.

755    There are two cases: remote link failure and attached link failure.
756    Node failures are treated as link failures.

758    When a remote link or node fails, the IGP on the ingress PE receives
759    an advertisement indicating a topology change, so the IGP re-converges
760    to either find a new next-hop and outgoing interface or remove the
761    path completely from the IGP prefix used to resolve BGP next-hops.
762    The IGP and/or LDP download the modified IGP leaves, with modified
763    outgoing labels in the case of a labeled core. The FIB manager then
764    modifies the existing IGP leaf by executing the steps in Section 2.2.

766    When a local link fails, the FIB manager detects the failure almost
767    immediately. The FIB manager marks the impacted path(s) as unusable
768    so that only usable paths are used to forward packets. Note that in
769    this particular case there is actually no need even to backwalk to
770    the IGP leaves to adjust the OutLabel-Arrays, because FIB can rely
771    on the path-index stored in the usable paths in the loadinfo to pick
772    the right label. (A short sketch below illustrates this behavior.)
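
   The following is a minimal sketch of that behavior, again hypothetical
   code rather than the draft's implementation: the "usable" flag and the
   function name are illustrative assumptions, while the path, interface,
   and label names are the ones of Figure 2. Disabling one path in the
   shared IGP pathlist is a single operation, and label selection still
   works because each surviving path keeps its original path-index.

      # Illustrative sketch only: marking a path unusable in a shared pathlist
      # while label selection keeps using the stored path-index (Section 4.1).

      class Path:
          def __init__(self, index, nh, interface):
              self.index, self.nh, self.interface = index, nh, interface
              self.usable = True

      # Shared IGP pathlist of the example in Figure 2 (two ECMP paths).
      igp_pathlist = [Path(0, "IGP-NH1", "I1"), Path(1, "IGP-NH2", "I2")]

      # OutLabel-Arrays of the IGP leaves that share this pathlist.
      igp_out_labels = {"IGP-P1": ["LDP-L11", "LDP-L21"],
                        "IGP-P2": ["LDP-L12", "LDP-L22"]}

      def forward(igp_prefix, flow_hash):
          """Pick a usable path and the label dictated by its path-index."""
          usable = [p for p in igp_pathlist if p.usable]
          path = usable[flow_hash % len(usable)]
          label = igp_out_labels[igp_prefix][path.index]
          return path.interface, label

      print(forward("IGP-P1", 0))   # ('I1', 'LDP-L11') before the failure

      # Local failure of interface I1: one flag flip in the shared pathlist;
      # no backwalk to the IGP leaves or to any BGP leaf/pathlist is needed.
      igp_pathlist[0].usable = False

      print(forward("IGP-P1", 0))   # ('I2', 'LDP-L21') - path-index 1 kept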
774    Note that because the FIB manager modifies the
775    forwarding chain starting from the IGP leaves only, BGP pathlists
776    and leaves are not modified. Hence traffic restoration occurs within
777    the time frame of IGP convergence, and, for local link failure,
778    within the timeframe of local detection. Thus it is possible to
779    achieve sub-50 msec convergence as described in [10] for local link
780    failure.

782    Let's apply the procedure to the forwarding chain depicted in Figure
783    2 (Section 2.3.1). Suppose a remote link failure occurs and impacts
784    the first ECMP IGP path to the remote BGP nhop. Upon IGP
785    convergence, the IGP pathlist of the BGP nhop is updated to reflect
786    the new topology (one path instead of two). As soon as the IGP
787    convergence is effective for the BGP nhop entry, the new forwarding
788    state is immediately available to all dependent BGP prefixes. The
789    same behavior would occur if the failure was local, such as an
790    interface going down. As soon as the IGP convergence is complete for
791    the BGP nhop IGP route, all its depending BGP routes benefit from
792    the new path. In fact, if LFA protection is
793    enabled for the IGP route to the BGP nhop and a backup path was pre-
794    computed and installed in the pathlist, then upon the local interface
795    failure the LFA backup path is immediately activated (sub-50 msec),
796    and this protection benefits all the depending BGP traffic through
797    the hierarchical forwarding dependency between the routes.

799 4.2. BGP-PIC edge

801    This section describes the adjustments to the forwarding chains as a
802    result of edge node or edge link failure.

804 4.2.1. Adjusting forwarding Chain in egress node failure

806    When an edge node fails, the IGPs on neighboring core nodes send
807    route updates indicating that the edge node is no longer reachable.
808    The IGP running on the iBGP peers instructs FIB to remove the IP and
809    label leaves corresponding to the failed edge node from FIB. So the
810    FIB manager performs the following steps:

812    o The FIB manager deletes the IGP leaf corresponding to the failed
813      edge node

815    o The FIB manager backwalks to all dependent BGP pathlists and marks
816      each path using the deleted IGP leaf as unresolved

818    o Note that there is no need to modify BGP leaves because each path
819      in the pathlist carries its path-index and hence the correct
820      outgoing label will be picked. So, for example, in the forwarding
821      chain depicted in Figure 2, if the 1st path becomes unresolved,
822      then the forwarding engine will only use the second path for
823      forwarding. Yet the path-index of that single resolved path will
824      still be 1 and hence the label VPN-L21 or VPN-L22 will be pushed

826 4.2.2. Adjusting Forwarding Chain on PE-CE link Failure

828    Suppose the link between an edge router and its external peer fails.
829    There are two scenarios: (1) the edge node attached to the failed
830    link performs next-hop-self and (2) the edge node attached to the
831    failed link advertises the IP address of the failed link as the
832    next-hop attribute to its iBGP peers.

834    In the first case, the rest of the iBGP peers will remain unaware of
835    the link failure and will continue to forward traffic to the edge
836    node until the edge node attached to the failed link withdraws the
837    BGP prefixes.
    If the destination prefixes are multi-homed to another
838    iBGP peer, say ePE2, then the FIB manager on the edge router
839    detecting the link failure performs the following tasks:

841    o The FIB manager backwalks to the BGP pathlists and marks the path
842      through the failed link to the external peer as unresolved

844    o Hence traffic will be forwarded using the backup path towards ePE2

846    o For labeled traffic:

848      o The Outlabel-Array attached to the BGP leaves already
849        contains an entry corresponding to the path towards ePE2.

851      o The label entry in the OutLabel-Arrays corresponding to the
852        internal path to ePE2 has a swap action with the label
853        advertised by ePE2

855      o For an arriving labeled packet (e.g. VPN), the top label is
856        swapped with the label advertised by ePE2

858    o For unlabeled traffic, packets are simply redirected towards ePE2.
859      To avoid loops, ePE2 MUST treat any core-facing path as a backup
860      path, otherwise ePE2 may redirect traffic arriving from the core
861      back to ePE1, causing a loop.

863    In the second case, where the edge router uses the IP address of the
864    failed link as the BGP next-hop, the edge router will still perform
865    the previous steps. But, unlike the case of next-hop-self, the IGP on
866    the edge node attached to the failed link informs the rest of the
867    iBGP peers that the IP address of the failed link is no longer
868    reachable. Hence the FIB manager on the iBGP peers will delete the
869    IGP leaf corresponding to the IP prefix of the failed link. The
870    behavior of the iBGP peers will be identical to the case of edge
       node failure outlined in Section 4.2.1.

872    Note that because the edge link failure is local to the edge router,
873    sub-50 msec convergence can be achieved as
874    described in [10].

876    Let's apply the next-hop-self case to the forwarding chain
877    depicted in Figure 3. After failure of the link between ePE1 and CE,
878    the forwarding engine will route traffic arriving from the core
879    towards VPN-NH2 with path-index=1. A packet arriving from the core
880    will contain the label VPN-L11 at the top. The label VPN-L11 is
881    swapped with the label VPN-L21 and the packet is forwarded towards ePE2.

883 4.3. Handling Failures for Flattened Forwarding Chains

885    As explained in the example in Section 2.3.3, if the number of
886    hierarchy levels of a platform cannot support the number of
887    hierarchy levels of a recursive dependency, the instantiated
888    forwarding chain is constructed by flattening two or more levels.
889    Hence the 3-level chain in Figure 5 is flattened into the 2-level
890    chain in Figure 6.

892    While flattening a hierarchy into a shallower one reduces the
893    benefits of BGP-PIC, it does not always result in a complete loss
894    of those benefits. To illustrate this fact, suppose
895    ASBR12 is no longer reachable. If the platform supports the full
896    hierarchy depth, the forwarding chain is the one depicted in Figure 5
897    and hence the FIB manager needs to backwalk one level to the pathlist
898    shared by "PE21" and "PE22" and adjust it. If the platform supports
899    2 levels of hierarchy, then a usable forwarding chain is the one
900    depicted in Figure 6. In that case, if ASBR12 is no longer
901    reachable, the FIB manager has to backwalk to the two flattened
902    pathlists and update both of them.

904    Hence if the platform supports the "unflattened" forwarding chain,
905    then a single pathlist needs to be updated, while if the platform
906    supports a shallower forwarding chain, then two pathlists need to be
907    updated.
    In the latter case, convergence is still independent of the
908    number of leaves because the flattened pathlists
909    continue to be shared among a possibly large number of leaves.

911 5. Properties

913 5.1 Coverage

915    All the possible failures, except CE node failure, are covered,
916    whether they impact a local or remote IGP path or a local or remote
917    BGP nhop, as described in Section 4. This section provides details
918    for each failure and shows how the hierarchical and shared FIB
919    structure proposed in this document allows recovery that does not
920    depend on the number of BGP prefixes.

922 5.1.1 A remote failure on the path to a BGP nhop

924    Upon IGP convergence, the IGP leaf for the BGP nhop is updated and
925    all the depending BGP routes leverage the new
926    IGP forwarding state immediately.

928    This BGP resiliency property only depends on IGP convergence and is
929    independent of the number of BGP prefixes impacted.

931 5.1.2 A local failure on the path to a BGP nhop

933    Upon LFA protection, the IGP leaf for the BGP nhop is updated to use
934    the precomputed LFA backup path and all the depending BGP routes
935    leverage this LFA protection.

937    This BGP resiliency property only depends on LFA protection and is
938    independent of the number of BGP prefixes impacted.

940 5.1.3 A remote iBGP nhop fails

942    Upon IGP convergence, the IGP leaf for the BGP nhop is deleted and
943    all the depending BGP Path-Lists are updated to either use the
944    remaining ECMP BGP best-paths or, if none remains available, to
945    activate precomputed backups.

947    This BGP resiliency property only depends on IGP convergence and is
948    independent of the number of BGP prefixes impacted.

950 5.1.4 A local eBGP nhop fails

951    Upon local link failure detection, the adjacency to the BGP nhop is
952    deleted and all the depending BGP Path-Lists are updated to either
953    use the remaining ECMP BGP best-paths or, if none remains available,
954    to activate precomputed backups.

956    This BGP resiliency property only depends on local link failure
957    detection and is independent of the number of BGP prefixes impacted.

959 5.2 Performance

961    When the failure is local (a local IGP nhop failure or a local eBGP
962    nhop failure), a pre-computed and pre-installed backup is activated
963    by a local-protection mechanism that does not depend on the number
964    of BGP destinations impacted by the failure. Sub-50 msec restoration
965    is thus possible even if millions of BGP routes are impacted.

967    When the failure is remote (a remote IGP failure not impacting the
968    BGP nhop or a remote BGP nhop failure), an alternate path is
969    activated upon IGP convergence. All the impacted BGP destinations
970    benefit from a working alternate path as soon as the IGP convergence
971    occurs for their impacted BGP nhop, even if millions of BGP routes
972    are impacted.
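
   The orders of magnitude behind these claims can be reproduced with
   simple arithmetic. The short sketch below is only an illustration; the
   per-entry figures it uses are the assumptions listed in Section 5.2.1
   that follows.

      # Illustrative back-of-the-envelope calculation reproducing the figures
      # used in Section 5.2.1 (1M impacted prefixes, per-entry times assumed).

      PREFIXES = 1_000_000
      FIB_UPDATE = (10e-6, 100e-6)        # sec/prefix: optimistic, conservative
      BGP_CONVERGENCE = (100e-6, 200e-6)  # sec/prefix: optimistic, conservative
      IGP_CONVERGENCE = 0.5               # seconds
      LOCAL_PROTECTION = 0.05             # seconds

      # Without PIC, restoration scales with the number of impacted prefixes.
      print("IGP failure, no PIC : %d to %d sec" %
            (PREFIXES * FIB_UPDATE[0], PREFIXES * FIB_UPDATE[1]))          # 10 to 100
      print("BGP failure, no PIC : %d to %d sec" %
            (PREFIXES * BGP_CONVERGENCE[0], PREFIXES * BGP_CONVERGENCE[1]))  # 100 to 200

      # With PIC, restoration depends only on local protection or IGP convergence.
      print("local failure, PIC  : %d msec" % (LOCAL_PROTECTION * 1000))   # 50
      print("remote failure, PIC : %d msec" % (IGP_CONVERGENCE * 1000))    # 500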
974 5.2.1 Perspective

976    The following table puts the BGP PIC benefits in perspective,
977    assuming:

979    o 1M impacted BGP prefixes

981    o IGP convergence ~ 500 msec

983    o local protection ~ 50 msec

985    o FIB update per BGP destination ~ 100 usec conservative,
987                                     ~ 10 usec optimistic

989    o BGP convergence per BGP destination ~ 200 usec conservative,
991                                          ~ 100 usec optimistic

993                            Without PIC       With PIC

995    Local IGP Failure       10 to 100 sec     50 msec
997    Local BGP Failure       100 to 200 sec    50 msec
998    Remote IGP Failure      10 to 100 sec     500 msec
1000   Remote BGP Failure      100 to 200 sec    500 msec

1002   Upon local IGP nhop failure or remote IGP nhop failure, the existing
1003   primary BGP nhop is intact and usable, hence the resiliency only
1004   depends on the ability of the FIB mechanism to reflect the new path
1005   to the BGP nhop to the depending BGP destinations. Without BGP PIC,
1006   a conservative back-of-the-envelope estimation for this FIB update
1007   is 100 usec per BGP destination. An optimistic estimation is 10 usec
1008   per entry.

1010   Upon local BGP nhop failure or remote BGP nhop failure, without the
1011   BGP PIC mechanism, a new BGP best-path needs to be recomputed and
1012   new updates need to be sent to peers. This depends on BGP processing
1013   time that will be shared between best-path computation, RIB update
1014   and peer update. A conservative back-of-the-envelope estimation for
1015   this is 200 usec per BGP destination. An optimistic estimation is
1016   100 usec per entry.

1018 5.3 Automated

1020   The BGP PIC solution does not require any operator involvement. The
1021   process is entirely automated as part of the FIB implementation.

1023   The salient points enabling this automation are:

1025   o Extension of the BGP Best Path computation to more than one
1026     primary ([11] and [12]) or backup BGP nhop ([6] and [13]).

1028   o Sharing of a BGP pathlist across BGP destinations with the same
1029     primary and backup BGP nhops

1031   o Hierarchical indirection and dependency between the BGP pathlist
1032     and the IGP pathlist

1034 5.4 Incremental Deployment

1036   As soon as one router supports the BGP PIC solution, it gains all of
1037   the above benefits without any requirement for other routers to
1038   support BGP PIC.

1040 6. Dependency

1042   This section describes the required functionality in the forwarding
1043   and control planes to support BGP-PIC as described in this document.

1044 6.1 Hierarchical Hardware FIB

1046   BGP PIC requires hierarchical hardware FIB support: for each BGP
1047   forwarded packet, a BGP leaf is looked up, then a BGP pathlist is
1048   consulted, then an IGP pathlist, then an adjacency.

1050   An alternative method consists of "flattening" the dependencies when
1051   programming the BGP destinations into the HW FIB, potentially
1052   eliminating both the BGP pathlist and the IGP pathlist
1053   consultation. Such an approach decreases the number of memory
1054   lookups per forwarding operation at the expense of HW FIB memory
1055   increase (flattening means less sharing, hence duplication), loss of
1056   ECMP properties (flattening means less pathlist entropy) and loss of
1057   BGP PIC properties.

1059 6.2 Availability of more than one primary or secondary BGP next-hop

1061   When the primary BGP next-hop fails, BGP PIC depends on the
1062   availability of a pre-computed and pre-installed secondary BGP next-
1063   hop in the BGP pathlist.
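
   A minimal sketch of that dependency follows. It is hypothetical code,
   not the draft's implementation: the "role" and "usable" fields and the
   function name are illustrative assumptions, while the next-hop and
   label names are taken loosely from the egress-PE example of Figure 3
   (primary path towards the CE, pre-installed iBGP backup towards ePE2).

      # Illustrative sketch only: a BGP pathlist holding a pre-computed backup
      # next-hop that is activated when the primary BGP next-hop fails.

      bgp_pathlist = [
          {"index": 0, "nh": "CE-NH",   "role": "primary", "usable": True},
          {"index": 1, "nh": "BGP-NH2", "role": "backup",  "usable": True},
      ]

      # Per-prefix OutLabel-Array indexed by path-index (unchanged at failure).
      out_labels = {"VPN-P1": ["Unlabeled", "VPN-L21 (swap)"]}

      def active_path():
          """Use the primary path while it is usable, else the backup."""
          primary = [p for p in bgp_pathlist
                     if p["role"] == "primary" and p["usable"]]
          if primary:
              return primary[0]
          return [p for p in bgp_pathlist
                  if p["role"] == "backup" and p["usable"]][0]

      p = active_path()
      print(p["nh"], out_labels["VPN-P1"][p["index"]])   # CE-NH Unlabeled

      # Primary BGP next-hop fails: flip one flag in the shared pathlist;
      # every prefix sharing it switches to the pre-computed backup at once.
      bgp_pathlist[0]["usable"] = False
      p = active_path()
      print(p["nh"], out_labels["VPN-P1"][p["index"]])   # BGP-NH2 VPN-L21 (swap)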
1065 The existence of a secondary next-hop is clear for the following 1066 reason: a service caring for network availability will require two 1067 disjoint network connections hence two BGP nhops. 1069 The BGP distribution of the secondary next-hop is available thanks 1070 to the following BGP mechanisms: Add-Path [11], BGP Best-External 1071 [6], diverse path [12], and the frequent use in VPN deployments of 1072 different VPN RD's per PE. It is noteworthy to mention that the 1073 availability of another BGP path does not mean that all failure 1074 scenarios can be covered by simply forwarding traffic to the 1075 available secondary path. The discussion of how to cover various 1076 failure scenarios is beyond the scope of this document 1078 6.3 Pre-Computation of a secondary BGP nhop 1080 [13] describes how a secondary BGP next-hop can be precomputed on a 1081 per BGP destination basis. 1083 7. Security Considerations 1085 No additional security risk is introduced by using the mechanisms 1086 proposed in this document 1088 8. IANA Considerations 1090 No requirements for IANA 1092 9. Conclusions 1094 This document proposes a hierarchical and shared forwarding chain 1095 structure that allows achieving prefix independent convergence, 1096 and in the case of locally detected failures, sub-50 msec 1097 convergence. A router can construct the forwarding chains in a 1098 completely transparent manner with zero operator intervention. It 1099 supports incremental deployment. 1101 10. References 1103 10.1. Normative References 1105 [1] Bradner, S., "Key words for use in RFCs to Indicate 1106 Requirement Levels", BCP 14, RFC 2119, March 1997. 1108 [2] Rekhter, Y., Li, T., and S. Hares, "A Border Gateway Protocol 1109 4 (BGP-4), RFC 4271, January 2006 1111 [3] Bates, T., Chandra, R., Katz, D., and Rekhter Y., 1112 "Multiprotocol Extensions for BGP", RFC 4760, January 2007 1114 [4] Y. Rekhter and E. Rosen, " Carrying Label Information in BGP- 1115 4", RFC 3107, May 2001 1117 [5] Andersson, L., Minei, I., and B. Thomas, "LDP Specification", 1118 RFC 5036, October 2007 1120 10.2. Informative References 1122 [6] Marques,P., Fernando, R., Chen, E, Mohapatra, P., Gredler, H., 1123 "Advertisement of the best external route in BGP", draft-ietf- 1124 idr-best-external-05.txt, January 2012. 1126 [7] Wu, J., Cui, Y., Metz, C., and E. Rosen, "Softwire Mesh 1127 Framework", RFC 5565, June 2009. 1129 [8] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private 1130 Networks (VPNs)", RFC 4364, February 2006. 1132 [9] De Clercq, J. , Ooms, D., Prevost, S., Le Faucheur, F., 1133 "Connecting IPv6 Islands over IPv4 MPLS Using IPv6 Provider 1134 Edge Routers (6PE)", RFC 4798, February 2007 1136 [10] O. Bonaventure, C. Filsfils, and P. Francois. "Achieving sub- 1137 50 milliseconds recovery upon bgp peering link failures, " 1138 IEEE/ACM Transactions on Networking, 15(5):1123-1135, 2007 1140 [11] D. Walton, E. Chen, A. Retana, J. Scudder, "Advertisement of 1141 Multiple Paths in BGP", draft-ietf-idr-add-paths-10.txt, 1142 October 2014 1144 [12] R. Raszuk, R. Fernando, K. Patel, D. McPherson, K. Kumaki, 1145 "Distribution of diverse BGP paths", RFC 6774.txt, November 1146 2012 1148 [13] P. Mohapatra, R. Fernando, C. Filsfils, and R. Raszuk, "Fast 1149 Connectivity Restoration Using BGP Add-path", draft-pmohapat- 1150 idr-fast-conn-restore-03, Jan 2013 1152 [14] C. Filsfils, S. Previdi, A. Bashandy, B. Decraene, S. 1153 Litkowski, M. Horneffer, R. Shakir, J. Tansura, E. 
    Crabbe
1154         "Segment Routing with MPLS data plane", draft-ietf-spring-
1155         segment-routing-mpls-02 (work in progress), October 2015

1157 11. Acknowledgments

1159    Special thanks to Neeraj Malhotra and Yuri Tsier for their valuable
1160    help

1162    Special thanks to Bruno Decraene for the valuable comments

1164    This document was prepared using 2-Word-v2.0.template.dot.

1166 Authors' Addresses

1168    Ahmed Bashandy
1169    Cisco Systems
1170    170 West Tasman Dr, San Jose, CA 95134, USA
1171    Email: bashandy@cisco.com

1173    Clarence Filsfils
1174    Cisco Systems
1175    Brussels, Belgium
1176    Email: cfilsfil@cisco.com

1178    Prodosh Mohapatra
1179    Sproute Networks
1180    Email: mpradosh@yahoo.com