MPLS Working Group                                       N. Leymann, Ed.
Internet-Draft                                       Deutsche Telekom AG
Intended status: Informational                               B. Decraene
Expires: August 18, 2014                                          Orange
                                                             C. Filsfils
                                                 M. Konstantynowicz, Ed.
                                                           Cisco Systems
                                                            D. Steinberg
                                                    Steinberg Consulting
                                                       February 14, 2014


                       Seamless MPLS Architecture
                    draft-ietf-mpls-seamless-mpls-06

Abstract

   This document describes an architecture which can be used to extend
   MPLS networks to integrate access and aggregation networks into a
   single MPLS domain ("Seamless MPLS"). The Seamless MPLS approach is
   based on existing and well known protocols. It provides a highly
   flexible and scalable architecture and the possibility to integrate
   100,000 nodes. The separation of the service and transport plane is
   one of the key elements; Seamless MPLS provides end to end service
   independent transport. Therefore it removes the need for service
   specific configurations in network transport nodes (without end to
   end transport MPLS, some additional service nodes/configurations
   would be required to glue each transport domain). This draft defines
   a routing architecture using existing standardized protocols. It
   does not invent any new protocols or define extensions to existing
   protocols.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 18, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
     1.2.  Terminology
   2.  Motivation
     2.1.  Why Seamless MPLS
     2.2.  Use Case #1
       2.2.1.  Description
       2.2.2.  Typical Numbers
     2.3.  Use Case #2
       2.3.1.  Description
       2.3.2.  Typical Numbers
   3.  Requirements
     3.1.  Overall
       3.1.1.  Access
       3.1.2.  Aggregation
       3.1.3.  Core
     3.2.  Multicast
     3.3.  Availability
     3.4.  Scalability
     3.5.  Stability
   4.  Architecture
     4.1.  Overall
     4.2.  Multi-Domain MPLS networks
     4.3.  Hierarchy
     4.4.  Intra-Domain Routing
     4.5.  Inter-Domain Routing
     4.6.  Access
   5.  Deployment Scenarios
     5.1.  Deployment Scenario #1
       5.1.1.  Overview
       5.1.2.  General Network Topology
       5.1.3.  Hierarchy based on recursive BGP labeled route lookup
       5.1.4.  Intra-Area Routing
         5.1.4.1.  Core
         5.1.4.2.  Aggregation
       5.1.5.  Access
         5.1.5.1.  LDP Downstream-on-Demand (DoD)
       5.1.6.  Inter-Area Routing
       5.1.7.  Labeled iBGP next-hop handling
       5.1.8.  Network Availability
         5.1.8.1.  IGP Convergence
         5.1.8.2.  Per-Prefix LFA FRR
         5.1.8.3.  Hierarchical Dataplane and BGP Prefix Independent
                   Convergence
         5.1.8.4.  BGP Egress Node FRR
         5.1.8.5.  Assessing loss of connectivity upon any failure
         5.1.8.6.  Network Resiliency and Simplicity
         5.1.8.7.  Conclusion
       5.1.9.  BGP Next-Hop Redundancy
     5.2.  Scalability Analysis
       5.2.1.  Control and Data Plane State for Deployment Scenario #1
         5.2.1.1.  Introduction
         5.2.1.2.  Core Domain
         5.2.1.3.  Aggregation Domain
         5.2.1.4.  Summary
         5.2.1.5.  Numerical application for use case #1
         5.2.1.6.  Numerical application for use case #2
   6.  Acknowledgements
   7.  IANA Considerations
   8.  Security Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Authors' Addresses

1.  Introduction

   MPLS as a mature and well known technology is widely deployed in
   today's core and aggregation/metro area networks. Many metro area
   networks are already based on MPLS delivering Ethernet services to
   residential and business customers. Until now those deployments have
   usually been done in separate domains; e.g. core and metro area
   networks are handled as separate MPLS domains.

   Seamless MPLS extends the core domain and integrates aggregation and
   access domains into a single MPLS domain ("Seamless MPLS"). This
   enables a very flexible deployment of end to end service delivery.
   In order to obtain a highly scalable architecture Seamless MPLS takes
   into account that typical access devices (DSLAMs, MSANs) lack some
   advanced MPLS features and may have tighter scalability limits.
   Hence access devices are kept as simple as possible.

   Seamless MPLS is not a new protocol suite but describes an
   architecture built by deploying existing protocols such as BGP, LDP
   and ISIS. Multiple options are possible and this document aims at
   defining a single architecture for the main function in order to ease
   implementation prioritization and deployments in multi vendor
   networks. Yet the architecture should be flexible enough to allow
   some level of personalization, depending on use cases, existing
   deployed base and requirements. Currently, this document focuses on
   end to end unicast LSPs.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

1.2.  Terminology

   This document uses the following terminology:

   o  Access Node (AN): An access node is a node which processes
      customer frames or packets at Layer 2 or above. This includes
      but is not limited to DSLAMs or OLTs (in case of (G)PON
      deployments). Access nodes have only limited MPLS functionality
      in order to reduce complexity in the access network.

   o  Aggregation Node (AGN): An aggregation node (AGN) is a node which
      aggregates several access nodes (ANs).

   o  Area Border Router (ABR): Router between aggregation and core
      domain.

   o  Deployment Scenario: Describes an implementation of Seamless MPLS
      that fulfils the requirements derived from one or more use cases.

   o  Seamless MPLS Domain: A set of MPLS devices which can set up MPLS
      LSPs between them.

   o  Transport Node (TN): Transport nodes are used to connect access
      nodes to service nodes, and service nodes to service nodes.
      Transport nodes ideally have no customer or service state and are
      therefore decoupled from service creation.

   o  Seamless MPLS (S-MPLS): Used as a generic term to describe an
      architecture which integrates access, aggregation and core network
      in a single MPLS domain.

   o  Service Node (SN): A service node is used to create services for
      customers and is connected to one or more transport nodes.
      Typical examples include Broadband Network Gateways (BNGs) and
      video servers.

   o  Transport Pseudo Wire (T-PW): A transport pseudowire provides
      service independent transport mechanisms based on Pseudo-Wires
      within the Seamless MPLS architecture.

   o  Use Case: Describes a typical network including service creation
      points in order to describe the requirements, typical numbers etc.
      which need to be taken into account when applying the Seamless
      MPLS architecture.

2.  Motivation

   MPLS has been deployed in core and aggregation networks for several
   years and provides a mature and stable basis for large networks. In
   addition MPLS is already used in access networks, e.g. for mobile or
   DSL backhaul. Today MPLS as a technology is being used on two
   different layers:

   o  the Transport Layer and

   o  the Service Layer (e.g. for MPLS VPNs)

   In both cases the protocols and the encapsulation are identical but
   the use of MPLS is different, especially concerning the signalling,
   the control plane, the provisioning, the scalability and the
   frequency of updates. On the service layer only service specific
   information is exchanged; every service can potentially deploy its
   own architecture and individual protocols. The services are running
   on top of the transport layer. Nevertheless those deployments are
   usually isolated, focused on a single use case and not integrated in
   an end-to-end manner.

   The motivation of Seamless MPLS is to provide an architecture which
   supports a wide variety of different services on a single MPLS
   platform fully integrating access, aggregation and core network. The
   architecture can be used for residential services, mobile backhaul
   and business services, and supports fast reroute, redundancy and load
   balancing. Seamless MPLS allows service creation points to be
   deployed virtually anywhere in the network. This provides network
   and service providers with flexible services and flexible service
   creation. Service creation can be done based on the existing
   requirements without the need for dedicated service creation areas
   at fixed locations. With the flexibility of Seamless MPLS the
   service creation can be done anywhere in the network and easily moved
   between different locations.

2.1.  Why Seamless MPLS

   Multiple Service Providers plan to deploy networks with 10k to 100k
   MPLS nodes, with varying levels of MPLS LSP connectivity between
   those nodes - sparse-mesh in access, partial-mesh in aggregation and
   full-mesh in core. This is typically at least one order of magnitude
   higher than current deployments and may require a new architecture.
   Multiple options are possible and it makes sense for the industry
   (both vendors and SPs) to restrict the options in order to ease the
   first deployments (e.g. restrict the number of options to implement
   and/or scale for vendors, reduce interoperability and debugging
   issues for SPs).

   Many aggregation networks are already deploying MPLS but are limited
   to the use of MPLS per aggregation area. Those MPLS based
   aggregation domains are connected to a core network running MPLS as
   well. Nevertheless most of the services are not limited to an
   aggregation domain but run between several aggregation domains
   crossing the core network. In the past it was necessary to provide
   connectivity between the different domains and the core on a per
   service level and not based on MPLS (e.g. by deploying native
   IP-Routing or Ethernet based technologies between aggregation and
   core). In most cases service specific configurations on the border
   nodes between core and aggregation were required. New services led
   to additional configurations and changes in the provisioning tools
   (see Figure 1).

   With Seamless MPLS there are no technology boundaries and no topology
   boundaries for the services. Network (or region) boundaries are for
   scaling and manageability, and do not affect the service layer, since
   the Transport Pseudowire that carries packets from the AN to the SN
   doesn't care whether it takes two hops or twenty, nor how many region
   boundaries it needs to cross. The network architecture is about
   network scaling, network resilience and network manageability; the
   service architecture is about optimal delivery: service scaling,
   service resilience (via replicated SNs) and service manageability.
   The two are decoupled: each can be managed separately and changed
   independently.

   +--------------+         +--------------+         +--------------+
   |  Aggregation |         |     Core     |         |  Aggregation |
   |   Domain #1  +---------+    Domain    +---------+   Domain #2  |
   |     MPLS     | ^       |     MPLS     |       ^ |     MPLS     |
   +--------------+ |       +--------------+       | +--------------+
                    |                              |
                    +------ service specific ------+
                              configuration

                Figure 1: Service Specific Configurations

   One of the main motivations of Seamless MPLS is to get rid of service
   specific configurations between the different MPLS islands. Seamless
   MPLS connects all MPLS domains on the MPLS transport layer providing
   a single transport layer for all services - independent of the
   service itself. The Seamless MPLS architecture therefore decouples
   the service and transport layer and integrates access, aggregation
   and core into a single platform. One of the big advantages is that
   problems on the transport layer only need to be solved once (and the
   solutions are available to all services). With Seamless MPLS it is
   not necessary to use service specific configurations on intermediate
   nodes; all services can be deployed in an end to end manner.

2.2.  Use Case #1

2.2.1.  Description

   In most cases at least residential and business services need to be
   supported by a network. This section describes a Seamless MPLS use
   case which supports such a scenario. The use case includes point to
   point services for business customers as well as typical service
   creation for residential customers.

                       +-------------+
                       |   Service   |
                       |   Creation  |
                       | Residential |
                       |  Customers  |
                       +------+------+
                              |
                              |
                              |
        PW1   +-------+   +---+---+
      #########################   |
      #    +--+ AGN11 +---+ AGN21 +  +------+
      #   /   |       |  /|       |\ |      |           +--------+
   +--#-+/    +-------+\/ +-------+ \|      |           | remote |
   | AN |              /\            + CORE +---......--+   AN   |
   +--#-+\    +-------+  \+-------+ /|      |       #######      |
      #   \   |       |   |       |/################### +--------+
      #    +--+ AGN12 +---+ AGN22 +##+------+   P2P Business Service
      ##############################
        PW2   +-------+   +-------+

                 Figure 2: Use Case #1: Service Creation

   Figure 2 shows the different service creation points and the
   corresponding pseudowires between the access nodes and the service
   creation points. The figure does not show all PWs (e.g. not the PWs
   needed to support redundancy) in order to keep it simple. Node and
   link failures are handled by rerouting the PWs (based on standard
   mechanisms). End customers (either residential or business
   customers) are connected to the access nodes using a native
   technology like Ethernet. The access node terminates the PW(s)
   carrying the traffic for the end customers. The link between the
   access node (AN) and the aggregation node (AGN) is the first MPLS
   enabled link.

   Residential Services:  The service creation for all residential
      customers connected to the Access Nodes in an aggregation domain
      is located on a Service Node connected to the AGN2x. The PW (PW1)
      originates at the AN and terminates at the AGN2. A second PW is
      deployed in the case where redundancy is needed on the AN (the
      figure shows redundancy but this might not be the case for all ANs
      in this Use Case). Additional PWs can be deployed as well in case
      more than a single service creation point is needed for the
      residential service (e.g. one service creation point for Internet
      access and a second service creation point for IPTV services).

   Business Services:  For business services the use case shows point
      to point connections between two access nodes. PW2 originates at
      the AN and terminates on the remote AN crossing two aggregation
      areas and the core network. If the access node needs connections
      to several remote ANs the corresponding number of PWs will be
      originated at the AN. Nevertheless, given the number of ports
      available and the number of business customers on a typical access
      node, the number of PWs will be relatively small.

              +-------+   +-------+   +------+   +------+
              |       |   |       |   |      |   |      |
           +--+ AGN11 +---+ AGN21 +---+ ABR1 +---+ LSR1 +--> to AGN
          /   |       |  /|       |   |      |   |      |
   +----+/    +-------+\/ +-------+   +------+  /+------+
   | AN |              /\                     \/
   +----+\    +-------+  \+-------+   +------+/\ +------+
          \   |       |   |       |   |      |  \|      |
           +--+ AGN12 +---+ AGN22 +---+ ABR2 +---+ LSR2 +--> to AGN
              |       |   |       |   |      |   |      |
              +-------+   +-------+   +------+   +------+

   static route    ISIS L1 LDP             ISIS L2 LDP

   <-Access-><--Aggregation Domain--><---------Core--------->

                   Figure 3: Use Case #1: Redundancy

   Figure 3 shows the redundancy at the access and aggregation network
   deploying a two stage aggregation network (AGN1x/AGN2x).
   Nevertheless redundancy is not a must in this use case. It is also
   possible to use non redundant connections between the ANs and the
   AGN1 stage and/or between the AGN1 and AGN2 stages. The AGN2x stage
   is used to aggregate traffic from several AGN1x pairs. In this use
   case an aggregation domain is not limited to the use of a single pair
   of AGN2x; the deployment of several AGN2 pairs within the domain is
   also supported. As a design goal for the scalability of the routing
   and forwarding within the Seamless MPLS architecture the following
   numbers are used:

   o  Number of Aggregation Domains: 100

   o  Number of Backbone Nodes: 1,000

   o  Number of Aggregation Nodes: 10,000

   o  Number of Access Nodes: 100,000

   The access nodes (AN) are dual homed to two different aggregation
   nodes (AGN11 and AGN12) using static routing entries on the AN. The
   ANs are always source or sink nodes for MPLS traffic but not transit
   nodes. This allows a light MPLS implementation in order to reduce
   the complexity in the AN. The aggregation network consists of two
   stages with redundant connections between the stages (AGN11 is
   connected to AGN21 and AGN22 as well as AGN12 to AGN21 and AGN22).
   The gateway between the aggregation and core network is realized
   using the Area Border Routers (ABR). From the perspective of the
   MPLS transport layer all systems are clearly identified using the
   loopback address of the system. An ingress node must be able to
   establish a service to an arbitrary egress system by using the
   corresponding MPLS transport label.

2.2.2.  Typical Numbers

   Table 1 shows typical numbers which are expected for Use Case #1
   (access node).

   +--------------------+---------------+
   | Parameter          | Typical Value |
   +--------------------+---------------+
   | IGP RIB Entries    | 2             |
   | IP FIB Entries     | 2             |
   | LDP LIB Entries    | 200           |
   | MPLS NHLFE Entries | 200           |
   | MPLS ILM Entries   | 0             |
   | BGP RIB Entries    | 0             |
   | BGP FIB Entries    | 0             |
   +--------------------+---------------+

          Table 1: Use Case #1: Typical Numbers for Access Node

2.3.  Use Case #2

2.3.1.  Description

   In most cases, residential, wholesale and business services need to
   be supported by the network.

                         +-------------+
                         |   Service   |
                         |  platforms  |
                         |(VoIP, VoD..)|
                         | Residential |
                         |  Customers  |
                         +------+------+
                                |
                                |
           +---+    +-----+  +--+--+   +-----+
           |AN1|----+AGN11+--+AGN21+---+ ABR |
           +---+    +--+--+  +--+--+   +--+--+
                       |        |         |
           +---+    +--+--+     |         |      +----+
           |AN2|----+AGN12+     |         |    --+ PE |
           +---+    +--+--+     |         |      +----+
                       |        |         |
                       .        |         |
                       .        |         |
                       .        |         |
                       |        |         |
   +---+   +---+    +--+--+  +--+--+   +--+--+
   |AN4+---+AN3|----+AGN1x+--+AGN22+---+ ABR |
   +---+   +---+    +-----+  +-----+   +-----+

     <-Access-><--Aggregation Domain--><---------Core--------->

                          Figure 4: Use Case #2

   The above topology (see Figure 4) is subject to evolutions, depending
   on AN types and capacities (in terms of number of customers and/or
   aggregated bandwidth). For example, the AGN1x connection toward
   AGN2y currently forms a ring but may later evolve into a square or
   triangle topology; AGN2y nodes may not be present.

   Most access nodes (AN) are single attached to one aggregation node
   using static routing entries on the AN and AGN. Some ANs are dual
   attached to two different AGNs using static routes. Some ANs are
   used as transit by some lower level ANs. Static routes are expected
   to be used between those ANs.

   IPv4, IPv6 and MPLS interconnection between the aggregation and core
   network is realized using the Area Border Routers (ABR). Any ingress
   node must be able to establish IPv4, IPv6 and MPLS connections to any
   egress node in the seamless MPLS domain.

   Regarding MPLS connectivity requirements, a full mesh of MPLS LSPs is
   required between the ANs of an aggregation area, at least for 6PE
   purposes. Some additional LSPs are needed between ANs and some PEs
   in the aggregation area or in the core area for access to services,
   wholesale and enterprise services. In short, a meshing of LSPs is
   required between the AGNs of the whole seamless MPLS domain.
   Finally, an LSP between any node and any other node should be
   possible.

   From a scalability standpoint, the following numbers are the targets:

   o  Number of Aggregation Domains: 30

   o  Number of Backbone Nodes: 150

   o  Number of Aggregation Nodes: 1,500

   o  Number of Access Nodes: 40,000

2.3.2.  Typical Numbers

   Table 2 shows typical numbers which are expected for Use Case #2 for
   the purpose of establishing the transport LSPs. They do not take
   into account the services built on top (e.g. 6PE will require
   additional IPv6 routes).

   +--------------------+---------------+
   | Parameter          | Typical Value |
   +--------------------+---------------+
   | IGP RIB Entries    | 2             |
   | IP FIB Entries     | 2             |
   | LDP LIB Entries    | 1,400         |
   | MPLS NHLFE Entries | 1,400         |
   | MPLS ILM Entries   | 1,400         |
   +--------------------+---------------+

          Table 2: Use Case #2: Typical Numbers for Access Node

3.  Requirements

   The following section describes the overall requirements which need
   to be fulfilled by the Seamless MPLS architecture. Besides the
   general requirements of the architecture itself there are also
   certain requirements which are related to the different network
   nodes.

   o  End to End Transport LSP: MPLS based services (pseudowire based,
      L3-VPN or IP) SHALL be provided by the Seamless MPLS based
      infrastructure between any nodes.

   o  Scalability: The network SHALL be scalable to a minimum of
      100,000 nodes.

   o  Fast convergence (sub second resilience) SHALL be supported. Fast
      reroute (LFA) SHOULD be supported.

   o  Flexibility: The Seamless MPLS architecture SHALL be applicable to
      a wide variety of existing MPLS deployments. It SHALL use a
      flexible approach deploying building blocks with the possibility
      to use certain features only if those features are needed (e.g.
      dual homing ANs or fast reroute mechanisms).

   o  Service independence: Service and transport layer SHALL be
      decoupled. The architecture SHALL remove the need for service
      specific configurations on intermediate nodes.

   o  Native Multicast support: P2MP MPLS LSPs SHOULD be supported by
      the Seamless MPLS architecture.

   o  Interoperable end to end OAM mechanisms SHALL be implemented.

3.1.  Overall

3.1.1.  Access

   With respect to MPLS functionality the access network should be kept
   as simple as possible. Compared to the aggregation and/or core
   network within Seamless MPLS a typical access node is less powerful.
   The control plane and the forwarding should be as simple as possible.
   To reduce the complexity and the cost of an access node, the full
   MPLS functionality (control and data plane) need not be supported.
   The use of an IGP should be avoided. Static routing should be
   sufficient. Functionality needed to reach the required scalability
   should be moved out of the access node. The number of access nodes
   can be very high. The support of load balancing for layer 2 services
   should be implemented.

3.1.2.  Aggregation

   The aggregation network aggregates traffic from access nodes. The
   aggregation node must have functionality that extends the
   scalability of the simple access nodes that are connected to it. The
   IGP must be link state based. Each aggregation area must be a
   separate area. All routes that are interarea should use an EGP to
   keep the IGP small. The aggregation node must have the full
   scalability concerning control plane and forwarding. The support of
   load balancing for layer 2 services must be implemented.

3.1.3.  Core

   The core connects the aggregation areas. The core network elements
   must have the full scalability concerning control plane and
   forwarding. The IGP must be link state based. The core area must
   not include routes from aggregation areas. All routes that are
   interarea should use an EGP to keep the IGP small. Each area of the
   link state based IGP should have less than 2000 routes. The support
   of load balancing for layer 2 services must be implemented.

3.2.  Multicast

   Compared with unicast connectivity, multicast is more dynamic. User
   generated messages - like joining or leaving multicast groups -
   interact directly with network components in the access and
   aggregation network (in order to build the corresponding forwarding
   states). This leads to the need for a highly dynamic handling of
   messages on access and aggregation nodes. Nevertheless the core
   network SHOULD be stable and state changes triggered by user
   generated messages SHOULD be minimized. This raises the need for a
   hierarchy for the P2MP support in Seamless MPLS hiding the dynamic
   behaviour of the access and aggregation nodes.

   o  mLDP

   o  P2MP RSVP-TE

3.3.  Availability

   All network elements should be highly available (99.999%
   availability). Outage times should be as low as possible. A repair
   time of 50 milliseconds or less should be guaranteed at all nodes and
   lines in the network that are redundant. Fast convergence features
   SHOULD be used in all control plane protocols. Local Repair
   functions SHOULD be used wherever possible. Full redundancy is
   required for all equipment that is shared in a network element:

   o  Power Supply

   o  Switch Fabric

   o  Routing Processor

   A change from an active component to a standby component SHOULD
   happen without affecting customer traffic. The influence on
   customer traffic MUST be as low as possible.

3.4.  Scalability

   The network must be highly scalable. Based on the use cases
   described in Sections 2.2 and 2.3, as a minimum requirement the
   following scalability figures should be met:

   o  Number of aggregation domains: 100

   o  Number of backbone nodes: 1,000

   o  Number of aggregation nodes: 10,000

   o  Number of access nodes: 100,000

3.5.  Stability

   o  The platform should be stable under certain circumstances (e.g.
      misconfiguration within one area should not cause instability in
      other areas).

   o  Differentiate between "All Loopbacks and Link addresses should be
      pingable from everywhere." Vs. "Link addresses are not
      necessarily pingable from everywhere".

4.  Architecture

4.1.  Overall

   One of the key questions that emerge when designing an architecture
   for a seamless MPLS network is how to handle the sheer size of the
   control plane and forwarding plane state (routing and MPLS label
   information) resulting from the stated scalability goals, especially
   with respect to the total number of access nodes. This needs to be
   done without exceeding the technical scaling limits of any of the
   involved nodes in the network (access, aggregation and core) and
   without introducing too much complexity in the design of the network,
   while still maintaining good convergence properties to allow for
   quick MPLS transport and service restoration in case of network
   failures.

4.2.  Multi-Domain MPLS networks

   The key design paradigm that leads to a sound and scalable solution
   is the divide and conquer approach, whereby the large problem is
   decomposed into many smaller problems for which the solution can be
   found using well-known standard architectures.

   In the specific case of seamless MPLS the overall MPLS network SHOULD
   be decomposed into multiple MPLS domains, each well within the
   scaling limits of well-known architectures and network node
   implementations. From an organizational and operational point of
   view it MAY make sense to define the boundaries of such domains along
   the pre-existing boundaries of aggregation networks and the core
   network.

   Examples of how networks can be decomposed include using IGP areas as
   well as using multiple BGP autonomous systems.

4.3.  Hierarchy

   These MPLS domains SHOULD then be connected into an MPLS multi-domain
   network in a hierarchical fashion that enables the seamless exchange
   of loopback addresses and MPLS label bindings for transport LSPs
   across the entire MPLS internetwork while at the same time preventing
   the flooding of unnecessary routing and label binding information
   into domains or parts of the network that do not need them. Such a
   hierarchical routing and forwarding concept allows scalability in
   different dimensions and hides the complexity and size of the
   aggregation and access networks.

4.4.  Intra-Domain Routing

   The intra-domain routing within each of the MPLS domains (i.e.
   aggregation domains and core) SHOULD utilize standard IGP protocols
   like OSPF or ISIS. By definition, each of these domains is small
   enough so that there are no relevant scaling limits within each IGP
   domain, given well-known state-of-the-art IGP design principles and
   recent router technology.

   The intra-domain MPLS LSP setup and label distribution SHOULD
   utilize standard protocols like LDP or RSVP.

   Note that this document describes the design based on LDP, LDP
   Downstream-on-Demand and labeled BGP due to the higher degree of
   out-of-the-box automation and operational simplicity as well as
   compatibility with the existing backbone and backhaul designs and
   deployments which use LDP and not RSVP-TE.  It also assumes
   relatively simple MPLS implementations on access nodes.  The
   protocol choices for the design described in this document have been
   driven by actual SP deployments.  A design based on a hierarchy of
   RSVP-TE LSPs may be an alternative, but has not been considered in
   this document.

4.5.  Inter-Domain Routing

   The inter-domain routing is responsible for establishing
   connectivity between and across all MPLS domains.  The inter-domain
   routing SHOULD establish a routing and forwarding hierarchy in order
   to achieve the scaling goals of seamless MPLS.  Note that the IP
   aggregation usually performed between regions (IGP areas/ASes) in IP
   routing does not work for MPLS, as MPLS is not capable of
   aggregating FECs (because MPLS forwarding uses an exact match
   lookup, while IP uses longest match).

   Therefore it is RECOMMENDED to utilize protocols that support
   indirect next-hops (e.g. using BGP to carry MPLS label information
   [RFC3107]).  The mechanism for the LSP forwarding hierarchy is
   described in Section 5.3.

4.6.  Access

   Compared to the aggregation and core parts of the Seamless MPLS
   network, the access part is special in two respects:

   o  The number of nodes in the access is at least one order of
      magnitude higher than in any other part of the network.

   o  Because of the large quantity of access nodes, the cost of these
      nodes is extremely relevant for the overall costs of the entire
      network, i.e. access nodes are very cost sensitive.

   This makes it desirable to design the architecture such that the AN
   functionality can be kept as simple as possible.  This should always
   be kept in mind when evaluating different seamless MPLS
   architectures.  The goal is to limit both the number of different
   protocols needed on the AN as well as the scale to which each
   protocol must perform to the absolute minimum.

5.  Deployment Scenarios

   This section describes the deployment scenarios based on the use
   cases and the generic architecture above.

5.1.  Deployment Scenario #1

   This section describes the Seamless MPLS implementation of a large
   European ISP.

5.1.1.  Overview

   This deployment scenario describes one way to implement a seamless
   MPLS architecture.  Specific to this implementation is the choice of
   intra- and inter-domain routing and label distribution protocols, as
   well as the details of the interworking of these protocols to
   achieve the overall scalable hierarchical architecture.

5.1.2.  General Network Topology

   There are multiple aggregation domains (in the order of up to 100)
   connected to the core in a star topology, i.e. aggregation domains
   are never connected among themselves, but only to the core.  The
   core has its own domain.

              +-------+   +-------+   +------+   +------+
              |       |   |       |   |      |   |      |
           +--+ AGN11 +---+ AGN21 +---+ ABR1 +---+ LSR1 +--> to AGN
          /   |       |  /|       |   |      |   |      |
   +----+/    +-------+\/ +-------+   +------+  /+------+
   | AN |              /\                     \/    |
   +----+\    +-------+  \+-------+   +------+/\ +------+
          \   |       |   |       |   |      |  \|      |
           +--+ AGN12 +---+ AGN22 +---+ ABR2 +---+ LSR2 +--> to AGN
              |       |   |       |   |      |   |      |
              +-------+   +-------+   +------+   +------+

   static route    ISIS L1 LDP             ISIS L2 LDP

   <-Access-><--Aggregation Domain--><---------Core--------->

                    Figure 5: Deployment Scenario #1

   As shown in Figure 5, the access nodes (AN) are connected to the
   aggregation network via aggregation nodes called AGN1x, either to a
   single AGN1x or redundantly to two AGN1x.  Each AGN1x has redundant
   uplinks to a pair of second-level aggregation nodes called AGN2x.

   Each aggregation domain is connected to the core via exactly two
   border routers (ABR) on the core side.  There can be multiple AGN2
   pairs per aggregation domain, but only one ABR pair for each
   aggregation domain.  Each of the AGN2 in an AGN2 pair connects to
   one of the ABRs in the ABR pair responsible for that aggregation
   domain.

   The ABRs on the core side have redundant connections to a pair of
   LSR routers.

   The LSR pair is also connected via a direct link.

   The core LSRs are connected to other core LSRs in a partly meshed
   topology so that there are disjoint, redundant paths from each LSR
   to each other LSR.

5.1.3.  Hierarchy based on recursive BGP labeled route lookup

   In line with the explanation in Section 4.5, LSP hierarchy is key to
   a scalable seamless MPLS architecture.

   The LSP hierarchy in this design is achieved by:

   o  Forming separate MPLS domains for aggregation and core areas.

   o  Intra-domain LSP connectivity provided by a combination of IS-IS
      (as the intra-domain link-state routing protocol) and LDP (used
      for MPLS label distribution for intra-domain LSPs).

   o  Inter-domain LSP connectivity provided by labeled BGP [RFC3107]
      (used for MPLS label distribution for inter-domain LSP FECs) and
      relying on IS-IS and LDP for intra-domain LSP connectivity
      between the LSR labeled BGP speakers (AGNs and ABRs).  Note that
      the MPLS core nodes do not carry the labeled BGP routes.

   The aggregation and core MPLS domains are mapped to IS-IS areas as
   follows: Aggregation domains are mapped to IS-IS L1 areas.  The core
   is configured as IS-IS L2.  The border routers connecting
   aggregation and core are IS-IS L1L2 and are referred to as ABRs.
   From a technical and operational point of view these ABRs are part
   of the core, although they also belong to the respective aggregation
   domain purely from a routing protocol point of view.

5.1.4.  Intra-Area Routing

5.1.4.1.  Core

   The core uses ISIS L2 to distribute routing information for the
   loopback addresses of all core nodes.  The border routers (ABR) that
   connect to the aggregation domains are also part of the respective
   aggregation ISIS L1 area and hence ISIS L1L2.

   LDP is used to distribute MPLS label binding information for the
   loopback addresses of all core nodes.

5.1.4.2.  Aggregation

   The aggregation domains use ISIS L1 as intra-domain routing
   protocol.  All AGN loopback addresses are carried in ISIS.

   As in the core, the aggregation also uses LDP to distribute MPLS
   label bindings for the loopback addresses.

5.1.5.  Access

   Access nodes do not have their own domain or IGP area.  Instead,
   they directly connect to the AGN1 nodes in the aggregation domain.
   To keep access devices as simple as possible, ANs do not participate
   in ISIS.

   Instead, each AN has two static default routes pointing to each of
   the AGN1 nodes it is connected to.
   Appropriate techniques SHOULD be deployed to make sure that a given
   default route is invalidated when the link to an AGN1 or that node
   itself fails.  Examples of such techniques include monitoring the
   physical link state for loss of light/loss of frame, or using
   Ethernet link OAM or BFD [RFC5881].

   The AGN1 MUST have a configured static route to the loopback address
   of each of the ANs it is connected to, because it cannot learn the
   AN loopback address in any other way.  These static routes have to
   be monitored and invalidated if necessary using the same techniques
   as described above for the static default routes on the AN.

   The AGN1 redistributes these routes into ISIS for intra-domain
   reachability of all AN loopback addresses.

   LDP is used for MPLS label distribution between AGN1 and AN.  In
   order to keep the AN control plane as lightweight as possible, and
   to avoid the necessity for the AN to store 100,000 MPLS label
   bindings for each upstream AGN1 peer, LDP is deployed in
   downstream-on-demand (DoD) mode, described below.

   To allow the label bindings received via LDP DoD to be installed
   into the LFIB on the AN without having the specific host route to
   the destination loopback address, but only a default route, the LDP
   Extension for Inter-Area Label Switched Paths [RFC5283] is used.

5.1.5.1.  LDP Downstream-on-Demand (DoD)

   LDP downstream-on-demand mode is specified in [RFC5036].  In this
   mode the upstream LSR will explicitly ask the downstream LSR for a
   label binding for a particular FEC when needed.

   The assumption is that a given AN will only have a limited number of
   services configured to an even more limited number of destinations,
   or egress LER.
   Instead of learning and storing all label bindings for all possible
   loopback addresses within the entire Seamless MPLS network, the AN
   will use LDP DoD to only request the label bindings for the FECs
   corresponding to the loopback addresses of those egress nodes to
   which it has services configured.

   A more detailed description of LDP DoD use cases for MPLS access and
   a list of required LDP DoD procedures in the context of the Seamless
   MPLS design are included in [I-D.ietf-mpls-ldp-dod].

5.1.6.  Inter-Area Routing

   The inter-domain MPLS connectivity from the aggregation domains to
   and across the core domain is realized primarily using BGP with MPLS
   labels ("labeled BGP/SAFI4" [RFC3107]).  A very limited amount of
   route leaking from ISIS L2 into L1 is also used.

   All ABR and PE nodes in the core are part of the labeled iBGP mesh,
   which can be either full mesh or based on route reflectors.  These
   nodes advertise their respective loopback addresses (which are also
   carried in ISIS L2) into labeled BGP.

   Each ABR node has labeled iBGP sessions with all AGN1 nodes inside
   the aggregation domain that they connect to the core.  Since there
   are two ABR nodes per aggregation domain, this leads to each AGN1
   node having an iBGP session with each of the two ABRs.  Note that
   the use of iBGP implies that the entire seamless MPLS internetwork
   is just a single AS to which all core and aggregation nodes belong.
   The AGN1 nodes advertise their own loopback addresses into labeled
   BGP, in addition to these loopbacks also being in ISIS L1.

   Additionally the AGN1 nodes also redistribute all the statically
   configured routes to the AN loopback addresses into labeled BGP.
   Note that as stated above, the AGN1 MUST ask the AN for label
   bindings for the AN loopback FECs via LDP DoD in order to have a
   valid labeled route with a non-null label.

   This architecture results in carrying the loopbacks of all nodes
   except pure P nodes (i.e. AN, AGN, ABR and core PE) in labeled BGP,
   i.e. there will be in the order of 100,000 routes in labeled BGP
   when approaching the stated scalability goal.  Note that this only
   affects the BGP RIB size and does not necessarily imply that any
   node needs to actually have active forwarding state (LFIB) in the
   same order of magnitude.  In fact, as will be discussed in the
   scalability analysis, no single node needs to install all labeled
   BGP routes into the LFIB; each node only needs a small percentage of
   the RIB as active forwarding state in the LFIB.  From a RIB point of
   view, BGP is known to scale to hundreds of thousands of routes.

5.1.7.  Labeled iBGP next-hop handling

   The ABR nodes run labeled iBGP both to the core mesh as well as to
   the AGN1 nodes of their respective aggregation domains.  Therefore
   they operate as iBGP route reflectors, reflecting labeled routes
   from the aggregation into the core and vice versa.

   When reflecting routes from the core into the aggregation domain,
   the ABR SHOULD NOT change the BGP NEXT-HOP addresses
   (next-hop-unchanged).  This is the usual behaviour for iBGP route
   reflection.  In order to make these routes resolvable to the AGN1
   nodes inside the aggregation domain, the ABR MUST leak all other ABR
   and core PE loopback addresses from ISIS L2 into ISIS L1 of the
   aggregation domain.  Note that the number of leaked addresses is
   limited so that the overall scalability of the seamless MPLS
   architecture is not impacted.  In the worst case all core loopback
   addresses could be leaked into ISIS L1, but even that would not be a
   scalability problem.

   When reflecting routes from the aggregation into the core, the ABR
   MUST set the BGP NEXT-HOP to its own loopback address
   (next-hop-self).
   This is not the default behaviour for iBGP route reflection, but
   requires special configuration on the ABR.  Note that this also
   implies that the ABR MUST allocate a new local MPLS label for each
   labeled iBGP FEC that it reflects from the aggregation into the
   core.  This special next-hop handling is essential for the
   scalability of the overall seamless MPLS architecture since it
   creates the required hierarchy and enables the hiding of all
   aggregation and access addresses behind the ABRs from an IGP point
   of view.  Leaking of aggregation ISIS L1 loopback addresses into
   ISIS L2 is not necessary and MUST NOT be allowed.

   The resulting hierarchical inter-domain MPLS routing structure is
   similar to the one described in [RFC4364] section 10c, except that
   we use one AS with route reflection instead of using multiple ASes.

5.1.8.  Network Availability

   The Seamless MPLS architecture guarantees a sub-second loss of
   connectivity upon any link or node failure.  Furthermore, in the
   vast majority of cases, the loss of connectivity is limited to
   sub-50msec.

   These network availability properties are provided without any
   degradation on scale and simplicity.  This is a key achievement of
   the design.

   In the remainder of this section, we first introduce the different
   network availability technologies and then review their
   applicability for each possible failure scenario.

5.1.8.1.  IGP Convergence

   IGP convergence can be modelled as a linear process with an initial
   delay and a linear FIB update [ACM01].
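
   As an illustration only, the linear model can be sketched in Python
   with the conservative figures that this section assumes (an initial
   delay made up of 50 msec detection, 50 msec LSP generation throttle,
   150 msec SPF throttle and 10 msec SPF computation, followed by 250
   usec per prefix update).  The constant and function names are
   hypothetical and do not correspond to any router implementation.

```python
# Sketch of the linear IGP convergence model: initial delay + per-prefix
# FIB update time. All names here are illustrative assumptions.

# Components of the initial delay, in milliseconds (figures from this section).
INITIAL_DELAY_MS = {
    "failure_detection_bfd": 50,
    "lsp_generation_throttle": 50,
    "spf_throttle": 150,
    "spf_computation": 10,
}

PER_PREFIX_UPDATE_US = 250  # conservative FIB update time per prefix


def prefixes_updated_within(budget_ms: int) -> int:
    """Number of prefixes that can be updated within budget_ms of the outage."""
    initial_delay_ms = sum(INITIAL_DELAY_MS.values())
    remaining_us = (budget_ms - initial_delay_ms) * 1000
    if remaining_us <= 0:
        return 0
    return remaining_us // PER_PREFIX_UPDATE_US


def convergence_time_us(prefix_count: int) -> int:
    """Time after the outage at which prefix_count prefixes are updated."""
    return sum(INITIAL_DELAY_MS.values()) * 1000 + prefix_count * PER_PREFIX_UPDATE_US


if __name__ == "__main__":
    print(sum(INITIAL_DELAY_MS.values()))   # initial delay in msec
    print(prefixes_updated_within(1000))    # prefixes updated within one second
```

   With these inputs the initial delay is 260 msec and 2960 prefixes
   fit into the first second, which is the budget the text compares the
   number of important IGP prefixes against.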

   The initial delay could conservatively be assumed to be 260msec:
   50msec to detect failures with BFD (most failures would be detected
   faster, for example with loss of light or with faster BFD timers),
   50msec to throttle the LSP generation, 150msec to throttle the SPF
   computation (making sure that all the required LSPs are received
   even in case of SRLG failures) and 10msec for shortest-path-first
   tree computation.

   Assuming 250usec per update (conservative), this allows for
   (1000 - 260) / 0.250 = 2960 prefix updates within a second following
   the outage.  More precisely, this allows for 2960 important IGP
   prefix updates.  Important prefixes are automatically classified by
   the router implementation through a simple heuristic (/32 is more
   important than non-/32).

   The number of important IGP routes (loopbacks) in deployment case
   study 1 is much smaller than 2960, and hence sub-second IGP
   convergence is a conservative assumption.

   IGP convergence is a simple technology for the operator provided
   that the router vendor optimizes the default IGP behavior (no need
   to tune any arcane knob).

5.1.8.2.  Per-Prefix LFA FRR

   A per-prefix LFA for a destination D is a precomputed backup IGP
   nexthop for that destination.  This backup IGP nexthop can be link
   protecting or node protecting [RFC5286].

   The analysis of the applicability of Per-Prefix LFA in deployment
   model 1 of the Seamless MPLS architecture is straightforward thanks
   to [RFC6571].

   In deployment model 1, each aggregation network follows either the
   triangle or the full-mesh topology.  Furthermore, the backbone
   region implements a dual-plane.  As a consequence, the failure of
   any link or node within an aggregation domain is protected by LFA
   FRR (sub-50msec) for all impacted IGP prefixes, whether intra-area
   or inter-area.  No micro-loop (uloop) may form as a result of these
   failures [RFC6571].
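
   The following Python sketch illustrates why a triangle lends itself
   to LFA protection.  It applies the basic loop-free condition of
   [RFC5286], Dist(N, D) < Dist(N, S) + Dist(S, D), to two toy
   topologies.  The node names, the unit link costs and the function
   names are assumptions made for this example; they are not taken
   from the deployment described here.

```python
# Sketch: check the basic loop-free alternate condition of RFC 5286
# on toy topologies. Topologies and names are illustrative assumptions.
import heapq


def shortest_distances(graph, source):
    """Dijkstra: shortest IGP distance from source to every reachable node."""
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, cost in graph[node].items():
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return dist


def is_loop_free_alternate(graph, s, n, d):
    """True if neighbor n of s is loop-free for destination d:
    Dist(n, d) < Dist(n, s) + Dist(s, d)."""
    dist_n = shortest_distances(graph, n)
    dist_s = shortest_distances(graph, s)
    return dist_n[d] < dist_n[s] + dist_s[d]


# Triangle: S, N and D are all directly connected with cost 1.
triangle = {
    "S": {"N": 1, "D": 1},
    "N": {"S": 1, "D": 1},
    "D": {"S": 1, "N": 1},
}

# Square ring: S-D, D-X, X-N, N-S, all with cost 1.
square = {
    "S": {"D": 1, "N": 1},
    "D": {"S": 1, "X": 1},
    "X": {"D": 1, "N": 1},
    "N": {"X": 1, "S": 1},
}

if __name__ == "__main__":
    print(is_loop_free_alternate(triangle, "S", "N", "D"))
    print(is_loop_free_alternate(square, "S", "N", "D"))
```

   In the triangle, N reaches D directly (distance 1 < 2), so S can
   use N as a backup nexthop when the S-D link fails.  In the unit-cost
   square, N is equidistant from D via S and via X (2 is not less than
   2), so N does not qualify under the strict inequality.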

   Per-Prefix LFA FRR is generally assessed as a simple technology for
   the operator [RFC6571].  It certainly is in the context of
   deployment case study 1, as the designer enforced triangle and
   full-mesh topologies in the aggregation network as well as a
   dual-plane core network.

5.1.8.3.  Hierarchical Dataplane and BGP Prefix Independent Convergence

   In a hierarchical dataplane, the FIB used by the packet processing
   engine reflects recursions between the routes.  For example, a BGP
   route B recursing on IGP route I whose best path is via interface O
   is encoded as a hierarchy of FIB entry B pointing to a FIB entry I
   pointing to a FIB entry O.

   BGP Prefix Independent Convergence [BGP-PIC] extends the
   hierarchical dataplane with the concept of a BGP Path-List.  A BGP
   path-list may be abstracted as a set of primary multipath nhops and
   a backup nhop.  When the primary set is empty, packets destined to
   the BGP destinations are rerouted via the backup nhop.

   For a complete description of the BGP-PIC technology and its
   applicability please refer to [BGP-PIC] and [ABR-FRR].

   Hierarchical data plane and BGP-PIC are very simple technologies to
   operate.  Their applicability to any topology, any routing policy
   and any BGP unicast address family allows router vendors to enable
   this behavior by default.

5.1.8.4.  BGP Egress Node FRR

   BGP egress node FRR is a Fast ReRoute solution and hence relies on
   local protection and the precomputation and preinstallation of the
   backup path in the FIB.  BGP egress node FRR relies on a transit LSR
   (Point of Local Repair, PLR) adjacent to the failed protected BGP
   router to detect the failure and re-route the traffic to the backup
   BGP router.
   A number of BGP egress node FRR schemes are being investigated:
   [PE-FRR], [ABR-FRR], [I-D.minto-2547-egress-node-fast-protection],
   [I-D.bashandy-bgp-edge-node-frr],
   [I-D.bashandy-idr-bgp-repair-label], [I-D.bashandy-mpls-ldp-bgp-frr],
   [I-D.bashandy-bgp-frr-mirror-table],
   [I-D.bashandy-bgp-frr-vector-label],
   [I-D.bashandy-isis-bgp-edge-node-frr].

   Differences between these schemes relate to the way backup and
   protected BGP routers get associated, how the protected router's BGP
   state is signalled to the backup BGP router(s) and whether any other
   state is required on protected, backup and PLR routers.  The schemes
   also differ in compatibility with IP-FRR and TE-FRR schemes to
   enable the PLR to switch traffic towards the backup BGP router in
   case of protected BGP router failure.

   In the Seamless MPLS design, BGP egress node FRR schemes can protect
   against the failures of PE, AGN and ABR nodes with no requirements
   on ingress routers.

5.1.8.5.  Assessing loss of connectivity upon any failure

   We select two typical traffic flows and analyze the loss of
   connectivity (LoC) upon each possible failure in the Seamless MPLS
   design in deployment scenario #1.

   o  Flow F1 starts from an AN1 in a left aggregation region and ends
      on an AN2 in a right aggregation region.  Each AN is dual-homed
      to two AGNs.

   o  Flow F2 starts from a CE1 homed on L3VPN PE1 connected to the
      core LSRs and ends at CE2 dual-homed to L3VPN PE2 and PE3, both
      connected to the core LSRs.

   Note that due to the symmetric network topology in case study 1,
   uni-directional flows F1' and F2', associated with F1 and F2 and
   forwarded in the reverse direction (AN2 to AN1 right-to-left and PE2
   to PE1, respectively), take advantage of the same failure
   restoration mechanisms as F1 and F2.

5.1.8.5.1.  AN1-AGN link failure or AGN node failure

   F1 is impacted but LoC <50msec is possible assuming fast BFD
   detection and a fast-switchover implementation on the AN.  F2 is not
   impacted.

5.1.8.5.2.  Link or node failure within the left aggregation region

   F1 is impacted but LoC <50msec thanks to LFA FRR.  No uloop will
   occur during the IGP convergence following the LFA protection.
   Note: if LFA is not available (other topology than case study one)
   or if LFA is not enabled, then the LoC would be < 1 second as the
   number of impacted important IGP routes in a seamless architecture
   is much smaller than 2960.

   F2 is not impacted.

5.1.8.5.3.  ABR node failure between left region and the core

   F1 is impacted but LoC <50msec thanks to LFA FRR.  No uloop will
   occur during the IGP convergence following the LFA protection.

   Note: This case is also called "Local ABR failure" as the ABR which
   fails is the one connected to the aggregation region at the source
   of flow F1.

   Note: remember that the left region receives the routes to all the
   remote ABRs and that the labeled BGP routes are reflected from the
   core to the left region with next-hop unchanged.  This ensures that
   the loss of the (local) ABR between the left region and the core is
   seen as an IGP route impact and hence can be addressed by LFA.

   Note: if LFA is not available (other topology than case study one)
   or if LFA is not enabled, then the LoC would be < 1 second as the
   number of impacted important IGP routes in a seamless architecture
   is much smaller than 2960 routes.

   F2 is not impacted.

5.1.8.5.4.  Link or node failure within the core region

   F1 and F2 are impacted but LoC <50msec thanks to LFA FRR.

   This is specific to the particular core topology used in deployment
   case study 1.  The core topology has been optimized [RFC6571] for
   LFA applicability.

   As explained in [RFC6571], another alternative to provide <50msec in
   this case consists in using an MPLS-TE full-mesh and MPLS-TE FRR.
   This is required when the designer is not able or does not want to
   optimize the topology for LFA applicability and wants to achieve
   <50msec protection.

   Alternatively, simple IGP convergence would ensure a LoC < 1 second
   as the number of impacted important IGP routes in a seamless
   architecture is much smaller than 2960 routes.

5.1.8.5.5.  PE2 failure

   F1 is not impacted.

   F2 is impacted and the LoC is sub-300msec thanks to IGP convergence
   and BGP PIC.

   The detection of the primary nhop failure (PE2 down) is performed by
   a single-area IGP convergence.

   In this specific case, the convergence should be much faster than

   90% of the IGP/BGP3107 footprint at least).

   If the guidelines cannot be met, then the designer will rely on
   either (1) augmenting native LFA coverage with remote LFA
   [I-D.ietf-rtgwg-remote-lfa], or (2) augmenting native LFA coverage
   with RSVP, or (3) a full-mesh TE FRR model, or (4) IGP convergence.
   The first option provides automatic and fairly simple sub-50msec
   protection, like LFA, without introducing any additional protocols.
   The second option provides the same sub-50msec protection as LFA,
   but introduces additional RSVP LSPs.  The third option optimizes for
   sub-50msec protection, but implies a more complex operational model.
   The fourth option optimizes for simple operation but only provides
   <1 sec protection.  It is up to each designer to arbitrate between
   these four options versus the possibility to engineer the topology
   for native LFA protection.

   A similar choice involves protection against ABR node failure and
   L3VPN PE node failure.  The designer can either use BGP PIC or BGP
   egress node FRR.
   It is up to each designer to assess the trade-off between the value
   of sub-50msec instead of sub-1sec restoration and the additional
   operational considerations related to BGP egress node FRR.

5.1.8.7.  Conclusion

   The Seamless MPLS architecture illustrated in deployment case study
   1 guarantees sub-50msec restoration for the majority of link and
   node failures by using LFA FRR, except ABR and L3VPN PE node
   failures, and PE-CE link failure.

   L3VPN PE-CE link failure can be protected with sub-50msec
   restoration, by using hierarchical data plane and local-repair
   fast-reroute to the backup BGP nhop PE.

   ABR and L3VPN PE node failure can be protected with sub-50msec
   restoration, by using BGP egress node FRR.

   Alternatively, ABR and L3VPN PE node failure can be protected with
   sub-1sec restoration using BGP PIC.

5.1.9.  BGP Next-Hop Redundancy

   An aggregation domain is connected to the core network using two
   redundant area border routers, and MPLS hierarchy is applied on
   these ABRs.  MPLS hierarchy helps scale the FIB but introduces
   additional complexity for the rerouting in case of ABR failure.
   Indeed, ABR failure requires a BGP convergence to update the inner
   MPLS hierarchy, in addition to the IGP convergence to update the
   outer MPLS hierarchy.  This is also expected to take more time, as
   BGP convergence is performed after the IGP convergence and because
   the number of prefixes to update in the FIB can be significant.
   This is clearly a drawback, but the architecture allows for two
   "local protection" solutions which restore the traffic before the
   BGP convergence takes place.

   BGP PIC would be required on all edge LSRs involved in the inner
   (BGP) MPLS hierarchy, namely all routers except the ANs, which are
   not involved in the inner MPLS hierarchy.  It involves pre-computing
   and pre-installing in the FIB the BGP backup path.
   Such backup paths are activated when the IGP advertises the failure
   of the primary path.  For specification see [BGP-PIC1, 2##].

   BGP egress node FRR would be required on the egress LSRs involved in
   the inner (BGP) MPLS hierarchy, namely AGN, ABR and L3VPN PEs.  For
   specification see [PE-FRR], [ABR-FRR], [BGP-edge-FRR##].

   Both approaches have their pros and cons, and the choice is left to
   each Service Provider or deployment based on the different
   requirements.  The key point is that the seamless MPLS architecture
   can handle fast restoration time, even for ABR failures.

5.2.  Scalability Analysis

5.2.1.  Control and Data Plane State for Deployment Scenario #1

5.2.1.1.  Introduction

   Let's call:

   o  #AN the number of Access Nodes (AN) in the seamless MPLS domain

   o  #AGN the number of AGgregation Nodes (AGN) in the seamless MPLS
      domain

   o  #Core the number of core nodes (Core) in the core network

   o  #Area the number of aggregation routing domains.

   Let's take the following assumptions:

   o  Aggregation equipment is equally spread across aggregation
      routing domains

   o  the number of IGP links is three times the number of IGP nodes

   o  the number of IGP prefixes is five times the number of IGP nodes
      (links prefixes + 2 loopbacks)

   o  Each Access Node needs to have up to 1,000 (1k) LSPs.  This is
      driven by the expected AN access line capacity, and a sum of LSPs
      required for connectivity to PE routers providing edge services
      as well as to remote ANs.  100 LSPs per AN (10% of total) are
      FECs which are outside of their routing domain.  Those 100 remote
      FECs are the same for all Access Nodes of a given AGN.

   The following sections roughly evaluate the scalability, both in
   absolute numbers and relative to the number of Access Nodes, which
   is the biggest scalability factor.

5.2.1.2.  Core Domain

   The IGP and LDP in the core domain are not affected by the number of
   access nodes:

   IGP:

      nodes : #Core ~ o(1)

      links : 3*#Core ~ o(1)

      IP prefixes : 5*#Core ~ o(1)

   LDP FEC:

      #Core ~ o(1)

   Core TN FIBs grow linearly with the number of nodes in the core
   domain.  In other words, they are not affected by AGN and AN nodes:

   Core TN:

      IP FIB : 5*#Core ~ o(1)

      MPLS ILM (LFIB) : #Core ~ o(1)

   BGP carries all AN routes, which is significant.  However, all AN
   routes are only needed in the control plane, possibly in a dedicated
   BGP Route Reflector (just like for BGP/MPLS VPNs) and not in the
   forwarding plane.  The number of routes (100k) is smaller than the
   number of routes in the Internet (300k and rising) or in major VPN
   SPs (>500k and rising), so the target can be handled with current
   implementations.  In addition, AN routes are internal routes whose
   churn and instability is smaller and more under control than that of
   external routes.

   BGP Route Reflector (RR):

      NLRI : #AN ~ o(n)

      path : 2*#AN ~ o(2n)

   ABRs handle both the core and aggregation routes.  Their FIBs do not
   depend on the total number of AN nodes, but only on the number of
   ANs in their aggregation domain.

   ABR:

      IP FIB : 5*#Core + (5*#AGN + #AN) / #Area ~ o(#AN / #Area)

      MPLS ILM (LFIB) : #Core + (#AGN + #AN) / #Area ~ o(#AN / #Area)

5.2.1.3.  Aggregation Domain

   In the aggregation domain, IGP and LDP are not affected by the
   number of access nodes outside of their domain.  They are not
   affected by the total number of AN nodes:

   IGP:

      nodes : #AGN / #Area ~ o(1)

      links : 3*#AGN / #Area ~ o(1)

      IP prefixes : #Core + #Area + (5*#AGN + #AN) / #Area ~ o(#AN *5
      / #Area)

         1 loopback per core node + one aggregate per area + 5
         prefixes per AGN in the area + 1 prefix per AN in the area.
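
   As a worked example, the aggregation IGP formulas above can be
   evaluated for the minimum scalability figures of Section 3.4 (100
   aggregation domains, 1,000 backbone nodes, 10,000 aggregation nodes
   and 100,000 access nodes).  The Python names below are hypothetical,
   and integer division is used as a simplification since the targets
   divide evenly.

```python
# Sketch: evaluate the aggregation-domain IGP formulas for the Section 3.4
# targets. Function and variable names are illustrative assumptions.

def aggregation_igp_state(core: int, areas: int, agn: int, an: int) -> dict:
    """Per-area IGP state, following the formulas of this section."""
    return {
        # nodes : #AGN / #Area
        "nodes": agn // areas,
        # links : 3*#AGN / #Area
        "links": 3 * agn // areas,
        # IP prefixes : #Core + #Area + (5*#AGN + #AN) / #Area
        "ip_prefixes": core + areas + (5 * agn + an) // areas,
    }


if __name__ == "__main__":
    state = aggregation_igp_state(core=1000, areas=100, agn=10000, an=100000)
    for name, value in state.items():
        print(name, value)
```

   For these inputs the sketch yields 100 IGP nodes, 300 IGP links and
   2,600 IP prefixes per aggregation area.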

   LDP FEC:

      #Core + (#AGN + #AN) / #Area ~ o(#AN / #Area)

         1 loopback per core node + 1 loopback per AGN & AN node in
         the area.

   AGN FIBs grow with the number of nodes in the core area and in their
   aggregation area, plus the number of inter-domain LSPs required by
   the ANs attached to them.  They do not depend on the total number of
   AN nodes.  In the BGP control plane, AGNs also need to handle all
   the AN routes.

   AGN:

      IP FIB : #Core + #Area + (5*#AGN + #AN) / #Area ~ o(#AN *5 /
      #Area)

      MPLS ILM (LFIB) : #Core + (#AGN + #AN) / #Area + 100 ~ o(#AN /
      #Area)

   AN FIBs grow with their connectivity requirements.  They do not
   depend on the number of AN, AGN, SN or any other nodes.

   AN:

      IP RIB : 1 ~ o(1)

      MPLS LIB : 1k ~ o(1)

      IP FIB : 1 ~ o(1)

      MPLS ILM (LFIB) : 1k ~ o(1)

5.2.1.4.  Summary

   AN requirements are kept to a minimum.  BGP is not required on ANs
   and the size of their FIB is driven only by their own connectivity
   requirements.  In the FIB scale analysis described in sections
   5.2.1.x, it was assumed that any single AN will need no more than
   1,000 LSPs.  This assumption is based on the expected AN access line
   capacity and LSPs required for connectivity to PE routers providing
   edge services as well as a sparse mesh of connectivity between ANs.

   In the core area, IGP and LDP are not affected by the nodes in the
   aggregation domains.  In particular they do not grow with the number
   of AGNs or ANs.

   In the aggregation areas, IGP and LDP are affected by the number of
   core nodes and the number of AGNs and ANs in their area.  They are
   not affected by the total number of AGNs or ANs in the seamless MPLS
   domain.

   No FIB of any node is required to handle the total number of AGNs or
   ANs in the Seamless MPLS domain.
   In other words, the number of AGNs and ANs in the Seamless MPLS
   domain is not a limiting factor, and the design can be scaled by
   growing the number of areas.  The main limitation is the MPLS
   connectivity requirement of each AN, i.e. mainly the number of LSPs
   needed per AN.  Another limitation may be the number of different
   LSPs required by the ANs attached to a specific AGN.  However, given
   the foreseen deployments and current AGN capabilities, this is not
   expected to be a limiting factor.

   In the control plane, BGP will typically handle all AN routes.  This
   number is expected to be substantial, but again the target
   deployment scale is well within the capabilities of current
   equipment.  In addition, if required, further techniques could be
   used to improve scalability, based on the experience gained with
   scaling BGP/MPLS VPNs, e.g. route partitioning between RR planes, or
   route filtering (static, or dynamic with ORF or route refresh)
   between ANs and on AGNs to improve AGN scalability.

5.2.1.5.  Numerical application for use case #1

   As a recap, targets for deployment scenario 1 are:

   o  Number of Aggregation Domains       100

   o  Number of Backbone Nodes          1,000

   o  Number of Aggregation Nodes      10,000

   o  Number of Access Nodes          100,000

   This gives the following scaling numbers for each category of nodes:

   o  AN IP FIB                   1

   o  AN MPLS ILM (LFIB)      1,000

   o  AGN IP FIB              2,600

   o  AGN MPLS ILM (LFIB)     2,200

   o  ABR IP FIB              7,600

   o  ABR MPLS ILM (LFIB)     2,100

   o  TN IP FIB               5,000

   o  TN MPLS ILM (LFIB)      1,000

   o  RR BGP NLRI           100,000

   o  RR BGP paths          200,000

5.2.1.6.  Numerical application for use case #2

   As a recap, targets for deployment scenario 2 are:

   o  Number of Aggregation Domains        30

   o  Number of Backbone Nodes            150

   o  Number of Aggregation Nodes       1,500

   o  Number of Access Nodes           40,000

   This gives the following scaling numbers for each category of nodes:

   o  AN IP FIB                   1

   o  AN MPLS ILM (LFIB)      1,000

   o  AGN IP FIB              1,700

   o  AGN MPLS ILM (LFIB)     1,800

   o  ABR IP FIB              3,700

   o  ABR MPLS ILM (LFIB)     1,600

   o  TN IP FIB                 750

   o  TN MPLS ILM (LFIB)        150

   o  RR BGP NLRI            40,000

   o  RR BGP paths           80,000

6.  Acknowledgements

   Many people contributed to this document.  The authors would like to
   thank Wim Henderickx, Robert Raszuk, Thomas Beckhaus, Wilfried Maas,
   Roger Wenner, Kireeti Kompella, Yakov Rekhter, Mark Tinka and Simon
   DeLord for their suggestions and review.

7.  IANA Considerations

   This memo does not include any requests to IANA.

8.  Security Considerations

   The Seamless MPLS Architecture is subject to similar security
   threats as any MPLS LDP deployment.  It is recommended that baseline
   security measures be considered as described in the Security
   Framework for MPLS and GMPLS Networks [RFC5920], in the LDP
   specification [RFC5036] and in [I-D.ietf-karp-routing-tcp-analysis],
   including ensuring authenticity and integrity of LDP messages, as
   well as protection against spoofing and Denial of Service attacks.
   Some deployments may require increased measures of network security
   if a subset of Access Nodes is placed in locations with lower levels
   of physical security, e.g. street cabinets (common practice for VDSL
   access).
   In such cases it is the responsibility of the system
   designer to take into account the physical security measures
   (environmental design, mechanical or electronic access control,
   intrusion detection), as well as monitoring and auditing measures
   (configuration and operating system changes, reloads, route
   advertisements).

   Security aspects specific to the MPLS access network based on LDP
   DoD in the context of the Seamless MPLS design are described in the
   security section of [I-D.ietf-mpls-ldp-dod].

9.  References

9.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

9.2.  Informative References

   [ABR-FRR]  Rekhter, Y., "Local Protection for LSP tail-end node
              failure, MPLS World Congress 2009".

   [ACM01]    "Achieving sub-second IGP convergence in large IP
              networks, ACM SIGCOMM Computer Communication Review,
              v.35 n.3", July 2005.

   [BGP-PIC]  "BGP PIC, Technical Report", November 2007.

   [I-D.bashandy-bgp-edge-node-frr]
              Bashandy, A., Pithawala, B., and K. Patel, "Scalable BGP
              FRR Protection against Edge Node Failure",
              draft-bashandy-bgp-edge-node-frr-03 (work in progress),
              July 2012.

   [I-D.bashandy-bgp-frr-mirror-table]
              Bashandy, A., Konstantynowicz, M., and N. Kumar, "BGP FRR
              Protection against Edge Node Failure Using Table
              Mirroring with Context Labels",
              draft-bashandy-bgp-frr-mirror-table-00 (work in
              progress), October 2012.

   [I-D.bashandy-bgp-frr-vector-label]
              Bashandy, A., Kumar, N., and M. Konstantynowicz, "BGP FRR
              Protection against Edge Node Failure Using Vector
              Labels", draft-bashandy-bgp-frr-vector-label-00 (work in
              progress), July 2012.

   [I-D.bashandy-idr-bgp-repair-label]
              Bashandy, A., Pithawala, B., and J.
              Heitz, "Scalable, Loop-Free BGP FRR using Repair Label",
              draft-bashandy-idr-bgp-repair-label-04 (work in
              progress), May 2012.

   [I-D.bashandy-isis-bgp-edge-node-frr]
              Bashandy, A., "IS-IS Extension for BGP FRR Protection
              against Edge Node Failure",
              draft-bashandy-isis-bgp-edge-node-frr-01 (work in
              progress), September 2012.

   [I-D.bashandy-mpls-ldp-bgp-frr]
              Bashandy, A. and K. Raza, "LDP Extension for FRR Edge
              Node Protection in BGP-Free LDP Core",
              draft-bashandy-mpls-ldp-bgp-frr-00 (work in progress),
              March 2012.

   [I-D.ietf-karp-routing-tcp-analysis]

   [I-D.ietf-mpls-ldp-dod]
              Beckhaus, T., Decraene, B., Tiruveedhula, K.,
              Konstantynowicz, M., and L. Martini, "LDP
              Downstream-on-Demand in Seamless MPLS",
              draft-ietf-mpls-ldp-dod-09 (work in progress), July 2013.

   [I-D.ietf-rtgwg-bgp-pic]

   [I-D.ietf-rtgwg-remote-lfa]

   [I-D.minto-2547-egress-node-fast-protection]
              Jeganathan, J., Gredler, H., and B. Decraene, "2547
              egress PE Fast Failure Protection",
              draft-minto-2547-egress-node-fast-protection-02 (work in
              progress), July 2013.

   [PE-FRR]   Le Roux, J., Decraene, B., and Z. Ahmad, "Fast Reroute in
              MPLS L3VPN Networks - Towards CE-to-CE Protection, MPLS
              2006 Conference".

   [RFC3107]  Rekhter, Y. and E. Rosen, "Carrying Label Information in
              BGP-4", RFC 3107, May 2001.

   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
              Networks (VPNs)", RFC 4364, February 2006.

   [RFC5036]  Andersson, L., Minei, I., and B. Thomas, "LDP
              Specification", RFC 5036, October 2007.

   [RFC5283]  Decraene, B., Le Roux, JL., and I. Minei, "LDP Extension
              for Inter-Area Label Switched Paths (LSPs)", RFC 5283,
              July 2008.

   [RFC5286]  Atlas, A. and A. Zinin, "Basic Specification for IP Fast
              Reroute: Loop-Free Alternates", RFC 5286, September 2008.

   [RFC5881]  Katz, D. and D.
              Ward, "Bidirectional Forwarding Detection (BFD) for IPv4
              and IPv6 (Single Hop)", RFC 5881, June 2010.

   [RFC5920]

   [RFC6571]

   [RFC7032]

Authors' Addresses

   Nicolai Leymann (editor)
   Deutsche Telekom AG
   Winterfeldtstrasse 21
   Berlin  10781
   DE

   Phone: +49 30 8353-92761
   Email: n.leymann@telekom.de


   Bruno Decraene
   Orange
   38-40 rue du General Leclerc
   Issy Moulineaux cedex 9  92794
   FR

   Email: bruno.decraene@orange.com


   Clarence Filsfils
   Cisco Systems
   Brussels
   Belgium

   Email: cfilsfil@cisco.com


   Maciek Konstantynowicz (editor)
   Cisco Systems
   London
   United Kingdom

   Email: maciek@cisco.com


   Dirk Steinberg
   Steinberg Consulting
   Ringstrasse 2
   Buchholz  53567
   DE

   Email: dws@steinbergnet.net