Inter-Domain Routing                                          H. Gredler
Internet-Draft                                    Juniper Networks, Inc.
Intended status: Standards Track                           March 9, 2015
Expires: September 10, 2015

                    Prefix-SID extensions for BGP-LU
                 draft-gredler-idr-bgplu-prefix-sid-00

Abstract

   The MPLS source routing paradigm provides path control for both
   intra- and inter-Autonomous System (AS) traffic. In most MPLS
   deployments the ingress of an MPLS tunnel is an IP router.
   Availability of MPLS forwarding stacks for host operating systems is
   extending the MPLS perimeter to Hypervisors and Servers. Recent Data
   Center designs are using an IGP-less routing paradigm based on
   massive ECMP multipath using external BGP. This document outlines
   how Hypervisors and Servers may interact with the MPLS control and
   data planes using extensions to the BGP labeled unicast protocol
   (BGP-LU).

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 10, 2015.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Motivation, Rationale and Applicability
   3. Deployment Considerations
     3.1. Control plane restart
     3.2. BGP-LU as Server Control Plane
     3.3. Labeled-ARP as Server Control Plane
     3.4. Static Labels and Controller as Server Control Plane
   4. BGP Prefix-SID Attribute
     4.1. Label Index TLV
     4.2. Label Base TLV
     4.3. Label Range TLV
   5. Acknowledgements
   6. IANA Considerations
   7. Security Considerations
   8. References
     8.1. Normative References
     8.2. Informative References
   Author's Address

1. Introduction

   Recent Data Center routing designs are modeled as shown in Figure 1.
   Rather than using an IGP plus internal BGP (iBGP), an IGP-less
   design is favored for disseminating routing information. See
   [I-D.ietf-rtgwg-bgp-routing-large-dc] for the rationale and for
   detailed information on why and how to do so. Today BGP-LU [RFC3107]
   is used both as an intra-AS [I-D.ietf-mpls-seamless-mpls] and as an
   inter-AS routing protocol. Because of the IGP-less routing paradigm,
   topology information gets lost. Of particular interest is the
   ability to direct traffic to a specific node, and hence the ability
   to construct explicit paths, denominated by a set of nodes, for
   traffic engineering.

   BGP-LU today may advertise an MPLS transport path between Autonomous
   Systems. This document describes extensions to the BGP-LU protocol
   such that, in addition to the advertised MPLS label-switched paths
   (LSPs), all potential MPLS label-switched paths of any given node in
   the Data Center are exposed to ingress nodes.

   The protocol extensions in this document are in full compliance with
   the MPLS Architecture documented in [RFC3031].

                +------+  +------+
                |      |  |      |
                |      |--|      |           Tier-1 / AS 651xx
                |      |  |      |
                +------+  +------+
                  |  |      |  |
        +---------+  |      |  +----------+
        | +-------+--+------+--+-------+  |
        | |       |  |      |  |       |  |
      +----+     +----+    +----+     +----+
      |    |     |    |    |    |     |    |
      |    |-----|    |    |    |-----|    | Tier-2 / AS 652xx
      |    |     |    |    |    |     |    |
      +----+     +----+    +----+     +----+
         |         |          |         |
         |         |          |         |
         | +-----+ |          | +-----+ |
         +-|     |-+          +-|     |-+    Tier-3 / AS 653xx
           +-----+              +-----+
            | | |                | | |
        <- Servers ->        <- Servers ->   Servers / AS 65534

               Figure 1: eBGP-centric Data Center routing

2. Motivation, Rationale and Applicability

   The specifications for Segment Routing
   ([I-D.ietf-isis-segment-routing-extensions] and
   [I-D.ietf-ospf-segment-routing-extensions]) provide extensions for
   setting up hop-by-hop shortest path routed MPLS LSPs. The used
The used 134 Protocol semantics are: 136 o Domain-wide Index 138 o Local Label-Base 140 o Local Label Range 141 advertised by any router in an IGP domain. This not only sets up 142 MPLS sink-trees to each egress router in a domain, but also allows to 143 steer traffic using stacks of node labels. The chosen protocol 144 semantics are essentially a compression scheme to advertise all MPLS 145 SPT paths in a domain. 147 The ability to do explicit path routing based on stacked labels, 148 constructed at the Hypervisors/Servers, without running conventional 149 TE-protocols like for example RSVP-TE is a lightweight way to scale 150 the Data Center Fabric. 152 In order to support deployments of Segment Routing across routing 153 protocol boundaries it is required to keep a common set of semantics 154 across all routing protocols. This document specifies BGP-LU 155 extensions to be able to address Node-SIDs across routing-protocol 156 boundaries. 158 3. Deployment Considerations 160 Depending on the Sophistication of the MPLS stack at the Hypervisor / 161 Server there are various levels of considerations for deployment. 163 3.1. Control plane restart 165 In case a restart of the first-hop router needs to be performed there 166 may be some forwarding state churn at the Hypervisor / Server. It 167 would be desirable that upon control-plane restart the Network node 168 uses the same label-allocations than in the previous incarnation. 169 Unfortunately none of the BGP graceful restart extensions allows to 170 re-aquire previous incarnations label-mapping state from the network. 171 Therefore a restarting node will be allocating FECs to labels in 172 temporal incoming order. This degrades to pseudo-random, non- 173 predictable label allocations. It is desirable that a BGP-LU 174 implementation allocates the labels in a deterministic way, such that 175 temporal control-plane loss does not impact forwarding between the 176 Hypervisor / Server and the network. 

   A BGP-LU Prefix SID speaking network node MUST therefore implement
   an MPLS label-allocation strategy which produces a deterministic,
   locally allocated label block for all of its Prefix SIDs.

   For example, an implementation MAY statically allocate a Label Base
   of 800000 and a block size of 16000 labels and delegate that label
   block exclusively to BGP-LU Prefix SID allocations, such that the
   same Label Base is used across control-plane restarts.

3.2. BGP-LU as Server Control Plane

   In this case the Hypervisor / Server has a "client-only" BGP-LU stack
   in order to interface to the network. This is the most distributed
   way of building label-switched paths across the network. As soon as
   there is a reachability change, all of the Hypervisors / Servers get
   notified. There is almost no time lag for updating servers due to
   the inherent PUSH model of the BGP protocol.

   Most of the implementation complexity of a BGP implementation comes
   from the BGP Update generation subsystem. For a client-only BGP
   implementation this is fortunately negligible, as typically only one
   or two (for redundancy reasons) BGP sessions are required. So the
   BGP Update generation complexity stays limited.

3.3. Labeled-ARP as Server Control Plane

   The Labeled ARP protocol [I-D.kompella-mpls-larp] may be used as a
   lightweight alternative to the BGP-LU protocol. Labeled ARP is a
   soft-state protocol, and therefore special consideration needs to be
   given to, e.g., refresh timers, labels in the network, etc. Yet it
   is a distributed variant of LSP state propagation and hence reacts
   immediately to network topology changes and label-to-FEC changes.

3.4. Static Labels and Controller as Server Control Plane

   Static labels do not need control-plane sessions between
   Hypervisors / Servers and the network.
   The assumption is that an external controller transfers the
   routing/label information into the Hypervisor / Server. The main
   disadvantage of that model is that the update process is not
   distributed, and hence a controller needs to have excellent
   horizontal scaling abilities in order to push roughly 100K
   routes/labels to roughly 100K servers.

4. BGP Prefix-SID Attribute

   In order to facilitate dense packing of network nodes and node
   labels into a deterministic label range as described in Section 3.1,
   a new protocol extension called the "BGP Prefix SID attribute" is
   proposed.

   The BGP Prefix SID is a new optional, transitive BGP path attribute.
   The attribute type code for the BGP Prefix SID attribute is to be
   assigned by IANA.

   The value field of the BGP Prefix SID attribute is defined here to be
   a set of elements encoded as "Type/Length/Value" (i.e., a set of
   TLVs). Each such TLV is encoded as shown in Figure 2.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |            Length             |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
   ~                                                               ~
   |                        Value (variable)                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                          Figure 2: TLV format

   o Type: A single octet encoding the TLV Type. Unrecognized Types
     are preserved and propagated. In order to compare NLRIs with
     unknown TLVs, all TLVs MUST be ordered in ascending order by TLV
     Type. If there is more than one TLV of the same type, then those
     TLVs MUST be ordered in ascending order of the TLV value. All
     TLVs that are not specified as mandatory are considered optional.

   o Length: Two octets encoding the length of the value portion in
     octets (thus a TLV with no value portion would have a length of
     zero). The TLV is not padded to four-octet alignment.

   o Value: A field containing zero or more octets.

   The following TLV types are defined in this document:

                         +------+-------------+
                         | Type | Name        |
                         +------+-------------+
                         | 1    | Label Index |
                         | 2    | Label Base  |
                         | 3    | Label Range |
                         +------+-------------+

                        Table 1: Prefix SID TLVs

   Use of other TLV types is outside the scope of this document.

4.1. Label Index TLV

   o Type: 1

   o Length: 4

   o Value: Label Index

   Only one Label Index TLV per Prefix SID attribute is allowed.

4.2. Label Base TLV

   o Type: 2

   o Length: 3

   o Value: Label Base

   One or more occurrences of the Label Base TLV are allowed. A Label
   Base TLV MUST be followed by a Label Range TLV.

4.3. Label Range TLV

   o Type: 3

   o Length: 3

   o Value: Label Range

   One or more occurrences of the Label Range TLV are allowed. A Label
   Range TLV MUST be preceded by a Label Base TLV.

5. Acknowledgements

   Many thanks to TBD for their detailed review and insightful comments.

6. IANA Considerations

   This document requests a code point from the BGP Path Attributes
   registry named 'Prefix SID'.

   This document requests creation of a new registry for BGP Prefix SID
   TLVs. Value 0 is reserved. The maximum value is 255. The registry
   will be initialized as shown in Table 1. Allocations within the
   registry will require documentation of the proposed use of the
   allocated value (Specification Required) and approval by the
   Designated Expert assigned by the IESG (see [RFC5226]).

7. Security Considerations

   This document does not introduce any change in terms of BGP security.

8. References

8.1. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3031]  Rosen, E., Viswanathan, A., and R. Callon, "Multiprotocol
              Label Switching Architecture", RFC 3031, January 2001.

   [RFC3107]  Rekhter, Y. and E.
              Rosen, "Carrying Label Information in BGP-4", RFC 3107,
              May 2001.

   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs", BCP 26, RFC 5226,
              May 2008.

8.2. Informative References

   [I-D.ietf-isis-segment-routing-extensions]
              Previdi, S., Filsfils, C., Bashandy, A., Gredler, H.,
              Litkowski, S., Decraene, B., and J. Tantsura, "IS-IS
              Extensions for Segment Routing",
              draft-ietf-isis-segment-routing-extensions-03 (work in
              progress), October 2014.

   [I-D.ietf-mpls-seamless-mpls]
              Leymann, N., Decraene, B., Filsfils, C., Konstantynowicz,
              M., and D. Steinberg, "Seamless MPLS Architecture",
              draft-ietf-mpls-seamless-mpls-07 (work in progress),
              June 2014.

   [I-D.ietf-ospf-segment-routing-extensions]
              Psenak, P., Previdi, S., Filsfils, C., Gredler, H.,
              Shakir, R., Henderickx, W., and J. Tantsura, "OSPF
              Extensions for Segment Routing",
              draft-ietf-ospf-segment-routing-extensions-04 (work in
              progress), February 2015.

   [I-D.ietf-rtgwg-bgp-routing-large-dc]
              Lapukhov, P., Premji, A., and J. Mitchell, "Use of BGP for
              routing in large-scale data centers",
              draft-ietf-rtgwg-bgp-routing-large-dc-01 (work in
              progress), February 2015.

   [I-D.kompella-mpls-larp]
              Kompella, K., Rajagopalan, B., and G. Swallow, "Label
              Distribution Using ARP", draft-kompella-mpls-larp-02 (work
              in progress), October 2014.

Author's Address

   Hannes Gredler
   Juniper Networks, Inc.
   1194 N. Mathilda Ave.
   Sunnyvale, CA 94089
   US

   Email: hannes@juniper.net