IDR Working Group                                          E. Rosen, Ed.
Internet-Draft                                    Juniper Networks, Inc.
Obsoletes: 5512 (if approved)                                   K. Patel
Intended status: Standards Track                             Arrcus, Inc
Expires: August 26, 2019                                 G. Van de Velde
                                                                   Nokia
                                                       February 22, 2019

                 The BGP Tunnel Encapsulation Attribute
                    draft-ietf-idr-tunnel-encaps-11

Abstract

   RFC 5512 defines a BGP Path Attribute known as the "Tunnel
   Encapsulation Attribute". This attribute allows one to specify a set
   of tunnels. For each such tunnel, the attribute can provide the
   information needed to create the tunnel and the corresponding
   encapsulation header. The attribute can also provide information
   that aids in choosing whether a particular packet is to be sent
   through a particular tunnel. RFC 5512 states that the attribute is
   only carried in BGP UPDATEs that have the "Encapsulation Subsequent
   Address Family (Encapsulation SAFI)". This document deprecates the
   Encapsulation SAFI (which has never been used in production), and
   specifies semantics for the attribute when it is carried in UPDATEs
   of certain other SAFIs. This document adds support for additional
   tunnel types, and allows a remote tunnel endpoint address to be
   specified for each tunnel. This document also provides support for
   specifying fields of any inner or outer encapsulations that may be
   used by a particular tunnel.

   This document obsoletes RFC 5512.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 26, 2019.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Brief Summary of RFC 5512
     1.2.  Deficiencies in RFC 5512
     1.3.  Brief Summary of Changes from RFC 5512
     1.4.  Impact on RFC 5566
   2.  The Tunnel Encapsulation Attribute
   3.  Tunnel Encapsulation Attribute Sub-TLVs
     3.1.  The Remote Endpoint Sub-TLV
     3.2.  Encapsulation Sub-TLVs for Particular Tunnel Types
       3.2.1.  VXLAN
       3.2.2.  VXLAN-GPE
       3.2.3.  NVGRE
       3.2.4.  L2TPv3
       3.2.5.  GRE
       3.2.6.  MPLS-in-GRE
     3.3.  Outer Encapsulation Sub-TLVs
       3.3.1.  IPv4 DS Field
       3.3.2.  UDP Destination Port
     3.4.  Sub-TLVs for Aiding Tunnel Selection
       3.4.1.  Protocol Type Sub-TLV
       3.4.2.  Color Sub-TLV
     3.5.  Embedded Label Handling Sub-TLV
     3.6.  MPLS Label Stack Sub-TLV
     3.7.  Prefix-SID Sub-TLV
   4.  Extended Communities Related to the Tunnel Encapsulation
       Attribute
     4.1.  Encapsulation Extended Community
     4.2.  Router's MAC Extended Community
     4.3.  Color Extended Community
   5.  Semantics and Usage of the Tunnel Encapsulation attribute
   6.  Routing Considerations
     6.1.  No Impact on BGP Decision Process
     6.2.  Looping, Infinite Stacking, Etc.
   7.  Recursive Next Hop Resolution
   8.  Use of Virtual Network Identifiers and Embedded Labels when
       Imposing a Tunnel Encapsulation
     8.1.  Tunnel Types without a Virtual Network Identifier Field
     8.2.  Tunnel Types with a Virtual Network Identifier Field
       8.2.1.  Unlabeled Address Families
       8.2.2.  Labeled Address Families
   9.  Applicability Restrictions
   10. Scoping
   11. Error Handling
   12. IANA Considerations
     12.1.  Subsequent Address Family Identifiers
     12.2.  BGP Path Attributes
     12.3.  Extended Communities
     12.4.  BGP Tunnel Encapsulation Attribute Sub-TLVs
     12.5.  Tunnel Types
   13. Security Considerations
   14. Acknowledgments
   15. Contributor Addresses
   16. References
     16.1.  Normative References
     16.2.  Informative References
   Authors' Addresses

1.  Introduction

   This document obsoletes RFC 5512. The deficiencies of RFC 5512, and
   a summary of the changes made, are discussed in Sections 1.1-1.3.
   The material from RFC 5512 that is retained has been incorporated
   into this document.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

1.1.  Brief Summary of RFC 5512

   [RFC5512] defines a BGP Path Attribute known as the Tunnel
   Encapsulation attribute. This attribute consists of one or more
   TLVs. Each TLV identifies a particular type of tunnel. Each TLV
   also contains one or more sub-TLVs. Some of the sub-TLVs, e.g., the
   "Encapsulation sub-TLV", contain information that may be used to form
   the encapsulation header for the specified tunnel type. Other sub-
   TLVs, e.g., the "color sub-TLV" and the "protocol sub-TLV", contain
   information that aids in determining whether particular packets
   should be sent through the tunnel that the TLV identifies.

   [RFC5512] only allows the Tunnel Encapsulation attribute to be
   attached to BGP UPDATE messages of the Encapsulation Address Family.
   These UPDATE messages have an AFI (Address Family Identifier) of 1 or
   2, and a SAFI of 7. In an UPDATE of the Encapsulation SAFI, the NLRI
   (Network Layer Reachability Information) is an address of the BGP
   speaker originating the UPDATE. Consider the following scenario:

   o  BGP speaker R1 has received and installed UPDATE U;

   o  UPDATE U's SAFI is the Encapsulation SAFI;

   o  UPDATE U has the address R2 as its NLRI;

   o  UPDATE U has a Tunnel Encapsulation attribute;

   o  R1 has a packet, P, to transmit to destination D;

   o  R1's best path to D is a BGP route that has R2 as its next hop.

   In this scenario, when R1 transmits packet P, it should transmit it
   to R2 through one of the tunnels specified in U's Tunnel
   Encapsulation attribute. The IP address of the remote endpoint of
   each such tunnel is R2. Packet P is known as the tunnel's "payload".

1.2.  Deficiencies in RFC 5512

   While the ability to specify tunnel information in a BGP UPDATE is
   useful, the procedures of [RFC5512] have certain limitations:

   o  The requirement to use the "Encapsulation SAFI" presents an
      unfortunate operational cost, as each BGP session that may need to
      carry tunnel encapsulation information needs to be reconfigured to
      support the Encapsulation SAFI. The Encapsulation SAFI has never
      been used, and this requirement has served only to discourage the
      use of the Tunnel Encapsulation attribute.

   o  There is no way to use the Tunnel Encapsulation attribute to
      specify the remote endpoint address of a given tunnel; [RFC5512]
      assumes that the remote endpoint of each tunnel is specified as
      the NLRI of an UPDATE of the Encapsulation SAFI.

   o  If the respective best paths to two different address prefixes
      have the same next hop, [RFC5512] does not provide a
      straightforward method to associate each prefix with a different
      tunnel.

   o  If a particular tunnel type requires an outer IP or UDP
      encapsulation, there is no way to signal the values of any of the
      fields of the outer encapsulation.

   o  In [RFC5512]'s specification of the sub-TLVs, each sub-TLV has a
      one-octet length field. In some cases, a two-octet length field
      may be needed.

1.3.  Brief Summary of Changes from RFC 5512

   In this document we address these deficiencies by:

   o  Deprecating the Encapsulation SAFI.

   o  Defining a new "Remote Endpoint Address sub-TLV" that can be
      included in any of the TLVs contained in the Tunnel Encapsulation
      attribute. This sub-TLV can be used to specify the remote
      endpoint address of a particular tunnel.

   o  Allowing the Tunnel Encapsulation attribute to be carried by BGP
      UPDATEs of additional AFI/SAFIs. Appropriate semantics are
      provided for this way of using the attribute.

   o  Defining a number of new sub-TLVs that provide additional
      information that is useful when forming the encapsulation header
      used to send a packet through a particular tunnel.

   o  Defining the sub-TLV type field so that a sub-TLV whose type is in
      the range from 0 to 127 inclusive has a one-octet length field,
      but a sub-TLV whose type is in the range from 128 to 255 inclusive
      has a two-octet length field.

   One of the sub-TLVs defined in [RFC5512] is the "Encapsulation sub-
   TLV". For a given tunnel, the encapsulation sub-TLV specifies some
   of the information needed to construct the encapsulation header used
   when sending packets through that tunnel.
   This document defines
   encapsulation sub-TLVs for a number of tunnel types not discussed in
   [RFC5512]: VXLAN (Virtual Extensible Local Area Network, [RFC7348]),
   VXLAN-GPE (Generic Protocol Extension for VXLAN,
   [I-D.ietf-nvo3-vxlan-gpe]), NVGRE (Network Virtualization Using
   Generic Routing Encapsulation [RFC7637]), and MPLS-in-GRE (MPLS in
   Generic Routing Encapsulation [RFC2784], [RFC2890], [RFC4023]).
   MPLS-in-UDP [RFC7510] is also supported, but an Encapsulation sub-TLV
   for it is not needed.

   Some of the encapsulations mentioned in the previous paragraph need
   to be further encapsulated inside UDP and/or IP. [RFC5512] provides
   no way to specify that certain information is to appear in these
   outer IP and/or UDP encapsulations. This document provides a
   framework for including such information in the TLVs of the Tunnel
   Encapsulation attribute.

   When the Tunnel Encapsulation attribute is attached to a BGP UPDATE
   whose AFI/SAFI identifies one of the labeled address families, it is
   not always obvious whether the label embedded in the NLRI is to
   appear somewhere in the tunnel encapsulation header (and if so,
   where), or whether it is to appear in the payload, or whether it can
   be omitted altogether. This is especially true if the tunnel
   encapsulation header itself contains a "virtual network identifier".
   This document provides a mechanism that allows one to signal (by
   using sub-TLVs of the Tunnel Encapsulation attribute) how one wants
   to use the embedded label when the tunnel encapsulation has its own
   virtual network identifier field.

   [RFC5512] defines a Tunnel Encapsulation Extended Community that can
   be used instead of the Tunnel Encapsulation attribute under certain
   circumstances. This document addresses the issue of how to handle a
   BGP UPDATE that carries both a Tunnel Encapsulation attribute and one
   or more Tunnel Encapsulation Extended Communities.
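
   The length-field rule listed among the changes in this section
   (sub-TLV types 0-127 use a one-octet length, types 128-255 use a
   two-octet length) can be illustrated with a short sketch. This is
   not part of the specification; the function name iter_sub_tlvs and
   the choice to raise ValueError on truncated input are assumptions
   made only for illustration.

```python
def iter_sub_tlvs(tlv_value: bytes):
    """Yield (sub_tlv_type, sub_tlv_value) pairs from one TLV's value field.

    Hypothetical helper: the name and error handling are illustrative only.
    """
    offset = 0
    while offset < len(tlv_value):
        sub_type = tlv_value[offset]
        offset += 1
        # Types 0-127 carry a one-octet length; types 128-255 carry two octets.
        length_size = 1 if sub_type < 128 else 2
        if offset + length_size > len(tlv_value):
            raise ValueError("truncated sub-TLV length field")
        length = int.from_bytes(tlv_value[offset:offset + length_size], "big")
        offset += length_size
        if offset + length > len(tlv_value):
            raise ValueError("sub-TLV value runs past the end of the TLV")
        yield sub_type, tlv_value[offset:offset + length]
        offset += length
```

   A caller would typically iterate over the result and dispatch on
   the sub-TLV type, skipping types it does not recognize.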

1.4.  Impact on RFC 5566

   [RFC5566] uses the mechanisms defined in [RFC5512]. While this
   document obsoletes [RFC5512], it does not address the issue of how to
   use the mechanisms of [RFC5566] without also using the Encapsulation
   SAFI. That issue is considered to be outside the scope of this
   document.

2.  The Tunnel Encapsulation Attribute

   The Tunnel Encapsulation attribute is an optional transitive BGP Path
   attribute. IANA has assigned the value 23 as the type code of the
   attribute. The attribute is composed of a set of Type-Length-Value
   (TLV) encodings. Each TLV contains information corresponding to a
   particular tunnel type. A TLV is structured as shown in Figure 1:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Tunnel Type (2 Octets)     |       Length (2 Octets)       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                             Value                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             Figure 1: Tunnel Encapsulation TLV Value Field

   o  Tunnel Type (2 octets): identifies a type of tunnel. The field
      contains values from the IANA Registry "BGP Tunnel Encapsulation
      Attribute Tunnel Types".

      Note that for tunnel types whose names are of the form "X-in-Y",
      e.g., "MPLS-in-GRE", only packets of the specified payload type
      "X" are to be carried through the tunnel of type "Y". This is the
      equivalent of specifying a tunnel type "Y" and including in its
      TLV a Protocol Type sub-TLV (see Section 3.4.1) specifying
      protocol "X". If the tunnel type is "X-in-Y", it is unnecessary,
      though harmless, to include a Protocol Type sub-TLV specifying
      "X".

   o  Length (2 octets): the total number of octets of the value field.

   o  Value (variable): composed of multiple sub-TLVs.
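
   The TLV framing above (2-octet Tunnel Type, 2-octet Length, then
   Value) can be sketched as follows. This is a minimal illustration,
   not a normative parser; the function name iter_tunnel_tlvs is
   hypothetical, and raising ValueError on truncated input is an
   assumption (Section 11 governs actual error handling). The tunnel
   type numbers in the usage note are arbitrary placeholders, not
   registry values.

```python
def iter_tunnel_tlvs(attr_value: bytes):
    """Yield (tunnel_type, tlv_value) pairs from the attribute's value.

    Hypothetical helper: the name and error handling are illustrative only.
    """
    offset = 0
    while offset < len(attr_value):
        if offset + 4 > len(attr_value):
            raise ValueError("truncated TLV header")
        # Both header fields are 2 octets, in network byte order.
        tunnel_type = int.from_bytes(attr_value[offset:offset + 2], "big")
        length = int.from_bytes(attr_value[offset + 2:offset + 4], "big")
        offset += 4
        if offset + length > len(attr_value):
            raise ValueError("TLV value runs past the end of the attribute")
        yield tunnel_type, attr_value[offset:offset + length]
        offset += length
```

   For example, an attribute value holding a TLV of placeholder type 1
   with a 3-octet value, followed by a TLV of placeholder type 2 with
   an empty value, yields two pairs in that order.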

   Each sub-TLV consists of three fields: a 1-octet type, a 1-octet or
   2-octet length field (depending on the type), and zero or more octets
   of value. A sub-TLV is structured as shown in Table 1:

                  +--------------------------------+
                  | Sub-TLV Type (1 Octet)         |
                  +--------------------------------+
                  | Sub-TLV Length (1 or 2 Octets) |
                  +--------------------------------+
                  | Sub-TLV Value (Variable)       |
                  +--------------------------------+

              Table 1: Tunnel Encapsulation Sub-TLV Format

   o  Sub-TLV Type (1 octet): each sub-TLV type defines a certain
      property about the tunnel TLV that contains this sub-TLV.

   o  Sub-TLV Length (1 or 2 octets): the total number of octets of the
      sub-TLV value field. The Sub-TLV Length field is 1 octet long if
      the Sub-TLV Type field contains a value in the range 0-127. The
      Sub-TLV Length field is 2 octets long if the Sub-TLV Type field
      contains a value in the range 128-255.

   o  Sub-TLV Value (variable): encodings of the value field depend on
      the sub-TLV type. The following sub-sections define the encoding
      in detail.

3.  Tunnel Encapsulation Attribute Sub-TLVs

   In this section, we specify a number of sub-TLVs. These sub-TLVs can
   be included in a TLV of the Tunnel Encapsulation attribute.

3.1.  The Remote Endpoint Sub-TLV

   The Remote Endpoint sub-TLV is a sub-TLV whose value field contains
   three sub-fields:

   1.  a four-octet Autonomous System (AS) number sub-field

   2.  a two-octet Address Family sub-field

   3.  an address sub-field, whose length depends upon the Address
       Family.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Autonomous System Number                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Address Family          |           Address             ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
   ~                                                               ~
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             Figure 2: Remote Endpoint Sub-TLV Value Field

   The Address Family subfield contains a value from IANA's "Address
   Family Numbers" registry. In this document, we assume that the
   Address Family is either IPv4 or IPv6; use of other address families
   is outside the scope of this document.

   If the Address Family subfield contains the value for IPv4, the
   address subfield must contain an IPv4 address (a /32 IPv4 prefix).
   In this case, the length field of Remote Endpoint sub-TLV must
   contain the value 10 (0xa).

   If the Address Family subfield contains the value for IPv6, the
   address sub-field must contain an IPv6 address (a /128 IPv6 prefix).
   In this case, the length field of Remote Endpoint sub-TLV must
   contain the value 22 (0x16). IPv6 link local addresses are not valid
   values of the IP address field.

   In a given BGP UPDATE, the address family (IPv4 or IPv6) of a Remote
   Endpoint sub-TLV is independent of the address family of the UPDATE
   itself. For example, an UPDATE whose NLRI is an IPv4 address may
   have a Tunnel Encapsulation attribute containing Remote Endpoint sub-
   TLVs that contain IPv6 addresses. Also, different tunnels
   represented in the Tunnel Encapsulation attribute may have Remote
   Endpoints of different address families.

   A two-octet AS number can be carried in the AS number field by
   setting the two high order octets to zero, and carrying the number in
   the two low order octets of the field.
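
   As an illustration of the value-field layout and lengths described
   above, the following sketch builds the value field for an IPv4 or
   IPv6 remote endpoint. It is not normative. The function name
   encode_remote_endpoint is hypothetical; the constants 1 and 2 are
   the IANA "Address Family Numbers" values for IPv4 and IPv6; and
   rejecting link-local IPv6 addresses with ValueError is one possible
   reading of the restriction stated above.

```python
import ipaddress
import struct

AFI_IPV4 = 1  # IANA "Address Family Numbers" value for IPv4
AFI_IPV6 = 2  # IANA "Address Family Numbers" value for IPv6


def encode_remote_endpoint(as_number: int, address: str) -> bytes:
    """Build the value field of a Remote Endpoint sub-TLV (hypothetical helper)."""
    ip = ipaddress.ip_address(address)
    if ip.version == 6 and ip.is_link_local:
        raise ValueError("IPv6 link-local addresses are not valid here")
    afi = AFI_IPV4 if ip.version == 4 else AFI_IPV6
    # 4-octet AS number, 2-octet Address Family, then the packed address.
    # A two-octet AS number simply occupies the two low-order octets.
    return struct.pack("!IH", as_number, afi) + ip.packed
```

   With this layout an IPv4 endpoint yields 4 + 2 + 4 = 10 octets and
   an IPv6 endpoint yields 4 + 2 + 16 = 22 octets, matching the length
   values given in the text.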

   The AS number in the sub-TLV MUST be the number of the AS to which
   the IP address in the sub-TLV belongs.

   There is one special case: the Remote Endpoint sub-TLV MAY have a
   value field whose Address Family subfield contains 0. This means
   that the tunnel's remote endpoint is the UPDATE's BGP next hop. If
   the Address Family subfield contains 0, the Address subfield is
   omitted, and the Autonomous System number field is set to 0.

   If any of the following conditions hold, the Remote Endpoint sub-TLV
   is considered to be "malformed":

   o  The sub-TLV contains the value for IPv4 in its Address Family
      subfield, but the length of the sub-TLV's value field is other
      than 10 (0xa).

   o  The sub-TLV contains the value for IPv6 in its Address Family
      subfield, but the length of the sub-TLV's value field is other
      than 22 (0x16).

   o  The sub-TLV contains the value zero in its Address Family field,
      but the length of the sub-TLV's value field is other than 6, or
      the Autonomous System subfield is not set to zero.

   o  The IP address in the sub-TLV's address subfield is not a valid IP
      address (e.g., it's an IPv4 broadcast address).

   o  It can be determined that the IP address in the sub-TLV's address
      subfield does not belong to the non-zero AS whose number is in
      its Autonomous System subfield. (See Section 13 for discussion of
      one way to determine this.)

   If the Remote Endpoint sub-TLV is malformed, the TLV containing it is
   also considered to be malformed, and the entire TLV MUST be ignored.
   However, the Tunnel Encapsulation attribute SHOULD NOT be considered
   to be malformed in this case; other TLVs in the attribute SHOULD be
   processed (if they can be parsed correctly).
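
   The structural checks in the list above can be sketched as a single
   predicate. This is illustrative only. The function name is
   hypothetical. The AS-ownership check is deliberately left out
   because it needs external data. Treating the IPv4 limited broadcast
   address and IPv6 link-local addresses as "not valid", and treating
   address families other than IPv4 and IPv6 as unusable, are
   assumptions of this sketch rather than statements of the text.

```python
import ipaddress
import struct


def remote_endpoint_is_malformed(value: bytes) -> bool:
    """Return True if a Remote Endpoint value field fails the structural checks."""
    if len(value) < 6:
        return True  # too short to hold the AS number and Address Family
    as_number, afi = struct.unpack("!IH", value[:6])
    if afi == 0:
        # Special case: endpoint is the BGP next hop; no address, AS must be 0.
        return len(value) != 6 or as_number != 0
    if afi == 1:  # IPv4
        if len(value) != 10:
            return True
        # Assumption: only the limited broadcast address is rejected here.
        return ipaddress.IPv4Address(value[6:]) == ipaddress.IPv4Address(
            "255.255.255.255"
        )
    if afi == 2:  # IPv6
        if len(value) != 22:
            return True
        # Assumption: a link-local address counts as "not a valid IP address".
        return ipaddress.IPv6Address(value[6:]).is_link_local
    # Assumption: other address families are out of scope, so reject them.
    return True
```

   A receiver that finds the sub-TLV malformed would then ignore the
   containing TLV while continuing to process the other TLVs in the
   attribute, as described above.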

   When redistributing a route that is carrying a Tunnel Encapsulation
   attribute containing a TLV that itself contains a malformed Remote
   Endpoint sub-TLV, the TLV SHOULD be removed from the attribute before
   redistribution.

   See Section 11 for further discussion of how to handle errors that
   are encountered when parsing the Tunnel Encapsulation attribute.

   If the Remote Endpoint sub-TLV contains an IPv4 or IPv6 address that
   is valid but not reachable, the sub-TLV is NOT considered to be
   malformed, and the containing TLV SHOULD NOT be removed from the
   attribute before redistribution. However, the tunnel identified by
   the TLV containing that sub-TLV cannot be used until such time as the
   address becomes reachable. See Section 5.

3.2.  Encapsulation Sub-TLVs for Particular Tunnel Types

   This section defines Tunnel Encapsulation sub-TLVs for the following
   tunnel types: VXLAN ([RFC7348]), VXLAN-GPE
   ([I-D.ietf-nvo3-vxlan-gpe]), NVGRE ([RFC7637]), MPLS-in-GRE
   ([RFC2784], [RFC2890], [RFC4023]), L2TPv3 ([RFC3931]), and GRE
   ([RFC2784], [RFC2890], [RFC4023]).

   Rules for forming the encapsulation based on the information in a
   given TLV are given in Sections 5 and 8.

   For some tunnel types, the rules are obvious and not mentioned in
   this document.

   There are also tunnel types for which it is not necessary to define
   an Encapsulation sub-TLV, because there are no fields in the
   encapsulation header whose values need to be signaled from the remote
   endpoint.

3.2.1.  VXLAN

   This document defines an encapsulation sub-TLV for VXLAN tunnels.
   When the tunnel type is VXLAN, the following is the structure of the
   value field in the encapsulation sub-TLV:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V|M|R|R|R|R|R|R|          VN-ID (3 Octets)                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     MAC Address (4 Octets)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   MAC Address (2 Octets)      |          Reserved             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 3: VXLAN Encapsulation Sub-TLV

   V: This bit is set to 1 to indicate that a "valid" VN-ID (Virtual
      Network Identifier) is present in the encapsulation sub-TLV.
      Please see Section 8.

   M: This bit is set to 1 to indicate that a valid MAC Address is
      present in the encapsulation sub-TLV.

   R: The remaining bits in the 8-bit flags field are reserved for
      future use. They SHOULD always be set to 0.

   VN-ID: If the V bit is set, the VN-ID field contains a 3-octet VN-ID
      value. If the V bit is not set, the VN-ID field SHOULD be set to
      zero.

   MAC Address: If the M bit is set, this field contains a 6-octet
      Ethernet MAC address. If the M bit is not set, this field SHOULD
      be set to all zeroes.

   When forming the VXLAN encapsulation header:

   o  The values of the V, M, and R bits are NOT copied into the flags
      field of the VXLAN header. The flags field of the VXLAN header is
      set as per [RFC7348].

   o  If the M bit is set, the MAC Address is copied into the Inner
      Destination MAC Address field of the Inner Ethernet Header (see
      Section 5 of [RFC7348]).

      If the M bit is not set, and the payload being sent through the
      VXLAN tunnel is an Ethernet frame, the Destination MAC Address
      field of the Inner Ethernet Header is just the Destination MAC
      Address field of the payload's Ethernet header.

      If the M bit is not set, and the payload being sent through the
      VXLAN tunnel is an IP or MPLS packet, the Inner Destination MAC
      address field is set to a configured value; if there is no
      configured value, the VXLAN tunnel cannot be used.

   o  See Section 8 to see how the VNI field of the VXLAN encapsulation
      header is set.

   Note that in order to send an IP packet or an MPLS packet through a
   VXLAN tunnel, the packet must first be encapsulated in an Ethernet
   header, which becomes the "inner ethernet header" described in
   [RFC7348]. The VXLAN Encapsulation sub-TLV may contain information
   (e.g., the MAC address) that is used to form this Ethernet header.

3.2.2.  VXLAN-GPE

   This document defines an encapsulation sub-TLV for VXLAN-GPE tunnels.
   When the tunnel type is VXLAN-GPE, the following is the structure of
   the value field in the encapsulation sub-TLV:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Ver|V|R|R|R|R|R|                 Reserved                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       VN-ID                   |   Reserved    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure 4: VXLAN GPE Encapsulation Sub-TLV

   V: This bit is set to 1 to indicate that a "valid" VN-ID is
      present in the encapsulation sub-TLV. Please see Section 8.

   R: The bits designated "R" above are reserved for future use.
      They SHOULD always be set to zero.

   Version (Ver): Indicates VXLAN GPE protocol version. (See the
      "Version Bits" section of [I-D.ietf-nvo3-vxlan-gpe].) If the
      indicated version is not supported, the TLV that contains this
      Encapsulation sub-TLV MUST be treated as specifying an unsupported
      tunnel type. The value of this field will be copied into the
      corresponding field of the VXLAN-GPE encapsulation header.

   VN-ID: If the V bit is set, this field contains a 3-octet VN-ID
      value. If the V bit is not set, this field SHOULD be set to zero.

   When forming the VXLAN-GPE encapsulation header:

   o  The values of the V and R bits are NOT copied into the flags field
      of the VXLAN-GPE header. However, the values of the Ver bits are
      copied into the VXLAN-GPE header. Other bits in the flags field
      of the VXLAN-GPE header are set as per [I-D.ietf-nvo3-vxlan-gpe].

   o  See Section 8 to see how the VNI field of the VXLAN-GPE
      encapsulation header is set.

3.2.3.  NVGRE

   This document defines an encapsulation sub-TLV for NVGRE tunnels.
   When the tunnel type is NVGRE, the following is the structure of the
   value field in the encapsulation sub-TLV:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V|M|R|R|R|R|R|R|          VN-ID (3 Octets)                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     MAC Address (4 Octets)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   MAC Address (2 Octets)      |          Reserved             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 5: NVGRE Encapsulation Sub-TLV

   V: This bit is set to 1 to indicate that a "valid" VN-ID is
      present in the encapsulation sub-TLV. Please see Section 8.

   M: This bit is set to 1 to indicate that a valid MAC Address is
      present in the encapsulation sub-TLV.

   R: The remaining bits in the 8-bit flags field are reserved for
      future use. They SHOULD always be set to 0.

   VN-ID: If the V bit is set, the VN-ID field contains a 3-octet VN-ID
      value. If the V bit is not set, the VN-ID field SHOULD be set to
      zero.

   MAC Address: If the M bit is set, this field contains a 6-octet
      Ethernet MAC address. If the M bit is not set, this field SHOULD
      be set to all zeroes.
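
   The value-field layout just described (an 8-bit flags field, a
   3-octet VN-ID, a 6-octet MAC address, and 2 reserved octets, which
   is also the layout of the VXLAN sub-TLV in Section 3.2.1) can be
   decoded as sketched below. This is illustrative only. The names
   VnEncap and decode_vn_encap are hypothetical, and requiring the
   value field to be exactly 12 octets is an assumption drawn from the
   figure rather than a rule stated in the text.

```python
from typing import NamedTuple, Optional


class VnEncap(NamedTuple):
    """Decoded fields; None means the corresponding flag bit was not set."""
    vn_id: Optional[int]
    mac: Optional[str]


def decode_vn_encap(value: bytes) -> VnEncap:
    """Decode a VXLAN- or NVGRE-style encapsulation sub-TLV value field."""
    if len(value) != 12:  # assumption: 1 + 3 + 6 + 2 octets, per the figure
        raise ValueError("expected a 12-octet value field")
    flags = value[0]
    v_bit = bool(flags & 0x80)  # V is bit 0, the most significant bit
    m_bit = bool(flags & 0x40)  # M is bit 1
    vn_id = int.from_bytes(value[1:4], "big") if v_bit else None
    mac = ":".join(f"{octet:02x}" for octet in value[4:10]) if m_bit else None
    return VnEncap(vn_id, mac)
```

   The V and M bits only say which fields carry valid data; as the
   surrounding text notes, they are not copied into the flags field of
   the encapsulation header itself.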

   When forming the NVGRE encapsulation header:

   o  The values of the V, M, and R bits are NOT copied into the flags
      field of the NVGRE header. The flags field of the NVGRE header is
      set as per [RFC7637].

   o  If the M bit is set, the MAC Address is copied into the Inner
      Destination MAC Address field of the Inner Ethernet Header (see
      Section 3.2 of [RFC7637]).

      If the M bit is not set, and the payload being sent through the
      NVGRE tunnel is an Ethernet frame, the Destination MAC Address
      field of the Inner Ethernet Header is just the Destination MAC
      Address field of the payload's Ethernet header.

      If the M bit is not set, and the payload being sent through the
      NVGRE tunnel is an IP or MPLS packet, the Inner Destination MAC
      address field is set to a configured value; if there is no
      configured value, the NVGRE tunnel cannot be used.

   o  See Section 8 to see how the VSID (Virtual Subnet Identifier)
      field of the NVGRE encapsulation header is set.

3.2.4.  L2TPv3

   When the tunnel type of the TLV is L2TPv3 over IP, the following is
   the structure of the value field of the encapsulation sub-TLV:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Session ID (4 octets)                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                       Cookie (Variable)                       |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 6: L2TPv3 Encapsulation Sub-TLV

   Session ID: a non-zero 4-octet value locally assigned by the
      advertising router that serves as a lookup key in the incoming
      packet's context.

   Cookie: an optional, variable-length value (0 to 8 octets) used by
      L2TPv3 to check the association of a received data message with
      the session identified by the Session ID.
Generation and usage of the cookie value is as specified in 659 [RFC3931]. 661 The length of the cookie is not encoded explicitly, but can be 662 calculated as (sub-TLV length - 4). 664 3.2.5. GRE 666 When the tunnel type of the TLV is GRE, the following is the 667 structure of the value field of the encapsulation sub-TLV: 669 0 1 2 3 670 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 671 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 672 | GRE Key (4 octets) | 673 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 675 Figure 7: GRE Encapsulation Sub-TLV 677 GRE Key: 4-octet field [RFC2890] that is generated by the 678 advertising router. The actual method by which the key is 679 obtained is beyond the scope of this document. The key is 680 inserted into the GRE encapsulation header of the payload packets 681 sent by ingress routers to the advertising router. It is intended 682 to be used for identifying extra context information about the 683 received payload. 685 Note that the key is optional. Unless a key value is being 686 advertised, the GRE encapsulation sub-TLV MUST NOT be present. 688 3.2.6. MPLS-in-GRE 690 When the tunnel type is MPLS-in-GRE, the following is the structure 691 of the value field in an optional encapsulation sub-TLV: 693 0 1 2 3 694 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 695 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 696 | GRE-Key (4 Octets) | 697 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 699 Figure 8: MPLS-in-GRE Encapsulation Sub-TLV 701 GRE-Key: 4-octet field [RFC2890] that is generated by the 702 advertising router. The actual method by which the key is 703 obtained is beyond the scope of this document. The key is 704 inserted into the GRE encapsulation header of the payload packets 705 sent by ingress routers to the advertising router. 
It is intended 706 to be used for identifying extra context information about the 707 received payload. Note that the key is optional. Unless a key 708 value is being advertised, the MPLS-in-GRE encapsulation sub-TLV 709 MUST NOT be present. 711 Note that the GRE tunnel type defined in Section 3.2.5 can be used 712 instead of the MPLS-in-GRE tunnel type when it is necessary to 713 encapsulate MPLS in GRE. Including a TLV of the MPLS-in-GRE tunnel 714 type is equivalent to including a TLV of the GRE tunnel type that 715 also includes a Protocol Type sub-TLV (Section 3.4.1) specifying MPLS 716 as the protocol to be encapsulated. That is, if a TLV specifies 717 MPLS-in-GRE or if it includes a Protocol Type sub-TLV specifying 718 MPLS, the GRE tunnel advertised in that TLV MUST NOT be used for 719 carrying IP packets. 721 While it is not really necessary to have both the GRE and MPLS-in-GRE 722 tunnel types, both are included for reasons of backwards 723 compatibility. 725 3.3. Outer Encapsulation Sub-TLVs 727 The Encapsulation sub-TLV for a particular tunnel type allows one to 728 specify the values that are to be placed in certain fields of the 729 encapsulation header for that tunnel type. However, some tunnel 730 types require an outer IP encapsulation, and some also require an 731 outer UDP encapsulation. The Encapsulation sub-TLV for a given 732 tunnel type does not usually provide a way to specify values for 733 fields of the outer IP and/or UDP encapsulations. If it is necessary 734 to specify values for fields of the outer encapsulation, additional 735 sub-TLVs must be used. This document defines two such sub-TLVs. 737 If an outer encapsulation sub-TLV occurs in a TLV for a tunnel type 738 that does not use the corresponding outer encapsulation, the sub-TLV 739 is treated as if it were an unknown type of sub-TLV. 741 3.3.1. 
IPv4 DS Field 743 Most of the tunnel types that can be specified in the Tunnel 744 Encapsulation attribute require an outer IP encapsulation. The IPv4 745 Differentiated Services (DS) Field sub-TLV can be carried in the TLV 746 of any such tunnel type. It specifies the setting of the one-octet 747 Differentiated Services field in the outer IP encapsulation (see 748 [RFC2474]). The value field is always a single octet. 750 3.3.2. UDP Destination Port 752 Some of the tunnel types that can be specified in the Tunnel 753 Encapsulation attribute require an outer UDP encapsulation. 754 Generally there is a standard UDP Destination Port value for a 755 particular tunnel type. However, sometimes it is useful to be able 756 to use a non-standard UDP destination port. If a particular tunnel 757 type requires an outer UDP encapsulation, and it is desired to use a 758 UDP destination port other than the standard one, the port to be used 759 can be specified by including a UDP Destination Port sub-TLV. The 760 value field of this sub-TLV is always a two-octet field, containing 761 the port value. 763 3.4. Sub-TLVs for Aiding Tunnel Selection 765 3.4.1. Protocol Type Sub-TLV 767 The protocol type sub-TLV MAY be included in a given TLV to indicate 768 the type of the payload packets that may be encapsulated with the 769 tunnel parameters that are being signaled in the TLV. The value 770 field of the sub-TLV contains a 2-octet value from IANA's ethertype 771 registry [Ethertypes]. 773 For example, if we want to use three L2TPv3 sessions, one carrying 774 IPv4 packets, one carrying IPv6 packets, and one carrying MPLS 775 packets, the egress router will include three TLVs of L2TPv3 776 encapsulation type, each specifying a different Session ID and a 777 different payload type. The protocol type sub-TLV for these will be 778 IPv4 (protocol type = 0x0800), IPv6 (protocol type = 0x86dd), and 779 MPLS (protocol type = 0x8847), respectively. 
This informs the 780 ingress routers of the appropriate encapsulation information to use 781 with each of the given protocol types. Insertion of the specified 782 Session ID at the ingress routers allows the egress to process the 783 incoming packets correctly, according to their protocol type. 785 3.4.2. Color Sub-TLV 787 The color sub-TLV MAY be encoded as a way to "color" the 788 corresponding tunnel TLV. The value field of the sub-TLV is eight 789 octets long, and consists of a Color Extended Community, as defined 790 in Section 4.3. For the use of this sub-TLV and Extended Community, 791 please see Section 7. 793 Note that the high-order octet of this sub-TLV's value field MUST be 794 set to 3, and the next octet MUST be set to 0x0b. (Otherwise the 795 value field is not identical to a Color Extended Community.) 797 If a Color sub-TLV is not of the proper length, or the first two 798 octets of its value field are not 0x030b, the sub-TLV should be 799 treated as if it were an unrecognized sub-TLV (see Section 11). 801 3.5. Embedded Label Handling Sub-TLV 803 Certain BGP address families (corresponding to particular AFI/SAFI 804 pairs, e.g., 1/4, 2/4, 1/128, 2/128) have MPLS labels embedded in 805 their NLRIs. We will use the term "embedded label" to refer to the 806 MPLS label that is embedded in an NLRI, and the term "labeled address 807 family" to refer to any AFI/SAFI that has embedded labels. 809 Some of the tunnel types (e.g., VXLAN, VXLAN-GPE, and NVGRE) that can 810 be specified in the Tunnel Encapsulation attribute have an 811 encapsulation header containing a "Virtual Network" identifier of some 812 sort. The Encapsulation sub-TLVs for these tunnel types may 813 optionally specify a value for the virtual network identifier.
815 Suppose a Tunnel Encapsulation attribute is attached to an UPDATE of 816 a labeled address family, and it is decided to use a particular 817 tunnel (specified in one of the attribute's TLVs) for transmitting a 818 packet that is being forwarded according to that UPDATE. When 819 forming the encapsulation header for that packet, different 820 deployment scenarios require different handling of the embedded label 821 and/or the virtual network identifier. The Embedded Label Handling 822 sub-TLV can be used to control the placement of the embedded label 823 and/or the virtual network identifier in the encapsulation. 825 The Embedded Label Handling sub-TLV may be included in any TLV of the 826 Tunnel Encapsulation attribute. If the Tunnel Encapsulation 827 attribute is attached to an UPDATE of a non-labeled address family, 828 the sub-TLV is treated as a no-op. If the sub-TLV is contained in a 829 TLV whose tunnel type does not have a virtual network identifier in 830 its encapsulation header, the sub-TLV is treated as a no-op. In 831 those cases where the sub-TLV is treated as a no-op, it SHOULD NOT be 832 stripped from the TLV before the UPDATE is forwarded. 834 The sub-TLV's Length field always contains the value 1, and its value 835 field consists of a single octet. The following values are defined: 837 1: The payload will be an MPLS packet with the embedded label at the 839 top of its label stack. 841 2: The embedded label is not carried in the payload, but is carried 842 either in the virtual network identifier field of the 843 encapsulation header, or else is ignored entirely. 845 Please see Section 8 for the details of how this sub-TLV is used when 846 it is carried by an UPDATE of a labeled address family. 848 3.6. MPLS Label Stack Sub-TLV 850 This sub-TLV allows an MPLS label stack ([RFC3032]) to be associated 851 with a particular tunnel. 853 The value field of this sub-TLV is a sequence of MPLS label stack 854 entries.
The first entry in the sequence is the "topmost" label, the 855 final entry in the sequence is the "bottommost" label. When this 856 label stack is pushed onto a packet, this ordering MUST be preserved. 858 Each label stack entry has the following format: 860 0 1 2 3 861 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 862 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 863 | Label | TC |S| TTL | 864 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 866 Figure 9: MPLS Label Stack Sub-TLV 868 If a packet is to be sent through the tunnel identified in a 869 particular TLV, and if that TLV contains an MPLS Label Stack sub-TLV, 870 then the label stack appearing in the sub-TLV MUST be pushed onto the 871 packet. This label stack MUST be pushed onto the packet before any 872 other labels are pushed onto the packet. 874 In particular, if the Tunnel Encapsulation attribute is attached to a 875 BGP UPDATE of a labeled address family, the contents of the MPLS 876 Label Stack sub-TLV MUST be pushed onto the packet before the label 877 embedded in the NLRI is pushed onto the packet. 879 If the MPLS label stack sub-TLV is included in a TLV identifying a 880 tunnel type that uses virtual network identifiers (see Section 8), 881 the contents of the MPLS label stack sub-TLV MUST be pushed onto the 882 packet before the procedures of Section 8 are applied. 884 The number of label stack entries in the sub-TLV MUST be determined 885 from the sub-TLV length field. Thus it is not necessary to set the S 886 bit in any of the label stack entries of the sub-TLV, and the setting 887 of the S bit is ignored when parsing the sub-TLV. When the label 888 stack entries are pushed onto a packet that already has a label 889 stack, the S bits of all the entries MUST be cleared.
When the label 890 stack entries are pushed onto a packet that does not already have a 891 label stack, the S bit of the bottommost label stack entry MUST be 892 set, and the S bit of all the other label stack entries MUST be 893 cleared. 895 By default, the TC (Traffic Class) field ([RFC3032], [RFC5462]) of 896 each label stack entry is set to 0. This may of course be changed by 897 policy at the originator of the sub-TLV. When pushing the label 898 stack onto a packet, the TC of the label stack entries is preserved 899 by default. However, local policy at the router that is pushing on 900 the stack MAY cause modification of the TC values. 902 By default, the TTL (Time to Live) field of each label stack entry is 903 set to 255. This may be changed by policy at the originator of the 904 sub-TLV. When pushing the label stack onto a packet, the TTL of the 905 label stack entries is preserved by default. However, local policy 906 at the router that is pushing on the stack MAY cause modification of 907 the TTL values. If any label stack entry in the sub-TLV has a TTL 908 value of zero, the router that is pushing the stack on a packet MUST 909 change the value to a non-zero value. 911 Note that this sub-TLV can appear within a TLV identifying any 912 type of tunnel, not just within a TLV identifying an MPLS tunnel. 913 However, if this sub-TLV appears within a TLV identifying an MPLS 914 tunnel (or an MPLS-in-X tunnel), this sub-TLV plays the same role 915 that would be played by an MPLS Encapsulation sub-TLV. Therefore, an 916 MPLS Encapsulation sub-TLV is not defined. 918 3.7. Prefix-SID Sub-TLV 920 [I-D.ietf-idr-bgp-prefix-sid] defines a BGP Path attribute known as 921 the "Prefix-SID Attribute". This attribute is defined to contain a 922 sequence of one or more TLVs, where each TLV is either a "Label- 923 Index" TLV, an "IPv6 SID (Segment Identifier)" TLV, or an "Originator 924 SRGB (Source Routing Global Block)" TLV.
926 In this document, we define a Prefix-SID sub-TLV. The value field of 927 the Prefix-SID sub-TLV can be set to any valid value of the value 928 field of a BGP Prefix-SID attribute, as defined in 929 [I-D.ietf-idr-bgp-prefix-sid]. 931 The Prefix-SID sub-TLV can occur in a TLV identifying any type of 932 tunnel. If an Originator SRGB is specified in the sub-TLV, that SRGB 933 MUST be interpreted to be the SRGB used by the tunnel's Remote 934 Endpoint. The Label-Index, if present, is the Segment Routing SID 935 that the tunnel's Remote Endpoint uses to represent the prefix 936 appearing in the NLRI field of the BGP UPDATE to which the Tunnel 937 Encapsulation attribute is attached. 939 If a Label-Index is present in the Prefix-SID sub-TLV, then when a 940 packet is sent through the tunnel identified by the TLV, the 941 corresponding MPLS label MUST be pushed on the packet's label stack. 942 The corresponding MPLS label is computed from the Label-Index value 943 and the SRGB of the route's originator. 945 If the Originator SRGB is not present, it is assumed that the 946 originator's SRGB is known by other means. Such "other means" are 947 outside the scope of this document. 949 The corresponding MPLS label is pushed on after the processing of the 950 MPLS Label Stack sub-TLV, if present, as specified in Section 3.6. 951 It is pushed on before any other labels (e.g., a label embedded in 952 the UPDATE's NLRI, or a label determined by the procedures of Section 8) 953 are pushed on the stack. 955 The Prefix-SID sub-TLV has slightly different semantics than the 956 Prefix-SID attribute. When the Prefix-SID attribute is attached to a 957 given route, the BGP speaker that originally attached the attribute 958 is expected to be in the same Segment Routing domain as the BGP 959 speakers who receive the route with the attached attribute. The 960 Label-Index tells the receiving BGP speakers that the prefix-SID is 961 for the advertised prefix in that Segment Routing domain.
When the 962 Prefix-SID sub-TLV is used, the BGP speaker at the head end of the 963 tunnel need not even be in the same Segment Routing domain as the 964 tunnel's Remote Endpoint, and there is no implication that the 965 prefix-SID for the advertised prefix is the same in the Segment 966 Routing domains of the BGP speaker that originated the sub-TLV and 967 the BGP speaker that received it. 969 4. Extended Communities Related to the Tunnel Encapsulation Attribute 971 4.1. Encapsulation Extended Community 973 The Encapsulation Extended Community is a Transitive Opaque Extended 974 Community. This Extended Community may be attached to a route of any 975 AFI/SAFI to which the Tunnel Encapsulation attribute may be attached. 976 Each such Extended Community identifies a particular tunnel type. If 977 the Encapsulation Extended Community identifies a particular tunnel 978 type, its semantics are exactly equivalent to the semantics of a 979 Tunnel Encapsulation attribute Tunnel TLV for which the following 980 three conditions all hold: 982 1. it identifies the same tunnel type, 984 2. it has a Remote Endpoint sub-TLV for which one of the following 985 two conditions holds: 987 A. its "Address Family" subfield contains zero, or 989 B. its "Address" subfield contains the same IP address that 990 appears in the next hop field of the route to which the 991 Tunnel Encapsulation attribute is attached 993 3. it has no other sub-TLVs. 995 We will refer to such a Tunnel TLV as a "barebones" Tunnel TLV. 997 The Encapsulation Extended Community was first defined in [RFC5512]. 998 While it provides only a small subset of the functionality of the 999 Tunnel Encapsulation attribute, it is used in a number of deployed 1000 applications, and is still needed for backwards compatibility. To 1001 ensure backwards compatibility, this specification establishes the 1002 following rules: 1004 1.
If the Tunnel Encapsulation attribute of a given route contains a 1005 barebones Tunnel TLV identifying a particular tunnel type, an 1006 Encapsulation Extended Community identifying the same tunnel type 1007 SHOULD be attached to the route. 1009 2. If the Encapsulation Extended Community identifying a particular 1010 tunnel type is attached to a given route, the corresponding 1011 barebones Tunnel TLV MAY be omitted from the Tunnel Encapsulation 1012 attribute. 1014 3. Suppose a particular route has both (a) an Encapsulation Extended 1015 Community specifying a particular tunnel type, and (b) a Tunnel 1016 Encapsulation attribute with a barebones Tunnel TLV specifying 1017 that same tunnel type. Both (a) and (b) MUST be interpreted as 1018 denoting the same tunnel. 1020 In short, in situations where one could use either the Encapsulation 1021 Extended Community or a barebones Tunnel TLV, one may use either or 1022 both. However, to ensure backwards compatibility with applications 1023 that do not support the Tunnel Encapsulation attribute, it is 1024 preferable to use the Encapsulation Extended Community. If the 1025 Extended Community (identifying a particular tunnel type) is present, 1026 the corresponding Tunnel TLV is optional. 1028 Note that for tunnel types of the form "X-in-Y", e.g., MPLS-in-GRE, 1029 the Encapsulation Extended Community implies that only packets of the 1030 specified payload type "X" are to be carried through the tunnel of 1031 type "Y". 1033 In the remainder of this specification, when we speak of a route as 1034 containing a Tunnel Encapsulation attribute with a TLV identifying a 1035 particular tunnel type, we are implicitly including the case where 1036 the route contains an Encapsulation Extended Community 1037 identifying that tunnel type. 1039 4.2. Router's MAC Extended Community 1041 [I-D.ietf-bess-evpn-inter-subnet-forwarding] defines a Router's MAC 1042 Extended Community.
This Extended Community provides information 1043 that may conflict with information in one or more of the 1044 Encapsulation Sub-TLVs of a Tunnel Encapsulation attribute. In case 1045 of such a conflict, the information in the Encapsulation Sub-TLV 1046 takes precedence. 1048 4.3. Color Extended Community 1050 The Color Extended Community is a Transitive Opaque Extended 1051 Community with the following encoding: 1053 0 1 2 3 1054 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1055 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1056 | 0x03 | 0x0b | Reserved | 1057 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1058 | Color Value | 1059 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1061 Figure 10: Color Extended Community 1063 For the use of this Extended Community please see Section 7. 1065 5. Semantics and Usage of the Tunnel Encapsulation attribute 1067 [RFC5512] specifies the use of the Tunnel Encapsulation attribute in 1068 BGP UPDATE messages of AFI/SAFI 1/7 and 2/7. That document restricts 1069 the use of this attribute to UPDATE messages of those SAFIs. This 1070 document removes that restriction. 1072 The BGP Tunnel Encapsulation attribute MAY be carried in any BGP 1073 UPDATE message whose AFI/SAFI is 1/1 (IPv4 Unicast), 2/1 (IPv6 1074 Unicast), 1/4 (IPv4 Labeled Unicast), 2/4 (IPv6 Labeled Unicast), 1075 1/128 (VPN-IPv4 Labeled Unicast), 2/128 (VPN-IPv6 Labeled Unicast), 1076 or 25/70 (Ethernet VPN, usually known as EVPN). Use of the Tunnel 1077 Encapsulation attribute in BGP UPDATE messages of other AFI/SAFIs is 1078 outside the scope of this document. 1080 It has been suggested that it may sometimes be useful to attach a 1081 Tunnel Encapsulation attribute to a BGP UPDATE message that is also 1082 carrying a PMSI (Provider Multicast Service Interface) Tunnel 1083 attribute [RFC6514].
If the PMSI Tunnel attribute specifies an IP 1084 tunnel, the Tunnel Encapsulation attribute could be used to provide 1085 additional information about the IP tunnel. The usage of the Tunnel 1086 Encapsulation attribute in combination with the PMSI Tunnel attribute 1087 is outside the scope of this document. 1089 The decision to attach a Tunnel Encapsulation attribute to a given 1090 BGP UPDATE is determined by policy. The set of TLVs and sub-TLVs 1091 contained in the attribute is also determined by policy. 1093 When the Tunnel Encapsulation attribute is carried in an UPDATE of 1094 one of the AFI/SAFIs specified above, each TLV 1095 MUST have a Remote Endpoint sub-TLV. If a TLV does not have a 1096 Remote Endpoint sub-TLV, that TLV should be treated as if it had a 1097 malformed Remote Endpoint sub-TLV (see Section 3.1). 1099 Suppose that: 1101 o a given packet P must be forwarded by router R; 1103 o the path along which P is to be forwarded is determined by BGP 1104 UPDATE U; 1106 o UPDATE U has a Tunnel Encapsulation attribute, containing at least 1107 one TLV that identifies a "feasible tunnel" for packet P. A 1108 tunnel is considered feasible if it has the following three 1109 properties: 1111 * The tunnel type is supported (i.e., router R knows how to set 1112 up tunnels of that type, how to create the encapsulation header 1113 for tunnels of that type, etc.) 1115 * The tunnel is of a type that can be used to carry packet P 1116 (e.g., an MPLS-in-UDP tunnel would not be a feasible tunnel for 1117 carrying an IP packet, UNLESS the IP packet can first be 1118 converted to an MPLS packet). 1120 * The tunnel is specified in a TLV whose Remote Endpoint sub-TLV 1121 identifies an IP address that is reachable. 1123 Then router R SHOULD send packet P through one of the feasible 1124 tunnels identified in the Tunnel Encapsulation attribute of UPDATE U.
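The three feasibility properties listed above can be expressed as a simple filter. The following is a non-normative sketch. The TunnelTLV record, its payload_types field, and the is_reachable callback are hypothetical modelling choices made for illustration, not structures defined by this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TunnelTLV:
    """Hypothetical, simplified view of one TLV in the attribute."""
    tunnel_type: str          # e.g. "VXLAN", "GRE", "MPLS-in-UDP"
    remote_endpoint: str      # address from the Remote Endpoint sub-TLV
    payload_types: frozenset  # payload kinds this tunnel type can carry


def feasible_tunnels(tlvs, payload_type, supported_types, is_reachable):
    """Return the TLVs that identify a feasible tunnel for one packet."""
    return [
        tlv for tlv in tlvs
        if tlv.tunnel_type in supported_types      # router supports the type
        and payload_type in tlv.payload_types      # type can carry packet P
        and is_reachable(tlv.remote_endpoint)      # endpoint is reachable
    ]
```

Choosing among the TLVs that pass this filter remains a matter of local policy, so the sketch deliberately returns all of them rather than picking one.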
1126 If the Tunnel Encapsulation attribute contains several TLVs (i.e., if 1127 it specifies several tunnels), router R may choose any one of those 1128 tunnels, based upon local policy. If any of the tunnels' TLVs contain 1129 the Color sub-TLV (Section 3.4.2) and/or the Protocol Type sub-TLV 1130 (Section 3.4.1), the choice of tunnel may be influenced by these sub- 1131 TLVs. 1133 Note that if none of the TLVs specifies the MPLS tunnel type, a Label 1134 Switched Path SHOULD NOT be used unless none of the TLVs specifies a 1135 feasible tunnel. 1137 If a particular tunnel is not feasible at some moment because its 1138 Remote Endpoint cannot be reached at that moment, the tunnel may 1139 become feasible at a later time (when its endpoint becomes 1140 reachable). Router R SHOULD take note of this. If router R is 1141 already using a different tunnel, it MAY switch to the tunnel that 1142 just became feasible, or it MAY decide to continue using the tunnel 1143 that it is already using. How this decision is made is outside the 1144 scope of this document. 1146 A TLV specifying a non-feasible tunnel is not considered to be 1147 malformed or erroneous in any way, and the TLV SHOULD NOT be stripped 1148 from the Tunnel Encapsulation attribute before redistribution. 1150 In addition to the sub-TLVs already defined, additional sub-TLVs may 1151 be defined that affect the choice of tunnel to be used, or that 1152 affect the contents of the tunnel encapsulation header. The 1153 documents that define any such additional sub-TLVs must specify the 1154 effect that including the sub-TLV is to have. 1156 Once it is determined to send a packet through the tunnel specified 1157 in a particular TLV of a particular Tunnel Encapsulation attribute, 1158 then the tunnel's remote endpoint address is the IP address contained 1159 in the sub-TLV.
If the TLV contains a Remote Endpoint sub-TLV whose 1160 value field is all zeroes, then the tunnel's remote endpoint is the 1161 IP address specified as the Next Hop of the BGP Update containing the 1162 Tunnel Encapsulation attribute. The address of the remote endpoint 1163 generally appears in a "destination address" field of the 1164 encapsulation. 1166 The full set of procedures for sending a packet through a particular 1167 tunnel type to a particular remote endpoint depends upon the tunnel 1168 type, and is outside the scope of this document. Note that some 1169 tunnel types may require the execution of an explicit tunnel setup 1170 protocol before they can be used for carrying data. Other tunnel 1171 types may not require any tunnel setup protocol. 1173 Sending a packet through a tunnel always requires that the packet be 1174 encapsulated, with an encapsulation header that is appropriate for 1175 the tunnel type. The contents of the tunnel encapsulation header MAY 1176 be influenced by the Encapsulation sub-TLV. If there is no 1177 Encapsulation sub-TLV present, the router transmitting the packet 1178 through the tunnel must have a priori knowledge (e.g., by 1179 provisioning) of how to fill in the various fields in the 1180 encapsulation header. 1182 Whenever a new Tunnel Type TLV is defined, the specification of that 1183 TLV should describe (or reference) the procedures for creating the 1184 encapsulation header used to forward packets through that tunnel 1185 type. If a tunnel type codepoint is assigned in the IANA "BGP Tunnel 1186 Encapsulation Tunnel Types" registry, but there is no corresponding 1187 specification that defines an Encapsulation sub-TLV for that tunnel 1188 type, the transmitting endpoint of such a tunnel is presumed to know 1189 a priori how to form the encapsulation header for that tunnel type. 
1191 If a Tunnel Encapsulation attribute specifies several tunnels, the 1192 way in which a router chooses which one to use is a matter of policy, 1193 subject to the following constraint: if a router can determine that a 1194 given tunnel is not functional, it MUST NOT use that tunnel. In 1195 particular, if the tunnel is identified in a TLV that has a Remote 1196 Endpoint sub-TLV, and if the IP address specified in the sub-TLV is 1197 not reachable from router R, then the tunnel SHOULD be considered 1198 non-functional. Other means of determining whether a given tunnel is 1199 functional MAY be used; specification of such means is outside the 1200 scope of this specification. Of course, if a non-functional tunnel 1201 later becomes functional, router R SHOULD reevaluate its choice of 1202 tunnels. 1204 If router R determines that it cannot use any of the tunnels 1205 specified in the Tunnel Encapsulation attribute, it MAY either drop 1206 packet P, or it MAY transmit packet P as it would had the Tunnel 1207 Encapsulation attribute not been present. This is a matter of local 1208 policy. By default, the packet SHOULD be transmitted as if the 1209 Tunnel Encapsulation attribute had not been present. 1211 A Tunnel Encapsulation attribute may contain several TLVs that all 1212 specify the same tunnel type. Each TLV should be considered as 1213 specifying a different tunnel. Two tunnels of the same type may have 1214 different Remote Endpoint sub-TLVs, different Encapsulation sub-TLVs, 1215 etc. Choosing between two such tunnels is a matter of local policy. 1217 Once router R has decided to send packet P through a particular 1218 tunnel, it encapsulates packet P appropriately and then forwards it 1219 according to the route that leads to the tunnel's remote endpoint. 1220 This route may itself be a BGP route with a Tunnel Encapsulation 1221 attribute. 
If so, the encapsulated packet is treated as the payload 1222 and is encapsulated according to the Tunnel Encapsulation attribute 1223 of that route. That is, tunnels may be "stacked". 1225 Notwithstanding anything said in this document, a BGP speaker MAY 1226 have local policy that influences the choice of tunnel, and the way 1227 the encapsulation is formed. A BGP speaker MAY also have a local 1228 policy that tells it to ignore the Tunnel Encapsulation attribute 1229 entirely or in part. Of course, interoperability issues must be 1230 considered when such policies are put into place. 1232 6. Routing Considerations 1234 6.1. No Impact on BGP Decision Process 1236 The presence of the Tunnel Encapsulation attribute does not affect 1237 the BGP bestpath selection algorithm. 1239 Under certain circumstances, this may lead to counter-intuitive 1240 consequences. For example, suppose: 1242 o router R1 receives a BGP UPDATE message from router R2, such that 1244 * the NLRI of that UPDATE is prefix X, 1246 * the UPDATE contains a Tunnel Encapsulation attribute specifying 1247 two tunnels, T1 and T2, 1249 * R1 cannot use tunnel T1 or tunnel T2, either because the tunnel 1250 remote endpoint is not reachable or because R1 does not support 1251 that kind of tunnel 1253 o router R1 receives a BGP UPDATE message from router R3, such that 1255 * the NLRI of that UPDATE is prefix X, 1257 * the UPDATE contains a Tunnel Encapsulation attribute specifying 1258 two tunnels, T3 and T4, 1260 * R1 can use at least one of the two tunnels 1262 Since the Tunnel Encapsulation attribute does not affect bestpath 1263 selection, R1 may well install the route from R2 rather than the 1264 route from R3, even though R2's route contains no usable tunnels. 1266 This possibility must be kept in mind whenever a Remote Endpoint sub- 1267 TLV carried by a given UPDATE specifies an IP address that is 1268 different than the next hop of that UPDATE. 1270 6.2. Looping, Infinite Stacking, Etc. 

   Consider a packet destined for address X.  Suppose a BGP UPDATE for
   address prefix X carries a Tunnel Encapsulation attribute that
   specifies a remote tunnel endpoint of Y.  And suppose that a BGP
   UPDATE for address prefix Y carries a Tunnel Encapsulation attribute
   that specifies a remote tunnel endpoint of X.  It is easy to see that
   this will cause an infinite number of encapsulation headers to be put
   on the given packet.

   This could happen as a result of misconfiguration, either accidental
   or intentional.  It could also happen if the Tunnel Encapsulation
   attribute were altered by a malicious agent.  Implementations should
   be aware of this.  This document does not specify a maximum number of
   recursions; that is an implementation-specific matter.

   Improper setting (or malicious altering) of the Tunnel Encapsulation
   attribute could also cause data packets to loop.  Suppose a BGP
   UPDATE for address prefix X carries a Tunnel Encapsulation attribute
   that specifies a remote tunnel endpoint of Y.  Suppose router R
   receives and processes the update.  When router R receives a packet
   destined for X, it will apply the encapsulation and send the
   encapsulated packet to Y.  Y will decapsulate the packet and forward
   it further.  If Y is further away from X than is router R, it is
   possible that the path from Y to X will traverse R.  This would cause
   a long-lasting routing loop.  The control plane itself cannot detect
   this situation, though a TTL field in the payload packets would
   presumably prevent any given packet from looping infinitely.

   These possibilities must also be kept in mind whenever the Remote
   Endpoint for a given prefix differs from the BGP next hop for that
   prefix.

7.  Recursive Next Hop Resolution

   Suppose that:

   o  a given packet P must be forwarded by router R1;

   o  the path along which P is to be forwarded is determined by BGP
      UPDATE U1;

   o  UPDATE U1 does not have a Tunnel Encapsulation attribute;

   o  the next hop of UPDATE U1 is router R2;

   o  the best path to router R2 is a BGP route that was advertised in
      UPDATE U2;

   o  UPDATE U2 has a Tunnel Encapsulation attribute.

   Then packet P SHOULD be sent through one of the tunnels identified in
   the Tunnel Encapsulation attribute of UPDATE U2.  See Section 5 for
   further details.

   However, suppose that one of the TLVs in U2's Tunnel Encapsulation
   attribute contains the Color Sub-TLV.  In that case, packet P SHOULD
   NOT be sent through the tunnel identified in that TLV, unless U1 is
   carrying the Color Extended Community that is identified in U2's
   Color Sub-TLV.

   Note that if UPDATE U1 and UPDATE U2 both have Tunnel Encapsulation
   attributes, packet P will be carried through a pair of nested
   tunnels.  P will first be encapsulated based on the Tunnel
   Encapsulation attribute of U1.  This encapsulated packet then becomes
   the payload, and is encapsulated based on the Tunnel Encapsulation
   attribute of U2.  This is another way of "stacking" tunnels (see also
   Section 5).

   The procedures in this section presuppose that U1's next hop resolves
   to a BGP route, and that U2's next hop resolves (perhaps after
   further recursion) to a non-BGP route.

8.  Use of Virtual Network Identifiers and Embedded Labels when Imposing
    a Tunnel Encapsulation

   If the TLV specifying a tunnel contains an MPLS Label Stack sub-TLV,
   then when sending a packet through that tunnel, the procedures of
   Section 3.6 are applied before the procedures of this section.

   If the TLV specifying a tunnel contains a Prefix-SID sub-TLV, the
   procedures of Section 3.7 are applied before the procedures of this
   section.  If the TLV also contains an MPLS Label Stack sub-TLV, the
   procedures of Section 3.6 are applied before the procedures of
   Section 3.7.

8.1.  Tunnel Types without a Virtual Network Identifier Field

   If a Tunnel Encapsulation attribute is attached to an UPDATE of a
   labeled address family, there will be one or more labels specified in
   the UPDATE's NLRI.  When a packet is sent through a tunnel specified
   in one of the attribute's TLVs, and that tunnel type does not contain
   a virtual network identifier field, the label or labels from the NLRI
   are pushed on the packet's label stack.  The resulting MPLS packet is
   then further encapsulated, as specified by the TLV.

8.2.  Tunnel Types with a Virtual Network Identifier Field

   Three of the tunnel types that can be specified in a Tunnel
   Encapsulation TLV have virtual network identifier fields in their
   encapsulation headers.  In the VXLAN and VXLAN-GPE encapsulations,
   this field is called the VNI (Virtual Network Identifier) field; in
   the NVGRE encapsulation, this field is called the VSID (Virtual
   Subnet Identifier) field.

   When one of these tunnel encapsulations is imposed on a packet, the
   setting of the virtual network identifier field in the encapsulation
   header depends upon the contents of the Encapsulation sub-TLV (if one
   is present).  When the Tunnel Encapsulation attribute is being
   carried on a BGP UPDATE of a labeled address family, the setting of
   the virtual network identifier field also depends upon the contents
   of the Embedded Label Handling sub-TLV (if present).

   This section specifies the procedures for choosing the value to set
   in the virtual network identifier field of the encapsulation header.
   These procedures apply only when the tunnel type is VXLAN, VXLAN-GPE,
   or NVGRE.

8.2.1.  Unlabeled Address Families

   This sub-section applies when:

   o  the Tunnel Encapsulation attribute is carried on a BGP UPDATE of
      an unlabeled address family, and

   o  at least one of the attribute's TLVs identifies a tunnel type that
      uses a virtual network identifier, and

   o  it has been determined to send a packet through one of those
      tunnels.

   If the TLV identifying the tunnel contains an Encapsulation sub-TLV
   whose V bit is set, the virtual network identifier field of the
   encapsulation header is set to the value of the virtual network
   identifier field of the Encapsulation sub-TLV.

   Otherwise, the virtual network identifier field of the encapsulation
   header is set to a configured value; if there is no configured value,
   the tunnel cannot be used.

8.2.2.  Labeled Address Families

   This sub-section applies when:

   o  the Tunnel Encapsulation attribute is carried on a BGP UPDATE of a
      labeled address family, and

   o  at least one of the attribute's TLVs identifies a tunnel type that
      uses a virtual network identifier, and

   o  it has been determined to send a packet through one of those
      tunnels.

8.2.2.1.  When a Valid VNI has been Signaled

   If the TLV identifying the tunnel contains an Encapsulation sub-TLV
   whose V bit is set, the virtual network identifier field of the
   encapsulation header is set as follows:

   o  If the TLV contains an Embedded Label Handling sub-TLV whose value
      is 1, then the virtual network identifier field of the
      encapsulation header is set to the value of the virtual network
      identifier field of the Encapsulation sub-TLV.

      The embedded label (from the NLRI of the route that is carrying
      the Tunnel Encapsulation attribute) appears at the top of the MPLS
      label stack in the encapsulation payload.

   o  If the TLV does not contain an Embedded Label Handling sub-TLV, or
      if it contains an Embedded Label Handling sub-TLV whose value is
      2, the embedded label is ignored entirely, and the virtual network
      identifier field of the encapsulation header is set to the value
      of the virtual network identifier field of the Encapsulation
      sub-TLV.

8.2.2.2.  When a Valid VNI has not been Signaled

   If the TLV identifying the tunnel does not contain an Encapsulation
   sub-TLV whose V bit is set, the virtual network identifier field of
   the encapsulation header is set as follows:

   o  If the TLV contains an Embedded Label Handling sub-TLV whose value
      is 1, then the virtual network identifier field of the
      encapsulation header is set to a configured value.

      If there is no configured value, the tunnel cannot be used.

      The embedded label (from the NLRI of the route that is carrying
      the Tunnel Encapsulation attribute) appears at the top of the MPLS
      label stack in the encapsulation payload.

   o  If the TLV does not contain an Embedded Label Handling sub-TLV, or
      if it contains an Embedded Label Handling sub-TLV whose value is
      2, the embedded label is copied into the virtual network
      identifier field of the encapsulation header.

      In this case, the payload may or may not contain an MPLS label
      stack, depending upon other factors.  If the payload does contain
      an MPLS label stack, the embedded label does not appear in that
      stack.

9.  Applicability Restrictions

   In a given UPDATE of a labeled address family, the label embedded in
   the NLRI is generally a label that is meaningful only to the router
   whose address appears as the next hop.  Certain of the procedures of
   Section 8.2.2.1 or Section 8.2.2.2 cause the embedded label to be
   carried by a data packet to the router whose address appears in the
   Remote Endpoint sub-TLV.
   If the Remote Endpoint sub-TLV does not
   identify the same router that is the next hop, sending the packet
   through the tunnel may cause the label to be misinterpreted at the
   tunnel's remote endpoint.  This may cause misdelivery of the packet.

   Therefore the embedded label MUST NOT be carried by a data packet
   traveling through a tunnel unless it is known that the label will be
   properly interpreted at the tunnel's remote endpoint.  How this is
   known is outside the scope of this document.

   Note that if the Tunnel Encapsulation attribute is attached to a
   VPN-IP route [RFC4364], and if Inter-AS "option b" (see section 10 of
   [RFC4364]) is being used, and if the Remote Endpoint sub-TLV contains
   an IP address that is not in the same AS as the router receiving the
   route, it is very likely that the embedded label has been changed.
   Therefore use of the Tunnel Encapsulation attribute in an "Inter-AS
   option b" scenario is not supported.

10.  Scoping

   The Tunnel Encapsulation attribute is defined as a transitive
   attribute, so that it may be passed along by BGP speakers that do not
   recognize it.  However, it is intended that the Tunnel Encapsulation
   attribute be used only within a well-defined scope, e.g., within a
   set of Autonomous Systems that belong to a single administrative
   entity.  If the attribute is distributed beyond its intended scope,
   packets may be sent through tunnels in a manner that is not intended.

   To prevent the Tunnel Encapsulation attribute from being distributed
   beyond its intended scope, any BGP speaker that understands the
   attribute MUST be able to filter the attribute from incoming BGP
   UPDATE messages.  When the attribute is filtered from an incoming
   UPDATE, the attribute is neither processed nor redistributed.  This
   filtering SHOULD be possible on a per-BGP-session basis.
   For each
   session, filtering of the attribute on incoming UPDATEs MUST be
   enabled by default.

   In addition, any BGP speaker that understands the attribute MUST be
   able to filter the attribute from outgoing BGP UPDATE messages.  This
   filtering SHOULD be possible on a per-BGP-session basis.  For each
   session, filtering of the attribute on outgoing UPDATEs MUST be
   enabled by default.

11.  Error Handling

   The Tunnel Encapsulation attribute is a sequence of TLVs, each of
   which is a sequence of sub-TLVs.  The final octet of a TLV is
   determined by its length field.  Similarly, the final octet of a
   sub-TLV is determined by its length field.  The final octet of a TLV
   MUST also be the final octet of its final sub-TLV.  If this is not
   the case, the TLV MUST be considered to be malformed.  A TLV that is
   found to be malformed for this reason MUST NOT be processed, and MUST
   be stripped from the Tunnel Encapsulation attribute before the
   attribute is propagated.  Subsequent TLVs in the Tunnel Encapsulation
   attribute may still be valid, in which case they MUST be processed
   and redistributed normally.

   If a Tunnel Encapsulation attribute does not have any valid TLVs, or
   it does not have the transitive bit set, the "Attribute Discard"
   procedure of [RFC7606] is applied.

   If a Tunnel Encapsulation attribute can be parsed correctly, but
   contains a TLV whose tunnel type is not recognized by a particular
   BGP speaker, that BGP speaker MUST NOT consider the attribute to be
   malformed.  Rather, the TLV with the unrecognized tunnel type MUST be
   ignored, and the BGP speaker MUST interpret the attribute as if that
   TLV had not been present.  If the route carrying the Tunnel
   Encapsulation attribute is propagated with the attribute, the
   unrecognized TLV SHOULD remain in the attribute.
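
   The framing rule stated at the start of this section (the final
   octet of a TLV must coincide with the final octet of its final
   sub-TLV) can be illustrated with a minimal, non-normative sketch.
   It assumes the encodings given elsewhere in this document: a TLV
   header of a 2-octet tunnel type followed by a 2-octet length, and a
   sub-TLV length of one octet for types 0-127 and two octets for types
   128-255.  The function names are hypothetical, and the choices to
   stop parsing when a TLV length overruns the attribute and to accept
   a TLV with no sub-TLVs are assumptions of the sketch, not
   requirements of this document.

```python
import struct


def parse_sub_tlvs(value):
    """Return [(sub_type, bytes), ...], or None if the sub-TLVs do not
    end exactly on the final octet of the enclosing TLV."""
    subs = []
    pos = 0
    while pos < len(value):
        sub_type = value[pos]
        pos += 1
        # Sub-TLV types 0-127 carry a 1-octet length, 128-255 a 2-octet one.
        len_size = 1 if sub_type < 128 else 2
        if pos + len_size > len(value):
            return None
        sub_len = int.from_bytes(value[pos:pos + len_size], "big")
        pos += len_size
        if pos + sub_len > len(value):
            return None  # sub-TLV runs past the end of the TLV: malformed
        subs.append((sub_type, value[pos:pos + sub_len]))
        pos += sub_len
    return subs  # assumption: a TLV with no sub-TLVs is not rejected here


def parse_tunnel_encap(attr_value):
    """Return the well-framed TLVs as (tunnel_type, sub_tlvs) pairs.
    Malformed TLVs are dropped; later TLVs are still examined."""
    valid = []
    pos = 0
    while pos + 4 <= len(attr_value):
        tunnel_type, tlv_len = struct.unpack_from("!HH", attr_value, pos)
        pos += 4
        if pos + tlv_len > len(attr_value):
            break  # assumption: later TLVs cannot be located, so stop
        subs = parse_sub_tlvs(attr_value[pos:pos + tlv_len])
        pos += tlv_len
        if subs is not None:
            valid.append((tunnel_type, subs))
    return valid


# A VXLAN (type 8) TLV carrying one UDP Destination Port sub-TLV (type 8).
good = bytes.fromhex("00080004" "080212b5")
# An NVGRE (type 9) TLV whose only sub-TLV claims 2 octets but has 1.
bad = bytes.fromhex("00090003" "080212")
tlvs = parse_tunnel_encap(bad + good)
```

   In this sketch an empty result from parse_tunnel_encap corresponds
   to the "no valid TLVs" case, for which the "Attribute Discard"
   procedure applies.  Unrecognized tunnel types are deliberately left
   in the result, since such TLVs are ignored rather than treated as
   malformed.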

   If a TLV of a Tunnel Encapsulation attribute contains a sub-TLV that
   is not recognized by a particular BGP speaker, the BGP speaker SHOULD
   process that TLV as if the unrecognized sub-TLV had not been present.
   If the route carrying the Tunnel Encapsulation attribute is
   propagated with the attribute, the unrecognized sub-TLV SHOULD remain
   in the attribute.

   If the type code of a sub-TLV appears as "reserved" in the IANA "BGP
   Tunnel Encapsulation Attribute Sub-TLVs" registry, the sub-TLV MUST
   be treated as an unrecognized sub-TLV.

   In general, if a TLV contains a sub-TLV that is malformed (e.g.,
   contains a length field whose value is not legal for that sub-TLV),
   the sub-TLV should be treated as if it were an unrecognized sub-TLV.
   This document specifies one exception to this rule: within a Tunnel
   Encapsulation attribute that is carried by a BGP UPDATE whose
   AFI/SAFI is one of those explicitly listed in the second paragraph of
   Section 5, if a TLV contains a malformed Remote Endpoint sub-TLV (as
   defined in Section 3.1), the entire TLV MUST be ignored, and SHOULD
   be removed from the Tunnel Encapsulation attribute before the route
   carrying that attribute is redistributed.

   Within a Tunnel Encapsulation attribute that is carried by a BGP
   UPDATE whose AFI/SAFI is one of those explicitly listed in the second
   paragraph of Section 5, a TLV that does not contain exactly one
   Remote Endpoint sub-TLV MUST be treated as if it contained a
   malformed Remote Endpoint sub-TLV.

   A TLV identifying a particular tunnel type may contain a sub-TLV that
   is meaningless for that tunnel type.  For example, perhaps the TLV
   contains a "UDP Destination Port" sub-TLV, but the identified tunnel
   type does not use UDP encapsulation at all.  Sub-TLVs of this sort
   SHOULD be treated as no-ops.  That is, they SHOULD NOT affect the
   creation of the encapsulation header.
   However, the sub-TLV MUST NOT
   be considered to be malformed, and MUST NOT be removed from the TLV
   before the route carrying the Tunnel Encapsulation attribute is
   redistributed.  (This allows for the possibility that such sub-TLVs
   may be given a meaning, in the context of the specified tunnel type,
   in the future.)

   There is no significance to the order in which the TLVs occur within
   the Tunnel Encapsulation attribute.  Multiple TLVs may occur for a
   given tunnel type; each such TLV is regarded as describing a
   different tunnel.

   The following sub-TLVs defined in this document SHOULD NOT occur more
   than once in a given Tunnel TLV: Remote Endpoint (discussed above),
   Encapsulation, IPv4 DS, UDP Destination Port, Embedded Label
   Handling, MPLS Label Stack, Prefix-SID.  If a Tunnel TLV has more
   than one of any of these sub-TLVs, all but the first occurrence of
   each such sub-TLV type MUST be treated as a no-op.  However, the
   Tunnel TLV containing them MUST NOT be considered to be malformed,
   and all the sub-TLVs SHOULD be propagated if the route carrying the
   Tunnel Encapsulation attribute is propagated.

   The following sub-TLVs defined in this document may appear zero or
   more times in a given Tunnel TLV: Protocol Type, Color.  Each
   occurrence of such sub-TLVs is meaningful.  For example, the Color
   sub-TLV may appear multiple times to assign multiple colors to a
   tunnel.

12.  IANA Considerations

12.1.  Subsequent Address Family Identifiers

   IANA is requested to modify the "Subsequent Address Family
   Identifiers" registry to indicate that the Encapsulation SAFI is
   deprecated.  This document should be the reference.

12.2.  BGP Path Attributes

   IANA has previously assigned value 23 from the "BGP Path Attributes"
   Registry to "Tunnel Encapsulation Attribute".  IANA is requested to
   add this document as a reference.

12.3.  Extended Communities

   IANA has previously assigned values from the "Transitive Opaque
   Extended Community" type Registry to the "Color Extended Community"
   (sub-type 0x0b), and to the "Encapsulation Extended
   Community" (0x030c).  IANA is requested to add this document as a
   reference for both assignments.

12.4.  BGP Tunnel Encapsulation Attribute Sub-TLVs

   IANA is requested to add the following note to the "BGP Tunnel
   Encapsulation Attribute Sub-TLVs" registry:

      If the Sub-TLV Type is in the range from 0 to 127 inclusive, the
      Sub-TLV Length field contains one octet.  If the Sub-TLV Type is
      in the range from 128 to 255 inclusive, the Sub-TLV Length field
      contains two octets.

   IANA is requested to change the registration policy of the "BGP
   Tunnel Encapsulation Attribute Sub-TLVs" registry to the following:

   o  The values 0 and 255 are reserved.

   o  The values in the range 1-63 and 128-191 are to be allocated using
      the "Standards Action" registration procedure.

   o  The values in the range 64-125 and 192-252 are to be allocated
      using the "First Come, First Served" registration procedure.

   o  The values in the range 126-127 and 253-254 are reserved for
      experimental use; IANA shall not allocate values from this range.

   IANA has assigned the following codepoints in the "BGP Tunnel
   Encapsulation Attribute Sub-TLVs" registry:

      6: Remote Endpoint

      7: IPv4 DS Field

      8: UDP Destination Port

      9: Embedded Label Handling

      10: MPLS Label Stack

      11: Prefix SID

   IANA has previously assigned codepoints from the "BGP Tunnel
   Encapsulation Attribute Sub-TLVs" registry for "Encapsulation",
   "Protocol Type", and "Color".  IANA is requested to add this document
   as a reference.

12.5.  Tunnel Types

   IANA is requested to add this document as a reference for tunnel
   types 8 (VXLAN), 9 (NVGRE), 11 (MPLS-in-GRE), and 12 (VXLAN-GPE) in
   the "BGP Tunnel Encapsulation Tunnel Types" registry.

   IANA is requested to add this document as a reference for tunnel
   types 1 (L2TPv3), 2 (GRE), and 7 (IP in IP) in the "BGP Tunnel
   Encapsulation Tunnel Types" registry.

13.  Security Considerations

   The Tunnel Encapsulation attribute can cause traffic to be diverted
   from its normal path, especially when the Remote Endpoint sub-TLV is
   used.  This can have serious consequences if the attribute is added
   or modified illegitimately, as it enables traffic to be "hijacked".

   The Remote Endpoint sub-TLV contains both an IP address and an AS
   number.  BGP Origin Validation [RFC6811] can be used to obtain
   assurance that the given IP address belongs to the given AS.  While
   this provides some protection against misconfiguration, it does not
   prevent a malicious agent from inserting a sub-TLV that will appear
   valid.

   Before sending a packet through the tunnel identified in a particular
   TLV of a Tunnel Encapsulation attribute, it may be advisable to use
   BGP Origin Validation to obtain the following additional assurances:

   o  the origin AS of the route carrying the Tunnel Encapsulation
      attribute is correct;

   o  the origin AS of the route to the IP address specified in the
      Remote Endpoint sub-TLV is correct, and is the same AS that is
      specified in the Remote Endpoint sub-TLV.

   One then has some level of assurance that the tunneled traffic is
   going to the same destination AS that it would have gone to had the
   Tunnel Encapsulation attribute not been present.  However, this may
   not suit all use cases, and in any event is not very strong
   protection against hijacking.
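
   As a non-normative illustration, the two assurances listed above
   could be combined into a single check along the following lines.
   Everything named here is hypothetical: the function
   passes_origin_checks, the (prefix, origin AS) tuples, and the
   validate callback, which stands in for whatever BGP Origin
   Validation service the speaker uses and is assumed to return one of
   "valid", "invalid", or "not-found".  Treating "not-found" as a
   failure is a policy assumption of the sketch, not something this
   document requires.

```python
def passes_origin_checks(route, endpoint_route, endpoint_as, validate):
    """route: (prefix, origin_as) of the route carrying the attribute.
    endpoint_route: (prefix, origin_as) of the route that covers the
    address in the Remote Endpoint sub-TLV.
    endpoint_as: AS number carried in the Remote Endpoint sub-TLV."""
    prefix, origin_as = route
    ep_prefix, ep_origin_as = endpoint_route
    # First assurance: origin AS of the route carrying the attribute.
    if validate(prefix, origin_as) != "valid":
        return False
    # Second assurance: origin AS of the route to the remote endpoint...
    if validate(ep_prefix, ep_origin_as) != "valid":
        return False
    # ...which must also match the AS named in the sub-TLV.
    return ep_origin_as == endpoint_as


# Toy stand-in for an origin validation lookup (documentation prefixes
# and private-use AS numbers; not real data).
TOY_ORIGINS = {"192.0.2.0/24": 64500, "198.51.100.0/24": 64501}


def toy_validate(prefix, origin_as):
    if prefix not in TOY_ORIGINS:
        return "not-found"
    return "valid" if TOY_ORIGINS[prefix] == origin_as else "invalid"


ok = passes_origin_checks(("192.0.2.0/24", 64500),
                          ("198.51.100.0/24", 64501), 64501, toy_validate)
```

   A tunnel that fails such a check would simply not be selected; the
   other tunnels in the attribute are evaluated independently.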

   For these reasons, BGP Origin Validation should not be relied upon
   exclusively, and the filtering procedures of Section 10 should always
   be in place.

   Increased protection can be obtained by using BGPsec [RFC8205] to
   ensure that the route carrying the Tunnel Encapsulation attribute,
   and the routes to the Remote Endpoint of each specified tunnel, have
   not been altered illegitimately.

   If BGP Origin Validation is used as specified above, and the tunnel
   specified in a particular TLV of a Tunnel Encapsulation attribute is
   therefore regarded as "suspicious", that tunnel should not be used.
   Other tunnels specified in (other TLVs of) the Tunnel Encapsulation
   attribute may still be used.

14.  Acknowledgments

   This document contains text from RFC 5512, co-authored by Pradosh
   Mohapatra.  The authors of the current document wish to thank Pradosh
   for his contribution.  RFC 5512 itself built upon prior work by Gargi
   Nalawade, Ruchi Kapoor, Dan Tappan, David Ward, Scott Wainner, Simon
   Barber, and Chris Metz, whom we also thank for their contributions.

   The authors wish to thank Lou Berger, Ron Bonica, Martin Djernaes,
   John Drake, Satoru Matsushima, Dhananjaya Rao, John Scudder, Ravi
   Singh, Thomas Morin, Xiaohu Xu, and Zhaohui Zhang for their review,
   comments, and/or helpful discussions.

15.  Contributor Addresses

   Below is a list of other contributing authors in alphabetical order:

   Randy Bush
   Internet Initiative Japan
   5147 Crystal Springs
   Bainbridge Island, Washington  98110
   United States

   Email: randy@psg.com

   Robert Raszuk
   Bloomberg LP
   731 Lexington Ave
   New York City, NY  10022
   United States

   Email: robert@raszuk.net

16.  References

16.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC5512]  Mohapatra, P. and E. Rosen, "The BGP Encapsulation
              Subsequent Address Family Identifier (SAFI) and the BGP
              Tunnel Encapsulation Attribute", RFC 5512,
              DOI 10.17487/RFC5512, April 2009.

   [RFC7606]  Chen, E., Ed., Scudder, J., Ed., Mohapatra, P., and K.
              Patel, "Revised Error Handling for BGP UPDATE Messages",
              RFC 7606, DOI 10.17487/RFC7606, August 2015.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

16.2.  Informative References

   [Ethertypes]
              "IANA Ethertype Registry".

   [I-D.ietf-bess-evpn-inter-subnet-forwarding]
              Sajassi, A., Salam, S., Thoria, S., Drake, J., and J.
              Rabadan, "Integrated Routing and Bridging in EVPN",
              draft-ietf-bess-evpn-inter-subnet-forwarding-07 (work in
              progress), February 2019.

   [I-D.ietf-idr-bgp-prefix-sid]
              Previdi, S., Filsfils, C., Lindem, A., Sreekantiah, A.,
              and H. Gredler, "Segment Routing Prefix SID extensions for
              BGP", draft-ietf-idr-bgp-prefix-sid-27 (work in progress),
              June 2018.

   [I-D.ietf-nvo3-vxlan-gpe]
              Maino, F., Kreeger, L., and U. Elzur, "Generic Protocol
              Extension for VXLAN", draft-ietf-nvo3-vxlan-gpe-06 (work
              in progress), April 2018.

   [RFC2474]  Nichols, K., Blake, S., Baker, F., and D. Black,
              "Definition of the Differentiated Services Field (DS
              Field) in the IPv4 and IPv6 Headers", RFC 2474,
              DOI 10.17487/RFC2474, December 1998.

   [RFC2784]  Farinacci, D., Li, T., Hanks, S., Meyer, D., and P.
              Traina, "Generic Routing Encapsulation (GRE)", RFC 2784,
              DOI 10.17487/RFC2784, March 2000.

   [RFC2890]  Dommety, G., "Key and Sequence Number Extensions to GRE",
              RFC 2890, DOI 10.17487/RFC2890, September 2000.

   [RFC3032]  Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y.,
              Farinacci, D., Li, T., and A. Conta, "MPLS Label Stack
              Encoding", RFC 3032, DOI 10.17487/RFC3032, January 2001.

   [RFC3931]  Lau, J., Ed., Townsley, M., Ed., and I. Goyret, Ed.,
              "Layer Two Tunneling Protocol - Version 3 (L2TPv3)",
              RFC 3931, DOI 10.17487/RFC3931, March 2005.

   [RFC4023]  Worster, T., Rekhter, Y., and E. Rosen, Ed.,
              "Encapsulating MPLS in IP or Generic Routing Encapsulation
              (GRE)", RFC 4023, DOI 10.17487/RFC4023, March 2005.

   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
              Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February
              2006.

   [RFC5462]  Andersson, L. and R. Asati, "Multiprotocol Label Switching
              (MPLS) Label Stack Entry: "EXP" Field Renamed to "Traffic
              Class" Field", RFC 5462, DOI 10.17487/RFC5462, February
              2009.

   [RFC5566]  Berger, L., White, R., and E. Rosen, "BGP IPsec Tunnel
              Encapsulation Attribute", RFC 5566, DOI 10.17487/RFC5566,
              June 2009.

   [RFC6514]  Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP
              Encodings and Procedures for Multicast in MPLS/BGP IP
              VPNs", RFC 6514, DOI 10.17487/RFC6514, February 2012.

   [RFC6811]  Mohapatra, P., Scudder, J., Ward, D., Bush, R., and R.
              Austein, "BGP Prefix Origin Validation", RFC 6811,
              DOI 10.17487/RFC6811, January 2013.

   [RFC7348]  Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
              L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
              eXtensible Local Area Network (VXLAN): A Framework for
              Overlaying Virtualized Layer 2 Networks over Layer 3
              Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014.

   [RFC7510]  Xu, X., Sheth, N., Yong, L., Callon, R., and D. Black,
              "Encapsulating MPLS in UDP", RFC 7510,
              DOI 10.17487/RFC7510, April 2015.

   [RFC7637]  Garg, P., Ed. and Y. Wang, Ed., "NVGRE: Network
              Virtualization Using Generic Routing Encapsulation",
              RFC 7637, DOI 10.17487/RFC7637, September 2015.

   [RFC8205]  Lepinski, M., Ed. and K. Sriram, Ed., "BGPsec Protocol
              Specification", RFC 8205, DOI 10.17487/RFC8205, September
              2017.

Authors' Addresses

   Eric C. Rosen (editor)
   Juniper Networks, Inc.
   10 Technology Park Drive
   Westford, Massachusetts  01886
   United States

   Email: erosen@juniper.net

   Keyur Patel
   Arrcus, Inc
   2077 Gateway Pl
   San Jose, CA  95110
   United States

   Email: keyur@arrcus.com

   Gunter Van de Velde
   Nokia
   Copernicuslaan 50
   Antwerpen  2018
   Belgium

   Email: gunter.van_de_velde@nokia.com