idnits 2.17.1

draft-rosen-idr-tunnel-encaps-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document seems to lack the recommended RFC 2119 boilerplate, even if
     it appears to use RFC 2119 keywords -- however, there's a paragraph with
     a matching beginning. Boilerplate error?

     (The document does seem to have the reference to RFC 2119 which the
     ID-Checklist requires).

  (Using the creation date from RFC5512, updated by this document, for
  RFC5378 checks: 2008-01-24)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (August 6, 2015) is 3186 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 5512 (Obsoleted by RFC 9012)


     Summary: 1 error (**), 0 flaws (~~), 2 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

IDR Working Group                                          E. Rosen, Ed.
Internet-Draft                                    Juniper Networks, Inc.
Updates: 5512 (if approved)                                     K. Patel
Intended status: Standards Track                           Cisco Systems
Expires: February 7, 2016                                G. Van de Velde
                                                          Alcatel-Lucent
                                                          August 6, 2015


      Using the BGP Tunnel Encapsulation Attribute without the BGP
                           Encapsulation SAFI
                    draft-rosen-idr-tunnel-encaps-03

Abstract

   RFC 5512 defines a BGP Path Attribute known as the "Tunnel
   Encapsulation Attribute".  This attribute allows one to specify a set
   of tunnels.  For each such tunnel, the attribute can provide
   additional information used to create a tunnel and the corresponding
   encapsulation header, and can also provide information that aids in
   choosing whether a particular packet is to be sent through a
   particular tunnel.  RFC 5512 states that the attribute is only
   carried in BGP UPDATEs that have the "Encapsulation Subsequent
   Address Family (Encapsulation SAFI)".  This document updates RFC 5512
   by deprecating the Encapsulation SAFI (which has never been used), and
   by specifying semantics for the attribute when it is carried in
   UPDATEs of certain other SAFIs.  This document also extends the
   attribute by enabling it to carry additional information needed to
   create the encapsulation headers for additional tunnel types not
   mentioned in RFC 5512.
   Finally, this document also extends the attribute by allowing it to
   specify a remote tunnel endpoint address for each tunnel.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on February 7, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Tunnel Encapsulation Attribute Sub-TLVs
     2.1.  The Remote Endpoint Sub-TLV
     2.2.  Encapsulation Sub-TLVs for Particular Tunnel Types
       2.2.1.  VXLAN
       2.2.2.  VXLAN-GPE
       2.2.3.  NVGRE
       2.2.4.  GTP
       2.2.5.  MPLS-in-GRE
     2.3.  Outer Encapsulation Sub-TLVs
       2.3.1.  IPv4 DS Field
       2.3.2.  UDP Destination Port
     2.4.  Embedded Label Handling Sub-TLV
   3.  Tunnel Encapsulation Extended Community
   4.  Semantics and Usage of the Tunnel Encapsulation attribute
   5.  Routing Considerations
     5.1.  No Impact on BGP Decision Process
     5.2.  Looping, Infinite Stacking, Etc.
   6.  Recursive Next Hop Resolution
   7.  Use of Virtual Network Identifiers and Embedded Labels when
       Imposing a Tunnel Encapsulation
     7.1.  Unlabeled Address Families
     7.2.  Labeled Address Families
       7.2.1.  When a Valid VNI has been Signaled
       7.2.2.  When a Valid VNI has not been Signaled
       7.2.3.  Applicability Restrictions
   8.  Scoping
   9.  Error Handling
   10. IANA Considerations
   11. Security Considerations
   12. Acknowledgments
   13. Contributor Addresses
   14. References
     14.1.  Normative References
     14.2.  Informative References
   Authors' Addresses

1.  Introduction

   [RFC5512] defines a BGP Path Attribute known as the Tunnel
   Encapsulation attribute.  This attribute consists of one or more
   TLVs.  Each TLV identifies a particular type of tunnel.  Each TLV
   also contains one or more sub-TLVs.  Some of the sub-TLVs, e.g., the
   "Encapsulation sub-TLV", contain information that may be used to form
   the encapsulation header for the specified tunnel type.  Other sub-
   TLVs, e.g., the "color sub-TLV" and the "protocol sub-TLV", contain
   information that aids in determining whether particular packets
   should be sent through the tunnel that the TLV identifies.

   [RFC5512] only allows the Tunnel Encapsulation attribute to be
   attached to BGP UPDATE messages that have the "Encapsulation SAFI"
   (i.e., UPDATE messages with AFI/SAFI 1/7 or 2/7).  In an UPDATE of
   the Encapsulation SAFI, the NLRI is an address of the BGP speaker
   originating the UPDATE.  Consider the following scenario:

   o  BGP speaker R1 has received and installed UPDATE U;

   o  UPDATE U's SAFI is the Encapsulation SAFI;

   o  UPDATE U has the address R2 as its NLRI;

   o  UPDATE U has a Tunnel Encapsulation attribute;

   o  R1 has a packet, P, to transmit to destination D;

   o  R1's best path to D is a BGP route that has R2 as its next hop.

   In this scenario, when R1 transmits packet P, it should transmit it
   to R2 through one of the tunnels specified in U's Tunnel
   Encapsulation attribute.  The IP address of the remote endpoint of
   each such tunnel is R2.  Packet P is known as the tunnel's "payload".

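   As a non-normative illustration of the scenario above, the lookup
   that R1 performs can be sketched in Python.  The function name, the
   table layout, and the use of plain strings for tunnels are all
   assumptions made for this sketch; none of them comes from [RFC5512].

```python
from typing import Dict, List


def tunnels_toward(next_hop: str,
                   encap_safi_routes: Dict[str, List[str]]) -> List[str]:
    """Return the candidate tunnels for a packet whose BGP next hop is next_hop.

    encap_safi_routes is a hypothetical table built from installed
    Encapsulation-SAFI UPDATEs.  It maps each UPDATE's NLRI (an address of
    the originating speaker) to the tunnels named in that UPDATE's Tunnel
    Encapsulation attribute.
    """
    return encap_safi_routes.get(next_hop, [])


# UPDATE U: its NLRI is R2 (a documentation address here) and its
# attribute lists two tunnels.
routes = {"192.0.2.2": ["GRE", "L2TPv3"]}

# R1's best path to D has R2 as its next hop, so P may use either tunnel.
candidates = tunnels_toward("192.0.2.2", routes)
```

   Because the table in this sketch is keyed only by the next hop
   address, every destination whose best path has R2 as its next hop
   obtains the same list of candidate tunnels.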
   While the ability to specify tunnel information in a BGP UPDATE is
   useful, the procedures of [RFC5512] have certain limitations:

   o  The requirement to use the "Encapsulation SAFI" presents an
      unfortunate operational cost, as each BGP session that may need to
      carry tunnel encapsulation information needs to be reconfigured to
      support the Encapsulation SAFI.  The Encapsulation SAFI has never
      been used, and this requirement has served only to discourage the
      use of the Tunnel Encapsulation attribute.

   o  There is no way to use the Tunnel Encapsulation attribute to
      specify the remote endpoint address of a given tunnel; [RFC5512]
      assumes that the remote endpoint of each tunnel is specified as
      the NLRI of an UPDATE of the Encapsulation SAFI.

   o  If the respective best paths to two different address prefixes
      have the same next hop, [RFC5512] does not provide a
      straightforward method to associate each prefix with a different
      tunnel.

   In this document we address these deficiencies by:

   o  Deprecating the Encapsulation SAFI.

   o  Defining a new "Remote Endpoint Address sub-TLV" that can be
      included in any of the TLVs contained in the Tunnel Encapsulation
      attribute.  This sub-TLV can be used to specify the remote
      endpoint address of a particular tunnel.

   o  Allowing the Tunnel Encapsulation attribute to be carried by BGP
      UPDATEs of additional AFI/SAFIs.  Appropriate semantics are
      provided for this way of using the attribute.

   One of the sub-TLVs defined in [RFC5512] is the "Encapsulation sub-
   TLV".  For a given tunnel, the encapsulation sub-TLV specifies some
   of the information needed to construct the encapsulation header used
   when sending packets through that tunnel.  This document defines
   encapsulation sub-TLVs for a number of tunnel types not discussed in
   [RFC5512]: VXLAN, VXLAN-GPE, NVGRE, GTP, and MPLS-in-GRE.
   MPLS-in-UDP [RFC7510] is also supported, but an Encapsulation sub-TLV
   for it is not needed.

   Some of the encapsulations mentioned in the previous paragraph need
   to be further encapsulated inside UDP and/or IP.  [RFC5512] provides
   no way to specify that certain information is to appear in these
   outer IP and/or UDP encapsulations.  This document provides a
   framework for including such information in the TLVs of the Tunnel
   Encapsulation attribute.

   When the Tunnel Encapsulation attribute is attached to a BGP UPDATE
   whose AFI/SAFI identifies one of the labeled address families, it is
   not always obvious whether the label embedded in the NLRI is to
   appear somewhere in the tunnel encapsulation header (and if so,
   where), or whether it is to appear in the payload, or whether it can
   be omitted altogether.  This is especially true if the tunnel
   encapsulation header itself contains a "virtual network identifier".
   This document provides a mechanism that allows one to signal (by
   using sub-TLVs of the Tunnel Encapsulation attribute) how one wants
   to use the embedded label when the tunnel encapsulation has its own
   virtual network identifier field.

   [RFC5512] defines a Tunnel Encapsulation Extended Community that can
   be used instead of the Tunnel Encapsulation attribute under certain
   circumstances.  This document addresses the issue of how to handle a
   BGP UPDATE that carries both a Tunnel Encapsulation attribute and one
   or more Tunnel Encapsulation Extended Communities.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL", when and only when appearing in all capital letters, are
   to be interpreted as described in [RFC2119].

2.  Tunnel Encapsulation Attribute Sub-TLVs

   [RFC5512] specifies three sub-TLVs for the Tunnel Encapsulation
   attribute: the Encapsulation sub-TLV, the Color sub-TLV, and the
   Protocol Type sub-TLV.  In this section we specify a number of
   additional sub-TLVs.  We also specify Encapsulation sub-TLVs for a
   number of tunnel types that are not mentioned in [RFC5512].

2.1.  The Remote Endpoint Sub-TLV

   The Remote Endpoint sub-TLV is a sub-TLV whose value field contains
   three subfields:

   1.  a four-octet Autonomous System (AS) number subfield

   2.  a two-octet Address Family subfield

   3.  an address subfield, whose length depends upon the Address
       Family.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                   Autonomous System Number                    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |        Address Family         |            Address            ~
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
    ~                                                               ~
    |                                                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure 1: Remote Endpoint Sub-TLV Value Field

   The Address Family subfield contains a value from IANA's "Address
   Family Numbers" registry.  In this document, we assume that the
   Address Family is either IPv4 or IPv6; use of other address families
   is outside the scope of this document.

   If the Address Family subfield contains the value for IPv4, the
   address subfield must contain an IPv4 address (a /32 IPv4 prefix).
   In this case, the length field of the Remote Endpoint sub-TLV must
   contain the value 10 (0xa).  IPv4 broadcast addresses are not valid
   values of this field.

   If the Address Family subfield contains the value for IPv6, the
   address subfield must contain an IPv6 address (a /128 IPv6 prefix).
   In this case, the length field of the Remote Endpoint sub-TLV must
   contain the value 22 (0x16).
   IPv6 link local addresses are not valid values of the IP address
   field.

   In a given BGP UPDATE, the address family (IPv4 or IPv6) of a Remote
   Endpoint sub-TLV is independent of the address family of the UPDATE
   itself.  For example, an UPDATE whose NLRI is an IPv4 address may
   have a Tunnel Encapsulation attribute containing Remote Endpoint sub-
   TLVs that contain IPv6 addresses.  Also, different tunnels
   represented in the Tunnel Encapsulation attribute may have Remote
   Endpoints of different address families.

   A two-octet AS number can be carried in the AS number field by
   setting the two high order octets to zero, and carrying the number in
   the two low order octets of the field.

   The AS number in the sub-TLV MUST be the number of the AS to which
   the IP address in the sub-TLV belongs.

   There is one special case: the Remote Endpoint sub-TLV MAY have a
   value field whose Address Family subfield contains 0.  This means
   that the tunnel's remote endpoint is the UPDATE's BGP next hop.  If
   the Address Family subfield contains 0, the Address subfield is
   omitted, and the Autonomous System number field is set to 0.

   If any of the following conditions hold, the Remote Endpoint sub-TLV
   is considered to be "malformed":

   o  The sub-TLV contains the value for IPv4 in its Address Family
      subfield, but the length of the sub-TLV's value field is other
      than 10 (0xa).

   o  The sub-TLV contains the value for IPv6 in its Address Family
      subfield, but the length of the sub-TLV's value field is other
      than 22 (0x16).

   o  The sub-TLV contains the value zero in its Address Family
      subfield, but the length of the sub-TLV's value field is other
      than 6, or the Autonomous System subfield is not set to zero.

   o  The IP address in the sub-TLV's address subfield is not a valid IP
      address (e.g., it's an IPv4 broadcast address).

   o  It can be determined that the IP address in the sub-TLV's address
      subfield does not belong to the non-zero AS whose number is in its
      Autonomous System subfield.  (See Section 11 for discussion of one
      way to determine this.)

   If the Remote Endpoint sub-TLV is malformed, the TLV containing it is
   also considered to be malformed, and the entire TLV MUST be ignored.
   However, the Tunnel Encapsulation attribute SHOULD NOT be considered
   to be malformed in this case; other TLVs in the attribute SHOULD be
   processed (if they can be parsed correctly).

   When redistributing a route that is carrying a Tunnel Encapsulation
   attribute containing a TLV that itself contains a malformed Remote
   Endpoint sub-TLV, the TLV SHOULD be removed from the attribute before
   redistribution.

   See Section 9 for further discussion of how to handle errors that are
   encountered when parsing the Tunnel Encapsulation attribute.

   If the Remote Endpoint sub-TLV contains an IPv4 or IPv6 address that
   is valid but not reachable, the sub-TLV is NOT considered to be
   malformed, and the containing TLV SHOULD NOT be removed from the
   attribute before redistribution.  However, the tunnel identified by
   the TLV containing that sub-TLV cannot be used until such time as the
   address becomes reachable.  See Section 4.

2.2.  Encapsulation Sub-TLVs for Particular Tunnel Types

   Tunnel Encapsulation sub-TLVs for the following tunnel types are
   defined in [RFC5512]: L2TPv3 and GRE.

   This section defines Tunnel Encapsulation sub-TLVs for the following
   tunnel types: VXLAN ([RFC7348]), VXLAN-GPE ([VXLAN-GPE]), NVGRE
   ([NVGRE]), GTP ([GTP-U]), and MPLS-in-GRE ([RFC2784], [RFC2890],
   [RFC4023]).

   Rules for forming the encapsulation based on the information in a
   given TLV are given in Section 7.

2.2.1.  VXLAN

   This document defines an encapsulation sub-TLV for VXLAN tunnels.
   When the tunnel type is VXLAN, the following is the structure of the
   value field in the encapsulation sub-TLV:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |V|M|R|R|R|R|R|R|               VN-ID (3 Octets)                |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                    MAC Address (4 Octets)                     |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |    MAC Address (2 Octets)     |           Reserved            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 2: VXLAN Encapsulation Sub-TLV

   V: This bit is set to 1 to indicate that a valid VN-ID is present
      in the encapsulation sub-TLV.

   M: This bit is set to 1 to indicate that a valid MAC Address is
      present in the encapsulation sub-TLV.

   R: The remaining bits in the 8-bit flags field are reserved for
      future use.  They SHOULD always be set to 0.

   VN-ID: If the V bit is set, the VN-ID field contains a 3 octet VN-
      ID value.  If the V bit is not set, the VN-ID field SHOULD be set
      to zero.

   MAC Address: If the M bit is set, this field contains a 6 octet
      Ethernet MAC address.  If the M bit is not set, this field SHOULD
      be set to all zeroes.

   Note that, strictly speaking, VXLAN tunnels only carry ethernet
   frames.  To send an IP packet or an MPLS packet through a VXLAN
   tunnel, it is necessary to form an IP-in-ethernet-in-VXLAN or an
   MPLS-in-ethernet-in-VXLAN tunnel.

   When forming the VXLAN encapsulation header:

   o  The values of the V, M, and R bits are NOT copied into the flags
      field of the VXLAN header.  The flags field of the VXLAN header is
      set as per [RFC7348].

   o  If the M bit is set, the MAC Address is copied into the Inner
      Destination MAC Address field of the Inner Ethernet Header (see
      Section 5 of [RFC7348]).

      If the M bit is not set, and the payload being sent through the
      VXLAN tunnel is an ethernet frame, the Destination MAC Address
      field of the Inner Ethernet Header is just the Destination MAC
      Address field of the payload's ethernet header.

      If the M bit is not set, and the payload being sent through the
      VXLAN tunnel is an IP or MPLS packet, the Inner Destination MAC
      address field is set to a configured value; if there is no
      configured value, the VXLAN tunnel cannot be used.

   o  See Section 7 to see how the VNI field of the VXLAN encapsulation
      header is set.

2.2.2.  VXLAN-GPE

   This document defines an encapsulation sub-TLV for VXLAN-GPE tunnels.
   When the tunnel type is VXLAN-GPE, the following is the structure of
   the value field in the encapsulation sub-TLV:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |Ver|V|R|R|R|R|R|                   Reserved                    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                     VN-ID                     |   Reserved    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure 3: VXLAN-GPE Encapsulation Sub-TLV

   V: This bit is set to 1 to indicate that a valid VN-ID is present
      in the encapsulation sub-TLV.

   R: The bits designated "R" above are reserved for future use.
      They SHOULD always be set to zero.

   Version (Ver): Indicates the VXLAN-GPE protocol version.  If the
      indicated version is not supported, the TLV that contains this
      Encapsulation sub-TLV MUST be treated as specifying an unsupported
      tunnel type.  The value of this field will be copied into the
      corresponding field of the VXLAN-GPE encapsulation header.

   VN-ID: If the V bit is set, this field contains a 3 octet VN-ID
      value.  If the V bit is not set, this field SHOULD be set to zero.

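   As a non-normative illustration, the eight-octet value field of
   Figure 3 could be packed and unpacked along the following lines.
   This is a sketch in Python: the function names are hypothetical, and
   it assumes the usual convention that bit 0 in the figure is the most
   significant bit of the first octet.

```python
import struct
from typing import Optional, Tuple

V_BIT = 0x20  # third bit of the flags octet, after the 2-bit Ver field


def encode_vxlan_gpe_value(version: int, vn_id: Optional[int] = None) -> bytes:
    """Build the value field: Ver|V|R flags, 24 reserved bits, VN-ID, reserved."""
    if not 0 <= version <= 3:
        raise ValueError("Ver is a 2-bit field")
    flags = version << 6              # R bits stay zero
    if vn_id is None:
        vn_id = 0                     # V bit clear: VN-ID is sent as zero
    else:
        if not 0 <= vn_id <= 0xFFFFFF:
            raise ValueError("VN-ID is a 3-octet field")
        flags |= V_BIT
    first_word = flags << 24          # flags octet followed by 24 reserved bits
    second_word = vn_id << 8          # VN-ID followed by 8 reserved bits
    return struct.pack("!II", first_word, second_word)


def decode_vxlan_gpe_value(value: bytes) -> Tuple[int, Optional[int]]:
    """Return (version, vn_id); vn_id is None when the V bit is not set."""
    if len(value) != 8:
        raise ValueError("expected an 8-octet value field")
    first_word, second_word = struct.unpack("!II", value)
    flags = first_word >> 24
    version = flags >> 6
    vn_id = (second_word >> 8) if flags & V_BIT else None
    return version, vn_id
```

   For example, encode_vxlan_gpe_value(0, 0x123456) yields the octets
   20 00 00 00 12 34 56 00 (hexadecimal).  The decoder ignores the
   VN-ID field when the V bit is clear, which is one possible reading of
   "SHOULD be set to zero" on the sending side.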
   When forming the VXLAN-GPE encapsulation header:

   o  The values of the V and R bits are NOT copied into the flags field
      of the VXLAN-GPE header.  However, the values of the Ver bits are
      copied into the VXLAN-GPE header.  Other bits in the flags field
      of the VXLAN-GPE header are set as per [VXLAN-GPE].

   o  See Section 7 to see how the VNI field of the VXLAN-GPE
      encapsulation header is set.

2.2.3.  NVGRE

   This document defines an encapsulation sub-TLV for NVGRE tunnels.
   When the tunnel type is NVGRE, the following is the structure of the
   value field in the encapsulation sub-TLV:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |V|M|R|R|R|R|R|R|               VN-ID (3 Octets)                |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                    MAC Address (4 Octets)                     |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |    MAC Address (2 Octets)     |           Reserved            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 4: NVGRE Encapsulation Sub-TLV

   V: This bit is set to 1 to indicate that a valid VN-ID is present
      in the encapsulation sub-TLV.

   M: This bit is set to 1 to indicate that a valid MAC Address is
      present in the encapsulation sub-TLV.

   R: The remaining bits in the 8-bit flags field are reserved for
      future use.  They SHOULD always be set to 0.

   VN-ID: If the V bit is set, the VN-ID field contains a 3 octet VN-
      ID value.  If the V bit is not set, the VN-ID field SHOULD be set
      to zero.

   MAC Address: If the M bit is set, this field contains a 6 octet
      Ethernet MAC address.  If the M bit is not set, this field SHOULD
      be set to all zeroes.

   When forming the NVGRE encapsulation header:

   o  The values of the V, M, and R bits are NOT copied into the flags
      field of the NVGRE header.  The flags field of the NVGRE header is
      set as per [NVGRE].

   o  If the M bit is set, the MAC Address is copied into the Inner
      Destination MAC Address field of the Inner Ethernet Header (see
      Section 3.2 of [NVGRE]).

      If the M bit is not set, and the payload being sent through the
      NVGRE tunnel is an ethernet frame, the Destination MAC Address
      field of the Inner Ethernet Header is just the Destination MAC
      Address field of the payload's ethernet header.

      If the M bit is not set, and the payload being sent through the
      NVGRE tunnel is an IP or MPLS packet, the Inner Destination MAC
      address field is set to a configured value; if there is no
      configured value, the NVGRE tunnel cannot be used.

   o  See Section 7 to see how the VSID field of the NVGRE encapsulation
      header is set.

2.2.4.  GTP

   When the tunnel type is GTP [GTP-U], the Encapsulation sub-TLV
   contains information needed to send data packets through a GTP
   tunnel, and also contains information needed by the tunnel's remote
   endpoint to create a "reverse" tunnel back to the transmitter.  This
   allows a bidirectional control connection to be created.  The format
   of the Encapsulation Sub-TLV is:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                    Remote TEID (4 Octets)                     |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                     Local TEID (4 Octets)                     |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |       Local Endpoint Address (4/16 Octets (IPv4/IPv6))        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 5: GTP Encapsulation Sub-TLV

   Remote TEID: Contains the 32-bit Tunnel Endpoint Identifier of the
      GTP tunnel through which data packets are to be sent.  When data
      packets are sent through the tunnel, the Remote TEID is carried in
      the GTP encapsulation header.
      The GTP header is itself encapsulated within an IP header, whose
      IP destination address field is set to the address carried in the
      Remote Endpoint sub-TLV.

   Local TEID: Contains a 32-bit Tunnel Endpoint Identifier of a GTP
      tunnel assigned by EPC ([vEPC]).

   Local Endpoint Address: Contains an IPv4 or IPv6 anycast address.
      This is used, along with the Local TEID, to set up a tunnel in the
      reverse direction.  See [vEPC] for details.

2.2.5.  MPLS-in-GRE

   When the tunnel type is MPLS-in-GRE, the following is the structure
   of the value field in an optional encapsulation sub-TLV:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      GRE-Key (4 Octets)                       |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure 6: MPLS-in-GRE Encapsulation Sub-TLV

   GRE-Key: 4-octet field [RFC2890] that is generated by the
      advertising router.  The actual method by which the key is
      obtained is beyond the scope of this document.  The key is
      inserted into the GRE encapsulation header of the payload packets
      sent by ingress routers to the advertising router.  It is intended
      to be used for identifying extra context information about the
      received payload.  Note that the key is optional.  Unless a key
      value is being advertised, the MPLS-in-GRE encapsulation sub-TLV
      MUST NOT be present.

   Note that the GRE tunnel type defined in [RFC5512] can be used
   instead of the MPLS-in-GRE tunnel type when it is necessary to
   encapsulate MPLS in GRE.  Including a TLV of the MPLS-in-GRE tunnel
   type is equivalent to including a TLV of the GRE tunnel type that
   also includes a Protocol Type sub-TLV ([RFC5512]) specifying MPLS as
   the protocol to be encapsulated.
   That is, if a TLV specifies MPLS-in-GRE or if it includes a Protocol
   Type sub-TLV specifying MPLS, the GRE tunnel advertised in that TLV
   MUST NOT be used for carrying IP packets.

2.3.  Outer Encapsulation Sub-TLVs

   The Encapsulation sub-TLV for a particular tunnel type allows one to
   specify the values that are to be placed in certain fields of the
   encapsulation header for that tunnel type.  However, some tunnel
   types require an outer IP encapsulation, and some also require an
   outer UDP encapsulation.  The Encapsulation sub-TLV for a given
   tunnel type does not usually provide a way to specify values for
   fields of the outer IP and/or UDP encapsulations.  If it is necessary
   to specify values for fields of the outer encapsulation, additional
   sub-TLVs must be used.  This document defines two such sub-TLVs.

   If an outer encapsulation sub-TLV occurs in a TLV for a tunnel type
   that does not use the corresponding outer encapsulation, the sub-TLV
   is treated as if it were an unknown type of sub-TLV.

2.3.1.  IPv4 DS Field

   Most of the tunnel types that can be specified in the Tunnel
   Encapsulation attribute require an outer IP encapsulation.  The IPv4
   DS Field sub-TLV can be carried in the TLV of any such tunnel type.
   It specifies the setting of the one-octet Differentiated Services
   field in the outer IP encapsulation (see [RFC2474]).  The value field
   is always a single octet.

2.3.2.  UDP Destination Port

   Some of the tunnel types that can be specified in the Tunnel
   Encapsulation attribute require an outer UDP encapsulation.
   Generally there is a standard UDP Destination Port value for a
   particular tunnel type.  However, sometimes it is useful to be able
   to use a non-standard UDP destination port.
   If a particular tunnel type requires an outer UDP encapsulation, and
   it is desired to use a UDP destination port other than the standard
   one, the port to be used can be specified by including a UDP
   Destination Port sub-TLV.  The value field of this sub-TLV is always
   a two-octet field, containing the port value.

2.4.  Embedded Label Handling Sub-TLV

   Certain BGP address families (corresponding to particular AFI/SAFI
   pairs, e.g., 1/4, 2/4, 1/128, 2/128) have MPLS labels embedded in
   their NLRIs.  We will use the term "embedded label" to refer to the
   MPLS label that is embedded in an NLRI, and the term "labeled address
   family" to refer to any AFI/SAFI that has embedded labels.

   Some of the tunnel types (e.g., VXLAN, VXLAN-GPE, and NVGRE) that can
   be specified in the Tunnel Encapsulation attribute have an
   encapsulation header containing a "Virtual Network" identifier of
   some sort.  The Encapsulation sub-TLVs for these tunnel types may
   optionally specify a value for the virtual network identifier.

   Suppose a Tunnel Encapsulation attribute is attached to an UPDATE of
   a labeled address family, and it is decided to use a particular
   tunnel (specified in one of the attribute's TLVs) for transmitting a
   packet that is being forwarded according to that UPDATE.  When
   forming the encapsulation header for that packet, different
   deployment scenarios require different handling of the embedded label
   and/or the virtual network identifier.  The Embedded Label Handling
   sub-TLV can be used to control the placement of the embedded label
   and/or the virtual network identifier in the encapsulation.

   The Embedded Label Handling sub-TLV may be included in any TLV of the
   Tunnel Encapsulation attribute.  If the Tunnel Encapsulation
   attribute is attached to an UPDATE of a non-labeled address family,
   the sub-TLV is treated as a no-op.
   If the sub-TLV is contained in a TLV whose tunnel type does not have
   a virtual network identifier in its encapsulation header, the sub-TLV
   is treated as a no-op.

   The sub-TLV's Length field always contains the value 1, and its value
   field consists of a single octet. The following values are defined:

   1: The payload will be an MPLS packet with the embedded label at the
      top of its label stack.

   2: The embedded label is not carried in the payload, but is carried
      either in the virtual network identifier field of the
      encapsulation header, or else is ignored entirely.

   Please see Section 7 for the details of how this sub-TLV is used when
   it is carried by an UPDATE of a labeled address family.

   If the Embedded Label Handling sub-TLV is carried by an UPDATE of a
   non-labeled address family, it is treated as a no-op. However, it
   SHOULD NOT be stripped from the TLV before the UPDATE is forwarded.

3. Tunnel Encapsulation Extended Community

   [RFC5512] defines an Encapsulation Extended Community. This Extended
   Community may be attached to a route of any AFI/SAFI to which the
   Tunnel Encapsulation attribute may be attached. Each such Extended
   Community identifies a particular tunnel type. If the Encapsulation
   Extended Community identifies a particular tunnel type, its semantics
   are exactly equivalent to the semantics of a Tunnel Encapsulation
   attribute TLV that:

   o identifies the same tunnel type, and

   o has a Remote Endpoint sub-TLV whose IP address field contains the
     address of the BGP next hop of the route to which it is attached,
     and

   o has no other sub-TLVs.

   In the remainder of this specification, when we speak of a route as
   containing a Tunnel Encapsulation attribute with a TLV identifying a
   particular tunnel type, we are implicitly including the case where
   the route contains a Tunnel Encapsulation Extended Community
   identifying that tunnel type.
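
   As an illustration only, the equivalence described above might be
   sketched in Python as follows. The names TunnelTLV,
   tlv_from_extended_community, and the "remote_endpoint" key are
   hypothetical and are not defined by this document or by [RFC5512].
   The sketch models only the IP address field of the Remote Endpoint
   sub-TLV and makes no attempt to model wire encodings.

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Address, IPv6Address, ip_address
from typing import Dict, Union

IPAddress = Union[IPv4Address, IPv6Address]


@dataclass
class TunnelTLV:
    """Hypothetical in-memory view of one Tunnel Encapsulation TLV."""
    tunnel_type: int
    # Sub-TLVs keyed by an illustrative name, not by an IANA codepoint.
    sub_tlvs: Dict[str, object] = field(default_factory=dict)


def tlv_from_extended_community(tunnel_type: int, bgp_next_hop: str) -> TunnelTLV:
    """Build the TLV that an Encapsulation Extended Community stands for.

    The resulting TLV identifies the same tunnel type, carries a Remote
    Endpoint whose address is the route's BGP next hop, and has no
    other sub-TLVs.
    """
    endpoint: IPAddress = ip_address(bgp_next_hop)
    return TunnelTLV(tunnel_type=tunnel_type,
                     sub_tlvs={"remote_endpoint": endpoint})


# Usage: an Extended Community naming tunnel type 8 (VXLAN) on a route
# whose BGP next hop is 192.0.2.1 (a documentation address).
tlv = tlv_from_extended_community(8, "192.0.2.1")
print(tlv.tunnel_type, tlv.sub_tlvs["remote_endpoint"])
```

   Normalizing the Extended Community into the same structure as a TLV
   lets the rest of an implementation treat both cases uniformly, which
   is the convention the remainder of this specification adopts.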

   [EVPN-Inter-Subnet] defines a Router's MAC Extended Community. This
   Extended Community provides information that may conflict with
   information in one or more of the Encapsulation Sub-TLVs of a Tunnel
   Encapsulation attribute. In case of such a conflict, the information
   in the Encapsulation Sub-TLV takes precedence.

4. Semantics and Usage of the Tunnel Encapsulation attribute

   [RFC5512] specifies the use of the Tunnel Encapsulation attribute in
   BGP UPDATE messages of AFI/SAFI 1/7 and 2/7. That document restricts
   the use of this attribute to UPDATE messages of those SAFIs. This
   document removes that restriction.

   The BGP Tunnel Encapsulation attribute MAY be carried in any BGP
   UPDATE message whose AFI/SAFI is 1/1 (IPv4 Unicast), 2/1 (IPv6
   Unicast), 1/4 (IPv4 Labeled Unicast), 2/4 (IPv6 Labeled Unicast),
   1/128 (VPN-IPv4 Labeled Unicast), 2/128 (VPN-IPv6 Labeled Unicast),
   or 25/70 (EVPN). Use of the Tunnel Encapsulation attribute in BGP
   UPDATE messages of other AFI/SAFIs is outside the scope of this
   document.

   The decision to attach a Tunnel Encapsulation attribute to a given
   BGP UPDATE is determined by policy. The set of TLVs and sub-TLVs
   contained in the attribute is also determined by policy.

   When the Tunnel Encapsulation attribute is carried in an UPDATE of
   one of the AFI/SAFIs specified above, each TLV MUST have a Remote
   Endpoint sub-TLV. If a TLV does not have a Remote Endpoint sub-TLV,
   that TLV should be treated as if it had a malformed Remote Endpoint
   sub-TLV (see Section 2.1).

   Suppose that:

   o a given packet P must be forwarded by router R;

   o the path along which P is to be forwarded is determined by BGP
     UPDATE U;

   o UPDATE U has a Tunnel Encapsulation attribute, containing at least
     one TLV that identifies a "feasible tunnel" for packet P.
     A tunnel is considered feasible if it has the following three
     properties:

     * The tunnel type is supported (i.e., router R knows how to set
       up tunnels of that type, how to create the encapsulation header
       for tunnels of that type, etc.).

     * The tunnel is of a type that can be used to carry packet P
       (e.g., an MPLS-in-UDP tunnel would not be a feasible tunnel for
       carrying an IP packet, UNLESS the IP packet can first be
       converted to an MPLS packet).

     * The tunnel is specified in a TLV whose Remote Endpoint sub-TLV
       identifies an IP address that is reachable.

   Then router R SHOULD send packet P through one of the feasible
   tunnels identified in the Tunnel Encapsulation attribute of UPDATE U.

   If the Tunnel Encapsulation attribute contains several TLVs (i.e., if
   it specifies several tunnels), router R may choose any one of those
   tunnels, based upon local policy. If any of the tunnels' TLVs contain
   the Color sub-TLV and/or the Protocol Type sub-TLV defined in
   [RFC5512], the choice of tunnel may be influenced by these sub-TLVs.

   Note that if none of the TLVs specifies the MPLS tunnel type, a Label
   Switched Path SHOULD NOT be used.

   If a particular tunnel is not feasible at some moment because its
   Remote Endpoint cannot be reached at that moment, the tunnel may
   become feasible at a later time. When this happens, router R SHOULD
   reconsider its choice of tunnel to use, and MAY choose to now use the
   tunnel.

   A TLV specifying a non-feasible tunnel is not considered to be
   malformed or erroneous in any way, and the TLV SHOULD NOT be stripped
   from the Tunnel Encapsulation attribute before redistribution.

   In addition to the sub-TLVs already defined, additional sub-TLVs may
   be defined that affect the choice of tunnel to be used, or that
   affect the contents of the tunnel encapsulation header.
   The documents that define any such additional sub-TLVs must specify
   the effect that including the sub-TLV is to have.

   If it is determined to send a packet through the tunnel specified in
   a particular TLV of a particular Tunnel Encapsulation attribute, and
   if that TLV contains a Remote Endpoint sub-TLV, then the tunnel's
   remote endpoint address is the IP address contained in the sub-TLV.
   If the TLV does not contain a Remote Endpoint sub-TLV, or if it
   contains a Remote Endpoint sub-TLV whose value field is all zeroes,
   then the tunnel's remote endpoint is the IP address specified as the
   Next Hop of the BGP Update containing the Tunnel Encapsulation
   attribute.

   The procedure for sending a packet through a particular tunnel type
   to a particular remote endpoint depends upon the tunnel type, and is
   outside the scope of this document. The contents of the tunnel
   encapsulation header MAY be influenced by the Encapsulation sub-TLV.

   Note that some tunnel types may require the execution of an explicit
   tunnel setup protocol before they can be used for carrying data.
   Other tunnel types may not require any tunnel setup protocol.
   Whenever a new Tunnel Type TLV is defined, the specification of that
   TLV must describe (or reference) the procedures for creating the
   encapsulation header used to forward packets through that tunnel
   type.

   If a Tunnel Encapsulation attribute specifies several tunnels, the
   way in which a router chooses which one to use is a matter of policy,
   subject to the following constraint: if a router can determine that a
   given tunnel is not functional, it MUST NOT use that tunnel. In
   particular, if the tunnel is identified in a TLV that has a Remote
   Endpoint sub-TLV, and if the IP address specified in the sub-TLV is
   not reachable from router R, then the tunnel SHOULD be considered
   non-functional.
   Other means of determining whether a given tunnel is functional MAY
   be used; specification of such means is outside the scope of this
   specification. Of course, if a non-functional tunnel later becomes
   functional, router R SHOULD reevaluate its choice of tunnels.

   If router R determines that it cannot use any of the tunnels
   specified in the Tunnel Encapsulation attribute, it MAY either drop
   packet P, or it MAY transmit packet P as it would had the Tunnel
   Encapsulation attribute not been present. This is a matter of local
   policy. By default, the packet SHOULD be transmitted as if the
   Tunnel Encapsulation attribute had not been present.

   A Tunnel Encapsulation attribute may contain several TLVs that all
   specify the same tunnel type. Each TLV should be considered as
   specifying a different tunnel. Two tunnels of the same type may have
   different Remote Endpoint sub-TLVs, different Encapsulation sub-TLVs,
   etc. Choosing between two such tunnels is a matter of local policy.

   Once router R has decided to send packet P through a particular
   tunnel, it encapsulates packet P appropriately and then forwards it
   according to the route that leads to the tunnel's remote endpoint.
   This route may itself be a BGP route with a Tunnel Encapsulation
   attribute. If so, the encapsulated packet is treated as the payload
   and is encapsulated according to the Tunnel Encapsulation attribute
   of that route. That is, tunnels may be "stacked".

5. Routing Considerations

5.1. No Impact on BGP Decision Process

   The presence of the Tunnel Encapsulation attribute does not affect
   the BGP bestpath selection algorithm.

   Under certain circumstances, this may lead to counter-intuitive
   consequences.
   For example, suppose:

   o router R1 receives a BGP UPDATE message from router R2, such that

     * the NLRI of that UPDATE is prefix X,

     * the UPDATE contains a Tunnel Encapsulation attribute specifying
       two tunnels, T1 and T2,

     * R1 cannot use tunnel T1 or tunnel T2, either because the tunnel
       remote endpoint is not reachable or because R1 does not support
       that kind of tunnel

   o router R1 receives a BGP UPDATE message from router R3, such that

     * the NLRI of that UPDATE is prefix X,

     * the UPDATE contains a Tunnel Encapsulation attribute specifying
       two tunnels, T3 and T4,

     * R1 can use at least one of the two tunnels

   Since the Tunnel Encapsulation attribute does not affect bestpath
   selection, R1 may well install the route from R2 rather than the
   route from R3, even though R2's route contains no usable tunnels.

   This possibility must be kept in mind whenever a Remote Endpoint
   sub-TLV carried by a given UPDATE specifies an IP address that is
   different than the next hop of that UPDATE.

5.2. Looping, Infinite Stacking, Etc.

   Consider a packet destined for address X. Suppose a BGP UPDATE for
   address prefix X carries a Tunnel Encapsulation attribute that
   specifies a remote tunnel endpoint of Y. And suppose that a BGP
   UPDATE for address prefix Y carries a Tunnel Encapsulation attribute
   that specifies a Remote Endpoint of X. It is easy to see that this
   will cause an infinite number of encapsulation headers to be put on
   the given packet.

   This could happen as a result of misconfiguration, either accidental
   or intentional. It could also happen if the Tunnel Encapsulation
   attribute were altered by a malicious agent. Implementations should
   be aware of this.

   Improper setting (or malicious altering) of the Tunnel Encapsulation
   attribute could also cause data packets to loop.
   Suppose a BGP UPDATE for address prefix X carries a Tunnel
   Encapsulation attribute that specifies a remote tunnel endpoint of Y.
   Suppose router R receives and processes the update. When router R
   receives a packet destined for X, it will apply the encapsulation and
   send the encapsulated packet to Y. Y will decapsulate the packet and
   forward it further. If Y is further away from X than is router R, it
   is possible that the path from Y to X will traverse R. This would
   cause a long-lasting routing loop.

   These possibilities must also be kept in mind whenever the Remote
   Endpoint for a given prefix differs from the BGP next hop for that
   prefix.

6. Recursive Next Hop Resolution

   Suppose that:

   o a given packet P must be forwarded by router R1;

   o the path along which P is to be forwarded is determined by BGP
     UPDATE U1;

   o UPDATE U1 does not have a Tunnel Encapsulation attribute;

   o the next hop of UPDATE U1 is router R2;

   o the best path to router R2 is a BGP route that was advertised in
     UPDATE U2;

   o UPDATE U2 has a Tunnel Encapsulation attribute.

   Then packet P SHOULD be sent through one of the tunnels identified in
   the Tunnel Encapsulation attribute of UPDATE U2. See Section 4 for
   further details.

   Note that if UPDATE U1 and UPDATE U2 both have Tunnel Encapsulation
   attributes, packet P will be carried through a pair of nested
   tunnels. P will first be encapsulated based on the Tunnel
   Encapsulation attribute of U1. This encapsulated packet then becomes
   the payload, and is encapsulated based on the Tunnel Encapsulation
   attribute of U2. This is another way of "stacking" tunnels (see also
   Section 4).

7. Use of Virtual Network Identifiers and Embedded Labels when Imposing
   a Tunnel Encapsulation

   Three of the tunnel types that can be specified in a Tunnel
   Encapsulation TLV have virtual network identifier fields in their
   encapsulation headers.
   In the VXLAN and VXLAN-GPE encapsulations, this field is called the
   VNI field; in the NVGRE encapsulation, this field is called the VSID
   field.

   When one of these tunnel encapsulations is imposed on a packet, the
   setting of the virtual network identifier field in the encapsulation
   header depends upon the contents of the Encapsulation sub-TLV (if one
   is present). When the Tunnel Encapsulation attribute is being
   carried on a BGP UPDATE of a labeled address family, the setting of
   the virtual network identifier field also depends upon the contents
   of the Embedded Label Handling sub-TLV (if present).

   This section specifies the procedures for choosing the value to set
   in the virtual network identifier field of the encapsulation header.
   These procedures apply only when the tunnel type is VXLAN, VXLAN-GPE,
   or NVGRE.

7.1. Unlabeled Address Families

   This sub-section applies when:

   o the Tunnel Encapsulation attribute is carried on a BGP UPDATE of
     an unlabeled address family, and

   o at least one of the attribute's TLVs identifies a tunnel type that
     uses a virtual network identifier, and

   o it has been determined to send a packet through one of those
     tunnels.

   If the TLV identifying the tunnel contains an Encapsulation sub-TLV
   whose V bit is set, the virtual network identifier field of the
   encapsulation header is set to the value of the virtual network
   identifier field of the Encapsulation sub-TLV.

   Otherwise, the virtual network identifier field of the encapsulation
   header is set to a configured value; if there is no configured value,
   the tunnel cannot be used.

7.2. Labeled Address Families

   This sub-section applies when:

   o the Tunnel Encapsulation attribute is carried on a BGP UPDATE of a
     labeled address family, and

   o at least one of the attribute's TLVs identifies a tunnel type that
     uses a virtual network identifier, and

   o it has been determined to send a packet through one of those
     tunnels.

7.2.1. When a Valid VNI has been Signaled

   If the TLV identifying the tunnel contains an Encapsulation sub-TLV
   whose V bit is set, the virtual network identifier field of the
   encapsulation header is set as follows:

   o If the TLV contains an Embedded Label Handling sub-TLV whose value
     is 1, then the virtual network identifier field of the
     encapsulation header is set to the value of the virtual network
     identifier field of the Encapsulation sub-TLV.

     The embedded label (from the NLRI of the route that is carrying
     the Tunnel Encapsulation attribute) appears at the top of the MPLS
     label stack in the encapsulation payload.

   o If the TLV does not contain an Embedded Label Handling sub-TLV, or
     if it contains an Embedded Label Handling sub-TLV whose value is 2,
     the embedded label is ignored entirely, and the virtual network
     identifier field of the encapsulation header is set to the value
     of the virtual network identifier field of the Encapsulation
     sub-TLV.

7.2.2. When a Valid VNI has not been Signaled

   If the TLV identifying the tunnel does not contain an Encapsulation
   sub-TLV whose V bit is set, the virtual network identifier field of
   the encapsulation header is set as follows:

   o If the TLV contains an Embedded Label Handling sub-TLV whose value
     is 1, then the virtual network identifier field of the
     encapsulation header is set to a configured value.

     If there is no configured value, the tunnel cannot be used.

     The embedded label (from the NLRI of the route that is carrying
     the Tunnel Encapsulation attribute) appears at the top of the MPLS
     label stack in the encapsulation payload.

   o If the TLV does not contain an Embedded Label Handling sub-TLV, or
     if it contains an Embedded Label Handling sub-TLV whose value is
     2, the embedded label is copied into the virtual network
     identifier field of the encapsulation header.

     The embedded label does not appear in the MPLS label stack of the
     payload.

7.2.3. Applicability Restrictions

   In a given UPDATE of a labeled address family, the label embedded in
   the NLRI is generally a label that is meaningful only to the router
   whose address appears as the next hop. Certain of the procedures of
   Section 7.2.1 or Section 7.2.2 cause the embedded label to be carried
   by a data packet to the router whose address appears in the Remote
   Endpoint sub-TLV. If the Remote Endpoint sub-TLV does not identify
   the same router that is the next hop, sending the packet through the
   tunnel may cause the label to be misinterpreted at the tunnel's
   remote endpoint. This may cause misdelivery of the packet.

   Therefore the embedded label MUST NOT be carried by a data packet
   traveling through a tunnel unless it is known that the label will be
   properly interpreted at the tunnel's remote endpoint. How this is
   known is outside the scope of this document.

   Note that if the Tunnel Encapsulation attribute is attached to a
   VPN-IP route [RFC4364], and if Inter-AS "option b" (see section 10 of
   [RFC4364]) is being used, and if the Remote Endpoint sub-TLV contains
   an IP address that is not in the same AS as the router receiving the
   route, it is very likely that the embedded label has been changed.
   Therefore use of the Tunnel Encapsulation attribute in an "Inter-AS
   option b" scenario is not supported.

8. Scoping

   The Tunnel Encapsulation attribute is defined as a transitive
   attribute, so that it may be passed along by BGP speakers that do not
   recognize it. However, it is intended that the Tunnel Encapsulation
   attribute be used only within a well-defined scope, e.g., within a
   set of Autonomous Systems that belong to a single administrative
   entity. If the attribute is distributed beyond its intended scope,
   packets may be sent through tunnels in a manner that is not intended.

   To prevent the Tunnel Encapsulation attribute from being distributed
   beyond its intended scope, any BGP speaker that understands the
   attribute MUST be able to filter the attribute from incoming BGP
   UPDATE messages. When the attribute is filtered from an incoming
   UPDATE, the attribute is neither processed nor redistributed. This
   filtering SHOULD be possible on a per-BGP-session basis. For each
   session, filtering of the attribute on incoming UPDATEs MUST be
   enabled by default.

   In addition, any BGP speaker that understands the attribute MUST be
   able to filter the attribute from outgoing BGP UPDATE messages. This
   filtering SHOULD be possible on a per-BGP-session basis. For each
   session, filtering of the attribute on outgoing UPDATEs MUST be
   enabled by default.

9. Error Handling

   The Tunnel Encapsulation attribute is a sequence of TLVs, each of
   which is a sequence of sub-TLVs. The final octet of a TLV is
   determined by its length field. Similarly, the final octet of a
   sub-TLV is determined by its length field. The final octet of a TLV
   must also be the final octet of its final sub-TLV. If this is not
   the case, the TLV MUST be considered malformed. A TLV that is found
   to be malformed for this reason MUST NOT be processed, and MUST be
   stripped from the Tunnel Encapsulation attribute before
   redistribution.
   Subsequent TLVs in the Tunnel Encapsulation attribute may still be
   valid, in which case they MUST be processed and redistributed
   normally.

   If a Tunnel Encapsulation attribute does not have any valid TLVs, or
   it does not have the transitive bit set, the "Attribute Discard"
   procedure of [ERRORS] is applied.

   If a Tunnel Encapsulation attribute can be parsed correctly, but
   contains a TLV that is not recognized (i.e., the tunnel type is not
   recognized) by a particular BGP speaker, the attribute is NOT
   considered to be malformed. The unrecognized TLV MUST be ignored,
   and the BGP speaker MUST interpret the attribute as if the
   unrecognized TLV had not been present. If the route carrying the
   Tunnel Encapsulation attribute is redistributed with the attribute,
   the unrecognized TLV SHOULD remain in the attribute.

   If a TLV of a Tunnel Encapsulation attribute contains a sub-TLV that
   is not recognized by a particular BGP speaker, the BGP speaker SHOULD
   process that TLV as if the unrecognized sub-TLV had not been present.
   If the route carrying the Tunnel Encapsulation attribute is
   redistributed with the attribute, the unrecognized sub-TLV SHOULD
   remain in the attribute.

   In general, if a TLV contains a sub-TLV that is malformed (e.g.,
   contains a length field whose value is not legal for that sub-TLV),
   the sub-TLV should be treated as if it were an unrecognized sub-TLV.
   This document specifies one exception to this rule: if a TLV
   contains a malformed Remote Endpoint sub-TLV (as defined in
   Section 2.1), the entire TLV MUST be ignored, and SHOULD be removed
   from the Tunnel Encapsulation attribute before the route carrying
   that attribute is redistributed.

   A TLV that does not contain the Remote Endpoint sub-TLV MUST be
   treated as if it contained a malformed Remote Endpoint sub-TLV.
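
   The length-consistency rule at the start of this section can be
   illustrated with a short Python sketch. It is not a normative
   parser. It assumes the [RFC5512] layout (a two-octet tunnel type and
   two-octet length for each TLV, and a one-octet type and one-octet
   length for each sub-TLV). The function names, the sub-TLV type 200
   in the usage lines, and the choice to stop parsing when a TLV's own
   length runs past the end of the attribute are assumptions of the
   sketch. Checks specific to the Remote Endpoint sub-TLV are not
   modeled.

```python
import struct
from typing import List, Optional, Tuple

SubTLV = Tuple[int, bytes]        # (sub-TLV type, value)
TLV = Tuple[int, List[SubTLV]]    # (tunnel type, sub-TLVs)


def parse_sub_tlvs(value: bytes) -> Optional[List[SubTLV]]:
    """Split a TLV value into sub-TLVs.

    Returns None when the final sub-TLV does not end exactly on the
    final octet of the TLV, i.e. when the TLV is malformed.
    """
    sub_tlvs: List[SubTLV] = []
    pos = 0
    while pos < len(value):
        if pos + 2 > len(value):          # truncated sub-TLV header
            return None
        sub_type, sub_len = value[pos], value[pos + 1]
        end = pos + 2 + sub_len
        if end > len(value):              # sub-TLV overruns the TLV
            return None
        sub_tlvs.append((sub_type, value[pos + 2:end]))
        pos = end
    return sub_tlvs


def parse_attribute(data: bytes) -> List[TLV]:
    """Return the well-formed TLVs of an attribute value.

    Malformed TLVs are dropped (neither processed nor redistributed);
    later TLVs are still examined. An empty result corresponds to the
    case in which "Attribute Discard" would apply.
    """
    valid: List[TLV] = []
    pos = 0
    while pos + 4 <= len(data):
        tunnel_type, length = struct.unpack_from("!HH", data, pos)
        end = pos + 4 + length
        if end > len(data):
            # Assumption of this sketch: the rest cannot be delimited.
            break
        sub_tlvs = parse_sub_tlvs(data[pos + 4:end])
        if sub_tlvs is not None:
            valid.append((tunnel_type, sub_tlvs))
        pos = end
    return valid


# Usage: a malformed TLV (tunnel type 9) followed by a valid one (type 8).
bad = bytes.fromhex("00090003c80500")     # sub-TLV claims 5 octets, 1 present
good = bytes.fromhex("00080004c80212b5")  # one sub-TLV, two-octet value
print(parse_attribute(bad + good))
```

   Keeping the per-TLV check separate from the attribute-level loop
   mirrors the text: a length mismatch invalidates only the TLV in
   which it occurs.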

   A TLV identifying a particular tunnel type may contain a sub-TLV that
   is meaningless for that tunnel type. For example, perhaps the TLV
   contains a "UDP Destination Port" sub-TLV, but the identified tunnel
   type does not use UDP encapsulation at all. Sub-TLVs of this sort
   SHOULD be treated as no-ops. That is, they SHOULD NOT affect the
   creation of the encapsulation header. However, the sub-TLV MUST NOT
   be considered to be malformed, and MUST NOT be removed from the TLV
   before the route carrying the Tunnel Encapsulation attribute is
   redistributed.

   There is no significance to the order in which the TLVs occur within
   the Tunnel Encapsulation attribute. Multiple TLVs may occur for a
   given tunnel type; each such TLV is regarded as describing a
   different tunnel.

10. IANA Considerations

   IANA is requested to modify the "Subsequent Address Family
   Identifiers" registry to indicate that the Encapsulation SAFI is
   deprecated. This document should be the reference.

   IANA is requested to change the registration policy of the "BGP
   Tunnel Encapsulation Attribute Sub-TLVs" registry to the following:

   o The values 0 and 255 are reserved.

   o The values in the range 1-127 are to be allocated using the
     "Standards Action" registration procedure.

   o The values in the range 128-251 are to be allocated using the
     "First Come, First Served" registration procedure.

   o The values in the range 252-254 are reserved for experimental use;
     IANA shall not allocate values from this range.

   IANA is requested to assign a codepoint from the "BGP Tunnel
   Encapsulation Attribute Sub-TLVs" registry for "Remote Endpoint",
   with this document being the reference.

   IANA is requested to assign a codepoint from the "BGP Tunnel
   Encapsulation Attribute Sub-TLVs" registry for "IPv4 DS Field", with
   this document being the reference.

   IANA is requested to assign a codepoint from the "BGP Tunnel
   Encapsulation Attribute Sub-TLVs" registry for "UDP Destination
   Port", with this document being the reference.

   IANA is requested to assign a codepoint from the "BGP Tunnel
   Encapsulation Attribute Sub-TLVs" registry for "Embedded Label
   Handling", with this document being the reference.

   IANA is requested to assign a codepoint from the "BGP Tunnel
   Encapsulation Tunnel Types" registry for "GTP".

   IANA is requested to add this document as a reference for tunnel
   types 8 (VXLAN), 9 (NVGRE), 11 (MPLS-in-GRE), and 12 (VXLAN-GPE) in
   the "BGP Tunnel Encapsulation Tunnel Types" registry.

11. Security Considerations

   The Tunnel Encapsulation attribute can cause traffic to be diverted
   from its normal path, especially when the Remote Endpoint sub-TLV is
   used. This can have serious consequences if the attribute is added
   or modified illegitimately, as it enables traffic to be "hijacked".

   The Remote Endpoint sub-TLV contains both an IP address and an AS
   number. BGP Origin Validation [RFC6811] can be used to obtain
   assurance that the given IP address belongs to the given AS. While
   this provides some protection against misconfiguration, it does not
   prevent a malicious agent from inserting a sub-TLV that will appear
   valid.

   Before sending a packet through the tunnel identified in a particular
   TLV of a Tunnel Encapsulation attribute, it may be advisable to use
   BGP Origin Validation to obtain the following additional assurances:

   o the origin AS of the route carrying the Tunnel Encapsulation
     attribute is correct;

   o the origin AS of the route to the IP address specified in the
     Remote Endpoint sub-TLV is correct, and is the same AS that is
     specified in the Remote Endpoint sub-TLV.

   One then has some level of assurance that the tunneled traffic is
   going to the same destination AS that it would have gone to had the
   Tunnel Encapsulation attribute not been present. However, this may
   not suit all use cases, and in any event is not very strong
   protection against hijacking.

   For these reasons, BGP Origin Validation should not be relied upon
   exclusively, and the filtering procedures of Section 8 should always
   be in place.

   Increased protection can be obtained by using BGP Path Validation
   [BGPSEC] to ensure that the route carrying the Tunnel Encapsulation
   attribute, and the routes to the Remote Endpoint of each specified
   tunnel, have not been altered illegitimately.

   If BGP Origin Validation is used as specified above, and the tunnel
   specified in a particular TLV of a Tunnel Encapsulation attribute is
   therefore regarded as "suspicious", that tunnel should not be used.
   Other tunnels specified in (other TLVs of) the Tunnel Encapsulation
   attribute may still be used.

12. Acknowledgments

   The authors wish to thank Ron Bonica, John Drake, Satoru Matsushima,
   Dhananjaya Rao, John Scudder, Ravi Singh, Thomas Morin, and Xiaohu Xu
   for their review, comments, and/or helpful discussions.

13. Contributor Addresses

   Below is a list of other contributing authors in alphabetical order:

      Randy Bush
      Internet Initiative Japan
      5147 Crystal Springs
      Bainbridge Island, Washington 98110
      United States

      Email: randy@psg.com

      Robert Raszuk
      Mirantis Inc.
      615 National Ave. #100
      Mountain View, California 94043
      United States

      Email: robert@raszuk.net

14. References

14.1. Normative References

   [ERRORS]   Chen, E., Scudder, J., Mohapatra, P., and K. Patel,
              "Revised Error Handling for BGP UPDATE Messages",
              internet-draft draft-ietf-idr-error-handling-19, April
              2015.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC5512]  Mohapatra, P. and E. Rosen, "The BGP Encapsulation
              Subsequent Address Family Identifier (SAFI) and the BGP
              Tunnel Encapsulation Attribute", RFC 5512,
              DOI 10.17487/RFC5512, April 2009.

14.2. Informative References

   [BGPSEC]   Lepinski, M. and S. Turner, "An Overview of BGPsec",
              internet-draft draft-ietf-sidr-bgpsec-overview, January
              2015.

   [EVPN-Inter-Subnet]
              Sajassi, A., "Integrated Routing and Bridging in EVPN",
              internet-draft draft-ietf-bess-evpn-inter-subnet-
              forwarding, November 2014.

   [GTP-U]    3GPP, "GPRS Tunneling Protocol User Plane, TS 29.281",
              2014.

   [NVGRE]    Garg, P. and Y. Wang, "NVGRE: Network Virtualization using
              Generic Routing Encapsulation", internet-draft draft-
              sridharan-virtualization-nvgre, April 2015.

   [RFC2474]  Nichols, K., Blake, S., Baker, F., and D. Black,
              "Definition of the Differentiated Services Field (DS
              Field) in the IPv4 and IPv6 Headers", RFC 2474,
              DOI 10.17487/RFC2474, December 1998.

   [RFC2784]  Farinacci, D., Li, T., Hanks, S., Meyer, D., and P.
              Traina, "Generic Routing Encapsulation (GRE)", RFC 2784,
              DOI 10.17487/RFC2784, March 2000.

   [RFC2890]  Dommety, G., "Key and Sequence Number Extensions to GRE",
              RFC 2890, DOI 10.17487/RFC2890, September 2000.

   [RFC4023]  Worster, T., Rekhter, Y., and E. Rosen, Ed.,
              "Encapsulating MPLS in IP or Generic Routing Encapsulation
              (GRE)", RFC 4023, DOI 10.17487/RFC4023, March 2005.

   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
              Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February
              2006.

   [RFC6811]  Mohapatra, P., Scudder, J., Ward, D., Bush, R., and R.
              Austein, "BGP Prefix Origin Validation", RFC 6811,
              DOI 10.17487/RFC6811, January 2013.

   [RFC7348]  Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
              L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
              eXtensible Local Area Network (VXLAN): A Framework for
              Overlaying Virtualized Layer 2 Networks over Layer 3
              Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014.

   [RFC7510]  Xu, X., Sheth, N., Yong, L., Callon, R., and D. Black,
              "Encapsulating MPLS in UDP", RFC 7510,
              DOI 10.17487/RFC7510, April 2015.

   [vEPC]     Matsushima, S. and R. Wakikawa, "Stateless User-Plane
              Architecture for Virtualized EPC", internet-draft draft-
              matsushima-stateless-uplane-vepc-04, March 2015.

   [VXLAN-GPE]
              Quinn, P., Manur, R., Kreeger, L., Lewis, D., Maino, F.,
              Smith, M., Agarwal, P., Xu, X., Elzur, U., Garg, P.,
              Melman, D., and R. Manur, "Generic Protocol Extension for
              VXLAN", internet-draft draft-ietf-nvo3-vxlan-gpe, May
              2015.

Authors' Addresses

   Eric C. Rosen (editor)
   Juniper Networks, Inc.
   10 Technology Park Drive
   Westford, Massachusetts 01886
   United States

   Email: erosen@juniper.net

   Keyur Patel
   Cisco Systems
   170 W. Tasman Drive
   San Jose, CA 95134
   United States

   Email: keyupate@cisco.com

   Gunter Van de Velde
   Alcatel-Lucent
   Copernicuslaan 50
   Antwerpen 2018
   Belgium

   Email: gunter.van_de_velde@alcatel-lucent.com