IDR Working Group                                               K. Patel
Internet-Draft                                               Arrcus, Inc
Obsoletes: 5512 (if approved)                            G. Van de Velde
Intended status: Standards Track                                   Nokia
Expires: November 21, 2019                                     S. Sangli
                                                   Juniper Networks, Inc
                                                                E. Rosen
                                                            May 20, 2019

                 The BGP Tunnel Encapsulation Attribute
                  draft-ietf-idr-tunnel-encaps-12.txt

Abstract

   RFC 5512 defines a BGP Path Attribute known as the "Tunnel
   Encapsulation Attribute". This attribute allows one to specify a set
   of tunnels. For each such tunnel, the attribute can provide the
   information needed to create the tunnel and the corresponding
   encapsulation header. The attribute can also provide information
   that aids in choosing whether a particular packet is to be sent
   through a particular tunnel. RFC 5512 states that the attribute is
   only carried in BGP UPDATEs that have the "Encapsulation Subsequent
   Address Family (Encapsulation SAFI)". This document deprecates the
   Encapsulation SAFI (which has never been used in production), and
   specifies semantics for the attribute when it is carried in UPDATEs
   of certain other SAFIs. This document adds support for additional
   tunnel types, and allows a remote tunnel endpoint address to be
   specified for each tunnel. This document also provides support for
   specifying fields of any inner or outer encapsulations that may be
   used by a particular tunnel.

   This document obsoletes RFC 5512.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 21, 2019.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Brief Summary of RFC 5512
     1.2.  Deficiencies in RFC 5512
     1.3.  Brief Summary of Changes from RFC 5512
     1.4.  Impact on RFC 5566
   2.  The Tunnel Encapsulation Attribute
   3.  Tunnel Encapsulation Attribute Sub-TLVs
     3.1.  The Remote Endpoint Sub-TLV
     3.2.  Encapsulation Sub-TLVs for Particular Tunnel Types
       3.2.1.  VXLAN
       3.2.2.  VXLAN-GPE
       3.2.3.  NVGRE
       3.2.4.  L2TPv3
       3.2.5.  GRE
       3.2.6.
               MPLS-in-GRE
       3.2.7.  IP-in-IP
     3.3.  Outer Encapsulation Sub-TLVs
       3.3.1.  IPv4 DS Field
       3.3.2.  UDP Destination Port
     3.4.  Sub-TLVs for Aiding Tunnel Selection
       3.4.1.  Protocol Type Sub-TLV
       3.4.2.  Color Sub-TLV
     3.5.  Embedded Label Handling Sub-TLV
     3.6.  MPLS Label Stack Sub-TLV
     3.7.  Prefix-SID Sub-TLV
   4.  Extended Communities Related to the Tunnel Encapsulation
       Attribute
     4.1.  Encapsulation Extended Community
     4.2.  Router's MAC Extended Community
     4.3.  Color Extended Community
   5.  Semantics and Usage of the Tunnel Encapsulation attribute
   6.  Routing Considerations
     6.1.  Impact on BGP Decision Process
     6.2.  Looping, Infinite Stacking, Etc.
   7.  Recursive Next Hop Resolution
   8.  Use of Virtual Network Identifiers and Embedded Labels when
       Imposing a Tunnel Encapsulation
     8.1.  Tunnel Types without a Virtual Network Identifier Field
     8.2.  Tunnel Types with a Virtual Network Identifier Field
       8.2.1.  Unlabeled Address Families
       8.2.2.  Labeled Address Families
   9.  Applicability Restrictions
   10. Scoping
   11. Error Handling
   12. IANA Considerations
     12.1.  Subsequent Address Family Identifiers
     12.2.  BGP Path Attributes
     12.3.  Extended Communities
     12.4.  BGP Tunnel Encapsulation Attribute Sub-TLVs
     12.5.  Tunnel Types
     12.6.  Flags Field of Vxlan Encapsulation sub-TLV
     12.7.  Flags Field of Vxlan-GPE Encapsulation sub-TLV
     12.8.  Flags Field of NVGRE Encapsulation sub-TLV
     12.9.  Embedded Label Handling sub-TLV
   13. Security Considerations
   14. Acknowledgments
   15. Contributor Addresses
   16. References
     16.1.  Normative References
     16.2.  Informative References
   Authors' Addresses

1.  Introduction

   This document obsoletes RFC 5512. The deficiencies of RFC 5512, and
   a summary of the changes made, are discussed in Sections 1.1-1.3.
   The material from RFC 5512 that is retained has been incorporated
   into this document.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

1.1.  Brief Summary of RFC 5512

   [RFC5512] defines a BGP Path Attribute known as the Tunnel
   Encapsulation attribute. This attribute consists of one or more
   TLVs.
   Each TLV identifies a particular type of tunnel. Each TLV
   also contains one or more sub-TLVs. Some of the sub-TLVs, e.g., the
   "Encapsulation sub-TLV", contain information that may be used to form
   the encapsulation header for the specified tunnel type. Other sub-
   TLVs, e.g., the "color sub-TLV" and the "protocol sub-TLV", contain
   information that aids in determining whether particular packets
   should be sent through the tunnel that the TLV identifies.

   [RFC5512] only allows the Tunnel Encapsulation attribute to be
   attached to BGP UPDATE messages of the Encapsulation Address Family.
   These UPDATE messages have an AFI (Address Family Identifier) of 1 or
   2, and a SAFI of 7. In an UPDATE of the Encapsulation SAFI, the NLRI
   (Network Layer Reachability Information) is an address of the BGP
   speaker originating the UPDATE. Consider the following scenario:

   o  BGP speaker R1 has received and installed UPDATE U;

   o  UPDATE U's SAFI is the Encapsulation SAFI;

   o  UPDATE U has the address R2 as its NLRI;

   o  UPDATE U has a Tunnel Encapsulation attribute;

   o  R1 has a packet, P, to transmit to destination D;

   o  R1's best path to D is a BGP route that has R2 as its next hop.

   In this scenario, when R1 transmits packet P, it should transmit it
   to R2 through one of the tunnels specified in U's Tunnel
   Encapsulation attribute. The IP address of the remote endpoint of
   each such tunnel is R2. Packet P is known as the tunnel's "payload".

1.2.  Deficiencies in RFC 5512

   While the ability to specify tunnel information in a BGP UPDATE is
   useful, the procedures of [RFC5512] have certain limitations:

   o  The requirement to use the "Encapsulation SAFI" presents an
      unfortunate operational cost, as each BGP session that may need to
      carry tunnel encapsulation information needs to be reconfigured to
      support the Encapsulation SAFI.
      The Encapsulation SAFI has never
      been used, and this requirement has served only to discourage the
      use of the Tunnel Encapsulation attribute.

   o  There is no way to use the Tunnel Encapsulation attribute to
      specify the remote endpoint address of a given tunnel; [RFC5512]
      assumes that the remote endpoint of each tunnel is specified as
      the NLRI of an UPDATE of the Encapsulation-SAFI.

   o  If the respective best paths to two different address prefixes
      have the same next hop, [RFC5512] does not provide a
      straightforward method to associate each prefix with a different
      tunnel.

   o  If a particular tunnel type requires an outer IP or UDP
      encapsulation, there is no way to signal the values of any of the
      fields of the outer encapsulation.

   o  In [RFC5512]'s specification of the sub-TLVs, each sub-TLV has
      a one-octet length field. In some cases, a two-octet length field
      may be needed.

1.3.  Brief Summary of Changes from RFC 5512

   In this document we address these deficiencies by:

   o  Deprecating the Encapsulation SAFI.

   o  Defining a new "Remote Endpoint Address sub-TLV" that can be
      included in any of the TLVs contained in the Tunnel Encapsulation
      attribute. This sub-TLV can be used to specify the remote
      endpoint address of a particular tunnel.

   o  Allowing the Tunnel Encapsulation attribute to be carried by BGP
      UPDATEs of additional AFI/SAFIs. Appropriate semantics are
      provided for this way of using the attribute.

   o  Defining a number of new sub-TLVs that provide additional
      information that is useful when forming the encapsulation header
      used to send a packet through a particular tunnel.

   o  Defining the sub-TLV type field so that a sub-TLV whose type is in
      the range from 0 to 127 inclusive has a one-octet length field,
      but a sub-TLV whose type is in the range from 128 to 255 inclusive
      has a two-octet length field.
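
   The length-field rule in the last bullet above can be sketched as
   follows. This is a minimal illustration in Python and not part of
   the specification. The function names sub_tlv_length_width and
   parse_sub_tlvs are hypothetical, and the sub-TLV type values in the
   usage note are arbitrary, chosen only to exercise both ranges.

```python
def sub_tlv_length_width(sub_tlv_type: int) -> int:
    """Return the size in octets of the length field for a sub-TLV type."""
    if not 0 <= sub_tlv_type <= 255:
        raise ValueError("sub-TLV type must fit in one octet")
    # Types 0-127 use a one-octet length; types 128-255 use two octets.
    return 1 if sub_tlv_type <= 127 else 2


def parse_sub_tlvs(value: bytes) -> list[tuple[int, bytes]]:
    """Split a TLV value field into (type, value) pairs.

    Illustrative only: raises ValueError on truncated input instead of
    applying any particular error-handling policy.
    """
    result = []
    offset = 0
    while offset < len(value):
        sub_type = value[offset]
        width = sub_tlv_length_width(sub_type)
        start = offset + 1 + width
        if start > len(value):
            raise ValueError("truncated sub-TLV length field")
        length = int.from_bytes(value[offset + 1:start], "big")
        end = start + length
        if end > len(value):
            raise ValueError("sub-TLV value runs past end of TLV")
        result.append((sub_type, value[start:end]))
        offset = end
    return result
```

   For example, calling parse_sub_tlvs on the octets 04 02 AA BB
   followed by 80 00 03 01 02 03 yields one sub-TLV of type 4 with a
   two-octet value and one sub-TLV of type 128 with a three-octet
   value.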

   One of the sub-TLVs defined in [RFC5512] is the "Encapsulation sub-
   TLV". For a given tunnel, the encapsulation sub-TLV specifies some
   of the information needed to construct the encapsulation header used
   when sending packets through that tunnel. This document defines
   encapsulation sub-TLVs for a number of tunnel types not discussed in
   [RFC5512]: VXLAN (Virtual Extensible Local Area Network, [RFC7348]),
   VXLAN-GPE (Generic Protocol Extension for VXLAN,
   [I-D.ietf-nvo3-vxlan-gpe]), NVGRE (Network Virtualization Using
   Generic Routing Encapsulation [RFC7637]), and MPLS-in-GRE (MPLS in
   Generic Routing Encapsulation [RFC2784], [RFC2890], [RFC4023]).
   MPLS-in-UDP [RFC7510] is also supported, but an Encapsulation sub-TLV
   for it is not needed.

   Some of the encapsulations mentioned in the previous paragraph need
   to be further encapsulated inside UDP and/or IP. [RFC5512] provides
   no way to specify that certain information is to appear in these
   outer IP and/or UDP encapsulations. This document provides a
   framework for including such information in the TLVs of the Tunnel
   Encapsulation attribute.

   When the Tunnel Encapsulation attribute is attached to a BGP UPDATE
   whose AFI/SAFI identifies one of the labeled address families, it is
   not always obvious whether the label embedded in the NLRI is to
   appear somewhere in the tunnel encapsulation header (and if so,
   where), or whether it is to appear in the payload, or whether it can
   be omitted altogether. This is especially true if the tunnel
   encapsulation header itself contains a "virtual network identifier".
   This document provides a mechanism that allows one to signal (by
   using sub-TLVs of the Tunnel Encapsulation attribute) how one wants
   to use the embedded label when the tunnel encapsulation has its own
   virtual network identifier field.

   [RFC5512] defines a Tunnel Encapsulation Extended Community that can
   be used instead of the Tunnel Encapsulation attribute under certain
   circumstances. This document addresses the issue of how to handle a
   BGP UPDATE that carries both a Tunnel Encapsulation attribute and one
   or more Tunnel Encapsulation Extended Communities.

1.4.  Impact on RFC 5566

   [RFC5566] uses the mechanisms defined in [RFC5512]. While this
   document obsoletes [RFC5512], it does not address the issue of how to
   use the mechanisms of [RFC5566] without also using the Encapsulation
   SAFI. Those issues are considered to be outside the scope of this
   document.

2.  The Tunnel Encapsulation Attribute

   The Tunnel Encapsulation attribute is an optional transitive BGP Path
   attribute. IANA has assigned the value 23 as the type code of the
   attribute. The attribute is composed of a set of Type-Length-Value
   (TLV) encodings. Each TLV contains information corresponding to a
   particular tunnel type. A TLV is structured as shown in Figure 1:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |    Tunnel Type (2 Octets)     |        Length (2 Octets)      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                                                               |
    |                             Value                             |
    |                                                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             Figure 1: Tunnel Encapsulation TLV Value Field

   o  Tunnel Type (2 octets): identifies a type of tunnel. The field
      contains values from the IANA Registry "BGP Tunnel Encapsulation
      Attribute Tunnel Types".

      Note that for tunnel types whose names are of the form "X-in-Y",
      e.g., "MPLS-in-GRE", only packets of the specified payload type
      "X" are to be carried through the tunnel of type "Y".
      This is the
      equivalent of specifying a tunnel type "Y" and including in its
      TLV a Protocol Type sub-TLV (see Section 3.4.1) specifying
      protocol "X". If the tunnel type is "X-in-Y", it is unnecessary,
      though harmless, to include a Protocol Type sub-TLV specifying
      "X".

   o  Length (2 octets): the total number of octets of the value field.

   o  Value (variable): comprised of multiple sub-TLVs.

   Each sub-TLV consists of three fields: a 1-octet type, a 1-octet or
   2-octet length field (depending on the type), and zero or more octets
   of value. A sub-TLV is structured as shown in Table 1:

                   +--------------------------------+
                   | Sub-TLV Type (1 Octet)         |
                   +--------------------------------+
                   | Sub-TLV Length (1 or 2 Octets) |
                   +--------------------------------+
                   | Sub-TLV Value (Variable)       |
                   +--------------------------------+

              Table 1: Tunnel Encapsulation Sub-TLV Format

   o  Sub-TLV Type (1 octet): each sub-TLV type defines a certain
      property about the tunnel TLV that contains this sub-TLV.

   o  Sub-TLV Length (1 or 2 octets): the total number of octets of the
      sub-TLV value field. The Sub-TLV Length field contains 1 octet if
      the Sub-TLV Type field contains a value in the range from 0 to
      127. The Sub-TLV Length field contains two octets if the Sub-TLV
      Type field contains a value in the range from 128 to 255.

   o  Sub-TLV Value (variable): encodings of the value field depend on
      the sub-TLV type as enumerated above. The following sub-sections
      define the encoding in detail.

3.  Tunnel Encapsulation Attribute Sub-TLVs

   In this section, we specify a number of sub-TLVs. These sub-TLVs can
   be included in a TLV of the Tunnel Encapsulation attribute.

3.1.  The Remote Endpoint Sub-TLV

   The Remote Endpoint sub-TLV is a sub-TLV whose value field contains
   three sub-fields:

   1.  a four-octet Autonomous System (AS) number sub-field

   2.  a two-octet Address Family sub-field

   3.
       an address sub-field, whose length depends upon the Address
       Family.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                   Autonomous System Number                    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |        Address Family         |            Address            ~
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
    ~                                                               ~
    |                                                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure 2: Remote Endpoint Sub-TLV Value Field

   The Address Family subfield contains a value from IANA's "Address
   Family Numbers" registry. In this document, we assume that the
   Address Family is either IPv4 or IPv6; use of other address families
   is outside the scope of this document.

   If the Address Family subfield contains the value for IPv4, the
   address subfield must contain an IPv4 address (a /32 IPv4 prefix).
   In this case, the length field of Remote Endpoint sub-TLV must
   contain the value 10 (0xa).

   If the Address Family subfield contains the value for IPv6, the
   address sub-field must contain an IPv6 address (a /128 IPv6 prefix).
   In this case, the length field of Remote Endpoint sub-TLV must
   contain the value 22 (0x16). IPv6 link local addresses are not valid
   values of the IP address field.

   In a given BGP UPDATE, the address family (IPv4 or IPv6) of a Remote
   Endpoint sub-TLV is independent of the address family of the UPDATE
   itself. For example, an UPDATE whose NLRI is an IPv4 address may
   have a Tunnel Encapsulation attribute containing Remote Endpoint sub-
   TLVs that contain IPv6 addresses. Also, different tunnels
   represented in the Tunnel Encapsulation attribute may have Remote
   Endpoints of different address families.

   A two-octet AS number can be carried in the AS number field by
   setting the two high order octets to zero, and carrying the number in
   the two low order octets of the field.
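
   The encoding of the value field described above can be sketched as
   follows. This is an illustrative Python sketch, not a normative
   encoder. The function name encode_remote_endpoint_value is
   hypothetical. The constants 1 and 2 are the IANA Address Family
   Numbers for IPv4 and IPv6. The addresses and AS number in the usage
   note are documentation and private-use values picked for the
   example.

```python
import ipaddress
import struct

AFI_IPV4 = 1  # IANA Address Family Number for IPv4
AFI_IPV6 = 2  # IANA Address Family Number for IPv6


def encode_remote_endpoint_value(asn: int, address: str) -> bytes:
    """Build the value field: 4-octet AS, 2-octet AFI, then the address."""
    if not 0 <= asn <= 0xFFFFFFFF:
        raise ValueError("AS number must fit in four octets")
    ip = ipaddress.ip_address(address)
    if ip.version == 6 and ip.is_link_local:
        raise ValueError("IPv6 link-local addresses are not valid here")
    afi = AFI_IPV4 if ip.version == 4 else AFI_IPV6
    # "!I" packs the AS number in network byte order, so a two-octet AS
    # number naturally ends up with its two high-order octets zero.
    return struct.pack("!IH", asn, afi) + ip.packed
```

   With these assumptions, encoding AS 64512 and 192.0.2.1 gives a
   10-octet value beginning 00 00 FC 00, and encoding an IPv6 address
   such as 2001:db8::1 gives a 22-octet value, matching the lengths
   stated above.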

   The AS number in the sub-TLV MUST be the number of the AS to which
   the IP address in the sub-TLV belongs.

   There is one special case: the Remote Endpoint sub-TLV MAY have a
   value field whose Address Family subfield contains 0. This means
   that the tunnel's remote endpoint is the UPDATE's BGP next hop. If
   the Address Family subfield contains 0, the Address subfield is
   omitted, and the Autonomous System number field is set to 0.

   If any of the following conditions hold, the Remote Endpoint sub-TLV
   is considered to be "malformed":

   o  The sub-TLV contains the value for IPv4 in its Address Family
      subfield, but the length of the sub-TLV's value field is other
      than 10 (0xa).

   o  The sub-TLV contains the value for IPv6 in its Address Family
      subfield, but the length of the sub-TLV's value field is other
      than 22 (0x16).

   o  The sub-TLV contains the value zero in its Address Family field,
      but the length of the sub-TLV's value field is other than 6, or
      the Autonomous System subfield is not set to zero.

   o  The IP address in the sub-TLV's address subfield is not a valid IP
      address (e.g., it's an IPv4 broadcast address).

   o  It can be determined that the IP address in the sub-TLV's address
      subfield does not belong to the non-zero AS whose number is in
      its Autonomous System subfield. (See Section 13 for
      discussion of one way to determine this.)

   If the Remote Endpoint sub-TLV is malformed, the TLV containing it is
   also considered to be malformed, and the entire TLV MUST be ignored.
   However, the Tunnel Encapsulation attribute MUST NOT be considered to
   be malformed in this case; other TLVs in the attribute MUST be
   processed (if they can be parsed correctly).
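
   The malformed-sub-TLV conditions listed above can be sketched as a
   validating parser. This is a minimal Python sketch under stated
   assumptions, not a definitive implementation. The function name
   parse_remote_endpoint_value is hypothetical. Only the limited
   broadcast address 255.255.255.255 is checked as an example of an
   invalid IPv4 address. Address families other than IPv4 and IPv6 are
   simply rejected here because they are out of scope. The AS
   ownership check is omitted because it needs information that is not
   in the sub-TLV.

```python
import ipaddress
import struct


def parse_remote_endpoint_value(value: bytes):
    """Return (asn, address) or (0, None) for the next-hop special case.

    Raises ValueError when the value field is malformed. A caller would
    then ignore the containing TLV and keep processing the other TLVs.
    """
    if len(value) < 6:
        raise ValueError("value field shorter than AS + Address Family")
    asn, afi = struct.unpack("!IH", value[:6])
    addr = value[6:]
    if afi == 0:
        # Special case: the remote endpoint is the UPDATE's BGP next hop.
        if len(value) != 6 or asn != 0:
            raise ValueError("Address Family 0 needs length 6 and AS 0")
        return 0, None
    if afi == 1:  # IPv4
        if len(value) != 10:
            raise ValueError("IPv4 value field must be 10 octets")
        ip = ipaddress.IPv4Address(addr)
        if ip == ipaddress.IPv4Address("255.255.255.255"):
            raise ValueError("IPv4 broadcast address is not valid")
        return asn, ip
    if afi == 2:  # IPv6
        if len(value) != 22:
            raise ValueError("IPv6 value field must be 22 octets")
        ip = ipaddress.IPv6Address(addr)
        if ip.is_link_local:
            raise ValueError("IPv6 link-local address is not valid")
        return asn, ip
    # Assumption of this sketch: other address families are rejected.
    raise ValueError("address family outside the scope of this sketch")
```

   A caller that catches ValueError from this function would drop only
   the TLV that carried the sub-TLV, in line with the rule that the
   attribute as a whole is not treated as malformed.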

   When redistributing a route that is carrying a Tunnel Encapsulation
   attribute containing a TLV that itself contains a malformed Remote
   Endpoint sub-TLV, the TLV MUST be removed from the attribute before
   redistribution.

   See Section 11 for further discussion of how to handle errors that
   are encountered when parsing the Tunnel Encapsulation attribute.

   If the Remote Endpoint sub-TLV contains an IPv4 or IPv6 address that
   is valid but not reachable, the sub-TLV is NOT considered to be
   malformed.

3.2.  Encapsulation Sub-TLVs for Particular Tunnel Types

   This section defines Tunnel Encapsulation sub-TLVs for the following
   tunnel types: VXLAN ([RFC7348]), VXLAN-GPE
   ([I-D.ietf-nvo3-vxlan-gpe]), NVGRE ([RFC7637]), MPLS-in-GRE
   ([RFC2784], [RFC2890], [RFC4023]), L2TPv3 ([RFC3931]), and GRE
   ([RFC2784], [RFC2890], [RFC4023]).

   Rules for forming the encapsulation based on the information in a
   given TLV are given in Sections 5 and 8.

   There are also tunnel types for which it is not necessary to define
   an Encapsulation sub-TLV, because there are no fields in the
   encapsulation header whose values need to be signaled from the remote
   endpoint.

3.2.1.  VXLAN

   This document defines an encapsulation sub-TLV for VXLAN tunnels.
   When the tunnel type is VXLAN, the following is the structure of the
   value field in the encapsulation sub-TLV:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |V|M|R|R|R|R|R|R|               VN-ID (3 Octets)                |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                    MAC Address (4 Octets)                     |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |    MAC Address (2 Octets)     |           Reserved            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 3: VXLAN Encapsulation Sub-TLV

      V: This bit is set to 1 to indicate that a "valid" VN-ID (Virtual
      Network Identifier) is present in the encapsulation sub-TLV.
      Please see Section 8.

      M: This bit is set to 1 to indicate that a valid MAC Address is
      present in the encapsulation sub-TLV.

      R: The remaining bits in the 8-bit flags field are reserved for
      future use. They MUST always be set to 0 by the originator of
      the sub-TLV. Intermediate routers MUST propagate them without
      modification. Any receiving routers MUST ignore these bits upon
      receipt of the sub-TLV.

      VN-ID: If the V bit is set, the VN-ID field contains a 3 octet VN-
      ID value. If the V bit is not set, the VN-ID field MUST be set to
      zero.

      MAC Address: If the M bit is set, this field contains a 6 octet
      Ethernet MAC address. If the M bit is not set, this field MUST be
      set to all zeroes.

   When forming the VXLAN encapsulation header:

   o  The values of the V, M, and R bits are NOT copied into the flags
      field of the VXLAN header. The flags field of the VXLAN header is
      set as per [RFC7348].

   o  If the M bit is set, the MAC Address is copied into the Inner
      Destination MAC Address field of the Inner Ethernet Header (see
      section 5 of [RFC7348]).

      If the M bit is not set, and the payload being sent through the
      VXLAN tunnel is an ethernet frame, the Destination MAC Address
      field of the Inner Ethernet Header is just the Destination MAC
      Address field of the payload's ethernet header.

      If the M bit is not set, and the payload being sent through the
      VXLAN tunnel is an IP or MPLS packet, the Inner Destination MAC
      address field is set to a configured value; if there is no
      configured value, the VXLAN tunnel cannot be used.

   o  See Section 8 to see how the VNI field of the VXLAN encapsulation
      header is set.

   Note that in order to send an IP packet or an MPLS packet through a
   VXLAN tunnel, the packet must first be encapsulated in an ethernet
   header, which becomes the "inner ethernet header" described in
   [RFC7348]. The VXLAN Encapsulation sub-TLV may contain information
   (e.g., the MAC address) that is used to form this ethernet header.

3.2.2.  VXLAN-GPE

   This document defines an encapsulation sub-TLV for VXLAN-GPE tunnels.
   When the tunnel type is VXLAN-GPE, the following is the structure of
   the value field in the encapsulation sub-TLV:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |Ver|V|R|R|R|R|R|                   Reserved                    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                     VN-ID                     |   Reserved    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure 4: VXLAN GPE Encapsulation Sub-TLV

      V: This bit is set to 1 to indicate that a "valid" VN-ID is
      present in the encapsulation sub-TLV. Please see Section 8.

      R: The bits designated "R" above are reserved for future use.
      They MUST always be set to 0 by the originator of the sub-TLV.
      Intermediate routers MUST propagate them without modification.
      Any receiving routers MUST ignore these bits upon receipt of the
      sub-TLV.

      Version (Ver): Indicates VXLAN GPE protocol version. (See the
      "Version Bits" section of [I-D.ietf-nvo3-vxlan-gpe].) If the
      indicated version is not supported, the TLV that contains this
      Encapsulation sub-TLV MUST be treated as specifying an unsupported
      tunnel type. The value of this field will be copied into the
      corresponding field of the VXLAN-GPE encapsulation header.

      VN-ID: If the V bit is set, this field contains a 3 octet VN-ID
      value. If the V bit is not set, this field MUST be set to zero.

   When forming the VXLAN-GPE encapsulation header:

   o  The values of the V and R bits are NOT copied into the flags field
      of the VXLAN-GPE header. However, the values of the Ver bits are
      copied into the VXLAN-GPE header. Other bits in the flags field
      of the VXLAN-GPE header are set as per [I-D.ietf-nvo3-vxlan-gpe].

   o  See Section 8 to see how the VNI field of the VXLAN-GPE
      encapsulation header is set.

3.2.3.  NVGRE

   This document defines an encapsulation sub-TLV for NVGRE tunnels.
   When the tunnel type is NVGRE, the following is the structure of the
   value field in the encapsulation sub-TLV:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |V|M|R|R|R|R|R|R|               VN-ID (3 Octets)                |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                    MAC Address (4 Octets)                     |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |    MAC Address (2 Octets)     |           Reserved            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 5: NVGRE Encapsulation Sub-TLV

      V: This bit is set to 1 to indicate that a "valid" VN-ID is
      present in the encapsulation sub-TLV. Please see Section 8.

      M: This bit is set to 1 to indicate that a valid MAC Address is
      present in the encapsulation sub-TLV.

      R: The remaining bits in the 8-bit flags field are reserved for
      future use. They MUST always be set to 0 by the originator of
      the sub-TLV. Intermediate routers MUST propagate them without
      modification. Any receiving routers MUST ignore these bits upon
      receipt of the sub-TLV.

      VN-ID: If the V bit is set, the VN-ID field contains a 3 octet VN-
      ID value. If the V bit is not set, the VN-ID field MUST be set to
      zero.

      MAC Address: If the M bit is set, this field contains a 6 octet
      Ethernet MAC address. If the M bit is not set, this field MUST be
      set to all zeroes.

   When forming the NVGRE encapsulation header:

   o  The values of the V, M, and R bits are NOT copied into the flags
      field of the NVGRE header. The flags field of the NVGRE header is
      set as per [RFC7637].

   o  If the M bit is set, the MAC Address is copied into the Inner
      Destination MAC Address field of the Inner Ethernet Header (see
      section 3.2 of [RFC7637]).

      If the M bit is not set, and the payload being sent through the
      NVGRE tunnel is an ethernet frame, the Destination MAC Address
      field of the Inner Ethernet Header is just the Destination MAC
      Address field of the payload's ethernet header.

      If the M bit is not set, and the payload being sent through the
      NVGRE tunnel is an IP or MPLS packet, the Inner Destination MAC
      address field is set to a configured value; if there is no
      configured value, the NVGRE tunnel cannot be used.

   o  See Section 8 to see how the VSID (Virtual Subnet Identifier)
      field of the NVGRE encapsulation header is set.

3.2.4.
L2TPv3

644    When the tunnel type of the TLV is L2TPv3 over IP, the following is
645    the structure of the value field of the encapsulation sub-TLV:

647     0                   1                   2                   3
648     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
649    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
650    |                     Session ID (4 octets)                     |
651    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
652    |                                                               |
653    |                       Cookie (Variable)                       |
654    |                                                               |
655    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

657                 Figure 6: L2TPv3 Encapsulation Sub-TLV

659       Session ID: a non-zero 4-octet value locally assigned by the
660       advertising router that serves as a lookup key in the incoming
661       packet's context.

663       Cookie: an optional, variable length (encoded in octets -- 0 to 8
664       octets) value used by L2TPv3 to check the association of a
665       received data message with the session identified by the Session
666       ID. Generation and usage of the cookie value is as specified in
667       [RFC3931].

669    The length of the cookie is not encoded explicitly, but can be
670    calculated as (sub-TLV length - 4).

672 3.2.5. GRE

674    When the tunnel type of the TLV is GRE, the following is the
675    structure of the value field of the encapsulation sub-TLV:

677     0                   1                   2                   3
678     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
679    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
680    |                      GRE Key (4 octets)                       |
681    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

683                  Figure 7: GRE Encapsulation Sub-TLV

685       GRE Key: 4-octet field [RFC2890] that is generated by the
686       advertising router. The actual method by which the key is
687       obtained is beyond the scope of this document. The key is
688       inserted into the GRE encapsulation header of the payload packets
689       sent by ingress routers to the advertising router. It is intended
690       to be used for identifying extra context information about the
691       received payload.

693       Note that the key is optional.
Unless a key value is being
694       advertised, the GRE encapsulation sub-TLV MUST NOT be present.

696 3.2.6. MPLS-in-GRE

698    When the tunnel type is MPLS-in-GRE, the following is the structure
699    of the value field in an optional encapsulation sub-TLV:

701     0                   1                   2                   3
702     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
703    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
704    |                      GRE-Key (4 Octets)                       |
705    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

707               Figure 8: MPLS-in-GRE Encapsulation Sub-TLV

709       GRE-Key: 4-octet field [RFC2890] that is generated by the
710       advertising router. The actual method by which the key is
711       obtained is beyond the scope of this document. The key is
712       inserted into the GRE encapsulation header of the payload packets
713       sent by ingress routers to the advertising router. It is intended
714       to be used for identifying extra context information about the
715       received payload. Note that the key is optional. Unless a key
716       value is being advertised, the MPLS-in-GRE encapsulation sub-TLV
717       MUST NOT be present.

719    Note that the GRE tunnel type defined in Section 3.2.5 can be used
720    instead of the MPLS-in-GRE tunnel type when it is necessary to
721    encapsulate MPLS in GRE. Including a TLV of the MPLS-in-GRE tunnel
722    type is equivalent to including a TLV of the GRE tunnel type that
723    also includes a Protocol Type sub-TLV (Section 3.4.1) specifying MPLS
724    as the protocol to be encapsulated. That is, if a TLV specifies
725    MPLS-in-GRE or if it includes a Protocol Type sub-TLV specifying
726    MPLS, the GRE tunnel advertised in that TLV MUST NOT be used for
727    carrying IP packets.

729    While it is not really necessary to have both the GRE and MPLS-in-GRE
730    tunnel types, both are included for reasons of backwards
731    compatibility.

733 3.2.7. IP-in-IP

735    When the tunnel type of the TLV is IP-in-IP, it does not have a
736    Virtual Network Identifier.
See Section 8.1 for Embedded Label handling on
737    IP-in-IP tunnels.

739 3.3. Outer Encapsulation Sub-TLVs

741    The Encapsulation sub-TLV for a particular tunnel type allows one to
742    specify the values that are to be placed in certain fields of the
743    encapsulation header for that tunnel type. However, some tunnel
744    types require an outer IP encapsulation, and some also require an
745    outer UDP encapsulation. The Encapsulation sub-TLV for a given
746    tunnel type does not usually provide a way to specify values for
747    fields of the outer IP and/or UDP encapsulations. If it is necessary
748    to specify values for fields of the outer encapsulation, additional
749    sub-TLVs must be used. This document defines two such sub-TLVs.

751    If an outer encapsulation sub-TLV occurs in a TLV for a tunnel type
752    that does not use the corresponding outer encapsulation, the sub-TLV
753    is treated as if it were an unknown type of sub-TLV.

755 3.3.1. IPv4 DS Field

757    Most of the tunnel types that can be specified in the Tunnel
758    Encapsulation attribute require an outer IP encapsulation. The IPv4
759    Differentiated Services (DS) Field sub-TLV can be carried in the TLV
760    of any such tunnel type. It specifies the setting of the one-octet
761    Differentiated Services field in the outer IP encapsulation (see
762    [RFC2474]). The value field is always a single octet.

764 3.3.2. UDP Destination Port

766    Some of the tunnel types that can be specified in the Tunnel
767    Encapsulation attribute require an outer UDP encapsulation.
768    Generally there is a standard UDP Destination Port value for a
769    particular tunnel type. However, sometimes it is useful to be able
770    to use a non-standard UDP destination port. If a particular tunnel
771    type requires an outer UDP encapsulation, and it is desired to use a
772    UDP destination port other than the standard one, the port to be used
773    can be specified by including a UDP Destination Port sub-TLV.
The 774 value field of this sub-TLV is always a two-octet field, containing 775 the port value. 777 3.4. Sub-TLVs for Aiding Tunnel Selection 779 3.4.1. Protocol Type Sub-TLV 781 The protocol type sub-TLV MAY be included in a given TLV to indicate 782 the type of the payload packets that may be encapsulated with the 783 tunnel parameters that are being signaled in the TLV. The value 784 field of the sub-TLV contains a 2-octet value from IANA's ethertype 785 registry [Ethertypes]. 787 For example, if we want to use three L2TPv3 sessions, one carrying 788 IPv4 packets, one carrying IPv6 packets, and one carrying MPLS 789 packets, the egress router will include three TLVs of L2TPv3 790 encapsulation type, each specifying a different Session ID and a 791 different payload type. The protocol type sub-TLV for these will be 792 IPv4 (protocol type = 0x0800), IPv6 (protocol type = 0x86dd), and 793 MPLS (protocol type = 0x8847), respectively. This informs the 794 ingress routers of the appropriate encapsulation information to use 795 with each of the given protocol types. Insertion of the specified 796 Session ID at the ingress routers allows the egress to process the 797 incoming packets correctly, according to their protocol type. 799 3.4.2. Color Sub-TLV 801 The color sub-TLV MAY be encoded as a way to "color" the 802 corresponding tunnel TLV. The value field of the sub-TLV is eight 803 octets long, and consists of a Color Extended Community, as defined 804 in Section 4.3. For the use of this sub-TLV and Extended Community, 805 please see Section 7. 807 Note that the high-order octet of this sub-TLV's value field MUST be 808 set to 3, and the next octet MUST be set to 0x0b. (Otherwise the 809 value field is not identical to a Color Extended Community.) 811 If a Color sub-TLV is not of the proper length, or the first two 812 octets of its value field are not 0x030b, the sub-TLV should be 813 treated as if it were an unrecognized sub-TLV (see Section 11). 815 3.5. 
Embedded Label Handling Sub-TLV

817    Certain BGP address families (corresponding to particular AFI/SAFI
818    pairs, e.g., 1/4, 2/4, 1/128, 2/128) have MPLS labels embedded in
819    their NLRIs. We will use the term "embedded label" to refer to the
820    MPLS label that is embedded in an NLRI, and the term "labeled address
821    family" to refer to any AFI/SAFI that has embedded labels.

823    Some of the tunnel types (e.g., VXLAN, VXLAN-GPE, and NVGRE) that can
824    be specified in the Tunnel Encapsulation attribute have an
825    encapsulation header containing a "Virtual Network" identifier of
826    some sort. The Encapsulation sub-TLVs for these tunnel types may
827    optionally specify a value for the virtual network identifier.

829    Suppose a Tunnel Encapsulation attribute is attached to an UPDATE of
830    a labeled address family, and it is decided to use a particular
831    tunnel (specified in one of the attribute's TLVs) for transmitting a
832    packet that is being forwarded according to that UPDATE. When
833    forming the encapsulation header for that packet, different
834    deployment scenarios require different handling of the embedded label
835    and/or the virtual network identifier. The Embedded Label Handling
836    sub-TLV can be used to control the placement of the embedded label
837    and/or the virtual network identifier in the encapsulation.

839    The Embedded Label Handling sub-TLV may be included in any TLV of the
840    Tunnel Encapsulation attribute. If the Tunnel Encapsulation
841    attribute is attached to an UPDATE of a non-labeled address family,
842    the sub-TLV is treated as a no-op. If the sub-TLV is contained in a
843    TLV whose tunnel type does not have a virtual network identifier in
844    its encapsulation header, the sub-TLV is treated as a no-op. In
845    those cases where the sub-TLV is treated as a no-op, it SHOULD NOT be
846    stripped from the TLV before the UPDATE is forwarded.
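
The two no-op conditions in the preceding paragraph can be expressed as a small predicate. The following is only an illustrative sketch, not part of the specification. The function name and both sets are assumptions of the sketch, and the sets hold only the examples named in this section, which introduces both lists with "e.g.".

```python
# Illustrative only. The set contents are the examples given in this
# section and are not exhaustive; the names are hypothetical.
LABELED_AFI_SAFIS = {(1, 4), (2, 4), (1, 128), (2, 128)}
VNI_TUNNEL_TYPES = {"VXLAN", "VXLAN-GPE", "NVGRE"}


def embedded_label_handling_is_noop(afi_safi, tunnel_type):
    """Return True when the Embedded Label Handling sub-TLV has no effect."""
    if afi_safi not in LABELED_AFI_SAFIS:
        # The UPDATE is of a non-labeled address family.
        return True
    if tunnel_type not in VNI_TUNNEL_TYPES:
        # The tunnel type has no virtual network identifier field.
        return True
    return False
```

A receiver that gets True from such a check would still leave the sub-TLV in place when propagating the UPDATE, as the paragraph above recommends.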
848    The sub-TLV's Length field always contains the value 1, and its value
849    field consists of a single octet. The following values are defined:

851    1: The payload will be an MPLS packet with the embedded label at the
853       top of its label stack.

855    2: The embedded label is not carried in the payload, but is carried
856       either in the virtual network identifier field of the
857       encapsulation header, or else is ignored entirely.

859    Please see Section 8 for the details of how this sub-TLV is used when
860    it is carried by an UPDATE of a labeled address family.

862 3.6. MPLS Label Stack Sub-TLV

864    This sub-TLV allows an MPLS label stack ([RFC3032]) to be associated
865    with a particular tunnel.

867    The value field of this sub-TLV is a sequence of MPLS label stack
868    entries. The first entry in the sequence is the "topmost" label, the
869    final entry in the sequence is the "bottommost" label. When this
870    label stack is pushed onto a packet, this ordering MUST be preserved.

872    Each label stack entry has the following format:

874     0                   1                   2                   3
875     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
876    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
877    |                 Label                 | TC  |S|      TTL      |
878    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

880                  Figure 9: MPLS Label Stack Sub-TLV

882    If a packet is to be sent through the tunnel identified in a
883    particular TLV, and if that TLV contains an MPLS Label Stack sub-TLV,
884    then the label stack appearing in the sub-TLV MUST be pushed onto the
885    packet. This label stack MUST be pushed onto the packet before any
886    other labels are pushed onto the packet.

888    In particular, if the Tunnel Encapsulation attribute is attached to a
889    BGP UPDATE of a labeled address family, the contents of the MPLS
890    Label Stack sub-TLV MUST be pushed onto the packet before the label
891    embedded in the NLRI is pushed onto the packet.
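
The label stack entry format of Figure 9 (20-bit Label, 3-bit TC, 1-bit S, 8-bit TTL) can be illustrated with a short encoder and decoder. This is a sketch under stated assumptions, not a normative encoding routine: the function names are hypothetical, and the default TC and TTL values simply mirror the defaults stated elsewhere in this section.

```python
import struct


def pack_label_stack_entry(label, tc=0, s=0, ttl=255):
    """Encode one 4-octet entry: Label(20) | TC(3) | S(1) | TTL(8)."""
    if not (0 <= label < 2**20 and 0 <= tc < 8 and s in (0, 1)
            and 0 <= ttl < 256):
        raise ValueError("field out of range")
    word = (label << 12) | (tc << 9) | (s << 8) | ttl
    return struct.pack("!I", word)  # network byte order


def unpack_label_stack(value):
    """Decode a value field into (label, tc, s, ttl) tuples, topmost first."""
    if len(value) % 4 != 0:
        raise ValueError("value field is not a whole number of entries")
    entries = []
    for (word,) in struct.iter_unpack("!I", value):
        entries.append((word >> 12, (word >> 9) & 0x7,
                        (word >> 8) & 0x1, word & 0xFF))
    return entries
```

Because the entries are fixed-size, the number of entries follows directly from the length of the value field, which is why the decoder needs no terminator.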
893 If the MPLS label stack sub-TLV is included in a TLV identifying a 894 tunnel type that uses virtual network identifiers (see Section 8), 895 the contents of the MPLS label stack sub-TLV MUST be pushed onto the 896 packet before the procedures of Section 8 are applied. 898 The number of label stack entries in the sub-TLV MUST be determined 899 from the sub-TLV length field. Thus it is not necessary to set the S 900 bit in any of the label stack entries of the sub-TLV, and the setting 901 of the S bit is ignored when parsing the sub-TLV. When the label 902 stack entries are pushed onto a packet that already has a label 903 stack, the S bits of all the entries MUST be cleared. When the label 904 stack entries are pushed onto a packet that does not already have a 905 label stack, the S bit of the bottommost label stack entry MUST be 906 set, and the S bit of all the other label stack entries MUST be 907 cleared. 909 By default, the TC (Traffic Class) field ([RFC3032], [RFC5462]) of 910 each label stack entry is set to 0. This may of course be changed by 911 policy at the originator of the sub-TLV. When pushing the label 912 stack onto a packet, the TC of the label stack entries is preserved 913 by default. However, local policy at the router that is pushing on 914 the stack MAY cause modification of the TC values. 916 By default, the TTL (Time to Live) field of each label stack entry is 917 set to 255. This may be changed by policy at the originator of the 918 sub-TLV. When pushing the label stack onto a packet, the TTL of the 919 label stack entries is preserved by default. However, local policy 920 at the router that is pushing on the stack MAY cause modification of 921 the TTL values. If any label stack entry in the sub-TLV has a TTL 922 value of zero, the router that is pushing the stack on a packet MUST 923 change the value to a non-zero value. 
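
The S-bit and TTL rules above lend themselves to a short sketch of what a router might do just before pushing the stack. The class and function names are hypothetical, and the choice of 255 as the replacement for a zero TTL is an assumption of this sketch; the text only requires a non-zero value. Local policy on TC and TTL is deliberately left out.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class LabelEntry:
    label: int
    tc: int = 0
    s: int = 0
    ttl: int = 255


def prepare_for_push(entries: List[LabelEntry],
                     packet_has_label_stack: bool,
                     replacement_ttl: int = 255) -> List[LabelEntry]:
    """Return the entries (topmost first) as they should appear on the packet."""
    prepared = []
    last = len(entries) - 1
    for i, entry in enumerate(entries):
        # The S bit carried in the sub-TLV is ignored; it is derived here.
        # Only the bottommost entry gets S=1, and only if the packet has
        # no label stack of its own.
        s = 1 if (i == last and not packet_has_label_stack) else 0
        # A TTL of zero is replaced by a non-zero value (255 is assumed).
        ttl = entry.ttl if entry.ttl != 0 else replacement_ttl
        prepared.append(replace(entry, s=s, ttl=ttl))
    return prepared
```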
925 Note that this sub-TLV can appear within a TLV identifying any type 926 of tunnel, not just within a TLV identifying an MPLS tunnel. 927 However, if this sub-TLV appears within a TLV identifying an MPLS 928 tunnel (or an MPLS-in-X tunnel), this sub-TLV plays the same role 929 that would be played by an MPLS Encapsulation sub-TLV. Therefore, an 930 MPLS Encapsulation sub-TLV is not defined. 932 3.7. Prefix-SID Sub-TLV 934 [I-D.ietf-idr-bgp-prefix-sid] defines a BGP Path attribute known as 935 the "Prefix-SID Attribute". This attribute is defined to contain a 936 sequence of one or more TLVs, where each TLV is either a "Label- 937 Index" TLV, an "IPv6 SID (Segment Identifier)" TLV, or an "Originator 938 SRGB (Source Routing Global Block)" TLV. 940 In this document, we define a Prefix-SID sub-TLV. The value field of 941 the Prefix-SID sub-TLV can be set to any valid value of the value 942 field of a BGP Prefix-SID attribute, as defined in 943 [I-D.ietf-idr-bgp-prefix-sid]. 945 The Prefix-SID sub-TLV can occur in a TLV identifying any type of 946 tunnel. If an Originator SRGB is specified in the sub-TLV, that SRGB 947 MUST be interpreted to be the SRGB used by the tunnel's Remote 948 Endpoint. The Label-Index, if present, is the Segment Routing SID 949 that the tunnel's Remote Endpoint uses to represent the prefix 950 appearing in the NLRI field of the BGP UPDATE to which the Tunnel 951 Encapsulation attribute is attached. 953 If a Label-Index is present in the prefix-SID sub-TLV, then when a 954 packet is sent through the tunnel identified by the TLV, the 955 corresponding MPLS label MUST be pushed on the packet's label stack. 956 The corresponding MPLS label is computed from the Label-Index value 957 and the SRGB of the route's originator. 959 If the Originator SRGB is not present, it is assumed that the 960 originator's SRGB is known by other means. Such "other means" are 961 outside the scope of this document. 
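
This document does not spell out how the label is computed from the Label-Index and the SRGB. The sketch below assumes the usual Segment Routing convention, in which an SRGB is an ordered list of (base, size) ranges and the index is an offset into their concatenation. The function name and the tuple representation are assumptions of this sketch.

```python
from typing import List, Optional, Tuple


def label_from_index(label_index: int,
                     srgb: List[Tuple[int, int]]) -> Optional[int]:
    """Map a Label-Index to an MPLS label using SRGB ranges of (base, size)."""
    if label_index < 0:
        return None
    offset = label_index
    for base, size in srgb:
        if offset < size:
            return base + offset
        offset -= size  # index lies beyond this range; try the next one
    return None  # index falls outside the SRGB; no label can be derived
```

When the function returns None, no label can be derived and the Prefix-SID sub-TLV cannot be honored for that tunnel; how a router reacts is not addressed here.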
963    The corresponding MPLS label is pushed on after the processing of the
964    MPLS Label Stack sub-TLV, if present, as specified in Section 3.6.
965    It is pushed on before any other labels (e.g., a label embedded in
966    the UPDATE's NLRI, or a label determined by the procedures of
967    Section 8) are pushed on the stack.

969    The Prefix-SID sub-TLV has slightly different semantics than the
970    Prefix-SID attribute. When the Prefix-SID attribute is attached to a
971    given route, the BGP speaker that originally attached the attribute
972    is expected to be in the same Segment Routing domain as the BGP
973    speakers who receive the route with the attached attribute. The
974    Label-Index tells the receiving BGP speakers what the prefix-SID is
975    for the advertised prefix in that Segment Routing domain. When the
976    Prefix-SID sub-TLV is used, the BGP speaker at the head end of the
977    tunnel need not even be in the same Segment Routing domain as the
978    tunnel's Remote Endpoint, and there is no implication that the
979    prefix-SID for the advertised prefix is the same in the Segment
980    Routing domains of the BGP speaker that originated the sub-TLV and
981    the BGP speaker that received it.

983 4. Extended Communities Related to the Tunnel Encapsulation Attribute

985 4.1. Encapsulation Extended Community

987    The Encapsulation Extended Community is a Transitive Opaque Extended
988    Community. This Extended Community may be attached to a route of any
989    AFI/SAFI to which the Tunnel Encapsulation attribute may be attached.
990    Each such Extended Community identifies a particular tunnel type. If
991    the Encapsulation Extended Community identifies a particular tunnel
992    type, its semantics are exactly equivalent to the semantics of a
993    Tunnel Encapsulation attribute Tunnel TLV for which the following
994    three conditions all hold:

996    1. it identifies the same tunnel type,
997    2. it has a Remote Endpoint sub-TLV for which one of the following
998       two conditions holds:

1000       A.
its "Address Family" subfield contains zero, or 1002 B. its "Address" subfield contains the same IP address that 1003 appears in the next hop field of the route to which the 1004 Tunnel Encapsulation attribute is attached 1006 3. it has no other sub-TLVs. 1008 We will refer to such a Tunnel TLV as a "barebones" Tunnel TLV. 1010 The Encapsulation Extended Community was first defined in [RFC5512]. 1011 While it provides only a small subset of the functionality of the 1012 Tunnel Encapsulation attribute, it is used in a number of deployed 1013 applications, and is still needed for backwards compatibility. To 1014 ensure backwards compatibility, this specification establishes the 1015 following rules: 1017 1. If the Tunnel Encapsulation attribute of a given route contains a 1018 barebones Tunnel TLV identifying a particular tunnel type, an 1019 Encapsulation Extended Community identifying the same tunnel type 1020 SHOULD be attached to the route. 1022 2. If the Encapsulation Extended Community identifying a particular 1023 tunnel type is attached to a given route, the corresponding 1024 barebones Tunnel TLV MAY be omitted from the Tunnel Encapsulation 1025 attribute. 1027 3. Suppose a particular route has both (a) an Encapsulation Extended 1028 Community specifying a particular tunnel type, and (b) a Tunnel 1029 Encapsulation attribute with a barebones Tunnel TLV specifying 1030 that same tunnel type. Both (a) and (b) MUST be interpreted as 1031 denoting the same tunnel. 1033 In short, in situations where one could use either the Encapsulation 1034 Extended Community or a barebones Tunnel TLV, one may use either or 1035 both. However, to ensure backwards compatibility with applications 1036 that do not support the Tunnel Encapsulation attribute, it is 1037 preferable to use the Encapsulation Extended Community. If the 1038 Extended Community (identifying a particular tunnel type) is present, 1039 the corresponding Tunnel TLV is optional. 
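
The definition of a "barebones" Tunnel TLV and rule 3 above can be sketched as two predicates. The data classes below are a hypothetical in-memory representation, not a wire format, and the field names are assumptions of this sketch.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RemoteEndpoint:
    address_family: int            # value of the "Address Family" subfield
    address: Optional[str] = None  # textual IP address, if any


@dataclass
class TunnelTLV:
    tunnel_type: int
    remote_endpoint: Optional[RemoteEndpoint] = None
    other_sub_tlvs: List[object] = field(default_factory=list)


def is_barebones(tlv: TunnelTLV, next_hop: str) -> bool:
    """Check conditions 2 and 3 of the barebones definition."""
    ep = tlv.remote_endpoint
    if ep is None or tlv.other_sub_tlvs:
        return False
    if ep.address_family == 0:
        return True
    if ep.address is None:
        return False
    # Compare as parsed addresses so textual variants compare equal.
    return ipaddress.ip_address(ep.address) == ipaddress.ip_address(next_hop)


def denotes_same_tunnel(ext_community_tunnel_type: int,
                        tlv: TunnelTLV, next_hop: str) -> bool:
    """True when an Encapsulation Extended Community and a TLV coincide."""
    return (tlv.tunnel_type == ext_community_tunnel_type
            and is_barebones(tlv, next_hop))
```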
1041    Note that for tunnel types of the form "X-in-Y", e.g., MPLS-in-GRE,
1042    the Encapsulation Extended Community implies that only packets of the
1043    specified payload type "X" are to be carried through the tunnel of
1044    type "Y".

1046    In the remainder of this specification, when we speak of a route as
1047    containing a Tunnel Encapsulation attribute with a TLV identifying a
1048    particular tunnel type, we are implicitly including the case where
1049    the route contains an Encapsulation Extended Community
1050    identifying that tunnel type.

1052 4.2. Router's MAC Extended Community

1054    [I-D.ietf-bess-evpn-inter-subnet-forwarding] defines a Router's MAC
1055    Extended Community. This Extended Community provides information
1056    that may conflict with information in one or more of the
1057    Encapsulation Sub-TLVs of a Tunnel Encapsulation attribute. In case
1058    of such a conflict, the information in the Encapsulation Sub-TLV
1059    takes precedence.

1061 4.3. Color Extended Community

1063    The Color Extended Community is a Transitive Opaque Extended
1064    Community with the following encoding:

1066     0                   1                   2                   3
1067     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1068    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1069    |     0x03      |     0x0b      |           Reserved            |
1070    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1071    |                          Color Value                          |
1072    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1074                  Figure 10: Color Extended Community

1076    For the use of this Extended Community please see Section 7.

1078 5. Semantics and Usage of the Tunnel Encapsulation attribute

1080    [RFC5512] specifies the use of the Tunnel Encapsulation attribute in
1081    BGP UPDATE messages of AFI/SAFI 1/7 and 2/7. That document restricts
1082    the use of this attribute to UPDATE messages of those SAFIs. This
1083    document removes that restriction.
1085    The BGP Tunnel Encapsulation attribute MAY be carried in any BGP
1086    UPDATE message whose AFI/SAFI is 1/1 (IPv4 Unicast), 2/1 (IPv6
1087    Unicast), 1/4 (IPv4 Labeled Unicast), 2/4 (IPv6 Labeled Unicast),
1088    1/128 (VPN-IPv4 Labeled Unicast), 2/128 (VPN-IPv6 Labeled Unicast),
1089    or 25/70 (Ethernet VPN, usually known as EVPN). Use of the Tunnel
1090    Encapsulation attribute in BGP UPDATE messages of other AFI/SAFIs is
1091    outside the scope of this document.

1093    It has been suggested that it may sometimes be useful to attach a
1094    Tunnel Encapsulation attribute to a BGP UPDATE message that is also
1095    carrying a PMSI (Provider Multicast Service Interface) Tunnel
1096    attribute [RFC6514]. If the PMSI Tunnel attribute specifies an IP
1097    tunnel, the Tunnel Encapsulation attribute could be used to provide
1098    additional information about the IP tunnel. The usage of the Tunnel
1099    Encapsulation attribute in combination with the PMSI Tunnel attribute
1100    is outside the scope of this document.

1102    The decision to attach a Tunnel Encapsulation attribute to a given
1103    BGP UPDATE is determined by policy. The set of TLVs and sub-TLVs
1104    contained in the attribute is also determined by policy.

1106    When the Tunnel Encapsulation attribute is carried in an UPDATE of
1107    one of the AFI/SAFIs listed above, each TLV MUST have a Remote
1108    Endpoint sub-TLV. If a TLV does not have a Remote Endpoint
1109    sub-TLV, that TLV should be treated as if it had a malformed
1110    Remote Endpoint sub-TLV (see Section 3.1).

1112    Suppose that:

1114    o  a given packet P must be forwarded by router R;

1116    o  the path along which P is to be forwarded is determined by BGP
1117       UPDATE U;

1119    o  UPDATE U has a Tunnel Encapsulation attribute, containing at least
1120       one TLV that identifies a "feasible tunnel" for packet P.
A 1121 tunnel is considered feasible if it has the following three 1122 properties: 1124 * The tunnel type is supported (i.e., router R knows how to set 1125 up tunnels of that type, how to create the encapsulation header 1126 for tunnels of that type, etc.) 1128 * The tunnel is of a type that can be used to carry packet P 1129 (e.g., an MPLS-in-UDP tunnel would not be a feasible tunnel for 1130 carrying an IP packet, UNLESS the IP packet can first be 1131 converted to an MPLS packet). 1133 * The tunnel is specified in a TLV whose Remote Endpoint sub-TLV 1134 identifies an IP address that is reachable. 1136 Then router R MUST send packet P through one of the feasible tunnels 1137 identified in the Tunnel Encapsulation attribute of UPDATE U. 1139 If the Tunnel Encapsulation attribute contains several TLVs (i.e., if 1140 it specifies several tunnels), router R may choose any one of those 1141 tunnels, based upon local policy. If any tunnel TLV contains one or 1142 more Color sub-TLVs (Section 3.4.2) and/or the Protocol Type sub-TLV 1143 (Section 3.4.1), the choice of tunnel may be influenced by these sub- 1144 TLVs. 1146 If a particular tunnel is not feasible at some moment because its 1147 Remote Endpoint cannot be reached at that moment, the tunnel may 1148 become feasible at a later time (when its endpoint becomes 1149 reachable). Router R should take note of this. If router R is 1150 already using a different tunnel, it MAY switch to the tunnel that 1151 just became feasible, or it MAY decide to continue using the tunnel 1152 that it is already using. How this decision is made is outside the 1153 scope of this document. 1155 In addition to the sub-TLVs already defined, additional sub-TLVs may 1156 be defined that affect the choice of tunnel to be used, or that 1157 affect the contents of the tunnel encapsulation header. The 1158 documents that define any such additional sub-TLVs must specify the 1159 effect that including the sub-TLV is to have. 
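
The three feasibility properties listed above can be sketched as a filter over the tunnels of an attribute. This is an illustration only: the class, the function, and the reachability callback are hypothetical, and treating a Protocol Type sub-TLV as a strict match on the payload's ethertype is an assumption of this sketch rather than a rule stated in this section.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Set


@dataclass
class Tunnel:
    tunnel_type: str
    remote_endpoint: str
    protocol_type: Optional[int] = None  # from a Protocol Type sub-TLV


def feasible_tunnels(tunnels: Iterable[Tunnel],
                     supported_types: Set[str],
                     payload_ethertype: int,
                     is_reachable: Callable[[str], bool]) -> List[Tunnel]:
    """Return the tunnels that satisfy all three feasibility properties."""
    result = []
    for t in tunnels:
        if t.tunnel_type not in supported_types:
            continue  # property 1: tunnel type not supported
        if t.protocol_type is not None and t.protocol_type != payload_ethertype:
            continue  # property 2: tunnel cannot carry this payload
        if not is_reachable(t.remote_endpoint):
            continue  # property 3: remote endpoint not reachable
        result.append(t)
    return result
```

Which of the remaining tunnels is used is left to local policy, for example a preference driven by Color sub-TLVs.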
1161    Once it is determined to send a packet through the tunnel specified
1162    in a particular TLV of a particular Tunnel Encapsulation attribute,
1163    then the tunnel's remote endpoint address is the IP address contained
1164    in the Remote Endpoint sub-TLV. If the TLV contains a Remote
1165    Endpoint sub-TLV whose value field is all zeroes, then the tunnel's
1166    remote endpoint is the IP address specified as the Next Hop of the
1167    BGP UPDATE containing the Tunnel Encapsulation attribute. The
1168    address of the remote endpoint generally appears in a "destination
1169    address" field of the encapsulation.

1171    The full set of procedures for sending a packet through a particular
1172    tunnel type to a particular remote endpoint depends upon the tunnel
1173    type, and is outside the scope of this document. Note that some
1174    tunnel types may require the execution of an explicit tunnel setup
1175    protocol before they can be used for carrying data. Other tunnel
1176    types may not require any tunnel setup protocol.

1178    Sending a packet through a tunnel always requires that the packet be
1179    encapsulated, with an encapsulation header that is appropriate for
1180    the tunnel type. The contents of the tunnel encapsulation header MAY
1181    be influenced by the Encapsulation sub-TLV. If there is no
1182    Encapsulation sub-TLV present, the router transmitting the packet
1183    through the tunnel must have a priori knowledge (e.g., by
1184    provisioning) of how to fill in the various fields in the
1185    encapsulation header.

1187    Whenever a new Tunnel Type TLV is defined, the specification of that
1188    TLV should describe (or reference) the procedures for creating the
1189    encapsulation header used to forward packets through that tunnel
1190    type.
If a tunnel type codepoint is assigned in the IANA "BGP Tunnel 1191 Encapsulation Tunnel Types" registry, but there is no corresponding 1192 specification that defines an Encapsulation sub-TLV for that tunnel 1193 type, the transmitting endpoint of such a tunnel is presumed to know 1194 a priori how to form the encapsulation header for that tunnel type. 1196 If a Tunnel Encapsulation attribute specifies several tunnels, the 1197 way in which a router chooses which one to use is a matter of policy, 1198 subject to the following constraint: if a router can determine that a 1199 given tunnel is not functional, it MUST NOT use that tunnel. In 1200 particular, if the tunnel is identified in a TLV that has a Remote 1201 Endpoint sub-TLV, and if the IP address specified in the sub-TLV is 1202 not reachable from router R, then the tunnel MUST be considered non- 1203 functional. Other means of determining whether a given tunnel is 1204 functional MAY be used; specification of such means is outside the 1205 scope of this specification. Of course, if a non-functional tunnel 1206 later becomes functional, router R SHOULD reevaluate its choice of 1207 tunnels. 1209 If router R determines that it cannot use any of the tunnels 1210 specified in the Tunnel Encapsulation attribute, it MAY either drop 1211 packet P, or it MAY transmit packet P as it would had the Tunnel 1212 Encapsulation attribute not been present. This is a matter of local 1213 policy. By default, the packet SHOULD be transmitted as if the 1214 Tunnel Encapsulation attribute had not been present. 1216 A Tunnel Encapsulation attribute may contain several TLVs that all 1217 specify the same tunnel type. Each TLV should be considered as 1218 specifying a different tunnel. Two tunnels of the same type may have 1219 different Remote Endpoint sub-TLVs, different Encapsulation sub-TLVs, 1220 etc. Choosing between two such tunnels is a matter of local policy. 
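
Two of the rules in this section, the choice of remote endpoint address and the behavior when no tunnel is usable, can be sketched as follows. The function names, the already-decoded address argument, and the string action labels are all assumptions of this sketch. The order in which usable tunnels are tried stands in for local policy.

```python
from typing import Callable, List, Optional, Tuple


def resolve_remote_endpoint(value_field: bytes,
                            decoded_address: Optional[str],
                            bgp_next_hop: str) -> str:
    """Pick the tunnel's remote endpoint address."""
    if not any(value_field):
        # An all-zero value field means "use the UPDATE's next hop".
        return bgp_next_hop
    if decoded_address is None:
        raise ValueError("non-zero value field without a decoded address")
    return decoded_address


def forwarding_action(tunnels: List[object],
                      is_functional: Callable[[object], bool],
                      drop_if_no_tunnel: bool = False
                      ) -> Tuple[str, Optional[object]]:
    """Choose a functional tunnel, or fall back when none can be used."""
    for tunnel in tunnels:
        if is_functional(tunnel):
            return ("tunnel", tunnel)
    # Default: forward as if the attribute were absent; dropping is a
    # local policy option.
    if drop_if_no_tunnel:
        return ("drop", None)
    return ("forward-without-tunnel", None)
```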
1222 Once router R has decided to send packet P through a particular 1223 tunnel, it encapsulates packet P appropriately and then forwards it 1224 according to the route that leads to the tunnel's remote endpoint. 1225 This route may itself be a BGP route with a Tunnel Encapsulation 1226 attribute. If so, the encapsulated packet is treated as the payload 1227 and is encapsulated according to the Tunnel Encapsulation attribute 1228 of that route. That is, tunnels may be "stacked". 1230 Notwithstanding anything said in this document, a BGP speaker MAY 1231 have local policy that influences the choice of tunnel, and the way 1232 the encapsulation is formed. A BGP speaker MAY also have a local 1233 policy that tells it to ignore the Tunnel Encapsulation attribute 1234 entirely or in part. Of course, interoperability issues must be 1235 considered when such policies are put into place. 1237 6. Routing Considerations 1239 6.1. Impact on BGP Decision Process 1241 The presence of the Tunnel Encapsulation attribute affects the BGP 1242 bestpath selection algorithm. For all the tunnels described in the 1243 Tunnel Encapsulation attribute for a path, if no Remote Tunnel 1244 Endpoint address is feasible, then that path MUST NOT be considered 1245 resolvable for the purposes of Route Resolvability Condition 1246 [RFC4271] section 9.1.2.1. 1248 6.2. Looping, Infinite Stacking, Etc. 1250 Consider a packet destined for address X. Suppose a BGP UPDATE for 1251 address prefix X carries a Tunnel Encapsulation attribute that 1252 specifies a remote tunnel endpoint of Y. And suppose that a BGP 1253 UPDATE for address prefix Y carries a Tunnel Encapsulation attribute 1254 that specifies a Remote Endpoint of X. It is easy to see that this 1255 will cause an infinite number of encapsulation headers to be put on 1256 the given packet. 1258 This could happen as a result of misconfiguration, either accidental 1259 or intentional. 
It could also happen if the Tunnel Encapsulation 1260 attribute were altered by a malicious agent. Implementations should 1261 be aware of this. This document does not specify a maximum number of 1262 recursions; that is an implementation-specific matter. 1264 Improper setting (or malicious altering) of the Tunnel Encapsulation 1265 attribute could also cause data packets to loop. Suppose a BGP 1266 UPDATE for address prefix X carries a Tunnel Encapsulation attribute 1267 that specifies a remote tunnel endpoint of Y. Suppose router R 1268 receives and processes the update. When router R receives a packet 1269 destined for X, it will apply the encapsulation and send the 1270 encapsulated packet to Y. Y will decapsulate the packet and forward 1271 it further. If Y is further away from X than is router R, it is 1272 possible that the path from Y to X will traverse R. This would cause 1273 a long-lasting routing loop. The control plane itself cannot detect 1274 this situation, though a TTL field in the payload packets would 1275 presumably prevent any given packet from looping infinitely. 1277 These possibilities must also be kept in mind whenever the Remote 1278 Endpoint for a given prefix differs from the BGP next hop for that 1279 prefix. 1281 7. Recursive Next Hop Resolution 1283 Suppose that: 1285 o a given packet P must be forwarded by router R1; 1287 o the path along which P is to be forwarded is determined by BGP 1288 UPDATE U1; 1290 o UPDATE U1 does not have a Tunnel Encapsulation attribute; 1292 o the next hop of UPDATE U1 is router R2; 1294 o the best path to router R2 is a BGP route that was advertised in 1295 UPDATE U2; 1297 o UPDATE U2 has a Tunnel Encapsulation attribute. 1299 Then packet P MUST be sent through one of the tunnels identified in 1300 the Tunnel Encapsulation attribute of UPDATE U2. See Section 5 for 1301 further details. 1303 However, suppose that one of the TLVs in U2's Tunnel Encapsulation 1304 attribute contains the Color Sub-TLV. 
In that case, packet P MUST 1305 NOT be sent through the tunnel identified in that TLV, unless U1 is 1306 carrying the Color Extended Community that is identified in U2's 1307 Color Sub-TLV. 1309 Note that if UPDATE U1 and UPDATE U2 both have Tunnel Encapsulation 1310 attributes, packet P will be carried through a pair of nested 1311 tunnels. P will first be encapsulated based on the Tunnel 1312 Encapsulation attribute of U1. This encapsulated packet then becomes 1313 the payload, and is encapsulated based on the Tunnel Encapsulation 1314 attribute of U2. This is another way of "stacking" tunnels (see also 1315 Section 5). 1317 The procedures in this section presuppose that U1's next hop resolves 1318 to a BGP route, and that U2's next hop resolves (perhaps after 1319 further recursion) to a non-BGP route. 1321 8. Use of Virtual Network Identifiers and Embedded Labels when Imposing 1322 a Tunnel Encapsulation 1324 If the TLV specifying a tunnel contains an MPLS Label Stack sub-TLV, 1325 then when sending a packet through that tunnel, the procedures of 1326 Section 3.6 are applied before the procedures of this section. 1328 If the TLV specifying a tunnel contains a Prefix-SID sub-TLV, the 1329 procedures of Section 3.7 are applied before the procedures of this 1330 section. If the TLV also contains an MPLS Label Stack sub-TLV, the 1331 procedures of Section 3.6 are applied before the procedures of 1332 Section 3.7. 1334 8.1. Tunnel Types without a Virtual Network Identifier Field 1336 If a Tunnel Encapsulation attribute is attached to an UPDATE of a 1337 labeled address family, there will be one or more labels specified in 1338 the UPDATE's NLRI. 1340 o If the TLV contains an Embedded Label Handling sub-TLV whose value 1341 is 1, the label or labels from the NLRI are pushed on the packet's 1342 label stack. 
1344 o If the TLV does not contain an Embedded Label Handling sub-TLV, or 1345 if it contains an Embedded Label Handling sub-TLV whose value is 1346 2, the embedded label is ignored completely. The tunnel is 1347 assumed to have terminated at the corresponding VRF. 1349 The resulting MPLS packet is then further encapsulated, as specified 1350 by the TLV. 1352 8.2. Tunnel Types with a Virtual Network Identifier Field 1354 Three of the tunnel types that can be specified in a Tunnel 1355 Encapsulation TLV have virtual network identifier fields in their 1356 encapsulation headers. In the VXLAN and VXLAN-GPE encapsulations, 1357 this field is called the VNI (Virtual Network Identifier) field; in 1358 the NVGRE encapsulation, this field is called the VSID (Virtual 1359 Subnet Identifier) field. 1361 When one of these tunnel encapsulations is imposed on a packet, the 1362 setting of the virtual network identifier field in the encapsulation 1363 header depends upon the contents of the Encapsulation sub-TLV (if one 1364 is present). When the Tunnel Encapsulation attribute is being 1365 carried on a BGP UPDATE of a labeled address family, the setting of 1366 the virtual network identifier field also depends upon the contents 1367 of the Embedded Label Handling sub-TLV (if present). 1369 This section specifies the procedures for choosing the value to set 1370 in the virtual network identifier field of the encapsulation header. 1371 These procedures apply only when the tunnel type is VXLAN, VXLAN-GPE, 1372 or NVGRE. 1374 8.2.1. Unlabeled Address Families 1376 This sub-section applies when: 1378 o the Tunnel Encapsulation attribute is carried on a BGP UPDATE of 1379 an unlabeled address family, and 1381 o at least one of the attribute's TLVs identifies a tunnel type that 1382 uses a virtual network identifier, and 1384 o it has been determined to send a packet through one of those 1385 tunnels. 
1387 If the TLV identifying the tunnel contains an Encapsulation sub-TLV 1388 whose V bit is set, the virtual network identifier field of the 1389 encapsulation header is set to the value of the virtual network 1390 identifier field of the Encapsulation sub-TLV. 1392 Otherwise, the virtual network identifier field of the encapsulation 1393 header is set to a configured value; if there is no configured value, 1394 the tunnel cannot be used. 1396 8.2.2. Labeled Address Families 1398 This sub-section applies when: 1400 o the Tunnel Encapsulation attribute is carried on a BGP UPDATE of a 1401 labeled address family, and 1403 o at least one of the attribute's TLVs identifies a tunnel type that 1404 uses a virtual network identifier, and 1406 o it has been determined to send a packet through one of those 1407 tunnels. 1409 8.2.2.1. When a Valid VNI has been Signaled 1411 If the TLV identifying the tunnel contains an Encapsulation sub-TLV 1412 whose V bit is set, the virtual network identifier field of the 1413 encapsulation header is set as follows: 1415 o If the TLV contains an Embedded Label Handling sub-TLV whose value 1416 is 1, then the virtual network identifier field of the 1417 encapsulation header is set to the value of the virtual network 1418 identifier field of the Encapsulation sub-TLV. 1420 The embedded label (from the NLRI of the route that is carrying 1421 the Tunnel Encapsulation attribute) appears at the top of the MPLS 1422 label stack in the encapsulation payload. 1424 o If the TLV does not contain an Embedded Label Handling sub-TLV, or 1425 if it contains an Embedded Label Handling sub-TLV whose value is 2, 1426 the embedded label is ignored entirely, and the virtual network 1427 identifier field of the encapsulation header is set to the value 1428 of the virtual network identifier field of the Encapsulation sub- 1429 TLV. 1431 8.2.2.2.
When a Valid VNI has not been Signaled 1433 If the TLV identifying the tunnel does not contain an Encapsulation 1434 sub-TLV whose V bit is set, the virtual network identifier field of 1435 the encapsulation header is set as follows: 1437 o If the TLV contains an Embedded Label Handling sub-TLV whose value 1438 is 1, then the virtual network identifier field of the 1439 encapsulation header is set to a configured value. 1441 If there is no configured value, the tunnel cannot be used. 1443 The embedded label (from the NLRI of the route that is carrying 1444 the Tunnel Encapsulation attribute) appears at the top of the MPLS 1445 label stack in the encapsulation payload. 1447 o If the TLV does not contain an Embedded Label Handling sub-TLV, or 1448 if it contains an Embedded Label Handling sub-TLV whose value is 1449 2, the embedded label is copied into the virtual network 1450 identifier field of the encapsulation header. 1452 In this case, the payload may or may not contain an MPLS label 1453 stack, depending upon other factors. If the payload does contain 1454 an MPLS label stack, the embedded label does not appear in that 1455 stack. 1457 9. Applicability Restrictions 1459 In a given UPDATE of a labeled address family, the label embedded in 1460 the NLRI is generally a label that is meaningful only to the router 1461 whose address appears as the next hop. Certain of the procedures of 1462 Section 8.2.2.1 or Section 8.2.2.2 cause the embedded label to be 1463 carried by a data packet to the router whose address appears in the 1464 Remote Endpoint sub-TLV. If the Remote Endpoint sub-TLV does not 1465 identify the same router that is the next hop, sending the packet 1466 through the tunnel may cause the label to be misinterpreted at the 1467 tunnel's remote endpoint. This may cause misdelivery of the packet. 
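The cases in which the embedded label travels with the data packet are easier to see when the rules of Sections 8.2.1, 8.2.2.1, and 8.2.2.2 are written out together. The following Python fragment is only an illustrative sketch and is not part of the specification. The names `Choice` and `choose_vni` are hypothetical. `signaled_vni` stands for the virtual network identifier taken from an Encapsulation sub-TLV whose V bit is set (None otherwise), and `configured_vni` stands for the locally configured value (None if there is none). The sketch assumes that any Embedded Label Handling value other than 1 is handled like the value 2; the text above does not address other values.

```python
from typing import NamedTuple, Optional


class Choice(NamedTuple):
    vni: int          # value placed in the VNI/VSID field of the encapsulation header
    push_label: bool  # True if the embedded label tops the payload's MPLS label stack


def choose_vni(
    labeled: bool,                           # UPDATE is of a labeled address family
    signaled_vni: Optional[int],             # from an Encapsulation sub-TLV with V bit set
    embedded_label_handling: Optional[int],  # value of that sub-TLV, None if absent
    embedded_label: Optional[int],           # label from the NLRI (labeled families only)
    configured_vni: Optional[int],           # locally configured value, if any
) -> Optional[Choice]:
    """Return the chosen VNI, or None if the tunnel cannot be used."""
    # Only a labeled family with Embedded Label Handling == 1 keeps the
    # embedded label in the payload (Sections 8.2.2.1 and 8.2.2.2).
    # Assumption: values other than 1 are treated like 2.
    push_label = labeled and embedded_label_handling == 1

    if signaled_vni is not None:
        # Sections 8.2.1 and 8.2.2.1: a signaled VNI always wins.
        return Choice(signaled_vni, push_label)

    if labeled and not push_label:
        # Section 8.2.2.2, second bullet: the embedded label is copied
        # into the virtual network identifier field.
        return Choice(embedded_label, False)

    # Section 8.2.1 (no V bit) and Section 8.2.2.2, first bullet:
    # fall back to a configured value; without one the tunnel is unusable.
    if configured_vni is None:
        return None
    return Choice(configured_vni, push_label)
```

In this sketch the embedded label reaches the tunnel's remote endpoint in two situations: when `push_label` is True, and when the label itself is returned as the `vni`. These are the situations to which the concern described in the preceding paragraph applies.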
1469 Therefore the embedded label MUST NOT be carried by a data packet 1470 traveling through a tunnel unless it is known that the label will be 1471 properly interpreted at the tunnel's remote endpoint. How this is 1472 known is outside the scope of this document. 1474 Note that if the Tunnel Encapsulation attribute is attached to a VPN- 1475 IP route [RFC4364], and if Inter-AS "option b" (see section 10 of 1476 [RFC4364]) is being used, and if the Remote Endpoint sub-TLV contains 1477 an IP address that is not in the same AS as the router receiving the 1478 route, it is very likely that the embedded label has been changed. 1479 Therefore use of the Tunnel Encapsulation attribute in an "Inter-AS 1480 option b" scenario is not supported. 1482 10. Scoping 1484 The Tunnel Encapsulation attribute is defined as a transitive 1485 attribute, so that it may be passed along by BGP speakers that do not 1486 recognize it. However, it is intended that the Tunnel Encapsulation 1487 attribute be used only within a well-defined scope, e.g., within a 1488 set of Autonomous Systems that belong to a single administrative 1489 entity. If the attribute is distributed beyond its intended scope, 1490 packets may be sent through tunnels in a manner that is not intended. 1492 To prevent the Tunnel Encapsulation attribute from being distributed 1493 beyond its intended scope, any BGP speaker that understands the 1494 attribute MUST be able to filter the attribute from incoming BGP 1495 UPDATE messages. When the attribute is filtered from an incoming 1496 UPDATE, the attribute is neither processed nor redistributed. This 1497 filtering SHOULD be possible on a per-BGP-session basis. For each 1498 session, filtering of the attribute on incoming UPDATEs MUST be 1499 enabled by default. 1501 In addition, any BGP speaker that understands the attribute MUST be 1502 able to filter the attribute from outgoing BGP UPDATE messages. This 1503 filtering SHOULD be possible on a per-BGP-session basis.
For each 1504 session, filtering of the attribute on outgoing UPDATEs MUST be 1505 enabled by default. 1507 11. Error Handling 1509 The Tunnel Encapsulation attribute is a sequence of TLVs, each of 1510 which is a sequence of sub-TLVs. The final octet of a TLV is 1511 determined by its length field. Similarly, the final octet of a sub- 1512 TLV is determined by its length field. The final octet of a TLV MUST 1513 also be the final octet of its final sub-TLV. If this is not the 1514 case, the TLV MUST be considered to be malformed. A TLV that is 1515 found to be malformed for this reason MUST NOT be processed, and MUST 1516 be stripped from the Tunnel Encapsulation attribute before the 1517 attribute is propagated. Subsequent TLVs in the Tunnel Encapsulation 1518 attribute may still be valid, in which case they MUST be processed 1519 and redistributed normally. 1521 If a Tunnel Encapsulation attribute does not have any valid TLVs, or 1522 it does not have the transitive bit set, the "Attribute Discard" 1523 procedure of [RFC7606] is applied. 1525 If a Tunnel Encapsulation attribute can be parsed correctly, but 1526 contains a TLV whose tunnel type is not recognized by a particular 1527 BGP speaker, that BGP speaker MUST NOT consider the attribute to be 1528 malformed. Rather, the TLV with the unrecognized tunnel type MUST be 1529 ignored, and the BGP speaker MUST interpret the attribute as if that 1530 TLV had not been present. If the route carrying the Tunnel 1531 Encapsulation attribute is propagated with the attribute, the 1532 unrecognized TLV MUST remain in the attribute. 1534 If a TLV of a Tunnel Encapsulation attribute contains a sub-TLV that 1535 is not recognized by a particular BGP speaker, the BGP speaker MUST 1536 process that TLV as if the unrecognized sub-TLV had not been present. 1537 If the route carrying the Tunnel Encapsulation attribute is 1538 propagated with the attribute, the unrecognized sub-TLV MUST remain in 1539 the attribute.
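The framing rule at the start of this section (the final octet of a TLV must coincide with the final octet of its final sub-TLV) can be illustrated with a short Python sketch. It is not a normative parser. It assumes that each TLV begins with a 2-octet tunnel type followed by a 2-octet length, as defined earlier in this document (outside this section), and it uses the sub-TLV length rule from the IANA note in Section 12.4 (one length octet for types 0 to 127, two for types 128 to 255). The function names are hypothetical. Stopping at a TLV whose length overruns the attribute, and ignoring fewer than four trailing octets, are assumptions of the sketch and are not behaviors stated in the text.

```python
import struct
from typing import List, Tuple

SubTLV = Tuple[int, bytes]            # (sub-TLV type, value)
TunnelTLV = Tuple[int, List[SubTLV]]  # (tunnel type, sub-TLVs)


def parse_sub_tlvs(value: bytes) -> List[SubTLV]:
    """Split a TLV value into sub-TLVs; raise ValueError if the last
    sub-TLV does not end exactly at the end of the TLV."""
    sub_tlvs = []
    offset = 0
    while offset < len(value):
        sub_type = value[offset]
        offset += 1
        # One-octet length for types 0-127, two-octet length for 128-255.
        length_size = 1 if sub_type <= 127 else 2
        if offset + length_size > len(value):
            raise ValueError("truncated sub-TLV length field")
        length = int.from_bytes(value[offset:offset + length_size], "big")
        offset += length_size
        if offset + length > len(value):
            raise ValueError("sub-TLV overruns its TLV")
        sub_tlvs.append((sub_type, value[offset:offset + length]))
        offset += length
    return sub_tlvs


def parse_attribute(data: bytes) -> List[TunnelTLV]:
    """Return the correctly framed TLVs of an attribute value.
    Malformed TLVs are left out, which models stripping them."""
    tlvs = []
    offset = 0
    while offset + 4 <= len(data):
        tunnel_type, length = struct.unpack_from("!HH", data, offset)
        end = offset + 4 + length
        if end > len(data):
            break  # assumption: nothing after this point can be framed
        value = data[offset + 4:end]
        offset = end
        try:
            tlvs.append((tunnel_type, parse_sub_tlvs(value)))
        except ValueError:
            continue  # malformed TLV: skip it, later TLVs may still be valid
    return tlvs
```

A caller that receives an empty list from `parse_attribute` is in the situation covered by the "Attribute Discard" paragraph above. Unrecognized tunnel types and sub-TLV types are deliberately kept in the result, since they are to be ignored for processing but retained on propagation.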
1541 If the type code of a sub-TLV appears as "reserved" in the IANA "BGP 1542 Tunnel Encapsulation Attribute Sub-TLVs" registry, the sub-TLV MUST 1543 be treated as an unrecognized sub-TLV. 1545 In general, if a TLV contains a sub-TLV that is malformed (e.g., 1546 contains a length field whose value is not legal for that sub-TLV), 1547 the sub-TLV should be treated as if it were an unrecognized sub-TLV. 1548 This document specifies one exception to this rule -- within a tunnel 1549 encapsulation attribute that is carried by a BGP UPDATE whose AFI/ 1550 SAFI is one of those explicitly listed in the second paragraph of 1551 Section 5, if a TLV contains a malformed Remote Endpoint sub-TLV (as 1552 defined in Section 3.1), the entire TLV MUST be ignored, and MUST be 1553 removed from the Tunnel Encapsulation attribute before the route 1554 carrying that attribute is redistributed. 1556 Within a tunnel encapsulation attribute that is carried by a BGP 1557 UPDATE whose AFI/SAFI is one of those explicitly listed in the second 1558 paragraph of Section 5, a TLV that does not contain exactly one 1559 Remote Endpoint sub-TLV MUST be treated as if it contained a 1560 malformed Remote Endpoint sub-TLV. 1562 A TLV identifying a particular tunnel type may contain a sub-TLV that 1563 is meaningless for that tunnel type. For example, perhaps the TLV 1564 contains a "UDP Destination Port" sub-TLV, but the identified tunnel 1565 type does not use UDP encapsulation at all. Sub-TLVs of this sort 1566 MUST be treated as a no-op. That is, they MUST NOT affect the 1567 creation of the encapsulation header. However, the sub-TLV MUST NOT 1568 be considered to be malformed, and MUST NOT be removed from the TLV 1569 before the route carrying the Tunnel Encapsulation attribute is 1570 redistributed. (This allows for the possibility that such sub-TLVs 1571 may be given a meaning, in the context of the specified tunnel type, 1572 in the future.) 
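The treatment of unrecognized and malformed sub-TLVs, and the special case of the Remote Endpoint sub-TLV, can be summarized in a short Python sketch. It is illustrative only, and it applies to UPDATEs whose AFI/SAFI is one of those listed in Section 5. The function name `effective_sub_tlvs` and the `validators` table are hypothetical. The table maps each sub-TLV type that the speaker recognizes to a well-formedness check, and the actual checks for each sub-TLV are those of the sections that define them, not the ones a caller happens to supply. A sub-TLV that is meaningless for the tunnel type is simply a recognized, well-formed sub-TLV that the encapsulation step does not consult, so it needs no special handling here.

```python
from typing import Callable, Dict, List, Optional, Tuple

SubTLV = Tuple[int, bytes]  # (sub-TLV type, value)
REMOTE_ENDPOINT = 6         # codepoint listed in Section 12.4


def effective_sub_tlvs(
    sub_tlvs: List[SubTLV],
    validators: Dict[int, Callable[[bytes], bool]],
) -> Optional[List[SubTLV]]:
    """Return the sub-TLVs that take part in processing, or None if the
    whole TLV is to be ignored and removed before redistribution."""
    # A TLV without exactly one Remote Endpoint sub-TLV is treated as if
    # it contained a malformed Remote Endpoint sub-TLV.
    endpoint_count = sum(1 for sub_type, _ in sub_tlvs if sub_type == REMOTE_ENDPOINT)
    if endpoint_count != 1:
        return None

    effective = []
    for sub_type, value in sub_tlvs:
        check = validators.get(sub_type)
        if check is None:
            # Unrecognized (or reserved) type: processed as if absent,
            # yet it stays in the TLV when the route is propagated.
            continue
        if not check(value):
            if sub_type == REMOTE_ENDPOINT:
                return None  # the one exception: drop the entire TLV
            continue         # malformed: treated like an unrecognized sub-TLV
        effective.append((sub_type, value))
    return effective
```

The function only decides what is used for processing. Except when it returns None, the original list of sub-TLVs is what would be propagated, unchanged.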
1574 There is no significance to the order in which the TLVs occur within 1575 the Tunnel Encapsulation attribute. Multiple TLVs may occur for a 1576 given tunnel type; each such TLV is regarded as describing a 1577 different tunnel. 1579 The following sub-TLVs defined in this document MUST NOT occur more 1580 than once in a given Tunnel TLV: Remote Endpoint (discussed above), 1581 Encapsulation, IPv4 DS, UDP Destination Port, Embedded Label 1582 Handling, MPLS Label Stack, Prefix-SID. If a Tunnel TLV has more 1583 than one of any of these sub-TLVs, all but the first occurrence of 1584 each such sub-TLV type MUST be treated as a no-op. However, the 1585 Tunnel TLV containing them MUST NOT be considered to be malformed, 1586 and all the sub-TLVs MUST be propagated if the route carrying the 1587 Tunnel Encapsulation attribute is propagated. 1589 The following sub-TLVs defined in this document may appear zero or 1590 more times in a given Tunnel TLV: Protocol Type, Color. Each 1591 occurrence of such sub-TLVs is meaningful. For example, the Color 1592 sub-TLV may appear multiple times to assign multiple colors to a 1593 tunnel. 1595 12. IANA Considerations 1597 12.1. Subsequent Address Family Identifiers 1599 IANA is requested to modify the "Subsequent Address Family 1600 Identifiers" registry to indicate that the Encapsulation SAFI is 1601 deprecated. This document should be the reference. 1603 12.2. BGP Path Attributes 1605 IANA has previously assigned value 23 from the "BGP Path Attributes" 1606 Registry to "Tunnel Encapsulation Attribute". IANA is requested to 1607 add this document as a reference. 1609 12.3. Extended Communities 1611 IANA has previously assigned values from the "Transitive Opaque 1612 Extended Community" type Registry to the "Color Extended Community" 1613 (sub-type 0x0b), and to the "Encapsulation Extended 1614 Community" (0x030c). IANA is requested to add this document as a 1615 reference for both assignments. 1617 12.4.
BGP Tunnel Encapsulation Attribute Sub-TLVs 1619 IANA is requested to add the following note to the "BGP Tunnel 1620 Encapsulation Attribute Sub-TLVs" registry: 1622 If the Sub-TLV Type is in the range from 0 to 127 inclusive, the 1623 Sub-TLV Length field contains one octet. If the Sub-TLV Type is 1624 in the range from 128 to 255 inclusive, the Sub-TLV Length field 1625 contains two octets. 1627 IANA is requested to change the registration policy of the "BGP 1628 Tunnel Encapsulation Attribute Sub-TLVs" registry to the following: 1630 o The values 0 and 255 are reserved. 1632 o The values in the ranges 1-63 and 128-191 are to be allocated using 1633 the "Standards Action" registration procedure. 1635 o The values in the ranges 64-125 and 192-252 are to be allocated 1636 using the "First Come, First Served" registration procedure. 1638 o The values in the ranges 126-127 and 253-254 are reserved for 1639 experimental use; IANA shall not allocate values from these ranges. 1641 IANA has assigned the following codepoints in the "BGP Tunnel 1642 Encapsulation Attribute Sub-TLVs" registry: 1644 6: Remote Endpoint 1646 7: IPv4 DS Field 1648 8: UDP Destination Port 1650 9: Embedded Label Handling 1652 10: MPLS Label Stack 1654 11: Prefix-SID 1656 IANA has previously assigned codepoints from the "BGP Tunnel 1657 Encapsulation Attribute Sub-TLVs" registry for "Encapsulation", 1658 "Protocol Type", and "Color". IANA is requested to add this document 1659 as a reference. 1661 12.5. Tunnel Types 1663 IANA is requested to add this document as a reference for tunnel 1664 types 8 (VXLAN), 9 (NVGRE), 11 (MPLS-in-GRE), and 12 (VXLAN-GPE) in 1665 the "BGP Tunnel Encapsulation Tunnel Types" registry. 1667 IANA is requested to add this document as a reference for tunnel 1668 types 1 (L2TPv3), 2 (GRE), and 7 (IP in IP) in the "BGP Tunnel 1669 Encapsulation Tunnel Types" registry. 1671 12.6.
Flags Field of Vxlan Encapsulation sub-TLV 1673 IANA is requested to add this document as a reference for creating 1674 the flags field of the Vxlan Encapsulation sub-TLV registry. 1676 IANA is requested to add this document as a reference for flag bits V 1677 and M in the "Flags field of Vxlan Encapsulation sub-TLV" registry. 1679 12.7. Flags Field of Vxlan-GPE Encapsulation sub-TLV 1681 IANA is requested to add this document as a reference for creating 1682 the flags field of the Vxlan-GPE Encapsulation sub-TLV registry. 1684 IANA is requested to add this document as a reference for flag bit V 1685 in the "Flags field of Vxlan-GPE Encapsulation sub-TLV" registry. 1687 12.8. Flags Field of NVGRE Encapsulation sub-TLV 1689 IANA is requested to add this document as a reference for creating 1690 the flags field of the NVGRE Encapsulation sub-TLV registry. 1692 IANA is requested to add this document as a reference for flag bits V 1693 and M in the "Flags field of NVGRE Encapsulation sub-TLV" registry. 1695 12.9. Embedded Label Handling sub-TLV 1697 IANA is requested to add this document as a reference for creating 1698 the sub-TLV's value field of the Embedded Label Handling sub-TLV 1699 registry. 1701 IANA is requested to add this document as a reference for value of 1 1702 (Payload of MPLS with embedded label) and 2 (no embedded label in 1703 payload) in the "sub-TLV's value field of the Embedded Label Handling 1704 sub-TLV" registry. 1706 13. Security Considerations 1708 The Tunnel Encapsulation attribute can cause traffic to be diverted 1709 from its normal path, especially when the Remote Endpoint sub-TLV is 1710 used. This can have serious consequences if the attribute is added 1711 or modified illegitimately, as it enables traffic to be "hijacked". 1713 The Remote Endpoint sub-TLV contains both an IP address and an AS 1714 number. BGP Origin Validation [RFC6811] can be used to obtain 1715 assurance that the given IP address belongs to the given AS. 
While 1716 this provides some protection against misconfiguration, it does not 1717 prevent a malicious agent from inserting a sub-TLV that will appear 1718 valid. 1720 Before sending a packet through the tunnel identified in a particular 1721 TLV of a Tunnel Encapsulation attribute, it may be advisable to use 1722 BGP Origin Validation to obtain the following additional assurances: 1724 o the origin AS of the route carrying the Tunnel Encapsulation 1725 attribute is correct; 1727 o the origin AS of the route to the IP address specified in the 1728 Remote Endpoint sub-TLV is correct, and is the same AS that is 1729 specified in the Remote Endpoint sub-TLV. 1731 One then has some level of assurance that the tunneled traffic is 1732 going to the same destination AS that it would have gone to had the 1733 Tunnel Encapsulation attribute not been present. However, this may 1734 not suit all use cases, and in any event is not very strong 1735 protection against hijacking. 1737 For these reasons, BGP Origin Validation should not be relied upon 1738 exclusively, and the filtering procedures of Section 10 should always 1739 be in place. 1741 Increased protection can be obtained by using BGPSEC [RFC8205] to 1742 ensure that the route carrying the Tunnel Encapsulation attribute, 1743 and the routes to the Remote Endpoint of each specified tunnel, have 1744 not been altered illegitimately. 1746 If BGP Origin Validation is used as specified above, and the tunnel 1747 specified in a particular TLV of a Tunnel Encapsulation attribute is 1748 therefore regarded as "suspicious", that tunnel should not be used. 1749 Other tunnels specified in (other TLVs of) the Tunnel Encapsulation 1750 attribute may still be used. 1752 14. Acknowledgments 1754 This document contains text from RFC5512, co-authored by Pradosh 1755 Mohapatra. The authors of the current document wish to thank Pradosh 1756 for his contribution. 
RFC5512 itself built upon prior work by Gargi 1757 Nalawade, Ruchi Kapoor, Dan Tappan, David Ward, Scott Wainner, Simon 1758 Barber, and Chris Metz, whom we also thank for their contributions. 1760 The authors wish to thank Lou Berger, Ron Bonica, Martin Djernaes, 1761 John Drake, Satoru Matsushima, Dhananjaya Rao, John Scudder, Ravi 1762 Singh, Thomas Morin, Xiaohu Xu, and Zhaohui Zhang for their review, 1763 comments, and/or helpful discussions. 1765 15. Contributor Addresses 1767 Below is a list of other contributing authors in alphabetical order: 1769 Randy Bush 1770 Internet Initiative Japan 1771 5147 Crystal Springs 1772 Bainbridge Island, Washington 98110 1773 United States 1775 Email: randy@psg.com 1777 Robert Raszuk 1778 Bloomberg LP 1779 731 Lexington Ave 1780 New York City, NY 10022 1781 United States 1783 Email: robert@raszuk.net 1785 16. References 1787 16.1. Normative References 1789 [I-D.ietf-idr-bgp-prefix-sid] 1790 Previdi, S., Filsfils, C., Lindem, A., Sreekantiah, A., 1791 and H. Gredler, "Segment Routing Prefix SID extensions for 1792 BGP", draft-ietf-idr-bgp-prefix-sid-27 (work in progress), 1793 June 2018. 1795 [I-D.ietf-nvo3-vxlan-gpe] 1796 Maino, F., Kreeger, L., and U. Elzur, "Generic Protocol 1797 Extension for VXLAN", draft-ietf-nvo3-vxlan-gpe-07 (work 1798 in progress), April 2019. 1800 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1801 Requirement Levels", BCP 14, RFC 2119, 1802 DOI 10.17487/RFC2119, March 1997, 1803 . 1805 [RFC2784] Farinacci, D., Li, T., Hanks, S., Meyer, D., and P. 1806 Traina, "Generic Routing Encapsulation (GRE)", RFC 2784, 1807 DOI 10.17487/RFC2784, March 2000, 1808 . 1810 [RFC2890] Dommety, G., "Key and Sequence Number Extensions to GRE", 1811 RFC 2890, DOI 10.17487/RFC2890, September 2000, 1812 . 1814 [RFC3032] Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y., 1815 Farinacci, D., Li, T., and A. Conta, "MPLS Label Stack 1816 Encoding", RFC 3032, DOI 10.17487/RFC3032, January 2001, 1817 . 
1819 [RFC3931] Lau, J., Ed., Townsley, M., Ed., and I. Goyret, Ed., 1820 "Layer Two Tunneling Protocol - Version 3 (L2TPv3)", 1821 RFC 3931, DOI 10.17487/RFC3931, March 2005, 1822 . 1824 [RFC4023] Worster, T., Rekhter, Y., and E. Rosen, Ed., 1825 "Encapsulating MPLS in IP or Generic Routing Encapsulation 1826 (GRE)", RFC 4023, DOI 10.17487/RFC4023, March 2005, 1827 . 1829 [RFC4271] Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A 1830 Border Gateway Protocol 4 (BGP-4)", RFC 4271, 1831 DOI 10.17487/RFC4271, January 2006, 1832 . 1834 [RFC4760] Bates, T., Chandra, R., Katz, D., and Y. Rekhter, 1835 "Multiprotocol Extensions for BGP-4", RFC 4760, 1836 DOI 10.17487/RFC4760, January 2007, 1837 . 1839 [RFC5512] Mohapatra, P. and E. Rosen, "The BGP Encapsulation 1840 Subsequent Address Family Identifier (SAFI) and the BGP 1841 Tunnel Encapsulation Attribute", RFC 5512, 1842 DOI 10.17487/RFC5512, April 2009, 1843 . 1845 [RFC5566] Berger, L., White, R., and E. Rosen, "BGP IPsec Tunnel 1846 Encapsulation Attribute", RFC 5566, DOI 10.17487/RFC5566, 1847 June 2009, . 1849 [RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger, 1850 L., Sridhar, T., Bursell, M., and C. Wright, "Virtual 1851 eXtensible Local Area Network (VXLAN): A Framework for 1852 Overlaying Virtualized Layer 2 Networks over Layer 3 1853 Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014, 1854 . 1856 [RFC7510] Xu, X., Sheth, N., Yong, L., Callon, R., and D. Black, 1857 "Encapsulating MPLS in UDP", RFC 7510, 1858 DOI 10.17487/RFC7510, April 2015, 1859 . 1861 [RFC7606] Chen, E., Ed., Scudder, J., Ed., Mohapatra, P., and K. 1862 Patel, "Revised Error Handling for BGP UPDATE Messages", 1863 RFC 7606, DOI 10.17487/RFC7606, August 2015, 1864 . 1866 [RFC7637] Garg, P., Ed. and Y. Wang, Ed., "NVGRE: Network 1867 Virtualization Using Generic Routing Encapsulation", 1868 RFC 7637, DOI 10.17487/RFC7637, September 2015, 1869 . 
1871 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 1872 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 1873 May 2017, . 1875 16.2. Informative References 1877 [Ethertypes] 1878 "IANA Ethertype Registry", 1879 . 1882 [I-D.ietf-bess-evpn-inter-subnet-forwarding] 1883 Sajassi, A., Salam, S., Thoria, S., Drake, J., and J. 1884 Rabadan, "Integrated Routing and Bridging in EVPN", draft- 1885 ietf-bess-evpn-inter-subnet-forwarding-08 (work in 1886 progress), March 2019. 1888 [RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black, 1889 "Definition of the Differentiated Services Field (DS 1890 Field) in the IPv4 and IPv6 Headers", RFC 2474, 1891 DOI 10.17487/RFC2474, December 1998, 1892 . 1894 [RFC4364] Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private 1895 Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February 1896 2006, . 1898 [RFC5462] Andersson, L. and R. Asati, "Multiprotocol Label Switching 1899 (MPLS) Label Stack Entry: "EXP" Field Renamed to "Traffic 1900 Class" Field", RFC 5462, DOI 10.17487/RFC5462, February 1901 2009, . 1903 [RFC6514] Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP 1904 Encodings and Procedures for Multicast in MPLS/BGP IP 1905 VPNs", RFC 6514, DOI 10.17487/RFC6514, February 2012, 1906 . 1908 [RFC6811] Mohapatra, P., Scudder, J., Ward, D., Bush, R., and R. 1909 Austein, "BGP Prefix Origin Validation", RFC 6811, 1910 DOI 10.17487/RFC6811, January 2013, 1911 . 1913 [RFC8205] Lepinski, M., Ed. and K. Sriram, Ed., "BGPsec Protocol 1914 Specification", RFC 8205, DOI 10.17487/RFC8205, September 1915 2017, . 1917 Authors' Addresses 1919 Keyur Patel 1920 Arrcus, Inc 1921 2077 Gateway Pl 1922 San Jose, CA 95110 1923 United States 1925 Email: keyur@arrcus.com 1927 Gunter Van de Velde 1928 Nokia 1929 Copernicuslaan 50 1930 Antwerpen 2018 1931 Belgium 1933 Email: gunter.van_de_velde@nokia.com 1935 Srihari R. 
Sangli 1936 Juniper Networks, Inc 1937 10 Technology Park Drive 1938 Westford, Massachusetts 01886 1939 United States 1941 Email: ssangli@juniper.net 1942 Eric C. Rosen