idnits 2.17.1 draft-ietf-idr-tunnel-encaps-14.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (September 30, 2019) is 1667 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: 'RFC4760' is defined on line 1842, but no explicit reference was found in the text == Outdated reference: A later version (-13) exists of draft-ietf-nvo3-vxlan-gpe-07 ** Downref: Normative reference to an Informational draft: draft-ietf-nvo3-vxlan-gpe (ref. 
'I-D.ietf-nvo3-vxlan-gpe') ** Obsolete normative reference: RFC 5512 (Obsoleted by RFC 9012) ** Obsolete normative reference: RFC 5566 (Obsoleted by RFC 9012) ** Downref: Normative reference to an Informational RFC: RFC 7348 ** Downref: Normative reference to an Informational RFC: RFC 7637 == Outdated reference: A later version (-15) exists of draft-ietf-bess-evpn-inter-subnet-forwarding-08 Summary: 5 errors (**), 0 flaws (~~), 5 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 IDR Working Group K. Patel 3 Internet-Draft Arrcus, Inc 4 Obsoletes: 5512 (if approved) G. Van de Velde 5 Intended status: Standards Track Nokia 6 Expires: April 2, 2020 S. Sangli 7 Juniper Networks, Inc 8 September 30, 2019 10 The BGP Tunnel Encapsulation Attribute 11 draft-ietf-idr-tunnel-encaps-14.txt 13 Abstract 15 RFC 5512 defines a BGP Path Attribute known as the "Tunnel 16 Encapsulation Attribute". This attribute allows one to specify a set 17 of tunnels. For each such tunnel, the attribute can provide the 18 information needed to create the tunnel and the corresponding 19 encapsulation header. The attribute can also provide information 20 that aids in choosing whether a particular packet is to be sent 21 through a particular tunnel. RFC 5512 states that the attribute is 22 only carried in BGP UPDATEs that have the "Encapsulation Subsequent 23 Address Family (Encapsulation SAFI)". This document deprecates the 24 Encapsulation SAFI (which has never been used in production), and 25 specifies semantics for the attribute when it is carried in UPDATEs 26 of certain other SAFIs. This document adds support for additional 27 tunnel types, and allows a remote tunnel endpoint address to be 28 specified for each tunnel. 
This document also provides support for 29 specifying fields of any inner or outer encapsulations that may be 30 used by a particular tunnel. 32 This document obsoletes RFC 5512. 34 Status of This Memo 36 This Internet-Draft is submitted in full conformance with the 37 provisions of BCP 78 and BCP 79. 39 Internet-Drafts are working documents of the Internet Engineering 40 Task Force (IETF). Note that other groups may also distribute 41 working documents as Internet-Drafts. The list of current Internet- 42 Drafts is at https://datatracker.ietf.org/drafts/current/. 44 Internet-Drafts are draft documents valid for a maximum of six months 45 and may be updated, replaced, or obsoleted by other documents at any 46 time. It is inappropriate to use Internet-Drafts as reference 47 material or to cite them other than as "work in progress." 48 This Internet-Draft will expire on April 2, 2020. 50 Copyright Notice 52 Copyright (c) 2019 IETF Trust and the persons identified as the 53 document authors. All rights reserved. 55 This document is subject to BCP 78 and the IETF Trust's Legal 56 Provisions Relating to IETF Documents 57 (https://trustee.ietf.org/license-info) in effect on the date of 58 publication of this document. Please review these documents 59 carefully, as they describe your rights and restrictions with respect 60 to this document. Code Components extracted from this document must 61 include Simplified BSD License text as described in Section 4.e of 62 the Trust Legal Provisions and are provided without warranty as 63 described in the Simplified BSD License. 65 Table of Contents 67 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 68 1.1. Brief Summary of RFC 5512 . . . . . . . . . . . . . . . . 4 69 1.2. Deficiencies in RFC 5512 . . . . . . . . . . . . . . . . 4 70 1.3. Brief Summary of Changes from RFC 5512 . . . . . . . . . 5 71 1.4. Impact on RFC 5566 . . . . . . . . . . . . . . . . . . . 6 72 2. The Tunnel Encapsulation Attribute . . . . . . . 
. . . . . . 6 73 3. Tunnel Encapsulation Attribute Sub-TLVs . . . . . . . . . . . 8 74 3.1. The Tunnel Endpoint Sub-TLV . . . . . . . . . . . . . . . 8 75 3.2. Encapsulation Sub-TLVs for Particular Tunnel Types . . . 10 76 3.2.1. VXLAN . . . . . . . . . . . . . . . . . . . . . . . . 10 77 3.2.2. VXLAN-GPE . . . . . . . . . . . . . . . . . . . . . . 12 78 3.2.3. NVGRE . . . . . . . . . . . . . . . . . . . . . . . . 13 79 3.2.4. L2TPv3 . . . . . . . . . . . . . . . . . . . . . . . 14 80 3.2.5. GRE . . . . . . . . . . . . . . . . . . . . . . . . . 15 81 3.2.6. MPLS-in-GRE . . . . . . . . . . . . . . . . . . . . . 15 82 3.2.7. IP-in-IP . . . . . . . . . . . . . . . . . . . . . . 16 83 3.3. Outer Encapsulation Sub-TLVs . . . . . . . . . . . . . . 16 84 3.3.1. IPv4 DS Field . . . . . . . . . . . . . . . . . . . . 16 85 3.3.2. UDP Destination Port . . . . . . . . . . . . . . . . 17 86 3.4. Sub-TLVs for Aiding Tunnel Selection . . . . . . . . . . 17 87 3.4.1. Protocol Type Sub-TLV . . . . . . . . . . . . . . . . 17 88 3.4.2. Color Sub-TLV . . . . . . . . . . . . . . . . . . . . 17 89 3.5. Embedded Label Handling Sub-TLV . . . . . . . . . . . . . 18 90 3.6. MPLS Label Stack Sub-TLV . . . . . . . . . . . . . . . . 19 91 3.7. Prefix-SID Sub-TLV . . . . . . . . . . . . . . . . . . . 20 92 4. Extended Communities Related to the Tunnel Encapsulation 93 Attribute . . . . . . . . . . . . . . . . . . . . . . . . . . 21 94 4.1. Encapsulation Extended Community . . . . . . . . . . . . 21 95 4.2. Router's MAC Extended Community . . . . . . . . . . . . . 23 96 4.3. Color Extended Community . . . . . . . . . . . . . . . . 23 97 5. Semantics and Usage of the Tunnel Encapsulation attribute . . 23 98 6. Routing Considerations . . . . . . . . . . . . . . . . . . . 27 99 6.1. Impact on BGP Decision Process . . . . . . . . . . . . . 27 100 6.2. Looping, Infinite Stacking, Etc. . . . . . . . . . . . . 27 101 7. Recursive Next Hop Resolution . . . . . . . . . . . . . . . . 28 102 8. 
Use of Virtual Network Identifiers and Embedded Labels when 103 Imposing a Tunnel Encapsulation . . . . . . . . . . . . . . . 28 104 8.1. Tunnel Types without a Virtual Network Identifier Field . 29 105 8.2. Tunnel Types with a Virtual Network Identifier Field . . 29 106 8.2.1. Unlabeled Address Families . . . . . . . . . . . . . 30 107 8.2.2. Labeled Address Families . . . . . . . . . . . . . . 30 108 9. Applicability Restrictions . . . . . . . . . . . . . . . . . 31 109 10. Scoping . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 110 11. Error Handling . . . . . . . . . . . . . . . . . . . . . . . 32 111 12. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 34 112 12.1. Subsequent Address Family Identifiers . . . . . . . . . 34 113 12.2. BGP Path Attributes . . . . . . . . . . . . . . . . . . 34 114 12.3. Extended Communities . . . . . . . . . . . . . . . . . . 35 115 12.4. BGP Tunnel Encapsulation Attribute Sub-TLVs . . . . . . 35 116 12.5. Tunnel Types . . . . . . . . . . . . . . . . . . . . . . 36 117 12.6. Flags Field of Vxlan Encapsulation sub-TLV . . . . . . . 36 118 12.7. Flags Field of Vxlan-GPE Encapsulation sub-TLV . . . . . 36 119 12.8. Flags Field of NVGRE Encapsulation sub-TLV . . . . . . . 36 120 12.9. Embedded Label Handling sub-TLV . . . . . . . . . . . . 36 121 13. Security Considerations . . . . . . . . . . . . . . . . . . . 37 122 14. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 38 123 15. Contributor Addresses . . . . . . . . . . . . . . . . . . . . 38 124 16. References . . . . . . . . . . . . . . . . . . . . . . . . . 38 125 16.1. Normative References . . . . . . . . . . . . . . . . . . 38 126 16.2. Informative References . . . . . . . . . . . . . . . . . 40 127 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 41 129 1. Introduction 131 This document obsoletes RFC 5512. The deficiencies of RFC 5512, and 132 a summary of the changes made, are discussed in Sections 1.1-1.3. 
133 The material from RFC 5512 that is retained has been incorporated 134 into this document. 136 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 137 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and 138 "OPTIONAL" in this document are to be interpreted as described in BCP 139 14 [RFC2119] [RFC8174] when, and only when, they appear in all 140 capitals, as shown here. 142 1.1. Brief Summary of RFC 5512 144 [RFC5512] defines a BGP Path Attribute known as the Tunnel 145 Encapsulation attribute. This attribute consists of one or more 146 TLVs. Each TLV identifies a particular type of tunnel. Each TLV 147 also contains one or more sub-TLVs. Some of the sub-TLVs, e.g., the 148 "Encapsulation sub-TLV", contain information that may be used to form 149 the encapsulation header for the specified tunnel type. Other sub- 150 TLVs, e.g., the "color sub-TLV" and the "protocol sub-TLV", contain 151 information that aids in determining whether particular packets 152 should be sent through the tunnel that the TLV identifies. 154 [RFC5512] only allows the Tunnel Encapsulation attribute to be 155 attached to BGP UPDATE messages of the Encapsulation Address Family. 156 These UPDATE messages have an AFI (Address Family Identifier) of 1 or 157 2, and a SAFI of 7. In an UPDATE of the Encapsulation SAFI, the NLRI 158 (Network Layer Reachability Information) is an address of the BGP 159 speaker originating the UPDATE. Consider the following scenario: 161 o BGP speaker R1 has received and installed UPDATE U; 163 o UPDATE U's SAFI is the Encapsulation SAFI; 165 o UPDATE U has the address R2 as its NLRI; 167 o UPDATE U has a Tunnel Encapsulation attribute. 169 o R1 has a packet, P, to transmit to destination D; 171 o R1's best path to D is a BGP route that has R2 as its next hop; 173 In this scenario, when R1 transmits packet P, it should transmit it 174 to R2 through one of the tunnels specified in U's Tunnel 175 Encapsulation attribute. 
The IP address of the tunnel egress 176 endpoint of each such tunnel is R2. Packet P is known as the 177 tunnel's "payload". 179 1.2. Deficiencies in RFC 5512 181 While the ability to specify tunnel information in a BGP UPDATE is 182 useful, the procedures of [RFC5512] have certain limitations: 184 o The requirement to use the "Encapsulation SAFI" presents an 185 unfortunate operational cost, as each BGP session that may need to 186 carry tunnel encapsulation information needs to be reconfigured to 187 support the Encapsulation SAFI. The Encapsulation SAFI has never 188 been used in production, and this requirement has served only to discourage the 189 use of the Tunnel Encapsulation attribute. 191 o There is no way to use the Tunnel Encapsulation attribute to 192 specify the tunnel egress endpoint address of a given tunnel; 193 [RFC5512] assumes that the tunnel egress endpoint of each tunnel 194 is specified as the NLRI of an UPDATE of the Encapsulation SAFI. 196 o If the respective best paths to two different address prefixes 197 have the same next hop, [RFC5512] does not provide a 198 straightforward method to associate each prefix with a different 199 tunnel. 201 o If a particular tunnel type requires an outer IP or UDP 202 encapsulation, there is no way to signal the values of any of the 203 fields of the outer encapsulation. 205 o In [RFC5512]'s specification of the sub-TLVs, each sub-TLV has a 206 one-octet length field. In some cases, a two-octet length field 207 may be needed. 209 1.3. Brief Summary of Changes from RFC 5512 211 In this document we address these deficiencies by: 213 o Deprecating the Encapsulation SAFI. 215 o Defining a new "Tunnel Endpoint sub-TLV" that can be included in 216 any of the TLVs contained in the Tunnel Encapsulation attribute. 217 This sub-TLV can be used to specify the remote endpoint address of 218 a particular tunnel. 220 o Allowing the Tunnel Encapsulation attribute to be carried by BGP 221 UPDATEs of additional AFI/SAFIs. 
Appropriate semantics are 222 provided for this way of using the attribute. 224 o Defining a number of new sub-TLVs that provide additional 225 information that is useful when forming the encapsulation header 226 used to send a packet through a particular tunnel. 228 o Defining the sub-TLV type field so that a sub-TLV whose type is in 229 the range from 0 to 127 inclusive has a one-octet length field, 230 but a sub-TLV whose type is in the range from 128 to 255 inclusive 231 has a two-octet length field. 233 One of the sub-TLVs defined in [RFC5512] is the "Encapsulation sub- 234 TLV". For a given tunnel, the encapsulation sub-TLV specifies some 235 of the information needed to construct the encapsulation header used 236 when sending packets through that tunnel. This document defines 237 encapsulation sub-TLVs for a number of tunnel types not discussed in 238 [RFC5512]: VXLAN (Virtual Extensible Local Area Network, [RFC7348]), 239 VXLAN-GPE (Generic Protocol Extension for VXLAN, 240 [I-D.ietf-nvo3-vxlan-gpe]), NVGRE (Network Virtualization Using 241 Generic Routing Encapsulation [RFC7637]), and MPLS-in-GRE (MPLS in 242 Generic Routing Encapsulation [RFC2784], [RFC2890], [RFC4023]). 243 MPLS-in-UDP [RFC7510] is also supported, but an Encapsulation sub-TLV 244 for it is not needed. 246 Some of the encapsulations mentioned in the previous paragraph need 247 to be further encapsulated inside UDP and/or IP. [RFC5512] provides 248 no way to specify that certain information is to appear in these 249 outer IP and/or UDP encapsulations. This document provides a 250 framework for including such information in the TLVs of the Tunnel 251 Encapsulation attribute. 
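As a non-normative illustration of the length-field rule listed among the changes above, the following Python sketch splits the value field of one tunnel TLV into sub-TLVs. It assumes the one-octet sub-TLV type field specified in Section 2. The names length_field_size and split_sub_tlvs are hypothetical, and raising ValueError on truncated input is an assumption of this sketch, not behavior specified by this document (see Section 11 for error handling).

```python
from typing import List, Tuple


def length_field_size(sub_tlv_type: int) -> int:
    """Return the size, in octets, of the length field for a sub-TLV type."""
    if not 0 <= sub_tlv_type <= 255:
        raise ValueError("sub-TLV type must fit in one octet")
    # Types 0-127 use a one-octet length; types 128-255 use two octets.
    return 1 if sub_tlv_type <= 127 else 2


def split_sub_tlvs(value: bytes) -> List[Tuple[int, bytes]]:
    """Split a tunnel TLV's value field into (sub-TLV type, value) pairs."""
    sub_tlvs = []
    offset = 0
    while offset < len(value):
        sub_type = value[offset]
        length_end = offset + 1 + length_field_size(sub_type)
        if length_end > len(value):
            # Assumption of this sketch: truncated input is reported as an error.
            raise ValueError("truncated sub-TLV length field")
        length = int.from_bytes(value[offset + 1:length_end], "big")
        value_end = length_end + length
        if value_end > len(value):
            raise ValueError("sub-TLV value runs past the end of the TLV")
        sub_tlvs.append((sub_type, value[length_end:value_end]))
        offset = value_end
    return sub_tlvs
```

Keeping the type-to-size rule in its own function means the walker never needs a table of known sub-TLV types: an unknown type can still be skipped correctly because its length encoding follows from its type value alone.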
253 When the Tunnel Encapsulation attribute is attached to a BGP UPDATE 254 whose AFI/SAFI identifies one of the labeled address families, it is 255 not always obvious whether the label embedded in the NLRI is to 256 appear somewhere in the tunnel encapsulation header (and if so, 257 where), or whether it is to appear in the payload, or whether it can 258 be omitted altogether. This is especially true if the tunnel 259 encapsulation header itself contains a "virtual network identifier". 260 This document provides a mechanism that allows one to signal (by 261 using sub-TLVs of the Tunnel Encapsulation attribute) how one wants 262 to use the embedded label when the tunnel encapsulation has its own 263 virtual network identifier field. 265 [RFC5512] defines a Tunnel Encapsulation Extended Community, that can 266 be used instead of the Tunnel Encapsulation attribute under certain 267 circumstances. This document addresses the issue of how to handle a 268 BGP UPDATE that carries both a Tunnel Encapsulation attribute and one 269 or more Tunnel Encapsulation Extended Communities. 271 1.4. Impact on RFC 5566 273 [RFC5566] uses the mechanisms defined in [RFC5512]. While this 274 document obsoletes [RFC5512], it does not address the issue of how to 275 use the mechanisms of [RFC5566] without also using the Encapsulation 276 SAFI. Those issues are considered to be outside the scope of this 277 document. 279 2. The Tunnel Encapsulation Attribute 281 The Tunnel Encapsulation attribute is an optional transitive BGP Path 282 attribute. IANA has assigned the value 23 as the type code of the 283 attribute. The attribute is composed of a set of Type-Length-Value 284 (TLV) encodings. Each TLV contains information corresponding to a 285 particular tunnel type. 
A TLV is structured as shown in Figure 1: 287 0 1 2 3 288 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 289 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 290 | Tunnel Type (2 Octets) | Length (2 Octets) | 291 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 292 | | 293 | Value | 294 | | 295 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 297 Figure 1: Tunnel Encapsulation TLV Value Field 299 o Tunnel Type (2 octets): identifies a type of tunnel. The field 300 contains values from the IANA Registry "BGP Tunnel Encapsulation 301 Attribute Tunnel Types". 303 Note that for tunnel types whose names are of the form "X-in-Y", 304 e.g., "MPLS-in-GRE", only packets of the specified payload type 305 "X" are to be carried through the tunnel of type "Y". This is the 306 equivalent of specifying a tunnel type "Y" and including in its 307 TLV a Protocol Type sub-TLV (see Section 3.4.1) specifying 308 protocol "X". If the tunnel type is "X-in-Y", it is unnecessary, 309 though harmless, to include a Protocol Type sub-TLV specifying 310 "X". 312 o Length (2 octets): the total number of octets of the value field. 314 o Value (variable): comprised of multiple sub-TLVs. 316 Each sub-TLV consists of three fields: a 1-octet type, a 1-octet or 317 2-octet length field (depending on the type), and zero or more octets 318 of value. A sub-TLV is structured as shown in Table 1: 320 +--------------------------------+ 321 | Sub-TLV Type (1 Octet) | 322 +--------------------------------+ 323 | Sub-TLV Length (1 or 2 Octets) | 324 +--------------------------------+ 325 | Sub-TLV Value (Variable) | 326 +--------------------------------+ 328 Table 1: Tunnel Encapsulation Sub-TLV Format 330 o Sub-TLV Type (1 octet): each sub-TLV type defines a certain 331 property about the tunnel TLV that contains this sub-TLV. 333 o Sub-TLV Length (1 or 2 octets): the total number of octets of the 334 sub-TLV value field. 
The Sub-TLV Length field contains 1 octet if 335 the Sub-TLV Type field contains a value in the range from 0-127. 336 The Sub-TLV Length field contains two octets if the Sub-TLV Type 337 field contains a value in the range from 128-255. 339 o Sub-TLV Value (variable): encodings of the value field depend on 340 the sub-TLV type as enumerated above. The following sub-sections 341 define the encoding in detail. 343 3. Tunnel Encapsulation Attribute Sub-TLVs 345 In this section, we specify a number of sub-TLVs. These sub-TLVs can 346 be included in a TLV of the Tunnel Encapsulation attribute. 348 3.1. The Tunnel Endpoint Sub-TLV 350 The Tunnel Endpoint sub-TLV specifies the address of the endpoint of 351 the tunnel, that is, the address of the router that will decapsulate 352 the payload. It is a sub-TLV whose value field contains three sub- 353 fields: 355 1. a four-octet Autonomous System (AS) number sub-field 357 2. a two-octet Address Family sub-field 359 3. an address sub-field, whose length depends upon the Address 360 Family. 362 0 1 2 3 363 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 364 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 365 | Autonomous System Number | 366 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 367 | Address Family | Address ~ 368 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + 369 ~ ~ 370 | | 371 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 373 Figure 2: Tunnel Endpoint Sub-TLV Value Field 375 The Address Family subfield contains a value from IANA's "Address 376 Family Numbers" registry. In this document, we assume that the 377 Address Family is either IPv4 or IPv6; use of other address families 378 is outside the scope of this document. 380 If the Address Family subfield contains the value for IPv4, the 381 address subfield must contain an IPv4 address (a /32 IPv4 prefix). 383 In this case, the length field of Tunnel Endpoint sub-TLV must 384 contain the value 10 (0xa). 
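The length value 10 follows from the three sub-fields: 4 octets of AS number, 2 octets of Address Family, and 4 octets of IPv4 address. The following Python sketch, offered only as an illustration, builds the value field of a Tunnel Endpoint sub-TLV and makes that arithmetic visible. The function name encode_tunnel_endpoint_value is hypothetical; the Address Family values 1 (IPv4) and 2 (IPv6) are taken from IANA's "Address Family Numbers" registry; the AS number and addresses used in the usage note are documentation values chosen for the example.

```python
import ipaddress
import struct

AFI_IPV4 = 1  # IANA "Address Family Numbers" registry
AFI_IPV6 = 2


def encode_tunnel_endpoint_value(asn: int, address: str) -> bytes:
    """Build the value field of a Tunnel Endpoint sub-TLV (illustrative only)."""
    ip = ipaddress.ip_address(address)
    afi = AFI_IPV4 if ip.version == 4 else AFI_IPV6
    # "!IH" packs a 4-octet AS number and a 2-octet Address Family in
    # network byte order; the packed address (4 or 16 octets) follows.
    return struct.pack("!IH", asn, afi) + ip.packed
```

With these assumptions, encode_tunnel_endpoint_value(64496, "192.0.2.1") yields 4 + 2 + 4 = 10 octets, and encode_tunnel_endpoint_value(64496, "2001:db8::1") yields 4 + 2 + 16 = 22 octets.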
386 If the Address Family subfield contains the value for IPv6, the 387 address sub-field must contain an IPv6 address (a /128 IPv6 prefix). 388 In this case, the length field of Tunnel Endpoint sub-TLV must 389 contain the value 22 (0x16). IPv6 link local addresses are not valid 390 values of the IP address field. 392 In a given BGP UPDATE, the address family (IPv4 or IPv6) of a Tunnel 393 Endpoint sub-TLV is independent of the address family of the UPDATE 394 itself. For example, an UPDATE whose NLRI is an IPv4 address may 395 have a Tunnel Encapsulation attribute containing Tunnel Endpoint sub- 396 TLVs that contain IPv6 addresses. Also, different tunnels 397 represented in the Tunnel Encapsulation attribute may have Tunnel 398 Endpoints of different address families. 400 A two-octet AS number can be carried in the AS number field by 401 setting the two high order octets to zero, and carrying the number in 402 the two low order octets of the field. 404 The AS number in the sub-TLV MUST be the number of the AS to which 405 the IP address in the sub-TLV belongs. 407 There is one special case: the Tunnel Endpoint sub-TLV MAY have a 408 value field whose Address Family subfield contains 0. This means 409 that the tunnel's egress endpoint is the UPDATE's BGP next hop. If 410 the Address Family subfield contains 0, the Address subfield is 411 omitted, and the Autonomous System number field is set to 0. 413 If any of the following conditions hold, the Tunnel Endpoint sub-TLV 414 is considered to be "malformed": 416 o The sub-TLV contains the value for IPv4 in its Address Family 417 subfield, but the length of the sub-TLV's value field is other 418 than 10 (0xa). 420 o The sub-TLV contains the value for IPv6 in its Address Family 421 subfield, but the length of the sub-TLV's value field is other 422 than 22 (0x16). 
424 o The sub-TLV contains the value zero in its Address Family subfield, 425 but the length of the sub-TLV's value field is other than 6, or 426 the Autonomous System subfield is not set to zero. 428 o The IP address in the sub-TLV's address subfield is not a valid IP 429 address (e.g., it is an IPv4 broadcast address). 431 o It can be determined that the IP address in the sub-TLV's address 432 subfield does not belong to the non-zero AS whose number is in 433 its Autonomous System subfield. (See Section 13 for 434 discussion of one way to determine this.) 436 If the Tunnel Endpoint sub-TLV is malformed, the TLV containing it is 437 also considered to be malformed, and the entire TLV MUST be ignored. 438 However, the Tunnel Encapsulation attribute MUST NOT be considered to 439 be malformed in this case; other TLVs in the attribute MUST be 440 processed (if they can be parsed correctly). 442 When redistributing a route that is carrying a Tunnel Encapsulation 443 attribute containing a TLV that itself contains a malformed Tunnel 444 Endpoint sub-TLV, the TLV MUST be removed from the attribute before 445 redistribution. 447 See Section 11 for further discussion of how to handle errors that 448 are encountered when parsing the Tunnel Encapsulation attribute. 450 If the Tunnel Endpoint sub-TLV contains an IPv4 or IPv6 address that 451 is valid but not reachable, the sub-TLV is NOT considered to be 452 malformed. 454 3.2. Encapsulation Sub-TLVs for Particular Tunnel Types 456 This section defines Tunnel Encapsulation sub-TLVs for the following 457 tunnel types: VXLAN ([RFC7348]), VXLAN-GPE 458 ([I-D.ietf-nvo3-vxlan-gpe]), NVGRE ([RFC7637]), MPLS-in-GRE 459 ([RFC2784], [RFC2890], [RFC4023]), L2TPv3 ([RFC3931]), and GRE 460 ([RFC2784], [RFC2890], [RFC4023]). 462 Rules for forming the encapsulation based on the information in a 463 given TLV are given in Sections 5 and 8. 
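Looking back at the Tunnel Endpoint sub-TLV of Section 3.1, its malformation conditions might be checked roughly as in the following non-normative Python sketch. The name tunnel_endpoint_malformed_reason is hypothetical. The sketch covers only the checks that can be made from the octets alone: the three length rules, the zero AS number for Address Family 0, the IPv4 broadcast address given as an example, and IPv6 link-local addresses. The check that an address belongs to the stated AS needs external information and is deliberately left out, and raising NotImplementedError for other address families is an assumption of this sketch.

```python
import ipaddress
import struct
from typing import Optional

AFI_NEXT_HOP = 0  # special case: the egress endpoint is the BGP next hop
AFI_IPV4 = 1      # IANA "Address Family Numbers" registry
AFI_IPV6 = 2


def tunnel_endpoint_malformed_reason(value: bytes) -> Optional[str]:
    """Return a reason string if the value field is malformed, else None."""
    if len(value) < 6:
        return "value field shorter than AS number plus Address Family"
    asn, afi = struct.unpack("!IH", value[:6])
    address = value[6:]
    if afi == AFI_NEXT_HOP:
        if len(value) != 6:
            return "Address Family 0 requires a value length of 6"
        if asn != 0:
            return "Address Family 0 requires a zero AS number"
        return None
    if afi == AFI_IPV4:
        if len(value) != 10:
            return "IPv4 requires a value length of 10"
        if ipaddress.IPv4Address(address) == ipaddress.IPv4Address("255.255.255.255"):
            return "IPv4 broadcast address is not a valid endpoint"
        return None
    if afi == AFI_IPV6:
        if len(value) != 22:
            return "IPv6 requires a value length of 22"
        if ipaddress.IPv6Address(address).is_link_local:
            return "IPv6 link-local address is not a valid endpoint"
        return None
    # Assumption: other address families are out of scope for this sketch.
    raise NotImplementedError("address family outside the scope of this sketch")
```

A caller that receives a non-None reason would ignore the enclosing TLV but continue processing the other TLVs of the attribute, as required in Section 3.1.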
465 There are also tunnel types for which it is not necessary to define 466 an Encapsulation sub-TLV, because there are no fields in the 467 encapsulation header whose values need to be signaled from the tunnel 468 egress endpoint. 470 3.2.1. VXLAN 472 This document defines an encapsulation sub-TLV for VXLAN tunnels. 473 When the tunnel type is VXLAN, the following is the structure of the 474 value field in the encapsulation sub-TLV: 476 0 1 2 3 477 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 478 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 479 |V|M|R|R|R|R|R|R| VN-ID (3 Octets) | 480 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 481 | MAC Address (4 Octets) | 482 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 483 | MAC Address (2 Octets) | Reserved | 484 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 486 Figure 3: VXLAN Encapsulation Sub-TLV 488 V: This bit is set to 1 to indicate that a "valid" VN-ID (Virtual 489 Network Identifier) is present in the encapsulation sub-TLV. 490 Please see Section 8. 492 M: This bit is set to 1 to indicate that a valid MAC Address is 493 present in the encapsulation sub-TLV. 495 R: The remaining bits in the 8-bit flags field are reserved for 496 future use. They MUST always be set to 0 by the originator of 497 the sub-TLV. Intermediate routers MUST propagate them without 498 modification. Any receiving routers MUST ignore these bits upon 499 receipt of the sub-TLV. 501 VN-ID: If the V bit is set, the VN-ID field contains a 3-octet VN- 502 ID value. If the V bit is not set, the VN-ID field MUST be set to 503 zero. 505 MAC Address: If the M bit is set, this field contains a 6-octet 506 Ethernet MAC address. If the M bit is not set, this field MUST be 507 set to all zeroes. 509 When forming the VXLAN encapsulation header: 511 o The values of the V, M, and R bits are NOT copied into the flags 512 field of the VXLAN header. 
The flags field of the VXLAN header is 513 set as per [RFC7348]. 515 o If the M bit is set, the MAC Address is copied into the Inner 516 Destination MAC Address field of the Inner Ethernet Header (see 517 section 5 of [RFC7348]). 519 If the M bit is not set, and the payload being sent through the 520 VXLAN tunnel is an ethernet frame, the Destination MAC Address 521 field of the Inner Ethernet Header is just the Destination MAC 522 Address field of the payload's ethernet header. 524 If the M bit is not set, and the payload being sent through the 525 VXLAN tunnel is an IP or MPLS packet, the Inner Destination MAC 526 address field is set to a configured value; if there is no 527 configured value, the VXLAN tunnel cannot be used. 529 o See Section 8 for how the VNI field of the VXLAN encapsulation 530 header is set. 532 Note that in order to send an IP packet or an MPLS packet through a 533 VXLAN tunnel, the packet must first be encapsulated in an ethernet 534 header, which becomes the "inner ethernet header" described in 535 [RFC7348]. The VXLAN Encapsulation sub-TLV may contain information 536 (e.g., the MAC address) that is used to form this ethernet header. 538 3.2.2. VXLAN-GPE 540 This document defines an encapsulation sub-TLV for VXLAN-GPE tunnels. 541 When the tunnel type is VXLAN-GPE, the following is the structure of 542 the value field in the encapsulation sub-TLV: 544 0 1 2 3 545 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 546 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 547 |Ver|V|R|R|R|R|R| Reserved | 548 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 549 | VN-ID | Reserved | 550 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 552 Figure 4: VXLAN GPE Encapsulation Sub-TLV 554 V: This bit is set to 1 to indicate that a "valid" VN-ID is 555 present in the encapsulation sub-TLV. Please see Section 8. 557 R: The bits designated "R" above are reserved for future use. 
558 They MUST always be set to 0 by the originator of the sub-TLV. 559 Intermediate routers MUST propagate them without modification. 560 Any receiving routers MUST ignore these bits upon receipt of the 561 sub-TLV. 563 Version (Ver): Indicates the VXLAN-GPE protocol version. (See the 564 "Version Bits" section of [I-D.ietf-nvo3-vxlan-gpe].) If the 565 indicated version is not supported, the TLV that contains this 566 Encapsulation sub-TLV MUST be treated as specifying an unsupported 567 tunnel type. The value of this field will be copied into the 568 corresponding field of the VXLAN-GPE encapsulation header. 570 VN-ID: If the V bit is set, this field contains a 3-octet VN-ID 571 value. If the V bit is not set, this field MUST be set to zero. 573 When forming the VXLAN-GPE encapsulation header: 575 o The values of the V and R bits are NOT copied into the flags field 576 of the VXLAN-GPE header. However, the values of the Ver bits are 577 copied into the VXLAN-GPE header. Other bits in the flags field 578 of the VXLAN-GPE header are set as per [I-D.ietf-nvo3-vxlan-gpe]. 580 o See Section 8 for how the VNI field of the VXLAN-GPE 581 encapsulation header is set. 583 3.2.3. NVGRE 585 This document defines an encapsulation sub-TLV for NVGRE tunnels. 
586 When the tunnel type is NVGRE, the following is the structure of the 587 value field in the encapsulation sub-TLV: 589 0 1 2 3 590 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 591 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 592 |V|M|R|R|R|R|R|R| VN-ID (3 Octets) | 593 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 594 | MAC Address (4 Octets) | 595 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 596 | MAC Address (2 Octets) | Reserved | 597 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 599 Figure 5: NVGRE Encapsulation Sub-TLV 601 V: This bit is set to 1 to indicate that a "valid" VN-ID is 602 present in the encapsulation sub-TLV. Please see Section 8. 604 M: This bit is set to 1 to indicate that a valid MAC Address is 605 present in the encapsulation sub-TLV. 607 R: The remaining bits in the 8-bit flags field are reserved for 608 future use. They MUST always be set to 0 by the originator of 609 the sub-TLV. Intermediate routers MUST propagate them without 610 modification. Any receiving routers MUST ignore these bits upon 611 receipt of the sub-TLV. 613 VN-ID: If the V bit is set, the VN-ID field contains a 3-octet VN- 614 ID value. If the V bit is not set, the VN-ID field MUST be set to 615 zero. 617 MAC Address: If the M bit is set, this field contains a 6-octet 618 Ethernet MAC address. If the M bit is not set, this field MUST be 619 set to all zeroes. 621 When forming the NVGRE encapsulation header: 623 o The values of the V, M, and R bits are NOT copied into the flags 624 field of the NVGRE header. The flags field of the NVGRE header is 625 set as per [RFC7637]. 627 o If the M bit is set, the MAC Address is copied into the Inner 628 Destination MAC Address field of the Inner Ethernet Header (see 629 section 3.2 of [RFC7637]). 
631 If the M bit is not set, and the payload being sent through the 632 NVGRE tunnel is an ethernet frame, the Destination MAC Address 633 field of the Inner Ethernet Header is just the Destination MAC 634 Address field of the payload's ethernet header. 636 If the M bit is not set, and the payload being sent through the 637 NVGRE tunnel is an IP or MPLS packet, the Inner Destination MAC 638 address field is set to a configured value; if there is no 639 configured value, the NVGRE tunnel cannot be used. 641 o See Section 8 to see how the VSID (Virtual Subnet Identifier) 642 field of the NVGRE encapsulation header is set. 644 3.2.4. L2TPv3 646 When the tunnel type of the TLV is L2TPv3 over IP, the following is 647 the structure of the value field of the encapsulation sub-TLV: 649 0 1 2 3 650 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 651 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 652 | Session ID (4 octets) | 653 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 654 | | 655 | Cookie (Variable) | 656 | | 657 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 659 Figure 6: L2TPv3 Encapsulation Sub-TLV 661 Session ID: a non-zero 4-octet value locally assigned by the 662 advertising router that serves as a lookup key in the incoming 663 packet's context. 665 Cookie: an optional, variable length (encoded in octets -- 0 to 8 666 octets) value used by L2TPv3 to check the association of a 667 received data message with the session identified by the Session 668 ID. Generation and usage of the cookie value is as specified in 669 [RFC3931]. 671 The length of the cookie is not encoded explicitly, but can be 672 calculated as (sub-TLV length - 4). 674 3.2.5. 
GRE 676 When the tunnel type of the TLV is GRE, the following is the 677 structure of the value field of the encapsulation sub-TLV: 679 0 1 2 3 680 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 681 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 682 | GRE Key (4 octets) | 683 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 685 Figure 7: GRE Encapsulation Sub-TLV 687 GRE Key: 4-octet field [RFC2890] that is generated by the 688 advertising router. The actual method by which the key is 689 obtained is beyond the scope of this document. The key is 690 inserted into the GRE encapsulation header of the payload packets 691 sent by ingress routers to the advertising router. It is intended 692 to be used for identifying extra context information about the 693 received payload. 695 Note that the key is optional. Unless a key value is being 696 advertised, the GRE encapsulation sub-TLV MUST NOT be present. 698 3.2.6. MPLS-in-GRE 700 When the tunnel type is MPLS-in-GRE, the following is the structure 701 of the value field in an optional encapsulation sub-TLV: 703 0 1 2 3 704 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 705 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 706 | GRE-Key (4 Octets) | 707 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 709 Figure 8: MPLS-in-GRE Encapsulation Sub-TLV 711 GRE-Key: 4-octet field [RFC2890] that is generated by the 712 advertising router. The actual method by which the key is 713 obtained is beyond the scope of this document. The key is 714 inserted into the GRE encapsulation header of the payload packets 715 sent by ingress routers to the advertising router. It is intended 716 to be used for identifying extra context information about the 717 received payload. Note that the key is optional. Unless a key 718 value is being advertised, the MPLS-in-GRE encapsulation sub-TLV 719 MUST NOT be present. 
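As an informal illustration (not part of the specification), the L2TPv3 and GRE value fields described above could be decoded as in the following Python sketch. The function names are hypothetical. The only rules taken from the text are the non-zero 4-octet Session ID, the implicit cookie length of (sub-TLV length - 4) with a range of 0 to 8 octets, and the 4-octet GRE key.

```python
import struct
from typing import Tuple


def parse_l2tpv3_value(value: bytes) -> Tuple[int, bytes]:
    """Split an L2TPv3 Encapsulation sub-TLV value into (session_id, cookie)."""
    if len(value) < 4:
        raise ValueError("value too short for a 4-octet Session ID")
    (session_id,) = struct.unpack("!I", value[:4])
    if session_id == 0:
        raise ValueError("Session ID must be non-zero")
    # The cookie length is not encoded; it is (sub-TLV length - 4).
    cookie = value[4:]
    if len(cookie) > 8:
        raise ValueError("cookie longer than 8 octets")
    return session_id, cookie


def parse_gre_key_value(value: bytes) -> int:
    """Return the key from a GRE or MPLS-in-GRE Encapsulation sub-TLV value."""
    if len(value) != 4:
        raise ValueError("GRE key must be exactly 4 octets")
    (key,) = struct.unpack("!I", value)
    return key
```

Since the key is optional, a caller of this sketch would simply not invoke parse_gre_key_value when the Encapsulation sub-TLV is absent.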
721 Note that the GRE tunnel type defined in Section 3.2.5 can be used 722 instead of the MPLS-in-GRE tunnel type when it is necessary to 723 encapsulate MPLS in GRE. Including a TLV of the MPLS-in-GRE tunnel 724 type is equivalent to including a TLV of the GRE tunnel type that 725 also includes a Protocol Type sub-TLV (Section 3.4.1) specifying MPLS 726 as the protocol to be encapsulated. That is, if a TLV specifies 727 MPLS-in-GRE or if it includes a Protocol Type sub-TLV specifying 728 MPLS, the GRE tunnel advertised in that TLV MUST NOT be used for 729 carrying IP packets. 731 While it is not strictly necessary to have both the GRE and MPLS-in-GRE 732 tunnel types, both are included for reasons of backwards 733 compatibility. 735 3.2.7. IP-in-IP 737 When the tunnel type of the TLV is IP-in-IP, it does not have a Virtual 738 Network Identifier. See Section 8.1 for Embedded Label handling on 739 IP-in-IP tunnels. 741 3.3. Outer Encapsulation Sub-TLVs 743 The Encapsulation sub-TLV for a particular tunnel type allows one to 744 specify the values that are to be placed in certain fields of the 745 encapsulation header for that tunnel type. However, some tunnel 746 types require an outer IP encapsulation, and some also require an 747 outer UDP encapsulation. The Encapsulation sub-TLV for a given 748 tunnel type does not usually provide a way to specify values for 749 fields of the outer IP and/or UDP encapsulations. If it is necessary 750 to specify values for fields of the outer encapsulation, additional 751 sub-TLVs must be used. This document defines two such sub-TLVs. 753 If an outer encapsulation sub-TLV occurs in a TLV for a tunnel type 754 that does not use the corresponding outer encapsulation, the sub-TLV 755 is treated as if it were an unknown type of sub-TLV. 757 3.3.1. IPv4 DS Field 759 Most of the tunnel types that can be specified in the Tunnel 760 Encapsulation attribute require an outer IP encapsulation.
The IPv4 761 Differentiated Services (DS) Field sub-TLV can be carried in the TLV 762 of any such tunnel type. It specifies the setting of the one-octet 763 Differentiated Services field in the outer IP encapsulation (see 764 [RFC2474]). The value field is always a single octet. 766 3.3.2. UDP Destination Port 768 Some of the tunnel types that can be specified in the Tunnel 769 Encapsulation attribute require an outer UDP encapsulation. 770 Generally there is a standard UDP Destination Port value for a 771 particular tunnel type. However, sometimes it is useful to be able 772 to use a non-standard UDP destination port. If a particular tunnel 773 type requires an outer UDP encapsulation, and it is desired to use a 774 UDP destination port other than the standard one, the port to be used 775 can be specified by including a UDP Destination Port sub-TLV. The 776 value field of this sub-TLV is always a two-octet field, containing 777 the port value. 779 3.4. Sub-TLVs for Aiding Tunnel Selection 781 3.4.1. Protocol Type Sub-TLV 783 The protocol type sub-TLV MAY be included in a given TLV to indicate 784 the type of the payload packets that may be encapsulated with the 785 tunnel parameters that are being signaled in the TLV. The value 786 field of the sub-TLV contains a 2-octet value from IANA's ethertype 787 registry [Ethertypes]. 789 For example, if we want to use three L2TPv3 sessions, one carrying 790 IPv4 packets, one carrying IPv6 packets, and one carrying MPLS 791 packets, the egress router will include three TLVs of L2TPv3 792 encapsulation type, each specifying a different Session ID and a 793 different payload type. The protocol type sub-TLV for these will be 794 IPv4 (protocol type = 0x0800), IPv6 (protocol type = 0x86dd), and 795 MPLS (protocol type = 0x8847), respectively. This informs the 796 ingress routers of the appropriate encapsulation information to use 797 with each of the given protocol types. 
Insertion of the specified 798 Session ID at the ingress routers allows the egress to process the 799 incoming packets correctly, according to their protocol type. 801 3.4.2. Color Sub-TLV 803 The color sub-TLV MAY be encoded as a way to "color" the 804 corresponding tunnel TLV. The value field of the sub-TLV is eight 805 octets long, and consists of a Color Extended Community, as defined 806 in Section 4.3. For the use of this sub-TLV and Extended Community, 807 please see Section 7. 809 Note that the high-order octet of this sub-TLV's value field MUST be 810 set to 0x03, and the next octet MUST be set to 0x0b. (Otherwise the 811 value field is not identical to a Color Extended Community.) 813 If a Color sub-TLV is not of the proper length, or the first two 814 octets of its value field are not 0x030b, the sub-TLV should be 815 treated as if it were an unrecognized sub-TLV (see Section 11). 817 3.5. Embedded Label Handling Sub-TLV 819 Certain BGP address families (corresponding to particular AFI/SAFI 820 pairs, e.g., 1/4, 2/4, 1/128, 2/128) have MPLS labels embedded in 821 their NLRIs. We will use the term "embedded label" to refer to the 822 MPLS label that is embedded in an NLRI, and the term "labeled address 823 family" to refer to any AFI/SAFI that has embedded labels. 825 Some of the tunnel types (e.g., VXLAN, VXLAN-GPE, and NVGRE) that can 826 be specified in the Tunnel Encapsulation attribute have an 827 encapsulation header containing a "Virtual Network" identifier of some 828 sort. The Encapsulation sub-TLVs for these tunnel types may 829 optionally specify a value for the virtual network identifier. 831 Suppose a Tunnel Encapsulation attribute is attached to an UPDATE of 832 a labeled address family, and it is decided to use a particular 833 tunnel (specified in one of the attribute's TLVs) for transmitting a 834 packet that is being forwarded according to that UPDATE.
When 835 forming the encapsulation header for that packet, different 836 deployment scenarios require different handling of the embedded label 837 and/or the virtual network identifier. The Embedded Label Handling 838 sub-TLV can be used to control the placement of the embedded label 839 and/or the virtual network identifier in the encapsulation. 841 The Embedded Label Handling sub-TLV may be included in any TLV of the 842 Tunnel Encapsulation attribute. If the Tunnel Encapsulation 843 attribute is attached to an UPDATE of a non-labeled address family, 844 the sub-TLV is treated as a no-op. If the sub-TLV is contained in a 845 TLV whose tunnel type does not have a virtual network identifier in 846 its encapsulation header, the sub-TLV is treated as a no-op. In 847 those cases where the sub-TLV is treated as a no-op, it SHOULD NOT be 848 stripped from the TLV before the UPDATE is forwarded. 850 The sub-TLV's Length field always contains the value 1, and its value 851 field consists of a single octet. The following values are defined: 853 1: The payload will be an MPLS packet with the embedded label at the 855 top of its label stack. 857 2: The embedded label is not carried in the payload, but is carried 858 either in the virtual network identifier field of the 859 encapsulation header, or else is ignored entirely. 861 Please see Section 8 for the details of how this sub-TLV is used when 862 it is carried by an UPDATE of a labeled address family. 864 3.6. MPLS Label Stack Sub-TLV 866 This sub-TLV allows an MPLS label stack ([RFC3032]) to be associated 867 with a particular tunnel. 869 The value field of this sub-TLV is a sequence of MPLS label stack 870 entries. The first entry in the sequence is the "topmost" label, the 871 final entry in the sequence is the "bottommost" label. When this 872 label stack is pushed onto a packet, this ordering MUST be preserved. 
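As a non-normative illustration, the value field of the MPLS Label Stack sub-TLV could be decoded as sketched below in Python. The sketch assumes the [RFC3032] label stack entry layout (20-bit label, 3-bit TC, 1-bit S, 8-bit TTL). The names LabelStackEntry, decode_label_stack, and prepare_for_push are hypothetical. The S-bit and TTL adjustments follow the rules given later in this section; the replacement TTL of 255 is an assumption, since the text only requires a non-zero value.

```python
import struct
from typing import List, NamedTuple


class LabelStackEntry(NamedTuple):
    label: int  # 20 bits
    tc: int     # 3 bits
    s: int      # 1 bit (bottom of stack)
    ttl: int    # 8 bits


def decode_label_stack(value: bytes) -> List[LabelStackEntry]:
    """Decode the sub-TLV value; the first entry returned is the topmost label."""
    if len(value) % 4 != 0:
        raise ValueError("value is not a whole number of 4-octet entries")
    entries = []
    for offset in range(0, len(value), 4):
        (word,) = struct.unpack_from("!I", value, offset)
        entries.append(LabelStackEntry(
            label=word >> 12,
            tc=(word >> 9) & 0x7,
            s=(word >> 8) & 0x1,
            ttl=word & 0xFF,
        ))
    return entries


def prepare_for_push(entries: List[LabelStackEntry],
                     packet_has_labels: bool,
                     nonzero_ttl: int = 255) -> List[LabelStackEntry]:
    """Return the entries, in the same order, ready to be pushed on a packet."""
    last = len(entries) - 1
    result = []
    for index, entry in enumerate(entries):
        # The advertised S bit is ignored; only the bottommost entry of an
        # otherwise unlabeled packet gets S=1.
        s = 1 if (index == last and not packet_has_labels) else 0
        # A TTL of zero must be changed to a non-zero value (255 is assumed).
        ttl = entry.ttl if entry.ttl != 0 else nonzero_ttl
        result.append(entry._replace(s=s, ttl=ttl))
    return result
```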
874 Each label stack entry has the following format: 876 0 1 2 3 877 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 878 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 879 | Label | TC |S| TTL | 880 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 882 Figure 9: MPLS Label Stack Sub-TLV 884 If a packet is to be sent through the tunnel identified in a 885 particular TLV, and if that TLV contains an MPLS Label Stack sub-TLV, 886 then the label stack appearing in the sub-TLV MUST be pushed onto the 887 packet. This label stack MUST be pushed onto the packet before any 888 other labels are pushed onto the packet. 890 In particular, if the Tunnel Encapsulation attribute is attached to a 891 BGP UPDATE of a labeled address family, the contents of the MPLS 892 Label Stack sub-TLV MUST be pushed onto the packet before the label 893 embedded in the NLRI is pushed onto the packet. 895 If the MPLS label stack sub-TLV is included in a TLV identifying a 896 tunnel type that uses virtual network identifiers (see Section 8), 897 the contents of the MPLS label stack sub-TLV MUST be pushed onto the 898 packet before the procedures of Section 8 are applied. 900 The number of label stack entries in the sub-TLV MUST be determined 901 from the sub-TLV length field. Thus it is not necessary to set the S 902 bit in any of the label stack entries of the sub-TLV, and the setting 903 of the S bit is ignored when parsing the sub-TLV. When the label 904 stack entries are pushed onto a packet that already has a label 905 stack, the S bits of all the entries MUST be cleared. When the label 906 stack entries are pushed onto a packet that does not already have a 907 label stack, the S bit of the bottommost label stack entry MUST be 908 set, and the S bit of all the other label stack entries MUST be 909 cleared. 911 By default, the TC (Traffic Class) field ([RFC3032], [RFC5462]) of 912 each label stack entry is set to 0. 
This may of course be changed by 913 policy at the originator of the sub-TLV. When pushing the label 914 stack onto a packet, the TC of the label stack entries is preserved 915 by default. However, local policy at the router that is pushing on 916 the stack MAY cause modification of the TC values. 918 By default, the TTL (Time to Live) field of each label stack entry is 919 set to 255. This may be changed by policy at the originator of the 920 sub-TLV. When pushing the label stack onto a packet, the TTL of the 921 label stack entries is preserved by default. However, local policy 922 at the router that is pushing on the stack MAY cause modification of 923 the TTL values. If any label stack entry in the sub-TLV has a TTL 924 value of zero, the router that is pushing the stack on a packet MUST 925 change the value to a non-zero value. 927 Note that this sub-TLV can appear within a TLV identifying any type 928 of tunnel, not just within a TLV identifying an MPLS tunnel. 929 However, if this sub-TLV appears within a TLV identifying an MPLS 930 tunnel (or an MPLS-in-X tunnel), this sub-TLV plays the same role 931 that would be played by an MPLS Encapsulation sub-TLV. Therefore, an 932 MPLS Encapsulation sub-TLV is not defined. 934 3.7. Prefix-SID Sub-TLV 936 [I-D.ietf-idr-bgp-prefix-sid] defines a BGP Path attribute known as 937 the "Prefix-SID Attribute". This attribute is defined to contain a 938 sequence of one or more TLVs, where each TLV is either a "Label- 939 Index" TLV, an "IPv6 SID (Segment Identifier)" TLV, or an "Originator 940 SRGB (Source Routing Global Block)" TLV. 942 In this document, we define a Prefix-SID sub-TLV. The value field of 943 the Prefix-SID sub-TLV can be set to any valid value of the value 944 field of a BGP Prefix-SID attribute, as defined in 945 [I-D.ietf-idr-bgp-prefix-sid]. 947 The Prefix-SID sub-TLV can occur in a TLV identifying any type of 948 tunnel. 
If an Originator SRGB is specified in the sub-TLV, that SRGB 949 MUST be interpreted to be the SRGB used by the tunnel's egress 950 endpoint. The Label-Index, if present, is the Segment Routing SID 951 that the tunnel's egress endpoint uses to represent the prefix 952 appearing in the NLRI field of the BGP UPDATE to which the Tunnel 953 Encapsulation attribute is attached. 955 If a Label-Index is present in the Prefix-SID sub-TLV, then when a 956 packet is sent through the tunnel identified by the TLV, the 957 corresponding MPLS label MUST be pushed on the packet's label stack. 958 The corresponding MPLS label is computed from the Label-Index value 959 and the SRGB of the route's originator. 961 If the Originator SRGB is not present, it is assumed that the 962 originator's SRGB is known by other means. Such "other means" are 963 outside the scope of this document. 965 The corresponding MPLS label is pushed on after the processing of the 966 MPLS Label Stack sub-TLV, if present, as specified in Section 3.6. 967 It is pushed on before any other labels (e.g., a label embedded in 968 an UPDATE's NLRI, or a label determined by the procedures of Section 8) 969 are pushed on the stack. 971 The Prefix-SID sub-TLV has slightly different semantics than the 972 Prefix-SID attribute. When the Prefix-SID attribute is attached to a 973 given route, the BGP speaker that originally attached the attribute 974 is expected to be in the same Segment Routing domain as the BGP 975 speakers who receive the route with the attached attribute. The 976 Label-Index tells the receiving BGP speakers what the prefix-SID is 977 for the advertised prefix in that Segment Routing domain.
When the 978 Prefix-SID sub-TLV is used, the BGP speaker at the head end of the 979 tunnel need not even be in the same Segment Routing domain as the 980 tunnel's egress endpoint, and there is no implication that the 981 prefix-SID for the advertised prefix is the same in the Segment 982 Routing domains of the BGP speaker that originated the sub-TLV and 983 the BGP speaker that received it. 985 4. Extended Communities Related to the Tunnel Encapsulation Attribute 987 4.1. Encapsulation Extended Community 989 The Encapsulation Extended Community is a Transitive Opaque Extended 990 Community. This Extended Community may be attached to a route of any 991 AFI/SAFI to which the Tunnel Encapsulation attribute may be attached. 992 Each such Extended Community identifies a particular tunnel type. If 993 the Encapsulation Extended Community identifies a particular tunnel 994 type, its semantics are exactly equivalent to the semantics of a 995 Tunnel Encapsulation attribute Tunnel TLV for which the following 996 three conditions all hold: 998 1. it identifies the same tunnel type, 999 2. it has a Tunnel Endpoint sub-TLV for which one of the following 1000 two conditions holds: 1002 A. its "Address Family" subfield contains zero, or 1004 B. its "Address" subfield contains the same IP address that 1005 appears in the next hop field of the route to which the 1006 Tunnel Encapsulation attribute is attached 1008 3. it has no other sub-TLVs. 1010 We will refer to such a Tunnel TLV as a "barebones" Tunnel TLV. 1012 The Encapsulation Extended Community was first defined in [RFC5512]. 1013 While it provides only a small subset of the functionality of the 1014 Tunnel Encapsulation attribute, it is used in a number of deployed 1015 applications, and is still needed for backwards compatibility. To 1016 ensure backwards compatibility, this specification establishes the 1017 following rules: 1019 1.
If the Tunnel Encapsulation attribute of a given route contains a 1020 barebones Tunnel TLV identifying a particular tunnel type, an 1021 Encapsulation Extended Community identifying the same tunnel type 1022 SHOULD be attached to the route. 1024 2. If the Encapsulation Extended Community identifying a particular 1025 tunnel type is attached to a given route, the corresponding 1026 barebones Tunnel TLV MAY be omitted from the Tunnel Encapsulation 1027 attribute. 1029 3. Suppose a particular route has both (a) an Encapsulation Extended 1030 Community specifying a particular tunnel type, and (b) a Tunnel 1031 Encapsulation attribute with a barebones Tunnel TLV specifying 1032 that same tunnel type. Both (a) and (b) MUST be interpreted as 1033 denoting the same tunnel. 1035 In short, in situations where one could use either the Encapsulation 1036 Extended Community or a barebones Tunnel TLV, one may use either or 1037 both. However, to ensure backwards compatibility with applications 1038 that do not support the Tunnel Encapsulation attribute, it is 1039 preferable to use the Encapsulation Extended Community. If the 1040 Extended Community (identifying a particular tunnel type) is present, 1041 the corresponding Tunnel TLV is optional. 1043 Note that for tunnel types of the form "X-in-Y", e.g., MPLS-in-GRE, 1044 the Encapsulation Extended Community implies that only packets of the 1045 specified payload type "X" are to be carried through the tunnel of 1046 type "Y". 1048 In the remainder of this specification, when we speak of a route as 1049 containing a Tunnel Encapsulation attribute with a TLV identifying a 1050 particular tunnel type, we are implicitly including the case where 1051 the route contains a Tunnel Encapsulation Extended Community 1052 identifying that tunnel type. 1054 4.2. Router's MAC Extended Community 1056 [I-D.ietf-bess-evpn-inter-subnet-forwarding] defines a Router's MAC 1057 Extended Community. 
This Extended Community provides information 1058 that may conflict with information in one or more of the 1059 Encapsulation Sub-TLVs of a Tunnel Encapsulation attribute. In case 1060 of such a conflict, the information in the Encapsulation Sub-TLV 1061 takes precedence. 1063 4.3. Color Extended Community 1065 The Color Extended Community is a Transitive Opaque Extended 1066 Community with the following encoding: 1068 0 1 2 3 1069 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1070 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1071 | 0x03 | 0x0b | Reserved | 1072 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1073 | Color Value | 1074 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1076 Figure 10: Color Extended Community 1078 For the use of this Extended Community, please see Section 7. 1080 5. Semantics and Usage of the Tunnel Encapsulation attribute 1082 [RFC5512] specifies the use of the Tunnel Encapsulation attribute in 1083 BGP UPDATE messages of AFI/SAFI 1/7 and 2/7. That document restricts 1084 the use of this attribute to UPDATE messages of those SAFIs. This 1085 document removes that restriction. 1087 The BGP Tunnel Encapsulation attribute MAY be carried in any BGP 1088 UPDATE message whose AFI/SAFI is 1/1 (IPv4 Unicast), 2/1 (IPv6 1089 Unicast), 1/4 (IPv4 Labeled Unicast), 2/4 (IPv6 Labeled Unicast), 1090 1/128 (VPN-IPv4 Labeled Unicast), 2/128 (VPN-IPv6 Labeled Unicast), 1091 or 25/70 (Ethernet VPN, usually known as EVPN). Use of the Tunnel 1092 Encapsulation attribute in BGP UPDATE messages of other AFI/SAFIs is 1093 outside the scope of this document. 1095 It has been suggested that it may sometimes be useful to attach a 1096 Tunnel Encapsulation attribute to a BGP UPDATE message that is also 1097 carrying a PMSI (Provider Multicast Service Interface) Tunnel 1098 attribute [RFC6514].
If the PMSI Tunnel attribute specifies an IP 1099 tunnel, the Tunnel Encapsulation attribute could be used to provide 1100 additional information about the IP tunnel. The usage of the Tunnel 1101 Encapsulation attribute in combination with the PMSI Tunnel attribute 1102 is outside the scope of this document. 1104 The decision to attach a Tunnel Encapsulation attribute to a given 1105 BGP UPDATE is determined by policy. The set of TLVs and sub-TLVs 1106 contained in the attribute is also determined by policy. 1108 When the Tunnel Encapsulation attribute is carried in an UPDATE of 1109 one of the AFI/SAFIs specified above, each TLV 1110 MUST have a Tunnel Endpoint sub-TLV. If a TLV does not have a 1111 Tunnel Endpoint sub-TLV, that TLV should be treated as if it had a 1112 malformed Tunnel Endpoint sub-TLV (see Section 3.1). 1114 Suppose that: 1116 o a given packet P must be forwarded by router R; 1118 o the path along which P is to be forwarded is determined by BGP 1119 UPDATE U; 1121 o UPDATE U has a Tunnel Encapsulation attribute, containing at least 1122 one TLV that identifies a "feasible tunnel" for packet P. A 1123 tunnel is considered feasible if it has the following three 1124 properties: 1126 * The tunnel type is supported (i.e., router R knows how to set 1127 up tunnels of that type, how to create the encapsulation header 1128 for tunnels of that type, etc.). 1130 * The tunnel is of a type that can be used to carry packet P 1131 (e.g., an MPLS-in-UDP tunnel would not be a feasible tunnel for 1132 carrying an IP packet, UNLESS the IP packet can first be 1133 converted to an MPLS packet). 1135 * The tunnel is specified in a TLV whose Tunnel Endpoint sub-TLV 1136 identifies an IP address that is reachable. 1138 Then router R MUST send packet P through one of the feasible tunnels 1139 identified in the Tunnel Encapsulation attribute of UPDATE U.
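The three feasibility properties above can be sketched as a simple filter. The following Python fragment is illustrative only: the TunnelTLV class, the string tunnel-type names, and the can_carry and is_reachable callbacks are all hypothetical stand-ins for whatever a real implementation uses to model tunnel TLVs, payload compatibility, and endpoint reachability.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class TunnelTLV:
    tunnel_type: str  # hypothetical symbolic name of the tunnel type
    endpoint: str     # address taken from the Tunnel Endpoint sub-TLV


def feasible_tunnels(tlvs: Iterable[TunnelTLV],
                     supported_types: Set[str],
                     can_carry: Callable[[str], bool],
                     is_reachable: Callable[[str], bool]) -> List[TunnelTLV]:
    """Keep only the tunnels that satisfy all three feasibility properties."""
    return [
        tlv for tlv in tlvs
        if tlv.tunnel_type in supported_types  # router R supports the type
        and can_carry(tlv.tunnel_type)         # the type can carry packet P
        and is_reachable(tlv.endpoint)         # the endpoint is reachable
    ]
```

If the returned list is non-empty, router R picks one entry according to local policy; how that choice is made is not modeled here.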
1141 If the Tunnel Encapsulation attribute contains several TLVs (i.e., if 1142 it specifies several tunnels), router R may choose any one of those 1143 tunnels, based upon local policy. If any tunnel TLV contains one or 1144 more Color sub-TLVs (Section 3.4.2) and/or the Protocol Type sub-TLV 1145 (Section 3.4.1), the choice of tunnel may be influenced by these sub- 1146 TLVs. 1148 If a particular tunnel is not feasible at some moment because its 1149 Tunnel Endpoint cannot be reached at that moment, the tunnel may 1150 become feasible at a later time (when its endpoint becomes 1151 reachable). Router R should take note of this. If router R is 1152 already using a different tunnel, it MAY switch to the tunnel that 1153 just became feasible, or it MAY decide to continue using the tunnel 1154 that it is already using. How this decision is made is outside the 1155 scope of this document. 1157 In addition to the sub-TLVs already defined, additional sub-TLVs may 1158 be defined that affect the choice of tunnel to be used, or that 1159 affect the contents of the tunnel encapsulation header. The 1160 documents that define any such additional sub-TLVs must specify the 1161 effect that including the sub-TLV is to have. 1163 Once it is determined to send a packet through the tunnel specified 1164 in a particular TLV of a particular Tunnel Encapsulation attribute, 1165 then the tunnel's egress endpoint address is the IP address contained 1166 in the Tunnel Endpoint sub-TLV. If the TLV contains a Tunnel Endpoint sub-TLV whose 1167 value field is all zeroes, then the tunnel's egress endpoint is the 1168 IP address specified as the Next Hop of the BGP UPDATE containing the 1169 Tunnel Encapsulation attribute. The address of the tunnel egress 1170 endpoint generally appears in a "destination address" field of the 1171 encapsulation.
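The egress endpoint rule in the preceding paragraph amounts to a small fallback, sketched here in Python. The function name and its parameters are hypothetical; in particular, the sketch assumes the caller has already parsed the address out of the Tunnel Endpoint sub-TLV according to Section 3.1 and passes the raw value field alongside it.

```python
def egress_endpoint(endpoint_value: bytes,
                    endpoint_address: str,
                    bgp_next_hop: str) -> str:
    """Pick the tunnel egress address for a chosen TLV."""
    if not endpoint_value:
        raise ValueError("empty Tunnel Endpoint value")
    # An all-zero value field means "use the Next Hop of the UPDATE".
    if not any(endpoint_value):
        return bgp_next_hop
    return endpoint_address
```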
1173 The full set of procedures for sending a packet through a particular 1174 tunnel type to a particular tunnel egress endpoint depends upon the 1175 tunnel type, and is outside the scope of this document. Note that 1176 some tunnel types may require the execution of an explicit tunnel 1177 setup protocol before they can be used for carrying data. Other 1178 tunnel types may not require any tunnel setup protocol. 1180 Sending a packet through a tunnel always requires that the packet be 1181 encapsulated, with an encapsulation header that is appropriate for 1182 the tunnel type. The contents of the tunnel encapsulation header MAY 1183 be influenced by the Encapsulation sub-TLV. If there is no 1184 Encapsulation sub-TLV present, the router transmitting the packet 1185 through the tunnel must have a priori knowledge (e.g., by 1186 provisioning) of how to fill in the various fields in the 1187 encapsulation header. 1189 Whenever a new Tunnel Type TLV is defined, the specification of that 1190 TLV should describe (or reference) the procedures for creating the 1191 encapsulation header used to forward packets through that tunnel 1192 type. If a tunnel type codepoint is assigned in the IANA "BGP Tunnel 1193 Encapsulation Tunnel Types" registry, but there is no corresponding 1194 specification that defines an Encapsulation sub-TLV for that tunnel 1195 type, the transmitting endpoint of such a tunnel is presumed to know 1196 a priori how to form the encapsulation header for that tunnel type. 1198 If a Tunnel Encapsulation attribute specifies several tunnels, the 1199 way in which a router chooses which one to use is a matter of policy, 1200 subject to the following constraint: if a router can determine that a 1201 given tunnel is not functional, it MUST NOT use that tunnel. 
In 1202 particular, if the tunnel is identified in a TLV that has a Tunnel 1203 Endpoint sub-TLV, and if the IP address specified in the sub-TLV is 1204 not reachable from router R, then the tunnel MUST be considered non- 1205 functional. Other means of determining whether a given tunnel is 1206 functional MAY be used; specification of such means is outside the 1207 scope of this specification. Of course, if a non-functional tunnel 1208 later becomes functional, router R SHOULD reevaluate its choice of 1209 tunnels. 1211 If router R determines that it cannot use any of the tunnels 1212 specified in the Tunnel Encapsulation attribute, it MAY either drop 1213 packet P, or it MAY transmit packet P as it would had the Tunnel 1214 Encapsulation attribute not been present. This is a matter of local 1215 policy. By default, the packet SHOULD be transmitted as if the 1216 Tunnel Encapsulation attribute had not been present. 1218 A Tunnel Encapsulation attribute may contain several TLVs that all 1219 specify the same tunnel type. Each TLV should be considered as 1220 specifying a different tunnel. Two tunnels of the same type may have 1221 different Tunnel Endpoint sub-TLVs, different Encapsulation sub-TLVs, 1222 etc. Choosing between two such tunnels is a matter of local policy. 1224 Once router R has decided to send packet P through a particular 1225 tunnel, it encapsulates packet P appropriately and then forwards it 1226 according to the route that leads to the tunnel's egress endpoint. 1227 This route may itself be a BGP route with a Tunnel Encapsulation 1228 attribute. If so, the encapsulated packet is treated as the payload 1229 and is encapsulated according to the Tunnel Encapsulation attribute 1230 of that route. That is, tunnels may be "stacked". 1232 Notwithstanding anything said in this document, a BGP speaker MAY 1233 have local policy that influences the choice of tunnel, and the way 1234 the encapsulation is formed. 
A BGP speaker MAY also have a local 1235 policy that tells it to ignore the Tunnel Encapsulation attribute 1236 entirely or in part. Of course, interoperability issues must be 1237 considered when such policies are put into place. 1239 6. Routing Considerations 1241 6.1. Impact on BGP Decision Process 1243 The presence of the Tunnel Encapsulation attribute affects the BGP 1244 bestpath selection algorithm. For all the tunnels described in the 1245 Tunnel Encapsulation attribute for a path, if no Tunnel Endpoint 1246 address is feasible, then that path MUST NOT be considered resolvable 1247 for the purposes of Route Resolvability Condition [RFC4271] section 1248 9.1.2.1. 1250 6.2. Looping, Infinite Stacking, Etc. 1252 Consider a packet destined for address X. Suppose a BGP UPDATE for 1253 address prefix X carries a Tunnel Encapsulation attribute that 1254 specifies a tunnel egress endpoint of Y. And suppose that a BGP 1255 UPDATE for address prefix Y carries a Tunnel Encapsulation attribute 1256 that specifies a Tunnel Endpoint of X. It is easy to see that this 1257 will cause an infinite number of encapsulation headers to be put on 1258 the given packet. 1260 This could happen as a result of misconfiguration, either accidental 1261 or intentional. It could also happen if the Tunnel Encapsulation 1262 attribute were altered by a malicious agent. Implementations should 1263 be aware of this. This document does not specify a maximum number of 1264 recursions; that is an implementation-specific matter. 1266 Improper setting (or malicious altering) of the Tunnel Encapsulation 1267 attribute could also cause data packets to loop. Suppose a BGP 1268 UPDATE for address prefix X carries a Tunnel Encapsulation attribute 1269 that specifies a tunnel egress endpoint of Y. Suppose router R 1270 receives and processes the update. When router R receives a packet 1271 destined for X, it will apply the encapsulation and send the 1272 encapsulated packet to Y. 
Y will decapsulate the packet and forward 1273 it further. If Y is further away from X than is router R, it is 1274 possible that the path from Y to X will traverse R. This would cause 1275 a long-lasting routing loop. The control plane itself cannot detect 1276 this situation, though a TTL field in the payload packets would 1277 presumably prevent any given packet from looping infinitely. 1279 These possibilities must also be kept in mind whenever the Tunnel 1280 Endpoint for a given prefix differs from the BGP next hop for that 1281 prefix. 1283 7. Recursive Next Hop Resolution 1285 Suppose that: 1287 o a given packet P must be forwarded by router R1; 1289 o the path along which P is to be forwarded is determined by BGP 1290 UPDATE U1; 1292 o UPDATE U1 does not have a Tunnel Encapsulation attribute; 1294 o the next hop of UPDATE U1 is router R2; 1296 o the best path to router R2 is a BGP route that was advertised in 1297 UPDATE U2; 1299 o UPDATE U2 has a Tunnel Encapsulation attribute. 1301 Then packet P MUST be sent through one of the tunnels identified in 1302 the Tunnel Encapsulation attribute of UPDATE U2. See Section 5 for 1303 further details. 1305 However, suppose that one of the TLVs in U2's Tunnel Encapsulation 1306 attribute contains the Color Sub-TLV. In that case, packet P MUST 1307 NOT be sent through the tunnel identified in that TLV, unless U1 is 1308 carrying the Color Extended Community that is identified in U2's 1309 Color Sub-TLV. 1311 Note that if UPDATE U1 and UPDATE U2 both have Tunnel Encapsulation 1312 attributes, packet P will be carried through a pair of nested 1313 tunnels. P will first be encapsulated based on the Tunnel 1314 Encapsulation attribute of U1. This encapsulated packet then becomes 1315 the payload, and is encapsulated based on the Tunnel Encapsulation 1316 attribute of U2. This is another way of "stacking" tunnels (see also 1317 Section 5). 
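The Color rule of this section can be illustrated with a short Python sketch. It is not normative: ColoredTunnel, parse_color, and usable_tunnels are hypothetical names, and the treatment of a TLV carrying several Color sub-TLVs (any one match is taken to suffice) is an assumption, since the text above only describes the single-color case. The parsing follows the eight-octet layout of Figure 10.

```python
import struct
from dataclasses import dataclass
from typing import AbstractSet, FrozenSet, Iterable, List, Optional


def parse_color(value: bytes) -> Optional[int]:
    """Return the Color Value, or None if the sub-TLV is to be treated as unrecognized."""
    if len(value) != 8 or value[0] != 0x03 or value[1] != 0x0B:
        return None
    (color,) = struct.unpack("!I", value[4:8])
    return color


@dataclass(frozen=True)
class ColoredTunnel:
    name: str                             # hypothetical label for the TLV
    colors: FrozenSet[int] = frozenset()  # colors from its Color sub-TLVs


def usable_tunnels(u2_tlvs: Iterable[ColoredTunnel],
                   u1_colors: AbstractSet[int]) -> List[ColoredTunnel]:
    """Filter U2's tunnels by the Color Extended Communities carried on U1."""
    return [
        tlv for tlv in u2_tlvs
        # A TLV without a Color sub-TLV is unrestricted; a colored TLV is
        # usable only if U1 carries a matching Color Extended Community.
        if not tlv.colors or (tlv.colors & u1_colors)
    ]
```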

   The procedures in this section presuppose that U1's next hop resolves
   to a BGP route, and that U2's next hop resolves (perhaps after
   further recursion) to a non-BGP route.

8. Use of Virtual Network Identifiers and Embedded Labels when Imposing
   a Tunnel Encapsulation

   If the TLV specifying a tunnel contains an MPLS Label Stack sub-TLV,
   then when sending a packet through that tunnel, the procedures of
   Section 3.6 are applied before the procedures of this section.

   If the TLV specifying a tunnel contains a Prefix-SID sub-TLV, the
   procedures of Section 3.7 are applied before the procedures of this
   section. If the TLV also contains an MPLS Label Stack sub-TLV, the
   procedures of Section 3.6 are applied before the procedures of
   Section 3.7.

8.1. Tunnel Types without a Virtual Network Identifier Field

   If a Tunnel Encapsulation attribute is attached to an UPDATE of a
   labeled address family, there will be one or more labels specified in
   the UPDATE's NLRI.

   o  If the TLV contains an Embedded Label Handling sub-TLV whose value
      is 1, the label or labels from the NLRI are pushed on the packet's
      label stack.

   o  If the TLV does not contain an Embedded Label Handling sub-TLV, or
      if it contains an Embedded Label Handling sub-TLV whose value is
      2, the embedded label is ignored completely. The tunnel is
      assumed to have terminated at the corresponding VRF.

   The resulting MPLS packet is then further encapsulated, as specified
   by the TLV.

8.2. Tunnel Types with a Virtual Network Identifier Field

   Three of the tunnel types that can be specified in a Tunnel
   Encapsulation TLV have virtual network identifier fields in their
   encapsulation headers.
   In the VXLAN and VXLAN-GPE encapsulations,
   this field is called the VNI (Virtual Network Identifier) field; in
   the NVGRE encapsulation, this field is called the VSID (Virtual
   Subnet Identifier) field.

   When one of these tunnel encapsulations is imposed on a packet, the
   setting of the virtual network identifier field in the encapsulation
   header depends upon the contents of the Encapsulation sub-TLV (if one
   is present). When the Tunnel Encapsulation attribute is being
   carried on a BGP UPDATE of a labeled address family, the setting of
   the virtual network identifier field also depends upon the contents
   of the Embedded Label Handling sub-TLV (if present).

   This section specifies the procedures for choosing the value to set
   in the virtual network identifier field of the encapsulation header.
   These procedures apply only when the tunnel type is VXLAN, VXLAN-GPE,
   or NVGRE.

8.2.1. Unlabeled Address Families

   This sub-section applies when:

   o  the Tunnel Encapsulation attribute is carried on a BGP UPDATE of
      an unlabeled address family, and

   o  at least one of the attribute's TLVs identifies a tunnel type that
      uses a virtual network identifier, and

   o  it has been determined to send a packet through one of those
      tunnels.

   If the TLV identifying the tunnel contains an Encapsulation sub-TLV
   whose V bit is set, the virtual network identifier field of the
   encapsulation header is set to the value of the virtual network
   identifier field of the Encapsulation sub-TLV.

   Otherwise, the virtual network identifier field of the encapsulation
   header is set to a configured value; if there is no configured value,
   the tunnel cannot be used.

8.2.2. Labeled Address Families

   This sub-section applies when:

   o  the Tunnel Encapsulation attribute is carried on a BGP UPDATE of a
      labeled address family, and

   o  at least one of the attribute's TLVs identifies a tunnel type that
      uses a virtual network identifier, and

   o  it has been determined to send a packet through one of those
      tunnels.

8.2.2.1. When a Valid VNI has been Signaled

   If the TLV identifying the tunnel contains an Encapsulation sub-TLV
   whose V bit is set, the virtual network identifier field of the
   encapsulation header is set as follows:

   o  If the TLV contains an Embedded Label Handling sub-TLV whose value
      is 1, then the virtual network identifier field of the
      encapsulation header is set to the value of the virtual network
      identifier field of the Encapsulation sub-TLV.

      The embedded label (from the NLRI of the route that is carrying
      the Tunnel Encapsulation attribute) appears at the top of the MPLS
      label stack in the encapsulation payload.

   o  If the TLV does not contain an Embedded Label Handling sub-TLV, or
      if it contains an Embedded Label Handling sub-TLV whose value is
      2, the embedded label is ignored entirely, and the virtual network
      identifier field of the encapsulation header is set to the value
      of the virtual network identifier field of the Encapsulation
      sub-TLV.

8.2.2.2. When a Valid VNI has not been Signaled

   If the TLV identifying the tunnel does not contain an Encapsulation
   sub-TLV whose V bit is set, the virtual network identifier field of
   the encapsulation header is set as follows:

   o  If the TLV contains an Embedded Label Handling sub-TLV whose value
      is 1, then the virtual network identifier field of the
      encapsulation header is set to a configured value.

      If there is no configured value, the tunnel cannot be used.

      The embedded label (from the NLRI of the route that is carrying
      the Tunnel Encapsulation attribute) appears at the top of the MPLS
      label stack in the encapsulation payload.

   o  If the TLV does not contain an Embedded Label Handling sub-TLV, or
      if it contains an Embedded Label Handling sub-TLV whose value is
      2, the embedded label is copied into the virtual network
      identifier field of the encapsulation header.

      In this case, the payload may or may not contain an MPLS label
      stack, depending upon other factors. If the payload does contain
      an MPLS label stack, the embedded label does not appear in that
      stack.

9. Applicability Restrictions

   In a given UPDATE of a labeled address family, the label embedded in
   the NLRI is generally a label that is meaningful only to the router
   whose address appears as the next hop. Certain of the procedures of
   Section 8.2.2.1 or Section 8.2.2.2 cause the embedded label to be
   carried by a data packet to the router whose address appears in the
   Tunnel Endpoint sub-TLV. If the Tunnel Endpoint sub-TLV does not
   identify the same router that is the next hop, sending the packet
   through the tunnel may cause the label to be misinterpreted at the
   tunnel's egress endpoint. This may cause misdelivery of the packet.

   Therefore the embedded label MUST NOT be carried by a data packet
   traveling through a tunnel unless it is known that the label will be
   properly interpreted at the tunnel's egress endpoint. How this is
   known is outside the scope of this document.

   Note that if the Tunnel Encapsulation attribute is attached to a
   VPN-IP route [RFC4364], and if Inter-AS "option b" (see section 10 of
   [RFC4364]) is being used, and if the Tunnel Endpoint sub-TLV contains
   an IP address that is not in the same AS as the router receiving the
   route, it is very likely that the embedded label has been changed.
   Therefore use of the Tunnel Encapsulation attribute in an "Inter-AS
   option b" scenario is not supported.

10. Scoping

   The Tunnel Encapsulation attribute is defined as a transitive
   attribute, so that it may be passed along by BGP speakers that do not
   recognize it. However, it is intended that the Tunnel Encapsulation
   attribute be used only within a well-defined scope, e.g., within a
   set of Autonomous Systems that belong to a single administrative
   entity. If the attribute is distributed beyond its intended scope,
   packets may be sent through tunnels in a manner that is not intended.

   To prevent the Tunnel Encapsulation attribute from being distributed
   beyond its intended scope, any BGP speaker that understands the
   attribute MUST be able to filter the attribute from incoming BGP
   UPDATE messages. When the attribute is filtered from an incoming
   UPDATE, the attribute is neither processed nor redistributed. This
   filtering SHOULD be possible on a per-BGP-session basis. For each
   session, filtering of the attribute on incoming UPDATEs MUST be
   enabled by default.

   In addition, any BGP speaker that understands the attribute MUST be
   able to filter the attribute from outgoing BGP UPDATE messages. This
   filtering SHOULD be possible on a per-BGP-session basis. For each
   session, filtering of the attribute on outgoing UPDATEs MUST be
   enabled by default.

11. Error Handling

   The Tunnel Encapsulation attribute is a sequence of TLVs, each of
   which is a sequence of sub-TLVs. The final octet of a TLV is
   determined by its length field. Similarly, the final octet of a
   sub-TLV is determined by its length field. The final octet of a TLV
   MUST also be the final octet of its final sub-TLV. If this is not the
   case, the TLV MUST be considered to be malformed.
   A TLV that is
   found to be malformed for this reason MUST NOT be processed, and MUST
   be stripped from the Tunnel Encapsulation attribute before the
   attribute is propagated. Subsequent TLVs in the Tunnel Encapsulation
   attribute may still be valid, in which case they MUST be processed
   and redistributed normally.

   If a Tunnel Encapsulation attribute does not have any valid TLVs, or
   it does not have the transitive bit set, the "Attribute Discard"
   procedure of [RFC7606] is applied.

   If a Tunnel Encapsulation attribute can be parsed correctly, but
   contains a TLV whose tunnel type is not recognized by a particular
   BGP speaker, that BGP speaker MUST NOT consider the attribute to be
   malformed. Rather, the TLV with the unrecognized tunnel type MUST be
   ignored, and the BGP speaker MUST interpret the attribute as if that
   TLV had not been present. If the route carrying the Tunnel
   Encapsulation attribute is propagated with the attribute, the
   unrecognized TLV MUST remain in the attribute.

   If a TLV of a Tunnel Encapsulation attribute contains a sub-TLV that
   is not recognized by a particular BGP speaker, the BGP speaker MUST
   process that TLV as if the unrecognized sub-TLV had not been present.
   If the route carrying the Tunnel Encapsulation attribute is
   propagated with the attribute, the unrecognized sub-TLV MUST remain
   in the attribute.

   If the type code of a sub-TLV appears as "reserved" in the IANA "BGP
   Tunnel Encapsulation Attribute Sub-TLVs" registry, the sub-TLV MUST
   be treated as an unrecognized sub-TLV.

   In general, if a TLV contains a sub-TLV that is malformed (e.g.,
   contains a length field whose value is not legal for that sub-TLV),
   the sub-TLV should be treated as if it were an unrecognized sub-TLV.
   This document specifies one exception to this rule -- within a Tunnel
   Encapsulation attribute that is carried by a BGP UPDATE whose AFI/
   SAFI is one of those explicitly listed in the second paragraph of
   Section 5, if a TLV contains a malformed Tunnel Endpoint sub-TLV (as
   defined in Section 3.1), the entire TLV MUST be ignored, and MUST be
   removed from the Tunnel Encapsulation attribute before the route
   carrying that attribute is redistributed.

   Within a Tunnel Encapsulation attribute that is carried by a BGP
   UPDATE whose AFI/SAFI is one of those explicitly listed in the second
   paragraph of Section 5, a TLV that does not contain exactly one
   Tunnel Endpoint sub-TLV MUST be treated as if it contained a
   malformed Tunnel Endpoint sub-TLV.

   A TLV identifying a particular tunnel type may contain a sub-TLV that
   is meaningless for that tunnel type. For example, perhaps the TLV
   contains a "UDP Destination Port" sub-TLV, but the identified tunnel
   type does not use UDP encapsulation at all. Sub-TLVs of this sort
   MUST be treated as a no-op. That is, they MUST NOT affect the
   creation of the encapsulation header. However, the sub-TLV MUST NOT
   be considered to be malformed, and MUST NOT be removed from the TLV
   before the route carrying the Tunnel Encapsulation attribute is
   redistributed. (This allows for the possibility that such sub-TLVs
   may be given a meaning, in the context of the specified tunnel type,
   in the future.)

   There is no significance to the order in which the TLVs occur within
   the Tunnel Encapsulation attribute. Multiple TLVs may occur for a
   given tunnel type; each such TLV is regarded as describing a
   different tunnel.
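
   The length-consistency rule at the start of this section can be
   sketched as a small parser. This is a hedged illustration rather
   than a reference implementation. It assumes the TLV layout defined
   earlier in this document (a 2-octet Tunnel Type and a 2-octet Length
   preceding the sub-TLVs) and the sub-TLV length rule of Section 12.4
   (a 1-octet length for types 0-127, a 2-octet length for types
   128-255). The function names are hypothetical. Stopping when a TLV
   overruns the attribute is a choice made for this sketch only.
   Semantic checks, such as Tunnel Endpoint validation and handling of
   unrecognized types, are not modeled.

```python
import struct
from typing import List, Tuple

SubTLV = Tuple[int, bytes]          # (sub-TLV type, sub-TLV value)
TLV = Tuple[int, List[SubTLV]]      # (tunnel type, sub-TLVs)


def parse_sub_tlvs(value: bytes) -> List[SubTLV]:
    """Split the value of one Tunnel TLV into its sub-TLVs.

    Raises ValueError when the final octet of the TLV is not also the
    final octet of its final sub-TLV, i.e. when the TLV is malformed.
    """
    subs: List[SubTLV] = []
    pos = 0
    while pos < len(value):
        sub_type = value[pos]
        pos += 1
        # Types 0-127 use a 1-octet length, types 128-255 a 2-octet length.
        len_size = 1 if sub_type <= 127 else 2
        if pos + len_size > len(value):
            raise ValueError("truncated sub-TLV length field")
        sub_len = int.from_bytes(value[pos:pos + len_size], "big")
        pos += len_size
        if pos + sub_len > len(value):
            raise ValueError("sub-TLV overruns the enclosing TLV")
        subs.append((sub_type, value[pos:pos + sub_len]))
        pos += sub_len
    return subs


def parse_attribute(data: bytes) -> List[TLV]:
    """Return the well-formed TLVs of a Tunnel Encapsulation attribute.

    Malformed TLVs are skipped (not processed, not propagated); later
    TLVs are still examined.  An empty result means the attribute has
    no valid TLVs, in which case "Attribute Discard" applies.
    """
    valid: List[TLV] = []
    pos = 0
    while pos + 4 <= len(data):
        tunnel_type, length = struct.unpack_from("!HH", data, pos)
        pos += 4
        if pos + length > len(data):
            break  # sketch-only choice: stop if a TLV overruns the attribute
        value = data[pos:pos + length]
        pos += length
        try:
            valid.append((tunnel_type, parse_sub_tlvs(value)))
        except ValueError:
            continue  # malformed TLV is dropped; keep parsing the rest
    return valid


# Three TLVs: well-formed, malformed (sub-TLV overruns the TLV), well-formed.
good1 = bytes.fromhex("0008" "0004" "080212b5")   # sub-TLV type 8, 1-octet length
bad = bytes.fromhex("0002" "0003" "080212")       # sub-TLV claims 2 octets, has 1
good2 = bytes.fromhex("0009" "0004" "800001ff")   # sub-TLV type 128, 2-octet length
tlvs = parse_attribute(good1 + bad + good2)
```

   Here the middle TLV is dropped because its only sub-TLV claims more
   octets than the TLV contains, while the TLVs before and after it are
   still returned.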

   The following sub-TLVs defined in this document MUST NOT occur more
   than once in a given Tunnel TLV: Tunnel Endpoint (discussed above),
   Encapsulation, IPv4 DS, UDP Destination Port, Embedded Label
   Handling, MPLS Label Stack, Prefix-SID. If a Tunnel TLV has more
   than one of any of these sub-TLVs, all but the first occurrence of
   each such sub-TLV type MUST be treated as a no-op. However, the
   Tunnel TLV containing them MUST NOT be considered to be malformed,
   and all the sub-TLVs MUST be propagated if the route carrying the
   Tunnel Encapsulation attribute is propagated.

   The following sub-TLVs defined in this document may appear zero or
   more times in a given Tunnel TLV: Protocol Type, Color. Each
   occurrence of such sub-TLVs is meaningful. For example, the Color
   sub-TLV may appear multiple times to assign multiple colors to a
   tunnel.

12. IANA Considerations

12.1. Subsequent Address Family Identifiers

   IANA is requested to modify the "Subsequent Address Family
   Identifiers" registry to indicate that the Encapsulation SAFI is
   deprecated. This document should be the reference.

12.2. BGP Path Attributes

   IANA has previously assigned value 23 from the "BGP Path Attributes"
   Registry to "Tunnel Encapsulation Attribute". IANA is requested to
   add this document as a reference.

12.3. Extended Communities

   IANA has previously assigned values from the "Transitive Opaque
   Extended Community" type Registry to the "Color Extended Community"
   (sub-type 0x0b), and to the "Encapsulation Extended Community"
   (0x030c). IANA is requested to add this document as a reference for
   both assignments.

12.4. BGP Tunnel Encapsulation Attribute Sub-TLVs

   IANA is requested to add the following note to the "BGP Tunnel
   Encapsulation Attribute Sub-TLVs" registry:

      If the Sub-TLV Type is in the range from 0 to 127 inclusive, the
      Sub-TLV Length field contains one octet. If the Sub-TLV Type is
      in the range from 128 to 255 inclusive, the Sub-TLV Length field
      contains two octets.

   IANA is requested to change the registration policy of the "BGP
   Tunnel Encapsulation Attribute Sub-TLVs" registry to the following:

   o  The values 0 and 255 are reserved.

   o  The values in the range 1-63 and 128-191 are to be allocated using
      the "Standards Action" registration procedure.

   o  The values in the range 64-125 and 192-252 are to be allocated
      using the "First Come, First Served" registration procedure.

   o  The values in the range 126-127 and 253-254 are reserved for
      experimental use; IANA shall not allocate values from this range.

   IANA has assigned the following codepoints in the "BGP Tunnel
   Encapsulation Attribute Sub-TLVs" registry:

      6: Remote Endpoint

      IANA is requested to change the name of "Remote Endpoint" to
      "Tunnel Egress Endpoint".

      7: IPv4 DS Field

      8: UDP Destination Port

      9: Embedded Label Handling

      10: MPLS Label Stack

      11: Prefix SID

   IANA has previously assigned codepoints from the "BGP Tunnel
   Encapsulation Attribute Sub-TLVs" registry for "Encapsulation",
   "Protocol Type", and "Color". IANA is requested to add this document
   as a reference.

12.5. Tunnel Types

   IANA is requested to add this document as a reference for tunnel
   types 8 (VXLAN), 9 (NVGRE), 11 (MPLS-in-GRE), and 12 (VXLAN-GPE) in
   the "BGP Tunnel Encapsulation Tunnel Types" registry.

   IANA is requested to add this document as a reference for tunnel
   types 1 (L2TPv3), 2 (GRE), and 7 (IP in IP) in the "BGP Tunnel
   Encapsulation Tunnel Types" registry.

12.6. Flags Field of Vxlan Encapsulation sub-TLV

   IANA is requested to add this document as a reference for creating
   the flags field of the Vxlan Encapsulation sub-TLV registry.

   IANA is requested to add this document as a reference for flag bits V
   and M in the "Flags field of Vxlan Encapsulation sub-TLV" registry.

12.7. Flags Field of Vxlan-GPE Encapsulation sub-TLV

   IANA is requested to add this document as a reference for creating
   the flags field of the Vxlan-GPE Encapsulation sub-TLV registry.

   IANA is requested to add this document as a reference for flag bit V
   in the "Flags field of Vxlan-GPE Encapsulation sub-TLV" registry.

12.8. Flags Field of NVGRE Encapsulation sub-TLV

   IANA is requested to add this document as a reference for creating
   the flags field of the NVGRE Encapsulation sub-TLV registry.

   IANA is requested to add this document as a reference for flag bits V
   and M in the "Flags field of NVGRE Encapsulation sub-TLV" registry.

12.9. Embedded Label Handling sub-TLV

   IANA is requested to add this document as a reference for creating
   the sub-TLV's value field of the Embedded Label Handling sub-TLV
   registry.

   IANA is requested to add this document as a reference for the values
   1 (Payload of MPLS with embedded label) and 2 (no embedded label in
   payload) in the "sub-TLV's value field of the Embedded Label Handling
   sub-TLV" registry.

13. Security Considerations

   The Tunnel Encapsulation attribute can cause traffic to be diverted
   from its normal path, especially when the Tunnel Endpoint sub-TLV is
   used. This can have serious consequences if the attribute is added
   or modified illegitimately, as it enables traffic to be "hijacked".

   The Tunnel Endpoint sub-TLV contains both an IP address and an AS
   number. BGP Origin Validation [RFC6811] can be used to obtain
   assurance that the given IP address belongs to the given AS.
   While
   this provides some protection against misconfiguration, it does not
   prevent a malicious agent from inserting a sub-TLV that will appear
   valid.

   Before sending a packet through the tunnel identified in a particular
   TLV of a Tunnel Encapsulation attribute, it may be advisable to use
   BGP Origin Validation to obtain the following additional assurances:

   o  the origin AS of the route carrying the Tunnel Encapsulation
      attribute is correct;

   o  the origin AS of the route to the IP address specified in the
      Tunnel Endpoint sub-TLV is correct, and is the same AS that is
      specified in the Tunnel Endpoint sub-TLV.

   One then has some level of assurance that the tunneled traffic is
   going to the same destination AS that it would have gone to had the
   Tunnel Encapsulation attribute not been present. However, this may
   not suit all use cases, and in any event is not very strong
   protection against hijacking.

   For these reasons, BGP Origin Validation should not be relied upon
   exclusively, and the filtering procedures of Section 10 should always
   be in place.

   Increased protection can be obtained by using BGPsec [RFC8205] to
   ensure that the route carrying the Tunnel Encapsulation attribute,
   and the routes to the Tunnel Endpoint of each specified tunnel, have
   not been altered illegitimately.

   If BGP Origin Validation is used as specified above, and the tunnel
   specified in a particular TLV of a Tunnel Encapsulation attribute is
   therefore regarded as "suspicious", that tunnel should not be used.

   Other tunnels specified in (other TLVs of) the Tunnel Encapsulation
   attribute may still be used.

14. Acknowledgments

   This document contains text from RFC5512, co-authored by Pradosh
   Mohapatra. The authors of the current document wish to thank Pradosh
   for his contribution.
   RFC5512 itself built upon prior work by Gargi
   Nalawade, Ruchi Kapoor, Dan Tappan, David Ward, Scott Wainner, Simon
   Barber, Lili Wang, and Chris Metz, whom we also thank for their
   contributions.

   The authors wish to thank Lou Berger, Ron Bonica, Martin Djernaes,
   John Drake, Satoru Matsushima, Dhananjaya Rao, John Scudder, Ravi
   Singh, Thomas Morin, Xiaohu Xu, and Zhaohui Zhang for their review,
   comments, and/or helpful discussions.

15. Contributor Addresses

   Below is a list of other contributing authors in alphabetical order:

      Randy Bush
      Internet Initiative Japan
      5147 Crystal Springs
      Bainbridge Island, Washington 98110
      United States

      Email: randy@psg.com

      Robert Raszuk
      Bloomberg LP
      731 Lexington Ave
      New York City, NY 10022
      United States

      Email: robert@raszuk.net

      Eric C. Rosen

16. References

16.1. Normative References

   [I-D.ietf-idr-bgp-prefix-sid]
              Previdi, S., Filsfils, C., Lindem, A., Sreekantiah, A.,
              and H. Gredler, "Segment Routing Prefix SID extensions for
              BGP", draft-ietf-idr-bgp-prefix-sid-27 (work in progress),
              June 2018.

   [I-D.ietf-nvo3-vxlan-gpe]
              Maino, F., Kreeger, L., and U. Elzur, "Generic Protocol
              Extension for VXLAN", draft-ietf-nvo3-vxlan-gpe-07 (work
              in progress), April 2019.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC2784]  Farinacci, D., Li, T., Hanks, S., Meyer, D., and P.
              Traina, "Generic Routing Encapsulation (GRE)", RFC 2784,
              DOI 10.17487/RFC2784, March 2000.

   [RFC2890]  Dommety, G., "Key and Sequence Number Extensions to GRE",
              RFC 2890, DOI 10.17487/RFC2890, September 2000.

   [RFC3032]  Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y.,
              Farinacci, D., Li, T., and A. Conta, "MPLS Label Stack
              Encoding", RFC 3032, DOI 10.17487/RFC3032, January 2001.

   [RFC3931]  Lau, J., Ed., Townsley, M., Ed., and I. Goyret, Ed.,
              "Layer Two Tunneling Protocol - Version 3 (L2TPv3)",
              RFC 3931, DOI 10.17487/RFC3931, March 2005.

   [RFC4023]  Worster, T., Rekhter, Y., and E. Rosen, Ed.,
              "Encapsulating MPLS in IP or Generic Routing Encapsulation
              (GRE)", RFC 4023, DOI 10.17487/RFC4023, March 2005.

   [RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
              Border Gateway Protocol 4 (BGP-4)", RFC 4271,
              DOI 10.17487/RFC4271, January 2006.

   [RFC4760]  Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
              "Multiprotocol Extensions for BGP-4", RFC 4760,
              DOI 10.17487/RFC4760, January 2007.

   [RFC5512]  Mohapatra, P. and E. Rosen, "The BGP Encapsulation
              Subsequent Address Family Identifier (SAFI) and the BGP
              Tunnel Encapsulation Attribute", RFC 5512,
              DOI 10.17487/RFC5512, April 2009.

   [RFC5566]  Berger, L., White, R., and E. Rosen, "BGP IPsec Tunnel
              Encapsulation Attribute", RFC 5566, DOI 10.17487/RFC5566,
              June 2009.

   [RFC7348]  Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
              L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
              eXtensible Local Area Network (VXLAN): A Framework for
              Overlaying Virtualized Layer 2 Networks over Layer 3
              Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014.

   [RFC7510]  Xu, X., Sheth, N., Yong, L., Callon, R., and D. Black,
              "Encapsulating MPLS in UDP", RFC 7510,
              DOI 10.17487/RFC7510, April 2015.

   [RFC7606]  Chen, E., Ed., Scudder, J., Ed., Mohapatra, P., and K.
              Patel, "Revised Error Handling for BGP UPDATE Messages",
              RFC 7606, DOI 10.17487/RFC7606, August 2015.

   [RFC7637]  Garg, P., Ed. and Y. Wang, Ed., "NVGRE: Network
              Virtualization Using Generic Routing Encapsulation",
              RFC 7637, DOI 10.17487/RFC7637, September 2015.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

16.2. Informative References

   [Ethertypes]
              "IANA Ethertype Registry".

   [I-D.ietf-bess-evpn-inter-subnet-forwarding]
              Sajassi, A., Salam, S., Thoria, S., Drake, J., and J.
              Rabadan, "Integrated Routing and Bridging in EVPN",
              draft-ietf-bess-evpn-inter-subnet-forwarding-08 (work in
              progress), March 2019.

   [RFC2474]  Nichols, K., Blake, S., Baker, F., and D. Black,
              "Definition of the Differentiated Services Field (DS
              Field) in the IPv4 and IPv6 Headers", RFC 2474,
              DOI 10.17487/RFC2474, December 1998.

   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
              Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February
              2006.

   [RFC5462]  Andersson, L. and R. Asati, "Multiprotocol Label Switching
              (MPLS) Label Stack Entry: "EXP" Field Renamed to "Traffic
              Class" Field", RFC 5462, DOI 10.17487/RFC5462, February
              2009.

   [RFC6514]  Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP
              Encodings and Procedures for Multicast in MPLS/BGP IP
              VPNs", RFC 6514, DOI 10.17487/RFC6514, February 2012.

   [RFC6811]  Mohapatra, P., Scudder, J., Ward, D., Bush, R., and R.
              Austein, "BGP Prefix Origin Validation", RFC 6811,
              DOI 10.17487/RFC6811, January 2013.

   [RFC8205]  Lepinski, M., Ed. and K. Sriram, Ed., "BGPsec Protocol
              Specification", RFC 8205, DOI 10.17487/RFC8205, September
              2017.

Authors' Addresses

   Keyur Patel
   Arrcus, Inc
   2077 Gateway Pl
   San Jose, CA 95110
   United States

   Email: keyur@arrcus.com

   Gunter Van de Velde
   Nokia
   Copernicuslaan 50
   Antwerpen 2018
   Belgium

   Email: gunter.van_de_velde@nokia.com

   Srihari R. Sangli
   Juniper Networks, Inc
   10 Technology Park Drive
   Westford, Massachusetts 01886
   United States

   Email: ssangli@juniper.net