idnits 2.17.1

draft-ietf-lsvr-l3dl-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (April 23, 2019) is 1830 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Outdated reference: A later version (-18) exists of
     draft-ietf-idr-bgp-ls-segment-routing-ext-12

  == Outdated reference: A later version (-19) exists of
     draft-ietf-idr-bgpls-segment-routing-epe-18

  == Outdated reference: A later version (-29) exists of
     draft-ietf-lsvr-bgp-spf-04

  -- Possible downref: Non-RFC (?) normative reference: ref. 'IANA-PEN'

  -- Possible downref: Non-RFC (?) normative reference: ref. 'IEEE802-2014'

  ** Obsolete normative reference: RFC 5226 (Obsoleted by RFC 8126)

  ** Obsolete normative reference: RFC 7752 (Obsoleted by RFC 9552)

     Summary: 2 errors (**), 0 flaws (~~), 4 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                                            R. Bush
Internet-Draft                                              Arrcus & IIJ
Intended status: Standards Track                              R. Austein
Expires: October 25, 2019                                       K. Patel
                                                                  Arrcus
                                                          April 23, 2019

                     Layer 3 Discovery and Liveness
                        draft-ietf-lsvr-l3dl-00

Abstract

   In Massive Data Centers (MDCs), BGP-SPF and similar routing protocols
   are used to build topology and reachability databases. These
   protocols need to discover IP Layer 3 attributes of links, such as
   logical link IP encapsulation abilities, IP neighbor address
   discovery, and link liveness. The Layer 3 Discovery and Liveness
   protocol specified in this document collects these data, which are
   then disseminated using BGP-SPF and similar protocols.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 25, 2019.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Background
   4.  Top Level Overview
   5.  Inter-Link Protocol Overview
     5.1.  L3DL Ladder Diagram
   6.  Transport Layer
   7.  The Checksum
   8.  TLV PDUs
   9.  Logical Link Endpoint Identifier
   10. HELLO
   11. OPEN
   12. ACK
     12.1.  Retransmission
   13. The Encapsulations
     13.1.  The Encapsulation PDU Skeleton
     13.2.  Prim/Loop Flags
     13.3.  IPv4 Encapsulation
     13.4.  IPv6 Encapsulation
     13.5.  MPLS Label List
     13.6.  MPLS IPv4 Encapsulation
     13.7.  MPLS IPv6 Encapsulation
   14. KEEPALIVE - Layer 2 Liveness
   15. VENDOR - Vendor Extensions
   16. Layers 2.5 and 3 Liveness
   17. The North/South Protocol
     17.1.  Use BGP-LS as Much as Possible
     17.2.  Extensions to BGP-LS
   18. Discussion
     18.1.  HELLO Discussion
     18.2.  HELLO versus KEEPALIVE
   19. VLANs/SVIs/Sub-interfaces
   20. Implementation Considerations
   21. Security Considerations
   22. IANA Considerations
   23. IEEE Considerations
   24. Acknowledgments
   25. References
     25.1.  Normative References
     25.2.  Informative References
   Authors' Addresses

1.  Introduction

   The Massive Data Center (MDC) environment presents unusual problems
   of scale, e.g. O(10,000) devices, while its homogeneity presents
   opportunities for simple approaches. Approaches such as Jupiter
   Rising [JUPITER] use a central controller to deal with scaling, while
   BGP-SPF [I-D.ietf-lsvr-bgp-spf] provides massive scale-out without
   centralization using a tried and tested scalable distributed control
   plane, offering a scalable routing solution in Clos [Clos0][Clos1]
   and similar environments. But BGP-SPF and similar higher level
   device-spanning protocols, e.g. [I-D.malhotra-bess-evpn-lsoe], need
   logical link state and addressing data from the network to build the
   routing topology. They also need prompt but prudent reaction to
   (logical) link failure.

   Layer 3 Discovery and Liveness (L3DL) provides brutally simple
   mechanisms for devices to

   o  Discover unique identities of devices/ports/... on a logical link,

   o  Run Layer 2 keep-alive messages for session continuity,

   o  Discover each other's unique endpoint identification,

   o  Discover mutually supported encapsulations, e.g. IP/MPLS,

   o  Discover Layer 3 IP and/or MPLS addressing of interfaces of the
      encapsulations,

   o  Enable layer 3 link liveness such as BFD, and finally

   o  Present these data, using a very restricted profile of a BGP-LS
      [RFC7752] API, to BGP-SPF which computes the topology and builds
      routing and forwarding tables.

   This protocol may be more widely applicable to a range of routing and
   similar protocols which need layer 3 discovery and characterisation.

2.  Terminology

   Even though it concentrates on the inter-device layer, this document
   relies heavily on routing terminology. The following attempts to
   clarify the use of some possibly confusing terms:

   ASN:       Autonomous System Number [RFC4271], a BGP identifier for
              an originator of Layer 3 routes, particularly BGP
              announcements.
   BGP-LS:    A mechanism by which link-state and TE information can be
              collected from networks and shared with external
              components using the BGP routing protocol. See [RFC7752].
   BGP-SPF:   A hybrid protocol using BGP transport but a Dijkstra SPF
              decision process. See [I-D.ietf-lsvr-bgp-spf].
   Clos:      A hierarchic subset of a crossbar switch topology commonly
              used in data centers.
   Datagram:  The L3DL content of a single Layer 2 frame. A full L3DL
              PDU may be packaged in multiple Datagrams.
   Encapsulation:  Address Family Indicator and Subsequent Address
              Family Indicator (AFI/SAFI). I.e. classes of layer 2.5
              and 3 addresses such as IPv4, IPv6, MPLS, ...
   Frame:     A Layer 2 packet.
   Link or Logical Link:  A logical connection between two logical ports
              on two devices. E.g. two VLANs between the same two ports
              are two links.
   LLEI:      Logical Link Endpoint Identifier, the unique identifier of
              one end of a logical link, see Section 9.
   MAC Address:  48-bit Layer 2 addresses are assumed since they are
              used by all widely deployed Layer 2 network technologies
              of interest, especially Ethernet. See [IEEE.802_2001].
   MDC:       Massive Data Center, commonly thousands of TORs.
   MTU:       Maximum Transmission Unit, the size in octets of the
              largest packet that can be sent on a medium, see [RFC1122]
              1.3.3.
   PDU:       Protocol Data Unit, an L3DL application layer message. A
              PDU may need to be broken into multiple Datagrams to make
              it through MTU or other restrictions.
   RouterID:  A 32-bit identifier unique in the current routing domain,
              see [RFC4271] updated by [RFC6286].
   Session:   An established, via OPEN PDUs, session between two L3DL
              capable link end-points.
   SPF:       Shortest Path First, an algorithm for finding the shortest
              paths between nodes in a graph; AKA Dijkstra's algorithm.
   System Identifier:  An eight octet ISO System Identifier a la
              [RFC1629] System ID.
   TOR:       Top Of Rack switch, aggregates the servers in a rack and
              connects to aggregation layers of the Clos tree, AKA the
              Clos spine.
   ZTP:       Zero Touch Provisioning gives devices initial addresses,
              credentials, etc. on boot/restart.

3.  Background

   L3DL assumes a Clos type datacenter scale and topology, but can
   accommodate richer topologies which contain potential cycles.

   While L3DL is designed for the MDC, there are no inherent reasons it
   could not run on a WAN. The authentication and authorization needed
   to run safely on a WAN need to be considered, and the appropriate
   level of security options chosen.

   L3DL assumes a new IEEE assigned EtherType (TBD).

   The number of addresses of the Encapsulations on a link may be fairly
   large given a TOR with more than 20 servers, each server possibly
   having on the order of a hundred micro-services resulting in an
   inordinate number of addresses. And security will further add to the
   length of PDUs. PDUs with lengths over 10,000 octets are quite
   possible.

4.  Top Level Overview

   o  Devices discover each other on logical links

   o  Logical Link Endpoint Identifiers are exchanged

   o  Layer 2 Liveness Checks may be started

   o  Encapsulation data are exchanged and IP-Level Liveness Checks
      enabled

   o  A BGP-like upper layer protocol is assumed to use these data to
      discover and build a topology database

   +-------------------+   +-------------------+   +-------------------+
   |      Device       |   |      Device       |   |      Device       |
   |                   |   |                   |   |                   |
   |+-----------------+|   |+-----------------+|   |+-----------------+|
   ||                 ||   ||                 ||   ||                 ||
   ||     BGP-SPF     <+---+>     BGP-SPF     <+---+>     BGP-SPF     ||
   ||                 ||   ||                 ||   ||                 ||
   |+--------^--------+|   |+--------^--------+|   |+--------^--------+|
   |         |         |   |         |         |   |         |         |
   |         |         |   |         |         |   |         |         |
   |+--------+--------+|   |+--------+--------+|   |+--------+--------+|
   || Encapsulations  ||   || Encapsulations  ||   || Encapsulations  ||
   ||    Addresses    ||   ||    Addresses    ||   ||    Addresses    ||
   ||   L2 Liveness   ||   ||   L2 Liveness   ||   ||   L2 Liveness   ||
   |+--------^--------+|   |+--------^--------+|   |+--------^--------+|
   |         |         |   |         |         |   |         |         |
   |         |         |   |         |         |   |         |         |
   |+--------v--------+|   |+--------v--------+|   |+--------v--------+|
   ||                 ||   ||                 ||   ||                 ||
   ||Inter-Device PDUs<+---+>Inter-Device PDUs<+---+>Inter-Device PDUs||
   ||                 ||   ||                 ||   ||                 ||
   |+-----------------+|   |+-----------------+|   |+-----------------+|
   +-------------------+   +-------------------+   +-------------------+

   There are two protocols, the inter-device per-link layer 3 discovery
   and the interface to the upper level BGP-like API:

   o  Inter-device PDUs are used to exchange device and logical link
      identities and layer 2.5 and 3 identifiers (not payloads), e.g.
      device IDs, port identities, VLAN IDs, Encapsulations, and IP
      addresses.

   o  A Link Layer to BGP API presents these data up the stack to a BGP
      protocol or another device-spanning upper layer protocol,
      presenting them using the BGP-LS BGP-like data format.

   The upper layer BGP family routing protocols cross all the devices,
   though they are not part of these L3DL protocols.

   To simplify this document, Layer 2 framing is not shown. L3DL is
   about layer 3.

5.  Inter-Link Protocol Overview

   Two devices discover each other and their respective identities by
   sending multicast HELLO PDUs (Section 10). To allow discovery of new
   devices coming up on a multi-link topology, devices send periodic
   HELLOs forever, see Section 18.1.

   Once a new device is recognized, both devices attempt to negotiate
   and establish peering by sending unicast OPEN PDUs (Section 11). In
   an established peering, Encapsulations (Section 13) may be announced
   and modified. When two devices on a link have compatible
   Encapsulations and addresses, i.e. the same AFI/SAFI and the same
   subnet, the link is announced via the BGP-LS API.

5.1.  L3DL Ladder Diagram

   The HELLO (Section 10) is a priming message. It is a small L3DL PDU
   encapsulated in an Ethernet multicast frame with the simple goal of
   discovering the identities of logical link endpoint(s) reachable from
   a Logical Link Endpoint (Section 9).

   The HELLO and OPEN (Section 11) PDUs, which are used to discover and
   exchange detailed Logical Link Endpoint Identifiers (LLEIs), and the
   ACK/ERROR PDU are mandatory. Other PDUs are optional, though at
   least one encapsulation MUST be agreed at some point.

   The following is a ladder-style sketch of the L3DL protocol
   exchanges:

   |            HELLO            | Logical Link Peer discovery
   |---------------------------->|
   |            HELLO            | Mandatory
   |<----------------------------|
   |                             |
   |                             |
   |            OPEN             | MACs, IDs, and Capabilities
   |---------------------------->|
   |            OPEN             | Mandatory
   |<----------------------------|
   |                             |
   |                             |
   |  Interface IPv4 Addresses   | Interface IPv4 Addresses
   |---------------------------->| Optional
   |             ACK             |
   |<----------------------------|
   |                             |
   |  Interface IPv4 Addresses   |
   |<----------------------------|
   |             ACK             |
   |---------------------------->|
   |                             |
   |                             |
   |  Interface IPv6 Addresses   | Interface IPv6 Addresses
   |---------------------------->| Optional
   |             ACK             |
   |<----------------------------|
   |                             |
   |  Interface IPv6 Addresses   |
   |<----------------------------|
   |             ACK             |
   |---------------------------->|
   |                             |
   |                             |
   |   Interface MPLSv4 Labels   | Interface MPLSv4 Labels
   |---------------------------->| Optional
   |             ACK             |
   |<----------------------------|
   |                             |
   |   Interface MPLSv4 Labels   | Interface MPLSv4 Labels
   |<----------------------------| Optional
   |             ACK             |
   |---------------------------->|
   |                             |
   |                             |
   |   Interface MPLSv6 Labels   | Interface MPLSv6 Labels
   |---------------------------->| Optional
   |             ACK             |
   |<----------------------------|
   |                             |
   |   Interface MPLSv6 Labels   | Interface MPLSv6 Labels
   |<----------------------------| Optional
   |             ACK             |
   |---------------------------->|
   |                             |
   |                             |
   |       L3DL KEEPALIVE        | Layer 2 Liveness
   |---------------------------->| Optional
   |       L3DL KEEPALIVE        |
   |<----------------------------|

6.  Transport Layer

   L3DL PDUs are carried by a simple transport layer which allows long
   PDUs to occupy many Ethernet frames. The L3DL data in each frame is
   referred to as a Datagram.

   The L3DL Transport Layer encapsulates each Datagram using a common
   transport header.

   If a PDU does not fit in a single Datagram, it is broken into
   multiple Datagrams and reassembled by the receiver a la [RFC0791].

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Version    |L|Datagram Num.|        Datagram Length        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Checksum                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The fields of the L3DL Transport Header are as follows:

   Version:  Version number of the protocol, currently 0. Values other
      than 0 are treated as errors.

   L:  A bit that is set to one if this Datagram is the last Datagram
      of the PDU. For a PDU which fits in only one Datagram, it is set
      to one. Note that this is the inverse of the marking technique
      used by [RFC0791].

   Datagram Number:  0..127, a monotonically increasing value, modulo
      128 (see [RFC1982]), which starts at 0 for each PDU. Note that
      this does not limit an L3DL PDU to 128 frames.

   Datagram Length:  Total number of octets in the Datagram including
      all payloads and fields.

   Checksum:  A 32-bit hash over the Datagram to detect bit flips, see
      Section 7.

7.  The Checksum

   There is a reason conservative folk use a checksum in UDP. And as
   many operators stretch to jumbo frames (over 1,500 octets), longer
   checksums are the prudent approach.

   For the purpose of computing a checksum, the checksum field itself is
   assumed to be zero.

   The following code describes the suggested algorithm.

   Each octet is substituted through an S-box and accumulated into one
   of four 32-bit unsigned sums. The sums are then combined in a 64-bit
   value, and the high-order 32 bits are folded back into the low-order
   32 bits, twice, to yield the 32-bit checksum.

   <CODE BEGINS>
   #include <stddef.h>
   #include <stdint.h>

   /* The F table from Skipjack, and it would work for the S-Box. */
   static const uint8_t sbox[256] = {
     0xa3,0xd7,0x09,0x83,0xf8,0x48,0xf6,0xf4,0xb3,0x21,0x15,0x78,
     0x99,0xb1,0xaf,0xf9,0xe7,0x2d,0x4d,0x8a,0xce,0x4c,0xca,0x2e,
     0x52,0x95,0xd9,0x1e,0x4e,0x38,0x44,0x28,0x0a,0xdf,0x02,0xa0,
     0x17,0xf1,0x60,0x68,0x12,0xb7,0x7a,0xc3,0xe9,0xfa,0x3d,0x53,
     0x96,0x84,0x6b,0xba,0xf2,0x63,0x9a,0x19,0x7c,0xae,0xe5,0xf5,
     0xf7,0x16,0x6a,0xa2,0x39,0xb6,0x7b,0x0f,0xc1,0x93,0x81,0x1b,
     0xee,0xb4,0x1a,0xea,0xd0,0x91,0x2f,0xb8,0x55,0xb9,0xda,0x85,
     0x3f,0x41,0xbf,0xe0,0x5a,0x58,0x80,0x5f,0x66,0x0b,0xd8,0x90,
     0x35,0xd5,0xc0,0xa7,0x33,0x06,0x65,0x69,0x45,0x00,0x94,0x56,
     0x6d,0x98,0x9b,0x76,0x97,0xfc,0xb2,0xc2,0xb0,0xfe,0xdb,0x20,
     0xe1,0xeb,0xd6,0xe4,0xdd,0x47,0x4a,0x1d,0x42,0xed,0x9e,0x6e,
     0x49,0x3c,0xcd,0x43,0x27,0xd2,0x07,0xd4,0xde,0xc7,0x67,0x18,
     0x89,0xcb,0x30,0x1f,0x8d,0xc6,0x8f,0xaa,0xc8,0x74,0xdc,0xc9,
     0x5d,0x5c,0x31,0xa4,0x70,0x88,0x61,0x2c,0x9f,0x0d,0x2b,0x87,
     0x50,0x82,0x54,0x64,0x26,0x7d,0x03,0x40,0x34,0x4b,0x1c,0x73,
     0xd1,0xc4,0xfd,0x3b,0xcc,0xfb,0x7f,0xab,0xe6,0x3e,0x5b,0xa5,
     0xad,0x04,0x23,0x9c,0x14,0x51,0x22,0xf0,0x29,0x79,0x71,0x7e,
     0xff,0x8c,0x0e,0xe2,0x0c,0xef,0xbc,0x72,0x75,0x6f,0x37,0xa1,
     0xec,0xd3,0x8e,0x62,0x8b,0x86,0x10,0xe8,0x08,0x77,0x11,0xbe,
     0x92,0x4f,0x24,0xc5,0x32,0x36,0x9d,0xcf,0xf3,0xa6,0xbb,0xac,
     0x5e,0x6c,0xa9,0x13,0x57,0x25,0xb5,0xe3,0xbd,0xa8,0x3a,0x01,
     0x05,0x59,0x2a,0x46
   };

   /* non-normative example C code, constant time even */

   uint32_t sbox_checksum_32(const uint8_t *b, const size_t n)
   {
     uint32_t sum[4] = {0, 0, 0, 0};
     uint64_t result = 0;
     /* substitute each octet and add it to one of four running sums */
     for (size_t i = 0; i < n; i++)
       sum[i & 3] += sbox[*b++];
     /* combine the four sums, each offset by one octet */
     for (size_t i = 0; i < sizeof(sum)/sizeof(*sum); i++)
       result = (result << 8) + sum[i];
     /* fold the high-order 32 bits into the low-order 32 bits, twice */
     result = (result >> 32) + (result & 0xFFFFFFFF);
     result = (result >> 32) + (result & 0xFFFFFFFF);
     return (uint32_t) result;
   }
   <CODE ENDS>

8.  TLV PDUs

   The basic L3DL application layer PDU is a typical TLV (Type Length
   Value) PDU. It includes a signature to provide optional integrity
   and authentication. It may be broken into multiple Datagrams, see
   Section 6.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |        Payload Length         |               ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               ~
   ~                          Payload ...                          ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Sig Type    |       Signature Length        |               ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               ~
   ~                           Signature                           ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The fields of the basic L3DL header are as follows:

   Type:  An integer differentiating PDU payload types

      0     - HELLO
      1     - OPEN
      2     - KEEPALIVE
      3     - ACK
      4     - IPv4 Announcement
      5     - IPv6 Announcement
      6     - MPLS IPv4 Announcement
      7     - MPLS IPv6 Announcement
      8-254 - Reserved
      255   - VENDOR

   Payload Length:  Total number of octets in the Payload field.

   Payload:  The application layer content of the L3DL PDU.

   Sig Type:  The type of the Signature. Type 0, a null signature, is
      defined in this document.

      Sig Type 0 indicates a null Signature. For very short PDUs, the
      underlying Datagram checksums may be sufficient for integrity, if
      not for authentication.

      Other Sig Types may be defined in other documents.

   Signature Length:  The length of the Signature, possibly including
      padding, in octets. If Sig Type is 0, Signature Length must be 0.

   Signature:  The result of running the signature algorithm specified
      in Sig Type over all octets of the PDU except for the Signature
      itself.

9.  Logical Link Endpoint Identifier

   L3DL discovers neighbors on logical links and establishes sessions
   between the two ends of all consenting discovered logical links. A
   logical link is described by a pair of Logical Link Endpoint
   Identifiers, LLEIs.

   An LLEI is a variable length descriptor which could be an ASN, a
   classic RouterID, a catenation of the two, an eight octet ISO System
   Identifier [RFC1629], or any other identifier unique to a single
   logical link endpoint in the topology.

   An L3DL deployment will choose and define an LLEI which suits its
   needs, simple or complex. Two extremes are as follows:

   A simplistic view of a link between two devices is two ports,
   identified by unique MAC addresses, carrying a layer 3 protocol
   conversation. In this case, the MAC addresses might suffice for the
   LLEIs.

   Unfortunately, things can get more complex. Multiple VLANs can run
   between those two MAC addresses. In practice, many real devices use
   the same MAC address on multiple ports and/or sub-interfaces.

   Therefore, in the general circumstance, a fully described LLEI might
   be as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                       System Identifier                       +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            ifIndex                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   System Identifier, a la [RFC1629], is an eight octet identifier
   unique in the entire operational space. Routers and switches usually
   have internal MAC Addresses which can be padded with high order zeros
   and used if no System ID exists on the device. If no unique
   identifier is burned into a device, the local L3DL configuration
   SHOULD create and assign a unique one.

   ifIndex is the SNMP identifier of the (sub-)interface, see [RFC1213].
   This uniquely identifies the port.

   For a layer 3 tagged sub-interface or a VLAN/SVI interface, ifIndex
   is that of the logical sub-interface, so no further disambiguation is
   needed.

   L3DL PDUs learned over VLAN-ports may be interpreted by upper layer-3
   routing protocols as being learned on the corresponding layer-3 SVI
   interface for the VLAN.

10.  HELLO

   The HELLO PDU is unique in that it is encapsulated in a multicast
   Ethernet frame. It solicits response(s) from other LLEI(s) on the
   link. See Section 18.1 for why multicast is used. The destination
   multicast MAC Address to be used MUST be one of the following, see
   Clause 9.2.2 of [IEEE802-2014]:

   01-80-C2-00-00-0E:  Nearest Bridge = Propagation constrained to a
      single physical link; stopped by all types of bridges (including
      MPRs (media converters)).
   01-80-C2-00-00-03:  Nearest non-TPMR Bridge = Propagation constrained
      by all bridges other than TPMRs; intended for use within provider
      bridged networks.

   All other L3DL PDUs are encapsulated in unicast frames, as the peer's
   destination MAC address is known after the HELLO exchange.

   When an interface is turned up on a device, it SHOULD issue a HELLO
   periodically. The interval is set by configuration with a default of
   60 seconds.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Type = 0    |      Payload Length = 0       |  Sig Type = 0 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Signature Length = 0      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   If more than one device responds, one adjacency is formed for each
   unique source LLEI response. L3DL treats each adjacency as a
   separate logical link.

   When a HELLO is received from a source LLEI with which there is no
   established L3DL adjacency, the receiver SHOULD respond with an OPEN
   PDU. The two devices establish an L3DL adjacency by exchanging OPEN
   PDUs.

   The Payload Length is zero as there is no payload.

   HELLO PDUs cannot be signed as keying material has yet to be
   exchanged. Hence the signature MUST always be the null type.

11.  OPEN

   Each device has learned the other's MAC Address from the HELLO
   exchange, see Section 10. Therefore the OPEN and subsequent PDUs are
   unicast, as opposed to the HELLO's multicast frame.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Type = 1    |        Payload Length         |               ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Nonce                     |  LLEI Length  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~                                                               ~
   ~                            My LLEI                            ~
   ~                                                               ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   AttrCount   |              Attribute List ...               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Auth Type   |          Key Length           |               ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               ~
   ~                            Key ...                            ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Sig Type    |       Signature Length        |               ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               ~
   ~                         Signature ...                         ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The Payload Length is the number of octets in all fields of the PDU
   from the Nonce through the Key, not including the signature fields.

   The Nonce enables detection of a duplicate OPEN PDU. It SHOULD be
   either a random number or the time of day. It is needed to prevent
   session closure due to a repeated OPEN caused by a race or a dropped
   or delayed ACK.
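
   The role of the Nonce can be illustrated with a small,
   non-normative C sketch. It classifies an incoming OPEN from a given
   LLEI as a first OPEN, a repeat of the one already seen (same
   Nonce), or an OPEN with a new Nonce, which later text in this
   section treats as a reset of the peer. The structure, the
   enumeration, and the function name are hypothetical, and the Nonce
   is assumed to fit in 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What a receiver might conclude from an incoming OPEN. */
enum open_action {
    OPEN_NEW_SESSION,   /* first OPEN seen from this LLEI             */
    OPEN_DUPLICATE,     /* same Nonce again: a resend, keep session   */
    OPEN_PEER_RESET     /* new Nonce on a known LLEI: peer restarted  */
};

/* Hypothetical per-adjacency state, one instance per peer LLEI. */
struct l3dl_peer {
    int      established;   /* nonzero once an OPEN has been accepted */
    uint32_t last_nonce;    /* Nonce carried by that OPEN (assumed 32 bits) */
};

static enum open_action l3dl_classify_open(struct l3dl_peer *p,
                                           uint32_t nonce)
{
    assert(p != NULL);
    if (!p->established) {
        p->established = 1;
        p->last_nonce = nonce;
        return OPEN_NEW_SESSION;
    }
    if (nonce == p->last_nonce)
        return OPEN_DUPLICATE;      /* ACK it again, do not close */
    p->last_nonce = nonce;
    return OPEN_PEER_RESET;         /* withdraw old data, re-OPEN */
}
```

   In this sketch a duplicate leaves the stored state untouched, which
   is the point of the Nonce: a resent OPEN after a lost ACK does not
   tear the session down.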

   My LLEI is the sender's LLEI, see Section 9. LLEIs are big-endian.

   AttrCount is the number of attributes in the Attribute List.
   Attributes are single octets whose semantics are user-defined.

   A node may have zero or more user-defined attributes, e.g. spine,
   leaf, backbone, route reflector, arabica, ...

   Attribute syntax and semantics are local to an operator or
   datacenter; hence there is no global registry. Nodes exchange their
   attributes only in the OPEN PDU.

   Auth Type is the Signature algorithm suite, see Section 8.

   Key Length is a 16-bit field denoting the length in octets of the
   Key, not including the Auth Type or the Key Length fields. If there
   is no Key, the Auth Type and Key Length MUST both be zero.

   The Key is specific to the operational environment. A failure to
   authenticate is a failure to start the L3DL session; an ERROR PDU is
   sent (Error Code 2), and HELLOs MUST be restarted.

   The Signature fields are described in Section 8 and in an asymmetric
   key environment serve as a proof of possession of the signing auth
   data by the sender.

   Once two logical link endpoints know each other, and have ACKed each
   other's OPEN PDUs, Layer 2 KEEPALIVEs (see Section 14) MAY be started
   to ensure Layer 2 liveness and keep the session semantics alive. The
   timing and acceptable drop of KEEPALIVE PDUs are discussed in
   Section 14.

   If a sender of OPEN does not receive an ACK of the OPEN PDU Type,
   then it MUST resend the same OPEN PDU, with the same Nonce.

   Resending an unacknowledged OPEN PDU, like other ACKed PDUs, SHOULD
   use exponential back-off, see [RFC1122].

   If a properly authenticated OPEN arrives with a new Nonce from an
   LLEI with which the receiving logical link endpoint believes it
   already has an L3DL session (OPENs have already been exchanged), the
   receiver MUST assume that the sending LLEI or entire device has been
   reset. All discovered encapsulation data SHOULD be withdrawn via the
   BGP-LS API and the recipient MUST respond with a new OPEN. In this
   circumstance encapsulations SHOULD NOT be kept because, while the new
   OPEN is likely to be followed by new encapsulation PDUs of the same
   data, the old session might have an encapsulation type not in the new
   session.

12.  ACK

   The ACK PDU acknowledges receipt of a PDU and reports any error
   condition which might have been raised.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Type = 3    |      Payload Length = 5       |   PDU Type    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | EType |      Error Code       |          Error Hint           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Sig Type    |       Signature Length        |               ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               ~
   ~                         Signature ...                         ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The ACK acknowledges receipt of an OPEN, Encapsulation, VENDOR PDU,
   etc.

   The PDU Type is the Type of the PDU being acknowledged, e.g., OPEN or
   one of the Encapsulations.

   If there was an error processing the received PDU, then the EType is
   non-zero. If the EType is zero, Error Code and Error Hint MUST also
   be zero.

   A non-zero EType is the receiver's way of telling the PDU's sender
   that the receiver had problems processing the PDU. The Error Code
   and Error Hint will tell the sender more detail about the error.
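
   A non-normative C sketch of packing the five payload octets of an
   ACK follows. It assumes big-endian packing and, going by the field
   widths drawn in the ACK diagram, a 4-bit EType followed by a 12-bit
   Error Code. The function name and the use of assertions to enforce
   the zero-EType rule are choices made for this illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PDU Type (1) + EType/Error Code (2) + Error Hint (2) = 5 octets */
#define L3DL_ACK_PAYLOAD_LEN 5u

/* Hypothetical encoder for the ACK payload.
 * Assumed layout: EType in the high 4 bits of the second octet,
 * Error Code in the following 12 bits, Error Hint in the last 16. */
static void l3dl_ack_payload(uint8_t out[L3DL_ACK_PAYLOAD_LEN],
                             uint8_t pdu_type, unsigned etype,
                             unsigned code, uint16_t hint)
{
    assert(out != NULL);
    assert(etype <= 0xFu && code <= 0xFFFu);
    /* If EType is zero, Error Code and Error Hint MUST also be zero. */
    assert(etype != 0 || (code == 0 && hint == 0));

    out[0] = pdu_type;                              /* PDU being ACKed */
    out[1] = (uint8_t)((etype << 4) | (code >> 8)); /* EType + code hi */
    out[2] = (uint8_t)(code & 0xFFu);               /* code low octet  */
    out[3] = (uint8_t)(hint >> 8);                  /* Error Hint      */
    out[4] = (uint8_t)(hint & 0xFFu);
}
```

   A plain acknowledgment of an OPEN would then be
   l3dl_ack_payload(buf, 1, 0, 0, 0), which yields the PDU Type octet
   followed by four zero octets.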
   The decimal value of EType gives a strong hint as to how the receiver
   sending the ACK believes things should proceed:

      0    - No Error, Error Code and Error Hint MUST be zero
      1    - Warning, something not too serious happened, continue
      2    - Session should not be continued, try to restart
      3    - Restart is hopeless, call the operator
      4-15 - Reserved

   Someone stuck in the 1990s might think of the error codes as 0x1zzz,
   0x2zzz, etc.  They might be right.  Or not.

   The Error Code indicates the type of error.

   The Error Hint is any additional data the sender of the error PDU
   thinks will help the recipient or the debugger with the particular
   error.

   The Signature fields are described in Section 8.

12.1.  Retransmission

   If a PDU sender expects an ACK, e.g. for an OPEN, an Encapsulation, a
   VENDOR PDU, etc., and does not receive the ACK for a configurable
   time (default one second), and the interface is live at layer 2, the
   sender resends the PDU using exponential back-off, see [RFC1122].
   This cycle MAY be repeated a configurable number of times (default
   three) before it is considered a failure.  The session MAY be
   considered closed in case of this ACK failure.

   If the link is broken at layer 2, retransmission MAY be retried when
   the link comes back up if data have not changed in the interim.

13.  The Encapsulations

   Once the devices know each other's LLEIs, know each other's upper
   layer identities, have means to ensure link state, etc., the L3DL
   session is considered established, and the devices SHOULD exchange L3
   interface encapsulations, L3 addresses, and L2.5 labels.

   The Encapsulation types the peers exchange may be IPv4 Announcement
   (Section 13.3), IPv6 Announcement (Section 13.4), MPLS IPv4
   Announcement (Section 13.6), MPLS IPv6 Announcement (Section 13.7),
   and/or possibly others not defined here.

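   Before any Encapsulation PDU is exchanged, it may help to see how the
   retransmission rule of Section 12.1 could be sketched, since it
   applies to every ACKed PDU including the Encapsulations.  The
   doubling factor is an assumption (the text only asks for exponential
   back-off), the layer 2 liveness check is left out, and send and
   wait_for_ack are hypothetical callables supplied by the caller.

```python
def send_with_retransmit(send, wait_for_ack, initial=1.0, retries=3,
                         factor=2.0):
    """Send a PDU and resend it until it is ACKed or retries run out.

    send()                transmits the PDU once (hypothetical callable).
    wait_for_ack(timeout) returns True if the ACK arrived within
                          `timeout` seconds (hypothetical callable).
    Returns True on ACK, False when the exchange is considered failed.
    """
    timeout = initial
    for _ in range(retries + 1):   # first transmission plus the resends
        send()
        if wait_for_ack(timeout):
            return True
        timeout *= factor          # assumed back-off: double each time
    return False
```

   With the defaults, the waits would be 1, 2, 4, and 8 seconds; a
   caller that gets False back may then treat the session as closed.
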
   The sender of an Encapsulation PDU MUST NOT assume that the peer is
   capable of the same Encapsulation Type.  An ACK (Section 12) merely
   acknowledges receipt.  Only if both peers have sent the same
   Encapsulation Type is it safe to assume that they are compatible for
   that type.

   A receiver of an encapsulation might recognize an addressing
   conflict, such as both ends of the link trying to use the same
   address.  In this case, the receiver SHOULD respond with an ERROR
   (Error Code 1) instead of an ACK.  As there may be other usable
   addresses or encapsulations, the conflict might simply be logged and
   processing continued, letting an upper layer topology builder deal
   with what works.

   Further, to consider a logical link of a type to formally be
   established so that it may be pushed up to upper layer protocols, the
   addressing for the type must be compatible, e.g. on the same IP
   subnet.

13.1.  The Encapsulation PDU Skeleton

   The header for all encapsulation PDUs is as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |        Payload Length         |     Count     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      ...      |             Encapsulation List...             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Sig Type    |        Signature Length       |               ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               ~
   ~                          Signature ...                        ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The 16-bit Count is the number of Encapsulations in the Encapsulation
   List.

   An Encapsulation PDU describes zero or more addresses of the
   encapsulation type.

   An Encapsulation PDU of Type T replaces all previous encapsulations
   of Type T.

   To remove all encapsulations of Type T, the sender uses a Count of
   zero.

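   The replace-all and remove-all rules above can be sketched as a small
   per-peer table keyed by PDU Type.  This is only an illustration under
   assumed names (PeerEncapsulations, apply, get); the document does not
   prescribe any particular data structure, and the addresses are shown
   as opaque strings.

```python
class PeerEncapsulations:
    """Encapsulations learned from one peer, keyed by PDU Type."""

    def __init__(self):
        self._by_type = {}

    def apply(self, pdu_type, encapsulations):
        """Apply a received Encapsulation PDU of the given Type.

        The new list replaces everything learned earlier for that Type;
        an empty list (Count of zero) removes the Type entirely.
        """
        if encapsulations:
            self._by_type[pdu_type] = list(encapsulations)
        else:
            self._by_type.pop(pdu_type, None)

    def get(self, pdu_type):
        """Return a copy of the current list for the Type (maybe empty)."""
        return list(self._by_type.get(pdu_type, []))
```

   A receiver would call apply() once per accepted Encapsulation PDU;
   other Types held for the same peer are left untouched.
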
   If an LLEI has multiple addresses for an encapsulation type, one and
   only one address SHOULD be configured to be marked as primary, see
   Section 13.2.

   Loopback addresses are generally not seen directly on an external
   interface.  One or more loopback addresses MAY be exposed by
   configuration on one or more L3DL speaking external interfaces, e.g.
   for iBGP peering.  They SHOULD be marked as such, see Section 13.2.

   If there is exactly one non-loopback address for an encapsulation
   type on an interface, it SHOULD be marked as primary.

   If a sender has multiple links on the same interface, separate data,
   ACKs, etc. must be kept for each peer.

   Over time, multiple Encapsulation PDUs may be sent for an interface
   as configuration changes.

   If the length of an Encapsulation PDU exceeds the Datagram size limit
   on media, the PDU is broken into multiple Datagrams.  See Section 8.

   The Signature fields are described in Section 8.

   The Receiver MUST acknowledge the Encapsulation PDU with a Type=3
   ACK PDU whose PDU Type is that of the encapsulation being announced,
   see Section 12.

   If the Sender does not receive an ACK in a configurable interval
   (default one second), and the interface is live at layer 2, it
   SHOULD retransmit.  After a user configurable number of failures, the
   L3DL session should be considered dead and the OPEN process SHOULD be
   restarted.

   If the link is broken at layer 2, retransmission MAY be retried if
   data have not changed in the interim.

13.2.  Prim/Loop Flags

           0               1               2            3 ... 7
   +---------------+---------------+---------------+---------------+
   |   Primary     |   Loopback    | Reserved ...  |               |
   +---------------+---------------+---------------+---------------+

   Each Encapsulation interface address MAY be marked as a primary
   address, and/or a loopback, in which case the respective bit is set
   to one.

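   A minimal sketch of the PrimLoop Flags octet follows.  It assumes
   that bit 0 is the most significant bit of the octet, as in the packet
   diagrams of this document, so Primary is 0x80 and Loopback is 0x40;
   that bit ordering and the helper names are assumptions, not something
   this section states outright.

```python
PRIMARY = 0x80    # bit 0, assuming bit 0 is the most significant bit
LOOPBACK = 0x40   # bit 1, under the same assumption


def encode_flags(primary=False, loopback=False):
    """Return the flags octet; reserved bits are left as zero."""
    return (PRIMARY if primary else 0) | (LOOPBACK if loopback else 0)


def decode_flags(octet):
    """Return (primary, loopback) as booleans, ignoring reserved bits."""
    return bool(octet & PRIMARY), bool(octet & LOOPBACK)
```

   An address that is both a loopback and the primary address would
   carry both bits, i.e. encode_flags(primary=True, loopback=True).
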
   Only one address MAY be marked as primary for an encapsulation type.

13.3.  IPv4 Encapsulation

   The IPv4 Encapsulation describes a device's ability to exchange IPv4
   packets on one or more subnets.  It does so by stating the
   interface's addresses and the corresponding prefix lengths.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Type = 4   |        Payload Length         |     Count     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      ...      | PrimLoop Flags|          IPv4 Address         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              ...              |   PrefixLen   |    more ...   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Sig Type    |        Signature Length       |               ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               ~
   ~                          Signature ...                        ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The 16-bit Count is the number of IPv4 Encapsulations.

13.4.  IPv6 Encapsulation

   The IPv6 Encapsulation describes a logical link's ability to exchange
   IPv6 packets on one or more subnets.  It does so by stating the
   interface's addresses and the corresponding prefix lengths.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Type = 5   |        Payload Length         |     Count     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      ...      | PrimLoop Flags|                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
   |                                                               |
   +                                                               +
   |                                                               |
   +                                                               +
   |                          IPv6 Address                         |
   +                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |   PrefixLen   |    more ...   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Sig Type    |        Signature Length       |               ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               ~
   ~                          Signature ...                        ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The 16-bit Count is the number of IPv6 Encapsulations.

13.5.  MPLS Label List

   As an MPLS enabled interface may have a label stack, see [RFC3032], a
   variable length list of labels is needed.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Label Count  |                 Label                 | Exp |S|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Label                 | Exp |S|   more ...    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   A Label Count of zero is an implicit withdrawal of all labels for
   that prefix on that interface.

13.6.  MPLS IPv4 Encapsulation

   The MPLS IPv4 Encapsulation describes a logical link's ability to
   exchange labeled IPv4 packets on one or more subnets.  It does so by
   stating the interface's addresses, the corresponding prefix lengths,
   and the corresponding labels.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Type = 6   |        Payload Length         |     Count     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      ...      | PrimLoop Flags|      MPLS Label List ...      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              ...              |          IPv4 Address         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              ...              |   PrefixLen   |    more ...   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Sig Type    |        Signature Length       |               ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               ~
   ~                          Signature ...                        ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The 16-bit Count is the number of MPLS IPv4 Encapsulations.

13.7.  MPLS IPv6 Encapsulation

   The MPLS IPv6 Encapsulation describes a logical link's ability to
   exchange labeled IPv6 packets on one or more subnets.  It does so by
   stating the interface's addresses, the corresponding prefix lengths,
   and the corresponding labels.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Type = 7   |        Payload Length         |     Count     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      ...      | PrimLoop Flags|      MPLS Label List ...      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              ...              |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
   |                                                               |
   +                                                               +
   |                                                               |
   +                                                               +
   |                          IPv6 Address                         |
   +                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               |   PrefixLen   |    more ...   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Sig Type    |        Signature Length       |               ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               ~
   ~                          Signature ...                        ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The 16-bit Count is the number of MPLS IPv6 Encapsulations.

14.  KEEPALIVE - Layer 2 Liveness

   L3DL devices SHOULD beacon frequent Layer 2 KEEPALIVE PDUs to ensure
   session continuity.  A receiver may choose to ignore KEEPALIVE PDUs.

   An operational deployment MUST be configured to use KEEPALIVEs or
   not, either globally, or down to per-link granularity.  Disagreement
   MAY result in repeated session break and reestablishment.

   KEEPALIVEs SHOULD be beaconed at a configured frequency.  One per
   second is the default.  Layer 3 liveness, such as BFD, may be more
   (or less) aggressive.

   If a KEEPALIVE is not received from a peer with which a receiver has
   an open session for a configurable time (default 30 seconds), the
   link SHOULD be presumed down.  The devices MAY keep configuration
   state and restore it without retransmission if no data have changed.
   Otherwise, a new session SHOULD be established and new Encapsulation
   PDUs exchanged.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Type = 2   |      Payload Length = 0       |  Sig Type = 0 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Signature Length = 0     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

15.  VENDOR - Vendor Extensions

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Type = 255  |        Payload Length         |      ...      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               Enterprise Number               |    Ent Type   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Enterprise Data ...                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Sig Type    |        Signature Length       |               ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               ~
   ~                          Signature ...                        ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Vendors or enterprises may define TLVs beyond the scope of L3DL
   standards.  This is done using a Private Enterprise Number [IANA-PEN]
   followed by Enterprise Data in a format defined for that Enterprise
   Number and Ent Type.

   Ent Type allows a VENDOR PDU to be sub-typed in the event that the
   vendor/enterprise needs multiple PDU types.

   As with Encapsulation PDUs, a receiver of a VENDOR PDU MUST respond
   with an ACK or an ERROR PDU.  Similarly, a VENDOR PDU MUST only be
   sent over an open session.

16.  Layers 2.5 and 3 Liveness

   Layer 2 liveness may be continuously tested by KEEPALIVE PDUs, see
   Section 14.  As layer 2.5 or layer 3 connectivity could still break,
   liveness above layer 2 MAY be frequently tested using BFD ([RFC5880])
   or a similar technique.

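   For comparison with the higher layer tests, the Layer 2 check of
   Section 14 can be sketched as a fixed six-octet KEEPALIVE and a
   simple timeout.  The class name LivenessTimer and its methods are
   hypothetical, time is passed in explicitly so the sketch stays free
   of any clock source, and the 30 second value is the default named in
   Section 14.

```python
import struct

# Type = 2, Payload Length = 0, Sig Type = 0, Signature Length = 0
KEEPALIVE_PDU = struct.pack("!BHBH", 2, 0, 0, 0)


class LivenessTimer:
    """Tracks the last KEEPALIVE seen from one peer (illustrative only)."""

    def __init__(self, now, timeout=30.0):
        self.timeout = timeout
        self.last_seen = now

    def keepalive_received(self, now):
        """Record that a KEEPALIVE arrived at time `now` (seconds)."""
        self.last_seen = now

    def link_presumed_down(self, now):
        """True once no KEEPALIVE has been seen for more than `timeout`."""
        return now - self.last_seen > self.timeout
```

   A sender would emit KEEPALIVE_PDU at the configured frequency, while
   the receiver polls link_presumed_down() to decide when to tear down
   or re-establish the session.
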
   This protocol assumes that one or more Encapsulation addresses will
   be used for ping, BFD, or whatever the operator configures.

17.  The North/South Protocol

   Thus far, a one-hop point-to-point logical link discovery protocol
   has been defined.

   The devices know their unique LLEIs and know the unique peer LLEIs
   and Encapsulations on each logical link interface.

   Full topology discovery is not appropriate at the L3DL layer, so
   Dijkstra a la IS-IS etc. is assumed to be done by higher level
   protocols such as BGP-SPF.

   Therefore the LLEIs, link Encapsulations, and state changes are
   pushed North via a small subset of the BGP-LS API.  The upper layer
   routing protocol(s), e.g. BGP-SPF, learn and maintain the topology,
   run Dijkstra, and build the routing database(s).

   For example, if a neighbor's IPv4 Encapsulation address changes, the
   devices seeing the change push that change Northbound.

17.1.  Use BGP-LS as Much as Possible

   BGP-LS [RFC7752] defines BGP-like Datagrams describing logical link
   state (links, nodes, link prefixes, and many other things), and a new
   BGP path attribute providing Northbound transport, all of which can
   be ingested by upper layer protocols such as BGP-SPF; see Section 4
   of [I-D.ietf-lsvr-bgp-spf].

   For IPv4 links, TLVs 259 and 260 are used.  For IPv6 links, TLVs 261
   and 262 are used.  If there are multiple addresses on a link,
   multiple TLV pairs are pushed North, having the same ID pairs.

17.2.  Extensions to BGP-LS

   The Northbound protocol needs a few minor extensions to BGP-LS.
   Luckily, others have needed the same extensions.

   Similarly to BGP-SPF, the BGP protocol is used in the Protocol-ID
   field specified in table 1 of
   [I-D.ietf-idr-bgpls-segment-routing-epe].  The local and remote node
   descriptors for all NLRI are the IDs described in Section 11.
   This is equivalent to an adjacency SID or a node SID if the address
   is a loopback address.

   Label Sub-TLVs from Section 2.1.1 of
   [I-D.ietf-idr-bgp-ls-segment-routing-ext] are used to associate one
   or more MPLS Labels with a link.

18.  Discussion

   This section explores some trade-offs taken and some considerations.

18.1.  HELLO Discussion

   A device with multiple Layer 2 interfaces, traditionally called a
   switch, may be used to forward frames and therefore packets from
   multiple devices to one logical interface (LLEI), I, on an L3DL
   speaking device.  Interface I could discover a peer J across the
   switch.  Later, a prospective peer K could come up across the switch.
   If I were no longer sending and listening for HELLOs, the potential
   peering with K could not be discovered.  Therefore, interfaces MUST
   continue to send HELLOs as long as they are turned up.

18.2.  HELLO versus KEEPALIVE

   Both HELLO and KEEPALIVE are periodic.  KEEPALIVE might be eliminated
   in favor of keeping only HELLOs.  But KEEPALIVEs are unicast, and
   thus less noisy on the network, especially if HELLO is configured to
   transit layer-2-only switches, see Section 18.1.

19.  VLANs/SVIs/Sub-interfaces

   One can think of the protocol as an instance (i.e. state machine)
   which runs on each logical link of a device.

   As the upper routing layer must view VLAN topologies as separate
   graphs, L3DL treats VLAN ports as separate links.

   L3DL PDUs learned over VLAN-ports may be interpreted by upper layer-3
   routing protocols as being learned on the corresponding layer-3 SVI
   interface for the VLAN.

   As Sub-Interfaces each have their own LLEIs, they act as separate
   interfaces, forming their own links.

20.  Implementation Considerations

   An implementation SHOULD provide the ability to configure a logical
   interface as L3DL speaking or not.

   An implementation SHOULD provide the ability to configure whether
   HELLOs on an L3DL enabled interface send Nearest Bridge or Nearest
   non-TPMR Bridge multicast frames from that interface; see Section 10.

   An implementation SHOULD provide the ability to distribute one or
   more loopback addresses or interfaces into L3DL on an external L3DL
   speaking interface.

   An implementation SHOULD provide the ability to configure one of the
   addresses of an encapsulation as primary on an L3DL speaking
   interface.  If there is only one address for a particular
   encapsulation, the implementation MAY mark it as primary by default.

21.  Security Considerations

   The protocol as it is MUST NOT be used outside a datacenter or
   similarly closed environment due to lack of formal definition of the
   authentication and authorization mechanism.  Sufficient mechanisms
   may be described in separate documents.

   Many MDC operators have a strange belief that physical walls and
   firewalls provide sufficient security.  This is not credible.  All
   MDC protocols need to be examined for exposure and attack surface.
   In the case of L3DL, Authentication and Integrity as provided in
   [draft-ymbk-l3dl-signing] is strongly recommended.

   It is generally unwise to assume that on-the-wire Layer 2 is secure.
   Strange/unauthorized devices may plug into a port.  Mis-wiring is
   very common in datacenter installations.  A poisoned laptop might be
   plugged into a device's port, form malicious sessions, etc. to
   divert, intercept, or drop traffic.

   Similarly, malicious nodes/devices could mis-announce addressing.

   If OPENs are not being authenticated, an attacker could forge an OPEN
   for an existing session and cause the session to be reset.

   For these reasons, the OPEN PDU's authentication data exchange SHOULD
   be used.

22.  IANA Considerations

   This document requests the IANA create a registry for L3DL PDU Type,
   which may range from 0 to 255.  The name of the registry should be
   L3DL-PDU-Type.  The policy for adding to the registry is RFC Required
   per [RFC5226], either standards track or experimental.  The initial
   entries should be the following:

           PDU
           Code      PDU Name
           ----      -------------------
           0         HELLO
           1         OPEN
           2         KEEPALIVE
           3         ACK
           4         IPv4 Announcement
           5         IPv6 Announcement
           6         MPLS IPv4 Announcement
           7         MPLS IPv6 Announcement
           8-254     Reserved
           255       VENDOR

   This document requests the IANA create a registry for L3DL Signature
   Type, AKA Sig Type, which may range from 0 to 255.  The name of the
   registry should be L3DL-Signature-Type.  The policy for adding to the
   registry is RFC Required per [RFC5226], either standards track or
   experimental.  The initial entries should be the following:

           Number    Name
           ------    -------------------
           0         Null
           1-255     Reserved

   This document requests the IANA create a registry for L3DL PL Flag
   Bits, which may range from 0 to 7.  The name of the registry should
   be L3DL-PL-Flag-Bits.  The policy for adding to the registry is RFC
   Required per [RFC5226], either standards track or experimental.  The
   initial entries should be the following:

           Bit       Bit Name
           ----      -------------------
           0         Primary
           1         Loopback
           2-7       Reserved

   This document requests the IANA create a registry for L3DL Error
   Codes, a 16-bit integer.  The name of the registry should be L3DL-
   Error-Codes.  The policy for adding to the registry is RFC Required
   per [RFC5226], either standards track or experimental.  The initial
   entries should be the following:

           Error
           Code      Error Name
           ----      -------------------
           0         Reserved
           1         Logical Link Addressing Conflict
           2         Authorization Failure in OPEN
           3         Signature Failure in PDU

23.  IEEE Considerations

   This document requires a new EtherType.

24.  Acknowledgments

   The authors thank Cristel Pelsser for multiple reviews, Jeff Haas for
   review and comments, Joe Clarke for a useful review, John Scudder for
   deeply serious review and comments, Larry Kreeger for a lot of layer
   2 clue, Martijn Schmidt for his contribution, Neeraj Malhotra for
   review, Russ Housley for checksum discussion and sBox, and Steve
   Bellovin for checksum advice.

25.  References

25.1.  Normative References

   [I-D.ietf-idr-bgp-ls-segment-routing-ext]
              Previdi, S., Talaulikar, K., Filsfils, C., Gredler, H.,
              and M. Chen, "BGP Link-State extensions for Segment
              Routing", draft-ietf-idr-bgp-ls-segment-routing-ext-12
              (work in progress), March 2019.

   [I-D.ietf-idr-bgpls-segment-routing-epe]
              Previdi, S., Talaulikar, K., Filsfils, C., Patel, K., Ray,
              S., and J. Dong, "BGP-LS extensions for Segment Routing
              BGP Egress Peer Engineering", draft-ietf-idr-bgpls-
              segment-routing-epe-18 (work in progress), March 2019.

   [I-D.ietf-lsvr-bgp-spf]
              Patel, K., Lindem, A., Zandi, S., and W. Henderickx,
              "Shortest Path Routing Extensions for BGP Protocol",
              draft-ietf-lsvr-bgp-spf-04 (work in progress), December
              2018.

   [IANA-PEN]
              "IANA Private Enterprise Numbers".

   [IEEE.802_2001]
              IEEE, "IEEE Standard for Local and Metropolitan Area
              Networks: Overview and Architecture", IEEE 802-2001,
              DOI 10.1109/ieeestd.2002.93395, July 2002.

   [IEEE802-2014]
              Institute of Electrical and Electronics Engineers, "Local
              and Metropolitan Area Networks: Overview and
              Architecture", IEEE Std 802-2014, 2014.

   [RFC1213]  McCloghrie, K. and M. Rose, "Management Information Base
              for Network Management of TCP/IP-based internets: MIB-II",
              STD 17, RFC 1213, DOI 10.17487/RFC1213, March 1991.

   [RFC1629]  Colella, R., Callon, R., Gardner, E., and Y. Rekhter,
              "Guidelines for OSI NSAP Allocation in the Internet",
              RFC 1629, DOI 10.17487/RFC1629, May 1994.

   [RFC1982]  Elz, R. and R. Bush, "Serial Number Arithmetic", RFC 1982,
              DOI 10.17487/RFC1982, August 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC3032]  Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y.,
              Farinacci, D., Li, T., and A. Conta, "MPLS Label Stack
              Encoding", RFC 3032, DOI 10.17487/RFC3032, January 2001.

   [RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
              Border Gateway Protocol 4 (BGP-4)", RFC 4271,
              DOI 10.17487/RFC4271, January 2006.

   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs", RFC 5226,
              DOI 10.17487/RFC5226, May 2008.

   [RFC5880]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
              (BFD)", RFC 5880, DOI 10.17487/RFC5880, June 2010.

   [RFC6286]  Chen, E. and J. Yuan, "Autonomous-System-Wide Unique BGP
              Identifier for BGP-4", RFC 6286, DOI 10.17487/RFC6286,
              June 2011.

   [RFC7752]  Gredler, H., Ed., Medved, J., Previdi, S., Farrel, A., and
              S. Ray, "North-Bound Distribution of Link-State and
              Traffic Engineering (TE) Information Using BGP", RFC 7752,
              DOI 10.17487/RFC7752, March 2016.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

25.2.  Informative References

   [Clos0]    Clos, C., "A study of non-blocking switching networks
              [PAYWALLED]", Bell System Technical Journal 32 (2), pp
              406-424, March 1953.

   [Clos1]    "Clos Network".

   [I-D.malhotra-bess-evpn-lsoe]
              Malhotra, N., Patel, K., and J. Rabadan, "LSoE-based PE-CE
              Control Plane for EVPN", draft-malhotra-bess-evpn-lsoe-00
              (work in progress), March 2019.

   [JUPITER]  Singh, A., Germano, P., Kanagala, A., Liu, H., Provost,
              J., Simmons, J., Tanda, E., Wanderer, J., Hölzle, U.,
              Stuart, S., Vahdat, A., Ong, J., Agarwal, A., Anderson,
              G., Armistead, A., Bannon, R., Boving, S., Desai, G., and
              B. Felderman, "Jupiter rising", Communications of the
              ACM Vol. 59, pp. 88-97, DOI 10.1145/2975159, August 2016.

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              DOI 10.17487/RFC0791, September 1981.

   [RFC1122]  Braden, R., Ed., "Requirements for Internet Hosts -
              Communication Layers", STD 3, RFC 1122,
              DOI 10.17487/RFC1122, October 1989.

Authors' Addresses

   Randy Bush
   Arrcus & IIJ
   5147 Crystal Springs
   Bainbridge Island, WA  98110
   United States of America

   Email: randy@psg.com


   Rob Austein
   Arrcus, Inc

   Email: sra@hactrn.net


   Keyur Patel
   Arrcus
   2077 Gateway Place, Suite #400
   San Jose, CA  95119
   United States of America

   Email: keyur@arrcus.com