Network Working Group                                            R. Bush
Internet-Draft                        Arrcus & Internet Initiative Japan
Intended status: Standards Track                              R. Austein
Expires: January 30, 2021                                       K. Patel
                                                                  Arrcus
                                                           July 29, 2020


                     Layer 3 Discovery and Liveness
                        draft-ietf-lsvr-l3dl-06

Abstract

   In Massive Data Centers, BGP-SPF and similar routing protocols are
   used to build topology and reachability databases. These protocols
   need to discover IP Layer 3 attributes of links, such as neighbor IP
   addressing, logical link IP encapsulation abilities, and link
   liveness. This Layer 3 Discovery and Liveness protocol collects
   these data, which may then be disseminated using BGP-SPF and similar
   protocols.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 30, 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document. Code
   Components extracted from this document must include Simplified BSD
   License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Background
   4.  Top Level Overview
   5.  Inter-Link Protocol Overview
     5.1.  L3DL Ladder Diagram
   6.  Transport Layer
   7.  The Checksum
   8.  TLV PDUs
   9.  Logical Link Endpoint Identifier
   10. HELLO
   11. OPEN
   12. ACK
     12.1.  Retransmission
   13. The Encapsulations
     13.1.  The Encapsulation PDU Skeleton
     13.2.  Encapsulation Flags
     13.3.  IPv4 Encapsulation
     13.4.  IPv6 Encapsulation
     13.5.  MPLS Label List
     13.6.  MPLS IPv4 Encapsulation
     13.7.  MPLS IPv6 Encapsulation
   14. VENDOR - Vendor Extensions
   15. KEEPALIVE - Layer 2 Liveness
   16. Layers 2.5 and 3 Liveness
   17. The North/South Protocol
     17.1.  Use BGP-LS as Much as Possible
     17.2.  Extensions to BGP-LS
   18. Discussion
     18.1.  HELLO Discussion
     18.2.  HELLO versus KEEPALIVE
   19. VLANs/SVIs/Sub-interfaces
   20. Implementation Considerations
   21. Security Considerations
   22. IANA Considerations
     22.1.  PDU Types
     22.2.  Signature Type
     22.3.  Flag Bits
     22.4.  Error Codes
   23. IEEE Considerations
   24. Acknowledgments
   25. References
     25.1.  Normative References
     25.2.  Informative References
   Authors' Addresses

1. Introduction

   The Massive Data Center (MDC) environment presents unusual problems
   of scale, e.g. O(10,000) forwarding devices, while its homogeneity
   presents opportunities for simple approaches.
Approaches such as 116 Jupiter Rising [JUPITER] use a central controller to deal with 117 scaling, while BGP-SPF [I-D.ietf-lsvr-bgp-spf] provides massive 118 scale-out without centralization using a tried and tested scalable 119 distributed control plane, offering a scalable routing solution in 120 Clos [Clos0][Clos1] and similar environments. But BGP-SPF and 121 similar higher level device-spanning protocols, e.g. 122 [I-D.malhotra-bess-evpn-lsoe], need logical link state and addressing 123 data from the network to build the routing topology. They also need 124 prompt but prudent reaction to (logical) link failure. 126 Layer 3 Discovery and Liveness (L3DL) provides brutally simple 127 mechanisms for devices to 129 o Discover each other's unique endpoint identification, 131 o Discover mutually supported layer 3 encapsulations, e.g. IP/MPLS, 133 o Discover Layer 3 IP and/or MPLS addressing of interfaces of the 134 encapsulations, 136 o Present these data, using a very restricted profile of a BGP-LS 137 [RFC7752] API, to BGP-SPF which computes the topology and builds 138 routing and forwarding tables, 140 o Enable Layer 3 link liveness such as BFD, 142 o Provide Layer 2 keep-alive messages for session continuity, and 143 finally 145 o Provide for authenticity verification of protocol messages. 147 In this document, the use case for L3DL is for point to point links 148 in a datacenter Clos in order to exchange the data needed for BGP-SPF 149 [I-D.ietf-lsvr-bgp-spf] bootstrap and continuity. Once layer two 150 connectivity has been leveraged to get layer three addressability and 151 forwarding capabilities, normal layer three forwarding and routing 152 can take over. 154 L3DL might be found to be more widely applicable to a range of 155 routing and similar protocols which need layer three discovery and 156 characterisation. 158 2. Terminology 160 Even though it concentrates on the inter-device layer, this document 161 relies heavily on routing terminology. 
   The following attempts to clarify the use of some possibly confusing
   terms:

   ASN:       Autonomous System Number [RFC4271], a BGP identifier for
              an originator of Layer 3 routes, particularly BGP
              announcements.
   BGP-LS:    A mechanism by which link-state and TE information can be
              collected from networks and shared with external
              components using the BGP routing protocol. See [RFC7752].
   BGP-SPF:   A hybrid protocol using BGP transport but a Dijkstra
              Shortest Path First decision process. See
              [I-D.ietf-lsvr-bgp-spf].
   Clos:      A hierarchic subset of a crossbar switch topology commonly
              used in data centers.
   Datagram:  The L3DL content of a single Layer 2 frame, sans Ethernet
              framing. A full L3DL PDU may be packaged in multiple
              Datagrams.
   Encapsulation:  Address Family Indicator and Subsequent Address
              Family Indicator (AFI/SAFI). I.e. classes of layer 2.5
              and 3 addresses such as IPv4, IPv6, MPLS, etc.
   Frame:     A Layer 2 Ethernet packet.
   Link or Logical Link:  A logical connection between two logical ports
              on two devices. E.g. two VLANs between the same two ports
              are two links.
   LLEI:      Logical Link Endpoint Identifier, the unique identifier of
              one end of a logical link, see Section 9.
   MAC Address:  48-bit Layer 2 addresses are assumed since they are
              used by all widely deployed Layer 2 network technologies
              of interest, especially Ethernet. See [IEEE.802_2001].
   MDC:       Massive Data Center, commonly composed of thousands of Top
              of Rack Switches (TORs).
   MTU:       Maximum Transmission Unit, the size in octets of the
              largest packet that can be sent on a medium, see [RFC1122]
              1.3.3.
   PDU:       Protocol Data Unit, an L3DL application layer message. A
              PDU's content may need to be broken into multiple
              Datagrams to make it through MTU or other restrictions.
   RouterID:  A 32-bit identifier unique in the current routing domain,
              see [RFC6286].
   Session:   An established, via OPEN PDUs, session between two L3DL
              capable link end-points.
   SPF:       Shortest Path First, an algorithm for finding the shortest
              paths between nodes in a graph; AKA Dijkstra's algorithm.
   System Identifier:  An eight octet ISO System Identifier a la
              [RFC1629] System ID.
   TOR:       Top Of Rack switch, aggregates the servers in a rack and
              connects to aggregation layers of the Clos tree, AKA the
              Clos spine.
   ZTP:       Zero Touch Provisioning gives devices initial addresses,
              credentials, etc. on boot/restart.

3. Background

   L3DL is primarily designed for a Clos type datacenter scale and
   topology, but can accommodate richer topologies which contain
   potential cycles.

   While L3DL is designed for the MDC, there are no inherent reasons it
   could not run on a WAN. The authentication and authorization needed
   to run safely on a WAN need to be considered, and the appropriate
   level of security options chosen.

   L3DL assumes a new IEEE assigned EtherType (TBD).

   The number of addresses of one Encapsulation type on an interface
   link may be quite large given a TOR with tens of servers, each server
   having a few hundred micro-services, resulting in an inordinate
   number of addresses. And highly automated micro-service migration
   can cause serious address prefix disaggregation, resulting in
   interfaces with thousands of disaggregated prefixes.

   Therefore the L3DL protocol is session oriented and uses incremental
   announcement and withdrawal with session restart, a la BGP
   ([RFC4271]).

4.
Top Level Overview 239 o Devices discover each other on logical links 241 o Logical Link Endpoint Identifiers (LLEIs) are exchanged 243 o Layer 2 Liveness checks may be started 245 o Encapsulation data are exchanged and IP-Level Liveness checks 246 enabled 248 o A BGP-like upper layer protocol is assumed to use the identifiers 249 and encapsulation data to discover and build a topology database 251 +-------------------+ +-------------------+ +-------------------+ 252 | Device | | Device | | Device | 253 | | | | | | 254 |+-----------------+| |+-----------------+| |+-----------------+| 255 || || || || || || 256 || BGP-SPF <+---+> BGP-SPF <+---+> BGP-SPF || 257 || || || || || || 258 |+--------^--------+| |+--------^--------+| |+--------^--------+| 259 | | | | | | | | | 260 | | | | | | | | | 261 |+--------+--------+| |+--------+--------+| |+--------+--------+| 262 || Encapsulations || || Encapsulations || || Encapsulations || 263 || Addresses || || Addresses || || Addresses || 264 || L2 Liveness || || L2 Liveness || || L2 Liveness || 265 |+--------^--------+| |+--------^--------+| |+--------^--------+| 266 | | | | | | | | | 267 | | | | | | | | | 268 |+--------v--------+| |+--------v--------+| |+--------v--------+| 269 || || || || || || 270 ||Inter-Device PDUs<+---+>Inter-Device PDUs<+---+>Inter-Device PDUs|| 271 || || || || || || 272 |+-----------------+| |+-----------------+| |+-----------------+| 273 +-------------------+ +-------------------+ +-------------------+ 275 There are two protocols, the inter-device (left-right in the diagram) 276 per-link layer 3 discovery and the API to the upper level BGP-like 277 routing protocol (up-down in the above diagram): 279 o Inter-device PDUs are used to exchange device and logical link 280 identities and layer 2.5 (MPLS) and 3 identifiers (not payloads), 281 e.g. device IDs, port identities, VLAN IDs, Encapsulations, and IP 282 addresses. 
   o  A Link Layer to BGP API presents these data up the stack to a BGP
      protocol or another device-spanning upper layer protocol,
      presenting them using the BGP-LS BGP-like data format.

   The upper layer BGP family routing protocols cross all the devices,
   though they are not part of these L3DL protocols.

   To simplify this document, Layer 2 framing is not shown. L3DL is
   about layer 3.

5. Inter-Link Protocol Overview

   Two devices discover each other and their respective identities by
   sending multicast HELLO PDUs (Section 10). To assure discovery of
   new devices coming up on a multi-link topology, devices on such a
   topology, and only on a multi-link topology, send periodic HELLOs
   forever, see Section 18.1.

   Once a new device is recognized, both devices attempt to negotiate
   and establish a session by sending unicast OPEN PDUs (Section 11) to
   the source MAC addresses (plus VIDs if VLANs) of the received HELLOs.
   Once a session is established through the OPEN exchange, the
   Encapsulations (Section 13) configured on an end point may be
   announced and modified. Note that these are only the encapsulations
   and addresses configured on the announcing interface, though a
   device's loopback and overlay interface(s) may also be announced.
   When two devices on a link have compatible Encapsulations and
   addresses, i.e. the same AFI/SAFI and the same subnet, the link is
   announced via the BGP-LS API.

5.1. L3DL Ladder Diagram

   The HELLO (Section 10) is a priming message sent on all configured
   logical links. It is a small L3DL PDU encapsulated in an Ethernet
   multicast frame with the simple goal of discovering the identities of
   logical link endpoint(s) reachable from a Logical Link Endpoint
   (Section 9).
322 The HELLO and OPEN, Section 11, PDUs, which are used to discover and 323 exchange detailed Logical Link Endpoint Identifiers, LLEIs, and the 324 ACK/ERROR PDU, are mandatory; other PDUs are optional; though at 325 least one encapsulation SHOULD be agreed at some point. 327 The following is a ladder-style diagram of the L3DL protocol 328 exchanges: 330 | HELLO | Logical Link Peer discovery 331 |---------------------------->| 332 | HELLO | Mandatory 333 |<----------------------------| 334 | | 335 | | 336 | OPEN | MACs, IDs, etc. 337 |---------------------------->| 338 | ACK | 339 |<----------------------------| 340 | | 341 | OPEN | Mandatory 342 |<----------------------------| 343 | ACK | 344 |---------------------------->| 345 | | 346 | | 347 | Interface IPv4 Addresses | Interface IPv4 Addresses 348 |---------------------------->| Optional 349 | ACK | 350 |<----------------------------| 351 | | 352 | Interface IPv4 Addresses | 353 |<----------------------------| 354 | ACK | 355 |---------------------------->| 356 | | 357 | | 358 | Interface IPv6 Addresses | Interface IPv6 Addresses 359 |---------------------------->| Optional 360 | ACK | 361 |<----------------------------| 362 | | 363 | Interface IPv6 Addresses | 364 |<----------------------------| 365 | ACK | 366 |---------------------------->| 367 | | 368 | | 369 | Interface MPLSv4 Labels | Interface MPLSv4 Labels 370 |---------------------------->| Optional 371 | ACK | 372 |<----------------------------| 373 | | 374 | Interface MPLSv4 Labels | Interface MPLSv4 Labels 375 |<----------------------------| Optional 376 | ACK | 377 |---------------------------->| 378 | | 379 | | 380 | Interface MPLSv6 Labels | Interface MPLSv6 Labels 381 |---------------------------->| Optional 382 | ACK | 383 |<----------------------------| 384 | | 385 | Interface MPLSv6 Labels | Interface MPLSv6 Labels 386 |<----------------------------| Optional 387 | ACK | 388 |---------------------------->| 389 | | 390 | | 391 | L3DL 
KEEPALIVE | Layer 2 Liveness 392 |---------------------------->| Optional 393 | L3DL KEEPALIVE | 394 |<----------------------------| 396 6. Transport Layer 398 L3DL PDUs are carried by a simple transport layer which allows long 399 PDUs to occupy many Ethernet frames. The L3DL content of a single 400 Ethernet frame, exclusive of Ethernet framing data, is referred to as 401 a Datagram. 403 The L3DL Transport Layer encapsulates each Datagram using a common 404 transport header. 406 If a PDU does not fit in a single datagram, it is broken into 407 multiple Datagrams and reassembled by the receiver a la [RFC0791] 408 Section 2.3 Fragmentation. 410 This is not classic 'fragmentation', but rather decomposition at the 411 origin to allow PDU payloads larger than the frame allows. There are 412 no intermediate devices capable of further fragmentation or 413 reassembly. 415 A PDU might need a large number of frames to be sent. As fragments 416 are not ACK paced (as PDUs are), to avoid overwhelming bursts, the 417 sender should pace fragments of a large PDU. 419 L3DL is carrying relatively small amounts of data on relatively high 420 bandwidth links, and at a time when the link is not active with other 421 data as it does not yet have layer three connectivity. So congestion 422 is not considered a sufficiently significant risk to warrant 423 additional complexity. 425 Should a PDU need to be retransmitted, it MUST BE sent as the 426 identical Datagram set as the original transmission. The 427 Transmission Sequence Number informs the receiver that it is the same 428 PDU. 
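   The decomposition described above can be sketched in C as follows.
   This is a minimal illustration under stated assumptions, not part of
   the protocol definition: the names l3dl_chunk, l3dl_chunk_count, and
   l3dl_chunk_at are hypothetical, and the 12-octet header length is an
   assumption obtained by summing the transport header field widths
   (8 + 16 + 1 + 23 + 16 + 32 bits). Pacing and checksum computation
   are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: each Datagram carries 12 octets of transport header
 * (Version 1, Transmission Sequence Number 2, L bit plus Datagram
 * Number 3, Datagram Length 2, Checksum 4). */
#define L3DL_HDR_LEN 12u

/* Hypothetical description of one Datagram's share of a PDU. */
struct l3dl_chunk {
    size_t   offset;   /* offset of this chunk within the PDU */
    size_t   length;   /* PDU octets carried by this Datagram */
    uint32_t number;   /* Datagram Number, starting at zero */
    int      last;     /* the L bit: 1 only on the final Datagram */
};

/* Number of Datagrams needed to carry a PDU of pdu_len octets when a
 * Datagram, header included, may not exceed max_datagram octets. */
size_t l3dl_chunk_count(size_t pdu_len, size_t max_datagram)
{
    assert(max_datagram > L3DL_HDR_LEN);
    size_t room = max_datagram - L3DL_HDR_LEN;
    return pdu_len == 0 ? 1 : (pdu_len + room - 1) / room;
}

/* Describe Datagram number n of the PDU. */
struct l3dl_chunk l3dl_chunk_at(size_t pdu_len, size_t max_datagram,
                                uint32_t n)
{
    size_t room  = max_datagram - L3DL_HDR_LEN;
    size_t count = l3dl_chunk_count(pdu_len, max_datagram);
    struct l3dl_chunk c;

    assert(n < count);
    c.number = n;
    c.offset = (size_t)n * room;
    c.last   = ((size_t)n + 1 == count);
    c.length = c.last ? pdu_len - c.offset : room;
    return c;
}
```

   For example, with an assumed 1500-octet limit per Datagram, a
   3000-octet PDU is carried in three Datagrams of 1488, 1488, and 24
   payload octets, and only the third has the L bit set.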
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Version    |  Transmission Sequence Number |L| Dtgm Number ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~    Datagram Number (contd)    |        Datagram Length        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Checksum                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Payload...                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The fields of the L3DL Transport Header are as follows:

   Version:  Eight-bit Version number of the protocol, currently 0.
      Values other than 0 MUST be treated as an error. The protocol
      version needs to be in one and only one place, so it is in the
      datagram as opposed to, for example, the PDU header.

   Transmission Sequence Number:  A 16-bit strictly increasing unsigned
      integer identifying this PDU, possibly across retransmissions,
      that wraps from 2^16-1 to 0. The initial value is arbitrary. See
      [RFC1982] on DNS Serial Number Arithmetic for too much detail on
      comparing and incrementing a wrapping sequence number.

   L:  A bit that is set to one if this Datagram is the last Datagram
      of the PDU. For a PDU which fits in only one Datagram, it is set
      to one. Note that this is the inverse of the marking technique
      used by [RFC0791].

   Datagram Number:  A monotonically increasing 23-bit value which
      starts at zero for each PDU. This is used to reassemble frames
      into PDUs a la [RFC0791] Section 2.3. Note that this limits an
      L3DL PDU to 2^23 frames.

   Datagram Length:  Total number of octets in the Datagram including
      all payloads and fields. Note that this limits a datagram to 2^16
      octets, though Ethernet framing is likely to impose a smaller
      limit.

   Checksum:  A 32-bit hash over the Datagram to detect bit flips, see
      Section 7.
      If a Datagram fails checksum verification, the datagram is invalid
      and should be silently discarded. The sender will retransmit the
      PDU, and the receiver can assemble it.

   Payload:  The PDU being transported or a fragment thereof.

   To avoid the need for a receiver to reassemble two PDUs at the same
   time, a sender MUST NOT send a subsequent PDU when a PDU is already
   in flight and not yet acknowledged, assuming it is an ACKed PDU Type.

7. The Checksum

   There is a reason conservative folk use a checksum in UDP. And as
   many operators stretch to jumbo frames (over 1,500 octets), longer
   checksums are the prudent approach.

   For the purpose of computing a checksum, the checksum field itself is
   assumed to be zero.

   The following code describes a suggested algorithm. This
   specification avoids mandatory-to-implement requirements, algorithm
   agility, etc. What matters is that the same algorithm is used
   consistently in any deployment.

   Sum up 32-bit unsigned ints in a 64-bit long, then take the high-
   order section, shift it right filling on the left with zeros, rotate,
   add it in, repeat until the high order 32 bits are all zero.

   <CODE BEGINS>
   #include <stddef.h>
   #include <stdint.h>

   /* The F table from Skipjack, and it would work for the S-Box. */
   static const uint8_t sbox[256] = {
     0xa3,0xd7,0x09,0x83,0xf8,0x48,0xf6,0xf4,0xb3,0x21,0x15,0x78,
     0x99,0xb1,0xaf,0xf9,0xe7,0x2d,0x4d,0x8a,0xce,0x4c,0xca,0x2e,
     0x52,0x95,0xd9,0x1e,0x4e,0x38,0x44,0x28,0x0a,0xdf,0x02,0xa0,
     0x17,0xf1,0x60,0x68,0x12,0xb7,0x7a,0xc3,0xe9,0xfa,0x3d,0x53,
     0x96,0x84,0x6b,0xba,0xf2,0x63,0x9a,0x19,0x7c,0xae,0xe5,0xf5,
     0xf7,0x16,0x6a,0xa2,0x39,0xb6,0x7b,0x0f,0xc1,0x93,0x81,0x1b,
     0xee,0xb4,0x1a,0xea,0xd0,0x91,0x2f,0xb8,0x55,0xb9,0xda,0x85,
     0x3f,0x41,0xbf,0xe0,0x5a,0x58,0x80,0x5f,0x66,0x0b,0xd8,0x90,
     0x35,0xd5,0xc0,0xa7,0x33,0x06,0x65,0x69,0x45,0x00,0x94,0x56,
     0x6d,0x98,0x9b,0x76,0x97,0xfc,0xb2,0xc2,0xb0,0xfe,0xdb,0x20,
     0xe1,0xeb,0xd6,0xe4,0xdd,0x47,0x4a,0x1d,0x42,0xed,0x9e,0x6e,
     0x49,0x3c,0xcd,0x43,0x27,0xd2,0x07,0xd4,0xde,0xc7,0x67,0x18,
     0x89,0xcb,0x30,0x1f,0x8d,0xc6,0x8f,0xaa,0xc8,0x74,0xdc,0xc9,
     0x5d,0x5c,0x31,0xa4,0x70,0x88,0x61,0x2c,0x9f,0x0d,0x2b,0x87,
     0x50,0x82,0x54,0x64,0x26,0x7d,0x03,0x40,0x34,0x4b,0x1c,0x73,
     0xd1,0xc4,0xfd,0x3b,0xcc,0xfb,0x7f,0xab,0xe6,0x3e,0x5b,0xa5,
     0xad,0x04,0x23,0x9c,0x14,0x51,0x22,0xf0,0x29,0x79,0x71,0x7e,
     0xff,0x8c,0x0e,0xe2,0x0c,0xef,0xbc,0x72,0x75,0x6f,0x37,0xa1,
     0xec,0xd3,0x8e,0x62,0x8b,0x86,0x10,0xe8,0x08,0x77,0x11,0xbe,
     0x92,0x4f,0x24,0xc5,0x32,0x36,0x9d,0xcf,0xf3,0xa6,0xbb,0xac,
     0x5e,0x6c,0xa9,0x13,0x57,0x25,0xb5,0xe3,0xbd,0xa8,0x3a,0x01,
     0x05,0x59,0x2a,0x46
   };

   /* non-normative example C code, constant time even */

   uint32_t sbox_checksum_32(const uint8_t *b, const size_t n)
   {
     uint32_t sum[4] = {0, 0, 0, 0};
     uint64_t result = 0;
     /* accumulate substituted octets into four interleaved lanes */
     for (size_t i = 0; i < n; i++)
       sum[i & 3] += sbox[*b++];
     /* combine the lanes, each shifted one octet from the next */
     for (size_t i = 0; i < sizeof(sum)/sizeof(*sum); i++)
       result = (result << 8) + sum[i];
     /* fold the high-order 32 bits back in, twice to absorb carry */
     result = (result >> 32) + (result & 0xFFFFFFFFU);
     result = (result >> 32) + (result & 0xFFFFFFFFU);
     return (uint32_t) result;
   }
   <CODE ENDS>

8.
TLV PDUs 549 The basic L3DL application layer PDU is a typical TLV (Type Length 550 Value) PDU. It includes a signature to provide optional integrity 551 and authentication. It may be broken into multiple Datagrams, see 552 Section 6. 554 0 1 2 3 555 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 556 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 557 | PDU Type | Payload Length ~ 558 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 559 ~ | Payload ... | 560 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 561 | Sig Type | Signature Length | ~ 562 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + 563 ~ Signature ~ 564 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 566 The fields of the basic L3DL header are as follows: 568 PDU Type: An integer differentiating PDU payload types. See 569 Section 22.1. 571 Payload Length: Total number of octets in the Payload field. 573 Payload: The application layer content of the L3DL PDU. 575 Sig Type: The type of the Signature, see Section 22.2. Type 0, a 576 null signature, is defined in this document. 578 Sig Type 0 indicates a null Signature. For a trivial PDU such as 579 KEEPALIVE, the underlying Datagram checksum may be sufficient for 580 integrity, though it lacks authenticity. 582 Other Sig Types may be defined in other documents, cf. 583 [I-D.ymbk-lsvr-l3dl-signing]. 585 Signature Length: The length of the Signature, possibly including 586 padding, in octets. If Sig Type is 0, Signature Length MUST BE 0. 588 Signature: The result of running the signature algorithm specified 589 in Sig Type over all octets of the PDU except for the Signature 590 itself. 592 9. Logical Link Endpoint Identifier 594 L3DL discovers neighbors on logical links and establishes sessions 595 between the two ends of all consenting discovered logical links. A 596 logical link is described by a pair of Logical Link Endpoint 597 Identifiers, LLEIs. 
599 An LLEI is a variable length descriptor which could be an ASN, a 600 classic RouterID, a catenation of the two, an eight octet ISO System 601 Identifier [RFC1629], or any other identifier unique to a single 602 logical link endpoint in the topology. 604 An L3DL deployment will choose and define an LLEI which suits its 605 needs, simple or complex. Examples of two extremes follow: 607 A simplistic view of a link between two devices is two ports, 608 identified by unique MAC addresses, carrying a layer 3 protocol 609 conversation. In this case, the MAC addresses might suffice for the 610 LLEIs. 612 Unfortunately, things can get more complex. Multiple VLANs can run 613 between those two MAC addresses. In practice, many real devices use 614 the same MAC address on multiple ports and/or sub-interfaces. 616 Therefore, in the general circumstance, a fully described LLEI might 617 be as follows: 619 0 1 2 3 620 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 621 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 622 | | 623 + System Identifier + 624 | | 625 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 626 | ifIndex | 627 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 629 System Identifier, a la [RFC1629], is an eight octet identifier 630 unique in the entire operational space. Routers and switches usually 631 have internal MAC Addresses which can be padded with high order zeros 632 and used if no System ID exists on the device. If no unique 633 identifier is burned into a device, the local L3DL configuration 634 SHOULD create and assign a unique one, likely by configuration. 636 ifIndex is the SNMP identifier of the (sub-)interface, see [RFC1213]. 637 This uniquely identifies the port. 639 For a layer 3 tagged sub-interface or a VLAN/SVI interface, Ifindex 640 is that of the logical sub-interface, so no further disambiguation is 641 needed. 
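   The fully described LLEI above can be serialized as in the
   following minimal C sketch. It assumes the 12-octet layout of the
   figure (eight-octet System Identifier followed by a 32-bit ifIndex)
   and network byte order, in line with the statement that LLEIs are
   big-endian. The function names llei_sysid_from_mac and llei_encode
   are hypothetical and exist only for illustration; a deployment that
   chooses a different LLEI form would need a different encoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 8-octet System Identifier + 4-octet ifIndex. */
#define LLEI_LEN 12u

/* Build a System Identifier from a 48-bit MAC address by padding it
 * with two high-order zero octets. */
void llei_sysid_from_mac(const uint8_t mac[6], uint8_t sysid[8])
{
    assert(mac != NULL && sysid != NULL);
    sysid[0] = 0;
    sysid[1] = 0;
    memcpy(sysid + 2, mac, 6);
}

/* Serialize the System Identifier followed by the ifIndex, most
 * significant octet first. */
void llei_encode(const uint8_t sysid[8], uint32_t ifindex,
                 uint8_t out[LLEI_LEN])
{
    assert(sysid != NULL && out != NULL);
    memcpy(out, sysid, 8);
    out[8]  = (uint8_t)(ifindex >> 24);
    out[9]  = (uint8_t)(ifindex >> 16);
    out[10] = (uint8_t)(ifindex >> 8);
    out[11] = (uint8_t)ifindex;
}
```

   With the illustrative MAC address 02-00-5E-10-20-30 and ifIndex 258,
   the encoded LLEI is the octet string 00 00 02 00 5E 10 20 30 00 00
   01 02.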
   L3DL PDUs learned over VLAN-ports may be interpreted by upper layer-3
   routing protocols as being learned on the corresponding layer-3 SVI
   interface for the VLAN.

   LLEIs are big-endian.

10. HELLO

   The HELLO PDU is unique in that it is encapsulated in a multicast
   Ethernet frame. It solicits response(s) from other LLEI(s) on the
   link. See Section 18.1 for why multicast is used. The destination
   multicast MAC Address to be used MUST be one of the following, see
   Clause 9.2.2 of [IEEE802-2014]:

   01-80-C2-00-00-0E:  Nearest Bridge = Propagation constrained to a
      single physical link; stopped by all types of bridges (including
      MPRs (media converters)). This SHOULD be used when the link is
      known to be a simple point to point link.
   To Be Assigned:  When a switch receives a frame with a multicast
      destination MAC it does not recognize, it forwards to all ports.
      This destination MAC is to be sent when the interface is known to
      be connected to a switch. See Section 23. This SHOULD be used
      when the link may be a multi-point link.

   All other L3DL PDUs are encapsulated in unicast frames, as the peer's
   destination MAC address is known after the HELLO exchange.

   When an interface is turned up on a device, it SHOULD issue a HELLO
   if it is to participate in L3DL sessions.

   If a constrained Nearest Bridge destination address has been
   configured for a point-to-point interface, see above, then the HELLO
   SHOULD NOT be repeated once a session has been created by an exchange
   of OPENs.

   If the configured destination address is one that is propagated by
   switches, the HELLO SHOULD be repeated at a configured interval, with
   a default of 60 seconds. This allows discovery by new devices which
   come up on the layer-2 mesh.
   In this multi-link scenario, the operator should be aware of the
   trade-off between timer tuning and network noise and adjust the
   inter-HELLO timer accordingly.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | PDU Type = 0  |               Payload Length = 0              ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               | Sig Type = 0  |      Signature Length = 0     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   If more than one device responds, one adjacency is formed for each
   unique source LLEI response. L3DL treats each adjacency as a
   separate logical link.

   When a HELLO is received from a source MAC address (plus VID if VLAN)
   with which there is no established L3DL session, the receiver SHOULD
   respond by sending an OPEN PDU to the source MAC address (plus VID).
   The two devices establish an L3DL session by exchanging OPEN PDUs.

   To ameliorate possible load spikes during bootstrap or event
   recovery, there SHOULD be a jittered delay between receipt of a HELLO
   and issue of the OPEN. The default delay range SHOULD be zero to
   five seconds, and MUST be configurable.

   If a HELLO is received from a MAC address with which there is an
   established session, the HELLO should be dropped.

   The Payload Length is zero as there is no payload.

   HELLO PDUs cannot be signed as keying material has yet to be
   exchanged. Hence the signature MUST always be the null type.

11. OPEN

   Each device has learned the other's MAC Address from the HELLO
   exchange, see Section 10. Therefore the OPEN and all subsequent PDUs
   MUST be unicast, as opposed to the HELLO's multicast frame.
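   The HELLO receipt rules of Section 10, which lead to the unicast
   OPEN, can be sketched in C as follows. This is an illustration
   under stated assumptions rather than a definitive implementation:
   the names hello_action, l3dl_on_hello, and l3dl_open_jitter_ms are
   hypothetical, and the caller is assumed to supply a uniformly
   distributed 32-bit random value, so that the sketch does not depend
   on any particular random number source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical outcome of receiving a HELLO. */
enum hello_action { HELLO_DROP, HELLO_SEND_OPEN };

/* A HELLO from a peer with an established session is dropped;
 * otherwise an OPEN is to be sent after a jittered delay. */
enum hello_action l3dl_on_hello(int session_established)
{
    return session_established ? HELLO_DROP : HELLO_SEND_OPEN;
}

/* Map a caller-supplied random 32-bit value onto the inclusive range
 * [0, max_delay_ms].  The text's default range of zero to five
 * seconds corresponds to max_delay_ms = 5000. */
uint32_t l3dl_open_jitter_ms(uint32_t random_value, uint32_t max_delay_ms)
{
    uint64_t span  = (uint64_t)max_delay_ms + 1;
    uint32_t delay = (uint32_t)(((uint64_t)random_value * span) >> 32);

    assert(delay <= max_delay_ms);
    return delay;
}
```

   Scaling by multiplication and shift, rather than taking a
   remainder, keeps the mapping close to uniform for any configured
   maximum. Either approach would satisfy the text.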
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | PDU Type = 1  |                Payload Length                 ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               |                     Nonce                     ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               |  LLEI Length  |            My LLEI            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-~
   ~                               |   AttrCount   |               ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~      Attribute List ...       |   Auth Type   |  Key Length   ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               |                    Key ...                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Serial Number                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Sig Type    |       Signature Length        | Signature ... |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The Payload Length is the number of octets in all fields of the PDU
   from the Nonce through the Serial Number, not including the three
   final signature fields.

   The Nonce enables detection of a duplicate OPEN PDU. It SHOULD be
   either a random number or a high resolution timestamp. It is needed
   to prevent session closure due to a repeated OPEN caused by a race or
   a dropped or delayed ACK.

   My LLEI is the sender's LLEI, see Section 9.

   AttrCount is the number of attributes in the Attribute List.
   Attributes are single octets the semantics of which are operator-
   defined.

   A node may have zero or more operator-defined attributes, e.g.:
   spine, leaf, backbone, route reflector, arabica, ...

   Attribute syntax and semantics are local to an operator or
   datacenter; hence there is no global registry. Nodes exchange their
   attributes only in the OPEN PDU.

   Auth Type is the Signature algorithm suite, see Section 8.
   Key Length is a 16-bit field denoting the length in octets of the Key
   itself, not including the Auth Type or the Key Length. If the Auth
   Type is zero, then the Key Length MUST also be zero, and there MUST
   be no Key data.

   The Key is specific to the operational environment. A failure to
   authenticate is a failure to start the L3DL session; an ERROR PDU
   MUST be sent (Error Code 3), and HELLOs MUST be restarted.

   Although delay and jitter in responding with an OPEN were specified
   above, beware of load created by long strings of authentication
   failures and retries. A configurable failure count limit (default 8)
   SHOULD result in giving up on the connection attempt.

   The Serial Number is that of the last received and processed PDU.
   This allows a receiver sending an OPEN to tell the sender that the
   receiver wants to resume a session and the sender only needs to send
   data more recent than the Serial Number. If this OPEN is not trying
   to restart a lost session, the Serial Number MUST be set to zero.

   The Signature fields are described in Section 8 and in an asymmetric
   key environment serve as a proof of possession of the signing auth
   data by the sender.

   Once two logical link endpoints know each other, and have ACKed each
   other's OPEN PDUs, Layer 2 KEEPALIVEs (see Section 15) MAY be started
   to ensure Layer 2 liveness and keep the session semantics alive. The
   timing and acceptable drop of KEEPALIVE PDUs are discussed in
   Section 15.

   If a sender of an OPEN does not receive an ACK of the OPEN PDU, then
   it MUST resend the same OPEN PDU, with the same Nonce. Resending
   an unacknowledged OPEN PDU, like other ACKed PDUs, SHOULD use
   exponential back-off, see [RFC1122].
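   The resend rule above can be sketched as follows. This is a minimal
   illustration, not a definitive implementation: the function name and
   the send and wait_for_ack callbacks are hypothetical, the doubling
   factor is an assumption (this document only says "exponential"), and
   the one second and three retry defaults are those of Section 12.1.

```python
def resend_until_acked(pdu, send, wait_for_ack,
                       initial_s=1.0, max_retries=3, factor=2.0):
    """Send pdu and resend it with exponential back-off until ACKed.

    send(pdu) transmits the bytes; wait_for_ack(timeout) returns True
    if an ACK arrived within timeout seconds. Both are assumed
    callbacks supplied by the caller. Returns True if ACKed.
    """
    send(pdu)
    timeout = initial_s
    for _ in range(max_retries):
        if wait_for_ack(timeout):
            return True
        # Resend the identical bytes, so an OPEN keeps the same Nonce
        # and the peer can recognise it as a duplicate.
        send(pdu)
        timeout *= factor
    # Last chance: wait for an ACK of the final retransmission.
    return wait_for_ack(timeout)
```

   Because the same byte string is passed to send each time, the Nonce
   cannot change between attempts, which is the property the text
   requires.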
   If a properly authenticated OPEN arrives at L3DL speaker A with a new
   Nonce from an LLEI, speaker B, with which A believes it already has
   an L3DL session (OPENs have already been exchanged), and the Serial
   Number in the OPEN PDU is non-zero, speaker A SHOULD establish a new
   session by sending an OPEN with the Serial Number being the same as
   that of A's last sent and ACKed PDU. Each party MUST resume sending
   encapsulations etc. subsequent to the other party's Serial Number.
   And each MUST retain all previously discovered encapsulation and
   other data.

   If a properly authenticated OPEN arrives with a new Nonce from an
   LLEI with which the receiving logical link endpoint believes it
   already has an L3DL session (OPENs have already been exchanged), and
   the Serial Number in the OPEN is zero, then the receiver MUST assume
   that the sending LLEI or entire device has been reset. All
   previously discovered encapsulation data MUST NOT be kept and MUST be
   withdrawn via the BGP-LS API and the recipient MUST respond with a
   new OPEN.

12. ACK

   The ACK PDU acknowledges receipt of a PDU and reports any error
   condition which might have been raised.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | PDU Type = 3  |              Payload Length = 5               ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               |   ACKed PDU   | EType |      Error Code       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Error Hint           |   Sig Type    |Signature Leng.~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               |                 Signature ...                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The ACK acknowledges receipt of an OPEN, Encapsulation, VENDOR PDU,
   etc.

   The ACKed PDU is the PDU Type of the PDU being acknowledged, e.g.,
   OPEN, one of the Encapsulations, etc.
   If there was an error processing the received PDU, then the EType is
   non-zero. If the EType is zero, Error Code and Error Hint MUST also
   be zero.

   A non-zero EType is the receiver's way of telling the PDU's sender
   that the receiver had problems processing the PDU. The Error Code
   and Error Hint will tell the sender more detail about the error.

   The decimal value of EType gives a strong hint how the receiver
   sending the ACK believes things should proceed:

      0 - No Error, Error Code and Error Hint MUST be zero
      1 - Warning, something not too serious happened, continue
      2 - Session should not be continued, try to restart
      3 - Restart is hopeless, call the operator
      4-15 - Reserved

   The Error Codes, noting protocol failures, are listed in
   Section 22.4. Someone stuck in the 1990s might think of the
   catenation of EType and Error Code as an echo of 0x1zzz, 0x2zzz,
   etc. They might be right; or not.

   The Error Hint, an arbitrary 16 bits, is any additional data the
   sender of the error PDU thinks will help the recipient or the
   debugger with the particular error.

   The Signature fields are described in Section 8.

12.1. Retransmission

   If a PDU sender expects an ACK, e.g. for an OPEN, an Encapsulation, a
   VENDOR PDU, etc., and does not receive the ACK for a configurable
   time (default one second), and the interface is live at layer 2, the
   sender resends the PDU using exponential back-off, see [RFC1122].
   This cycle MAY be repeated a configurable number of times (default
   three) before it is considered a failure. The session MAY be
   considered closed in the case of such an ACK failure.

   If the link is broken at layer 2, retransmission MAY be retried when
   the link is restored.

13. The Encapsulations

   Once the devices know each other's LLEIs, know each other's upper
   layer (L2.5 and L3) identities, have means to ensure link state,
   etc., the L3DL session is considered established, and the devices
   SHOULD exchange L3 interface encapsulations, L3 addresses, and L2.5
   labels.

   The Encapsulation types the peers exchange may be IPv4
   (Section 13.3), IPv6 (Section 13.4), MPLS IPv4 (Section 13.6), MPLS
   IPv6 (Section 13.7), and/or possibly others not defined here.

   The sender of an Encapsulation PDU MUST NOT assume that the peer is
   capable of the same Encapsulation Type. An ACK (Section 12) merely
   acknowledges receipt. Only if both peers have sent the same
   Encapsulation Type is it safe for Layer 3 protocols to assume that
   they are compatible for that type.

   A receiver of an encapsulation might recognize an addressing
   conflict, such as both ends of the link trying to use the same
   address. In this case, the receiver SHOULD respond with an error
   (Error Code 2) ACK. As there may be other usable addresses or
   encapsulations, this error might simply be logged and the session
   continued, letting an upper layer topology builder deal with what
   works.

   Further, to consider a logical link of a type to formally be
   established so that it may be pushed up to upper layer protocols, the
   addressing for the type must be compatible, e.g. on the same IP
   subnet.

13.1. The Encapsulation PDU Skeleton

   The header for all encapsulation PDUs is as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   PDU Type    |                Payload Length                 ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               |                     Count                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Serial Number                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Encapsulation List...             |   Sig Type    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Signature Length        |         Signature ...         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   An Encapsulation PDU describes zero or more addresses of the
   encapsulation type.

   The 24-bit Count is the number of Encapsulations in the Encapsulation
   list.

   The Serial Number is a monotonically increasing 32-bit value
   representing the sender's state in time. It may be an integer, a
   timestamp, etc. On session restart (new OPEN), a receiver MAY send
   the last received Serial Number to tell the sender to only send
   newer data.

   If a sender has multiple links on the same interface, separate state
   (data, ACKs, etc.) must be kept for each peer session.

   Over time, multiple Encapsulation PDUs may be sent for an interface
   as configuration changes.

   If the length of an Encapsulation PDU exceeds the Datagram size limit
   on media, the PDU is broken into multiple Datagrams. See Section 8.

   The Signature fields are described in Section 8.

   The Receiver MUST acknowledge the Encapsulation PDU with a Type=3,
   ACK PDU (Section 12) with the Encapsulation Type being that of the
   encapsulation being announced.

   If the Sender does not receive an ACK in a configurable interval
   (default one second), and the interface is live at layer 2, it
   SHOULD retransmit. After a user configurable number of failures
   (default three), the L3DL session should be considered dead and the
   OPEN process SHOULD be restarted.

   If the link is broken at layer 2, retransmission MAY be retried if
   data have not changed in the interim.

13.2. Encapsulation Flags

   The Encapsulation Flags are a sequence of bit fields as follows:

    0            1            2            3            4 ...      7
   +------------+------------+------------+------------+------------+
   |  Ann/With  |  Primary   | Under/Over |  Loopback  | Reserved ..|
   +------------+------------+------------+------------+------------+

   Each encapsulation in an Encapsulation PDU of Type T may announce new
   and/or withdraw old encapsulations of Type T. It indicates this with
   the Ann/With Encapsulation Flag, Announce == 1, Withdraw == 0.

   Each Encapsulation interface address in an Encapsulation PDU is
   either a new encapsulation to be announced (Ann/With == 1) (yes, a la
   BGP) or requests one be withdrawn (Ann/With == 0). Adding an
   encapsulation which already exists SHOULD raise an Announce/Withdraw
   Error (see Section 22.4); the EType SHOULD be 2, suggesting a session
   restart (see Section 12) so all encapsulations will be resent.

   If an LLEI has multiple addresses for an encapsulation type, one and
   only one address MAY be marked as primary (Primary Flag == 1) for
   that Encapsulation Type.

   An Encapsulation interface address in an Encapsulation PDU MAY be
   marked as a loopback, in which case the Loopback bit is set.
   Loopback addresses are generally not seen directly on an external
   interface. One or more loopback addresses MAY be exposed by
   configuration on one or more L3DL speaking external interfaces, e.g.
   for iBGP peering. They SHOULD be marked as such, Loopback Flag == 1.

   Each Encapsulation interface address in an Encapsulation PDU is that
   of the direct 'underlay' interface (Under/Over == 1), or an 'overlay'
   address (Under/Over == 0), likely that of a VM or container guest
   bridged or configured on to the interface already having an underlay
   address.
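   The flag semantics of Section 13.2 can be illustrated with a short
   sketch. It assumes the usual convention that bit 0 in the diagram is
   the most significant bit of the Encaps Flags octet, and it follows
   the Section 13.2 text (Announce == 1, underlay == 1). All names
   below are hypothetical and are not defined by this document.

```python
# Assumed masks: bit 0 of the diagram taken as the most significant bit.
ANNOUNCE = 0x80   # Ann/With: 1 = announce, 0 = withdraw
PRIMARY = 0x40    # Primary address for the Encapsulation Type
UNDERLAY = 0x20   # Under/Over: 1 = underlay, 0 = overlay
LOOPBACK = 0x10   # Loopback address
RESERVED = 0x0F   # bits 4-7, left as zero here


def encode_flags(announce, primary=False, underlay=True, loopback=False):
    """Build the Encaps Flags octet from booleans."""
    value = 0
    if announce:
        value |= ANNOUNCE
    if primary:
        value |= PRIMARY
    if underlay:
        value |= UNDERLAY
    if loopback:
        value |= LOOPBACK
    return value


def decode_flags(octet):
    """Split an Encaps Flags octet into named booleans."""
    return {
        "announce": bool(octet & ANNOUNCE),
        "primary": bool(octet & PRIMARY),
        "underlay": bool(octet & UNDERLAY),
        "loopback": bool(octet & LOOPBACK),
    }
```

   For example, announcing a primary underlay address would encode as
   0xE0 under these assumptions, and withdrawing an overlay loopback
   address as 0x10.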
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | PDU Type = 4  |                Payload Length                 ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               |                     Count                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Serial Number                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Encaps Flags  |                 IPv4 Address                  ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               |   PrefixLen   |   more ...    |   Sig Type    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Signature Length        |         Signature ...         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The 24-bit Count is the sum of the number of IPv4 Encapsulations
   being announced and/or withdrawn.

13.4. IPv6 Encapsulation

   The IPv6 Encapsulation describes a logical link's ability to exchange
   IPv6 packets on one or more subnets. It does so by stating the
   interface's addresses and the corresponding prefix lengths.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | PDU Type = 5  |                Payload Length                 ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               |                     Count                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Serial Number                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Encaps Flags  |                                               |
   +-+-+-+-+-+-+-+-+                                               +
   |                                                               |
   +                                                               +
   |                                                               |
   +                                                               +
   |                         IPv6 Address                          |
   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |   PrefixLen   |   more ...    |   Sig Type    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Signature Length        |         Signature ...         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The 24-bit Count is the sum of the number of IPv6 Encapsulations
   being announced and/or withdrawn.

13.5. MPLS Label List

   As an MPLS enabled interface may have a label stack, see [RFC3032], a
   variable length list of labels is needed. These are the labels the
   sender will accept for the prefix to which the list is attached.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Label Count  |                 Label                 | Exp |S|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Label                 | Exp |S|   more ...    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   A Label Count of zero is an implicit withdraw of all labels for that
   prefix on that interface.

13.6. MPLS IPv4 Encapsulation

   The MPLS IPv4 Encapsulation describes a logical link's ability to
   exchange labeled IPv4 packets on one or more subnets. It does so by
   stating the interface's addresses, the corresponding prefix lengths,
   and the corresponding labels which will be accepted for each address.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | PDU Type = 6  |                Payload Length                 ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               |                     Count                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Serial Number                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Encaps Flags  |      MPLS Label List ...      |               ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~                 IPv4 Address                  |   PrefixLen   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   more ...    |   Sig Type    |       Signature Length        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Signature                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The 24-bit Count is the sum of the number of MPLS IPv4 Encapsulations
   being announced and/or withdrawn.

13.7. MPLS IPv6 Encapsulation

   The MPLS IPv6 Encapsulation describes a logical link's ability to
   exchange labeled IPv6 packets on one or more subnets. It does so by
   stating the interface's addresses, the corresponding prefix lengths,
   and the corresponding labels which will be accepted for each address.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | PDU Type = 7  |                Payload Length                 ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               |                     Count                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Serial Number                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Encaps Flags  |      MPLS Label List ...      |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               +
   |                                                               |
   +                                                               +
   |                                                               |
   +                                                               +
   |                         IPv6 Address                          |
   +                                               +-+-+-+-+-+-+-+-+
   |                                               |  Prefix Len   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   more ...    |   Sig Type    |       Signature Length        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Signature ...                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The 24-bit Count is the sum of the number of MPLS IPv6 Encapsulations
   being announced and/or withdrawn.

14. VENDOR - Vendor Extensions

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | PDU Type = 255|                Payload Length                 ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               |                 Serial Number                 ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               |               Enterprise Number               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Ent Type    |              Enterprise Data ...              ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               |   Sig Type    |       Signature Length        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Signature ...                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Vendors or enterprises may define TLVs beyond the scope of L3DL
   standards. This is done using a Private Enterprise Number [IANA-PEN]
   followed by Enterprise Data in a format defined for that Enterprise
   Number and Ent Type.

   Ent Type allows a VENDOR PDU to be sub-typed in the event that the
   vendor/enterprise needs multiple PDU types.

   As with Encapsulation PDUs, a receiver of a VENDOR PDU MUST respond
   with an ACK or an ERROR PDU. Similarly, a VENDOR PDU MUST only be
   sent over an open session.

15. KEEPALIVE - Layer 2 Liveness

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | PDU Type = 2  |              Payload Length = 0               ~
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ~               | Sig Type = 0  |     Signature Length = 0      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   L3DL devices SHOULD beacon frequent Layer 2 KEEPALIVE PDUs to ensure
   session continuity. The inter-KEEPALIVE interval is configurable,
   with a default of ten seconds. A receiver may choose to ignore
   KEEPALIVE PDUs.
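   The payload-free KEEPALIVE shown above (and the HELLO of Section 10,
   which has the same shape) can be sketched as an eight-octet
   structure. The field widths are read off the diagram: an 8-bit PDU
   Type, a 32-bit Payload Length, an 8-bit Sig Type, and a 16-bit
   Signature Length, all in network byte order. The function and
   constant names are hypothetical, not part of this document.

```python
import struct

# Assumed layout from the diagram: type(8) length(32) sigtype(8) siglen(16)
EMPTY_PDU = struct.Struct("!BIBH")

PDU_HELLO = 0
PDU_KEEPALIVE = 2


def build_empty_pdu(pdu_type):
    """Encode a PDU with no payload and a null signature."""
    return EMPTY_PDU.pack(pdu_type, 0, 0, 0)


def parse_empty_pdu(data):
    """Decode the fixed fields; raise ValueError if they are non-zero."""
    pdu_type, payload_len, sig_type, sig_len = EMPTY_PDU.unpack(data[:8])
    if payload_len or sig_type or sig_len:
        raise ValueError("expected zero payload and null signature")
    return pdu_type
```

   A caller might send build_empty_pdu(PDU_KEEPALIVE) in a unicast frame
   to the peer on each timer tick; framing and the EtherType are outside
   this sketch.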
   An operational deployment MUST be configured to use KEEPALIVEs or
   not, either globally or as finely as per-link granularity.
   Disagreement MAY result in repeated session failure and
   reestablishment.

   KEEPALIVEs SHOULD be beaconed at a configured frequency. One per
   second is the default. Layer 3 liveness, such as BFD, may be more
   (or less) aggressive.

   When a sender transmits a PDU which is not a KEEPALIVE, the sender
   SHOULD reset the KEEPALIVE timer. I.e. sending any PDU acts as a
   keepalive. Once the last fragment has been sent, the KEEPALIVE timer
   SHOULD be restarted. Do not wait for the ACK.

   If a KEEPALIVE or other PDUs have not been received from a peer with
   which a receiver has an open session for a configurable time (default
   30 seconds), the link SHOULD be presumed down. The devices MAY keep
   configuration state and restore it without retransmission if no data
   have changed. Otherwise, a new session SHOULD be established and new
   Encapsulation PDUs exchanged.

16. Layers 2.5 and 3 Liveness

   Layer 2 liveness may be continuously tested by KEEPALIVE PDUs, see
   Section 15. As layer 2.5 or layer 3 connectivity could still break,
   liveness above layer 2 MAY be frequently tested using BFD ([RFC5880])
   or a similar technique.

   This protocol assumes that one or more Encapsulation addresses may be
   used to ping, run BFD, or whatever the operator configures.

17. The North/South Protocol

   Thus far, a one-hop point-to-point logical link discovery protocol
   has been defined.

   The devices know their unique LLEIs and know the unique peer LLEIs
   and Encapsulations on each logical link interface.

   Full topology discovery is not appropriate at the L3DL layer, so
   Dijkstra a la IS-IS etc. is assumed to be done by higher level
   protocols such as BGP-SPF.
   Therefore the LLEIs, link Encapsulations, and state changes are
   pushed North via a small subset of the BGP-LS API. The upper layer
   routing protocol(s), e.g. BGP-SPF, learn and maintain the topology,
   run Dijkstra, and build the routing database(s).

   For example, if a neighbor's IPv4 Encapsulation address changes, the
   devices seeing the change push that change Northbound.

17.1. Use BGP-LS as Much as Possible

   BGP-LS [RFC7752] defines BGP-like Datagrams describing logical link
   state (links, nodes, link prefixes, and many other things), and a new
   BGP path attribute providing Northbound transport, all of which can
   be ingested by upper layer protocols such as BGP-SPF; see Section 4
   of [I-D.ietf-lsvr-bgp-spf].

   For IPv4 links, TLVs 259 and 260 are used. For IPv6 links, TLVs 261
   and 262. If there are multiple addresses on a link, multiple TLV
   pairs are pushed North, having the same ID pairs.

17.2. Extensions to BGP-LS

   The Northbound protocol needs a few minor extensions to BGP-LS.
   Luckily, others have needed the same extensions.

   Similarly to BGP-SPF, the BGP protocol is used in the Protocol-ID
   field specified in table 1 of
   [I-D.ietf-idr-bgpls-segment-routing-epe]. The local and remote node
   descriptors for all NLRI are the IDs described in Section 11. This
   is equivalent to an adjacency SID or a node SID if the address is a
   loopback address.

   Label Sub-TLVs from [I-D.ietf-idr-bgp-ls-segment-routing-ext]
   Section 2.1.1, are used to associate one or more MPLS Labels with a
   link.

18. Discussion

   This section explores some trade-offs taken and some considerations.

18.1. HELLO Discussion

   A device with multiple Layer 2 interfaces, traditionally called a
   switch, may be used to forward frames and therefore packets from
   multiple devices to one logical interface (LLEI), I, on an L3DL
   speaking device.
   Interface I could discover a peer J across the switch. Later, a
   prospective peer K could come up across the switch. If I were not
   still sending and listening for HELLOs, the potential peering with K
   could not be discovered. Therefore, on multi-link interfaces, L3DL
   MUST continue to send HELLOs as long as they are turned up.

18.2. HELLO versus KEEPALIVE

   Both HELLO and KEEPALIVE are periodic. KEEPALIVE might be eliminated
   in favor of keeping only HELLOs. But KEEPALIVEs are unicast, and
   thus less noisy on the network, especially if HELLO is configured to
   transit layer-2-only switches, see Section 18.1.

19. VLANs/SVIs/Sub-interfaces

   One can think of the protocol as an instance (i.e. state machine)
   which runs on each logical link of a device.

   As the upper routing layer must view VLAN topologies as separate
   graphs, L3DL treats VLAN ports as separate links.

   L3DL PDUs learned over VLAN-ports may be interpreted by upper layer-3
   routing protocols as being learned on the corresponding layer-3 SVI
   interface for the VLAN.

   As Sub-Interfaces each have their own LLEIs, they act as separate
   interfaces, forming their own links.

20. Implementation Considerations

   An implementation SHOULD provide the ability to configure each
   logical interface as L3DL speaking or not.

   An implementation SHOULD provide the ability to configure whether
   HELLOs on an L3DL enabled interface send Nearest Bridge or the MAC
   which is propagated by switches from that interface; see Section 10.

   An implementation SHOULD provide the ability to distribute one or
   more loopback addresses or interfaces into L3DL on an external L3DL
   speaking interface.

   An implementation SHOULD provide the ability to distribute one or
   more overlay and/or underlay addresses or interfaces into L3DL on an
   external L3DL speaking interface.
   An implementation SHOULD provide the ability to configure one of the
   addresses of an encapsulation as primary on an L3DL speaking
   interface. If there is only one address for a particular
   encapsulation, the implementation MAY mark it as primary by default.

   An implementation MAY allow optional configuration which updates the
   local forwarding table with overlay and underlay data both learned
   from L3DL peers and configured locally.

21. Security Considerations

   The protocol as is MUST NOT be used outside a datacenter or similarly
   closed environment without authentication and authorization
   mechanisms such as [I-D.ymbk-lsvr-l3dl-signing].

   Many MDC operators have a strange belief that physical walls and
   firewalls provide sufficient security. This is not credible. All
   MDC protocols need to be examined for exposure and attack surface.
   In the case of L3DL, Authentication and Integrity as provided in
   [I-D.ymbk-lsvr-l3dl-signing] is strongly recommended.

   It is generally unwise to assume that on the wire Layer 2 is secure.
   Strange/unauthorized devices may plug into a port. Mis-wiring is
   very common in datacenter installations. A poisoned laptop might be
   plugged into a device's port, form malicious sessions, etc. to
   divert, intercept, or drop traffic.

   Similarly, malicious nodes/devices could mis-announce addressing.

   If OPENs are not being authenticated, an attacker could forge an OPEN
   for an existing session and cause the session to be reset.

   For these reasons, the OPEN PDU's authentication data exchange SHOULD
   be used.

   If the KEEPALIVE PDU is not signed (as suggested in Section 8) to
   save computation, then a MITM could fake a session being alive.

22. IANA Considerations

22.1. PDU Types

   This document requests the IANA create a registry for L3DL PDU Type,
   which may range from 0 to 255.
   The name of the registry should be L3DL-PDU-Type. The policy for
   adding to the registry is RFC Required per [RFC5226], either
   standards track or experimental. The initial entries should be the
   following:

           PDU
           Code      PDU Name
           ----      -------------------
           0         HELLO
           1         OPEN
           2         KEEPALIVE
           3         ACK
           4         IPv4 Announcement
           5         IPv6 Announcement
           6         MPLS IPv4 Announcement
           7         MPLS IPv6 Announcement
           8-254     Reserved
           255       VENDOR

22.2. Signature Type

   This document requests the IANA create a registry for L3DL Signature
   Type, AKA Sig Type, which may range from 0 to 255. The name of the
   registry should be L3DL-Signature-Type. The policy for adding to the
   registry is RFC Required per [RFC5226], either standards track or
   experimental. The initial entries should be the following:

           Number    Name
           ------    -------------------
           0         Null
           1-255     Reserved

22.3. Flag Bits

   This document requests the IANA create a registry for L3DL PL Flag
   Bits, which may range from 0 to 7. The name of the registry should
   be L3DL-PL-Flag-Bits. The policy for adding to the registry is RFC
   Required per [RFC5226], either standards track or experimental. The
   initial entries should be the following:

           Bit       Bit Name
           ----      -------------------
           0         Announce/Withdraw (ann == 0)
           1         Primary
           2         Underlay/Overlay (under == 0)
           3         Loopback
           4-7       Reserved

22.4. Error Codes

   This document requests the IANA create a registry for L3DL Error
   Codes, a 16 bit integer. The name of the registry should be L3DL-
   Error-Codes. The policy for adding to the registry is RFC Required
   per [RFC5226], either standards track or experimental.
   The initial entries should be the following:

           Error
           Code      Error Name
           ----      -------------------
           0         No Error
           1         Checksum Error
           2         Logical Link Addressing Conflict
           3         Authorization Failure
           4         Announce/Withdraw Error

23. IEEE Considerations

   This document requires a new EtherType.

   This document requires a new multicast MAC address that will be
   broadcast through a switch.

24. Acknowledgments

   The authors thank Cristel Pelsser for multiple reviews, Harsha Kovuru
   for comments during implementation, Jeff Haas for review and
   comments, Joerg Ott for an early but deep transport review, Joe
   Clarke for a useful review, John Scudder for deeply serious review
   and comments, Larry Kreeger for a lot of layer 2 clue, Martijn
   Schmidt for his contribution, Nalinaksh Pai for transport
   discussions, Neeraj Malhotra for review, Paul Congdon for Ethernet
   hints, Russ Housley for checksum discussion and sBox, and Steve
   Bellovin for checksum advice.

25. References

25.1. Normative References

   [I-D.ietf-idr-bgp-ls-segment-routing-ext]
              Previdi, S., Talaulikar, K., Filsfils, C., Gredler, H.,
              and M. Chen, "BGP Link-State extensions for Segment
              Routing", draft-ietf-idr-bgp-ls-segment-routing-ext-16
              (work in progress), June 2019.

   [I-D.ietf-idr-bgpls-segment-routing-epe]
              Previdi, S., Talaulikar, K., Filsfils, C., Patel, K., Ray,
              S., and J. Dong, "BGP-LS extensions for Segment Routing
              BGP Egress Peer Engineering", draft-ietf-idr-bgpls-
              segment-routing-epe-19 (work in progress), May 2019.

   [I-D.ietf-lsvr-bgp-spf]
              Patel, K., Lindem, A., Zandi, S., and W. Henderickx,
              "Shortest Path Routing Extensions for BGP Protocol",
              draft-ietf-lsvr-bgp-spf-10 (work in progress), July 2020.

   [I-D.ymbk-lsvr-l3dl-signing]
              Bush, R. and R.
              Austein, "Layer 3 Discovery and Liveness Signing",
              draft-ymbk-lsvr-l3dl-signing-01 (work in progress),
              May 2020.

   [IANA-PEN]
              "IANA Private Enterprise Numbers".

   [IEEE.802_2001]
              IEEE, "IEEE Standard for Local and Metropolitan Area
              Networks: Overview and Architecture", IEEE 802-2001,
              DOI 10.1109/ieeestd.2002.93395, July 2002.

   [IEEE802-2014]
              Institute of Electrical and Electronics Engineers, "Local
              and Metropolitan Area Networks: Overview and
              Architecture", IEEE Std 802-2014, 2014.

   [RFC1213]  McCloghrie, K. and M. Rose, "Management Information Base
              for Network Management of TCP/IP-based internets: MIB-II",
              STD 17, RFC 1213, DOI 10.17487/RFC1213, March 1991.

   [RFC1629]  Colella, R., Callon, R., Gardner, E., and Y. Rekhter,
              "Guidelines for OSI NSAP Allocation in the Internet",
              RFC 1629, DOI 10.17487/RFC1629, May 1994.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC3032]  Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y.,
              Farinacci, D., Li, T., and A. Conta, "MPLS Label Stack
              Encoding", RFC 3032, DOI 10.17487/RFC3032, January 2001.

   [RFC4271]  Rekhter, Y., Ed., Li, T., Ed., and S. Hares, Ed., "A
              Border Gateway Protocol 4 (BGP-4)", RFC 4271,
              DOI 10.17487/RFC4271, January 2006.

   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs", RFC 5226,
              DOI 10.17487/RFC5226, May 2008.

   [RFC5880]  Katz, D. and D. Ward, "Bidirectional Forwarding Detection
              (BFD)", RFC 5880, DOI 10.17487/RFC5880, June 2010.

   [RFC6286]  Chen, E. and J. Yuan, "Autonomous-System-Wide Unique BGP
              Identifier for BGP-4", RFC 6286, DOI 10.17487/RFC6286,
              June 2011.

   [RFC7752]  Gredler, H., Ed., Medved, J., Previdi, S., Farrel, A., and
              S.
              Ray, "North-Bound Distribution of Link-State and
              Traffic Engineering (TE) Information Using BGP", RFC 7752,
              DOI 10.17487/RFC7752, March 2016.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

25.2.  Informative References

   [Clos0]    Clos, C., "A study of non-blocking switching networks
              [PAYWALLED]", Bell System Technical Journal 32 (2), pp
              406-424, March 1953.

   [Clos1]    "Clos Network".

   [I-D.malhotra-bess-evpn-lsoe]
              Malhotra, N., Patel, K., and J. Rabadan, "LSoE-based PE-CE
              Control Plane for EVPN", draft-malhotra-bess-evpn-lsoe-00
              (work in progress), March 2019.

   [JUPITER]  Singh, A., Ong, J., Agarwal, A., Anderson, G., Armistead,
              A., Bannon, R., Boving, S., Desai, G., Felderman, B.,
              Germano, P., Kanagala, A., Liu, H., Provost, J., Simmons,
              J., Tanda, E., Wanderer, J., Hölzle, U., Stuart, S., and
              A. Vahdat, "Jupiter rising", Communications of the
              ACM Vol. 59, pp. 88-97, DOI 10.1145/2975159, August 2016.

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              DOI 10.17487/RFC0791, September 1981.

   [RFC1122]  Braden, R., Ed., "Requirements for Internet Hosts -
              Communication Layers", STD 3, RFC 1122,
              DOI 10.17487/RFC1122, October 1989.

   [RFC1982]  Elz, R. and R. Bush, "Serial Number Arithmetic", RFC 1982,
              DOI 10.17487/RFC1982, August 1996.

Authors' Addresses

   Randy Bush
   Arrcus & Internet Initiative Japan
   5147 Crystal Springs
   Bainbridge Island, WA  98110
   US

   Email: randy@psg.com

   Rob Austein
   Arrcus, Inc

   Email: sra@hactrn.net

   Keyur Patel
   Arrcus
   2077 Gateway Place, Suite #400
   San Jose, CA  95119
   US

   Email: keyur@arrcus.com