Network Working Group                                            R. Bush
Internet-Draft                                              Arrcus & IIJ
Intended status: Standards Track                                K. Patel
Expires: September 14, 2018                                       Arrcus
                                                          March 13, 2018


                        Link State Over Ethernet
                        draft-ymbk-lsvr-lsoe-00

Abstract

   Used in a Massive Data Center (MDC), BGP-LS and BGP-SPF need link
   neighbor discovery, liveness, and addressability data. Link State
   Over Ethernet protocols provide link discovery, exchange AFI/SAFIs,
   and discover addresses over raw Ethernet. These data are pushed
   directly to BGP-LS/SPF, obviating the need for centralized controller
   architectures. This protocol is more widely applicable, and has been
   designed to support a wide range of routing and similar protocols
   which need link discovery and characterisation.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to
   be interpreted as described in RFC 2119 [RFC2119] only when they
   appear in all upper case. They may also appear in lower or mixed
   case as English words, without normative meaning.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 14, 2018.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Background
   4.  Top Level Overview
   5.  Ethernet to Ethernet Protocols
     5.1.  Inter-Link Ether Protocol Overview
     5.2.  PDUs and Frames
       5.2.1.  Frame TLV
       5.2.2.  Link KeepAlive / Hello
       5.2.3.  Capability Exchange
       5.2.4.  Timer Negotiation
     5.3.  The AFI/SAFI Exchanges
       5.3.1.  AFI/SAFI Capability Exchange
       5.3.2.  The AFI/SAFI PDU Skeleton
       5.3.3.  AFI/SAFI ACK
       5.3.4.  Add/Drop/Prim
       5.3.5.  IPv4 Announce / Withdraw
       5.3.6.  IPv6 Announce / Withdraw
       5.3.7.  MPLS IPv4 Announce / Withdraw
       5.3.8.  MPLS IPv6 Announce / Withdraw
   6.  Layer 2.5 and 3 Liveness
   7.  The North/South Protocol
     7.1.  Topology Request for Full State
     7.2.  PDU from Link Layer to Shim
     7.3.  Link/ASN sub-PDU
   8.  Security Considerations
   9.  IANA Considerations
   10. Acknowledgments
   11. Normative References
   Authors' Addresses

1.  Introduction

   The Massive Data Center (MDC) environment presents unusual problems
   of scale, e.g. O(10,000) switches, while its homogeneity presents
   opportunities for simple approaches. Approaches such as Jupiter
   Rising use a central controller to deal with scaling, while BGP-SPF
   [I-D.keyupate-idr-bgp-spf] provides massive scale out without
   centralization using a tried and tested scalable distributed control
   plane, offering a scalable routing solution in Clos and similar
   environments. But it needs link state and addressing data from the
   network to build the routing topology. LLDP has scaling issues, e.g.
   in extending a PDU beyond 1,500 bytes.

   Link State Over Ethernet (LSOE) provides brutally simple mechanisms
   for devices to

   o  Discover each other's MACs,

   o  Run MAC keep-alives for liveness assurance,

   o  Discover each other's ASNs,

   o  Negotiate mutually supported AFI/SAFIs,

   o  Discover and maintain link IP/MPLS addresses,

   o  Enable layer three link liveness such as BFD, and finally

   o  Push these data up to BGP-SPF which computes the topology and
      builds routing and forwarding tables.

   This protocol is more widely applicable than BGP-SPF, and has been
   designed to support a wide range of routing and similar protocols
   which need link discovery and characterisation.

2.  Terminology

   Even though it concentrates on the Ethernet layer, this document
   relies heavily on routing terminology. The following are some
   possibly confusing terms:

   AFI/SAFI:  Address Family Identifier and Subsequent Address Family
              Identifier, i.e., classes of addresses such as IPv4,
              IPv6, etc.
   ASN:       Autonomous System Number, a BGP identifier for an
              originator of routing, particularly BGP, announcements.
   BGP-SPF:   A hybrid protocol using BGP transport but a Dijkstra SPF
              decision process. See [I-D.keyupate-idr-bgp-spf].
   Clos:      A hierarchic switch topology commonly used in data
              centers.
   Frame:     The payload of an Ethernet packet.
   MAC:       Medium Access Control, essentially an Ethernet address,
              six octets.
   MDC:       Massive Data Center, O(1,000) TORs or more.
   PDU:       Protocol Data Unit, essentially an application layer
              message.
   SPF:       Shortest Path First, an algorithm for finding the shortest
              paths between nodes in a graph.
   TOR:       Top Of Rack switch, aggregates the servers in a rack and
              connects to the Clos spine.
   ZTP:       Zero Touch Provisioning gives devices initial addresses,
              credentials, etc. on boot/restart.

3.  Background

   LSOE assumes a Clos-like topology, though the acyclic constraint is
   not necessary.

   While LSOE is designed for the MDC, there are no inherent reasons it
   could not run on a WAN, though it is not clear that this would be
   useful. The authentication and authorisation needed to run safely on
   the WAN are not (yet) included in this protocol.

   LLDP is not suitable because one cannot extend a PDU beyond 1500
   bytes without hitting an IPR barrier. It is also complex.

   UDP is unsuitable as it would require prior knowledge of IP-level
   addressing, the discovery of which is one of the key purposes of
   this protocol.

   LSOE assumes a new IEEE assigned EtherType (TBD).

4.  Top Level Overview

   o  MAC Link State is exchanged over Ethernet

   o  AFI/SAFI data are exchanged and IP-Level Liveness Checks done

   o  BGP-SPF uses the data to discover and build the topology database

   +-------------------+   +-------------------+   +-------------------+
   |      Device       |   |      Device       |   |      Device       |
   |                   |   |                   |   |                   |
   |+-----------------+|   |+-----------------+|   |+-----------------+|
   ||                 ||   ||                 ||   ||                 ||
   ||     BGP-SPF     <+---+>     BGP-SPF     <+---+>     BGP-SPF     ||
   ||                 ||   ||                 ||   ||                 ||
   |+--------^--------+|   |+--------^--------+|   |+--------^--------+|
   |         |         |   |         |         |   |         |         |
   |         |         |   |         |         |   |         |         |
   |+--------+--------+|   |+--------+--------+|   |+--------+--------+|
   ||    Liveness     ||   ||    Liveness     ||   ||    Liveness     ||
   ||    AFI/SAFIs    ||   ||    AFI/SAFIs    ||   ||    AFI/SAFIs    ||
   ||    Addresses    ||   ||    Addresses    ||   ||    Addresses    ||
   |+--------^--------+|   |+--------^--------+|   |+--------^--------+|
   |         |         |   |         |         |   |         |         |
   |         |         |   |         |         |   |         |         |
   |+--------v--------+|   |+--------v--------+|   |+--------v--------+|
   ||                 ||   ||                 ||   ||                 ||
   ||   Ether PDUs    <+---+>   Ether PDUs    <+---+>   Ether PDUs    ||
   ||                 ||   ||                 ||   ||                 ||
   |+-----------------+|   |+-----------------+|   |+-----------------+|
   +-------------------+   +-------------------+   +-------------------+

   There are two sets of protocols:

   o  Ethernet to Ethernet protocols are used to exchange layer 2 data,
      i.e. MACs, and layer 2.5 and 3 data, i.e. ASNs, AFI/SAFIs, and
      interface addresses.

   o  A Link Layer to BGP protocol pushes these data up the stack to
      BGP-SPF, converting to the BGP-LS BGP-like data format.

   o  And, of course, the BGP layer crosses all the devices, though it
      is not part of these LSOE protocols.

5.  Ethernet to Ethernet Protocols

   This section describes the basic Ethernet-framed protocols.

5.1.  Inter-Link Ether Protocol Overview

    |       Hello / KeepAlive (type=0)       |
    |--------------------------------------->|
    |                                        | MACs and Liveness
    |       Hello / KeepAlive (type=0)       | Mandatory
    |<---------------------------------------|
    |                                        |
    |                                        |
    |                                        |
    |         Timers (type=1, cap 1)         |
    |--------------------------------------->| Timers (type 1, cap 1)
    |                                        | Optional
    |         Timers (type=1, cap 1)         | Renegotiate at Any Time
    |<---------------------------------------|
    |                                        |
    |                                        |
    |                                        |
    |     Link AFI/SAFIs (type=1, cap 4)     |
    |--------------------------------------->| AFI/SAFI Support (cap 4)
    |<---------------------------------------| Mandatory
    |     Link AFI/SAFIs (type=1, cap 4)     | Renegotiate at Any Time
    |                                        |
    |                                        |
    |                                        |
    |    Interface MPLS Labels (type=10)     |
    |--------------------------------------->| Interface Labels
    |                                        | Optional
    |    Interface MPLS Labels (type=10)     | Renegotiate at Any Time
    |<---------------------------------------|
    |                                        |
    |                                        |
    |                                        |
    |   Interface IPv4 Addresses (type=14)   |
    |--------------------------------------->| Interface IPv4 Addresses
    |                                        | Optional
    |   Interface IPv4 Addresses (type=14)   | Renegotiate at Any Time
    |<---------------------------------------|
    |                                        |
    |                                        |
    |                                        |
    |   Interface IPv6 Addresses (type=16)   |
    |--------------------------------------->| Interface IPv6 Addresses
    |                                        | Optional
    |   Interface IPv6 Addresses (type=16)   | Renegotiate at Any Time
    |<---------------------------------------|

5.2.  PDUs and Frames

   This is all about inter-device Link State.

   A PDU is one or more Ethernet Frames.

   A Frame has a PDU Sequence Number and a Frame Number to allow
   assembly of out-of-order frames.

   Because BGP-SPF and Data Plane payloads are assumed to be IP over the
   same Ethernet, one worries about congestion.

5.2.1.  Frame TLV

   The basic Ethernet PDU is a typical TLV (Type Length Value) PDU,
   except it's really LTV for the sake of alignment :)

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        PDU Sequence No        |    Frame No   |     Flags     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Checksum            |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Type     |
   +-+-+-+-+-+-+-+-+

   The fields of the basic Ethernet PDU are as follows:

   PDU Sequence No:  Semi-unique identifier of a TLV PDU (e.g. the low
      order 16 bits of UNIX time)

   Frame No:  Frame sequence number, 0..255, within a multi-frame PDU

   Flags:  A bit field

         0   - Sender has been restarted
         1   - One of a multi-Frame sequence
         2   - Last of a multi-Frame sequence
         3-7 - Reserved

   Checksum:  One's complement over the Frame, to detect bit flips

   Length:  Total bytes in the PDU, including all frames and fields

   Type:  An integer

         0      - Hello / KeepAlive
         1      - Capability
         2-9    - Reserved
         10     - AFI/SAFI ACK
         11     - IPv4 Announce / Withdraw
         12     - IPv6 Announce / Withdraw
         13     - MPLS IPv4 Announce / Withdraw
         14     - MPLS IPv6 Announce / Withdraw
         15-255 - Reserved

5.2.1.1.  The Checksum

   There is a reason conservative folk use a checksum in UDP. And when
   the operators stretch to jumbo frames ...

   One's complement is a bit silly, though trivial to implement and
   might be sufficient.

   Sum up either 16-bit shorts in a 32-bit int, or 32-bit ints in a
   64-bit long, then take the high-order section, shift it right,
   rotate, add it in, repeat until zero. -- smb off the top of his head

   #include <stddef.h>
   #include <stdint.h>

   /* The F table from Skipjack, and it would work for the S-Box.
      There are other S-Box sources as well. -- Russ Housley */
   const uint8_t sbox[256] = {
     0xa3,0xd7,0x09,0x83,0xf8,0x48,0xf6,0xf4,0xb3,0x21,0x15,0x78,
     0x99,0xb1,0xaf,0xf9,0xe7,0x2d,0x4d,0x8a,0xce,0x4c,0xca,0x2e,
     0x52,0x95,0xd9,0x1e,0x4e,0x38,0x44,0x28,0x0a,0xdf,0x02,0xa0,
     0x17,0xf1,0x60,0x68,0x12,0xb7,0x7a,0xc3,0xe9,0xfa,0x3d,0x53,
     0x96,0x84,0x6b,0xba,0xf2,0x63,0x9a,0x19,0x7c,0xae,0xe5,0xf5,
     0xf7,0x16,0x6a,0xa2,0x39,0xb6,0x7b,0x0f,0xc1,0x93,0x81,0x1b,
     0xee,0xb4,0x1a,0xea,0xd0,0x91,0x2f,0xb8,0x55,0xb9,0xda,0x85,
     0x3f,0x41,0xbf,0xe0,0x5a,0x58,0x80,0x5f,0x66,0x0b,0xd8,0x90,
     0x35,0xd5,0xc0,0xa7,0x33,0x06,0x65,0x69,0x45,0x00,0x94,0x56,
     0x6d,0x98,0x9b,0x76,0x97,0xfc,0xb2,0xc2,0xb0,0xfe,0xdb,0x20,
     0xe1,0xeb,0xd6,0xe4,0xdd,0x47,0x4a,0x1d,0x42,0xed,0x9e,0x6e,
     0x49,0x3c,0xcd,0x43,0x27,0xd2,0x07,0xd4,0xde,0xc7,0x67,0x18,
     0x89,0xcb,0x30,0x1f,0x8d,0xc6,0x8f,0xaa,0xc8,0x74,0xdc,0xc9,
     0x5d,0x5c,0x31,0xa4,0x70,0x88,0x61,0x2c,0x9f,0x0d,0x2b,0x87,
     0x50,0x82,0x54,0x64,0x26,0x7d,0x03,0x40,0x34,0x4b,0x1c,0x73,
     0xd1,0xc4,0xfd,0x3b,0xcc,0xfb,0x7f,0xab,0xe6,0x3e,0x5b,0xa5,
     0xad,0x04,0x23,0x9c,0x14,0x51,0x22,0xf0,0x29,0x79,0x71,0x7e,
     0xff,0x8c,0x0e,0xe2,0x0c,0xef,0xbc,0x72,0x75,0x6f,0x37,0xa1,
     0xec,0xd3,0x8e,0x62,0x8b,0x86,0x10,0xe8,0x08,0x77,0x11,0xbe,
     0x92,0x4f,0x24,0xc5,0x32,0x36,0x9d,0xcf,0xf3,0xa6,0xbb,0xac,
     0x5e,0x6c,0xa9,0x13,0x57,0x25,0xb5,0xe3,0xbd,0xa8,0x3a,0x01,
     0x05,0x59,0x2a,0x46
   };

   /* example C code, constant time even, thanks Rob Austein */

   uint16_t sbox_checksum(const uint8_t *b, const size_t n)
   {
     /* Accumulate even- and odd-indexed octets in separate sums. */
     uint32_t sum[2] = {0, 0};
     for (size_t i = 0; i < n; i++)
       sum[i & 1] += sbox[b[i]];
     /* Combine the two sums, then fold the carries into 16 bits. */
     uint32_t result = (sum[0] << 8) + sum[1];
     result = (result >> 16) + (result & 0xFFFF);
     result = (result >> 16) + (result & 0xFFFF);
     return (uint16_t) result;
   }

5.2.2.  Link KeepAlive / Hello

   The Hello and KeepAlive PDUs are one and the same.

   Each device learns the other's MAC from its HELLO whining. I.e., all
   devices on a wire/interface know each other's MACs and learn each
   other's ASNs.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        PDU Sequence No        |    Frame No   |     Flags     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Checksum            |          Length = 17          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Type = 0   |                     MyASN                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |               YourASN (or Zero)               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |
   +-+-+-+-+-+-+-+-+

   Once two devices know each other's MACs, Ethernet keep-alives may be
   started to ensure layer two liveness. The timing and acceptable drop
   of the keep-alives may be set with the Timer Negotiation capability
   exchange.

5.2.3.  Capability Exchange

   Peers on the Ethernet exchange capabilities, such as timers and
   supported AFI/SAFIs, using a simple capability exchange.

   By convention, the device with the lowest MAC sends first.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        PDU Sequence No        |    Frame No   |     Flags     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Checksum            |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Type = 1   |    RADflag    |          Capability           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The RADflag is a bit field which signals the state of the capability
   negotiation.

         bit 0    - Request
         bit 1    - Accept
         bit 2    - Deny
         bits 3-7 - Reserved

5.2.4.  Timer Negotiation

   Different operational scenarios may call for layer two and layer
   three timers which differ from the defaults. So there is a
   capability negotiation to modify these timers.
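
   As an illustration of the capability exchange in Section 5.2.3, the
   following is a minimal sketch, not a definitive implementation, of
   writing the fixed part of a single-frame Capability PDU. It rests on
   assumptions this document does not state: multi-octet fields are in
   network byte order, "bit 0" of the RADflag is the least-significant
   bit, and the Checksum is filled in by the caller afterwards. The
   names lsoe_put_capability, put_u16, and the LSOE_* constants are
   hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* RADflag bits; assumes bit 0 is the least-significant bit. */
enum {
    LSOE_RAD_REQUEST = 1u << 0,
    LSOE_RAD_ACCEPT  = 1u << 1,
    LSOE_RAD_DENY    = 1u << 2
};

#define LSOE_TYPE_CAPABILITY 1u
/* 8-octet frame header + Type (1) + RADflag (1) + Capability (2). */
#define LSOE_CAP_FIXED_LEN   12u

/* Store a 16-bit value in network byte order (an assumption). */
static void put_u16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xFF);
}

/*
 * Write the fixed part of a single-frame Capability PDU into buf.
 * total_len is the PDU Length including any capability-specific
 * octets the caller appends. Returns the number of octets written,
 * or 0 if buf is too small. The Checksum field is left as zero.
 */
size_t lsoe_put_capability(uint8_t *buf, size_t buflen, uint16_t pdu_seq,
                           uint8_t radflag, uint16_t capability,
                           uint16_t total_len)
{
    if (buflen < LSOE_CAP_FIXED_LEN)
        return 0;
    put_u16(buf + 0, pdu_seq);      /* PDU Sequence No */
    buf[2] = 0;                     /* Frame No */
    buf[3] = 0;                     /* Flags */
    put_u16(buf + 4, 0);            /* Checksum, filled in later */
    put_u16(buf + 6, total_len);    /* Length */
    buf[8] = LSOE_TYPE_CAPABILITY;  /* Type = 1 */
    buf[9] = radflag;               /* Request / Accept / Deny */
    put_u16(buf + 10, capability);  /* e.g. 1 = timers, 4 = AFI/SAFIs */
    return LSOE_CAP_FIXED_LEN;
}
```

   A device requesting the AFI/SAFI capability (Capability = 4, Length
   = 13) would call this with LSOE_RAD_REQUEST and then append its one
   AFI/SAFI octet before computing the checksum.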

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        PDU Sequence No        |    Frame No   |     Flags     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Checksum            |          Length = 16          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Type = 1   |    RADflag    |        Capability = 1         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Frequency           |  AllowMissCt  |    A/S Wait   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The meanings of the timer fields are as follows:

   Frequency:    Interval between KeepAlives, in tenths of a second
                 (default 600)
   AllowMissCt:  Number of missed KeepAlives before the link is
                 declared down
   A/S Wait:     AFI/SAFI ACK timeout, in tenths of a second
                 (default 10)

5.3.  The AFI/SAFI Exchanges

   The devices know each other's MACs, have means to ensure link state,
   and know each other's ASNs. Now they can negotiate which AFI/SAFIs
   are supported, and announce their interface addresses (and labels).

5.3.1.  AFI/SAFI Capability Exchange

   First they negotiate what AFI/SAFIs are supported on the link.

   As before, the lowest MAC initiates the negotiation.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        PDU Sequence No        |    Frame No   |     Flags     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Checksum            |          Length = 13          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Type = 1   |    RADflag    |        Capability = 4         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   AFI/SAFIs   |
   +-+-+-+-+-+-+-+-+

   The AFI/SAFIs currently defined are as follows:

         10  - IPv4
         11  - IPv6
         12  - MPLS IPv4
         13  - MPLS IPv6
         ... - other tunnels (e.g. GRE)

5.3.2.  The AFI/SAFI PDU Skeleton

   Now both sides can exchange their actual interface addresses for all
   the negotiated AFI/SAFIs.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        PDU Sequence No        |    Frame No   |     Flags     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Checksum            |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Type = 42   |                Sequence Number                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |        AFI/SAFI Count         |  sub-PDUs...  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The AFI/SAFI Exchange is over an unreliable transport so there are
   Sequence Numbers and ACKs.

   The Sequence Number is a point-to-point link announcement counter,
   incremented for each exchange in each direction on the link.

   The Receiver will ACK it with a Type=10, see following PDU.

   If the Sender does not receive an ACK within one second, it
   retransmits. Other delay timers may be negotiated using the Timer
   Negotiation capability.

   If a sender has multiple links on the same interface, separate
   counters must be kept for each.

5.3.3.  AFI/SAFI ACK

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        PDU Sequence No        |    Frame No   |     Flags     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Checksum            |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Type = 10   |                Sequence Number                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |
   +-+-+-+-+-+-+-+-+

5.3.4.  Add/Drop/Prim

   Each AFI/SAFI interface address may be announced or withdrawn.

   An interface may have multiple AFI/SAFIs.

   For each AFI/SAFI on an interface there might be multiple addresses.
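
   The relationship between interfaces, AFI/SAFIs, and addresses can be
   illustrated with a minimal sketch of a per-link address table. This
   is one possible in-memory representation, not something this
   document defines. It assumes the Add/Drop/Prim bit meanings listed
   in Section 7.3, that "bit 0" is the least-significant bit, and the
   AFI/SAFI Type numbering of Section 7.3. The struct lsoe_addr, the
   LSOE_ADP_* constants, and lsoe_find_primary are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Add/Drop/Prim bits; assumes bit 0 is the least-significant bit. */
#define LSOE_ADP_ANNOUNCE 0x01u  /* set = announce, clear = withdraw */
#define LSOE_ADP_PRIMARY  0x02u  /* address is marked as primary */

/* Hypothetical record for one address learned on a link. */
struct lsoe_addr {
    uint8_t afi_safi;    /* e.g. 11 = IPv4, 12 = IPv6 */
    uint8_t adp;         /* Add/Drop/Prim bits */
    uint8_t prefix_len;  /* prefix length in bits */
    uint8_t addr[16];    /* IPv4 uses only the first four octets */
};

/*
 * Return the index of the first announced address of the given
 * AFI/SAFI that is marked primary, or -1 if there is none.
 */
int lsoe_find_primary(const struct lsoe_addr *tab, size_t n,
                      uint8_t afi_safi)
{
    for (size_t i = 0; i < n; i++) {
        if (tab[i].afi_safi == afi_safi &&
            (tab[i].adp & LSOE_ADP_ANNOUNCE) &&
            (tab[i].adp & LSOE_ADP_PRIMARY))
            return (int)i;
    }
    return -1;
}
```

   Keeping one flat table per link, rather than nesting per AFI/SAFI,
   is only a simplification for the sketch. A withdrawn entry is
   skipped here even if its Primary bit is still set.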

   One address per AFI/SAFI SHOULD be marked as primary.

    0                   1                   2                   3
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Add/Drop    |    Primary    |           Reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

5.3.5.  IPv4 Announce / Withdraw

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        PDU Sequence No        |    Frame No   |     Flags     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Checksum            |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Type = 11   |                Sequence Number                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |        AFI/SAFI Count         | Add/Drop/Prim |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        IPv4 Prefix/Len                        |
   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               | Add/Drop/Prim |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               +-+-+-+-+-+-+-+-+
   |                IPv4 Prefix/Len                |    more ...   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

5.3.6.  IPv6 Announce / Withdraw

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        PDU Sequence No        |    Frame No   |     Flags     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Checksum            |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Type = 12   |                Sequence Number                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |        AFI/SAFI Count         | Add/Drop/Prim |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                                                               +
   |                                                               |
   +                                                               +
   |                                                               |
   +                                                               +
   |                        IPv6 Prefix/Len                        |
   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |                   more ...                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

5.3.7.  MPLS IPv4 Announce / Withdraw

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        PDU Sequence No        |    Frame No   |     Flags     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Checksum            |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Type = 13   |                Sequence Number                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |        AFI/SAFI Count         | Add/Drop/Prim |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Label                  | Exp |S|       TTL     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        IPv4 Prefix/Len                        |
   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |                   more ...                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

5.3.8.  MPLS IPv6 Announce / Withdraw

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        PDU Sequence No        |    Frame No   |     Flags     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Checksum            |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Type = 14   |                Sequence Number                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |        AFI/SAFI Count         | Add/Drop/Prim |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Label                  | Exp |S|       TTL     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                                                               +
   |                                                               |
   +                                                               +
   |                                                               |
   +                                                               +
   |                        IPv6 Prefix/Len                        |
   +               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               |                   more ...                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

6.  Layer 2.5 and 3 Liveness

   Now IP/Label liveness may be tested.

   Assume one or more AFI/SAFI addresses will be used to ping, BFD, or
   whatever the operator configures.

7.  The North/South Protocol

   Thus far, we have a one-hop point-to-point link discovery protocol.

   We know what ASNs and AFI/SAFIs are on each Link Interface.

   At the Ethernet layer we did not want to do topology discovery and
   Dijkstra a la IS-IS.

   So the link ASNs, AFI/SAFIs, and state changes are pushed North to
   BGP-SPF which discovers the topology, runs Dijkstra, and builds the
   routing database.

   We assume there is a shim to convert and buffer the ether layer data
   to [RFC7752] BGP-like PDUs which can be digested by BGP-SPF.

   We assume a reliable intra-device transport, so no ACKs are needed.

   We assume a PDU capable of 64k.

   The protocol is [re]started by a request from the 7752 topology Shim
   Layer.

   The Ether Layer then sends the full topology, its full link neighbor
   state, North.

   The Ether layer sends incremental updates as links and/or addressing
   change.

7.1.  Topology Request for Full State

   The [RFC7752] shim on a device requests a full state dump from the
   Ethernet layer on the device.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Type = 0   |      Flag     |          Length = 4           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

7.2.  PDU from Link Layer to Shim

   The Northbound PDU has a frame independent of the peer ASNs and
   links.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Type = 1   |      Flag     |            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Sequence Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Link Count           |  Multiple Link/ASN sub-PDUs   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   There are multiple sub-PDUs for all the learned ASNs and all the
   AFI/SAFIs for each ASN learned.

   The fields of the header PDU are as follows:

   Flag:  An integer:

         0     - This is the start of a Full State transfer
         1     - Continuation PDU
         2     - Last PDU of transfer
         3     - This is the start of an Update for a state change
         4-255 - Reserved

   Link Count:  Number of Link/ASN sub-PDUs to follow

   Multiple Link/ASN sub-PDUs:  see following

7.3.  Link/ASN sub-PDU

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            My ASN                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Their ASN                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Count     | AFI/SAFI Type | Add/Drop/Prim |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               +
   |                    Single AFI/SAFI of Type                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | AFI/SAFI Type | Add/Drop/Prim |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               +-+-+-+-+-+-+-+-+
   |            Single AFI/SAFI of Type            |    more ...   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The fields of the Link/ASN sub-PDU are as follows:

   Count:  Number of AFI/SAFIs in this sub-PDU

   AFI/SAFI Type:  An integer

         11 - IPv4
         12 - IPv6
         13 - MPLSv4
         14 - MPLSv6
         ...

   Add/Drop/Prim (bits)

         0   - Announce(1) / Withdraw(0)
         1   - Primary
         2-7 - Reserved

8.  Security Considerations

   The protocol as is MUST NOT be used outside a datacenter environment
   due to lack of authentication and authorisation. These will be
   worked on in a later effort, likely using credentials configured
   using ZTP.

   Many MDC operators have a strange belief that physical walls and
   firewalls provide sufficient security. This is not credible. These
   protocols need to be examined for exposure and attack surface.

   On the wire Ethernet is assumed to be secure, though it could be
   tapped and data modified by an in-house attacker.

   Malicious nodes/devices could mis-announce addressing, form malicious
   associations, etc.

9.  IANA Considerations

   This document has no IANA Considerations.

   This document does need a new EtherType.

10.  Acknowledgments

   The authors thank Cristel Pelsser for multiple reviews, Martijn
   Schmidt for his contribution, Rob Austein for reviews and checksum
   code, Russ Housley for checksum discussion and sBox, and Steve
   Bellovin for more checksum discussion.

11.  Normative References

   [I-D.keyupate-idr-bgp-spf]
              Patel, K., Lindem, A., Zandi, S., and G. Velde, "Shortest
              Path Routing Extensions for BGP Protocol",
              draft-keyupate-idr-bgp-spf-04 (work in progress),
              January 2018.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC7752]  Gredler, H., Ed., Medved, J., Previdi, S., Farrel, A., and
              S. Ray, "North-Bound Distribution of Link-State and
              Traffic Engineering (TE) Information Using BGP", RFC 7752,
              DOI 10.17487/RFC7752, March 2016.

Authors' Addresses

   Randy Bush
   Arrcus & IIJ
   5147 Crystal Springs
   Bainbridge Island, WA  98110
   United States of America

   Email: randy@psg.com


   Keyur Patel
   Arrcus
   2077 Gateway Place, Suite #250
   San Jose, CA  95119
   United States of America

   Email: keyur@arrcus.com