idnits 2.17.1

draft-eastlake-trill-rbridge-dcb-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  -- The draft header indicates that this document updates RFC6325, but the
     abstract doesn't seem to mention this, which it should.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

     (Using the creation date from RFC6325, updated by this document, for
     RFC5378 checks: 2006-05-11)

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (October 2, 2012) is 4223 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Obsolete informational reference (is this intentional?): RFC 793
     (Obsoleted by RFC 9293)

     Summary: 0 errors (**), 0 flaws (~~), 1 warning (==), 4 comments (--).
     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

TRILL Working Group                                      Donald Eastlake
INTERNET-DRAFT                                                    Huawei
Intended status: Proposed Standard                         Manoj Wadekar
Updates: 6325                                                     QLogic
                                                          Anoop Ghanwani
                                                                    Dell
                                                          Puneet Agarwal
                                                                Broadcom
                                                             Tal Mizrahi
                                                                 Marvell
Expires: March 31, 2013                                  October 2, 2012


 TRILL: Support of IEEE 802.1Qbb, 802.1Qaz, and Congestion Notification


Abstract

   IEEE 802.1 has developed standards as part of its Data Center
   Bridging (DCB) activity to (1) efficiently minimize data loss due to
   queue overflow for selected classes of traffic within Local Area
   Networks (LANs) meeting certain conditions and (2) provide means to
   allocate the available bandwidth on links to different classes of
   traffic. These standards are now in IEEE Std 802.1Qbb-2011, IEEE Std
   802.1Qaz-2011, and the Congestion Notification feature in IEEE Std
   802.1Q-2011.

   This document briefly explains these standards and discusses the
   support of these IEEE 802 standards in TRILL switches (devices that
   implement the IETF TRILL protocol standard).

Status of This Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Distribution of this document is unlimited. Comments should be sent
   to the authors.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."
   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html. The list of Internet-Draft
   Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Table of Contents

   1. Introduction
   1.1 Overview of These Standards
   1.2 Terminology
   1.3 Additional Acronyms

   2. Priority-Based Flow Control
   3. Enhanced Transmission Selection
   4. The DCB Exchange Protocol

   5. Congestion Notification
   5.1 Congestion Notification Domains
   5.2 Congestion Notification Tag Details
   5.3 Congestion Notification Message Details
   5.4 Additions to TRILL to Support Congestion Notification
   5.4.1 TRILL Switch Ingress Details
   5.4.2 Transit TRILL Switch Details
   5.4.2.1 Transit TRILL Switch Input Port
   5.4.2.2 Transit TRILL Switch Output Port
   5.4.3 TRILL Switch Egress Details

   6. Management Considerations
   7. IANA Considerations
   8. Security Considerations

   9. References
   9.1 Normative References
   9.2 Informative References
   Version History

1. Introduction

   IEEE 802.1 has developed various standards as part of its Data Center
   Bridging (DCB) activity.
   The intent of three of these standards is
   (1) to efficiently eliminate data loss due to queue overflow for
   selected classes of traffic within Local Area Networks (LANs) meeting
   certain conditions and (2) to provide limited means to allocate the
   available bandwidth to different classes of traffic. Those three
   standards are Priority-Based Flow Control (the IEEE [802.1Qbb]
   standard), Enhanced Transmission Selection (the IEEE [802.1Qaz]
   standard), and the Congestion Notification (CN) feature in the IEEE
   [802.1Q] standard. Intended uses include the support of loss
   sensitive services, such as Fibre Channel over Ethernet [FCoE], in
   data centers. Because they are primarily implemented at the port
   level, no changes in the TRILL protocol are required to support IEEE
   802.1Qbb or 802.1Qaz. To support Congestion Notification (formerly
   802.1Qau), minor changes to TRILL are required as specified herein.

   The existing optional PAUSE feature of IEEE 802.3 (Annex 31B of
   [802.3]) can, with appropriate engineering, also provide Ethernet
   service without loss of frames due to queue overflow. However, PAUSE
   has problems as follows:

   1. Traffic for some protocols, for example TCP [RFC793], requires
      frame losses to signal congestion for flow control. Elimination of
      frame drops due to congestion would prevent TCP flow control,
      unless some other mechanism were added.

   2. Some traffic consists of time critical network control frames, for
      example IS-IS Hellos [IS-IS]. PAUSE is relatively indiscriminate
      and pauses such frames, except for some MAC Control frames such as
      PAUSE control frames themselves, along with any less critical
      traffic. Pausing such critical network control frames can
      compromise transport connectivity.

   3.
      PAUSE can result in intermittent waves of spreading traffic
      paralysis, crippling network throughput, as follows: When a switch
      S1 receives a PAUSE on a port P1 and can no longer transmit frames
      out that port, it is likely that output queues to P1 will fill up
      quickly. As soon as one output queue to P1 is full or nearly full,
      S1 must, to avoid frame loss, send PAUSE frames out on each
      of its ports that might receive a frame for output to P1. For
      example, it might have to PAUSE input on P2 through P9,
      unnecessarily blocking traffic between any pair of those ports, to
      be sure it will not receive input on any of them for P1. This can
      repeat in switches connected to S1, switches connected to switches
      connected to S1, etc.

1.1 Overview of These Standards

   Overviews of the three DCB standards covered herein are given below.
   IEEE 802.1 has specified these standards and the behavior needed to
   support them in bridges and end stations. This document discusses the
   support of these standards in TRILL switches (commonly called
   RBridges [RFC6325]).

   IEEE [802.1Qbb], Priority-based Flow Control (PFC), provides a frame
   priority based refinement of the Ethernet PAUSE feature as described
   in Section 2. To the extent that a switch implements separate queues
   for different priorities at each port, this can eliminate the first
   and second of the PAUSE problems listed above. Traffic requiring
   frame drops due to congestion can be assigned a priority for which
   PFC is not enabled. PFC is not normally enabled for the two highest
   priorities, 6 and 7, which are typically used for time sensitive
   control frames. PFC also reduces the third problem as any congestion
   spreading would affect only priorities with PFC enabled.
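
   As a rough illustration of the per-priority behavior described
   above, the Python sketch below picks the priorities for which a port
   would ask its peer to pause. It is not taken from [802.1Qbb]: the
   function name, the queue-occupancy representation, and the 0.9
   threshold are assumptions made for this example only.

```python
def priorities_to_pause(queue_fill, pfc_enabled, threshold=0.9):
    """Return the sorted priorities for which a pause would be requested.

    queue_fill:  dict mapping priority (0-7) to queue occupancy in [0, 1]
    pfc_enabled: set of priorities for which PFC is enabled
    threshold:   hypothetical occupancy at which a pause is requested
    """
    for priority in queue_fill:
        if not 0 <= priority <= 7:
            raise ValueError("priority must be in the range 0-7")
    return sorted(
        p for p, fill in queue_fill.items()
        if p in pfc_enabled and fill >= threshold
    )


# Hypothetical configuration: PFC on priorities 3 and 4 only, so that
# priorities 6 and 7 (control frames) are never paused.
fill = {0: 0.95, 3: 0.95, 4: 0.20, 7: 0.99}
paused = priorities_to_pause(fill, pfc_enabled={3, 4})
```

   In this example priorities 0 and 7 are also congested, but they are
   left to ordinary drop behavior because PFC is not enabled for them.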

   IEEE [802.1Qaz] is a standard covering two things. First, Enhanced
   Transmission Selection (ETS) allocates bandwidth between traffic
   class groups indicated by priority. It is described in Section 3.
   Second, 802.1Qaz contains the specification of the Data Center
   Bridging Exchange Protocol (DCBX) for discovering and configuring the
   three standards that this document covers, as described in Section 4.

   Congestion Notification (CN), formerly IEEE 802.1Qau, provides
   signaling of congestion on a per flow basis to the end station source
   of the flow. It was adopted as an amendment to IEEE 802.1Q-2005 and
   has been rolled into [802.1Q]. As a part of CN, participating end
   stations are required to implement per flow rate limiting. CN is
   enabled on a per priority basis and, with appropriate engineering,
   minimizes frame drops due to queue overflow in a LAN Congestion
   Notification Domain within which all switches and end stations
   implement it. CN and 802.1Qbb Priority-Based Flow Control (PFC)
   complement each other to help eliminate such frame drops. CN reduces
   congestion by proactively reducing frame ingress rates at the source
   end station(s) involved in the congestion. For some congestion cases
   this may be insufficient to stop buffer overflow at a congestion
   point. PFC provides an emergency brake for such cases and avoids
   frame loss. CN eliminates the first problem listed above for PAUSE in
   that frames that require drops due to congestion for flow control can
   be assigned a priority for which CN is not enabled. CN avoids the
   second problem because it is not normally used to limit priorities 6
   and 7, which are typically used for time sensitive control frames.
   And CN avoids the third problem listed above for PAUSE because it
   acts by restraining end station flow sources rather than blocking
   transmission on intermediate switch ports.
   Section 5 below provides
   additional information on CN and specifies additions to the TRILL
   protocol to support it.

   These three DCB standards may be implemented independently or in any
   combination except that implementation of any of them implies
   implementation of DCBX, specified in IEEE [802.1Qaz].

1.2 Terminology

   The terminology and acronyms of [RFC6325] are used in this document.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

1.3 Additional Acronyms

   The following acronyms are used in this document in addition to those
   defined in [RFC6325].

   AVB - Audio-Visual Bridging

   CN - Congestion Notification [802.1Q]

   CNM - Congestion Notification Message

   CNtag - Congestion Notification tag

   DCB - Data Center Bridging [802.1Qaz]

   DCBX - DCB Exchange protocol [802.1Qaz]

   ETS - Enhanced Transmission Selection [802.1Qaz]

   FCoE - Fibre Channel over Ethernet [FCoE]

   LLDP - Link Layer Discovery Protocol (IEEE 802.1AB)

   PFC - Priority-based Flow Control [802.1Qbb] [802.3bd]

   RBridge - Routing Bridge [RFC6325]

   TRILL Switch - An alternative name for an RBridge

2. Priority-Based Flow Control

   IEEE [802.1Qbb], Priority-Based Flow Control (PFC), refines the IEEE
   [802.3] PAUSE feature to permit separately requesting, on a physical
   Ethernet link, pausing and unpausing the traffic of each of the eight
   available frame priority levels. The actual priority-based pause
   Ethernet control frame is specified in IEEE [802.3bd].

   Such queue pausing occurs within the transmission logic associated
   with a port and requires no changes to the TRILL protocol, which is
   implemented above such port logic, as described in [RFC6325].
   LLDP/DCBX is used in PFC discovery and agreement with peers as
   described in Section 4.
   A TRILL switch implementing the PFC standard
   MUST implement DCBX, signaling PFC support and configuration.
   Guarantee of lossless handling of frames with a particular priority
   in a TRILL campus requires implementation and enablement of PFC for
   that priority at all end stations that originate frames and all TRILL
   switches and bridges in that campus as well as meeting the PFC
   engineering requirements in [802.1Qbb].

   The PFC control frames specified in [802.3bd] are MAC control frames
   that are not VLAN tagged. Their transmission normally bypasses the
   output queue at a port so they are transmitted immediately, or as
   soon as the frame currently being transmitted is sent, so as to meet
   the timing requirements of PFC.

3. Enhanced Transmission Selection

   Enhanced Transmission Selection (ETS), specified in IEEE [802.1Qaz],
   allocates bandwidth, between traffic classes, through each of the
   ports of a switch or end station. (To be more precise, it modifies
   the algorithm used to select, from multiple priority-based output
   queues at a port, the next frame to transmit. Provision is made for
   proprietary algorithms and 802.1 has also specified an algorithm in
   connection with precise frame timing (AVB), but we are only concerned
   with the default DCB algorithm.)

   Transmission selection occurs within the logic associated with a port
   and requires no changes to the TRILL protocol, which is implemented
   above such port logic, as described in [RFC6325]. A TRILL switch
   implementing the ETS standard MUST implement DCBX (see Section 4)
   signaling of ETS support and configuration. For ETS to be effective,
   traffic in different ETS groups cannot share an output queue.

4. The DCB Exchange Protocol

   The DCB Exchange Protocol (DCBX) is specified in IEEE [802.1Qaz],
   which also specifies ETS as described in Section 3.

   DCBX is built on the Link Layer Discovery Protocol (LLDP), which is
   specified in IEEE [802.1AB]. DCBX is used for the discovery of DCB
   capabilities of peer switches, for the detection of inconsistent
   configuration of DCB features between peer switches, and for the
   propagation of DCB features to switches configured to accept
   configuration via DCBX. For purposes of TRILL protocol peering, TRILL
   switches ignore intervening bridges, but for the purposes of LLDP and
   DCBX all stations, including TRILL switches, 802.1 bridges, and end
   stations are considered peers.

   TRILL switches implementing any of the three DCB protocols MUST also
   implement DCBX.

5. Congestion Notification

   Congestion Notification (CN) can limit flows to minimize frame loss
   by having congestion points that detect congestion send Congestion
   Notification Messages (CNMs) back to reaction points in end stations
   that can limit flows. See [802.1Q] for the specification of the CN
   algorithms to perform at congestion and reaction points. Congestion
   Notification is designed to operate best in minimizing frame loss of
   unicast flows in a LAN composed of point-to-point physical links
   where all switches have implemented Congestion Notification.

   A TRILL switch that implements Congestion Notification may act as an
   end point, for example when sourcing or sinking SNMP management
   frames, and thus may contain one or more reaction points, as well as
   implementing congestion points at its output queues.

   Reaction points are in end stations where flows originate and are the
   mechanism to limit flows. The granularity of reaction points is
   beyond the scope of CN and this document but cannot be larger than a
   priority at a MAC address.
   If the granularity is smaller and there
   are multiple reaction points in an end station for a given priority,
   then the end station must label outgoing frames with a Congestion
   Notification tag (CNtag) that includes an end station flow ID. This
   flow ID is an opaque field to the rest of the network.

      +-----------------------------------------------+
      | Ethernet Header (possibly including VLAN Tag) |
      +-----------------------------------------------+
      |                Optional CNtag                 |
      +-----------------------------------------------+
      |               Ethernet Payload                |
      +-----------------------------------------------+
      |                 Ethernet FCS                  |
      +-----------------------------------------------+

              Figure 1: Native Ethernet Frame in a CN Domain

   Congestion points are at queues in forwarding devices, normally port
   output queues. The functions of a congestion point are (1) to
   conditionally send Congestion Notification Messages (CNMs) to the
   source of a frame and (2) to conditionally strip Congestion
   Notification tags (CNtags) out of a frame being forwarded, for
   example if it is being forwarded out of a congestion notification
   domain.

   When a frame is to be inserted into an output queue with a congestion
   point, the procedures specified in IEEE [802.1Q] are used to
   determine if a CNM should be sent to the frame's source and if so to
   determine various fields in that CNM. When a frame is to be inserted
   into an output queue with a congestion point, the congestion point
   may remove any CNtag in the frame as discussed in Section 5.1.
   Congestion points are implemented within the logic associated with a
   port and require no changes to TRILL for the output of native frames,
   as TRILL is implemented above such port logic as described in
   [RFC6325]; however, when outputting a TRILL Data frame, any CNM
   generated needs to be for the TRILL encapsulated frame rather than
   for the entire TRILL Data frame.
   In that case there are some
   differences between the details of the creation of a CNM at a TRILL
   switch output port and at a bridge output port. This CNM also needs
   to be TRILL encapsulated but this will happen automatically as the
   CNM is specified by [802.1Q] to be treated as a native frame arriving
   at the port.

      +-----------------------------------------------+
      | Ethernet Header (possibly including VLAN Tag) |
      +-----------------------------------------------+
      |                     CNtag                     |
      +-----------------------------------------------+
      | Congestion Notification Message Fixed Fields  |
      + - - - - - - - - - - - - - - - - - - - - - - -+
      |      Initial bytes of frame causing CNM       |
      +-----------------------------------------------+
      |                 Ethernet FCS                  |
      +-----------------------------------------------+

             Figure 2: Native Congestion Notification Message

   Within a contiguous part of the TRILL campus where Congestion
   Notification is enabled (see Section 5.1), you would see the same
   frames with the same tags as in a similar bridged LAN except that
   those frames will be TRILL encapsulated as shown in Figures 3 and 4.
   The exception is when a TRILL-ignorant bridge within the campus
   produces a CNM in response to a TRILL data frame as shown in Figure
   6. The resulting CNM is corrected by the first TRILL switch it
   encounters, which will be the previous-hop TRILL switch.

      +-----------------------------------------------+
      |                  Link Header                  |
      +-----------------------------------------------+
      |                 TRILL Header                  |
      +-----------------------------------------------+
      |                     CNtag                     |
      +-----------------------------------------------+
      |            Rest of Native Payload             |
      +-----------------------------------------------+
      |                 Link Trailer                  |
      +-----------------------------------------------+

             Figure 3.
             TRILL Data Form of CNtagged Native Frame

      +-----------------------------------------------+
      |                  Link Header                  |
      +-----------------------------------------------+
      |                 TRILL Header                  |
      +-----------------------------------------------+
      |                     CNtag                     |
      +-----------------------------------------------+
      | Congestion Notification Message Fixed Fields  |
      + - - - - - - - - - - - - - - - - - - - - - - -+
      |      Initial bytes of frame causing CNM       |
      +-----------------------------------------------+
      |                 Link Trailer                  |
      +-----------------------------------------------+

       Figure 4: TRILL Data Form of Congestion Notification Message

5.1 Congestion Notification Domains

   Congestion Notification (CN) reduces frame drops due to output queue
   overflow in a Congestion Notification Domain. There could be many
   such domains, each specified for a particular priority and contiguous
   set of network stations (end stations, TRILL switches, or bridges),
   within a TRILL campus. For example, two Congestion Notification
   Domains, one at priority X and one at priority Y, could cover the
   same set of contiguous stations, overlapping but different sets of
   such stations, or completely disjoint sets of such stations, in a
   campus.

   CN includes mechanisms to "defend" Congestion Notification Domains,
   that is, make sure only congestion managed flows of frames enter
   congestion point queues. The edge of a domain, i.e. the set of
   station ports in the domain directly connected to a station not in
   the domain, is determined by a combination of auto-detection using
   LLDP (see Section 4) and management configuration. Bridges that
   implement Congestion Notification defend a domain by the following:

   1. Prohibiting priority mapping inside the domain.

   2. Mapping the priority of any frame entering the domain from a
      station outside the domain to a priority that is not a congestion
      managed priority.

   3.
      Prohibiting the mapping of the priority of any frame entering the
      domain from a station outside the domain to the domain's priority.

   The station containing the reaction-point-equipped source of a flow
   must be part of a Congestion Notification Domain at the flow's
   priority along with all stations along the path to the flow's
   destination and all of the queues involved with the flow must be
   congestion-point-equipped in order for CN to be able to meet its
   goals.

   Because of item 2 in the list above, a station can be a member of no
   more than 7 different Congestion Notification Domains because there
   must be at least one priority that is not congestion managed for use
   as the mapped priority of entering frames from outside the domain and
   which are therefore not part of a congestion managed flow. As a
   practical matter, it is unlikely that a station would be a member of
   more than 4 or 5 different Congestion Notification Domains as
   priorities 6 and 7 are normally used for high priority control frames
   that are not congestion controlled and at least one low priority is
   kept non-congestion managed for mapping as above.

   The per port per priority state of a switch or end station will be
   one of the following four values, which have the effects indicated:

   o  Disabled:
      -  On native frame input, frame priority can be mapped to or from
         this priority.
      -  If this is an end-station output port, CNtags are not added.
      -  If this is a switch output port, CNtags are not stripped.

   o  Edge:
      -  On native frame input, a frame with this priority is mapped to
         a non-CN priority and no native frame can be mapped to this
         priority, regardless of the priority-mapping table at the port.
         For TRILL Data frames, this also applies to the Inner.VLAN
         priority.
      -  If this is an end-station output port, CNtags are not added.
      -  If this is a switch output port, CNtags are stripped, including
         any CNtag in the encapsulated frame if a TRILL Data frame is
         being output.

   o  Interior:
      -  On frame input, a frame in this priority is not mapped to
         another priority and no frame can be mapped to this priority,
         regardless of the priority-mapping table at this port. For
         TRILL Data frames, this also applies to the Inner.VLAN
         priority.
      -  If this is an end-station output port, CNtags are not added.
      -  If this is a switch output port, CNtags are stripped, including
         any CNtag in the encapsulated frame if a TRILL Data frame is
         being output.

   o  InteriorReady:
      -  On frame input, a frame in this priority is not mapped to
         another priority and no frame can be mapped to this priority,
         regardless of the priority-mapping table at this port. For
         TRILL Data frames, this also applies to the Inner.VLAN
         priority.
      -  If this is an end-station output port, CNtags may be added.
      -  If this is a switch output port, CNtags are not stripped.

   Note that when the priority of a TRILL encapsulated frame is mapped,
   the priority field in the Inner.VLAN tag MUST be changed.

5.2 Congestion Notification Tag Details

   An end station originating a native frame may add a Congestion
   Notification tag (CNtag) to identify the native frame's reaction
   point in that end station, if the end station and the next hop device
   are part of a Congestion Notification Domain. A CNtag is 4 bytes
   long, consisting of a 2-byte Ethertype (0x22E9) followed by a 2-byte
   flow ID, and appears after any VLAN tag but before the frame
   body. This CNtag flow ID is an opaque quantity only meaningful to the
   originating end station. The inclusion of a CNtag is optional as the
   originating end station may be able to identify the corresponding
   reaction point from other information returned in a Congestion
   Notification Message such as the priority.
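
   The 4-byte CNtag layout just described can be sketched in Python as
   follows. This is an illustrative sketch only: the function names are
   hypothetical, and packing the flow ID in network byte order is an
   assumption, since the flow ID is opaque to everything except the
   originating end station.

```python
import struct

CNTAG_ETHERTYPE = 0x22E9  # CNtag Ethertype as given in this section


def build_cntag(flow_id: int) -> bytes:
    """Return the 4-byte CNtag: 2-byte Ethertype, then 2-byte flow ID."""
    if not 0 <= flow_id <= 0xFFFF:
        raise ValueError("flow ID must fit in 2 bytes")
    return struct.pack("!HH", CNTAG_ETHERTYPE, flow_id)


def parse_cntag(data: bytes):
    """Return the flow ID if data starts with a CNtag, else None.

    data is assumed to begin just after any VLAN tag, which is where
    a CNtag appears in a native frame.
    """
    if len(data) < 4:
        return None
    ethertype, flow_id = struct.unpack("!HH", data[:4])
    return flow_id if ethertype == CNTAG_ETHERTYPE else None
```

   Because the tag is optional, a parser of this kind has to tolerate
   frames in which the bytes after the VLAN tag are some other
   Ethertype, which is why parse_cntag returns None instead of raising.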

   As described in Section 5.3, CNtags are always added to Congestion
   Notification Messages (CNMs) when they are created.

5.3 Congestion Notification Message Details

   A Congestion Notification Message (CNM) is, under certain
   circumstances, created by a congestion point, as described in IEEE
   [802.1Q], when a frame is entered into the queue associated with that
   congestion point. The CNM frame always includes a Congestion
   Notification tag (CNtag, see Section 5.2). The CNtag includes a zero
   flow ID if the frame provoking the CNM did not have a CNtag. The body
   of the CNM itself, after the CNtag, starts with the CNM Ethertype
   (0x22E7) followed by the information below:

   -  CNM version information, currently zero
   -  Quantized congestion feedback information as specified in
      [802.1Q]
   -  An 8 byte opaque ID of the congestion point generating the CNM
   -  The priority of the frame causing the CNM
   -  The destination MAC address of the frame causing the CNM
   -  The number of bytes included from the beginning of the body of
      the frame causing the CNM
   -  Up to the first 64 bytes of the body of the frame causing the
      CNM

   Except that input bytes/frame counters are not incremented, a CNM
   generated at an output queue for a port is treated as if it had been
   received on that port. CNMs are considered to be in the same VLAN as
   the frame that provoked them and have configurable priority that
   defaults to priority 6.

   It is undesirable, but not an error, for a CNM to be sent in response
   to a CNM frame which encounters congestion. This is normally avoided
   by sending CNM frames with a priority which does not have congestion
   notification enabled.

   As described in Section 5.4 below, when a CNM is generated by a
   TRILL switch when queuing a TRILL data frame, it is generated for the
   enclosed frame, not for the entire TRILL data frame.
   This will cause
   the CNM to be addressed to the source end station of the data.

5.4 Additions to TRILL to Support Congestion Notification

   The figure below is used in the discussion in this section. The
   assumption is that a frame is generated at End Station "a" (ESa)
   destined for End Station "b" (ESb) and this frame is forwarded
   through the sequence of 802.1 bridges (Bn) and TRILL switches
   (RBridges, RBn) shown. For native frames from ESa, RB1 acts as the
   ingress TRILL switch, encapsulating and directing them to egress
   TRILL switch RB3 for decapsulation and delivery to ESb. The arrows
   indicate the flow of a data frame. Any resulting CNM would flow in
   the opposite direction; however, such a CNM would be independently
   routed towards ESa and would not be constrained to follow the same
   sequence of switches shown below.

      +-----+     +-----+                 +-----+     +-----+
      | ESa +-->--+ B1  |                 + RB3 |-->--+ B3  +
      +-----+     +--+--+                 +--+--+     +--+--+
                     |                       |           |
                     V                       ^           V
                     |                       |           |
                  +--+--+     +-----+     +--+--+     +--+--+
                  | RB1 +-->--+ RB2 +-->--+ B2  +     | ESb |
                  +-----+     +-----+     +-----+     +-----+

                      Figure 5: Example Frame Path

   TRILL can make no change to the actions at any reaction points in ESa
   or any congestion points at the output queues of B1, B2, or B3, since
   they are not TRILL switches. Any CNM generated at B2 will be in
   response to a TRILL frame and will need to be corrected by the
   previous hop TRILL switch. The situation at the output queue of RB3
   is actually the same as B3 since, as egress, RB3 will have
   decapsulated any traffic for ESb before it tries to insert it in an
   output queue. Thus the frame RB3 is enqueuing will be a native frame,
   a congestion point at the RB3 output can act, for such a frame,
   exactly as an IEEE 802.1 congestion point, and any CNM generated in
   the RB3 output from that native frame will be treated as if it was
   received by the RB3 port.
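
   The division of labor just described can be checked against Figure 5
   with a small Python sketch. It labels the form of the frame seen at
   each hop's output queue and lists the bridges whose CNMs would need
   correction by a TRILL switch. The Hop type, the function names, and
   the simplification that the first and last TRILL switches on the
   path are the ingress and egress are assumptions of this example and
   are not part of TRILL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    name: str
    is_trill: bool  # True for an RBridge, False for an 802.1 bridge


# The forwarding devices of Figure 5, in data-frame order.
PATH = [
    Hop("B1", False), Hop("RB1", True), Hop("RB2", True),
    Hop("B2", False), Hop("RB3", True), Hop("B3", False),
]


def output_frame_forms(path):
    """Map each hop name to the form of the frame in its output queue."""
    trill_hops = [i for i, hop in enumerate(path) if hop.is_trill]
    ingress, egress = trill_hops[0], trill_hops[-1]
    # Frames are encapsulated from the ingress output up to, but not
    # including, the egress output (the egress decapsulates first).
    return {
        hop.name: "trill" if ingress <= i < egress else "native"
        for i, hop in enumerate(path)
    }


def bridges_needing_cnm_correction(path):
    """Bridges that queue TRILL Data frames without understanding them."""
    forms = output_frame_forms(path)
    return [
        hop.name for hop in path
        if not hop.is_trill and forms[hop.name] == "trill"
    ]
```

   For the path of Figure 5 only B2 is reported, matching the statement
   above that a CNM generated at B2 is in response to a TRILL frame,
   while B1, B3, and the RB3 output all handle native frames.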

   Creating a CNM at the RB1 or RB2 output queue is straightforward.
   Assume the CNM is created in response to TRILL Data frame 1 (TDF1)
   and that TDF1 encapsulates native frame 1 (NF1). The CNM would be
   created as a TRILL encapsulated CNM with the ingress TRILL switch of
   NF1 as its egress. The Inner.MacDA would be ESa. The Inner.MacSA
   would be the MAC address of the port on which the TRILL encapsulated
   CNM was initially sent, that is, the same as the Outer.MacSA. The
   encapsulated CNM itself would be filled in as if in response to NF1,
   not TDF1.

   Similarly, a CNM created at B3 would have ESa as its destination
   address and would be TRILL encapsulated when it arrived at RB3 as RB3
   would be its ingress TRILL switch.

5.4.1 TRILL Switch Ingress Details

   This section specifies special actions for CN at a TRILL switch input
   port receiving a native frame, that is, the TRILL switch ingress
   function. The usual processing on the priority of the input TRILL
   data frame, modified as described in Section 5.1, is done. Special
   actions are required only when the native frame received is a CNM.

   The ingress process at a TRILL switch, say RB2, supporting CN MUST
   detect the case of a native CNM created by a bridge in response to a
   TRILL frame, say by B2 in Figure 5, and transform or discard it as
   described below. If such a CNM was generated in response to a TRILL
   control (IS-IS) frame, it is discarded. No other changes are needed
   in the TRILL switch ingress process.

   A native CNM requiring special actions is easily recognized on
   ingress as its MAC destination address will be the TRILL switch and
   it will have the CNM Ethertype.
   (A CNM not addressed to the TRILL
   switch must have been generated in response to an unencapsulated
   native frame, for example at B3 in the diagram above, and can be
   encapsulated by its ingress TRILL switch and generally forwarded by
   transit TRILL switches in the same way as other native frames.)

   Such a native CNM resulting from a TRILL data frame at B2 has the
   contents generally shown in Figure 6 and listed further below.

      +-----------------------------------------------+
      | Ethernet Header (possibly including VLAN Tag) |
      +-----------------------------------------------+
      | CNtag                                         |
      +-----------------------------------------------+
      | CNM Ethertype and Fixed Fields                |
      + - - - - - - - - - - - - - - - - - - - - - - - +
      | Up to 64 initial bytes of the following:      |
      |   +-----------------------------------------+ |
      |   | TRILL Ethertype and Header              | |
      |   +-----------------------------------------+ |
      |   | Optional CNtag                          | |
      |   +-----------------------------------------+ |
      |   | Native Payload                          | |
      |   +-----------------------------------------+ |
      |                                               |
      +-----------------------------------------------+
      | Ethernet FCS                                  |
      +-----------------------------------------------+

           Figure 6: Native CNM Caused by a TRILL Data Frame

   1   + Outer.MacDA, MAC address of RB2
   2   + Outer.MacSA, MAC address of port on which B2, the bridge
            generating this CNM, sent the CNM
   3   + Outer.VLAN tag for the designated VLAN on the RB2 to RB3 link
            with the priority configured at B2 for CNMs (default
            priority 6)
   4   + CNtag (CNtag Ethertype 0x22E9 followed by Flow ID of zero)
       + CNM
   5      o CNM Ethertype 0x22E7
   6      o CNM version information, quantized congestion feedback
               information, and an 8 byte opaque ID of the congestion
               point generating the CNM
   7      o The priority of the TRILL encapsulated frame causing the
               CNM
   8      o The destination MAC address of the TRILL encapsulation
               frame causing the CNM, RB3 in this case
   9      o The number of bytes included below from the beginning of
               the body of the TRILL encapsulation frame causing the CNM
       + Initial bytes of body of TRILL encapsulation Data frame causing
            the CNM
          o TRILL Header of the frame causing the CNM
   10        - TRILL Ethertype 0x22F3
   11        - Flags, hop count, options length
   12        - Egress nickname, RB3 in this case
   13        - Ingress nickname, RB1 in this case
   14        - Options, if any
   15     o Inner.MacDA, MAC address of ESb
   16     o Inner.MacSA, MAC address of ESa
   17     o Inner.VLAN tag of the TRILL encapsulated frame causing the
               CNM
   18     o Optional CNtag
   19     o Encapsulated native frame body

   The ingressing TRILL switch RB2 transforms the CNM above into the
   following TRILL encapsulated CNM.

   + Outer.MacDA, MAC address of next hop RBridge (RB1) toward
        originating end station
   + Outer.MacSA, MAC address of RB2 port on which this TRILL
        encapsulated CNM frame is to be sent
   + Outer.VLAN tag for the designated VLAN on the RB2 to RB1 link
        with priority copied from incoming Outer.VLAN, field #3 above
   + TRILL Header to get the CNM to the right end station
      o TRILL Ethertype 0x22F3
      o Flags, hop count, options length
      o Egress nickname, RB1 in this case, from ingress nickname in
           the TRILL header in the received CNM, field #13 above
      o Ingress nickname, RB2 in this case, the nickname of the
           RBridge doing this transformation
      o Options, if any
   + Inner.MacDA, MAC address of ESa, field #16 above
   + Inner.MacSA, MAC address of B2, field #2 above
   + Inner.VLAN Tag with VLAN ID from field #17 above and priority
        from field #3 above
   + CNtag, with flow ID from field #18 above, if #18 is present,
        otherwise flow ID of zero
   + CNM
      o CNM Ethertype 0x22E7
      o CNM version information, quantized congestion feedback
           information, and an 8 byte opaque ID of the congestion
           point generating the CNM, field #6 above
      o The priority of the native frame whose encapsulated form
           caused the CNM, from Inner.VLAN, field #17 above
      o The destination MAC address of the frame whose encapsulated
           form caused the CNM, the Inner.MacDA, field #15 above
      o The number of bytes included below from the beginning of
           the body of the frame whose encapsulated form caused the
           CNM. This will be 24 smaller (but not less than zero) than
           the same field (#9) in the received CNM due to dropping
           the TRILL Header (8 bytes), MAC addresses (12 bytes), and
           Inner.VLAN (4 bytes).
   + Initial bytes of the body of the frame whose encapsulated form
        caused the CNM, field #19 above

   Because of the reduction in the number of bytes of the body of the
   frame that would have caused the CNM if it weren't TRILL
   encapsulated, it is RECOMMENDED that bridges and TRILL switches
   implementing Congestion Notification in a TRILL campus be configured
   to include the maximum (64) number of bytes when generating a CNM.

5.4.2 Transit TRILL Switch Details

   The subsections below describe transit TRILL switch support of
   Congestion Notification at input and output ports. As this is a TRILL
   switch in its transit role, only the handling of TRILL Data frames is
   discussed. If the TRILL switch is receiving a native frame, it will
   be an ingress as described in Section 5.4.1 and if it is sending a
   native frame, it will be an egress as described in Section 5.4.3.
   However, this section does apply to the output of an encapsulated
   frame that was ingressed at a TRILL switch and to the input, in TRILL
   encapsulated form, of a frame to be egressed at a TRILL switch.

5.4.2.1 Transit TRILL Switch Input Port

   The usual 802.1Q processing on the priority of the input TRILL data
   frame, modified as described in Section 5.1, is done.
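
   The field mapping that Section 5.4.1 specifies for turning a native
   CNM into a TRILL encapsulated CNM can be sketched in Python as shown
   here. This is an illustrative sketch only and not part of the
   specification: the class, function, and key names are hypothetical,
   frames are modeled as already-parsed fields rather than as bytes, and
   the 24-byte adjustment assumes a TRILL Header without options, as in
   the byte counts given in that section.

```python
from dataclasses import dataclass
from typing import Optional

# Bytes dropped from the captured frame, per Section 5.4.1:
# TRILL Header (8) + Inner MAC addresses (12) + Inner.VLAN tag (4).
# Assumption: the TRILL Header carries no options.
TRILL_OVERHEAD = 8 + 12 + 4


@dataclass
class NativeCnm:
    """Parsed native CNM caused by a TRILL Data frame (hypothetical names)."""
    outer_mac_sa: str        # field #2: port of the bridge that sent the CNM
    outer_priority: int      # field #3: priority in the Outer.VLAN tag
    feedback: bytes          # field #6: version, feedback, congestion point ID
    captured_len: int        # field #9: captured bytes of the causing frame
    ingress_nickname: int    # field #13
    inner_mac_da: str        # field #15
    inner_mac_sa: str        # field #16
    inner_vlan_id: int       # field #17, VLAN ID part
    inner_priority: int      # field #17, priority part
    flow_id: Optional[int]   # field #18: None if no CNtag was captured


def transform_native_cnm(cnm: NativeCnm, my_nickname: int) -> dict:
    """Map a native CNM onto the fields of a TRILL encapsulated CNM."""
    return {
        # TRILL Header: send the CNM back to the original ingress.
        "egress_nickname": cnm.ingress_nickname,
        "ingress_nickname": my_nickname,
        "outer_priority": cnm.outer_priority,
        # Inner frame: addressed to the end station that sent the data.
        "inner_mac_da": cnm.inner_mac_sa,
        "inner_mac_sa": cnm.outer_mac_sa,
        "inner_vlan_id": cnm.inner_vlan_id,
        "inner_priority": cnm.outer_priority,
        "flow_id": cnm.flow_id if cnm.flow_id is not None else 0,
        # CNM body: rewritten as if in response to the native frame.
        "feedback": cnm.feedback,
        "cnm_priority": cnm.inner_priority,
        "cnm_mac_da": cnm.inner_mac_da,
        "captured_len": max(0, cnm.captured_len - TRILL_OVERHEAD),
    }
```

   With the maximum capture of 64 bytes, the transformed CNM in this
   sketch reports 40 bytes of the native frame. A capture shorter than
   24 bytes is clamped to zero, which is why the text above recommends
   configuring the maximum.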

5.4.2.2 Transit TRILL Switch Output Port

   As discussed in Section 5.1, a CNtag is stripped under some
   circumstances; however, such a CNtag will appear as part of the
   encapsulated frame, not on the outside of the TRILL data frame, so
   the CNtag is stripped from deeper in the frame. When there is a
   Congestion Point enabled at a TRILL switch output queue, a CNM is not
   generated as the result of trying to queue a TRILL control (IS-IS)
   frame for output at a TRILL switch port. When [802.1Q] specifies that
   a CNM be generated in response to a TRILL Data frame, a TRILL
   encapsulated CNM composed as below is generated. The TRILL Data frame
   causing the CNM is referred to as TDF1 and its encapsulated native
   frame as NF1.

   + Outer.MacDA - MAC address of the next hop RBridge towards the
        egress nickname used in the TRILL Header (see below)
   + Outer.MacSA - MAC address of the output port on which the TRILL
        encapsulated CNM is to be sent
   + Outer.VLAN - Designated VLAN of the link on which the TRILL
        encapsulated CNM is to be sent
   + TRILL Header
      o TRILL Ethertype 0x22F3
      o Flags, hop count, options length
      o Egress nickname, from ingress nickname in TDF1
      o Ingress nickname, a nickname of the RBridge generating the
           CNM
      o Options, if any
   + Inner.MacDA - set to the Inner.MacSA of TDF1, that is, the
        source MAC address of NF1
   + Inner.MacSA - same as Outer.MacSA of TDF1
   + Inner.VLAN - same as the Inner.VLAN of TDF1, that is, the VLAN
        tag of NF1
   + CNtag - with flow ID from the CNtag of NF1 or zero if NF1 did
        not have a CNtag
   + CNM - message generated for NF1

5.4.3 TRILL Switch Egress Details

   After decapsulation, processing of the decapsulated native frame is
   the same as at any CN equipped output port. As discussed in Section
   5.1, any CNtag present is stripped under some circumstances. If the
   output queue is congested, then a native CNM may be generated in
   response to the decapsulated native frame. This native CNM will then
   be treated as if it had been received on the port.

6. Management Considerations

   ---TBD---

7. IANA Considerations

   This document requires no IANA actions. This section should be
   deleted by the RFC Editor before publication.

8. Security Considerations

   See [RFC6325] for general RBridge Security Considerations.

   ---more TBD---

9. References

   Normative and informational references for this document are given
   below.

9.1 Normative References

   [802.1AB] - IEEE, "IEEE Standard for Local and metropolitan area
         networks / Station and Media Access Control Connectivity
         Discovery", IEEE 802.1AB-2009, 17 September 2009.

   [802.1Q] - IEEE, "IEEE Standard for Local and metropolitan area
         networks / Virtual Bridged Local Area Networks", IEEE
         802.1Q-2011, May 2011.

   [RFC2119] - Bradner, S., "Key words for use in RFCs to Indicate
         Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC6325] - Perlman, R., Eastlake 3rd, D., Dutt, D., Gai, S., and A.
         Ghanwani, "Routing Bridges (RBridges): Base Protocol
         Specification", RFC 6325, July 2011.

9.2 Informative References

   [IS-IS] - ISO/IEC, "Intermediate system to Intermediate system
         routeing information exchange protocol for use in conjunction
         with the Protocol for providing the Connectionless-mode Network
         Service (ISO 8473)", ISO/IEC 10589:2002.

   [802.1Qaz] - IEEE, "Draft Standard for Local and Metropolitan Area
         Networks / Virtual Bridged Local Area Networks / Amendment XX:
         Enhanced Transmission Selection for Bandwidth Sharing Between
         Traffic Classes", IEEE Std 802.1Qaz-2011, June 2011.

   [802.1Qbb] - IEEE, "Draft Standard for Local and Metropolitan Area
         Networks / Virtual Bridged Local Area Networks / Amendment:
         Priority-based Flow Control", IEEE Std 802.1Qbb-2011, June
         2011.

   [802.3] - IEEE, "IEEE Standard for Information technology /
         Telecommunications and information exchange between systems /
         Local and metropolitan area networks / Specific requirements
         Part 3: Carrier sense multiple access with collision detection
         (CSMA/CD) access method and physical layer specifications",
         IEEE 802.3-2008, 26 December 2008.

   [802.3bd] - IEEE 802.3, "Draft Standard for Information technology /
         Telecommunications and information exchange between systems /
         Local and Metropolitan Area Networks / Specific requirements
         Part 3: Carrier Sense Multiple Access with Collision Detection
         (CSMA/CD) Access Method and Physical Layer Specifications /
         Amendment: MAC Control Frame for Priority-based Flow Control",
         IEEE Std 802.3bd-2011, June 2011.

   [FCoE] - http://fcoe.com/

   [RFC793] - Postel, J., "Transmission Control Protocol", STD 7, RFC
         793, September 1981.

Version History

   Changes from -00 to -01

   Minor editorial changes.

   Changes from -01 to -02

   1. Update for IETF draft which is now an RFC.

   2. Update for all referenced 802.1 drafts that have become 802.1
      standards including the rolling of 802.1Qau into 802.1Q-2011.

   3. Editorial changes.

   Changes from -02 to -03

   Updates Author Info, version, and date.

   Changes from -03 to -04

   1. Update to take into account the incorporation of IEEE 802.1Qau
      into 802.1Q and the adoption of the 802.1Qaz and 802.1Qau
      standards.

   2. Change most occurrences of RBridge to TRILL or TRILL switch.

   3. Minor editorial changes.

Authors' Addresses

   Donald Eastlake 3rd
   Huawei R&D USA
   155 Beaver Street
   Milford, MA 01757 USA

   Tel: +1-508-333-2270
   Email: d3e3e3@gmail.com

   Manoj Wadekar
   QLogic Corporation
   26650 Aliso Viejo Pkwy
   Aliso Viejo, CA 92656 USA

   Tel: +1-949-389-6000
   Email: manoj.wadekar@qlogic.com

   Anoop Ghanwani
   Dell
   350 Holger Way
   San Jose, CA 95134 USA

   Phone: +1-408-571-3500
   Email: anoop@alumni.duke.edu

   Puneet Agarwal
   Broadcom

   Phone: +1-949-926-5000
   Email: pagarwal@broadcom.com

   Tal Mizrahi
   Marvell
   6 Hamada Street
   Yokneam, 20692 Israel

   Email: talmi@marvell.com

Copyright, Disclaimer, and Additional IPR Provisions

Copyright (c) 2012 IETF Trust and the persons identified as the document
authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions
Relating to IETF Documents (http://trustee.ietf.org/license-info) in
effect on the date of publication of this document. Please review these
documents carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this document
must include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. The definitive version of an
IETF Document is that published by, or under the auspices of, the IETF.
Versions of IETF Documents that are published by third parties,
including those that are translated into other languages, should not be
considered to be definitive versions of IETF Documents. The definitive
version of these Legal Provisions is that published by, or under the
auspices of, the IETF. Versions of these Legal Provisions that are
published by third parties, including those that are translated into
other languages, should not be considered to be definitive versions of
these Legal Provisions. For the avoidance of doubt, each Contributor to
the IETF Standards Process licenses each Contribution that he or she
makes as part of the IETF Standards Process to the IETF Trust pursuant
to the provisions of RFC 5378. No language to the contrary, or terms,
conditions or rights that differ from or are inconsistent with the
rights and licenses granted under RFC 5378, shall have any effect and
shall be null and void, whether published or posted by such Contributor,
or included with or in such Contribution.