INTERNET-DRAFT                                              Mingui Zhang
Intended Status: Standards Track                            Xudong Zhang
Updates: 6325                                            Donald Eastlake
                                                                  Huawei
                                                           Radia Perlman
                                                                   Intel
                                                          Vishwas Manral
                                                                   Ionos
                                                      Somnath Chatterjee
                                                                   Cisco
Expires: February 15, 2015                               August 14, 2014

                      TRILL IS-IS MTU Negotiation
                draft-zhang-trill-mtu-negotiation-06.txt

Abstract

   The base IETF TRILL protocol has a TRILL campus wide MTU feature,
   specified in RFC 6325 and RFC 7177, that assures that link status
   changes can be successfully flooded throughout the campus while being
   able to take advantage of a campus wide capability to support jumbo
   packets. This document specifies optional updates to that MTU feature
   to take advantage, for appropriate link local packets, of link local
   MTUs that exceed the TRILL campus MTU. In addition, it specifies an
   efficient algorithm for local MTU testing. It updates RFC 6325.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Conventions used in this document
   2. Link-Wide TRILL IS-IS MTU Size
   3. Link MTU Size Testing
   4. Refreshing Campus-Wide Sz
   5. Relationship between Port MTU, Lz and Sz
   6. LSP Synchronization
   7. Recommendations for Traffic Link MTU Size Testing
   8. Backwards Compatibility
   9. Security Considerations
   10. IANA Considerations
   11. References
      11.1. Normative References
      11.2. Informative References
   Authors' Addresses

1. Introduction

   [RFC6325] describes how RBridges agree on the campus-wide minimum
   acceptable inter-RBridge MTU (Maximum Transmission Unit) size (the
   campus-wide Sz) to ensure that link state flooding operates
   properly and all RBridges converge to the same link state.
   For the proper operation of TRILL IS-IS, all RBridges MUST format
   their LSPs to be no greater than the campus-wide Sz. [RFC7177]
   defines the state transition diagram of an adjacency. "Link MTU size
   is successfully tested" is part of an event (A6) causing the
   transition from the "2-way" state to the "Report" state for an
   adjacency. If MTU testing is enabled, this part means that the link
   MTU testing of size X succeeds and X is greater than or equal to the
   campus-wide Sz [RFC6325]. In other words, if a link cannot support an
   MTU of the campus-wide Sz, it will not be reported as part of the
   campus topology.

   This document specifies a new value, "link-wide Lz", to represent the
   link-wide minimum acceptable inter-RBridge MTU size for a specific
   link. Some PDUs, such as CSNPs and PSNPs, are valid only on a local
   link. These PDUs should be formatted to be no greater than the
   link-wide Lz. Since link-wide Lz is normally greater than the
   campus-wide Sz, link scope PDUs can optionally be formatted greater
   than the campus-wide Sz, up to Lz.

   An optional TRILL IS-IS MTU size testing algorithm is specified in
   Section 3 to detail the MTU testing method described in Section 4.3.2
   of [RFC6325] and in [RFC7177]. It is recommended to multicast the
   MTU-probes when there are multiple RBridges on a link responding to
   the probing with MTU-ack [RFC7177]. The testing method and rules of
   this draft are devised to minimize the number of MTU-probing
   attempts, which in turn reduces the number of multicast packets used
   for MTU testing.

1.1. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2. Link-Wide TRILL IS-IS MTU Size

   This draft specifies a new value "Lz" to represent the acceptable
   inter-RBridge link MTU size on the local link. Link-wide Lz is the
   minimum Lz supported by all RBridges on a specific link. If the link
   is usable, Lz will be greater than or equal to the campus-wide Sz
   MTU. Some TRILL IS-IS PDUs are exchanged only between neighbors
   rather than across the whole campus. They should be bounded by the
   link-wide Lz instead of the campus-wide Sz. CSNPs and PSNPs are
   examples of such PDUs. They are exchanged just on the link as part of
   LSP synchronization.

   [FS-LSP] defines the PDUs which support flooding scopes in addition
   to area wide scope and domain wide scope. RBridges on a local link
   that support Lz greater than Sz MUST support the L1 Circuit Scoped
   (L1CS) flooding. They use that flooding to exchange their maximum
   supportable value of "Lz". The smallest value of the Lz collected on
   a link, but not less than Sz, is the link-wide Lz.

   The maximum sized level 1 link-local PDU, such as PSNP or CSNP, which
   may be generated by a system is controlled by the value of the
   management parameter originatingL1SNPBufferSize. This value
   determines Lz. The TRILL APPsub-TLV shown in Figure 2.1 SHOULD be
   included in a GENINFO TLV [RFC6823] in an L1CS-LSP number zero. If it
   is missing from a fragment zero L1CS-LSP or there is no fragment zero
   L1CS-LSP, it is assumed that its originating IS is implicitly
   advertising its originatingSNPBufferSize value as Sz octets.

            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            | Type                          |   (2 bytes)
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            | Length                        |   (2 bytes)
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            | originatingSNPBufferSize      |   (2 bytes)
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 2.1: Lz is reported in the originatingSNPBufferSize TLV.

   Type: set to originatingSNPBufferSize subTLV (TRILL APPsub-TLV type
   0x0002).
   Two bytes because this APPsub-TLV appears in an Extended TLV
   [FS-LSP].

   Length: set to 2.

   originatingSNPBufferSize: the local value of
   originatingL1SNPBufferSize, in the range 1470 to 65,535 bytes.

         Lz:1800                           Lz:1800
          +---+              |              +---+
          |RB1|(2000)--------|--------(2000)|RB2|
          +---+              |              +---+
                             |
         Lz:1800             |
          +---+             +--+
          |RB3|(2000)-(1700)|B1|
          +---+             +--+
                             |

    Figure 2.2: Link-wide Lz = 1800 vs. tested link MTU size = 1700

   Even if all RBridges on a specific link have reached consensus on the
   value of link-wide Lz, it does not mean that these RBridges can
   safely exchange PDUs with each other. Figure 2.2 shows such a corner
   case. RB1, RB2 and RB3 are three RBridges on the same link and their
   Lz is 1800, so the link-wide Lz of this link is 1800. There is an
   intermediate bridge (say B1) between RB2 and RB3 whose port MTU size
   is 1700. If RB2 sends PDUs formatted in chunks of size 1800, they
   will be discarded by B1.

   Therefore the link MTU size should be tested. After the link MTU size
   of an adjacency is successfully tested, link local PDUs such as CSNP,
   PSNP and also L1CS-LSP will be formatted no greater than the tested
   link MTU size and will be safely transmitted on this link.

   As for campus-wide Sz, RBridges continue to propagate their
   originatingL1LSPBufferSize across the campus through the
   advertisement of LSPs as defined in Section 4.3.2 of [RFC6325]. The
   smallest value of Sz advertised by any RBridge, but not less than
   1470, is taken as the campus-wide Sz. Each RBridge should format its
   "campus-wide" PDUs, for example LSPs, to be no greater than what it
   believes to be the campus-wide Sz.

3. Link MTU Size Testing

   [RFC7177] defines the event A6 as including "MTU test is successful"
   if the MTU testing is enabled. As described in Section 4.3.2 of
   [RFC6325], this is a combination of the following event and
   condition.

      Event: The link MTU size has been tested.

      Condition: The link can support the campus-wide Sz.

   This condition can be efficiently tested by the following "Binary
   Search Algorithm" and rules. The MTU-probe and MTU-ack PDUs are
   specified in Section 3 of [RFC7176].

   X, X1, and X2 are local integer variables.

   Step 0: RB1 sends an MTU-probe padded to the size of link-wide Lz.

   1) If RB1 successfully receives the MTU-ack from RB2 to the probe of
      the value of link-wide Lz within k tries (where k is a
      configurable parameter whose default is 3), then RB1 sets the
      link MTU size to the size of link-wide Lz and stops.

   2) Otherwise, RB1 tries to send an MTU-probe padded to the size 1470.

      a) If RB1 fails to receive an MTU-ack from RB2 after k tries, RB1
         sets the "failed minimum MTU test" flag for RB2 in RB1's Hello
         and stops.

      b) Otherwise: link MTU size <-- 1470, X1 <-- 1470, X2 <-- link-wide
         Lz, X <-- [(X1 + X2)/2] (Operation "[...]" returns the
         fraction-rounded-up integer.). Proceed to Step 1.

   Step 1: RB1 tries to send an MTU-probe padded to the size X.

   1) If RB1 fails to receive an MTU-ack from RB2 after k tries, then:

         X2 <-- X and X <-- [(X1 + X2)/2]

   2) If RB1 receives an MTU-ack to a probe of size X from RB2 then:

         link MTU size <-- X, X1 <-- X and X <-- [(X1 + X2)/2]

   3) If X1 >= X2 or Step 1 has been repeated n times (where n is a
      configurable parameter whose default value is 5), stop. Else
      repeat Step 1.

   MTU testing is only done in the Designated VLAN [RFC7177]. Since the
   execution of the above algorithm can be resource consuming, it is
   recommended that the DRB take responsibility for the testing.
   Multicast should be used instead of unicast when multiple RBridges
   are expected to respond with MTU-ack on the link. The Binary Search
   Algorithm is proposed here to minimize the number of probing
   attempts, which in turn reduces the number of multicast packets used
   for MTU-probing.
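
   The Binary Search Algorithm above can be sketched in Python as
   follows. This is only an illustrative sketch, not part of the
   specification: the names measure_link_mtu, probe_with_retries and
   send_probe are hypothetical, and send_probe stands in for "send an
   MTU-probe padded to the given size and report whether an MTU-ack
   was received". Returning None models the case where the "failed
   minimum MTU test" flag would be set.

```python
from typing import Callable, Optional, Tuple

MIN_MTU = 1470  # lower bound used in Step 0, item 2)


def ceil_half(a: int, b: int) -> int:
    """The "[...]" operation: (a + b) / 2 rounded up to an integer."""
    return (a + b + 1) // 2


def probe_with_retries(send_probe: Callable[[int], bool],
                       size: int, k: int) -> bool:
    """Probe up to k times; succeed as soon as one MTU-ack arrives."""
    return any(send_probe(size) for _ in range(k))


def measure_link_mtu(send_probe: Callable[[int], bool], lz: int,
                     k: int = 3, n: int = 5
                     ) -> Optional[Tuple[int, int, int]]:
    """Return (link MTU size, X1, X2), or None if even 1470 fails.

    send_probe is a hypothetical callback, assumed for this sketch.
    """
    # Step 0, item 1): try the link-wide Lz first.
    if probe_with_retries(send_probe, lz, k):
        return lz, lz, lz
    # Step 0, item 2a): the minimum size must work.
    if not probe_with_retries(send_probe, MIN_MTU, k):
        return None
    # Step 0, item 2b): initialise the search bounds.
    link_mtu, x1, x2 = MIN_MTU, MIN_MTU, lz
    x = ceil_half(x1, x2)
    # Step 1, repeated at most n times.
    for _ in range(n):
        if probe_with_retries(send_probe, x, k):
            link_mtu, x1 = x, x      # item 2): raise the lower bound
        else:
            x2 = x                   # item 1): lower the upper bound
        x = ceil_half(x1, x2)
        if x1 >= x2:                 # item 3): bounds have met
            break
    return link_mtu, x1, x2
```

   For the topology of Figure 2.2, a simulated link can be modelled
   as "lambda size: size <= 1700" with lz set to 1800. With the
   default n, the search stops after five iterations with a tested
   size slightly below 1700; a larger n narrows the bounds further
   at the cost of more probes.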

   The following rules are designed to determine whether the
   aforementioned "Condition" holds.

   RBridges have figured out the upper bound (X2) and lower bound (X1)
   for the link MTU size from the execution of the above algorithm. If
   the campus-wide Sz is not greater than the lower bound or not
   smaller than the upper bound, RBridges can directly judge whether
   the link supports the campus-wide Sz without MTU-probing.

   (a) If X1 >= campus-wide Sz, this link can support campus-wide Sz.

   (b) Else if X2 <= campus-wide Sz, this link cannot support
       campus-wide Sz.

   Otherwise, RBridges need to test whether the link can support
   campus-wide Sz:

   (c) X1 < campus-wide Sz < X2. RBridges need to probe the link with
       MTU-probe messages padded to campus-wide Sz. If an MTU-ack is
       received within k tries, this link can support campus-wide Sz.
       Otherwise, this link cannot support campus-wide Sz. Through this
       test, the lower bound and upper bound of link MTU size can be
       updated accordingly.

4. Refreshing Campus-Wide Sz

   RBridges may join or leave the campus, which may change the
   campus-wide Sz. The following recommendations are specified for
   refreshing the campus-wide Sz.

   1) When a new RBridge joins the campus and its
      originatingL1LSPBufferSize is smaller than the current campus-wide
      Sz, reporting its originatingL1LSPBufferSize in its LSPs will
      cause other RBridges to decrease their campus-wide Sz. Then the
      LSPs in the campus will be resized to be no greater than the new
      campus-wide Sz.

   2) When an RBridge leaves the campus and its
      originatingL1LSPBufferSize is equal to the campus-wide Sz, its
      LSPs are purged from the remaining campus after reaching MaxAge
      [ISIS]. The campus-wide Sz may be recalculated and may be
      increased.
      In other words, while RB1 normally ignores link state
      information for any IS-IS unreachable [RFC7180] RBridge RB2,
      originatingL1LSPBufferSize is an exception. Its value, even from
      IS-IS unreachable RBridges, is used in determining Sz.

   Frequent LSP "resizing" is harmful to the stability of the TRILL
   campus, so it should be dampened. Of the two kinds of resizing,
   only upward resizing is dampened. When an upward resizing event
   happens, a timer is set (this is a configurable parameter whose
   default value is 300 seconds). Until this timer expires, all
   subsequent upward resizing is suppressed. Of course, in a
   well-configured campus with all RBridges configured to have the same
   originatingL1LSPBufferSize, no resizing will be necessary.

   If the refreshed campus-wide Sz is not greater than the lower bound
   or not smaller than the upper bound of the tested link MTU size, the
   resource consuming link MTU size testing can be avoided according to
   rule (a) or (b) specified in Section 3. Otherwise, RBridges need to
   test the link MTU size according to rule (c). Even then, it is
   unnecessary to perform the link MTU size testing algorithm all over
   again.

5. Relationship between Port MTU, Lz and Sz

   When the port MTU size is smaller than the local
   originatingL1SNPBufferSize of an RBridge (a misconfiguration), this
   port should be explicitly disabled from the TRILL campus. On the
   other hand, when an RBridge receives an LSP or L1CS-LSP with size
   greater than the link-wide Lz or the campus-wide Sz but not greater
   than its port MTU size, this LSP should be processed rather than
   discarded. If the size of an LSP is greater than the MTU size of a
   port over which it is to be propagated, no attempt shall be made to
   propagate this LSP over the port and an LSPTooLargeToPropagate alarm
   shall be generated [ISIS].

6. LSP Synchronization

   An RBridge participates in LSP synchronization on a link as soon as
   it has at least one adjacency on that link that has advanced to at
   least the 2-Way state [RFC7177]. On a LAN link, CSNP and PSNP PDUs
   are used for synchronization. On a point-to-point link, only PSNPs
   are used.

   The CSNPs and PSNPs MUST be formatted in chunks of size at most the
   link-wide Lz but are processed normally if received larger than that.
   Since the link MTU size may not have been tested in the 2-Way state,
   link-wide Lz may be greater than the supported link MTU size. In that
   case, a CSNP or PSNP may be discarded. After the link MTU size is
   successfully tested, RBridges will begin to format these PDUs to be
   no greater than the tested size, so that these PDUs get through.

   Note that the link MTU size is probably greater than the campus-wide
   Sz. Link local PDUs are formatted up to the link MTU size rather
   than the campus-wide Sz, which promises a reduction in the number of
   PDUs and a faster LSP synchronization process.

7. Recommendations for Traffic Link MTU Size Testing

   Campus-wide Sz and link-wide Lz are used to limit the size of most
   TRILL IS-IS PDUs. They are different from the MTU size restricting
   the size of TRILL data packets. The size of a TRILL data packet is
   restricted by the physical MTU of the ports and links the packet
   traverses. It is possible that a TRILL data packet successfully gets
   through the campus even though its size is greater than the
   campus-wide Sz or link-wide Lz values.

   The algorithm defined for link MTU size testing can also be used for
   TRILL traffic MTU size testing, except that the link-wide Lz used in
   that algorithm should be replaced by the port MTU of the RBridge
   sending MTU probes. The successfully tested size X can be advertised
   as an attribute of this link using the MTU sub-TLV defined in
   [RFC7176].

   Unlike RBridges, end stations do not participate in the exchange of
   TRILL IS-IS PDUs, so they cannot learn the traffic link MTU size
   from a TRILL campus automatically. An operator may collect these
   values using network management tools such as TRILL ping or
   TraceRoute. The path MTU is then set to the smallest tested link MTU
   on this path, and end stations should not generate frames that, when
   encapsulated as TRILL Data packets, may exceed this path MTU.

8. Backwards Compatibility

   There can be a mixture of Lz-ignorant and Lz-aware RBridges on a
   link. This will act properly although it will not be as efficient as
   it would be if all RBridges on the link were Lz-aware.

   On the side of an Lz-aware RBridge, when link-wide Lz is greater
   than campus-wide Sz, larger link-local TRILL IS-IS PDUs can be sent
   out to gain efficiencies. Lz-ignorant RBridges as receivers will
   have no problem handling them since the originatingL1LSPBufferSize
   value of these RBridges has been reported and the link-wide Lz is
   not greater than that value.

   On the side of an Lz-ignorant RBridge, TRILL IS-IS PDUs are always
   formatted not greater than the campus-wide Sz. Lz-aware RBridges as
   receivers can handle these PDUs since they cannot be greater than the
   link-wide Lz.

   An Lz-ignorant RBridge does not support the link MTU testing
   algorithm defined in Section 3 but may be using some algorithm just
   to test for Sz MTU on the link. In any case, if an RBridge per
   [RFC6325] receives an MTU-probe, it MUST respond with an MTU-ack
   padded to the same size as the MTU-probe. So the extension of TRILL
   MTU negotiation with Lz, as specified in this document, is fully
   backwards compatible.

9. Security Considerations

   This document raises no new security issues for TRILL. For general
   and adjacency related TRILL security considerations, see [RFC6325]
   and [RFC7177].

10. IANA Considerations

   Similar to Section 7.2 of [ESADI], IANA is requested to create a new
   subregistry (2 suggested) under the Generic Information TLV (#251)
   [RFC6823] for the TRILL originatingSNPBufferSize sub-TLV defined in
   Section 2 of this document.

11. References

11.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC6325] R. Perlman, D. Eastlake, et al, "RBridges: Base Protocol
             Specification", RFC 6325, July 2011.

   [RFC7177] Eastlake 3rd, D., Perlman, R., Ghanwani, A., Yang, H., and
             V. Manral, "Transparent Interconnection of Lots of Links
             (TRILL): Adjacency", RFC 7177, May 2014.

   [RFC7176] Eastlake 3rd, D., Senevirathne, T., Ghanwani, A., Dutt, D.,
             and A. Banerjee, "Transparent Interconnection of Lots of
             Links (TRILL) Use of IS-IS", RFC 7176, May 2014.

   [FS-LSP]  L. Ginsberg, S. Previdi and Y. Yang, "IS-IS Flooding Scope
             LSPs", draft-ietf-isis-fs-lsp-02.txt, in RFC Ed Queue.

   [RFC7180] D. Eastlake, M. Zhang, et al., "TRILL: Clarifications,
             Corrections, and Updates", draft-ietf-trill-clear-correct-
             06.txt, in RFC Editor's queue.

   [RFC6823] Ginsberg, L., Previdi, S., and M. Shand, "Advertising
             Generic Information in IS-IS", RFC 6823, December 2012.

11.2. Informative References

   [ISIS]    ISO, "Intermediate system to Intermediate system routeing
             information exchange protocol for use in conjunction with
             the Protocol for providing the Connectionless-mode Network
             Service (ISO 8473)," ISO/IEC 10589:2002.

   [ESADI]   H. Zhai, F. Hu, et al., "TRILL (Transparent Interconnection
             of Lots of Links): ESADI (End Station Address Distribution
             Information) Protocol", draft-ietf-trill-esadi-09.txt, in
             RFC Ed Queue.

Authors' Addresses

   Mingui Zhang
   Huawei Technologies
   No.156 Beiqing Rd. Haidian District,
   Beijing 100095 P.R. China

   Email: zhangmingui@huawei.com

   Xudong Zhang
   Huawei Technologies
   No.156 Beiqing Rd. Haidian District,
   Beijing 100095 P.R. China

   Email: zhangxudong@huawei.com

   Donald E. Eastlake, 3rd
   Huawei Technologies
   155 Beaver Street
   Milford, MA 01757 USA

   Phone: +1-508-333-2270
   Email: d3e3e3@gmail.com

   Radia Perlman
   Intel
   2200 Mission College Blvd.
   Santa Clara, CA 95054-1549 USA

   Phone: +1-408-765-8080
   Email: radia@alum.mit.edu

   Vishwas Manral
   Ionos
   4100 Moorpark Ave.
   San Jose, CA 95117 USA

   Email: vishwas@ionosnetworks.com

   Somnath Chatterjee
   Cisco Systems

   Email: somnath.chatterjee01@gmail.com