Network Working Group                                            M. Levy
Internet-Draft                                        Hurricane Electric
Intended status: Informational                         November 14, 2011
Expires: May 17, 2012

       Jumbo Frame Deployment at Internet Exchange Points (IXPs)
                   draft-mlevy-ixp-jumboframes-00.txt

Abstract

   This document provides guidelines on how to deploy Jumbo Frame
   support at Internet Exchange Points (IXPs). Jumbo Frame support
   allows packets larger than 1,500 Bytes to be passed between IXP
   customers over the IXP's layer 2 fabric. This document describes
   methods to enable Jumbo Frame support while keeping existing
   1,500 Byte communications in place.

   This document strongly recommends that IXP operators choose 9,000
   Bytes for their Jumbo Frame implementation.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 17, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Defining MTU values
     1.2.  Jumbo Frames
     1.3.  IXPs
     1.4.  IP Backbones
     1.5.  IP Traffic today
     1.6.  NRENs and Jumbo Frames
     1.7.  Requirements Language
   2.  The Property of an IXP's Switch Fabric
   3.  MTU Size Considerations
     3.1.  Jumbo Frame size recommendation
     3.2.  Jumbo Frame size example router configurations
     3.3.  Jumbo Frame size limitations
     3.4.  Consistent MTU Sizes
   4.  Methods of coordinating MTU changes or adding larger MTU values
   5.  Changing MTU using a Flag-Day approach
   6.  Testing customer MTU values
     6.1.  MTU Testing Example
   7.  Customer affecting issues
   8.  Addressing Plans
     8.1.  IPv4/IPv6 Addressing Plans
     8.2.  VLAN Numbering Plans
   9.  IXPs Operating Route Server Configuration
   10. Known issues for IXPs to consider
     10.1. PMTU (Path MTU) issues
     10.2. IXP Customer BGP sessions
     10.3. IXP Operator Service Level Agreements (SLAs)
   11. Customer Requirements outside of the IXP operator's control
   12. IANA Considerations
   13. Security Considerations
   14. Acknowledgements
   15. References
     15.1. Normative References
     15.2. Informative References
   Author's Address

1.  Introduction

   The standard Maximum Transmission Unit (MTU) value for IP packets
   encapsulated within an Ethernet frame is 1,500 Bytes. This is
   described in RFC 894 [RFC894] and RFC 1042 [RFC1042].

   The specific size of a Jumbo Frame is not defined by the IEEE. Many
   sizes can be chosen depending on the hardware vendor or hardware
   platform. This document strongly recommends that IXP operators
   choose 9,000 Bytes for their Jumbo Frame implementation.

1.1.  Defining MTU values

   All MTU sizes, including the default 1,500 Byte size, refer to the
   IP packet/payload size rather than the full Ethernet frame size. The
   standard Ethernet frame size is 1,514 Bytes (1,500 + 6 + 6 + 2) or
   1,518 Bytes (1,500 + 6 + 6 + 4 + 2), depending on the use of IEEE
   802.1Q (VLAN) tags [IEEE802_1Q]. The Preamble and CRC lengths are
   not included in the count.

               Non IEEE 802.1Q Enabled
   +--------+----+---+---------+-----/ /----+---+
   |Preamble|Dest|Src|EtherType| IP Payload |CRC|
   +--------+----+---+---------+-----/ /----+---+
                                \            \
                                 \            \
                                  \            \
                                   \            \
                                    \            \
                                     \            \
                                      \            \
                                       \            \
   +--------+----+---+---------+-------+-----/ /----+---+
   |Preamble|Dest|Src|EtherType|VLAN ID| IP Payload |CRC|
   +--------+----+---+---------+-------+-----/ /----+---+
                   IEEE 802.1Q Enabled

   All sizes listed within this document reference the IP payload
   portion of the Ethernet frame only.

1.2.  Jumbo Frames

   Jumbo Frames are considered to be Ethernet frames that can carry an
   IP payload greater than 1,500 Bytes [MATHIS2002] [SAUVER2003]. Jumbo
   Frames are sometimes called "Giant Jumbo", "Mini Jumbo" or "Baby
   Jumbo" [TULYU2011]. This document recommends the use of the wording
   "Jumbo Frame" as the terminology within the IXP industry.
   This document only uses the wording "Jumbo Frame" to represent a
   frame capable of transporting a payload above 1,500 Bytes MTU.

   If customers require end-to-end Jumbo Frame support and an IXP within
   the path only provides 1,500 Byte MTU connections, then the end-to-end
   Path MTU (PMTU) can only be 1,500 Bytes. This document recommends
   ways for IXP operators to provide networks with Jumbo Frame support
   and so potentially allow a larger end-to-end PMTU.

   Additional protocols that exceed 1,500 Byte MTU are "FCoE", "iSCSI",
   "MPLS", "IEEE 802.1AS", "IEEE 802.3AE", etc. None are applicable to
   the IXP industry.

1.3.  IXPs

   An Internet Exchange Point (IXP) is a layer 2 service allowing one
   network to communicate with one or more networks over a shared
   fabric. These days an IXP is normally built using high availability
   Ethernet switches and has historically provided the IEEE defined
   default Ethernet Maximum Transmission Unit (MTU) size of 1,500 Bytes
   for each port.

   As the Internet has grown, both in geography and speed, IXPs have
   mainly stuck to the 1,500 Byte MTU size. A study done in 2008 of the
   peering community showed interest in larger MTU peering
   [HANKINS2008].

   +----------+---------------+---------------+----------------------+
   | IXP      | Location      | Provided MTU  | Comments             |
   +----------+---------------+---------------+----------------------+
   | AMS-IX   | Amsterdam, NL | 1,500         | Untagged ports       |
   | Any2     | US            | 1,500         | Untagged ports       |
   | DE-CIX   | Frankfurt, DE | 1,500         | Untagged ports       |
   | Equinix  | US & others   | 1,500         | Untagged ports       |
   | HKIX     | Hong Kong, HK | 1,500         | Untagged ports       |
   | JPIX     | Tokyo, JP     | 1,500         | Untagged ports       |
   | JPNAP    | Tokyo, JP     | 1,500         | Untagged ports       |
   | LINX     | London, UK    | 1,500         | Untagged ports       |
   | NASA-AIX | Palo Alto, US | 1,500 & 9,000 | Two VLANs on request |
   | NETNOD   | Stockholm, SE | 1,500 & 4,470 | Two VLANs by default |
   | Telx TIE | US            | 1,500         | Untagged ports       |
   +----------+---------------+---------------+----------------------+

                        Table 1: IXP MTU sizes

   There is no extensive study of IXP operators and MTU values. This is
   just a minimal review to show that Jumbo Frame support exists at some
   IXPs.

1.4.  IP Backbones

   Some IP backbones have implemented larger MTU sizes on backbone links
   [NANOG2008]; however, it's safe to say that nearly every broadband
   user is connected at 1,500 Byte MTU size, or less. Broadband or
   dialup connections using PPPoE are configured at 1,492 Bytes. See
   RFC 2516 [RFC2516].

   The same limitation of 1,500 Bytes can be said for most sources of
   content. (CITATION NEEDED)

   Allowing end-to-end systems to communicate with larger MTUs can
   reduce end-system CPU usage, reduce per-packet overhead and improve
   TCP performance [NANOG2003] [Internet2_LSR]. Applications that do
   mass data transfer (backups, replication, NNTP, etc.) benefit from
   larger MTU paths.

   VPNs that require MTU sizes of 1,500 Bytes could use larger MTU paths
   to handle the additional header bytes. Presently VPNs provide a
   smaller end-to-end MTU size.

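   The VPN point above comes down to simple subtraction: the MTU left
   inside a tunnel is the path MTU minus the encapsulation overhead.
   The following is only a rough sketch. The function name is
   hypothetical, and the 24 Byte overhead (a 20 Byte outer IPv4 header
   plus a basic 4 Byte GRE header) is an illustrative assumption, not a
   value taken from this document.

```python
def inner_mtu(path_mtu: int, overhead: int) -> int:
    """Return the IP MTU available inside a tunnel (hypothetical helper)."""
    if overhead < 0 or overhead >= path_mtu:
        raise ValueError("overhead must be non-negative and below the path MTU")
    return path_mtu - overhead


# Illustrative overhead only: outer IPv4 header (20) + basic GRE header (4).
GRE_OVER_IPV4 = 24

standard = inner_mtu(1500, GRE_OVER_IPV4)  # inner MTU drops below 1,500
jumbo = inner_mtu(9000, GRE_OVER_IPV4)     # inner MTU stays well above 1,500
```

   Under these assumptions a 1,500 Byte path leaves less than 1,500
   Bytes for the tunnelled packet, while a 9,000 Byte path can carry a
   full 1,500 Byte inner packet with room to spare.
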
   There's not expected to be much value to VoIP traffic, simple DNS
   requests or other similar protocols that nearly always send small
   packets. (DNS zone transfers could use larger packets.) Operating
   on a larger MTU path should have no adverse effect on the end-to-end
   communications.

1.5.  IP Traffic today

   It's acknowledged that a majority of Internet traffic today uses
   small MTU size packets. A study of IP traffic at the AMS-IX IXP in
   Amsterdam showed the following breakdown [TULYU2011].

   +-------------------+---------+---------+---------+---------+
   | Size              | Current | Average | Maximum | Minimum |
   +-------------------+---------+---------+---------+---------+
   | 0 - 63 Bytes      | 0.0%    | 0.0%    | 0.0%    | 0.0%    |
   | 64 - 127 Bytes    | 41.2%   | 41.1%   | 45.7%   | 38.7%   |
   | 128 - 255 Bytes   | 3.5%    | 3.4%    | 4.9%    | 2.8%    |
   | 256 - 511 Bytes   | 2.1%    | 1.9%    | 2.2%    | 1.6%    |
   | 512 - 1023 Bytes  | 2.7%    | 2.5%    | 2.8%    | 2.1%    |
   | 1024 - 1513 Bytes | 28.8%   | 27.8%   | 29.4%   | 24.8%   |
   | 1514 Bytes        | 21.8%   | 23.3%   | 26.1%   | 21.5%   |
   | > 1514 Bytes      | 0.0%    | 0.0%    | 0.0%    | 0.0%    |
   +-------------------+---------+---------+---------+---------+

   Weekly Graph - 25 October 2011 to 1 November 2011 (Note: This table
   is shown in Ethernet frame sizes, i.e., 14 Bytes greater than IP MTU)

               Table 2: AMS-IX Frame Size Distribution

   The AMS-IX IXP does not provide customer ports configured to anything
   other than 1,500 Bytes; hence, today AMS-IX will never measure
   traffic in the final row of this table (i.e., above the 1,500 Byte IP
   MTU size). It's safe to say that any IXP operating at the default
   1,500 Byte MTU will never see packets above 1,500 Bytes. This means
   that there's no way to measure the potential traffic until Jumbo
   Frames on the IXP are enabled.

1.6.  NRENs and Jumbo Frames

   Research networks (NRENs, etc.) have long-standing operational
   experience with Jumbo Frame enabled networks.
   They have taken the time to test and deploy larger MTU sized networks
   globally [JET2007] [SUMMERHILL2003].

1.7.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  The Property of an IXP's Switch Fabric

   An IXP configuration can vary dramatically. It can be a very simple
   switch without monitoring or it can be a multi-site multi-terabit
   infrastructure with 24/7 NOC support and extensive portal support for
   network customers.

   This document only addresses Ethernet based IXPs (which is today the
   near de-facto technology). Ethernet ports can be configured in two
   ways:

   a.  Untagged ports with all traffic destined for the shared fabric.

   b.  Tagged ports with traffic controlled by a Virtual LAN (VLAN)
       identifier. Frames are placed into whichever virtual fabric the
       switch is configured with for that identifier. This could
       include some configurations where only two customer ports
       communicate privately.

   Customers connecting to an IXP need to be operating in the correct
   tagged or untagged mode. Untagged packets sent into a tagged port
   will not propagate. This should be considered part of an IXP's
   standard customer configuration review and install testing process.

   This document assumes that the IXP is operating a hardware platform
   that can provide its customers with a large MTU service. Most modern
   hardware provides support for Jumbo Frames.

   If an IXP can only operate at 1,500 Byte MTU, then this document is
   not appropriate until the IXP upgrades the hardware platform.

   It's quite possible that an existing IXP is operating today with an
   MTU value above 1,500 Bytes, but has never told its customers. This
   is not recommended, but is known to work.
   It is not recommended that customers take advantage of this without
   the coordination of the IXP operator. See below.

3.  MTU Size Considerations

   The default payload MTU on Ethernet is 1,500 Bytes. This is defined
   by the IEEE 802 specification. There is normally no configuration
   required by network or IXP operators to ensure that clean
   communications are provided to interconnected networks (IXP
   customer-to-customer communications). All Ethernet hardware operates
   at 1,500 Byte MTU by default, including switches, routers, servers,
   end-user computers, etc.

   Jumbo Frame support is provided by many hardware vendors and some
   non-Ethernet based systems also have greater than 1,500 Byte MTUs.
   As IP packets can be transported by many different media types
   (Ethernet, Token rings or FDDI rings, POS, Radio links, VPNs,
   Tunnels, etc.), the IP protocol can handle nearly any MTU size.

   IXPs mainly use Ethernet fabrics and layer 2 communications on
   Ethernet fabrics require matching MTU sizes.

   Adding support for Jumbo Frames means that a higher MTU value needs
   to be picked.

   1,500 Bytes  This is the default from the IEEE 802 specifications.

   4,352 Bytes  FDDI as defined in RFC 1390 [RFC1390].

   4,470 Bytes  SONET POS links along with older switches use this.

   9,000 Bytes  Less than an absolute maximum value, but a number that's
      easy to remember [JET2007].

   9,170 Bytes  Used by some hardware.

   9,174 Bytes  Used by some hardware. Used by CERN.

   9,180 Bytes  Used by some hardware. Used by Internet2/Abilene
      Backbone [SUMMERHILL2003], CalREN, etc.

   9,192 Bytes  Used by some hardware.

   9,216 Bytes  Used by some hardware.

   An extensive study of Jumbo Frame sizes can be found in a
   presentation by Joe St Sauver in 2003 [SAUVER2003].

   The MTU size picked also needs to address the potential of a frame
   being transported via an encapsulation protocol that reduces overall
   frame size.
   Encapsulation could exist within the transport from the router to
   the IXP and reduce the customer's MTU. This means that using the
   absolute maximum value of the hardware platform could cause issues
   for customers.

   IXP operators can choose from many hardware vendors. There's no
   industry standard for an exact Jumbo Frame size, so it varies by
   vendor and sometimes even by platform. In addition, an IXP operator
   can configure the fabric to nearly any size below their hardware
   maximum.

3.1.  Jumbo Frame size recommendation

   It's RECOMMENDED that Jumbo Frames are defined as 9,000 Bytes.

   The choice of 9,000 Bytes is based on experience in the Networking
   and Information Technology Research and Development (NITRD) - Large
   Scale Network (LSN) Joint Engineering Team (JET) community [JET2007].
   It's considered to be an easy to recall number and hence reduces
   misconfiguration.

   If an IXP operator is going to introduce a Jumbo Frame service, it's
   RECOMMENDED that they pick 9,000 Bytes. Smaller numbers are not
   useful anymore (the 4,470 value is a legacy value). While values
   substantially over 9,000 Bytes may be supported by some vendors,
   support for substantially larger values is incomplete at best.

   9,000 Bytes easily provides support for a TCP or UDP payload of 8,192
   Bytes. Protocols like NFS and iSCSI use 8,192 Bytes for data as this
   matches multiples of physical disk sector sizes along with CPU
   virtual memory mapping systems.

   The value 9,100 Bytes SHOULD NOT be used as this cannot be supported
   by all hardware (even if it's also an easy number to recall).

3.2.  Jumbo Frame size example router configurations

   Cisco example.

   !
   interface gigabitethernet 1/1
    mtu 9216
    ip mtu 9000
    ipv6 mtu 9000
   !

   !
   interface vlan 1000
    mtu 9216
    ip mtu 9000
    ipv6 mtu 9000
   !

   Juniper example.

   interface xe-0/1/0
     mtu 9000
     unit 0
       family inet
         mtu 9000
       family inet6
         mtu 9000

   Brocade/Foundry example.

   !
   default-max-frame-size 9216
   !
   interface ve 81
    ip mtu 9000
    ipv6 mtu 9000
   !

3.3.  Jumbo Frame size limitations

   There is a maximum to the size of an Ethernet frame as long as it's
   represented within the link layer size field. Hardware design
   normally dictates that a memory buffer of a specific size needs to
   be reserved or configured into hardware. This usually limits the
   maximum size of a packet.

   Every Ethernet frame has a calculated CRC value to make sure the data
   does not get a bit-level error. With the size of the CRC used by the
   IEEE 802 Ethernet specifications it's not clear that frames larger
   than approximately 9,000 Bytes are well protected. Updates to the
   IEEE 802 specification to implement larger CRCs could allow
   protection of larger frames; however, this subject is outside of the
   scope of this document.

   Jumbo Frame links that are surrounded by standard MTU valued links
   will never be used by end-to-end communications. For example, a
   9,000 Byte MTU link surrounded by 1,500 Byte MTU links will never see
   a packet greater than 1,500 Bytes pass via the IXP.

      ---------       ---------         ---------       ---------
   ---| RTR-A |-1,500-| RTR-B |--9,000--| RTR-C |-1,500-| RTR-D |---
      ---------       ---------         ---------       ---------

   This could simply be put down to future-proofing a network link. In
   fact many IP backbones operate with 4,470 Byte or ~9,000 Byte
   long-haul links without any detrimental issues, even if customers
   only see a 1,500 Byte end-to-end service.

3.4.  Consistent MTU Sizes

   A maximum sized packet can be sent from a device with a smaller MTU
   to a device with a larger MTU; however, a larger MTU device can't
   send its maximum sized packet to a smaller MTU device.
   A frame sent that's larger than the receiver's MTU will produce an
   incoming error.

   A vast majority of Ethernet users have never experienced this issue,
   as it's unique to Jumbo Frame configurations. Users have simply
   lived with the default 1,500 Byte packet size preconfigured on each
   and every device.

   When two devices communicate over a shared fabric, it's important
   that both entities have the same MTU value. On an IXP fabric where
   all peering networks are using the default MTU value of 1,500 Bytes,
   there's no issue with communications. Should a network configure a
   different MTU value than other devices on a shared fabric, there's a
   possibility of a packet not being received by the destination device.

   That means IXP operators have to coordinate any change to the
   fabric's MTU with every customer. If an additional MTU is provided
   it must be kept on a different hardware platform, specific ports or
   specific VLANs.

4.  Methods of coordinating MTU changes or adding larger MTU values

   Various methods exist for IXPs to operate with more than one MTU
   value.

   a.  Provide two untagged ports, one with the de-facto MTU of 1,500
       Byte packets and one for the larger MTU value. The IXP fabric
       should be configured so the two different MTUs are kept separate.
       This assumes the IXP and customer have additional network ports
       to support the larger MTU. Billing for additional ports is not
       within the scope of this document.

   b.  Add a duplicate IXP hardware platform configured with the larger
       MTU value. With this configuration the two different MTU values
       never touch. This assumes the IXP operator has additional
       hardware for the new fabric and that the customer has additional
       network ports to connect to that new IXP fabric. This assumes
       the IXP operator has additional space and power for the new
       fabric, along with the additional operational overhead required.
       Billing for additional fabric and ports is not within the scope
       of this document.

   c.  Coordinate a specific cutover date/time and have all IXP
       customers reconfigure at that cutover time. Customers that don't
       reconfigure will run the risk of losing operational abilities.
       This also assumes that every customer has network hardware
       capable of the larger MTU value. This is not a recommended
       solution as it removes support for 1,500 Byte MTU communications.

   d.  Add a second IP range on the existing switch fabric dedicated to
       the larger MTU and coordinate a time to increase all switch
       interfaces to the larger MTU size. Existing 1,500 Byte MTU
       communications can continue as-is using the existing IP range.
       New larger MTU communications can use a new IP range. It's
       unclear whether this configuration works in the real world, as
       MTU values are defined by port or virtual port rather than by IP
       range. This is not a configuration recommended by this document.

   e.  Provide each customer a tagged port with one VLAN set up for
       1,500 Byte MTU services and another VLAN set up for the larger
       MTU service. Existing customers who want to implement Jumbo
       Frame support can choose a cutover time to move from untagged to
       tagged ports. Existing 1,500 Byte MTU sessions will continue on
       a VLAN on that tagged port. New customers can be enabled with
       tagged ports at service delivery time. This is the configuration
       recommended by this document.

   All methods require coordination with the customer to verify
   configuration correctness. All methods assume the IXP operator can
   carry the additional operational overhead required to support this
   offering. IXPs that presently use quarantine ports or VLANs already
   have processes in place to verify new customers are configured
   correctly. Providing Jumbo Frame support requires the customer to
   adjust their configuration and be in sync with the IXP configuration.

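   The verification step described above can be sketched in a few
   lines. This is a minimal sketch, assuming the operator has already
   measured each customer's MTU by some means; the customer names, the
   measured values and the function name are all hypothetical.

```python
def find_mtu_mismatches(fabric_mtu: int, measured: dict[str, int]) -> dict[str, str]:
    """Classify customers whose measured MTU differs from the fabric MTU."""
    problems = {}
    for customer, mtu in sorted(measured.items()):
        if mtu < fabric_mtu:
            # Frames up to fabric_mtu sent by peers may be dropped at this port.
            problems[customer] = "too small"
        elif mtu > fabric_mtu:
            # This customer may send frames that other peers cannot receive.
            problems[customer] = "too large"
    return problems


# Hypothetical measurements on a 9,000 Byte Jumbo Frame VLAN.
measured = {"cust-a": 9000, "cust-b": 1500, "cust-c": 9216}
mismatches = find_mtu_mismatches(9000, measured)
```

   A check of this kind could run as part of an existing quarantine or
   provisioning process, before a customer port is moved onto the
   Jumbo Frame fabric.
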
   Whatever method is chosen, it's in the interest of the IXP and its
   customers to encourage customers to enable Jumbo Frame support.

5.  Changing MTU using a Flag-Day approach

   IXP operators can assign a flag-day to coordinate a change to the MTU
   value. This requires communications and coordination with all
   customers. It also assumes all customers on that fabric are capable
   of Jumbo Frames.

   One advantage of a flag-day is that it allows the IXP provider to
   remove legacy setups rather than support them forever.

   This is also needed if a current Jumbo Frame enabled VLAN is being
   updated from one Jumbo Frame size to a different one (e.g., from
   4,470 Bytes to 9,000 Bytes).

6.  Testing customer MTU values

   An IXP operator can test the customer port MTU setting via a simple
   ping [PING] packet. ICMP filtering on the customer's router could
   impede this testing. Assuming the test host is connected via a large
   MTU size path to the IXP, the testing setup can check each customer
   port to confirm the MTU configuration is correct.

   To use a ping packet with IPv4 you are required to set the DF bit.
   For IPv6 there's no fragmentation of packets in transit; it's only
   done at the host level. If you use a server for testing, then the
   "ping6 -m" (or equivalent option) should be used to control the
   kernel packet processing and force no fragmentation at the packet
   level.

   Assuming the customer responds to an ICMP ping packet, then a ping
   with an incrementing packet size will measure the customer-configured
   MTU value. Commands like tracepath or tracepath6 [TRACEPATH] can be
   used for these tests.

   It's important that the IXP provider has each and every customer set
   up with the identical MTU value.

6.1.  MTU Testing Example

   An existing IXP did a review of its Jumbo Frame enabled customers.
   The IXP has a 4,470 Byte MTU VLAN and had informed all its customers
   to operate at 4,470 Bytes MTU.

   +----------+--------------+----------+-----------------------+
   | Customer | Measured MTU | Correct? | Works?                |
   +----------+--------------+----------+-----------------------+
   | Most     | 4,470        | Yes      | Yes                   |
   | Cust-X   | 1,500        | No       | Incorrect!            |
   | Cust-Y   | 4,484        | No       | Incorrect (but works) |
   | Cust-Z   | 9,000        | No       | Incorrect (but works) |
   +----------+--------------+----------+-----------------------+

            Data from testing on a Jumbo Frame enabled IXP

                   Table 3: Testing IXP customers

   The customer responding with the 1,500 Byte MTU is likely to be
   having operational issues with other peers at that IXP. Any packet
   greater than 1,500 Bytes sent towards that customer port will be
   dropped. A small MTU router can send a packet to a large MTU router;
   however, if a large MTU router sends a packet to a small MTU router
   and that packet is greater than the receiver's MTU, then the packet
   will be dropped by the receiver with a layer 2 framing error. The
   customers operating with an MTU of 4,484 or 9,000 Bytes may have
   their IP MTU set at 4,470 Bytes and hence operate correctly. Or they
   may just be lucky and never see a large packet flow across their
   links.

   Further investigation showed that with at least one IP router
   platform there's a Maximum Receive Unit (MRU) size on the Ethernet
   interfaces that's based on the physical interface's memory size.
   This allows inbound packets that are larger than the MTU setting. In
   the case of a ping packet with the DF bit set, the response is
   fragmented to match the router's MTU.

7.  Customer affecting issues

   Customers may not like changes within the IXP setup. IXP operators
   have various choices when it comes to implementing Jumbo Frames.

   a.  Decide to completely ignore the requirement and define the IXP as
       a 1,500 Byte MTU only IXP.

   b.  Decide to implement Jumbo Frames at the point when the IXP
       operator announces and creates the IXP (this assumes we are
       talking about a new IXP).

   c.  Allow customers to pick how they connect to the IXP. Customers
       can choose to connect with only one port and only one MTU size;
       with two or more untagged ports, each set to one of the MTU
       values operated by the IXP; with one single tagged port allowing
       access to the MTU values operated by the IXP; or by some other
       method specific to the IXP.

   d.  For IXP operators that allow private VLANs between customers,
       the MTU value should be defined and, if the IXP implements Jumbo
       Frames, the value should be communicated to the customers at
       each port associated with the private VLAN.

   There's no need to provide each customer with the same setup;
   however, operational issues should be addressed if customer
   configuration is not consistent. Clear documentation and a clear
   provisioning process will be required.

8.  Addressing Plans

   Adding support for Jumbo Frames within an IXP could require
   additional addressing schemes for layer 2 and layer 3. This assumes
   the existing 1,500 Byte MTU customer connection stays.

8.1.  IPv4/IPv6 Addressing Plans

   Technically a large MTU path between two networks could run in
   parallel with a standard 1,500 Byte MTU connection between the same
   networks. If that is the case, then it's useful for the IXP operator
   to provide a different IP network range, but to use a similar IP
   addressing scheme for each path. This means that if a specific
   prefix is used for an IPv4 /24 or an IPv6 /64 allocated to an
   exchange fabric, with the rest of the address allocated to the
   customer, then the same final part of the address should be used for
   the large MTU connection. For example, repeat the last octet if it's
   an IPv4 address or the last 64 bits if it's an IPv6 address.

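   The rule in the preceding paragraph (keep the customer-specific part
   of the address and change only the fabric prefix) can be sketched
   with Python's standard ipaddress module. This is an illustrative
   sketch only: the function name is hypothetical, both fabrics are
   assumed to use the same prefix length, and the prefixes are
   documentation ranges.

```python
import ipaddress


def jumbo_address(addr: str, std_net: str, jumbo_net: str) -> str:
    """Map a customer's address on the standard fabric to the Jumbo fabric."""
    ip = ipaddress.ip_address(addr)
    std = ipaddress.ip_network(std_net)
    jumbo = ipaddress.ip_network(jumbo_net)
    if ip not in std:
        raise ValueError(f"{addr} is not inside {std_net}")
    if std.version != jumbo.version or std.prefixlen != jumbo.prefixlen:
        raise ValueError("fabrics must share address family and prefix length")
    # Host part = offset of the address from the start of its network.
    host_part = int(ip) - int(std.network_address)
    return str(jumbo.network_address + host_part)


v4 = jumbo_address("192.0.2.25", "192.0.2.0/24", "198.51.100.0/24")
v6 = jumbo_address("2001:db8:10::25", "2001:db8:10::/64", "2001:db8:11::/64")
```

   Deriving the second address mechanically, rather than assigning it
   by hand, is one way to keep the two fabrics' numbering in step.
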
   For example, if the IXP uses 192.0.2.0/24 (or 2001:DB8:10::/64) today
   and has 198.51.100.0/24 (or 2001:DB8:11::/64) allocated for the new
   Jumbo Frame services, then:

      192.0.2.NN for customer NN

      198.51.100.NN for customer NN on Jumbo Frame service

   Or for IPv6:

      2001:DB8:10::NN for customer NN

      2001:DB8:11::NN for customer NN on Jumbo Frame service

   The goal is to make sure that customers always communicate with
   customers set up with a like MTU value.

   It's noted that IXP operators will have to acquire additional IP
   space for the Jumbo Frame network addressing. This is left outside
   the scope of this document.

8.2.  VLAN Numbering Plans

   If the IXP operator provides tagged ports to implement different MTU
   values, then the operator should allocate VLAN numbers that are
   compatible with the customer base.

   The choices available to an IXP operator depend on the hardware
   platform:

   a.  Some IXP hardware platforms will require the same VLAN number to
       be used for all customer ports.

   b.  Some will allow the VLAN number to be set on a per-port
       per-customer basis.

   Allowing the VLAN to be set on a per-port per-customer basis could
   cause confusion and/or provisioning issues. This is for the IXP
   operator to decide.

   Customers may have limited choices on their VLAN configuration. Some
   customer hardware platforms do not allow the same VLAN number to be
   used for different purposes on the same router.

   IXP operators should consider coordinating with other IXP operators
   in their region so the VLAN numbers are not overlapping.

   The IXP operator can choose an arbitrary VLAN number from the IEEE
   802.1Q [IEEE802_1Q] specification range. VLAN numbers 0 and 4,095
   are reserved, as per the specification. VLAN number 1 is used by
   many platforms to denote the default VLAN and hence should also be
   avoided.

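   The constraints in the preceding paragraph are easy to check
   mechanically. The following is a minimal sketch with a hypothetical
   helper name; treating VLAN 1 as unusable follows the advice above
   rather than the IEEE specification itself.

```python
RESERVED_VLANS = {0, 4095}  # reserved by IEEE 802.1Q
DEFAULT_VLAN = 1            # default VLAN on many platforms; avoided here


def usable_ixp_vlan(vlan_id: int) -> bool:
    """Return True if vlan_id is a sensible choice for an IXP VLAN."""
    if not 0 <= vlan_id <= 4095:
        return False  # outside the 12-bit VLAN ID space
    return vlan_id not in RESERVED_VLANS and vlan_id != DEFAULT_VLAN


checks = {v: usable_ixp_vlan(v) for v in (0, 1, 2, 1000, 4094, 4095, 4096)}
```

   A provisioning system could apply a test like this before accepting
   a VLAN number for a new 1,500 Byte or Jumbo Frame VLAN.
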
   The IEEE 802.1ad [IEEE802_1AD] Provider Bridges standard, commonly
   called Q-in-Q, is not applicable to IXP operators implementing Jumbo
   Frames.

9.  IXPs Operating Route Server Configuration

   If a route server is provided by the IXP operator on the 1,500 Byte
   MTU fabric, then another instance of the route server has to operate
   on the Jumbo Frame MTU fabric and be configured with the correct
   Jumbo Frame MTU.  Hence the Jumbo Frame route server hardware needs
   to support Jumbo Frames on its Ethernet interface.

   It's important that a customer network is never provided a next hop
   that's on a port that would drop an incorrectly sized packet.

   BGP sessions can use a larger MSS and MTU when a peering session is
   initiated.  The ability to choose a different MSS depends heavily on
   the configuration on each side of the BGP session.  IXPs that
   implement Jumbo Frames on their route servers should report the
   negotiated MSS for each BGP session.

10.  Known issues for IXPs to consider

   Increasing the MTU size has a cost at the network layer.  The IXP
   operator should weigh the following issues for their effect on
   performance, reliability, cost and operations.

   a.  As stated above, it's not clear that frames larger than
       approximately 9,000 Bytes are well protected by the existing
       IEEE 802 checksum method.  IXP operators that measure error
       counters on interfaces should consider providing customers
       access to their port error statistics (along with their traffic
       statistics).

   b.  Jumbo Frames do not have a size defined by the IEEE; hence the
       strong recommendation that IXP operators choose 9,000 Bytes for
       their Jumbo Frame implementation.  Each IXP can choose a
       different number; however, consistency amongst IXP operators
       will be a plus.

   c.  IXP operators should understand that a larger MTU packet will
       potentially require additional transmission time and buffer
       memory.  Packets may experience a larger delay and potentially a
       different or greater jitter value.

   d.  IXP operators should realize that any misconfigured customer-to-
       customer communication, with disparate MTU values, has the
       potential to fail without any useful reporting at the IP or
       layer 4 level.  No PMTU (Path MTU) message will be generated
       should a large MTU packet be sent to a port configured with a
       smaller MTU.

   e.  Jumbo Frame support is not intended to change existing end-to-end
       packet communications if the end-nodes are configured at 1,500
       Byte MTU (or lower).  Only end-to-end communications where a
       larger MTU path exists along the whole source to destination
       path will take advantage of IXPs with larger MTUs.

   IXPs should consider recommending that existing and new customers
   enable the larger MTU connection alongside the existing 1,500 Byte
   connection, as this makes a larger MTU available should an end-to-
   end path require it.

   This document does not address how an IXP will present these issues
   to its customers or charge for any mitigation of these issues.

   In order to encourage the deployment of Jumbo Frames, it's
   recommended that IXP operators only charge customers if there is a
   physical difference in their offering.

10.1.  PMTU (Path MTU) issues

   IP has two Path MTU Discovery (PMTU) mechanisms to handle packets
   traveling along a path with varying MTU values for various links in
   the path.

   The IPv4 Path MTU Discovery protocol, RFC 1191 [RFC1191], is often
   considered NOT to work.  See RFC 2923 [RFC2923] [SAUVER2003].  The
   IPv6 Path MTU Discovery protocol, RFC 1981 [RFC1981], is considered
   to work.
   However, neither the IPv4 nor the IPv6 PMTU method will work if the
   layer 2 fabric has mismatched MTU values.

10.2.  IXP Customer BGP sessions

   IXP customers set up BGP sessions via an IXP to enable inter-customer
   routing.  For Jumbo Frame enabled IXPs, the customers can set up one
   session or more than one session, depending on the MTU match between
   the two customers.

   +--------------------+--------------------+------------------------+
   | Customer-A MTU     | Customer-B MTU     | Choices                |
   +--------------------+--------------------+------------------------+
   | 1,500 Byte         | 1,500 Byte         | Can only do 1,500 Byte |
   | 1,500 Byte         | 9,000 Byte         | Can't communicate      |
   | 1,500 Byte         | 9,000 & 1,500 Byte | Can only do 1,500 Byte |
   | 9,000 Byte         | 1,500 Byte         | Can't communicate      |
   | 9,000 & 1,500 Byte | 1,500 Byte         | Can only do 1,500 Byte |
   | 9,000 Byte         | 9,000 Byte         | Can only do 9,000 Byte |
   | 9,000 & 1,500 Byte | 9,000 & 1,500 Byte | Can do one or both     |
   +--------------------+--------------------+------------------------+

              Table 4: BGP session setup for IXP customers

   If the two customers are on both the 1,500 Byte and 9,000 Byte
   fabrics, then special care should be taken by the IXP customers to
   confirm their path prefers the 9,000 Byte fabric.  This is done so
   the advantages of the Jumbo Frame fabric will be realized.

   This can be done by only enabling the Jumbo Frame BGP session or by
   keeping the 1,500 Byte BGP session active, but with a lower priority
   so the routes prefer the next hop associated with the Jumbo Frame
   fabric.

   IXP customers should note that an extra BGP session will require
   additional BGP resources, but will provide resilience should the
   Jumbo Frame fabric fail for any reason.

   Outside of the IXP's general operating rules, the BGP session
   configuration is not within the control of the IXP.

10.3.  IXP Operator Service Level Agreements (SLAs)

   This document does not state whether an IXP operator has to change
   its SLA to handle Jumbo Frames.  That's within the control of the
   IXP operator.

11.  Customer Requirements outside of the IXP operator's control

   Many customers may opt to take Jumbo Frame services from an IXP even
   if they will never send a packet greater than 1,500 Bytes.  The IXP
   operator should not discourage this behavior, as it could be
   considered future-proofing of their network.

   If the IXP has a higher charge for Jumbo Frames and a customer
   decides to accept those additional charges but never sends a large
   packet, then this is also acceptable.  The customer is allowed to do
   anything they want, within technical reason.

   Customers may have a requirement from their own customer base to
   provide end-to-end large MTU services where possible, even if that
   customer base never sends a large packet.  This reflects the
   hierarchical nature of the Internet and is not the concern of the
   IXP operator as long as the IXP operator is satisfied with the
   service level it is providing.

12.  IANA Considerations

   This memo includes no request to IANA.

13.  Security Considerations

   The support of Jumbo Frames at IXPs doesn't have any direct impact on
   Internet infrastructure security.

   If there were a security issue related to using Jumbo Frames, then
   providing Jumbo Frame support within IXPs would simply extend the
   potential source locations of that threat.  Firewalling, filtering
   or protection at any point on the path does not change when Jumbo
   Frames are provided on IXPs.

   Security monitoring facilities may need to be upgraded to tolerate
   and handle Jumbo Frames.  Existing hardware may only capture and
   report on packets up to 1,500 Bytes.

14.  Acknowledgements

   I would like to acknowledge the encouragement and many contributions
   I received from people with large MTU experience: Bobby Cates
   (NASA), Greg Hankins (Brocade, was Force10 [HANKINS2008]), Kurt-Erik
   Lindqvist (NETNOD), Peter Lothberg (STUPI and now DTAG), Kevin
   Oberman (retired from ESnet), Joe St Sauver, Ph.D. (University of
   Oregon [SAUVER2003]), Maksym Tulyu (AMS-IX [TULYU2011]) and Mathias
   Wolkert (NETNOD).

   A special thanks goes out to Selina Lo, who in the late 90's
   introduced me to the wonders of a working Ethernet Jumbo Frame
   implementation.

   I would also like to thank the following people with extensive
   global peering experience for their contributions: Andy Davidson
   (LoNAP & Hurricane Electric), Roque Gagliano (Cisco), Mike Leber
   (Hurricane Electric) and Doug Wilson (Yahoo!).

15.  References

15.1.  Normative References

   [RFC1042]  Postel, J. and J. Reynolds, "Standard for the transmission
              of IP datagrams over IEEE 802 networks", STD 43, RFC 1042,
              February 1988.

   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
              November 1990.

   [RFC1390]  Katz, D., "Transmission of IP and ARP over FDDI Networks",
              STD 36, RFC 1390, January 1993.

   [RFC1981]  McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery
              for IP version 6", RFC 1981, August 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2516]  Mamakos, L., Lidl, K., Evarts, J., Carrel, D., Simone, D.,
              and R. Wheeler, "A Method for Transmitting PPP Over
              Ethernet (PPPoE)", RFC 2516, February 1999.

   [RFC2923]  Lahey, K., "TCP Problems with Path MTU Discovery",
              RFC 2923, September 2000.

   [RFC894]   Hornig, C., "A Standard for the Transmission of IP
              Datagrams over Ethernet Networks", RFC 894, April 1984.

15.2.  Informative References

   [HANKINS2008]
              Hankins, G., Provo, R., and T. Scholl, "Peering Survey
              2008 Results", May 2008.

   [IEEE802_1AD]
              "802.1ad - Provider Bridges", May 2006.

   [IEEE802_1Q]
              "802.1Q - Virtual LANs", November 2006.

   [Internet2_LSR]
              "Internet2 Land Speed Record", November 2011.

   [JET2007]  "Recommendation on IP MTU for the JET community",
              April 2007.

   [MATHIS2002]
              Mathis, M., "Raising the Internet MTU", November 2002.

   [NANOG2003]
              Cottrell, L., "Achieving Record Speed TransAtlantic End-
              to-end TCP Throughput", June 2003.

   [NANOG2008]
              Scholl, T., "NANOG42 - Increasing the MTU of the
              Internet", February 2008.

   [PING]     "ping, ping6 - send ICMP ECHO_REQUEST to network hosts",
              November 2007.

   [SAUVER2003]
              St Sauver, J., "Practical Issues Associated With 9K MTUs",
              February 2003.

   [SUMMERHILL2003]
              Summerhill, R., ""Jumbo" Frames and Internet2",
              February 2003.

   [TRACEPATH]
              Kuznetsov, A., "tracepath, tracepath6 - traces path to a
              network host discovering MTU along this path",
              November 2007.

   [TULYU2011]
              Tulyu, M., "Jumbo Frames in AMS-IX version 0.3",
              November 2011.

Author's Address

   Martin J. Levy
   Hurricane Electric
   760 Mission Court
   Fremont, CA  94359
   US

   Phone: +1 510 580-4100
   Email: martin@he.net
   URI:   http://he.net/