INTERNET-DRAFT                                           Tathagata Nandy
Intended Status: Proposed Standard                                   HPE
                                                      Utkarsh Srivastava
                                                                     HPE
Expires: 18 July 2021                                   January 18, 2021

                           Multicast Path MTU
               draft-nandy-utkarsh-pim-mcast-path-mtu-00

Abstract
   Path MTU discovery (RFC 1191) is a standard technique for
   determining the supported MTU between two Internet Protocol (IP)
   hosts in order to avoid fragmentation. In a multicast distribution
   tree, the source does not know where the receivers are located, so
   the technique used to compute the path MTU for a unicast stream
   does not work in a multicast network. This document describes a
   method to discover the multicast path MTU, with the goal of
   avoiding the traffic loss that multicast streams suffer because of
   incorrect MTU settings and the lack of path MTU support in
   multicast networks.

Status of This Memo
   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   This Internet-Draft will expire on 18 July 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Conventions used in this document
   3. Problem Statement
   4. Multicast Path MTU
   5. IANA Considerations
   6. Security Considerations
   7. References
      7.1. Normative References
      7.2. Informative References
   8. Acknowledgments
   9. Authors' Addresses

1. Introduction
   When one IP host has a large amount of data to send to another
   host, the data is transmitted as a series of IP datagrams. It is
   usually preferable that these datagrams be of the largest size that
   does not require fragmentation anywhere along the path from the
   source to the destination. (For the case against fragmentation,
   see [5].) This datagram size is referred to as the Path MTU (PMTU),
   and it is equal to the minimum of the MTUs of each hop in the path.
   A shortcoming of the current Internet protocol suite is the lack of
   a standard mechanism for a host to discover the PMTU of an
   arbitrary path.
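
   As a small worked example of the definition above, the following
   Python sketch computes a PMTU as the minimum of the per-hop MTUs.
   The function name path_mtu and the hop values are assumptions
   chosen for illustration; they do not come from any specification.

```python
from typing import Sequence


def path_mtu(hop_mtus: Sequence[int]) -> int:
    """Return the Path MTU: the smallest MTU of any hop on the path."""
    if not hop_mtus:
        raise ValueError("a path needs at least one hop")
    return min(hop_mtus)


# Hypothetical four-hop path; the second hop is the bottleneck.
example_hops = [1500, 1400, 9000, 1500]
print(path_mtu(example_hops))  # 1400
```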

   Note: The Path MTU is what in [1] is called the "Effective MTU for
   sending" (EMTU_S). A PMTU is associated with a path, which is a
   particular combination of IP source and destination address and
   perhaps a Type-of-service (TOS). The current practice [1] is to use
   the lesser of 576 and the first-hop MTU as the PMTU for any
   destination that is not connected to the same network or subnet as
   the source.

   In computer networking, multicast is group communication where
   data transmission is addressed to a group of destination computers
   simultaneously. Multicast can be one-to-many or many-to-many
   distribution. Multicast should not be confused with physical layer
   point-to-multipoint communication.

   Ethernet frames with a value of 1 in the least-significant bit of
   the first octet of the destination address are treated as
   multicast frames and are flooded to all points on the network.
   This mechanism constitutes multicast at the data link layer, and
   IP multicast uses it to achieve one-to-many transmission for IP on
   Ethernet networks. Modern Ethernet controllers filter received
   packets to reduce CPU load by looking up the hash of a multicast
   destination address in a table, initialized by software, which
   controls whether a multicast packet is dropped or fully received.

   IP multicast is a technique for one-to-many communication over an
   IP network. The destination nodes send Internet Group Management
   Protocol (IGMP) join and leave messages, for example in the case
   of IPTV when the user changes from one TV channel to another.
   Multicast uses network infrastructure efficiently by requiring the
   source to send a packet only once, even if it needs to be
   delivered to a large number of receivers. The nodes in the network
   take care of replicating the packet to reach multiple receivers
   only when necessary.

2. Conventions used in this document
2.1. Terminology
   The reader is assumed to be familiar with the terminology,
   reference models, and taxonomy defined in [RFC4664] and [RFC4665].
   For readability purposes, we repeat some of the terms here.
   Moreover, we also propose some other terms needed when IP multicast
   support is discussed.

   Multicast domain
      An area in which multicast data is transmitted. In this
      document, this term has a generic meaning that can refer to
      Layer-2 and Layer-3. Generally, the Layer-3 multicast domain is
      determined by the Layer-3 multicast protocol used to establish
      reachability between all potential receivers in the
      corresponding domain. The Layer-2 multicast domain can be the
      same as the Layer-2 broadcast domain (i.e., VLAN), but it may be
      restricted to being smaller than the Layer-2 broadcast domain if
      an additional control protocol is used.

   PIM-SM
      Protocol Independent Multicast Sparse Mode (PIM-SM) is a
      multicast routing protocol for Internet Protocol (IP) networks
      that provides one-to-many and many-to-many distribution of data
      over a LAN, WAN or the Internet. It explicitly builds
      unidirectional shared trees rooted at a rendezvous point (RP)
      per group, and optionally creates shortest-path trees per
      source. PIM-SM uses shared trees by default and implements
      source-based trees for efficiency; it assumes that no hosts want
      the multicast traffic unless they specifically ask for it.
      Senders first send the multicast data to the RP, which in turn
      sends the data down the shared tree to the receivers.

   PIM-SSM
      PIM source-specific multicast (SSM) uses a subset of PIM sparse
      mode and IGMP version 3 (IGMPv3) to allow a client to receive
      multicast traffic directly from the source. PIM-SSM uses the
      PIM sparse-mode functionality to create a shortest-path tree
      (SPT) between the receiver and the source, but builds the SPT
      without the help of an RP.

2.2. Conventions
   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in [RFC2119].

3. Problem Statement
3.1. Motivation
   Path MTU discovery computes the lowest MTU supported between two
   hosts in order to avoid IP fragmentation. For a unicast packet,
   the source device sends out a packet with the Don't Fragment (DF)
   flag bit set in the IP header [1]. Any device along the path whose
   MTU is smaller than the packet drops the packet and sends back an
   ICMP Destination Unreachable message with code 4 (fragmentation
   needed and DF set) containing its MTU; in IPv6 the equivalent is
   the ICMPv6 Packet Too Big (Type 2) message. This allows the source
   host to reduce its Path MTU appropriately. The process is repeated
   until the MTU is small enough to traverse the entire path without
   fragmentation.

   In a multicast distribution tree, the source does not know which
   hosts belong to a multicast group until the complete multicast
   tree is built. Hosts in different branches of the tree use
   IGMP/MLD, followed by PIM, to become part of the multicast tree.
   Generally, the process starts at a host, which sends an IGMP join
   to request membership of the multicast tree. The request is then
   propagated toward the RP, and the source and the group thereby
   develop a common path. Because the source never learns the
   individual receivers, the technique described above may not work
   for multicast flows.

3.2. Scalability
   Most routers do not send ICMP (unreachable; fragmentation needed)
   messages in response to too-big IPv4 multicast packets with the DF
   bit set. They silently drop these packets, which breaks PMTUD.
   This behavior is by design: Section 7.2 of RFC 1112 specifies that
   an ICMP error message (Destination Unreachable, Time Exceeded,
   Parameter Problem, Source Quench, or Redirect) is never generated
   in response to a datagram destined to an IP host group.
   There is also a practical reason for prohibiting ICMP error
   messages in response to multicast datagrams: the processing done
   on ICMP error replies by the *nix socket API might block the
   sender socket if an error comes back from a single receiver, or if
   the TTL expires while traversing a particularly long branch of the
   multicast tree, which is undesirable in a multicast environment.

4. Multicast Path MTU
   Multicast path MTU discovery between a source and the hosts of a
   particular group proceeds as follows.

   1.  The router connected to the sender device periodically sends
       probe messages to a well-known multicast group that falls in
       the PIM-SSM range. The probe packets are small packets whose
       destination IP address falls in the SSM group range. This
       address should be reserved and should not be used for any
       regular multicast stream.

   2.  The probe packets are distinct from the actual packets that
       the source is sending. The algorithm runs on the routers and
       not on the source that sends the stream.

   3.  The receiver routers also periodically probe toward the
       source(s). As part of this probing, each receiver router runs
       path MTU discovery toward the source device. Path MTU
       discovery runs only for active sources, that is, when the
       receiver routers receive the probe packets. This is why the
       sender side needs to send periodic probe packets.

   4.  This is performed at every receiver router (Designated
       Router, DR). All receiver routers use the same source, which
       is reserved specifically for PMTU computation. This is the
       PIM-SSM source for the specified group.

   5.  There are two options. The receiver router (the
       host-connected DR) can itself send a PIM Join for these groups
       toward the sources, or it can act upon receiving an IGMPv3
       join. In the latter case, the host device needs to send IGMPv3
       joins for the sources used for computing the path MTU.

   6.  The receiver DR (host-connected) computes the PMTU to the
       source by sending probe packets of different sizes.

   7.  Once the receiver router has computed the PMTU to the
       source-connected DR, it sends the PMTU to the source router
       via a new option in the PIM Join packet or a new type of PIM
       packet. A new ICMP packet is not chosen for this purpose
       because the algorithm is intended to run inside the PIM
       application.

   8.  Once the source-connected Designated Router has received the
       PMTU for all the connected paths, it computes the minimum MTU
       and sends it back to the source device. This relieves the
       source device of all computation. The source device receives
       periodic MTU updates from the routers and should never send
       packets larger than this MTU. The assumption is that every
       source implements a TCP/IP stack with ICMP support, so that it
       can handle the ICMP packets internally.

   9.  The sender device can send probe packets at a reduced
       frequency to prevent congestion.

   10. The receiver can keep sending probe packets as long as it has
       an interested host.

5. IANA Considerations
   This memo includes no request to IANA.

6. Security Considerations
   This Path MTU Discovery mechanism makes possible two
   denial-of-service attacks, both based on a malicious party sending
   false Datagram Too Big messages to an Internet host. In the first
   attack, the false message indicates a PMTU much smaller than
   reality. This should not entirely stop data flow, since the victim
   host should never set its PMTU estimate below the absolute minimum,
   but at 8 octets of IP data per datagram, progress could be slow.
   In the other attack, the false message indicates a PMTU greater
   than reality.
   If believed, this could cause temporary blockage as the victim
   sends datagrams that will be dropped by some router. Within one
   round-trip time, the host would discover its mistake (receiving
   Datagram Too Big messages from that router), but frequent
   repetition of this attack could cause lots of datagrams to be
   dropped. A host, however, should never raise its estimate of the
   PMTU based on a Datagram Too Big message, so it should not be
   vulnerable to this attack.

   A malicious party could also cause problems if it could stop a
   victim from receiving legitimate Datagram Too Big messages, but in
   this case there are simpler denial-of-service attacks available.

   In another case, if packets are persistently rejected because they
   exceed the MTU, and the sender does not change the packet size or
   the administrator does not adjust the MTU, there is a risk of a
   denial-of-service attack on the switch sending the ICMP error
   packets. Multicast packets sent at a high rate can consume the CPU
   resources of all the routers implementing the PMTU for multicast.

7. References
7.1. Normative References
   [1]  J. Mogul, S. Deering. Path MTU Discovery. RFC 1191, DECWRL
        and Stanford University, November 1990.
   [2]  J. Postel. Internet Control Message Protocol. RFC 792, ISI,
        September 1981.
7.2. Informative References
   [3]

   [4]
   [5]

8. Acknowledgments
   The authors thank the contributors of [RFC1191] and [RFC5501],
   since the structure and content of this document were, for some
   sections, largely inspired by them. The authors also thank Mark
   Pearson and others for their valuable reviews and feedback.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
   FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
   COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
   INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
   (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
   SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
   HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
   STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
   ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
   OF THE POSSIBILITY OF SUCH DAMAGE.

9. Authors' Addresses
   Tathagata Nandy
   Hewlett Packard India Software Operations Pvt. Ltd.
   Survey # 192, Whitefield Road,
   Mahadevapura Post, Bangalore 560048. India
   Phone: (+91) 9611895857
   EMail: tathagata.nandy@hpe.com

   Utkarsh Srivastava
   Hewlett Packard India Software Operations Pvt. Ltd.
   Survey # 192, Whitefield Road,
   Mahadevapura Post, Bangalore 560048. India
   Phone: (+91) 8948794936
   EMail: usrivastava@hpe.com
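
   As an illustration of steps 6 and 8 of the procedure in Section 4,
   the following Python sketch shows one way a receiver DR might
   search for its PMTU with probes of different sizes, and how the
   source-connected DR might reduce the reported values to a single
   multicast path MTU. It is a sketch under stated assumptions, not a
   definition of the mechanism: the names probe_pmtu, multicast_pmtu
   and path_accepts, the binary-search strategy, the 9000-octet upper
   bound, and the router labels are all hypothetical. The only value
   taken from a specification is the 68-octet IPv4 minimum MTU.

```python
from typing import Callable, Dict

MIN_IPV4_MTU = 68  # smallest datagram every IPv4 hop must forward unfragmented


def probe_pmtu(path_accepts: Callable[[int], bool],
               low: int = MIN_IPV4_MTU,
               high: int = 9000) -> int:
    """Step 6 (assumed strategy): binary-search the largest probe size
    that traverses the path.

    path_accepts(size) stands in for "send a probe of this size and
    report whether it arrived"; a real router would send packets here.
    """
    if not path_accepts(low):
        raise ValueError("path rejects even the minimum probe size")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always progresses
        if path_accepts(mid):
            low = mid        # probe got through: PMTU is at least mid
        else:
            high = mid - 1   # probe was dropped: PMTU is below mid
    return low


def multicast_pmtu(reports: Dict[str, int]) -> int:
    """Step 8: the source-connected DR takes the minimum of the PMTUs
    reported by all receiver DRs."""
    if not reports:
        raise ValueError("no receiver DR has reported a PMTU yet")
    return min(reports.values())


# Hypothetical receiver DR whose path has a 1400-octet bottleneck.
pmtu_a = probe_pmtu(lambda size: size <= 1400)   # 1400

# Hypothetical reports collected at the source-connected DR.
reports = {"dr-a": pmtu_a, "dr-b": 1500, "dr-c": 1280}
print(multicast_pmtu(reports))                   # 1280
```

   A binary search keeps the number of probes logarithmic in the size
   range, which fits the intent of step 9 to limit probe traffic; a
   linear sweep or a table of common MTU values would serve equally
   well for the purposes of this illustration.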