| < draft-ietf-bier-path-mtu-discovery-11.txt | draft-ietf-bier-path-mtu-discovery-12.txt > | |||
|---|---|---|---|---|
| BIER Working Group G. Mirsky | BIER Working Group G. Mirsky | |||
| Internet-Draft Ericsson | Internet-Draft Ericsson | |||
| Intended status: Standards Track T. Przygienda | Intended status: Standards Track T. Przygienda | |||
| Expires: 7 April 2022 Juniper Networks | Expires: 7 October 2022 Juniper Networks | |||
| A. Dolganow | A. Dolganow | |||
| Individual contributor | Individual contributor | |||
| 4 October 2021 | 5 April 2022 | |||
| Path Maximum Transmission Unit Discovery (PMTUD) for Bit Index Explicit | Path Maximum Transmission Unit Discovery (PMTUD) for Bit Index Explicit | |||
| Replication (BIER) Layer | Replication (BIER) Layer | |||
| draft-ietf-bier-path-mtu-discovery-11 | draft-ietf-bier-path-mtu-discovery-12 | |||
| Abstract | Abstract | |||
| This document describes Path Maximum Transmission Unit Discovery | This document describes Path Maximum Transmission Unit Discovery | |||
| (PMTUD) in the Bit Index Explicit Replication (BIER) layer. | (PMTUD) in the Bit Index Explicit Replication (BIER) layer. | |||
| Status of This Memo | Status of This Memo | |||
| This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
| provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
| skipping to change at page 1, line 35 ¶ | skipping to change at page 1, line 35 ¶ | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on 7 April 2022. | This Internet-Draft will expire on 7 October 2022. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2021 IETF Trust and the persons identified as the | Copyright (c) 2022 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents (https://trustee.ietf.org/ | |||
| license-info) in effect on the date of publication of this document. | license-info) in effect on the date of publication of this document. | |||
| Please review these documents carefully, as they describe your rights | Please review these documents carefully, as they describe your rights | |||
| and restrictions with respect to this document. Code Components | and restrictions with respect to this document. Code Components | |||
| extracted from this document must include Simplified BSD License text | extracted from this document must include Revised BSD License text as | |||
| as described in Section 4.e of the Trust Legal Provisions and are | described in Section 4.e of the Trust Legal Provisions and are | |||
| provided without warranty as described in the Simplified BSD License. | provided without warranty as described in the Revised BSD License. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | |||
| 1.1. Conventions used in this document . . . . . . . . . . . . 3 | 1.1. Conventions used in this document . . . . . . . . . . . . 2 | |||
| 1.1.1. Acronyms . . . . . . . . . . . . . . . . . . . . . . 3 | 1.1.1. Terminology . . . . . . . . . . . . . . . . . . . . . 2 | |||
| 1.1.2. Requirements Language . . . . . . . . . . . . . . . . 3 | 1.1.2. Requirements Language . . . . . . . . . . . . . . . . 3 | |||
| 2. Problem Statement . . . . . . . . . . . . . . . . . . . . . . 3 | 2. Problem Statement . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 3. PMTUD Mechanism for BIER . . . . . . . . . . . . . . . . . . 4 | 3. PMTUD Mechanism for BIER . . . . . . . . . . . . . . . . . . 4 | |||
| 3.1. Data TLV for BIER Ping . . . . . . . . . . . . . . . . . 6 | 3.1. Data TLV for BIER Ping . . . . . . . . . . . . . . . . . 6 | |||
| 4. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 7 | 4. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 6 | |||
| 5. Security Considerations . . . . . . . . . . . . . . . . . . . 7 | 5. Security Considerations . . . . . . . . . . . . . . . . . . . 7 | |||
| 6. Acknowledgment . . . . . . . . . . . . . . . . . . . . . . . 7 | 6. Acknowledgment . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
| 7. References . . . . . . . . . . . . . . . . . . . . . . . . . 7 | 7. References . . . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
| 7.1. Normative References . . . . . . . . . . . . . . . . . . 7 | 7.1. Normative References . . . . . . . . . . . . . . . . . . 7 | |||
| 7.2. Informative References . . . . . . . . . . . . . . . . . 8 | 7.2. Informative References . . . . . . . . . . . . . . . . . 7 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 8 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 8 | |||
| 1. Introduction | 1. Introduction | |||
| In packet switched networks, when a host seeks to transmit data to a | In packet switched networks, when a host seeks to transmit data to a | |||
| target destination, the data is transmitted as a set of packets. In | target destination, the data is transmitted as a set of packets. In | |||
| many cases, it is more efficient to use the largest size packets that | many cases, it is more efficient to use the largest size packets that | |||
| are less than or equal to the least Maximum Transmission Unit (MTU) | are less than or equal to the least Maximum Transmission Unit (MTU) | |||
| for any forwarding device along the routed path to the IP destination | for any forwarding device along the routed path to the IP destination | |||
| for these packets. Such "least MTU" is known as Path MTU (PMTU). | for these packets. Such "least MTU" is known as Path MTU (PMTU). | |||
| Fragmentation or packet drop, silent or not, may occur on hops along | Fragmentation or packet drop, silent or not, may occur on hops along | |||
| the route where an MTU is smaller than the size of the datagram. To | the route where an MTU is smaller than the size of the datagram. To | |||
| avoid any of the behaviors listed above, the packet source must find | avoid any of the behaviors listed above, the packet source must find | |||
| the value of the least MTU, i.e., PMTU, that will be encountered | the value of the least MTU, i.e., PMTU, that will be encountered | |||
| along the route that a set of packets will follow to reach the given | along the route that a set of packets will follow to reach the given | |||
| set of destinations. Such MTU determination along a specific path is | set of destinations. Such MTU determination along a specific path is | |||
| referred to as path MTU discovery (PMTUD). | referred to as path MTU discovery (PMTUD). | |||
| [RFC8279] introduces and explains Bit Index Explicit Replication | [RFC8279] introduces and explains Bit Index Explicit Replication | |||
| (BIER) architecture and how it supports the forwarding of multicast | (BIER) architecture and how it supports the forwarding of multicast | |||
| data packets. A BIER domain consists of Bit-Forwarding Routers | data packets. [I-D.ietf-bier-ping] introduced BIER Ping as a | |||
| (BFRs) that are uniquely identified by their respective BFR-ids. An | transport-independent OAM mechanism to detect and localize failures | |||
| ingress border router (acting as a Bit Forwarding Ingress Router | in the BIER data plane. This document specifies how BIER Ping can be | |||
| (BFIR)) inserts a Forwarding Bit Mask (F-BM) into a packet. Each | used to perform efficient PMTUD in the BIER domain. | |||
| targeted egress node (referred to as a Bit Forwarding Egress Router | ||||
| (BFER)) is represented by Bit Mask Position (BMP) in the BMS. A | ||||
| transit or intermediate BIER node, referred to as BFR, forwards BIER | ||||
| encapsulated packets to BFERs, identified by respective BMPs, | ||||
| according to a Bit Index Forwarding Table (BIFT). | ||||
| 1.1. Conventions used in this document | 1.1. Conventions used in this document | |||
| 1.1.1. Acronyms | 1.1.1. Terminology | |||
| BFR: Bit-Forwarding Router | ||||
| BFER: Bit-Forwarding Egress Router | ||||
| BFIR: Bit-Forwarding Ingress Router | ||||
| BIER: Bit Index Explicit Replication | ||||
| BIFT: Bit Index Forwarding Tree | ||||
| F-BM: Forwarding Bit Mask | ||||
| MTU: Maximum Transmission Unit | ||||
| OAM: Operations, Administration and Maintenance | ||||
| PMTUD: Path MTU Discovery | This document uses terminology defined in [RFC8279]. Familiarity | |||
| with this specification and the terminology used is expected. | ||||
| 1.1.2. Requirements Language | 1.1.2. Requirements Language | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
| "OPTIONAL" in this document are to be interpreted as described in BCP | "OPTIONAL" in this document are to be interpreted as described in BCP | |||
| 14 [RFC2119] [RFC8174] when, and only when, they appear in all | 14 [RFC2119] [RFC8174] when, and only when, they appear in all | |||
| capitals, as shown here. | capitals, as shown here. | |||
| 2. Problem Statement | 2. Problem Statement | |||
| skipping to change at page 4, line 5 ¶ | skipping to change at page 3, line 30 ¶ | |||
| primarily targeted to work on point-to-point, i.e. unicast paths. | primarily targeted to work on point-to-point, i.e. unicast paths. | |||
| These mechanisms use packet fragmentation control by disabling | These mechanisms use packet fragmentation control by disabling | |||
| fragmentation of the probe packet. As a result, a transit node | fragmentation of the probe packet. As a result, a transit node | |||
| that cannot forward a probe packet that is bigger than its link MTU | that cannot forward a probe packet that is bigger than its link MTU | |||
| sends to the packet source an error notification, otherwise the | sends to the packet source an error notification, otherwise the | |||
| packet destination may respond with a positive acknowledgment. Thus, | packet destination may respond with a positive acknowledgment. Thus, | |||
| possibly through a series of iterations, varying the size of the | possibly through a series of iterations, varying the size of the | |||
| probe packet, the packet source discovers the PMTU of the particular | probe packet, the packet source discovers the PMTU of the particular | |||
| path. | path. | |||
| Thus applied such existing PMTUD solutions are inefficient for point- | Applying such existing PMTUD solutions are inefficient for point-to- | |||
| to-multipoint paths constructed for multicast traffic. Probe packets | multipoint paths constructed for multicast traffic. Probe packets | |||
| must be flooded through the whole set of multicast distribution paths | must be flooded through the whole set of multicast distribution paths | |||
| over and over again until the very last egress responds with a | over and over again until the very last egress responds with a | |||
| positive acknowledgment. Consider without loss of generality an | positive acknowledgment. Consider the multicast network presented in | |||
| example multicast network presented in Figure 1, where MTU on all | Figure 1, where MTU on all links but one (B, D) is the same. If MTU | |||
| links but one (B, D) is the same. If MTU on the link (B, D) is | on the link (B, D) is smaller than the MTU on the other links, using | |||
| smaller than the MTU on the other links, using existing PMTUD | existing PMTUD mechanism probes will unnecessarily flood to leaf | |||
| mechanism probes will unnecessary flood to leaf nodes E, F, and G for | nodes E, F, and G for the second and consecutive times and positive | |||
| the second and consecutive times and positive responses will be | responses will be generated and received by root A repeatedly. | |||
| generated and received by root A repeatedly. | ||||
| ----- | ----- | |||
| --| D | | --| D | | |||
| ----- / ----- | ----- / ----- | |||
| --| B |-- | --| B |-- | |||
| / ----- \ ----- | / ----- \ ----- | |||
| / --| E | | / --| E | | |||
| ----- / ----- | ----- / ----- | |||
| | A |--- ----- | | A |--- ----- | |||
| ----- \ --| F | | ----- \ --| F | | |||
| skipping to change at page 5, line 5 ¶ | skipping to change at page 4, line 40 ¶ | |||
| to forward towards the subset of targeted downstream BFERs, the BFR | to forward towards the subset of targeted downstream BFERs, the BFR | |||
| responds with a partial (compared to the one it received in the | responds with a partial (compared to the one it received in the | |||
| request) bitmask towards the originating BFIR in error notification. | request) bitmask towards the originating BFIR in error notification. | |||
| That allows for retransmission of the next probe with a smaller MTU | That allows for retransmission of the next probe with a smaller MTU | |||
| addressed only towards the failed downstream BFERs instead of all BFERs | addressed only towards the failed downstream BFERs instead of all BFERs | |||
| addressed in the previous probe. In the scenario discussed in | addressed in the previous probe. In the scenario discussed in | |||
| Section 2 the second and all following (if needed) probes will be | Section 2 the second and all following (if needed) probes will be | |||
| sent only to the node D since MTU discovery of E, F, and G has been | sent only to the node D since MTU discovery of E, F, and G has been | |||
| completed already by the first probe successfully. | completed already by the first probe successfully. | |||
| [I-D.ietf-bier-ping] introduced BIER Ping as a transport-independent | ||||
| OAM mechanism to detect and localize failures in the BIER data plane. | ||||
| This document specifies how BIER Ping can be used to perform | ||||
| efficient PMTUD in the BIER domain. | ||||
| Consider the network displayed in Figure 1 to be a presentation of a | Consider the network displayed in Figure 1 to be a presentation of a | |||
| BIER domain and all nodes to be BFRs. To discover MTU over BIER | BIER domain and all nodes to be BFRs. To discover MTU over BIER | |||
| domain to BFERs D, F, E, and G BFIR A will use BIER Ping with Data | domain to BFERs D, F, E, and G BFIR A will use BIER Ping with Data | |||
| TLV, defined in Section 3.1. The size of the first probe is set to M_max, | TLV, defined in Section 3.1. The size of the first probe is set to M_max, | |||
| determined as the minimal MTU value of the BFIR's links to the BIER domain. As | determined as the minimal MTU value of the BFIR's links to the BIER domain. As | |||
| has been assumed in Section 2, MTUs of all links but the link (B, D) | has been assumed in Section 2, MTUs of all links but the link (B, D) | |||
| are the same. Thus BFERs E, F, and G would receive BIER Echo Request | are the same. Thus BFERs E, F, and G would receive BIER Echo Request | |||
| and will send their respective replies to BFIR A. BFR B may pass the | and will send their respective replies to BFIR A. BFR B may pass the | |||
| packet which is too large to forward over egress link (B, D) to the | packet which is too large to forward over egress link (B, D) to the | |||
| appropriate network layer for error processing where it would be | appropriate network layer for error processing where it would be | |||
| recognized as a BIER Echo Request packet. BFR B MUST send BIER Echo | recognized as a BIER Echo Request packet. BFR B MUST send BIER Echo | |||
| Reply to BFIR A and MUST include Downstream Mapping TLV, defined in | Reply to BFIR A and MUST include Downstream Mapping TLV, defined in | |||
| [I-D.ietf-bier-ping] setting its fields in the following fashion: | [I-D.ietf-bier-ping] setting its fields in the following fashion: | |||
| * MTU SHOULD be set to the minimal MTU value among all egress BIER | * MTU SHOULD be set to the minimal MTU value among all egress BIER | |||
| links, logical links between this and downstream BFRs, that could | links, logical links between this and downstream BFRs, that could | |||
| be used to reach B's downstream BFERs; | be used to reach B's downstream BFERs; | |||
| * Address Type MUST be set to 0 [Ed.note: we need to define 0 as | * Address Type MAY be set to any value defined in Section 3.3.4 | |||
| valid value for the Address Type field with the specific semantics | [I-D.ietf-bier-ping]. | |||
| to "Ignore" it.] | ||||
| * I flag MUST be cleared; | * I flag MUST be cleared to direct the responding BFR not to include | |||
| the Incoming SI-BitString TLV in the BIER Echo Response. | ||||
| * Downstream Interface Address field (4 octets) MUST be zeroed and | * Downstream Interface Address field MUST be zeroed. | |||
| MUST include in the Egress Bitstring sub-TLV the list of all BFERs | ||||
| that cannot be reached because the attempted MTU turned out to be | * List of Sub-TLVs MUST include the Egress Bitstring sub-TLV with | |||
| too small. | the list of all BFERs that cannot be reached because the egress | |||
| MTU turned out to be too small. | ||||
| The BFIR will receive either of the two types of packets: | The BFIR will receive either of the two types of packets: | |||
| * a positive Echo Reply from one of BFERs to which the probe has | * a positive Echo Reply from one of BFERs to which the probe has | |||
| been sent. In this case, the bit corresponding to the BFER MUST | been sent. In this case, the bit corresponding to the BFER MUST | |||
| be cleared from the BMS; | be cleared from the bitmask string (BMS); | |||
| * a negative Echo Reply with bit string listing unreached BFERs and | * a negative Echo Reply with bit string listing unreached BFERs and | |||
| recommended MTU value MTU'. The BFIR MUST add the bit string to | recommended MTU value MTU'. The BFIR MUST add the bit string to | |||
| its BMS and set the size of the next probe as min(MTU, MTU') | its BMS and set the size of the next probe as min(MTU, MTU') | |||
| If upon expiration of the Echo Request timer BFIR didn't receive any | If a negative Echo Reply is received, the BFIR MUST wait for the | |||
| Echo Replies, then the size of the probe SHOULD be decreased. There | expiration of the Echo Request before transmitting the updated Echo | |||
| are scenarios when an implementation of the PMTUD would not decrease | Request. If upon expiration of the Echo Request timer BFIR didn't | |||
| the size of the probe. For example, suppose upon expiration of the | receive any Echo Replies, then the size of the probe SHOULD be | |||
| Echo Request timer BFIR didn't receive any Echo Reply. In that case, | decreased. There are scenarios when an implementation of the PMTUD | |||
| BFIR MAY continue to retransmit the probe using the initial size and | would not decrease the size of the probe. For example, suppose upon | |||
| MAY apply probe delay retransmission procedures. The algorithm used | expiration of the Echo Request timer BFIR didn't receive any Echo | |||
| to delay retransmission procedures on BFIR is outside the scope of | Reply. In that case, BFIR MAY continue to retransmit the probe using | |||
| this specification. The BFIR sends probes using BMS and locally | the initial size and MAY apply probe delay retransmission procedures. | |||
| defined retransmission procedures until either the bit string is | The algorithm used to delay retransmission procedures on BFIR is | |||
| clear, i.e., contains no set bits, or until the BFIR retransmission | outside the scope of this specification. The BFIR sends probes using | |||
| procedure terminates and PMTU discovery is declared unsuccessful. In | BMS and locally defined retransmission procedures, but not more | |||
| the case of convergence of the procedure, the size of the last probe | frequently than after the Echo Request timer expired, until either | |||
| indicates the PMTU size that can be used for all BFERs in the initial | the bit string is clear, i.e., contains no set bits, or until the | |||
| BMS without incurring fragmentation. | BFIR retransmission procedure terminates and PMTU discovery is | |||
| declared unsuccessful. In the case of convergence of the procedure, | ||||
| the size of the last probe indicates the PMTU size that can be used | ||||
| for all BFERs in the initial BMS without incurring fragmentation. | ||||
| Thus we conclude that in order to comply with the requirement in | Thus we conclude that in order to comply with the requirement in | |||
| [I-D.ietf-bier-oam-requirements]: | [I-D.ietf-bier-oam-requirements]: | |||
| * a BFR SHOULD support PMTUD; | * a BFR SHOULD support PMTUD; | |||
| * a BFR MAY use defined per BIER sub-domain MTU value as initial MTU | * a BFR MAY use defined per BIER sub-domain MTU value as initial MTU | |||
| value for discovery or use it as MTU for this BIER sub-domain to | value for discovery or use it as MTU for this BIER sub-domain to | |||
| reach BFERs; | reach BFERs; | |||
| End of changes. 20 change blocks. | ||||
| 77 lines changed or deleted | 55 lines changed or added | |||
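
The BFIR probing procedure described in Section 3 of both versions can be sketched roughly as follows. This is a minimal illustration, not the drafts' normative behavior: the names `EchoReply`, `discover_pmtu`, `simulated_send`, and `PATH_MTU` are hypothetical, the MTU values are invented for the example, and the simulated responder collapses all failures into a single negative reply. On timer expiry with no replies, the sketch simply retransmits at the same size (the MAY branch), bounded by an assumed `max_rounds`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EchoReply:
    positive: bool        # True: reply from a BFER; False: error reply from a BFR
    bfers: frozenset      # replying BFER, or the BFERs that could not be reached
    mtu: int = 0          # recommended MTU' (meaningful for negative replies only)


def discover_pmtu(targets, m_max, send_probe, max_rounds=16):
    """Return the discovered PMTU, or None if discovery does not converge."""
    pending = set(targets)   # the BMS: BFERs still to be confirmed
    size = m_max             # first probe uses M_max
    for _ in range(max_rounds):
        if not pending:
            return size
        # send_probe is assumed to block until the Echo Request timer expires
        for reply in send_probe(frozenset(pending), size):
            if reply.positive:
                pending -= reply.bfers            # clear the BFER's bit
            else:
                pending |= reply.bfers            # keep unreached BFERs in the BMS
                size = min(size, reply.mtu)       # next probe: min(MTU, MTU')
    return size if not pending else None


# Hypothetical per-BFER path MTUs for Figure 1: only the path to D is smaller.
PATH_MTU = {"D": 1400, "E": 1500, "F": 1500, "G": 1500}
probes = []  # record of (targets, size) for each probe sent


def simulated_send(targets, size):
    probes.append((targets, size))
    reached = {b for b in targets if PATH_MTU[b] >= size}
    failed = targets - reached
    replies = [EchoReply(True, frozenset({b})) for b in sorted(reached)]
    if failed:
        replies.append(EchoReply(False, failed, min(PATH_MTU[b] for b in failed)))
    return replies


pmtu = discover_pmtu(["D", "E", "F", "G"], 1500, simulated_send)
```

Under these assumptions, the second probe is addressed to D alone, which is the saving over flooding the whole tree that the text attributes to the BIER Ping based approach.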