PWE3 Y(J). Stein Internet-Draft RAD Data Communications Intended status: Standards Track July 1, 2009 Expires: January 2, 2010 Ethernet PW Congestion Handling Mechanisms draft-stein-pwe3-ethpwcong-00.txt Status of this Memo This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. This Internet-Draft will expire on January 2, 2010. Copyright Notice Copyright (c) 2009 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents in effect on the date of publication of this document (http://trustee.ietf.org/license-info). Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Abstract Mechanisms for handling congestion in Ethernet pseudowires are presented. These mechanisms extend capabilities of the native service across the PSN, and require use of the PWE3 control word. Stein Expires January 2, 2010 [Page 1] Internet-Draft ethpwcong July 2009 Requirements Language The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119]. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Control Word Format . . . . . . . . . . . . . . . . . . . . . . 3 3. Drop Eligibility Indication . . . . . . . . . . . . . . . . . . 4 4. Explicit Congestion Notification . . . . . . . . . . . . . . . 5 5. Security Considerations . . . . . . . . . . . . . . . . . . . . 6 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 6 7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 7 7.1. Normative References . . . . . . . . . . . . . . . . . . . 7 7.2. Informative References . . . . . . . . . . . . . . . . . . 7 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 7 Stein Expires January 2, 2010 [Page 2] Internet-Draft ethpwcong July 2009 1. Introduction Ethernet PWs do not presently have mechanisms for handling AC congestion. When the egress AC becomes congested, the egress PE will receive PAUSE (802.3x) frames or experiences back-pressure, denying it the capability of forwarding frames to the AC. This will result in the egress PE's output buffers filling up and eventually Ethernet frames will need to be discarded. Not only are such frames lost after precious PSN bandwidth has already been consumed, they are also discarded without regard to importance, priority, or fairness. If the Ethernet frames being transported are carrying TCP/IP traffic, then TCP rate cut-back will limit the traffic volume to some extent. However, the early discard that triggers the rate cut-back also results in packet retransmission, adding additional Ethernet PW traffic to be transported. When the Ethernet frames are not carrying TCP/IP, but rather UDP/IP, or any other non-TCP/IP traffic that does not react to packet discard by cutting back the transmission rate, the situation is potentially worse. The native Ethernet service handles congestion by causing the sender to stop sending frames. On full duplex links this is accomplished by the congested receiver sending PAUSE frames. On half-duplex networks this is accomplished by the congested receiver introducing back- pressure. In either case the effect is that the sender stops forwarding frames until the receiver is once again ready to process them, thus eliminating congestion. Ethernet PWs do not transport received congestion indications across the PSN, nor do they generate congestion indications when the egress PE detects congestion. It is possible to rectify this lack of functionality by adding indications in the PWE control word. The arbitrariness of the packets discarded can be alleviated by including a drop eligibility indication. The loss itself can be possibly avoided by mechanisms that explicit indicate forward and backward congestion. Such indications enable a PE to reflect the egress AC congestion status back towards the ingress AC, where steps can be taken to limit the ingress rate, thus avoiding buffer overflow. 2. Control Word Format The mechanisms described herein are only available when the Ethernet PW employs the PWE3 control word. Thus, when congestion handling is support the control word MUST be included in the PW packet. The use of the control word is usually signaled using the PWE3 control Stein Expires January 2, 2010 [Page 3] Internet-Draft ethpwcong July 2009 protocol [RFC4447]. There is no need to additionally signal the use of the mechanisms described herein, as the default actions suffice. The format of the control word is given in Figure 1 and has been chosen to be compatible with that of RFC 4619 [RFC4619]. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0 0 0 0|F|B|D|R|FRG| Length | Sequence Number | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 1. Control Word structure Bits 0 to 3 In the above diagram, the first 4 bits MUST be set to 0. F (bit 4) Forward Explicit Congestion Notification (FECN) bit. B (bit 5) Backward Explicit Congestion Notification (BECN) bit. D (bit 6) Discard Eligibility Indication (DEI) bit. R (bit 7) RESERVED bit. FRG (bits 8 and 9) described in RFC 4623 [RFC4623]. Length (bits 10 - 15) described in RFC 4385 [RFC4385]. Sequence Number (bits 16 - 31) described in RFC 4385 [RFC4385] and service specific encapsulation documents. 3. Drop Eligibility Indication If drop eligibility is supported, then the ingress PE MUST set the Drop Eligibility Indicator (DEI) bit in the PWE3 control word, and during congestion the egress PE MUST preferentially discard Ethernet frames that arrived in PW packets with the DEI bit set. When the ingress PE receives a Q-in-Q Ethernet frame from the AC, it MUST copy the DEI bit from the Ethernet frame into the DEI bit in the PWE3 control word. The ingress PE SHOULD perform the MEF bandwidth profile (token bucket) algorithm [MEF10.1]. Frames marked red MUST be discarded, and green and yellow frames MUST be encapsulated and forwarded. For yellow frames the ingress PE MUST set the DEI bit in the PWE control word. Stein Expires January 2, 2010 [Page 4] Internet-Draft ethpwcong July 2009 Intermediate network elements MUST NOT clear the DEI bit. Intermediate PW-aware network elements (e.g., S-PEs) MAY set the DEI bit upon experiencing congestion, if they run the MEF BW profile (token bucket) algorithm. When the egress PE needs to discard an Ethernet frame, it MUST discard packets with the DEI bit set before discarding packets with the DEI bit cleared. When the egress PE forwards Q-in-Q Ethernet frame to the AC, it MUST copy the DEI bit from the PWE control word into the DEI bit in the Ethernet frame. 4. Explicit Congestion Notification If explicit congestion notification is supported, then the egress PE MUST make the ingress PE aware of the congestion experienced, and the ingress PE MAY make the egress PE aware of such congestion. An ingress PE being informed of congestion by the egress PE SHOULD take steps to alleviate this congestion. If the egress PE receives PAUSE frames or detects Ethernet back- pressure or detects that its AC-bound queues pass a preconfigured threshold, then it MUST set the BECN bit in the PWE control word of all PW packets set in the opposite direction towards the ingress PE. If no packets are available for sending in the backward direction, the egress PE MUST send dummy BECN PW packets towards the ingress PE at a preconfigured rate (default is one per second). These dummy BECN packets have their BECN bit set, their length field set to zero, but contain no data. When the egress PE PAUSE timer expires, or it detects that back- pressure that had been applied has been removed, or its AC-bound queues drop below a preconfigured threshold, it MUST clear the BECN bit of all PW packets set towards the ingress PE. If no packets are available for sending in the backward direction, the egress PE MUST send three dummy BECN PW packets towards the ingress PE at a preconfigured rate (default is one per second). These dummy BECN packets have their BECN bit cleared, their length field set to zero, but contain no data. Intermediate network elements MUST NOT clear the BECN bit. Intermediate PW-aware network elements (e.g., S-PEs) upon experiencing congestion MAY set the BECN bit on packets forwarded in the opposite direction. When the ingress PE receives packets with the BECN bit set (including Stein Expires January 2, 2010 [Page 5] Internet-Draft ethpwcong July 2009 dummy BECN packets). it SHOULD perform one of the following operations to ameliorate the situation. It SHOULD send PAUSE packets or apply backpressure towards the ingress AC. If its Ethernet interface does not support PAUSE or back-pressure, it SHOULD apply the MEF bandwidth profile algorithm to frames received from the AC before sending them towards the PSN. If the ingress PE has admission control functionality, it SHOULD refuse further connections with traffic that would be forwarded to the egress PE, and MAY withdraw low priority connections. If the ingress PE detects that its output queues pass a preconfigured threshold, then it SHOULD send PAUSE frames or apply back-pressure to the AC. It SHOULD also set the FECN bit in the PWE control word of all PW packets set towards the egress PE, in order to inform the egress PE to expect delays. Intermediate network elements MUST NOT clear the FECN bit. Intermediate PW-aware network elements (e.g., S-PEs) MAY set the FECN bit upon experiencing congestion in the forward direction. If packets with FECN set have been send, then when the ingress PE sees that its PSN-bound queues drop below a preconfigured threshold, it MUST clear the FECN bit of all PW packets sent towards the egress PE. If no packets are available for sending in the forward direction, the ingress PE MUST send three dummy FECN PW packets towards the egress PE at a preconfigured rate (default is one per second). These dummy BECN packets have their FECN bit cleared, their length field set to zero, but contain no data. 5. Security Considerations The congestion handling mechanisms introduced here do not introduce significant security considerations above those present for PWs that do not use these mechanisms. For example, a denial of service attack based on forcing the ingress PE to slow down would require the ability to inject otherwise valid PW packets. A malicious entity that has attained that level has already breached the fundamental security of the PW infrastructure. 6. IANA Considerations This document requires no IANA actions. Stein Expires January 2, 2010 [Page 6] Internet-Draft ethpwcong July 2009 7. References 7.1. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. [RFC4385] Bryant, S., Swallow, G., Martini, L., and D. McPherson, "Pseudowire Emulation Edge-to-Edge (PWE3) Control Word for Use over an MPLS PSN", RFC 4385, February 2006. [RFC4623] Malis, A. and M. Townsley, "Pseudowire Emulation Edge-to- Edge (PWE3) Fragmentation and Reassembly", RFC 4623, August 2006. [MEF10.1] "MEF Technical Specification MEF 10.1 - Ethernet Service Attributes Phase 2", Metro Ethernet Forum MEF 10.1, November 2006. 7.2. Informative References [RFC4447] Martini, L., Rosen, E., El-Aawar, N., Smith, T., and G. Heron, "Pseudowire Setup and Maintenance Using the Label Distribution Protocol (LDP)", RFC 4447, April 2006. [RFC4619] Martini, L., Kawa, C., and A. Malis, "Encapsulation Methods for Transport of Frame Relay over Multiprotocol Label Switching (MPLS) Networks", RFC 4619, September 2006. Author's Address Yaakov (Jonathan) Stein RAD Data Communications 24 Raoul Wallenberg St., Bldg C Tel Aviv 69719 ISRAEL Phone: +972 3 645-5389 Email: yaakov_s@rad.com Stein Expires January 2, 2010 [Page 7]