Network Working Group                                          Jerry Ash
Internet Draft                                               Luyuan Fang
                                                             Wai Sum Lai
Category: Informational                                             AT&T
Expiration Date: January 2002                                  July 2001


          Alternative Technical Solution for MPLS DiffServ TE


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   Service Provider requirements for support of DiffServ-aware MPLS
   Traffic Engineering (DS-TE) are presented in [DS-TE-REQ].
   [DS-TE-SOLN] describes a proposed technical solution for meeting
   the DS-TE requirements. That draft proposes complex IGP extensions
   of per-class-type link-state advertisements (LSAs) to communicate
   per-class-type available bandwidth, etc. There is concern about
   scalability of the IGP overhead, and particularly the IGP response
   to overloads and failures [IGP-SCALE]. This draft presents an
   alternative technical solution which avoids further extensions of
   the IGP. We give an example which shows how to use
   measurement-based/reservation-crankback admission control rather
   than flooding more per-CT-available-bandwidth information. This
   draft proposes this as an alternative technical solution for
   discussion.

Table of Contents

   1. Introduction
   2. Concerns over the Scalability of IGP Link-State Protocols
   3. DS-TE Technical Solution Alternative
   4. Security Considerations
   5. References

Ash, et al               Category - Expiration                       [1]

Alternative Solution for MPLS DiffServ TE                      July 2001

   6. Authors' Addresses

1. Introduction

   Service Provider requirements for support of DiffServ-aware MPLS
   Traffic Engineering (DS-TE) are presented in [DS-TE-REQ]. DS-TE is
   discussed in the Traffic Engineering Working Group Framework
   document [TEWG-FW]. DS-TE requirements are defined for class types
   (CTs), where CTs are defined in [TEWG-FW] as aggregations of
   individual service classes. Instead of having per-class parameters
   configured and propagated on each LSR interface, classes are
   aggregated into CTs having common per-CT parameters (e.g., minimum
   bandwidth) to satisfy required performance levels; however, no
   bandwidth requirements are enforced for classes within a CT. The
   main motivation for grouping a set of classes into a CT is to
   improve the scalability of IGP LSAs by propagating information on a
   per-CT basis instead of on a per-class basis, and also to allow
   better bandwidth sharing between classes in the same CT.

   It might be useful to generalize the notion of CTs and consider
   them to be a combination of a) QoS classes (e.g., as specified in
   [Y.1541] consistent with [DiffServ] queuing priority classes),
   b) admission-control priority classes, and c) restoration priority
   classes at both the MPLS-LSP and transport link level. This is
   discussed further in [CT-DEFN].

   [DS-TE-SOLN] describes a proposed technical solution for meeting
   the DS-TE requirements. That draft proposes complex IGP extensions
   of per-class-type link-state advertisements (LSAs) to communicate
   per-class-type available bandwidth, etc. The extensions are already
   complex enough that the draft proposes compressing the
   advertisements. There is concern about scalability of the IGP
   overhead, and particularly the IGP response to overloads and
   failures [IGP-SCALE].
   Hence there is concern that further significant extensions will
   increase IGP overhead and exacerbate the problem identified in
   [IGP-SCALE]. Furthermore, we think the extensions are unnecessary,
   because there are other, equally effective ways to do DS-TE without
   the IGP TE extensions.

   First we briefly review concerns over the scalability of IGPs, and
   then present an alternative technical solution which avoids further
   extensions of the IGP. We give an example which shows how to use
   measurement-based/reservation-crankback admission control rather
   than flooding more per-CT-available-bandwidth information. This
   draft proposes this as an alternative technical solution for
   discussion.

2. Concerns over the Scalability of IGP Link-State Protocols

   Congestion can arise in data networks for many different reasons.
   There is evidence based on previous failures that link state (LS)
   routing protocols, such as OSPF and ISIS, currently cannot recover
   from large failures which result in widespread loss of topology
   database information (especially when areas/peer-groups get "too
   large").

   LS protocols typically use topology-state update (TSU) mechanisms
   to build the topology database at each node, typically conveying
   the topology status through flooding of TSU messages containing
   link, node, and reachable-address information between nodes. In
   OSPF, such mechanisms use the link state advertisement (LSA); in
   PNNI, they use the PNNI topology state element (PTSE); and in
   frame-relay and proprietary-routing networks, they may use other
   TSU mechanisms to exchange topology status information to build the
   topology database at each node.

   Earlier papers and contributions identified issues of congestion
   control and failure recovery for LS protocol networks, such as
   OSPF, ISIS, and PNNI networks [IGP-SCALE, maunder, choudhury,
   pappalardo1, pappalardo2, atm01-0101].
   [IGP-SCALE] presents considerable evidence of the current problems
   associated with LS failure recovery from various failure
   conditions, based on a) failure experience, b) vendor analysis of
   product performance, and c) analytic modeling, simulation analysis,
   and emulation analysis.

   As to failure experience, AT&T has experienced serious data network
   outages in which recovery of the underlying LS protocols was
   inadequate. For example, in the failure in the AT&T Frame Relay
   Network on April 13, 1998 [att], an initial procedural error
   triggered two undetected software bugs, leading to a huge overload
   of control messages in the network. The result of this control
   overload was the loss of all topology database information, and the
   link-state protocol then attempted to recover the database with the
   usual Hello and TSU updates. Analysis has shown that several
   problems then occurred to prevent the network from recovering
   properly:

   - Very large number of TSUs being sent to every node to process,
     causing general processor overload

   - Route computation based on incomplete topology recovery, causing
     routes to be generated based on transient, asynchronous topology
     information and then in need of frequent re-computation

   - Inadequate work queue management to allow processes to complete
     before more work is put into the process queue

   - Inability to segment the network (into smaller "peer groups") to
     aid in the link-state protocol recovery

   - Inability to access the node processors with network management
     commands due to lack of necessary priority of these messages

   A more recent failure occurred on February 20, 2001 in the AT&T ATM
   Network, which resulted in a large overload of TSUs and a lengthy
   network outage [pappalardo1, pappalardo2]. Manual procedures were
   put in place to reduce TSU flooding, which worked to stabilize the
   network.
   It is desirable that such TSU flooding reduction be automatic under
   overload. In general, there have been a number of major outages
   reported by most major carriers, and routing protocol issues have
   generally been involved. Other relevant LS-network failures are
   reported in [cholewka, jander].

   Networks employing LS protocols use a variety of control messages
   and mechanisms to update the LS database, not necessarily LSAs,
   PTSEs, or flooding mechanisms. Based on experience, however, the LS
   protocols are found to be vulnerable to loss of database
   information, control overload to re-sync databases, and other
   failure/overload scenarios which make such networks more vulnerable
   in the absence of adequate protection mechanisms. Hence we are
   addressing a generic problem of LS protocols across a variety of
   implementations, and the basic problem is prevalent in LS protocol
   networks employing frame-relay, ATM, and IP based technologies.

   As a result of these failures, a number of congestion
   control/failure recovery mechanisms are being recommended
   [IGP-SCALE]. The goal is to enable LS protocols to a) gracefully
   recover from massive loss of topology database information, and
   b) respond gracefully to network overloads and failures.
   [IGP-SCALE] proposes specific additional considerations for network
   congestion control/failure recovery.
   Candidate mechanisms are proposed for control of network congestion
   and failure recovery. In particular, the following mechanisms are
   proposed for investigation in the OSPF and ISIS working groups:

   a) throttle new connection setups, topology-state updates, and
      Hello updates based on automatic congestion control mechanisms,

   b) special marking of critical control messages (e.g., Hello and
      topology-state-update Ack) so that they may receive prioritized
      processing,

   c) database backup, in which a topology database could be
      automatically recovered from loss based on local backup
      mechanisms, and

   d) hitless restart, which allows routes to continue to be used if
      there is an uninterrupted data path, even if the control path is
      interrupted due to a failure.

   There is much work already underway in standards bodies, namely the
   IETF, ATM Forum, and ITU-T, to address issues of congestion control
   and failure recovery in ATM- and IP-based packet networks. Numerous
   references are cited and are further explained in the document
   [maunder, moy1, moy2, moy3, murphy, whitehead, zinin, atm01-0101,
   btd-cs-congestion-02.00].

3. DS-TE Technical Solution Alternative

3.1 General Requirements

   The following are some proposed, high-level requirements for
   MPLS-DiffServ TE, which address some of the IGP scalability
   concerns discussed in Section 2. This is all very preliminary and
   high-level, and intended to initiate further discussion. Also, the
   numerical values below are for illustrative purposes only.

   1. No new LSAs are used to signal per-CT available bandwidth,
      maximum bandwidth, preemption parameters, etc. Rather,
      CT-bandwidth is allocated and protected by mechanisms that do
      not require new per-CT LSAs, such as in #3-5 below. (Note that
      [boyle] proposed that current specifications could be adapted to
      accomplish MPLS-DiffServ TE.)

   2. MPLS connection-admission control and QoS/DiffServ signaling
      should be decoupled.

   3. CT-bandwidth is allocated and protected by MPLS connection
      admission control, but done without additional LSA extensions.

3.2 Example

   We now give an existence-proof example of how these requirements
   can be met. The example presented uses
   measurement-based/reservation-crankback admission control rather
   than flooding more per-CT-available-bandwidth information. This
   draft proposes this as an alternative technical solution for
   discussion.

   We now present the details of the example using the following 6
   example class-types (CT):

   CT 1 (premium, protected): interactive (real-time), with key
        priority
   CT 2 (basic, protected): interactive (real-time), with normal
        priority
   CT 3 (premium, protected): non-interactive (low loss), with key
        priority
   CT 4 (basic, protected): non-interactive (low loss), with normal
        priority
   CT 5 (unprotected): best effort
   CT 6 (premium, protected): control traffic, with key priority,
        preemption of all user traffic allowed in overload or failed
        condition to minimize loss or delay to control traffic

   Restoration priority is a way of giving preference to protect
   higher priority LSPs ahead of lower priority LSPs. A premium
   service LSP can be protected in preference over a basic service
   LSP. Admission control priority is a way of giving preference to
   admit higher priority LSPs ahead of lower priority LSPs. A key
   service LSP can be admitted in preference over a normal service
   LSP. For both restoration and admission control, no preemption of
   existing LSPs is assumed beyond what is specified for CT 6.

   1. CT-bandwidth is allocated and protected by MPLS connection
      admission control, but done without additional LSA extensions.

      a. at ingress LER:

         - CTs 1 and 3 given unrestricted access to bandwidth on any
           candidate LSP up to 10% of total traffic load; beyond 10%
           of total traffic load, bandwidth allocated only when > 5%
           bandwidth is idle (reservation signaled in the latter
           cases, perhaps using Setup Priority parameter);

         - CTs 2 and 4 given unrestricted access to bandwidth only on
           primary LSP up to the protected-CT-bandwidth level;
           otherwise (on alternate LSPs and/or when
           protected-CT-bandwidth exceeded) bandwidth allocated only
           when > 5% bandwidth is idle (reservation signaled in the
           latter cases, perhaps using Setup Priority parameter);

         - CT 5 allocated up to maximum protected-CT-bandwidth of 1%
           only on primary LSP, no alternate LSPs allowed;

         - ingress LER signals class type (perhaps using L-LSP
           parameter) and bandwidth allocation to transit LSRs in LSP.

      b. at transit LSRs:

         - bandwidth allocation protected by QoS mechanisms (DiffServ
           priority, policing, etc.) according to signaled class type;

         - reservation not signaled (perhaps using Setup Priority
           parameter): bandwidth allocation unrestricted; if bandwidth
           unavailable, crankback to ingress LER;

         - reservation signaled (perhaps using Setup Priority
           parameter): bandwidth allocation restricted to when > 5%
           bandwidth is idle; if bandwidth unavailable, crankback to
           ingress LER;

      c. CAC is applied for bandwidth allocation
         per-aggregated-bandwidth-CT, not per microflow.

   2. Protected-CT-bandwidth limit can be pre-provisioned per
      node-pair.

   3. Protected-CT-bandwidth can be dynamically computed per
      node-pair, for example:

         PBWi(w) = 0.5 x PBWi(w-1) + 0.5 x BWIPi(w)

      where

         PBWi  = protected bandwidth for CT i
         BWIPi = average bandwidth-in-progress across a load set
                 period on CT i

      The quantities PBWi are computed periodically, such as every
      week w, per node-pair.

   4. MPLS LSP restoration

      a. assigns a minimum of 5 diverse LSP backup paths per
         premium-CT LSP
      b. assigns a minimum of 2 diverse LSP backup paths per basic-CT
         LSP
      c. triggers redirecting all flows to backup LSPs upon specified
         triggers (e.g., LOS, LOF)
      d. sequentially hunts backup LSPs for available bandwidth to
         redirect flows
      e. alternatively, dynamically compute and hunt backup LSPs

   5. Transport link restoration

      a. assigns a minimum of 5 diverse backup transport paths per
         premium-CT transport link
      b. assigns a minimum of 2 diverse backup transport paths per
         basic-CT transport link
      c. triggers redirecting all LSPs to backup transport paths upon
         specified triggers (e.g., LOS, LOF)
      d. sequentially hunts backup transport paths for available
         bandwidth to redirect transport links
      e. alternatively, dynamically compute and hunt backup transport
         paths

   6. No preemption of MPLS-LSPs and/or transport links across CTs,
      except for control-traffic CT.

4. Security Considerations

   There are no new security considerations based on proposals in
   this draft.

5. References

   [TE-REQ] Awduche et al., "Requirements for Traffic Engineering over
      MPLS," RFC2702, September 1999.

   [TEWG-FW] Awduche et al., "A Framework for Internet Traffic
      Engineering," draft-ietf-tewg-framework-04.txt, April 2001.

   [OSPF-TE] Katz, Yeung, "Traffic Engineering Extensions to OSPF,"
      draft-katz-yeung-ospf-traffic-04.txt, August 2001.

   [ISIS-TE] Smit, Li, "IS-IS extensions for Traffic Engineering,"
      draft-ietf-isis-traffic-02.txt, September 2000.

   [RSVP-TE] Awduche et al., "RSVP-TE: Extensions to RSVP for LSP
      Tunnels," draft-ietf-mpls-rsvp-lsp-tunnel-08.txt, February 2001.

   [DIFF-MPLS] Le Faucheur et al., "MPLS Support of Diff-Serv,"
      draft-ietf-mpls-diff-ext-09.txt, April 2001.

   [CR-LDP] Jamoussi et al., "Constraint-Based LSP Setup using LDP,"
      draft-ietf-mpls-cr-ldp-05.txt, February 2001.

   [DIFF-NEW] Grossman, "New Terminology for Diffserv," work in
      progress, draft-ietf-diffserv-new-terms-04.txt, March 2001.

   [IGP-SCALE] Ash, G., et
      al., "Proposed Mechanisms for Congestion Control/Failure
      Recovery in OSPF & ISIS Networks,"
      draft-ash-ospf-isis-congestion-control-00.txt, July 2001.

   [af-pnni-0055.000] "Private Network-Network Interface Specification
      Version 1.0 (PNNI 1.0)," March 1996.

   [ash] Ash, G. R., "Dynamic Routing in Telecommunications Networks,"
      McGraw Hill.

   [atmf00-0249] "Scalability and Reliability of large ATM networks."

   [atm00-0257] "Signaling Congestion Control in PNNI Networks: The
      Need and Proposed Solution Outline."

   [atm00-0480] "Congestion Control/Failure Recovery in PNNI
      Networks."

   [atm01-0101] "Proposed Mechanisms for Congestion Control/Failure
      Recovery in PNNI Networks."

   [att] "AT&T announces cause of frame-relay network outage," AT&T
      Press Release, April 22, 1998.

   [btd-cs-congestion-02.00] "Signaling Congestion Control Version
      1.0," Baseline Text.

   [cholewka] Cholewka, K., "MCI Outage Has Domino Effect,"
      Inter@ctive Week, August 20, 1999.

   [choudhury] Choudhury, G., Maunder, A. S., Sapozhnikova, V.,
      "Faster Link-State Convergence and Improving Network Scalability
      and Stability," submitted for presentation at LCN 2001.

   [hosein1] Hosein, P., "An Improved ACC Algorithm for
      Telecommunication Networks," Telecommunication Systems 0, 1998.

   [hosein2] Hosein, P., "Overload Control for Real-Time
      Telecommunication Databases," International Teletraffic
      Congress - 16, Edinburgh, Scotland, June 1999.

   [jander] Jander, M., "In Qwest Outage, ATM Takes Some Heat," Light
      Reading, April 6, 2001.

   [maunder] Maunder, A. S., Choudhury, G., "Explicit Marking and
      Prioritized Treatment of Specific IGP Packets for Faster IGP
      Convergence and Improved Network Scalability and Stability,"
      draft-ietf-ospf-scalability-00, March 2001.

   [mummert] Mummert, V. S., "Network Management and its
      Implementation on the No. 4ESS," International Switching
      Symposium, Japan, 1976.
   [moy1] Moy, J., "Hitless OSPF Restart,"
      draft-ietf-ospf-hitless-restart-00.txt, February 2001.

   [moy2] Moy, J., "Flooding over parallel point-to-point links,"
      draft-ietf-ospf-ppp-flood-01.txt, February 2001.

   [moy3] Moy, J., "Flooding Over a Subset Topology,"
      draft-ietf-ospf-subset-flood-00.txt, February 2001.

   [murphy] Murphy, P., "OSPF Floodgates,"
      draft-ietf-ospf-floodgates-01.txt, December 2000.

   [pappalardo1] Pappalardo, D., "Can one rogue switch buckle AT&T's
      network?," Network World Fusion, February 23, 2001.

   [pappalardo2] Pappalardo, D., "AT&T, customers grapple with ATM net
      outage," Network World, February 26, 2001.

   [Q.764] "Signalling System No. 7 - ISDN user part signalling
      procedures," December 1999.

   [whitehead] Whitehead, Martin, "A class of overload controls based
      on controlling call reject rates," ITU-T contribution D.19,
      February 2001.

   [zinin] Zinin, A., et al., "OSPF Restart Signaling,"
      draft-ietf-ospf-restart-01.txt, February 2001.

   [DS-TE-REQ] Le Faucheur, F., et al., "Requirements for support of
      Diff-Serv-aware MPLS Traffic Engineering,"
      draft-ietf-tewg-diff-te-reqts-01.txt, June 2001.

   [boyle] Boyle, Jim, "Accomplishing Diffserv TE needs with Current
      Specifications," draft-boyle-tewg-ds-nop-00.txt, July 2001.

6. Authors' Addresses

   Jerry Ash
   AT&T
   Room MT D5-2A01
   200 Laurel Avenue
   Middletown, NJ 07748, USA
   Phone: +1-(732)-420-4578
   Fax: +1-(732)-368-8659
   Email: gash@att.com

   Luyuan Fang
   AT&T
   Room C2-3B35
   200 S. Laurel Avenue
   Middletown, NJ 07748
   Phone: +1 732 420 1921
   Email: luyuanfang@att.com

   Wai Sum Lai
   AT&T
   Room D5-3D18
   200 S. Laurel Avenue
   Middletown, NJ 07748
   Phone: +1 732 420-3712
   Fax: +1 732 368-1919
   Email: wlai@att.com

Full Copyright Statement

   "Copyright (C) The Internet Society (date). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain
   it or assist in its implementation may be prepared, copied,
   published and distributed, in whole or in part, without restriction
   of any kind, provided that the above copyright notice and this
   paragraph are included on all such copies and derivative works.
   However, this document itself may not be modified in any way, such
   as by removing the copyright notice or references to the Internet
   Society or other Internet organizations, except as needed for the
   purpose of developing Internet standards in which case the
   procedures for copyrights defined in the Internet Standards process
   must be followed, or as required to translate it into languages
   other than English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on
   an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
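
   As an illustration of the Section 3.2 example, the following Python
   sketch shows the periodic protected-bandwidth update (item 3) and
   the transit-LSR admission check with crankback (item 1.b). It is a
   minimal sketch under stated assumptions, not part of the proposal:
   the names update_protected_bw, Link, and transit_admit are
   hypothetical, bandwidth is in any consistent unit, and "> 5%
   bandwidth is idle" is read here as the idle fraction of link
   capacity measured before the new allocation.

```python
from dataclasses import dataclass


def update_protected_bw(pbw_prev: float, bwip: float, weight: float = 0.5) -> float:
    """PBWi(w) = (1 - weight) * PBWi(w-1) + weight * BWIPi(w).

    weight = 0.5 reproduces the illustrative formula in Section 3.2;
    exposing it as a parameter is an assumption of this sketch.
    """
    return (1.0 - weight) * pbw_prev + weight * bwip


@dataclass
class Link:
    """Hypothetical per-link state kept locally at a transit LSR."""
    capacity: float       # total link bandwidth
    in_use: float = 0.0   # bandwidth currently allocated to LSPs

    def idle_fraction(self) -> float:
        return (self.capacity - self.in_use) / self.capacity


def transit_admit(link: Link, request_bw: float, reservation_signaled: bool,
                  idle_threshold: float = 0.05) -> bool:
    """Admit the request and allocate bandwidth, or return False.

    False stands for "crankback to ingress LER". No flooded per-CT
    available-bandwidth information is consulted, only local link state.
    """
    if request_bw > link.capacity - link.in_use:
        return False  # bandwidth unavailable
    if reservation_signaled and link.idle_fraction() <= idle_threshold:
        return False  # reserved state: needs more than 5% idle bandwidth
    link.in_use += request_bw
    return True
```

   In this reading, an ingress LER that receives a crankback would try
   the next candidate LSP, and the per-node-pair PBWi values would be
   refreshed once per period w by calling update_protected_bw with the
   measured BWIPi.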