Network Working Group                                       H. Berkowitz
Internet Draft                                                 A. Retana
Expires February 2002                                           S. Hares
draft-ietf-bmwg-bgpbas-00.txt                            P. Krishnaswamy
                                                                 M. Lepp
                                                               June 2001


           Benchmarking Methodology for Basic BGP Convergence


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC 2026 [1].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or made obsolete by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   This draft establishes standards for measuring BGP convergence
   performance. Its initial emphasis is on the control plane of single
   BGP routers. We do not address forwarding plane performance.

Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC 2119].

Table of Contents

   1. Introduction
      1.1 Overview and Roadmap
      1.2 Scope
      1.3 Types of Single-Router Convergence
   2. Reference Configurations
   3. Basic eBGP tests
      3.1 Connection Conditions
      3.2 Test Streams
      3.3 Order of Received Updates
      3.4 Initial Convergence
         3.4.1 Single Peer Initial Convergence Time
         3.4.2 Multiple Peers
      3.5 Incremental Re-convergence with a Single Peer
         3.5.1 Explicit add of single new route
         3.5.2 Sequential withdraw and reannounce
         3.5.3 Time to Change to Alternate Path after Explicit
               Withdrawal
      3.6 Incremental Re-convergence with Multiple Peers
   4. Flaps
      4.1 Flap Isolation Test
      4.2 Authentication
   5. Acknowledgements
   6. References
   Appendix A. Representative Scenarios
      A.1 Default-free interprovider peering
      A.2 Interprovider peering with transit
      A.3 Provider edge router
      A.4 Multihomed subscriber edge router

1. Introduction

   This document describes a specific set of tests aimed at
   characterizing the convergence performance of BGP-4 processes in
   routers or other devices that incorporate BGP functionality. A key
   objective is to propose a methodology that will standardize the
   conducting and reporting of convergence-related measurements.

   Although both convergence and forwarding are essential to basic
   router operation, this document does not consider forwarding
   performance in the Device Under Test (DUT), for two reasons. First,
   forwarding performance is the primary focus of [RFC 2544] and is
   expected to be addressed in work that ensues from [Trotter].
   Second, because convergence characterization is a complex process,
   we deliberately restrict this document to basic measurements of BGP
   convergence. Subsequent documents will explore the more intricate
   aspects of convergence measurement, such as the presence of policy
   processing, simultaneous traffic on the control and data paths
   within the DUT, and other realistic performance modifiers.

   Convergence of Interior Gateway Protocols will be considered in
   separate drafts.

1.1 Overview and Roadmap

   Measurements of protocols can be classified as either internal or
   external.
   Internal measurements are time-stamped within the Device Under Test
   (DUT). External measurements infer that a process in the DUT has
   converged when a downstream measurement device indicates that the
   corresponding advertisement has been received. An alternative type
   of external measurement is to test for data, forwarded to the
   downstream device, that relies upon the new route just computed by
   the DUT.

   Internal measurements are plagued with time synchronization issues,
   since Network Time Protocol (NTP) hooks may be missing from products
   or improperly implemented. Of course, in a self-contained lab
   setting, or when measuring internal processes themselves,
   synchronized timing is not an issue. For the purposes of this
   document, external techniques are more readily applicable.

   However, external measurements have their own problems, because
   they include the time to advertise the new route downstream and the
   transmission time of the advertisement within the DUT. If data
   forwarding were part of the measurement methodology, it too would
   add extraneous latency: at a minimum, that of the forwarding lookup
   in the DUT. This document deals only with external measurements
   limited to route propagation.

   A characterization of the BGP convergence performance of a device
   must take into account all distinct stages and aspects of BGP
   functionality. This requires that the relevant terms and metrics be
   as specific as possible. A terminology that meets this objective is
   presented in draft-ietf-bmwg-conterm-00.txt.

1.2 Scope

   This document deals with eBGP convergence of a single-router Device
   Under Test (DUT). It restricts the measurement of convergence to
   events in the control plane and does not consider the interactions
   of convergence and forwarding. Convergence measurements among
   multiple iBGP-connected routers in an AS, and Internet-wide
   convergence measurements, are also outside the scope of this
   document.
   These additional topics are unquestionably of interest, and this
   document is intended as a stepping stone toward them.

1.3 Types of Single-Router Convergence

   Two significantly different types of convergence time tend to be
   lumped together in product specifications. The first is the time
   needed for a BGP speaker to build a full table after
   initialization, or for a particular peering session to rebuild its
   table after a hard reset. The second is the time needed for a
   router to respond to a new announcement or withdrawal.

   As stated in the Roadmap, measurements can be either internal or
   external. Internal measurements examine the RIB/FIB of the DUT
   directly. While they are more accurate in principle, they require
   measurement hooks in the implementation, as described in
   [Ahuja 2000a]. External measurements start with a stimulus from one
   or more "upstream" routers and end with a specific event: the DUT
   sending the resulting advertisement to a "downstream" peer. In the
   reference configurations of Section 2, external measurements are
   defined with respect to TR3 as the downstream router.

2. Reference Configurations

   For tests in which the number of peers is not a performance
   parameter of interest, use the configuration in Figure 1:

      TR1==========+---------+==========TR3
       |           |         |
       D1          |         |
       |           |   DUT   |
      TR2==========|         |
                   +---------+

      Figure 1. Basic Test Configuration

   D1 is a prefix reachable via both TR1 and TR2. Neither TR1 nor TR2
   is in the originating AS for the announcement of D1.

   More complex peering arrangements involve up to n Test Routers, as
   shown in Figure 2. It is recommended that the Figure 1
   configuration always be tested as a baseline, with additional
   reports showing the effect on performance of increasing the number
   of peers.

      TR1==========+---------+==========TR3
       |           |         |
       D1          |         |
       |           |   DUT   |
      TR2==========|         |
                   |         |
      ...          |         |
      TRn==========+---------+

      Figure 2. Test Configuration with n Peers

   Interface speeds must be specified as part of the test report.
   At least 100 Mbps is recommended, so that media delays are not a
   significant component of convergence times. In the absence of other
   route selection criteria, TR1 shall have an IP address that makes
   it the most preferred peer.

3. Basic eBGP tests

   All routers in this configuration shall have a policy of ADVERTISE
   ALL/ACCEPT ALL [RFC 2622]. Tests with prefix filtering,
   community-based preferences, authentication, and similar features,
   as well as performance under flap, are TBD.

   Not all eBGP applications are alike. While the tests in this
   section apply to a wide range of configurations, testers may select
   the configurations most relevant to the intended product use. Such
   configurations include:

   1. Interprovider peering, characterized by an exchange of customer
      routes, which, in the case of major providers, may number in the
      tens of thousands but are fewer than the full default-free
      table.

   2. Provider/subscriber edge peering, where transit service implies
      that the subscriber advertises relatively few routes to the
      provider, but may receive from the provider the full
      default-free table, a limited subset of it, or only a default
      route.

3.1 Connection Conditions

   The DUT should be physically connected to the test routers over a
   medium fast enough that propagation time is not a significant
   factor. A medium of at least 100 Mbps is recommended. Multiple
   peers may be connected to a single physical interface using 802.1Q
   VLANs or another appropriate multiplexing scheme.

   TCP connections shall use slow start. Any nonstandard initial or
   maximum window sizes shall be indicated in the test report.

3.2 Test Streams

   Packet trains presented to the DUT shall be random with respect to
   prefix length and order of specificity. The degree of update
   packing (the number of prefixes carried in each UPDATE message)
   shall be specified. When long packet trains are sent, UPDATE
   messages will usually be packed to the maximum allowed by the MTU.

3.3 Order of Received Updates

   Within a set of updates, the prefixes may be ordered in a way that
   favors particular implementations.
   For the fairest testing, randomize the order of prefixes in update
   trains, so that no particular RIB data structure benefits from the
   ordering.

   Assume an Adj-RIB-Out that consists of:

      1.0.0.0/8
      2.0.0.0/8
      3.0.0.0/8
      1.1.0.0/16
      2.1.0.0/16
      3.1.0.0/16
      3.2.0.0/16
      1.1.1.0/24
      1.1.2.0/24
      2.1.2.0/24

   If it were sent in this order, top to bottom, it would be sorted by
   prefix length, and by prefix value within each length. A radix tree
   implementation would likely benefit from receiving updates in this
   order. If instead it were sent in the following order:

      1.0.0.0/8
      1.1.0.0/16
      1.1.1.0/24
      1.1.2.0/24
      2.0.0.0/8
      2.1.0.0/16
      2.1.2.0/24
      3.0.0.0/8
      3.1.0.0/16
      3.2.0.0/16

   it would be ideal for an implementation that organizes its routing
   table as a strict tree implemented as a linked list. A fairer test
   train would be:

      1.0.0.0/8
      2.1.0.0/16
      1.1.0.0/16
      3.0.0.0/8
      1.1.1.0/24
      2.0.0.0/8
      1.1.2.0/24
      3.1.0.0/16
      2.1.2.0/24
      3.2.0.0/16

   which is in random order and does not favor any particular
   implementation.

   Measurement units: a metric of randomness (TBD).

3.4 Initial Convergence

   Initial convergence is relatively simple to measure and is often
   the basis of product specifications, but it is operationally far
   less significant than reconvergence after changes. A
   "carrier-grade" router should not initialize often, and the soft
   reset option reduces the need to rebuild views. The initialization
   time can therefore be amortized over a long period and may be
   negligible compared with reconvergence.

3.4.1 Single Peer Initial Convergence Time

   This basic reference test uses a representatively sized and
   populated target RIB and no other variable influences (e.g.,
   authentication off, filters off, no policy).

   The test begins when TR1 and TR2 send OPEN messages to the DUT.
   Each Test Router sends a standard routing table of TBD routes. The
   test ends when the DUT begins to advertise the last route in the
   routing table to TR3.
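   As an illustration only, the following Python sketch shows one way
   a test harness might prepare a randomized update train (Section
   3.3) and compute the single peer initial convergence time (Section
   3.4.1) from externally observed events. It is not part of this
   methodology. The Event record, the "OPEN_SENT" and "UPDATE_TO_TR3"
   labels, and the functions build_test_train() and
   initial_convergence_time() are hypothetical names. The sketch
   assumes that all timestamps come from a single observer clock, and
   it interprets "begins to advertise the last route" as the latest
   first-advertisement time across all routes in the table. A seeded
   shuffle keeps the train reproducible for reporting; it does not
   provide the randomness metric, which remains TBD.

```python
import random
from dataclasses import dataclass
from ipaddress import IPv4Network


@dataclass(frozen=True)
class Event:
    """One externally observed event (hypothetical capture record)."""
    time: float       # seconds, assumed to come from a single observer clock
    kind: str         # "OPEN_SENT" (TR1/TR2 -> DUT) or "UPDATE_TO_TR3" (DUT -> TR3)
    prefix: str = ""  # announced prefix; used only for UPDATE_TO_TR3 events


def build_test_train(prefixes, seed):
    """Return the prefixes in a seeded pseudo-random order (Section 3.3).

    Keeping the seed lets the same train be replayed and reported.
    IPv4Network() rejects malformed prefixes and normalizes their text form.
    """
    train = [str(IPv4Network(p)) for p in prefixes]
    random.Random(seed).shuffle(train)
    return train


def initial_convergence_time(events, table):
    """Initial convergence time in seconds (Section 3.4.1), assuming:

    start = the earliest OPEN sent to the DUT by a test router;
    end   = the moment the DUT begins to advertise the last route of
            `table` to TR3, i.e. the latest first-advertisement time.
    `events` must be a list (it is iterated more than once).
    """
    opens = [e.time for e in events if e.kind == "OPEN_SENT"]
    if not opens:
        raise ValueError("no OPEN observed: the test never started")

    # Record only the first time each prefix is advertised to TR3.
    first_adv = {}
    for e in sorted(events, key=lambda ev: ev.time):
        if e.kind == "UPDATE_TO_TR3" and e.prefix not in first_adv:
            first_adv[e.prefix] = e.time

    missing = [p for p in table if p not in first_adv]
    if missing:
        raise ValueError(f"{len(missing)} route(s) never advertised to TR3")

    return max(first_adv[p] for p in table) - min(opens)


if __name__ == "__main__":
    table = ["1.0.0.0/8", "1.1.0.0/16", "1.1.1.0/24", "2.0.0.0/8"]
    print(build_test_train(table, seed=2001))

    events = [
        Event(0.00, "OPEN_SENT"),                    # TR1 -> DUT
        Event(0.02, "OPEN_SENT"),                    # TR2 -> DUT
        Event(1.10, "UPDATE_TO_TR3", "1.0.0.0/8"),
        Event(1.25, "UPDATE_TO_TR3", "1.1.0.0/16"),
        Event(1.40, "UPDATE_TO_TR3", "2.0.0.0/8"),
        Event(1.55, "UPDATE_TO_TR3", "1.1.1.0/24"),
        Event(1.70, "UPDATE_TO_TR3", "1.0.0.0/8"),   # re-advertisement, ignored
    ]
    print(f"{initial_convergence_time(events, table):.2f} s")  # 1.55 s
```

   Taking both the OPEN messages and the advertisements to TR3 from one
   observer clock is what allows the difference of the two timestamps
   to be meaningful without relying on NTP support in the DUT.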
3.4.2 Multiple Peers

   TBD

3.5 Incremental Re-convergence with a Single Peer

   For all of these measurements, report any route filters,
   authentication, and reverse path verification used. It is
   recommended that these not be used for initial testing.

3.5.1 Explicit add of single new route

   This test measures the time required to add a route newly
   advertised by a peer. Such a route does not exist in the DUT's RIB
   and will not displace a route in the RIB.

   The DUT has been initialized, with no path to D1. Measurement
   begins when TR1 announces D1 to the DUT. Measurement stops when the
   DUT advertises D1 to TR3.

3.5.2 Sequential withdraw and reannounce

   The DUT has been initialized and has a path to D1 via TR1, but not
   via TR2. Simultaneously, TR1 sends TDown(TR1) and TR2 announces the
   new route with Tbest(TR2). Measurement begins when Tbest is
   received at the DUT. Measurement stops when the DUT advertises D1
   to TR3.

3.5.3 Time to Change to Alternate Path after Explicit Withdrawal

   The DUT has been initialized and has paths to D1 via both TR1 and
   TR2. TR1's path is preferred, but TR1 withdraws it with TDown(TR1).
   Re-convergence occurs when the path(s) advertised by TR2 become
   active. Measurement begins when TDown(TR1) is received at the DUT.
   Measurement stops when the DUT advertises D1 to TR3.

3.6 Incremental Re-convergence with Multiple Peers

   The number of routes per BGP peer is an obvious stressor of the
   convergence process. The number and relative proportion of multiple
   route instances and of distinct routes added or withdrawn by each
   peer will affect the convergence process, as will the mix of
   overlapping route instances and IGP routes.

4. Flaps

   The following tests evaluate convergence in the presence of route
   flaps. Let TRF be a router that generates only flapping routes.

      TR1==========+---------+==========TR3
       |           |         |
       D1          |         |
       |           |   DUT   |
      TR2==========|         |
                   |         |
      ...          |         |
      TRF==========+---------+

      Figure 3. Test Configuration with a Flapping Router, TRF

4.1 Flap Isolation Test

   TRF will advertise a continuously flapping route.
   Repeat the eBGP convergence tests. The objective is to determine
   whether a single flapping route affects the operation of the DUT.

4.2 Authentication

   Repeat all tests above with MD5 authentication.

5. Acknowledgements

   Thanks to Francis Ovenden for review and Abha Ahuja for
   encouragement. Much appreciation to Jeff Haas, Matt Richardson, and
   Shane Wright at Nexthop for comments and input.

6. References

   [Ahuja 2000a] A. Ahuja, F. Jahanian, A. Bose, C. Labovitz, "An
                 Experimental Study of Delayed Internet Routing
                 Convergence", RIPE 37, Routing WG.

   [RFC 2119]    S. Bradner, "Key words for use in RFCs to Indicate
                 Requirement Levels", March 1997.

   [RFC 2439]    C. Villamizar, R. Chandra, R. Govindan, "BGP Route
                 Flap Damping", November 1998.

   [RFC 2544]    S. Bradner, J. McQuaid, "Benchmarking Methodology for
                 Network Interconnect Devices", March 1999.

   [RFC 2622]    C. Alaettinoglu, C. Villamizar, E. Gerich, D. Kessens,
                 D. Meyer, T. Bates, D. Karrenberg, M. Terpstra,
                 "Routing Policy Specification Language (RPSL)",
                 June 1999.

   [RFC 2827]    P. Ferguson, D. Senie, "Network Ingress Filtering:
                 Defeating Denial of Service Attacks which employ IP
                 Source Address Spoofing", May 2000.

   [RFC 2918]    E. Chen, "Route Refresh Capability for BGP-4",
                 September 2000.

   [Trotter]     "Terminology for Forwarding Information Base (FIB)
                 based Router Performance Benchmarking", Work in
                 Progress, draft-ietf-bmwg-fib-term-00.txt.

7. Authors' Addresses

   Howard Berkowitz
   Nortel Networks
   5012 S. 25th St
   Arlington, VA 22206
   Phone: +1 703 998-5819 (ESN 451-5819)
   Fax:   +1 703 998-5058
   Email: hberkowi@nortelnetworks.com, hcb@clark.net

   Alvaro Retana
   Cisco Systems, Inc.
   7025 Kit Creek Rd.
   Research Triangle Park, NC 27709
   Email: aretana@cisco.com

   Susan Hares
   Nexthop Technologies
   517 W. William
   Ann Arbor, MI 48103
   Email: skh@nexthop.com

   Padma Krishnaswamy
   Nexthop Technologies
   517 W. William
   Ann Arbor, MI 48103
   Phone: 734 936 2656
   Email: kri@nexthop.com

   Marianne Lepp
   Juniper Networks
   51 Sawyer Road
   Waltham, MA 02453
   Phone: 617 645 9019
   Email: mlepp@juniper.net

Appendix A. Representative Scenarios

   The following describes sample BGP applications positioned at
   various points in the network. In these scenarios, D denotes the
   number of routes in the full default-free routing table.

A.1 Default-free interprovider peering

   The DUT exchanges 0.3 D to 0.5 D routes with a small number of
   peers. Typically, routers in this application are limited by
   bandwidth rather than by route processing.

A.2 Interprovider peering with transit

   The DUT exchanges 1.3 D routes with a small number of peers.

A.3 Provider edge router

   The DUT has a large number (more than 10) of eBGP peers. The DUT
   advertises 1.3 D routes to 10% of the peers, 0.3 D routes to 20% of
   the peers, and only a default route to the remaining 70%.

   Of the peers, 50% advertise an aggregate and a more-specific route
   to the DUT, 20% advertise 10 or more routes to the DUT, and 30%
   advertise a single route to the DUT.

A.4 Multihomed subscriber edge router

   The DUT connects to 2 peers. It advertises an aggregate and a
   more-specific route to each.

Full Copyright Statement

   Copyright (C) The Internet Society (2001). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works.
   However, this document itself may not be modified in any way, such
   as by removing the copyright notice or references to the Internet
   Society or other Internet organizations, except as needed for the
   purpose of developing Internet standards in which case the
   procedures for copyrights defined in the Internet Standards process
   must be followed, or as required to translate it into languages
   other than English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on
   an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.