Benchmarking Methodology Working Group                       S. Poretsky
Internet-Draft                                      Allot Communications
Expires: January 13, 2011                                     V. Gurbani
                                       Bell Laboratories, Alcatel-Lucent
                                                               C. Davids
                                        Illinois Institute of Technology
                                                           July 12, 2010


     Terminology for Benchmarking Session Initiation Protocol (SIP)
                           Networking Devices
                   draft-ietf-bmwg-sip-bench-term-02

Abstract

   This document provides a terminology for benchmarking SIP
   performance in networking devices.  Terms are included for test
   components, test setup parameters, and performance benchmark metrics
   for black-box benchmarking of SIP networking devices.  The
   performance benchmark metrics are obtained for the SIP control plane
   and media plane.  The terms are intended for use in a companion
   methodology document for complete performance characterization of a
   device in a variety of conditions making it possible to compare
   performance of different devices.  It is critical to provide test
   setup parameters and a methodology document for SIP performance
   benchmarking because SIP allows a wide range of configuration and
   operational conditions that can influence performance benchmark
   measurements.  It is necessary to have terminology and methodology
   standards to ensure that reported benchmarks have consistent
   definition and were obtained following the same procedures.
   Benchmarks can be applied to compare performance of a variety of SIP
   networking devices.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

Poretsky, et al.        Expires January 13, 2011                [Page 1]
Internet-Draft        SIP Benchmarking Terminology             July 2010

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January 13, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the BSD License.

Table of Contents

   1.  Terminology
   2.  Introduction
     2.1.  Scope
     2.2.  Benchmarking Models
   3.  Term Definitions
     3.1.  Protocol Components
       3.1.1.  Session
       3.1.2.  Signaling Plane
       3.1.3.  Media Plane
       3.1.4.  Associated Media
       3.1.5.  Overload
       3.1.6.  Session Attempt
       3.1.7.  Established Session
       3.1.8.  Invite-initiated Session (IS)
       3.1.9.  Non-INVITE-initiated Session (NS)
       3.1.10. Session Attempt Failure
       3.1.11. Standing Sessions Count
     3.2.  Test Components
       3.2.1.  Emulated Agent
       3.2.2.  Signaling Server
       3.2.3.  SIP-Aware Stateful Firewall
       3.2.4.  SIP Transport Protocol
     3.3.  Test Setup Parameters
       3.3.1.  Session Attempt Rate
       3.3.2.  IS Media Attempt Rate
       3.3.3.  Establishment Threshold Time
       3.3.4.  Session Duration
       3.3.5.  Media Packet Size
       3.3.6.  Media Offered Load
       3.3.7.  Media Session Hold Time
       3.3.8.  Loop Detection Option
       3.3.9.  Forking Option
     3.4.  Benchmarks
       3.4.1.  Registration Rate
       3.4.2.  Session Establishment Rate
       3.4.3.  Session Capacity
       3.4.4.  Session Overload Capacity
       3.4.5.  Session Establishment Performance
       3.4.6.  Session Attempt Delay
       3.4.7.  IM Rate
   4.  IANA Considerations
   5.  Security Considerations
   6.  Acknowledgments
   7.  References
     7.1.  Normative References
     7.2.  Informational References
   Appendix A.  White Box Benchmarking Terminology
   Authors' Addresses

1.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in BCP 14, RFC2119
   [RFC2119].  RFC 2119 defines the use of these key words to help make
   the intent of standards track documents as clear as possible.  While
   this document uses these keywords, this document is not a standards
   track document.  The term Throughput is defined in RFC2544
   [RFC2544].

   For the sake of clarity and continuity, this document adopts the
   template for definitions set out in Section 2 of RFC 1242 [RFC1242].
   Definitions are indexed and grouped together for ease of reference.

   This document uses existing terminology defined in other BMWG work.
   Examples include, but are not limited to:

      Device under test (DUT) (c.f., Section 3.1.1 RFC 2285 [RFC2285]).
      System under test (SUT) (c.f., Section 3.1.2, RFC 2285 [RFC2285]).

   Many commonly used SIP terms in this document are defined in RFC
   3261 [RFC3261].  For convenience the most important of these are
   reproduced below.  Use of these terms in this document is consistent
   with their corresponding definition in [RFC3261].

   o  Call Stateful: A proxy is call stateful if it retains state for a
      dialog from the initiating INVITE to the terminating BYE request.
      A call stateful proxy is always transaction stateful, but the
      converse is not necessarily true.
   o  Stateful Proxy: A logical entity that maintains the client and
      server transaction state machines defined by this specification
      during the processing of a request, also known as a transaction
      stateful proxy.  The behavior of a stateful proxy is further
      defined in Section 16.  A (transaction) stateful proxy is not the
      same as a call stateful proxy.
   o  Stateless Proxy: A logical entity that does not maintain the
      client or server transaction state machines defined in this
      specification when it processes requests.  A stateless proxy
      forwards every request it receives downstream and every response
      it receives upstream.
   o  Back-to-back User Agent: A back-to-back user agent (B2BUA) is a
      logical entity that receives a request and processes it as a user
      agent server (UAS).  In order to determine how the request should
      be answered, it acts as a user agent client (UAC) and generates
      requests.  Unlike a proxy server, it maintains dialog state and
      must participate in all requests sent on the dialogs it has
      established.  Since it is a concatenation of a UAC and a UAS, no
      explicit definitions are needed for its behavior.
   o  Loop: A request that arrives at a proxy, is forwarded, and later
      arrives back at the same proxy.  When it arrives the second time,
      its Request-URI is identical to the first time, and other header
      fields that affect proxy operation are unchanged, so that the
      proxy will make the same processing decision on the request it
      made the first time.  Looped requests are errors, and the
      procedures for detecting them and handling them are described by
      the protocol.

2.  Introduction

   Service Providers are now planning Voice Over IP (VoIP) and
   Multimedia network deployments using the IETF developed Session
   Initiation Protocol (SIP) [RFC3261].  SIP is a signaling protocol
   originally intended to be used for the dynamic establishment,
   disconnection and modification of streams of media between end
   users.  As it has evolved it has been adopted for use in a growing
   number of applications and features.  Many of these result in the
   creation of a media stream, but some do not.  Instead, they create
   other services tailored to the end-users' immediate needs or
   preferences.  The set of benchmarking terms provided in this
   document is intended for use with any SIP-enabled device performing
   SIP functions in the interior of the network.  The performance of
   end-user devices is outside the scope of this document.

   VoIP with SIP has led to the development of new networking devices
   including SIP Servers, Session Border Controllers (SBC), Back-to-back
   user agents (B2BUA) and SIP-Aware Stateful Firewalls.  The mix of
   voice and IP functions in these various devices has produced
   inconsistencies in vendor reported performance metrics and has
   caused confusion in the service provider community.  SIP allows a
   wide range of configuration and operational conditions that can
   influence performance benchmark measurements.  When the device under
   test terminates or relays both media and signaling, for example, it
   is important to be able to correlate a signaling measurement with
   the media plane measurements to determine the system performance.
   As devices and their functions proliferate, the need to have a
   consistent set of metrics to compare their performance becomes
   increasingly urgent.  This document and its companion methodology
   document [I-D.ietf-bmwg-sip-bench-meth] provide a set of black-box
   benchmarks for describing and comparing the performance of devices
   that incorporate the SIP User Agent Client and Server functions and
   that operate in the network's core.

   The definition of SIP performance benchmarks necessarily includes
   definitions of Test Setup Parameters and a test methodology.  These
   enable the Tester to perform benchmarking tests on different devices
   and to achieve comparable and repeatable results.  This document
   provides a common set of well-defined terms for Test Components,
   Test Setup Parameters, and Benchmarks.  All the benchmarks defined
   are black-box measurements of the SIP Control (Signaling) plane.
   The Test Setup Parameters and Benchmarks defined in this document
   are intended for use with the companion Methodology document.
   Benchmarks of internal DUT characteristics (also known as white-box
   benchmarks) such as Session Attempt Arrival Rate, which is measured
   at the DUT, are described in Appendix A to allow additional
   characterization of DUT behavior with different distribution models.

2.1.  Scope

   The scope of this work item is summarized as follows:

   o  This terminology document describes SIP signaling (control-plane)
      performance benchmarks for black-box measurements of SIP
      networking devices.  Stress and debug scenarios are not addressed
      in this work item.
   o  The DUT must be RFC 3261 capable network equipment.  This may be
      a Registrar, Redirect Server, Stateless Proxy or Stateful Proxy.
      A DUT MAY also include B2BUA or SBC functionality (this is
      referred to as the "Signaling Server").  The DUT MAY be a
      multi-port SIP-to-switched network gateway implemented as a SIP
      UAC or UAS.
   o  The DUT MAY have an internal SIP Application Level Gateway (ALG),
      firewall, and/or a Network Address Translator (NAT).  This is
      referred to as the "SIP Aware Stateful Firewall."
   o  The DUT or SUT MUST NOT be end user equipment, such as a personal
      digital assistant, a computer-based client, or a user terminal.
   o  The Tester acts as multiple "Emulated Agents" that initiate (or
      respond to) SIP messages as session endpoints and source (or
      receive) associated media for established connections.
   o  Control Signaling in presence of Media
      *  The media performance is not benchmarked in this work item.
      *  It is RECOMMENDED that control plane benchmarks be performed
         with media present, but this is optional.
      *  The SIP INVITE requests MUST include the SDP body.
      *  The type of DUT dictates whether the associated media streams
         traverse the DUT or SUT.  Both scenarios are within the scope
         of this work item.
      *  SIP is frequently used to create media streams; the control
         plane and media plane are treated as orthogonal to each other
         in this document.  While many devices support the creation of
         media streams, benchmarks that measure the performance of
         these streams are outside the scope of this document and its
         companion methodology document [I-D.ietf-bmwg-sip-bench-meth].
         Tests may be performed with or without the creation of media
         streams.  The presence or absence of media streams MUST be
         noted as a condition of the test as the performance of SIP
         devices may vary accordingly.  Even if the media is used
         during benchmarking, only the SIP performance will be
         benchmarked, not the media performance or quality.
   o  Both INVITE and non-INVITE scenarios (such as Instant Messages or
      IM) are addressed in this document.  However, benchmarking SIP
      presence is not a part of this work item.
   o  Different transport mechanisms -- such as UDP, TCP, SCTP, or TLS
      -- may be used; however, the specific transport mechanism MUST be
      noted as a condition of the test as the performance of SIP
      devices may vary accordingly.
   o  Looping and forking options are also considered since they impact
      processing at SIP proxies.
   o  REGISTER and INVITE requests may be challenged or remain
      unchallenged for authentication purposes as this may impact the
      performance benchmarks.  Any observable performance degradation
      due to authentication is of interest to the SIP community.
      Whether or not the REGISTER and INVITE requests are challenged is
      a condition of test and will be recorded and reported.
   o  Re-INVITE requests are not considered in scope of this work item.
   o  Only session establishment is considered for the performance
      benchmarks.  Session disconnect is not considered in the scope of
      this work item.
   o  SIP Overload [I-D.ietf-sipping-overload-reqs] is within the scope
      of this work item.  We test to failure and then can continue to
      observe and record the behavior of the system after failures are
      recorded.  The cause of failure is not within the scope of this
      work.  We note the failure and may continue to test until a
      different failure or condition is encountered.  Considerations on
      how to handle overload are deferred to work progressing in the
      SIPPING working group [I-D.ietf-sipping-overload-design].
      Vendors are, of course, free to implement their specific overload
      control behavior as the expected test outcome if it is different
      from the IETF recommendations.  However, such behavior MUST be
      documented and interpreted appropriately across multiple vendor
      implementations.  This will make it more meaningful to compare
      the performance of different SIP overload implementations.
   o  IMS-specific scenarios are not considered, but test cases can be
      applied with 3GPP-specific SIP signaling and the P-CSCF as a DUT.

2.2.  Benchmarking Models

   This section shows the five models to be used when benchmarking SIP
   performance of a networking device.  Figure 1 shows a configuration
   in which the Tester acting as the Emulated Agents is in loopback
   testing itself for the purpose of baselining its performance.

   +--------+      Signaling request       +--------+
   |        +----------------------------->|        |
   | Tester |                              | Tester |
   |   EA   |      Signaling response      |   EA   |
   |        |<-----------------------------+        |
   +--------+                              +--------+

   Figure 1: Test topology 1 - Emulated agent (EA) baseline performance
                              measurement

   Figure 2 shows the basic configuration for benchmarking the
   Registration of the DUT/SUT.

   +--------+      Registration            +--------+
   |        +----------------------------->|        |
   | Tester |                              |  DUT   |
   |   EA   |      Response                |        |
   |        |<-----------------------------+        |
   +--------+                              +--------+

   Figure 2: Test topology 2 - Emulated agent (EA) registration to
                              DUT/SUT

   Figure 3 shows that the Tester acts as the initiating and responding
   Emulated Agents as the DUT/SUT forwards Session Attempts.

   +--------+   Session   +--------+   Session   +--------+
   |        |   Attempt   |        |   Attempt   |        |
   |        |<------------+        |<------------+        |
   |        |             |        |             |        |
   |        |   Response  |        |   Response  |        |
   | Tester +------------>|  DUT   +------------>| Tester |
   |  (EA)  |             |        |             |  (EA)  |
   |        |             |        |             |        |
   +--------+             +--------+             +--------+

   Figure 3: Test topology 3 - DUT/SUT performance benchmark for
                  session establishment without media

   Figure 4 is used when performing those same benchmarks with
   Associated Media traversing the DUT/SUT.

   +--------+   Session   +--------+   Session   +--------+
   |        |   Attempt   |        |   Attempt   |        |
   |        |<------------+        |<------------+        |
   |        |             |        |             |        |
   |        |   Response  |        |   Response  |        |
   | Tester +------------>|  DUT   +------------>| Tester |
   |        |             |        |             |  (EA)  |
   |        |    Media    |        |    Media    |        |
   |        |<===========>|        |<===========>|        |
   +--------+             +--------+             +--------+

   Figure 4: Test topology 4 - DUT/SUT performance benchmark for
         session establishment with media traversing the DUT

   Figure 5 is to be used when performing those same benchmarks with
   Associated Media, but the media does not traverse the DUT/SUT.
   Again, the benchmarking of the media is not within the scope of this
   work item.  The SIP control signaling is benchmarked in the presence
   of Associated Media to determine if the SDP body of the signaling
   and the handling of media impacts the performance of the DUT/SUT.

   +--------+   Session   +--------+   Session   +--------+
   |        |   Attempt   |        |   Attempt   |        |
   |        |<------------+        |<------------+        |
   |        |             |        |             |        |
   |        |   Response  |        |   Response  |        |
   | Tester +------------>|  DUT   +------------>| Tester |
   |        |             |        |             |  (EA)  |
   |        |             |        |             |        |
   +--------+             +--------+             +--------+
      /|\                                           /|\
       |                    Media                    |
       +=============================================+

   Figure 5: Test topology 5 - DUT/SUT performance benchmark for
          session establishment with media external to the DUT

   Figure 6 illustrates the SIP signaling for an Established Session.
   The Tester acts as the Emulated Agent(s) and initiates a Session
   Attempt with the DUT/SUT.  When the Emulated Agent (EA) receives a
   200 OK from the DUT/SUT that session is considered to be an
   Established Session.  The illustration indicates three states of the
   session being created by the EA - Attempting, Established, and
   Disconnecting.  Sessions can be one of two types: Invite-Initiated
   Session (IS) or Non-Invite Initiated Session (NS).  Failure for the
   DUT/SUT to successfully respond within the Establishment Threshold
   Time is considered a Session Attempt Failure.  SIP INVITE messages
   MUST include the SDP body to specify the Associated Media.  Use of
   Associated Media, to be sourced from the EA, is optional.  When
   Associated Media is used, it may traverse the DUT/SUT depending upon
   the type of DUT/SUT.  The Associated Media is shown in Figure 6 as
   "Media" connected to media ports M1 and M2 on the EA.  After the EA
   sends a BYE, the session disconnects.  Performance test cases for
   session disconnects are not considered in this work item (the BYE
   request is shown for completeness).

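   The session states described above can be sketched in code.  The
   following Python fragment is an illustrative sketch only and is not
   part of the terminology: the names SessionState, ObservedSession,
   and classify are hypothetical, timestamps are assumed to be seconds
   on the EA's clock and never later than "now", and a 200 OK arriving
   exactly at the Establishment Threshold Time is assumed to count as
   being within it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SessionState(Enum):
    ATTEMPTING = "attempting"
    ESTABLISHED = "established"
    DISCONNECTING = "disconnecting"
    ATTEMPT_FAILURE = "session attempt failure"


@dataclass
class ObservedSession:
    """Timestamps (seconds) a hypothetical EA records for one session."""
    request_sent: float                   # initial INVITE (or non-INVITE) sent
    ok_received: Optional[float] = None   # 200 OK for the initial request
    bye_sent: Optional[float] = None      # BYE sent by the EA


def classify(s: ObservedSession, now: float, threshold: float) -> SessionState:
    """Classify a session as seen from the EA at time `now`.

    `threshold` is the Establishment Threshold Time in seconds.
    """
    deadline = s.request_sent + threshold
    if s.ok_received is None:
        # No 200 OK yet: still attempting until the threshold expires.
        if now <= deadline:
            return SessionState.ATTEMPTING
        return SessionState.ATTEMPT_FAILURE
    if s.ok_received > deadline:
        # A 200 OK after the threshold does not rescue the attempt.
        return SessionState.ATTEMPT_FAILURE
    if s.bye_sent is not None and now >= s.bye_sent:
        return SessionState.DISCONNECTING
    return SessionState.ESTABLISHED
```

   For example, with a threshold of 4 seconds, a session whose 200 OK
   arrives 0.5 seconds after the INVITE is classified as Established,
   while one that has received no response after 5 seconds is a Session
   Attempt Failure.
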
                      EA             DUT/SUT      M1      M2
                       |                |          |       |
                       |     INVITE     |          |       |
               --------+--------------->|          |       |
                       |                |          |       |
        Attempting     |                |          |       |
                       |     200 OK     |          |       |
               --------|<-------------- |          |       |
                       |                |          |       |
                       |                |          |       |
                       |                |          | Media |
        Established    |                |          |<=====>|
                       |                |          |       |
                       |      BYE       |          |       |
               --------+--------------> |          |       |
                       |                |          |       |
       Disconnecting   |                |          |       |
                       |     200 OK     |          |       |
               --------|<-------------- |          |       |
                       |                |          |       |

                   Figure 6: Basic SIP test topology

3.  Term Definitions

3.1.  Protocol Components

3.1.1.  Session

   Definition:
      The combination of signaling and media messages and processes
      that enable two or more participants to communicate.

   Discussion:
      SIP messages in the signaling plane can be used to create and
      manage applications for one or more end users.  SIP is often used
      to create and manage media streams in support of applications.  A
      session always has a signaling component and may have a media
      component.  Therefore, a Session may be defined as signaling only
      or a combination of signaling and media (c.f. Associated Media,
      see Section 3.1.4).  SIP includes definitions of a Call-ID, a
      dialog and a transaction that support this application.  A
      growing number of usages and applications do not require the
      creation of associated media.  The first such usage was the
      REGISTER.  Applications that use the MESSAGE and SUBSCRIBE/NOTIFY
      methods also do not require SIP to manage media streams.  The
      terms Invite-initiated Session (IS) and Non-invite initiated
      Session (NS) are used to distinguish between these different
      usages.

      A Session, in the context of this document, is considered to be a
      vector with three components:

      1.  A component in the signaling plane (SIP messages), sess.sig;
      2.  A media component in the media plane (RTP and SRTP streams
          for example), sess.med (which may be null);
      3.  A control component in the media plane (RTCP messages for
          example), sess.medc (which may be null).

      An IS is expected to have non-null sess.sig and sess.med
      components.  The use of control protocols in the media component
      is media dependent, thus the expected presence or absence of
      sess.medc is media dependent and test-case dependent.  An NS is
      expected to have a non-null sess.sig component, but null sess.med
      and sess.medc components.

      Packets in the Signaling Plane and Media Plane will be handled by
      different processes within the DUT.  They will take different
      paths within a SUT.  These different processes and paths may
      produce variations in performance.  The terminology and
      benchmarks defined in this document and the methodology for their
      use are designed to enable us to compare performance of the
      DUT/SUT with reference to the type of SIP-supported application
      it is handling.

      Note that one or more sessions can simultaneously exist between
      any participants.  This can be the case, for example, when the EA
      sets up both an IM and a voice call through the DUT/SUT.  These
      sessions are represented as an array session[x].

      Sessions will be represented as a vector array with three
      components, as follows:

      session->
         session[x].sig, the signaling component
         session[x].medc, the media control component (e.g. RTCP)
         session[x].med[y], an array of associated media streams (e.g.
         RTP, SRTP, RTSP, MSRP).  This media component may consist of
         zero or more media streams.

      Figure 7 models the vectors of the session.

   Measurement Units:
      N/A.

   Issues:
      None.

   See Also:
      Media Plane
      Signaling Plane
      Associated Media
      Invite-initiated Session (IS)
      Non-invite-initiated Session (NS)

                  |\
                  |
                  |  \
          sess.sig|
                  |    \
                  |
                  |      \
                  |       o
                  |      /
                  |     / |
                  |    /
                  |   /   |
                  |  /
                  | /     |     sess.medc
                  |/_____________________
                 /           /
                /         |
               /           /
     sess.med /           |
             /_ _ _ _ _ _ _ _/
            /
           /
          /
         /

            Figure 7: Application or session components

3.1.2.  Signaling Plane

   Definition:
      The control plane in which SIP messages [RFC3261] are exchanged
      between SIP Agents [RFC3261] to establish a connection for media
      exchange.

   Discussion:
      SIP messages are used to establish sessions in several ways:
      directly between two User Agents [RFC3261], through a Proxy
      Server [RFC3261], or through a series of Proxy Servers.  The
      Signaling Plane MUST include the Session Description Protocol
      (SDP).  The Signaling Plane for a single Session is represented
      by session.sig.

   Measurement Units:
      N/A.

   Issues:
      None.

   See Also:
      Media Plane
      Emulated Agents

3.1.3.  Media Plane

   Definition:
      The data plane in which one or more media streams and their
      associated media control protocols are exchanged after a media
      connection has been created by the exchange of signaling messages
      in the Signaling Plane.

   Discussion:
      Media may also be known as the "bearer channel".  The Media Plane
      MUST include the media control protocol, if one is used, and the
      media stream(s).  Examples of media are audio, video, whiteboard,
      and instant messaging service.  The media stream is described in
      the SDP of the Signaling Plane.  The media for a single Session
      is represented by session.med.  The media control protocol is
      represented by session.medc.

   Measurement Units:
      N/A.

   Issues:
      None.

   See Also:
      Signaling Plane

3.1.4.  Associated Media

   Definition:
      Media that corresponds to an 'm' line in the SDP payload of the
      Signaling Plane.

   Discussion:
      Any media protocol MAY be used.  For any session's signaling
      component, represented as session.sig, there may be one or
      multiple associated media streams which are represented by a
      vector array session.med[y], which is referred to as the
      Associated Media.

   Measurement Units:
      N/A.

   Issues:
      None.

3.1.5.  Overload

   Definition:
      Overload is defined as the state where a SIP server does not have
      sufficient resources to process all incoming SIP messages
      [I-D.ietf-sipping-overload-reqs].  The distinction between an
      overload condition and other failure scenarios is outside the
      scope of this document, which addresses black-box testing.

   Discussion:
      Under overload conditions, all or a percentage of Session
      Attempts will fail due to lack of resources.  SIP server
      resources may include CPU processing capacity, network bandwidth,
      input/output queues, or disk resources.  Any combination of
      resources may be fully utilized when a SIP server (the DUT/SUT)
      is in the overload condition.  For proxy-only devices, overload
      issues will be dominated by the number of signaling messages they
      can handle per unit time before their throughput starts to drop.

      For UA-type network devices (e.g., gateways), overload must
      necessarily include both the signaling traffic and media streams.
      It is expected that the amount of signaling that a UA can handle
      is inversely proportional to the number of media streams
      currently handled by that UA.

   Measurement Units:
      N/A.

   Issues:
      The issue of overload in SIP networks is currently a topic of
      discussion in the SIPPING WG.  The normal response to an overload
      stimulus -- sending a 503 response -- is considered inadequate
      and new response codes and behaviors may be specified in the
      future.  From the perspective of this document, all these
      responses will be considered to be failures.  There is thus no
      dependency between this document and the ongoing work on the
      treatment of overload failure.

3.1.6.  Session Attempt

   Definition:
      A SIP Session for which the Emulated Agent has sent the SIP
      INVITE or SUBSCRIBE/NOTIFY and has not yet received a message
      response from the DUT/SUT.

   Discussion:
      The attempted session may be an IS or an NS.  The Session Attempt
      includes SIP INVITEs and SUBSCRIBE/NOTIFY messages.  It also
      includes all INVITEs that are rejected for lack of authentication
      information.

   Measurement Units:
      N/A.

   Issues:
      None.

   See Also:
      Session
      Session Attempt Rate
      Invite-initiated Session
      Non-Invite initiated Session

3.1.7.  Established Session

   Definition:
      A SIP session for which the Emulated Agent acting as the UE/UA
      has received a 200 OK message from the DUT/SUT.

   Discussion:
      An Established Session MAY be of type INVITE-Session (IS) or
      Non-INVITE Session (NS).

   Measurement Units:
      N/A.

   Issues:
      None.

   See Also:
      Invite-initiated Session
      Session Attempting State
      Session Disconnecting State

3.1.8.  Invite-initiated Session (IS)

   Definition:
      A Session that is created by an exchange of messages in the
      Signaling Plane, the first of which is a SIP INVITE request.

   Discussion:
      An IS is identified by the Call-ID, To-tag, and From-tag of the
      SIP message that establishes the session.  These three fields are
      used to identify a SIP Dialog (RFC3261 [RFC3261]).  An IS may
      have an Associated Media description in the SDP body.  An IS may
      have multiple Associated Media streams.  The inclusion of media
      is test case dependent.  An IS is successfully established if the
      following two conditions are met:

      1.  Sess.sig is established by the end of Establishment Threshold
          Time (c.f. Section 3.3.3), and
      2.  If a media session is described in the SDP body of the
          signaling message, then the media session is established by
          the end of Establishment Threshold Time (c.f. Section 3.3.3).

   Measurement Units:
      N/A.

   Issues:
      None.

   See Also:
      Session
      Non-Invite initiated Session
      Associated Media

3.1.9.  Non-INVITE-initiated Session (NS)

   Definition:
      A session that is created by an exchange of messages in the
      Signaling Plane that does not include an initial SIP INVITE
      message.

   Discussion:
      An NS is successfully established if the Session Attempt via a non-INVITE request results in the EA receiving a 2xx reply from the DUT/SUT before the expiration of the Establishment Threshold Time (see Section 3.3.3). An example of an NS is a session created by the SUBSCRIBE request.

   Measurement Units:
      N/A.

   Issues:
      None.

   See Also:
      Session
      Invite-initiated Session

3.1.10.  Session Attempt Failure

   Definition:
      A session attempt that does not result in an Established Session.

   Discussion:
      The session attempt failure may be indicated by the following observations at the Emulated Agent:

      1. Receipt of a SIP 4xx, 5xx, or 6xx class response to a Session Attempt.

      2. The lack of any received SIP response to a Session Attempt within the Establishment Threshold Time (see Section 3.3.3).

   Measurement Units:
      N/A.

   Issues:
      None.

   See Also:
      Session Attempt

3.1.11.  Standing Sessions Count

   Definition:
      The number of Sessions currently established on the DUT/SUT at any instant.

   Discussion:
      The number of Standing Sessions is influenced by the Session Duration and the Session Attempt Rate. Benchmarks MUST be reported with the maximum and average Standing Sessions for the DUT/SUT. In order to determine the maximum and average Standing Sessions on the DUT/SUT for the duration of the test, it is necessary to make periodic measurements of the number of Standing Sessions on the DUT/SUT. The recommended value for the measurement period is 1 second.

   Measurement Units:
      Number of sessions

   Issues:
      None.

   See Also:
      Session Duration
      Session Attempt Rate

3.2.  Test Components

3.2.1.  Emulated Agent

   Definition:
      A device in the test topology that initiates/responds to SIP messages as one or more session endpoints and, wherever applicable, sources/receives Associated Media for Established Sessions.

   Discussion:
      The Emulated Agent functions in the signaling and media planes. The Tester may act as multiple Emulated Agents.

   Measurement Units:
      N/A

   Issues:
      None.

   See Also:
      Media Plane
      Signaling Plane
      Established Session
      Associated Media

3.2.2.  Signaling Server

   Definition:
      A device in the test topology that acts to create sessions between Emulated Agents in the media plane. This device is either a DUT or a component of a SUT.

   Discussion:
      The DUT MUST be RFC 3261-capable network equipment such as a Registrar, Redirect Server, User Agent Server, Stateless Proxy, or Stateful Proxy. A DUT MAY also include a B2BUA or SBC.

   Measurement Units:
      N/A

   Issues:
      None.

   See Also:
      Signaling Plane

3.2.3.  SIP-Aware Stateful Firewall

   Definition:
      A device in the test topology that provides Denial-of-Service (DoS) protection to the Signaling and Media Planes for the Emulated Agents and Signaling Server.

   Discussion:
      The SIP-Aware Stateful Firewall MAY be an internal component or function of the Signaling Server. The SIP-Aware Stateful Firewall MAY be a standalone device. If it is a standalone device, it MUST be paired with a Signaling Server and it MUST be benchmarked as part of a SUT. SIP-Aware Stateful Firewalls MAY include Network Address Translation (NAT) functionality. Ideally, including the SIP-Aware Stateful Firewall in the SUT does not degrade the measured performance benchmarks.

   Measurement Units:
      N/A

   Issues:
      None.

   See Also:

3.2.4.  SIP Transport Protocol

   Definition:
      The protocol used for transport of the Signaling Plane messages.
   Discussion:
      Performance benchmarks may vary for the same SIP networking device depending upon whether TCP, UDP, TLS, SCTP, or another transport layer protocol is used. For this reason it MAY be necessary to measure the SIP Performance Benchmarks using these various transport protocols. Performance Benchmarks MUST report the SIP Transport Protocol used to obtain the benchmark results.

   Measurement Units:
      TCP, UDP, SCTP, TLS over TCP, TLS over UDP, or TLS over SCTP

   Issues:
      None.

   See Also:

3.3.  Test Setup Parameters

3.3.1.  Session Attempt Rate

   Definition:
      Configuration of the Emulated Agent for the number of sessions that the Emulated Agent attempts to establish with the DUT/SUT over a specified time interval.

   Discussion:
      The Session Attempt Rate can cause variation in performance benchmark measurements. Since this is the number of sessions configured on the Tester, some sessions may not be successfully established on the DUT. A session may be either an IS or an NS.

   Measurement Units:
      Session attempts per second

   Issues:
      None.

   See Also:
      Session
      Session Attempt

3.3.2.  IS Media Attempt Rate

   Definition:
      Configuration on the Emulated Agent for the number of ISs with Associated Media to be established at the DUT per continuous one-second time interval.

   Discussion:
      Note that a Media Session MUST be associated with an IS. In this document we assume that there is a one-to-one correspondence between IS session attempts and Media Session attempts. By including this definition we leave open the possibility that there may be an IS that does not include a media description. Also note that the IS Media Attempt Rate defines the number of media sessions we are trying to create, not the number of media sessions that are actually created. Variations in the IS Media Attempt Rate might cause variations in performance benchmark measurements.
      Some attempts might not result in successful sessions established on the DUT.

   Measurement Units:
      session attempts per second (saps)

   Issues:
      None.

   See Also:
      IS

3.3.3.  Establishment Threshold Time

   Definition:
      Configuration of the Emulated Agent that represents the amount of time that an Emulated Agent will wait before declaring a Session Attempt Failure.

   Discussion:
      This time duration is test dependent. It is RECOMMENDED that the Establishment Threshold Time value be set to Timer B (for ISs) or Timer F (for NSs) as specified in RFC 3261, Table 4 [RFC3261]. Using the default value of T1 (500 ms) specified in the table and the constant multiplier of 64 gives a value of 32 seconds for this timer (i.e., 500 ms * 64 = 32 s).

   Measurement Units:
      seconds

   Issues:
      None.

   See Also:
      Session Attempt Failure

3.3.4.  Session Duration

   Definition:
      Configuration of the Emulated Agent that represents the amount of time that the SIP dialog is intended to exist between the two EAs associated with the test.

   Discussion:
      The time at which the BYE is sent will control the Session Duration. Normally the Session Duration will be the same as the Media Session Hold Time. However, it is possible that the dialog established between the two EAs can support different media sessions at different points in time. Providing both parameters allows the testing agency to explore this possibility.

   Measurement Units:
      seconds

   Issues:
      None.

   See Also:
      Media Session Hold Time

3.3.5.  Media Packet Size

   Definition:
      Configuration on the Emulated Agent for a fixed size of packets used for media streams.

   Discussion:
      For a single benchmark test, all sessions use the same size packet for media streams. The size of packets can cause variation in performance benchmark measurements.

   Measurement Units:
      bytes

   Issues:
      None.

   See Also:

3.3.6.  Media Offered Load

   Definition:
      Configuration of the Emulated Agent for the constant rate of Associated Media traffic offered by the Emulated Agent to the DUT/SUT for one or more Established Sessions of type IS.

   Discussion:
      The Media Offered Load to be used for a test MUST be reported with three components:

      1. per Associated Media stream;

      2. per IS;

      3. aggregate.

      For a single benchmark test, all sessions use the same Media Offered Load per Media Stream. There may be multiple Associated Media streams per IS. The aggregate is the sum of all Associated Media for all ISs.

   Measurement Units:
      packets per second (pps)

   Issues:
      None.

   See Also:
      Established Session
      Invite-initiated Session
      Associated Media

3.3.7.  Media Session Hold Time

   Definition:
      Parameter configured at the Emulated Agent that represents the amount of time that the Associated Media for an Established Session of type IS will last.

   Discussion:
      The Associated Media streams may be bi-directional or uni-directional as indicated in the test methodology. Normally the Media Session Hold Time will be the same as the Session Duration. However, it is possible that the dialog established between the two EAs can support different media sessions at different points in time. Providing both parameters allows the testing agency to explore this possibility.

   Measurement Units:
      seconds

   Issues:
      None.

   See Also:
      Associated Media
      Established Session
      Invite-initiated Session (IS)

3.3.8.  Loop Detection Option

   Definition:
      An option that causes a Proxy to check for loops in the routing of a SIP request before forwarding the request.
   Discussion:
      This is an optional process that a SIP proxy may employ. The process is described under Proxy Behavior in RFC 3261, Section 16.3 (Request Validation), which also contains suggestions as to how the option could be implemented. Any procedure to detect loops will use processor cycles and hence could impact the performance of a proxy.

   Measurement Units:
      N/A

   Issues:
      None.

   See Also:

3.3.9.  Forking Option

   Definition:
      An option that enables a Proxy to fork requests to more than one destination.

   Discussion:
      This is a process that a SIP proxy may employ to find the UAS. The option is described under Proxy Behavior in RFC 3261, Section 16.1. A proxy that uses forking must maintain state information, and this will use processor cycles and memory. Thus the use of this option could impact the performance of a proxy, and different implementations could produce different impacts. SIP supports serial or parallel forking. When performing a test, the type of forking mode MUST be indicated.

   Measurement Units:
      The number of endpoints that will receive the forked invitation. A value of 1 indicates that the request is destined to only one endpoint, a value of 2 indicates that the request is forked to two endpoints, and so on. This is an integer value ranging between 1 and N inclusive, where N is the maximum number of endpoints to which the invitation is sent.

      Type of forking used, namely parallel or serial.

   Issues:
      None.

   See Also:

3.4.  Benchmarks

3.4.1.  Registration Rate

   Definition:
      The maximum number of registrations that can be successfully completed by the DUT/SUT in a given time period.

   Discussion:
      This benchmark is obtained with zero failure in which 100% of the registrations attempted by the Emulated Agent are successfully completed by the DUT/SUT. The maximum value is obtained by testing to failure.
      This means that the registration rate provisioned on the EA is raised progressively until a registration attempt failure is observed.

   Measurement Units:
      registrations per second (rps)

   Issues:
      None.

   See Also:

3.4.2.  Session Establishment Rate

   Definition:
      The average maximum rate at which the DUT/SUT can successfully establish sessions.

   Discussion:
      This metric is an average of maxima. Each maximum is measured in a separate sample. The Session Establishment Rate is the average of the maxima established in each individual sample. In each sample, the maximum in question is the number of sessions successfully established in continuous one-second intervals with prior sessions remaining active. This maximum is designated in the equation below as "rate in sample i". The Session Establishment Rate is calculated using the following equation (n = number of samples):

                                     n
                                    --
                                    \   rate in sample i
                                    /
                                    --
                                   i = 1
      Session Establishment Rate = ----------------------
                                            n

      In each sample, the maximum is obtained by testing to failure. This means that the Session Attempt Rate provisioned on the EA is raised progressively until a Session Attempt Failure is observed. With zero failure, 100% of the sessions introduced by the Emulated Agent are successfully established. The maximum rate is the rate achieved in the interval prior to the interval in which the failure is observed. Sessions may be IS or NS or a mix of both and will be defined in the particular test.

   Measurement Units:
      sessions per second (sps)

   Issues:
      None.

   See Also:
      Invite-initiated Session
      Non-INVITE-initiated Session
      Session Attempt Rate

3.4.3.  Session Capacity

   Definition:
      The maximum value of Standing Sessions Count achieved by the DUT/SUT during the process of steadily increasing the number of Session Attempts per unit time, before the first Session Attempt Failure occurs.

   Discussion:
      When benchmarking Session Capacity for sessions with media, it is required that these sessions be permanently established (i.e., they remain active for the duration of the test). This can be achieved by causing the EA not to send a BYE for the duration of the testing. In the signaling plane, this requirement means that the dialog lasts as long as the test lasts. In order to test Session Capacity for sessions with media, the Media Session Hold Time MUST be set to infinity so that sessions remain established for the duration of the test. If the DUT/SUT is dialog-stateful, then we expect its performance will be impacted by setting Media Session Hold Time to infinity, since the DUT/SUT will need to allocate resources to process and store the state information. The Session Capacity must be reported with the Session Attempt Rate used to reach the maximum. Since Session Attempt Rate is a zero-loss measurement, there must be zero failures to achieve the Session Capacity. The maximum is indicated at the Emulated Agent by arrival of a SIP 4xx, 5xx, or 6xx response from the DUT/SUT. Sessions may be IS or NS.

   Measurement Units:
      sessions

   Issues:
      None.

   See Also:
      Established Session
      Session Attempt Rate
      Session Attempt Failure

3.4.4.  Session Overload Capacity

   Definition:
      The maximum number of Established Sessions that can exist simultaneously on the DUT/SUT until it stops responding to Session Attempts.

   Discussion:
      Session Overload Capacity is measured after the Session Capacity is measured. The Session Overload Capacity is greater than or equal to the Session Capacity.
      When benchmarking Session Overload Capacity, continue to offer Session Attempts to the DUT/SUT after the first Session Attempt Failure occurs and measure Established Sessions until there is no SIP message response for the duration of the Establishment Threshold Time. It is worth noting that the Session Establishment Performance is expected to decrease after the first Session Attempt Failure occurs.

   Measurement Units:
      Sessions

   Issues:
      None.

   See Also:
      Overload
      Session Capacity
      Session Attempt Failure

3.4.5.  Session Establishment Performance

   Definition:
      The percent of Session Attempts that become Established Sessions over the duration of a benchmarking test.

   Discussion:
      Session Establishment Performance is a benchmark to indicate session establishment success for the duration of a test. The duration for measuring this benchmark is to be specified in the Methodology. The Session Duration SHOULD be configured to infinity so that sessions remain established for the entire test duration.

      Session Establishment Performance is calculated as shown in the following equation and reported as a percentage:

      Session Establishment    Total Established Sessions
      Performance           =  --------------------------
                                 Total Session Attempts

      Session Establishment Performance may be monitored in real time during a benchmarking test. However, the reported benchmark MUST be based on the total measurements for the test duration.

   Measurement Units:
      Percent (%)

   Issues:
      None.

   See Also:
      Established Session
      Session Attempt

3.4.6.  Session Attempt Delay

   Definition:
      The average time measured at the Emulated Agent for a Session Attempt to result in an Established Session.

   Discussion:
      Time is measured from when the EA sends the first INVITE for the Call-ID in the case of an IS. Time is measured from when the EA sends the first non-INVITE message in the case of an NS.
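
      The timing rule above, together with the Session Establishment Performance ratio from Section 3.4.5, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the AttemptRecord layout, the function names, and the sample timestamps are hypothetical and are not defined by this document, and a real Tester would fill the records from its own message log.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence


@dataclass
class AttemptRecord:
    """One Session Attempt as seen by the Emulated Agent (hypothetical layout)."""
    first_request_sent: float     # seconds: first INVITE (IS) or first non-INVITE request (NS)
    ok_received: Optional[float]  # seconds: arrival of the 2xx reply; None if never established


def session_attempt_delay(records: Sequence[AttemptRecord]) -> float:
    """Average delay, in seconds, over the attempts that became Established Sessions."""
    delays = [r.ok_received - r.first_request_sent
              for r in records if r.ok_received is not None]
    if not delays:
        raise ValueError("no Established Sessions in this test run")
    return mean(delays)


def session_establishment_performance(records: Sequence[AttemptRecord]) -> float:
    """Established Sessions as a percentage of all Session Attempts."""
    if not records:
        raise ValueError("no Session Attempts in this test run")
    established = sum(1 for r in records if r.ok_received is not None)
    return 100.0 * established / len(records)


# Illustrative data only: two attempts succeed, one never receives a 2xx reply.
run = [AttemptRecord(0.0, 0.25), AttemptRecord(1.0, 1.75), AttemptRecord(2.0, None)]
print(session_attempt_delay(run))
print(session_establishment_performance(run))
```

      Failed attempts are excluded from the delay average but still count in the denominator of the performance percentage, which matches the two definitions as written.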
      Session Attempt Delay MUST be measured for every established session to calculate the average. Session Attempt Delay MUST be measured at the Session Establishment Rate.

   Measurement Units:
      Seconds

   Issues:
      None.

   See Also:
      Session Establishment Rate

3.4.7.  IM Rate

   Definition:
      Maximum number of IM messages completed by the DUT/SUT.

   Discussion:
      For a UAS, the definition of success is the receipt of an IM request and the subsequent sending of a final response.

      For a UAC, the definition of success is the sending of an IM request and the receipt of a final response to it.

      For a proxy, the definition of success is as follows:

      A. the number of IM requests it receives from the upstream client MUST be equal to the number of IM requests it sent to the downstream server; and

      B. the number of IM responses it receives from the downstream server MUST be equal to the number of IM requests sent to the downstream server; and

      C. the number of IM responses it sends to the upstream client MUST be equal to the number of IM requests it received from the upstream client.

   Measurement Units:
      IM messages per second

   Issues:
      None.

   See Also:

4.  IANA Considerations

   This document requires no IANA considerations.

5.  Security Considerations

   Documents of this type do not directly affect the security of Internet or corporate networks as long as benchmarking is not performed on devices or systems connected to production networks. Security threats and how to counter these in SIP and the media layer are discussed in RFC 3261 [RFC3261], RFC 3550 [RFC3550], RFC 3711 [RFC3711] and various other drafts. This document attempts to formalize a set of common terminology for benchmarking SIP networks. Packets with unintended and/or unauthorized DSCP or IP precedence values may present security issues. Determining the security consequences of such packets is out of scope for this document.

6.  Acknowledgments

   The authors would like to thank Keith Drage, Cullen Jennings, Daryl Malas, Al Morton, and Henning Schulzrinne for invaluable contributions to this document.

7.  References

7.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2544]  Bradner, S. and J. McQuaid, "Benchmarking Methodology for Network Interconnect Devices", RFC 2544, March 1999.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.

   [I-D.ietf-bmwg-sip-bench-meth]  Poretsky, S., Gurbani, V., and C. Davids, "Methodology for Benchmarking SIP Networking Devices", draft-ietf-bmwg-sip-bench-meth-02 (work in progress), July 2010.

7.2.  Informational References

   [RFC2285]  Mandeville, R., "Benchmarking Terminology for LAN Switching Devices", RFC 2285, February 1998.

   [RFC1242]  Bradner, S., "Benchmarking terminology for network interconnection devices", RFC 1242, July 1991.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC 3711, March 2004.

   [I-D.ietf-sipping-overload-design]  Hilt, V., Noel, E., Shen, C., and A. Abdelal, "Design Considerations for Session Initiation Protocol (SIP) Overload Control", draft-ietf-sipping-overload-design-02 (work in progress), July 2009.

   [I-D.ietf-sipping-overload-reqs]  Rosenberg, J., "Requirements for Management of Overload in the Session Initiation Protocol", draft-ietf-sipping-overload-reqs-05 (work in progress), July 2008.

Appendix A.  White Box Benchmarking Terminology

Session Attempt Arrival Rate

   Definition:
      The number of Session Attempts received at the DUT/SUT over a specified time period.

   Discussion:
      Session Attempts are indicated by the arrival of SIP INVITE or SUBSCRIBE/NOTIFY messages. The Session Attempt Arrival Rate distribution can be any model selected by the user of this document. It is important when comparing benchmarks of different devices that the same distribution model was used. Common distributions are expected to be Uniform and Poisson.

   Measurement Units:
      Session attempts/sec

   Issues:
      None.

   See Also:
      Session Attempt

Authors' Addresses

   Scott Poretsky
   Allot Communications
   300 TradeCenter, Suite 4680
   Woburn, MA 08101
   USA

   Phone: +1 508 309 2179
   Email: sporetsky@allot.com

   Vijay K. Gurbani
   Bell Laboratories, Alcatel-Lucent
   1960 Lucent Lane
   Rm 9C-533
   Naperville, IL 60566
   USA

   Phone: +1 630 224 0216
   Email: vkg@alcatel-lucent.com

   Carol Davids
   Illinois Institute of Technology
   201 East Loop Road
   Wheaton, IL 60187
   USA

   Phone: +1 630 682 6024
   Email: davids@iit.edu