Network Working Group                                 Gabor Feher, BUTE
INTERNET-DRAFT                                   Istvan Cselenyi, TRAB
Expiration Date: January 2002                        Andras Korn, BUTE

                                                              July 2001

  Benchmarking Methodology for Routers Supporting Resource Reservation

1. Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC 2026.
   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft shadow directories can be accessed at
   http://www.ietf.org/shadow.html

   This memo provides information for the Internet community. This
   memo does not specify an Internet standard of any kind.
   Distribution of this memo is unlimited.

2. Table of contents

   1. Status of this Memo
   2. Table of contents
   3. Abstract
   4. Introduction
   5. Existing definitions
   6. Methodology
      6.1 Evaluating the Results
      6.2 Test Setup
         6.2.1 Testing Unicast Resource Reservation Sessions
         6.2.2 Testing Multicast Resource Reservation Sessions
         6.2.3 Signaling Flow
         6.2.4 Signaling Message Verification
      6.3 Scalability Tests
         6.3.1 Maximum Signaling Message Burst Size
         6.3.2 Maximum Signaling Load
         6.3.3 Maximum Session Load
      6.4 Benchmarking Tests
         6.4.1 Performing the Benchmarking Measurements
   7. Acknowledgement
   8. References
   9. Authors' Addresses

3. Abstract

   The purpose of this document is to define a benchmarking
   methodology for measuring performance metrics of IP routers that
   support resource reservation signaling. Apart from defining and
   discussing these tests, this document also specifies formats for
   reporting the benchmarking results.

4. Introduction

   The IntServ over DiffServ framework [1] outlines a heterogeneous
   Quality of Service (QoS) architecture for multi-domain Internet
   services. Signaling-based resource reservation (e.g. via RSVP [2])
   is an integral part of that model. While this significantly
   lightens the load on most of the core routers, the performance of
   border routers that handle the QoS signaling is still crucial.
   Therefore, network operators who plan to deploy this model should
   scrutinize the scalability limitations of reservation-capable
   routers and the impact of signaling on the forwarding performance
   of those routers.

   An objective way to quantify the scalability constraints of QoS
   signaling is to perform measurements on routers that are capable of
   resource reservation.
   This document defines a specific set of tests that vendors or
   network operators can use to measure and report the signaling
   performance characteristics of router devices that support resource
   reservation protocols. The results of these tests provide
   comparable data on different products, supporting the decision
   process before purchase. Moreover, these measurements provide input
   characteristics for the dimensioning of a network in which
   resources are provisioned dynamically by signaling. Finally, these
   tests are applicable to characterizing the impact of control plane
   signaling on the forwarding performance of routers.

   This benchmarking methodology document is based on the knowledge
   gained by examination of (and experimentation with) several very
   different resource reservation protocols: RSVP [2], Boomerang [3],
   YESSIR [4], ST2+ [5], SDP [6], Ticket [7] and Load Control [8].
   Nevertheless, this document aspires to use terms that are valid in
   general and not restricted to these protocols.

5. Existing definitions

   A previous document, "Benchmarking Terminology for Routers
   Supporting Resource Reservation" [9], defines performance metrics
   and other terms that are used in this document. To understand the
   test methodologies defined here, that terminology document must be
   consulted first.

6. Methodology

6.1 Evaluating the Results

   RFC 2544 [10] describes considerations regarding the implementation
   and evaluation of benchmarking tests, and these are certainly valid
   for this test suite as well. In particular, the authors intended
   the described tests to be easy to implement with commercially
   available measurement instruments and devices. Simple test scripts
   and benchmarking utilities for Linux are publicly available from
   the Boomerang homepage [11].

   During the benchmarking tests, care should be taken in selecting
   the proper set of tests for a specific router device, since not all
   of the tests are applicable to a particular Device Under Test
   (DUT).

   Finally, the selection of the relevant measurement results and
   their evaluation require experience and must be done with an
   understanding of generally accepted testing practices regarding
   repeatability, variance and statistical significance of small
   numbers of trials.

6.2 Test Setup

   The ideal way to perform the measurements is to connect a passive
   tester device (or, in short, passive tester) to all network
   interfaces of the DUT, enabling the tester to capture all signaling
   and data traffic that enters or leaves the DUT. Based on the
   captured data packets and signaling messages, along with the proper
   time stamps, the investigated performance metrics can be computed.
   In addition to the passive tester, there are signaling and data
   traffic end-points that are responsible for generating and
   terminating the required signaling and data flows going through the
   DUT. These flows are used to generate router load in the DUT, and
   the measurements are also performed on them. This scenario is
   illustrated in Figure 1.

   The best solution is probably to connect the tester to the network
   interfaces of the DUT via network traffic repeater devices (e.g.
   hubs). These repeaters introduce only a very small delay to passing
   packets, and therefore their effect on the measurements is
   insignificant.

                                +------------+
                                |  Passive   |
                           +--->|   tester   |<---+
                           |    +------------+    |
                           |                      |
      +---------------+    |    +------------+    |    +---------------+
      | Signaling and |    |    |            |    |    | Signaling and |
      | data traffic  |----+--->|    DUT     |----+--->| data traffic  |
      |   end-point   |         |            |         |   end-point   |
      +---------------+         +------------+         +---------------+

                                   Figure 1
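   As an illustration, a performance metric such as per-message
   processing delay can be derived by pairing each signaling message
   captured on the DUT's input interfaces with the corresponding
   message captured on its output interfaces. The following minimal
   Python sketch computes such delays; the (timestamp, message id)
   record format is an assumption made for illustration only and is
   not part of this methodology.

      # Sketch: derive per-message processing delays from the ingress
      # and egress captures of a passive tester.
      def processing_delays(ingress, egress):
          """Match messages by id and return the per-message delays."""
          sent = {msg_id: ts for ts, msg_id in ingress}
          return [ts - sent[msg_id]
                  for ts, msg_id in egress if msg_id in sent]

      # Example captures: (capture time in seconds, message id).
      ingress = [(0.000, 1), (0.010, 2), (0.020, 3)]
      egress  = [(0.004, 1), (0.016, 2), (0.031, 3)]
      print(processing_delays(ingress, egress))
      # approximately [0.004, 0.006, 0.011]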
   Tester devices do not have to be passive during the measurement;
   they can also generate the signaling and data flows themselves. In
   this way, the signaling and data traffic end-point and the traffic
   capturing device can be combined into a single tester device,
   called an active tester. In this case, the first device, the
   signaling and traffic initiator tester, drives the input network
   interfaces of the DUT, while the second one, the signaling and
   traffic terminator tester, is connected to the output network
   interfaces of the tested device and captures the signaling messages
   and data packets leaving the DUT. Figure 2 shows this scenario.

      +---------------+      +-----------+      +---------------+
      |               |      |           |      |               |
      | Active tester |----->|    DUT    |----->| Active tester |
      |               |      |           |      |               |
      +---------------+      +-----------+      +---------------+

                                Figure 2

   In this scenario, the performance metrics are calculated from the
   log of initiated packets and their initiation times in the first
   active tester and the log of captured packets and their capture
   times in the second active tester. Obviously, the measurements are
   worthless if the two testers are not clock-synchronized, since the
   difference between the packet initiation times and the packet
   capture times is biased by the clock skew of the testers. For this
   reason, the clocks of the testers must be synchronized before the
   measurements are performed. Scalability tests, however, do not
   depend on clock synchronization and can therefore be performed
   without any preparation of the testers.

   It is also possible to use only one active tester, which acts as
   the initiator and the terminator of the signaling and traffic flows
   at the same time. Although this avoids the clock synchronization
   problem, the tester must be powerful enough to generate and capture
   all the test flows required by the measurements.

   Provided that the clocks are properly synchronized where necessary,
   each test configuration is suitable for the measurements. For this
   reason, we do not define different test methodologies for each test
   scenario. Instead, we use the terms "initiator tester" and
   "terminator tester", which have their equivalent appliances in each
   test configuration.

   The initiator tester is the device that generates the signaling and
   data flows, while the terminator tester is the device that
   terminates them. In addition, the performance metric measurements
   are also performed by the tester(s). Evidently, in the
   configuration where there is only one active tester, the initiator
   tester and the terminator tester are the same appliance.

6.2.1 Testing Unicast Resource Reservation Sessions

   Testing unicast resource reservation sessions requires that the
   initiator tester is connected to one of the network interfaces of
   the DUT and the terminator tester is connected to a different
   network interface of the tested device.
   During the benchmarking tests, the initiator tester must use
   unicast addresses for the data traffic flows, and the resource
   reservation requests must refer to unicast resource reservation
   sessions. In order to be able to compute the performance metrics,
   all data packets and signaling messages transmitted by the DUT must
   be perceivable by the tester.

6.2.2 Testing Multicast Resource Reservation Sessions

   Testing multicast resource reservation sessions requires the
   initiator tester to be connected to more than one network interface
   of the DUT, while the terminator tester is connected to more than
   one network interface of the tested device, different from the
   previous ones.

   Furthermore, during the measurements, the data traffic flows
   originating from the initiator tester must be sent to multicast
   addresses, and the reservation sessions must refer to one or more
   of the multicast flows. Of course, just as in the case of unicast
   resource reservation sessions, all data packets and signaling
   messages transmitted by the DUT must be perceivable by the tester.

   Since there are protocols supporting more than one resource
   reservation scheme for multicast reservations (e.g. RSVP SE/FF/WF),
   and in view of the fact that the number of incoming and outgoing
   network interface combinations of the DUT might be almost
   countless, the benchmarking tests described here do not require
   measuring every imaginable setup. Still, routers supporting
   multicast resource reservations must be tested against the
   performance metrics and scalability limits in at least one
   multicast scenario. The suggested multicast test configuration
   consists of a multicast group with four signaling end-points,
   including one traffic originator and three traffic destinations
   residing on different network interfaces of the DUT.

   Benchmarking test reports taken on DUTs supporting multicast
   resource reservation sessions always have to contain the proper
   multicast scenario description.

6.2.3 Signaling Flow

   This document often refers to signaling flows. A signaling flow is
   a sequence of signaling messages.

   For the measurements defined in this document there are two types
   of signaling flows. First, there is a signaling flow that is
   constructed from signaling primitives of the same type. Second,
   there is a signaling flow that is constructed from signaling
   primitive pairs. Signaling primitive pairs are needed in situations
   where one of the signaling primitives alters the states of the DUT,
   but the test demands constant DUT conditions during the test. In
   this case, to avoid the effect of the state modification, the
   second signaling primitive should restore the modified states in
   the DUT. A typical example of the second type of signaling flow is
   a flow of alternating reservation set-up and tear-down messages.

   Moreover, the signaling messages should be equally spaced in time
   when they form a signaling flow. This is mandatory in order to
   obtain measurements that can be repeated later. Since modern
   resource reservation protocols are designed to avoid message
   synchronization, equally spaced signaling messages are not
   unrealistic in real life.
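   Equal spacing can be achieved by scheduling each message against an
   absolute deadline, so that per-message jitter does not accumulate.
   A minimal Python sketch follows; the send_signaling_message()
   callback is a hypothetical helper standing in for whatever emits
   one signaling primitive (or one primitive pair) towards the DUT.

      import time

      def send_flow(period_s, count, send_signaling_message):
          """Send 'count' messages spaced 'period_s' seconds apart."""
          deadline = time.monotonic()
          for seq in range(count):
              send_signaling_message(seq)
              # Absolute deadlines keep the flow equally spaced even
              # if an individual send is slightly delayed.
              deadline += period_s
              time.sleep(max(0.0, deadline - time.monotonic()))

      # Example: a flow of 50 messages at 10 messages per second.
      send_flow(0.1, 50, lambda seq: print("message", seq))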
   A signaling flow is characterized by the type of the signaling
   primitive or signaling primitive pair, along with the period of the
   signaling messages.

6.2.4 Signaling Message Verification

   Although conformance testing of resource reservation protocols is
   beyond the scope of this document, defective signaling message
   processing can be expected in an overloaded router. Therefore,
   during the benchmarking tests, when signaling messages are
   processed in the DUT, the terminator device must validate whether
   the messages fully conform to the message format of the resource
   reservation protocol specification and whether they are the
   expected signaling messages in the given situation. If any of the
   messages violate the protocol specification, then the benchmarking
   test report must describe the failure.

   Verifying data traffic packets is not required, since the signaling
   performance benchmarking of reservation-capable routers should not
   deal with data traffic. For that purpose there are other
   benchmarking methodologies that verify data traffic during the
   measurements, such as the one described in RFC 2544.

6.3 Scalability Tests

   Scalability tests are defined to explore the scalability limits of
   a reservation-capable router. This investigation focuses on the
   scalability limits related only to signaling message handling;
   examination of the data forwarding engine is therefore out of the
   scope of this document.

6.3.1 Maximum Signaling Message Burst Size

   Objective:
   Determine the maximum signaling burst size, which is the number of
   signaling messages in a signaling burst that the DUT is able to
   handle without signaling loss.

   Procedure:
   1. Select a signaling primitive or a signaling primitive pair and
   construct a signaling flow. The signaling messages should follow
   each other back-to-back in the flow, and after "n" messages the
   flow should be terminated. In the first test sequence the number
   "n" should be set to one.

   Additionally, all the signaling messages in the signaling flow must
   conform to the resource reservation protocol definition and must be
   parameterized in a way that avoids signaling message processing
   errors in the DUT.

   2. Send the signaling flow to the DUT and count the signaling
   messages received by the terminator tester.

   3. When the number of sent signaling messages ("n") equals the
   number of received messages, the number of messages forming the
   signaling flow ("n") should be increased by one and the test
   sequence repeated. However, if the receiver receives fewer
   signaling messages than were sent, the DUT is beyond its
   scalability limit. The measured scalability limit for the maximum
   signaling message burst size is the length of the signaling flow in
   the previous test sequence ("n"-1).

   In order to avoid transient test failures, the whole test must be
   repeated at least 30 times, and the report should indicate the
   median of the measured maximum signaling message burst size values
   as the result of the test. Between the test runs, the DUT should be
   reset to its initial state.
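   The search in steps 1-3 can be sketched as follows in Python, with
   the 30-fold repetition and the median taken over the individual
   results. The helpers send_burst(), count_received() and reset_dut()
   are assumptions standing in for the tester's actual burst
   generation, message counting and DUT reset facilities.

      import statistics

      def max_burst_size(send_burst, count_received):
          n = 1
          while True:
              send_burst(n)                  # n back-to-back messages
              if count_received() < n:       # loss: limit exceeded
                  return n - 1               # previous sequence's length
              n += 1

      def burst_size_result(send_burst, count_received, reset_dut):
          results = []
          for _ in range(30):                # at least 30 repetitions
              results.append(max_burst_size(send_burst, count_received))
              reset_dut()                    # restore the initial state
          return statistics.median(results)  # the reported value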
   There are signaling primitives, such as signaling messages
   indicating errors, that are not suitable for this kind of
   scalability test. However, each signaling primitive that is
   suitable for the test should be investigated.

   Reporting format:
   The report should indicate the type of the signaling primitive or
   signaling primitive pair and the determined maximum signaling
   message burst size.

   Note:
   In the case of routers supporting multicast resource reservation
   sessions, the signaling burst can also be constructed by sending
   signaling messages to multiple network interfaces of the DUT at the
   same time.

6.3.2 Maximum Signaling Load

   Objective:
   Determine the maximum signaling load, which is the maximum number
   of signaling messages within a time unit that the DUT is able to
   handle without signaling loss.

   Procedure:
   1. Select a signaling primitive or a signaling primitive pair and
   construct a signaling flow. The period of the signaling flow should
   be adjusted so that exactly "s" signaling messages arrive within
   one second. In the first test sequence the number "s" should be set
   to one (i.e. 1 message per second).

   Additionally, all the signaling messages in the signaling flow must
   conform to the resource reservation protocol definition and must be
   parameterized in a way that avoids signaling message processing
   errors in the DUT.

   2. Send the signaling flow to the DUT for at least one minute, and
   count the signaling messages received by the terminator tester.

   3. When the number of sent signaling messages ("s" times the
   duration of the signaling flow) equals the number of received
   messages, the signaling flow period should be decreased so that one
   more signaling message fits into a one-second interval of the
   signaling flow ("s" should be increased by one). However, if the
   receiver receives fewer signaling messages than were sent, the DUT
   is beyond its scalability limit. The measured scalability limit for
   the maximum signaling load is the number of signaling messages
   fitting into one second of the signaling flow in the previous test
   sequence ("s"-1).

   In order to avoid transient test failures, the whole test must be
   repeated at least 30 times, and the report should indicate the
   median of the measured maximum signaling load values as the result
   of the test. Between the test runs, the DUT should be reset to its
   initial state.

   In this test, too, there are signaling primitives that are not
   suitable for this kind of scalability test. However, each signaling
   primitive that is suitable for the test should be investigated,
   just as in the case of the maximum signaling burst size test.

   Reporting format:
   The report should indicate the type of the signaling primitive or
   signaling primitive pair and the determined maximum signaling load
   value.
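   The ramp in steps 1-3 differs from the burst-size search only in
   that it adjusts a rate rather than a burst length. A minimal Python
   sketch, assuming a hypothetical send_paced_flow(rate, duration_s)
   that emits an equally spaced flow towards the DUT and a
   count_received() that returns the terminator's message count:

      def max_signaling_load(send_paced_flow, count_received,
                             duration_s=60):
          s = 1                               # messages per second
          while True:
              send_paced_flow(s, duration_s)  # at least one minute
              if count_received() < s * duration_s:
                  return s - 1                # previous sequence's rate
              s += 1                          # one more message/second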
6.3.3 Maximum Session Load

   Objective:
   Determine the maximum session load, which is the maximum number of
   resource reservation sessions that can be maintained simultaneously
   in a reservation-capable router. The maximum number of sessions
   depends on two architectural components of the DUT. First, the DUT
   must have enough memory space to store the attributes of the
   different resource reservation sessions. Second, the DUT has to be
   powerful enough to maintain all the reservation sessions if they
   require actions during their lifetime.

   In the case of hard-state protocols we cannot speak of reservation
   session maintenance; in this situation the available memory space
   is the only limit on the session number. Moreover, there are also
   resource reservation protocols that handle only aggregates of
   reservation sessions (e.g. Load Control [8]) and do not distinguish
   the separate traffic flows referring to reserved resources. Of
   course, in this situation there is no session maintenance either,
   since there are no reservation sessions, and the memory allocation
   for the aggregates is limited. In this latter case, the maximum
   session load is defined to be unlimited and the test can be
   skipped.

   Corresponding to the dual limits of the measurement, the
   benchmarking procedure is separated into two tests. The first test
   investigates the session number limit due to the memory space,
   while the second test explores the reservation session maintenance
   capability of the DUT.

   The first test applies to every resource reservation protocol that
   stores reservation sessions separately and not only as an aggregate
   of them. Resource reservation protocols that are capable of session
   aggregation, but still have the capability to handle separate
   sessions (e.g. Boomerang [3]), are also subject to this test.

   Procedure:
   1. Set up a reservation session in the reservation-capable router
   by sending the appropriate signaling messages to the DUT.

   2. Establish one more reservation session in the DUT using the
   appropriate signaling messages. In the case of soft-state
   protocols, all the reservation sessions existing in the DUT must be
   maintained using refresh messages.

   3. Repeat step 2 until the router signals that there is not enough
   memory space to establish the new reservation session. At this
   point the test is finished, and the maximum memory capacity
   available to store the sessions has been reached.

   Note:
   Not all resource reservation protocols can signal the overrun of
   the maximum memory capacity limit directly. However, certain
   behavior of the router may also indicate the memory overrun.

   The second test applies only to those reservation-capable routers
   that run reservation session maintenance mechanisms to refresh
   internal states belonging to reservation sessions. Here, we
   investigate whether the DUT is able to cope with the refresh
   signaling message handling, which also shows its capability to
   refresh the internally stored reservation sessions.

   Procedure:
   1. Set up "n" reservation sessions in the reservation-capable
   router by sending the appropriate signaling messages to the DUT. In
   the first test sequence the number "n" should be set to one.
   Besides generating the reservation sessions, the initiator tester
   must also take care of the reservation session refreshes.

   2. Capture the refresh signaling messages leaving the DUT for a
   specified amount of time ("T") while still maintaining the
   established reservations with refresh signaling messages. Time "T"
   must be at least as long as the reservation time-out specified by
   the protocol.

   3. Check whether each reservation session was refreshed during the
   refresh period examined in step 2. The proof of a session refresh
   is a departing refresh signaling message referring to the
   corresponding reservation session. If all sessions that were set up
   in step 1 are refreshed during step 2, then repeat the test
   sequence, increasing the number of reservations by one ("n"+1).
   However, if any of the reservations was dropped by the DUT, then
   the test sequence should be stopped, and the determined maximum
   session load is the number of resource reservation sessions
   maintained successfully in the previous test sequence ("n"-1).
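   A minimal Python sketch of steps 1-3 of this second test, assuming
   hypothetical helpers: establish_session() sets up one new
   reservation (while the tester keeps refreshing all established
   ones) and returns its session id, and capture_refreshes(T) returns
   the set of session ids for which a refresh message left the DUT
   within "T" seconds.

      def max_session_load(establish_session, capture_refreshes, T):
          sessions = set()
          while True:
              sessions.add(establish_session())  # one more reservation
              refreshed = capture_refreshes(T)   # observe for time T
              if not sessions <= refreshed:      # a session was dropped
                  return len(sessions) - 1       # previous sequence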
   In order to avoid transient test failures, the whole test must be
   repeated at least 30 times, and the report should indicate the
   median of the measured maximum session load values as the result of
   the test. Between the test runs, the DUT should be reset to its
   initial state.

   Reporting format:
   The report should indicate the determined maximum session load
   value, which is the lower of the two test results.

   Note:
   When the number of reserved sessions grows beyond a number that
   counts as very high under the given technological conditions, the
   test can be canceled and the report can state that the resource
   reservation protocol implementation supports a number of
   reservation sessions over that limit (e.g. "over 100,000
   sessions").

   Also note that testing the DUT in multicast and unicast scenarios
   may result in different maximum session load values.

6.4 Benchmarking Tests

   Benchmarking tests are defined to measure the QoS signaling related
   performance metrics of a reservation-capable router device.

   Since the objective of the benchmarking is to characterize routers
   performing resource reservation in real-life situations, during the
   tests the DUT must not run into the scalability limits determined
   by the previous tests.

   Each performance metric is measured with the DUT under different
   router load conditions. The router load is generated and
   characterized using combinations of independent load types:

   a. Signaling load
   b. Session load
   c. Premium traffic load
   d. Best-effort traffic load

   The initiator tester device generates the signaling load on the DUT
   by sending a signaling flow to the terminator tester. This
   signaling flow is constructed from a specific signaling primitive
   or signaling primitive pair and has the appropriate period
   parameter.

   The session load is generated by the signaling end-points setting
   up resource reservation sessions in the DUT via signaling. In the
   case of soft-state protocols, the initiator tester device must also
   maintain the reservation sessions with periodic refresh signaling
   messages.

   The initiator tester device generates the premium traffic load by
   sending a data traffic flow to the terminator tester across the
   DUT. This traffic flow should have dedicated resources in the DUT,
   set up previously using signaling messages. The traffic must
   consist of equally spaced and equally sized data packets. Although
   any transfer protocol is suitable for traffic generation, it is
   highly recommended to use UDP packets, since such a data flow is
   totally controllable, unlike TCP, which uses a congestion avoidance
   mechanism. The premium traffic must be characterized by its traffic
   parameters: the data packet size in octets, the calculated
   bandwidth of the stream in kbps, and the transfer protocol type.
   The data packet size should include both the payload and the header
   of the IP packet.
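   A premium traffic flow of this kind can be generated, for example,
   with equally spaced UDP datagrams. In the following Python sketch
   the destination address, the port and the 1024-octet IP packet size
   are illustrative assumptions; note that since the packet size
   counts the IP header, the UDP payload is 28 octets smaller (20
   octets of IPv4 header plus 8 octets of UDP header).

      import socket
      import time

      def send_traffic_flow(dst=("192.0.2.2", 5001), pkt_size=1024,
                            rate_pps=1000, duration_s=10):
          sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
          payload = b"\x00" * (pkt_size - 28)  # IP+UDP headers: 28
          period = 1.0 / rate_pps
          deadline = time.monotonic()
          for _ in range(rate_pps * duration_s):
              sock.sendto(payload, dst)        # equally sized packets
              deadline += period               # equally spaced packets
              time.sleep(max(0.0, deadline - time.monotonic()))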
   The initiator tester device generates the best-effort traffic load
   by sending a data traffic flow that refers to no resource
   reservation session to the terminator tester across the DUT. All
   other attributes of this traffic flow must meet the conditions
   described previously for the premium traffic load.

   Note that these four load types by their nature influence each
   other, which may spoil the measurements. Therefore, in order to
   obtain accurate results, these cross-effects must be minimized
   during the benchmarking tests. The signaling load can interfere
   with the session load when certain signaling messages alter the
   number of reservation sessions in the DUT. To cancel this
   influence, the signaling flow should contain signaling message
   pairs in which the two messages have opposite effects, restoring
   the changes caused in the DUT. On the other hand, in the case of
   soft-state protocols, sessions must be refreshed by periodically
   sent signaling messages. Although refresh messages are used to
   maintain the reservation sessions, they still count as signaling
   messages. Furthermore, signaling messages are realized as data
   packets; as such, signaling messages must be taken into account in
   the traffic flow calculation as well.

6.4.1 Performing the Benchmarking Measurements

   Objective:
   The goal is to take measurements on the DUT running a resource
   reservation protocol implementation under different load
   conditions. The load on the DUT is always a combination of the four
   load components described before.

   Procedure:
   The procedure is to load the router with each load component at a
   desired level and measure the investigated performance metrics. The
   load condition on the DUT should not change during the test. Once
   the measurement is complete, repeat the test with different load
   distributions.

   During the test sequences, in order to prevent transient flow
   behavior from influencing the measurements, the measurements should
   begin only after a delay of at least time "T" following the setup
   of the common load on the DUT. The value of "T" depends on the
   parameters of the load components and on the resource reservation
   protocol implementation but, as a rule of thumb, it should be long
   enough for at least 10 packets from the traffic flows and 10
   signaling messages from the signaling flow to pass through the DUT,
   and, in the case of soft-state protocols, for at least one refresh
   period to expire.

   When measuring the performance metrics in a practical load setup,
   not just one but 100 measurement samples should be collected.
   Normally, the empirical distribution function of the tests is
   similar to the curve of a Gaussian distribution, and therefore the
   mode and the median are in the same location. In such a case, the
   result of the test sequence is the median of the samples. In the
   case of differently shaped empirical distribution functions, the
   curve must be analyzed further and the result should describe the
   curve well enough.

   In order to avoid transient test run failures that may invalidate
   the results of the entire test, the whole test must be repeated at
   least 10 times, and the report should indicate the median of the
   measured values, filtering out the extreme results. Moreover, after
   each test run the DUT should be reset to its initial state.
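   A minimal Python sketch of this sample evaluation, where
   measure_metric() is an assumed helper returning one sample of the
   investigated performance metric: it reports the median of 100
   samples and flags sample sets whose mean and median diverge, which
   hints at a non-Gaussian shape requiring further analysis (the 10%
   threshold is an illustrative assumption, not a requirement).

      import statistics

      def evaluate_samples(measure_metric, count=100):
          data = [measure_metric() for _ in range(count)]
          med = statistics.median(data)
          mean = statistics.fmean(data)
          spread = (max(data) - min(data)) or 1.0
          # For a roughly Gaussian sample the mean and the median
          # coincide; a large gap suggests a skewed distribution.
          needs_analysis = abs(mean - med) > 0.1 * spread
          return med, needs_analysis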
   In order to perform a complete benchmarking test, every performance
   metric must be measured using signaling flows made of every
   applicable signaling primitive or primitive pair.

   Since the test methodology is the same for all the different
   performance metric benchmarking procedures, it is also recommended
   to perform the measurements for all performance metrics at the same
   time, in one test cycle.

   At first sight, this procedure may look easy to carry out, but in
   fact there are many difficulties to overcome. The following
   guidelines may help in reducing the complexity of creating a
   conforming measurement setup.

   1. It is reasonable to define different amounts for each load
   component (load levels) before benchmarking and then measure the
   performance metrics with all possible combinations of these
   individual load levels (a sketch of this enumeration follows at the
   end of this section).

   2. The number of different load combinations depends on the number
   of load levels defined for each load component. Working with too
   many load levels is very time-consuming and therefore not
   suggested. Instead, there are proposed levels and parameters for
   each load component.

   The data traffic parameters for the traffic load components have to
   be selected from commonly used traffic parameters. It is
   recommended to choose a packet size of 54, 64, 128, 256, 1024,
   1518, 2048 or 4472 bytes (these are the same values that are used
   in RFC 2544, which introduces a methodology for benchmarking
   network interconnect devices). Additionally, the size of the
   packets should always remain below the MTU of the network segment.
   The packet rate is recommended to be one of 0, 10, 500, 1000 or
   5000 packets/s. Since the number of combinations of these traffic
   parameters is still large, the highly recommended values are 64,
   128 and 1024 bytes for the packet size and 10 and 1000 packets/s
   for the packet rate. These values adequately represent a wide range
   of traffic types common in today's Internet.

   The number of session load levels should be at least 4, and it is
   recommended to distribute them equally between 0 and the maximum
   session load value.

   The number of signaling load levels should be at least 4 as well,
   and the actual values of the signaling load are also recommended to
   be distributed equally between 0 and the maximum signaling load
   value.

   A zero load level means that the given load component is not
   involved in the router load.

   Reporting format:
   Since a full report would require a four-dimensional table (four
   load components plus the results), which is hard for a human being
   to visualize, the results are extracted into ordinary two-
   dimensional tables. Each table has two fixed load component
   quantities, and the levels of the other two load components are the
   rows and columns of the table. In this way, one set of such tables
   describes the benchmarking results for one certain type of
   signaling flow used in the generation of the signaling load.
   Naturally, each different signaling flow requires separate tables.

   Note:
   Of course, in the case of multicast resource reservation sessions,
   the number of different multicast scenario combinations multiplies
   the number of benchmarking tests as well.
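   Guideline 1 above amounts to iterating over the Cartesian product
   of the per-component load levels. A minimal Python sketch, where
   the level values follow the recommendations above and
   run_measurement() is an assumed helper that applies one load
   combination to the DUT and returns the measured performance
   metrics:

      import itertools

      signaling_levels   = [0, 25, 50, 75, 100]  # % of max sig. load
      session_levels     = [0, 25, 50, 75, 100]  # % of max session load
      premium_levels     = [0, 10, 1000]         # packets/s
      best_effort_levels = [0, 10, 1000]         # packets/s

      def run_matrix(run_measurement):
          results = {}
          for combo in itertools.product(signaling_levels,
                                         session_levels,
                                         premium_levels,
                                         best_effort_levels):
              results[combo] = run_measurement(*combo)
          return results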
7. Acknowledgement

   The authors would like to thank the following individuals for their
   help in forming this document: Norbert Vegh and Anders Bergsten
   from Telia Research AB, Sweden, and Krisztian Nemeth, Peter Vary,
   Balazs Szabo and Gabor Kovacs from the High Speed Networks
   Laboratory of the Budapest University of Technology and Economics.

8. References

   [1]  Y. Bernet, et al., "A Framework For Integrated Services
        Operation Over Diffserv Networks", Internet Draft, work in
        progress, May 2000.

   [2]  R. Braden, Ed., et al., "Resource ReSerVation Protocol (RSVP)
        - Version 1 Functional Specification", RFC 2205, September
        1997.

   [3]  J. Bergkvist, I. Cselenyi, D. Ahlard, "Boomerang - A Simple
        Resource Reservation Framework for IP", Internet Draft, work
        in progress, November 2000.

   [4]  P. Pan, H. Schulzrinne, "YESSIR: A Simple Reservation
        Mechanism for the Internet", Computer Communication Review,
        on-line version, volume 29, number 2, April 1999.

   [5]  L. Delgrossi, L. Berger, "Internet Stream Protocol Version 2
        (ST2) Protocol Specification - Version ST2+", RFC 1819,
        August 1995.

   [6]  P. White, J. Crowcroft, "A Case for Dynamic Sender-Initiated
        Reservation in the Internet", Journal on High Speed Networks,
        Special Issue on QoS Routing and Signaling, Vol 7 No 2, 1998.

   [7]  A. Eriksson, C. Gehrmann, "Robust and Secure Light-weight
        Resource Reservation for Unicast IP Traffic", International
        Workshop on Quality of Service, IWQoS'98, May 18-20, 1998.

   [8]  L. Westberg, Z. R. Turanyi, D. Partain, "Load Control of
        Real-Time Traffic, A Two-bit Resource Allocation Scheme",
        Internet Draft, work in progress, April 2000.

   [9]  G. Feher, I. Cselenyi, A. Korn, "Benchmarking Terminology for
        Routers Supporting Resource Reservation", Internet Draft,
        work in progress, July 2001.

   [10] S. Bradner, J. McQuaid, "Benchmarking Methodology for Network
        Interconnect Devices", RFC 2544, March 1999.

   [11] Boomerang Team, "Boomerang homepage - Benchmarking Tools",
        http://boomerang.ttt.bme.hu

9. Authors' Addresses

   Gabor Feher
   Budapest University of Technology and Economics (BUTE)
   Department of Telecommunications and Telematics
   Pazmany Peter Setany 1/D, H-1117, Budapest, Hungary
   Phone: +36 1 463-3110
   Email: feher@ttt-atm.ttt.bme.hu

   Istvan Cselenyi
   Telia Research AB
   Vitsandsgatan 9B
   SE 12386, Farsta, Sweden
   Phone: +46 8 713-8173
   Email: istvan.i.cselenyi@telia.se

   Andras Korn
   Budapest University of Technology and Economics (BUTE)
   Institute of Mathematics, Department of Analysis
   Egry Jozsef u. 2, H-1111 Budapest, Hungary
   Phone: +36 1 463-2475
   Email: korn@math.bme.hu