Benchmarking Working Group                     M. Konstantynowicz, Ed.
Internet-Draft                                           V. Polak, Ed.
Intended status: Informational                           Cisco Systems
Expires: May 17, 2019                                November 13, 2018

   Probabilistic Loss Ratio Search for Packet Throughput (PLRsearch)
                     draft-vpolak-bmwg-plrsearch-00

Abstract

   This document addresses challenges encountered when applying the
   methodologies described in RFC 2544 to benchmarking NFV (Network
   Function Virtualization) over an extended period of time, sometimes
   referred to as "soak testing", and more specifically to benchmarking
   software-based implementations of NFV data planes.  The packet
   throughput search approach proposed by this document assumes that
   the system under test is probabilistic in nature, not deterministic.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 17, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Motivation
   2.  Model
   3.  Poisson Distribution
   4.  Fitting Function Coefficients Distribution
   5.  Algorithm Formulas
     5.1.  Integration
     5.2.  Optimizations
   6.  Known Implementations
     6.1.  FD.io CSIT Implementation Specifics
   7.  IANA Considerations
   8.  Security Considerations
   9.  Acknowledgements
   10. Normative References
   Authors' Addresses

1.  Motivation

   Network providers are interested in the throughput a device can
   sustain.

   [RFC2544] assumes the loss ratio is given by a deterministic
   function of the offered load.  But NFV software devices are not
   deterministic (enough).  This leads deterministic algorithms (such
   as MLRsearch with a single trial) to return results which, when
   repeated, show a relatively high standard deviation, making it
   harder to tell what "the throughput" actually is.

   We need another algorithm, one which takes this indeterminism into
   account.

2.  Model

   Each algorithm searches for an answer to a precisely formulated
   question.
   When the question involves indeterministic systems, it has to
   specify probabilities (or prior distributions) which are tied to a
   specific probabilistic model.  Different models will have a
   different number (and meaning) of parameters.  Complicated (but
   more realistic) models have many parameters, and the math involved
   can be very complicated.  It is better to start with a simpler
   probabilistic model, and only change it when the output of the
   simpler algorithm is not stable or useful enough.

   TODO: Refer to packet forwarding terminology, such as "offered
   load" and "loss ratio".

   TODO: Mention that no packet duplication is expected (or is
   filtered out).

   TODO: Define critical load and critical region earlier.

   This document is focused on algorithms related to packet loss count
   only.  No latency (or other information) is taken into account.
   For simplicity, only one type of measurement is considered:
   dynamically computed offered load, constant within a trial
   measurement of predetermined trial duration.

   Also, running longer trials (in some situations) could be more
   efficient, but in order to perform trials at multiple offered loads
   within the critical region, trial durations should be kept as short
   as possible.

3.  Poisson Distribution

   TODO: Give a link to more officially published literature about the
   Poisson distribution.

   Note 1: The algorithm assumes that the packet traffic generator
   detects duplicate packets on receive, and reports this as an error.

   Note 2: The binomial distribution is a better fit than the Poisson
   distribution (acknowledging that the number of packets lost cannot
   be higher than the number of packets offered), but the difference
   tends to be relevant only at loads far above the critical region,
   so using the Poisson distribution helps the algorithm focus on the
   critical region better.
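   The per-trial likelihood under the Poisson assumption can be
   sketched as follows.  This is a minimal illustration, not part of
   any normative algorithm; the function name and signature are
   hypothetical.

```python
import math

def log_poisson_likelihood(loss_count, loss_rate, duration):
    """Log of the Poisson probability of observing loss_count lost
    packets in a trial of the given duration, when the model predicts
    an average loss rate (in packets per second) of loss_rate."""
    mu = loss_rate * duration  # expected number of lost packets
    # Poisson PMF is exp(-mu) * mu**k / k!; computed in log domain
    # to avoid underflow for large mu or loss_count.
    return loss_count * math.log(mu) - mu - math.lgamma(loss_count + 1)
```

   Working in the log domain from the start matters here, as Section 5
   notes: for loads within the critical region the raw probabilities
   are often too small to represent directly.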
   When comparing different offered loads, the average loss per second
   is assumed to increase, but the (deterministic) function from
   offered load to average loss rate is otherwise unknown.

   Given a loss target (configurable, by default one packet lost per
   second), there is an unknown offered load at which the average is
   exactly that.  We call that the "critical load".  If the critical
   load seems higher than the maximum offerable load, we should use
   the maximum offerable load to make the search output more stable.

   Of course, there are a great many increasing functions.  The
   offered load has to be chosen for each trial, and the computed
   posterior distribution of the critical load can change with each
   trial result.

   To make the space of possible functions more tractable, some other
   simplifying assumption is needed.  As the algorithm will be
   examining (also) loads close to the critical load, a linear
   approximation to the function (TODO: name the function) in the
   critical region is important.  But as the search algorithm needs to
   evaluate the function also far away from the critical region, the
   approximate function has to be well-behaved for every positive
   offered load; specifically, it cannot predict a non-positive packet
   loss rate.

   Within this document, "fitting function" is the name for such a
   well-behaved function which approximates the unknown function in
   the critical region.

   Results from trials far from the critical region are likely to
   affect the critical rate estimate negatively, as the fitting
   function does not need to be a good approximation there.
   Discarding some results, or "suppressing" their impact with ad-hoc
   methods (other than using the Poisson distribution instead of the
   binomial), is not done, as such methods tend to make the overall
   search unstable.
   We rely on most of the measurements being done (eventually) within
   the critical region, and on them (eventually) outweighing the
   far-off measurements for well-behaved fitting functions.

4.  Fitting Function Coefficients Distribution

   To accommodate systems with different behaviours, the fitting
   function is expected to have a few numeric parameters affecting its
   shape (mainly affecting the linear approximation in the critical
   region).

   The general search algorithm can use any increasing fitting
   function; some specific functions may be described later.

   TODO: Describe sigmoid-based and erf-based functions.

   It is up to the implementer to choose a fitting function and the
   prior distribution of its parameters.  The rest of this document
   assumes each parameter is independently and uniformly distributed
   over a common interval.  Implementers are to add non-linear
   transformations into their fitting functions if their prior is
   different.

   TODO: Move the following sentence into a more appropriate place.

   Speaking about new trials, each subsequent trial will be done at an
   offered load equal to the current average of the critical load.

   The exit condition is either the critical load stdev becoming small
   enough, or the overall search time becoming long enough.

   The algorithm should report both avg and stdev for the critical
   load.  If the reported averages follow a trend (without reaching an
   equilibrium), avg and stdev should refer to the equilibrium
   estimated based on the trend, not to the immediate posterior
   values.

   TODO: Explicitly mention the iterative character of the search.

5.  Algorithm Formulas

5.1.  Integration

   The posterior distributions for fitting function parameters will
   not be integrable in general.

   The search algorithm utilises the fact that a trial measurement
   takes some time, so this time can be used for numeric integration
   (using a suitable method, such as Monte Carlo) to achieve
   sufficient precision.
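   The Monte Carlo integration above can be sketched as follows.  This
   is an illustration only: the exponential-shaped fitting function,
   the prior intervals, and all names are assumptions made for the
   sketch, not the CSIT implementation, and the function is kept in
   the log domain so far-off parameter samples underflow to zero
   weight instead of overflowing.

```python
import math
import random

TARGET_LOSS_RATE = 1.0  # loss target: one packet lost per second

def log_loss_rate(load, mrr, spread):
    """Hypothetical fitting function, in log domain: logarithm of the
    average loss rate (packets per second) at a given offered load.
    It is increasing in load and predicts a positive loss rate for
    every positive load; mrr and spread are its two shape parameters."""
    return (load - mrr) / spread

def critical_load(mrr, spread):
    """Offered load at which the fitting function meets the target."""
    return mrr + spread * math.log(TARGET_LOSS_RATE)

def estimate_critical_load(trials, samples=10000):
    """Monte Carlo integration over the parameter prior.

    trials is a list of (offered_load, duration, loss_count) tuples.
    Each parameter sample is drawn uniformly (the assumed prior) and
    weighted by the Poisson likelihood of all trial results; the
    weighted avg and stdev of the implied critical load are the
    posterior estimates."""
    w_sum = w_sq = w_total = 0.0
    for _ in range(samples):
        mrr = random.uniform(1e6, 1e7)     # assumed prior interval
        spread = random.uniform(1e4, 1e6)  # assumed prior interval
        log_weight = 0.0
        for load, duration, losses in trials:
            log_mu = log_loss_rate(load, mrr, spread) + math.log(duration)
            if log_mu > 700.0:  # likelihood is effectively zero; skip
                log_weight = float("-inf")
                break
            # log of Poisson PMF: k*log(mu) - mu - log(k!)
            log_weight += (losses * log_mu - math.exp(log_mu)
                           - math.lgamma(losses + 1))
        weight = math.exp(log_weight)  # exp(-inf) is simply 0.0
        cl = critical_load(mrr, spread)
        w_sum += weight * cl
        w_sq += weight * cl * cl
        w_total += weight
    avg = w_sum / w_total
    var = w_sq / w_total - avg * avg
    return avg, math.sqrt(max(var, 0.0))
```

   An actual implementation would resample adaptively rather than
   drawing every sample from the full prior, but plain importance
   sampling is enough to show where the trial-measurement time goes.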
5.2.  Optimizations

   After enough trials, the posterior distribution will be
   concentrated in a narrow area of parameter space.  The integration
   method could take advantage of that.

   Even in the concentrated area, the likelihood can be quite small,
   so the integration algorithm should track the logarithm of the
   likelihood, and also avoid underflow errors by other means.

6.  Known Implementations

   The only known working implementation of Probabilistic Loss Ratio
   Search for Packet Throughput is in the Linux Foundation FD.io CSIT
   project.  https://wiki.fd.io/view/CSIT.  https://git.fd.io/csit/.

6.1.  FD.io CSIT Implementation Specifics

   In the sample implementation in the FD.io CSIT project, there is
   around a 0.5 second delay between trials due to restrictions
   imposed by the packet traffic generator in use (T-Rex); avoiding
   that delay is out of scope of this document.

   TODO: Describe how the current integration algorithm finds the
   concentrated area.

7.  IANA Considerations

   ..

8.  Security Considerations

   ..

9.  Acknowledgements

   ..

10.  Normative References

   [RFC2544]  Bradner, S. and J. McQuaid, "Benchmarking Methodology
              for Network Interconnect Devices", RFC 2544,
              DOI 10.17487/RFC2544, March 1999,
              <https://www.rfc-editor.org/info/rfc2544>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

Authors' Addresses

   Maciek Konstantynowicz (editor)
   Cisco Systems

   Email: mkonstan@cisco.com

   Vratko Polak (editor)
   Cisco Systems

   Email: vrpolak@cisco.com