idnits 2.17.1 

draft-vpolak-plrsearch-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Introduction section.

  ** The abstract seems to contain references ([RFC2544]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (October 22, 2018) is 2013 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Unused Reference: 'RFC8174' is defined on line 269, but no explicit
     reference was found in the text


     Summary: 2 errors (**), 0 flaws (~~), 2 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Benchmarking Working Group                       M. Konstantynowicz, Ed.
3	Internet-Draft                                             V. Polak, Ed.
4	Intended status: Informational                             Cisco Systems
5	Expires: April 25, 2019                                 October 22, 2018

7	   Probabilistic Loss Ratio Search for Packet Throughput (PLRsearch)
8	                       draft-vpolak-plrsearch-00

10	Abstract

12	   This document addresses challenges while applying methodologies
13	   described in [RFC2544] to benchmarking NFV (Network Function
14	   Virtualization) over an extended period of time, sometimes referred
15	   to as "soak testing".  More specifically to benchmarking software
16	   based implementations of NFV data planes.  Packet throughput search
17	   approach proposed by this document assumes that system under test is
18	   probabilistic in nature, and not deterministic.

20	Status of This Memo

22	   This Internet-Draft is submitted in full conformance with the
23	   provisions of BCP 78 and BCP 79.

25	   Internet-Drafts are working documents of the Internet Engineering
26	   Task Force (IETF).  Note that other groups may also distribute
27	   working documents as Internet-Drafts.  The list of current Internet-
28	   Drafts is at https://datatracker.ietf.org/drafts/current/.

30	   Internet-Drafts are draft documents valid for a maximum of six months
31	   and may be updated, replaced, or obsoleted by other documents at any
32	   time.  It is inappropriate to use Internet-Drafts as reference
33	   material or to cite them other than as "work in progress."

35	   This Internet-Draft will expire on April 25, 2019.

37	Copyright Notice

39	   Copyright (c) 2018 IETF Trust and the persons identified as the
40	   document authors.  All rights reserved.

42	   This document is subject to BCP 78 and the IETF Trust's Legal
43	   Provisions Relating to IETF Documents
44	   (https://trustee.ietf.org/license-info) in effect on the date of
45	   publication of this document.  Please review these documents
46	   carefully, as they describe your rights and restrictions with respect
47	   to this document.  Code Components extracted from this document must
48	   include Simplified BSD License text as described in Section 4.e of

50	Internet-DrafProbabilistic Loss Ratio Search for Packet Thr October 2018

52	   the Trust Legal Provisions and are provided without warranty as
53	   described in the Simplified BSD License.

55	Table of Contents

57	   1.  Motivation  . . . . . . . . . . . . . . . . . . . . . . . . .   2
58	   2.  Model . . . . . . . . . . . . . . . . . . . . . . . . . . . .   2
59	   3.  Poisson Distribution  . . . . . . . . . . . . . . . . . . . .   3
60	   4.  Fitting Function Coefficients Distribution  . . . . . . . . .   4
61	   5.  Algorithm Formulas  . . . . . . . . . . . . . . . . . . . . .   5
62	     5.1.  Integration . . . . . . . . . . . . . . . . . . . . . . .   5
63	     5.2.  Optimizations . . . . . . . . . . . . . . . . . . . . . .   5
64	   6.  Known Implementations . . . . . . . . . . . . . . . . . . . .   5
65	     6.1.  FD.io CSIT Implementation Specifics . . . . . . . . . . .   5
66	   7.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .   5
67	   8.  Security Considerations . . . . . . . . . . . . . . . . . . .   6
68	   9.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .   6
69	   10. Normative References  . . . . . . . . . . . . . . . . . . . .   6
70	   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .   6

72	1.  Motivation

74	   Network providers are interested in throughput a device can sustain.

76	   RFC 2544 assumes loss ratio is given by a deterministic function of
77	   offered load.  But NFV software devices are not deterministic
78	   (enough).  This leads for deterministic algorithms (such as MLRsearch
79	   with single trial) to return results, which when repeated show
80	   relatively high standard deviation, thus making it harder to tell
81	   what "the throughput" actually is.

83	   We need another algorithm, which takes this indeterminism into
84	   account.

86	2.  Model

88	   Each algorithm searches for an answer to a precisely formulated
89	   question.  When the question involves indeterministic systems, it has
90	   to specify probabilities (or prior distributions) which are tied to a
91	   specific probabilistic model.  Different models will have different
92	   number (and meaning) of parameters.  Complicated (but more realistic)
93	   models have many parameters, and the math involved can be very
94	   complicated.  It is better to start with simpler probabilistic model,
95	   and only change it when the output of the simpler algorithm is not
96	   stable or useful enough.

98	   TODO: Refer to packet forwarding terminology, such as "offered load"
99	   and "loss ratio".

101	Internet-DrafProbabilistic Loss Ratio Search for Packet Thr October 2018

103	   TODO: Mention that no packet duplication is expected (or is filtered
104	   out).

106	   TODO: Define critical load and critical region earlier.

108	   This document is focused on algorithms related to packet loss count
109	   only.  No latency (or other information) is taken into account.  For
110	   simplicity, only one type of measurement is considered: dynamically
111	   computed offered load, constant within trial measurement of
112	   predetermined trial duration.

114	   Also, running longer trials (in some situations) could be more
115	   efficient, but in order to perform trial at multiple offered loads
116	   withing critical region, trial durations should be kept as short as
117	   possible.

119	3.  Poisson Distribution

121	   TODO: Give link to more officially published literature about Poisson
122	   distribution.

124	   Note-1: that the algorithm makes an assumption that packet traffic
125	   generator detects duplicate packets on receive detection, and reports
126	   this as an error.

128	   Note-2: Binomial distribution is a better fit compared to Poisson
129	   distribution (acknowledging that the number of packets lost cannot be
130	   higher than the number of packets offered), but the difference tends
131	   to be relevant in loads far above the critical region, so using
132	   Poisson distribution helps the algorithm focus on critical region
133	   better.

135	   When comparing different offered loads, the average loss per second
136	   is assumed to increase, but the (deterministic) function from offered
137	   load into average loss rate is otherwise unknown.

139	   Given a loss target (configurable, by default one packet lost per
140	   second), there is an unknown offered load when the average is exactly
141	   that.  We call that the "critical load".  If critical load seems
142	   higher than maximum offerable load, we should use the maximum
143	   offerable load to make search output more stable.

145	   Of course, there are great many increasing functions.  The offered
146	   load has to be chosen for each trial, and the computed posterior
147	   distribution of critical load can change with each trial result.

149	   To make the space of possible functions more tractable, some other
150	   simplifying assumption is needed.  As the algorithm will be examining

152	Internet-DrafProbabilistic Loss Ratio Search for Packet Thr October 2018

154	   (also) loads close to the critical load, linear approximation to the
155	   function (TODO: name the function) in the critical region is
156	   important.  But as the search algorithm needs to evaluate the
157	   function also far away from the critical region, the approximate
158	   function has to be well- behaved for every positive offered load,
159	   specifically it cannot predict non-positive packet loss rate.

161	   Within this document, "fitting function" is the name for such well-
162	   behaved function which approximates the unknown function in the
163	   critical region.

165	   Results from trials far from the critical region are likely to affect
166	   the critical rate estimate negatively, as the fitting function does
167	   not need to be a good approximation there.  Instead of discarding
168	   some results, or "suppressing" their impact with ad-hoc methods
169	   (other than using Poisson distribution instead of binomial) is not
170	   used, as such methods tend to make the overall search unstable.  We
171	   rely on most of measurements being done (eventually) within the
172	   critical region, and overweighting far-off measurements (eventually)
173	   for well-behaved fitting functions.

175	4.  Fitting Function Coefficients Distribution

177	   To accomodate systems with different behaviours, the fitting function
178	   is expected to have few numeric parameters affecting its shape
179	   (mainly affecting the linear approximation in the critical region).

181	   The general search algorithm can use whatever increasing fitting
182	   function, some specific functions can be described later.

184	   TODO: Describe sigmoid-based and erf-based functions.

186	   It is up to implementer to chose a fitting function and prior
187	   distribution of its parameters.  The rest of this document assumes
188	   each parameter is independently and uniformly distributed over common
189	   interval.  Implementers are to add non-linear transformations into
190	   their fitting functions if their prior is different.

192	   TODO: Move the following sentence into more appropriate place.

194	   Speaking about new trials, each next trial will be done at offered
195	   load equal to the current average of the critical load.

197	   Exit condition is either critical load stdev becoming small enough,
198	   or overal search time becoming long enough.

200	   The algorithm should report both avg and stdev for critical load.  If
201	   the reported averages follow a trend (without reaching equilibrium),

203	Internet-DrafProbabilistic Loss Ratio Search for Packet Thr October 2018

205	   avg and stdev should refer to the equilibrium estibated based on the
206	   trend, not to immediate posterior values.

208	   TODO: Explicitly mention the iterative character of the search.

210	5.  Algorithm Formulas

212	5.1.  Integration

214	   The posterior distributions for fitting function parameters will not
215	   be integrable in general.

217	   The search algorithm utilises the fact that trial measurement takes
218	   some time, so this time can be used for numeric integration (using
219	   suitable method, such as Monte Carlo) to achieve sufficient
220	   precision.

222	5.2.  Optimizations

224	   After enough trials, the posterior distribution will be concentrated
225	   in a narrow area of parameter space.  The integration method could
226	   take advantage of that.

228	   Even in the concentrated area, the likelihood can be quite small, so
229	   the integration algorithm should track the logarithm of the
230	   likelihood, and also avoid underflow errors bu ther means.

232	6.  Known Implementations

234	   The only known working implementatin of Probabilistic Loss Ratio
235	   Search for Packet Throughput is in Linux Foundation FD.io CSIT
236	   project. https://wiki.fd.io/view/CSIT. https://git.fd.io/csit/.

238	6.1.  FD.io CSIT Implementation Specifics

240	   In a sample implemenation in FD.io CSIT project, there is around 0.5
241	   second delay between trials due to restrictons imposed by packet
242	   traffic generator in use (T-Rex), avoiding that delay is out of scope
243	   of this document.

245	   TODO: Describe how the current integration algorithm finds the
246	   concentrated area.

248	7.  IANA Considerations

250	   ..

252	Internet-DrafProbabilistic Loss Ratio Search for Packet Thr October 2018

254	8.  Security Considerations

256	   ..

258	9.  Acknowledgements

260	   ..

262	10.  Normative References

264	   [RFC2544]  Bradner, S. and J. McQuaid, "Benchmarking Methodology for
265	              Network Interconnect Devices", RFC 2544,
266	              DOI 10.17487/RFC2544, March 1999,
267	              <https://www.rfc-editor.org/info/rfc2544>.

269	   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
270	              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
271	              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

273	Authors' Addresses

275	   Maciek Konstantynowicz (editor)
276	   Cisco Systems

278	   Email: mkonstan@cisco.com

280	   Vratko Polak (editor)
281	   Cisco Systems

283	   Email: vrpolak@cisco.com